Ruijin Yang Thesis

GRAMMAR AND VOCABULARY TESTING
IN THE SENIOR HIGH SCHOOL ENTRANCE

ENGLISH TEST IN CHINA: A WASHBACK
STUDY FROM A LEARNING ORIENTED
ASSESSMENT PERSPECTIVE
Ruijin Yang
BA (English for Medical Purposes)
MA (Foreign Linguistics and Applied Linguistics)
Submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy
School of Teacher Education and Leadership

Faculty of Education
Queensland University of Technology
2020
Keywords
English Curriculum Standards for Compulsory Education (ECSCE), English as

a Foreign Language (EFL), grammar and vocabulary testing, Learning Oriented
Assessment (LOA), Senior High School Entrance English Test (SHSEET), washback,
washback intensity, washback value
Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback Study
from A Learning Oriented Assessment Perspective i
Abstract
High-stakes standardised English tests may impact significantly on

corresponding teaching and learning practices, which can lead to positive or negative
washback. This exploratory sequential mixed methods research (MMR) study set in
Chongqing, China examined the influence of the grammar and vocabulary testing in
the Senior High School Entrance English Test (SHSEET) on Grade 9 teaching and
learning of English as a Foreign Language (EFL). Due to its gatekeeping role, the
SHSEET is used for both junior high school graduation and senior high school
enrolment. Moreover, as the use of separate grammar and vocabulary tasks in
summative assessment goes against current trends towards more integrative language
assessment tasks, it is thus important to explore the washback of the separate testing
of grammar and vocabulary knowledge in four tasks in the Grammar and Vocabulary
Test in the SHSEET (hereafter the GVT). To explore the potential for positive
washback at the macro level of the exam system and the micro level of teaching and
learning practices, Learning Oriented Assessment (LOA) theories were employed to
frame the study. Against this backdrop, this study explores the washback of the GVT
on teaching as well as learning and the opportunities for and challenges of the
incorporation of LOA principles in GVT preparation in junior high schools in China.
The exploratory sequential MMR design in this study contained both the
qualitative phase of 15 classroom observations, three teacher semi-structured
interviews, and three student focus groups and the quantitative phase of a student
survey (N=922). Both qualitative data and quantitative data were collected from Grade
9 teachers and students and were combined to address research questions. The findings
of this study revealed the complexity of washback phenomena in the GVT context.
Regarding washback value, both positive and negative washback were identified.
At the macro level of washback value, negative washback outweighed positive
washback since teachers tended to not implement the English Curriculum Standards
for Compulsory Education (ECSCE) principles in Grade 9 and a narrowing of the
curriculum phenomenon was evident in GVT preparation. At the micro level, it was
found that the GVT exerted both positive and negative washback on teaching and
learning in various aspects, including participants’ perceptions of test design
ii Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
Study from A Learning Oriented Assessment Perspective
characteristics, affective factors, test preparation materials, and grammar and
vocabulary learning strategies. As for washback intensity, qualitative findings revealed
both in-class and extra-curricular GVT preparation practices. Quantitative results
gained from Multiple Correspondence Analysis (MCA) identified four different
patterns of washback intensity of GVT preparation. In further investigation of the
washback mechanism through Structural Equation Modelling (SEM), results showed
that students’ perceptions of test design characteristics and test importance indirectly
influence their test preparation through affective factors of test anxiety and intrinsic
motivation. In turn, students’ test preparation appeared to be associated with their self-
reported SHSEET scores.
In relation to the LOA opportunities and challenges, various perceptions and

practises were identified from the data. In general, teachers and students believed that
the GVT offered opportunities for incorporating LOA principles in test preparation,
and they used identifiable LOA strategies during GVT preparation. A further
investigation of identifiable LOA practices in Confirmatory Factor Analysis (CFA)
showed that Learning Oriented Assessment was indeed a dynamic and
multidimensional construct in GVT preparation, which consisted of classroom
interactions, involvement in assessment, learner autonomy, and feedback. However,
both teachers and students perceived different challenges of the incorporation of LOA
principles in GVT preparation.
This study is significant from multiple perspectives. At the theoretical level, the
study suggests that LOA theories can be applied in washback studies, especially in
exploring positive washback results in high-stakes standardised EFL test preparation
context. Methodologically, this MMR study is, to the best of the researcher’s
knowledge, the first trial of combining both thematic analyses and various advanced
statistical modelling, which includes MCA, SEM, CFA, and a robust survey design in
one single washback study. As a contribution to practice, suggestions and implications
for promoting positive washback and LOA practices during test preparation are
provided to inform in-service teacher training, Grade 9 teaching and learning, and the
GVT test design. The study contributes to research on reconciling the tension between
assessment, teaching, and learning in a summative assessment context.
from A Learning Oriented Assessment Perspective iii
Table of Contents
Keywords .................................................................................................................................. i
Abstract .................................................................................................................................... ii
Table of Contents .................................................................................................................... iv
List of Figures ....................................................................................................................... viii
List of Tables ............................................................................................................................ x
List of Abbreviations .............................................................................................................. xii
Statement of Original Authorship ......................................................................................... xiv
Acknowledgements ................................................................................................................ xv
Chapter 1: Introduction ...................................................................................... 1
1.1 Background .................................................................................................................... 1
1.2 Context ........................................................................................................................... 2
1.2.1 The Education system in China ........................................................................... 3
1.2.2 English as a Foreign Language Education in China and the English
Curriculum Standards for Compulsory Education ............................................... 6
1.2.3 The role of standardised English exams in China .............................................. 11
1.2.4 Overview of the SHSEET .................................................................................. 12
1.2.5 Grammar and Vocabulary Testing in the SHSEET ........................................... 14
1.3 Aims of the Study ........................................................................................................ 16
1.3.1 Research objective ............................................................................................. 16
1.3.2 Research questions............................................................................................. 16
1.4 Significance of the Research ........................................................................................ 17
1.5 Thesis Outline .............................................................................................................. 17
Chapter 2: Literature Review ........................................................................... 19
2.1 Standardised English Language Tests .......................................................................... 19
2.1.1 High-stakes standardised English language tests in the international
context................................................................................................................ 20
2.1.2 High-stakes standardised English language tests in China ................................ 21
2.1.3 Empirical studies of the SHSEET ...................................................................... 22
2.1.4 Section summary................................................................................................ 24
2.2 English Grammar and Vocabulary Testing .................................................................. 24
2.2.1 English grammar testing .................................................................................... 25
2.2.2 English vocabulary testing ................................................................................. 28
2.2.3 The testing of English grammar and vocabulary ............................................... 31
2.2.4 Section summary................................................................................................ 33
2.3 Washback ..................................................................................................................... 33
2.3.1 Washback concepts and dimensions .................................................................. 34
2.3.2 Working towards positive washback ................................................................. 37
2.3.3 A new approach to positive washback: Learning Oriented Assessment............ 38
2.3.4 Washback and stakeholders ............................................................................... 39
2.3.5 Washback of high-stakes standardised English tests ......................................... 41
2.3.6 Section summary................................................................................................ 51
iv Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
2.4 Summary and Implications ...........................................................................................51
Chapter 3: Theoretical Framework ................................................................. 53
3.1 Key Developments in Washback Theorisation .............................................................53
3.1.1 Washback Hypothesis ........................................................................................53
3.1.2 A curriculum innovation model..........................................................................55
3.1.3 Washback models of learning and teaching .......................................................55
3.2 A Washback Model Incorporating Intensity and Direction ..........................................57
3.3 Washback Mechanism ..................................................................................................61
3.4 Key Learning Oriented Assessment Frameworks ........................................................63
3.4.1 The LOA framework of Carless .........................................................................64
3.4.2 The LOA cycle developed by Cambridge English Language Assessment ........66
3.4.3 The LOA cycle in the SHSEET context .............................................................69
3.5 A New Washback Model Incorporating LOA ..............................................................71
3.6 Chapter Summary .........................................................................................................72
Chapter 4: Methodology.................................................................................... 75
4.1 The Methodological Review of Washback Studies ......................................................75
4.2 Mixed Methods Research .............................................................................................77
4.3 An Exploratory Sequential Mixed Methods Research Design .....................................79
4.4 Qualitative Phase ..........................................................................................................82
4.4.1 Site selection.......................................................................................................82
4.4.2 Participants .........................................................................................................84
4.4.3 Classroom observations ......................................................................................85
4.4.4 Interviews ...........................................................................................................87
4.4.5 Transcription and translation ..............................................................................92
4.4.6 Thematic analysis ...............................................................................................93
4.4.7 Validity and reliability........................................................................................95
4.5 Quantitative Phase ........................................................................................................98
4.5.1 Instrument design and development ...................................................................99
4.5.2 Pilot study .........................................................................................................101
4.5.3 Main study ........................................................................................................104
4.6 Ethical Considerations ................................................................................................118
4.7 Chapter Summary .......................................................................................................119
Chapter 5: Test Preparation: Washback Value ............................................ 121
5.1 Understanding and Use of Official Test Reference Documents .................................122
5.1.1 Understanding the role of official test reference documents ............................123
5.1.2 Implementing the principles in the ECSCE and using the Test
Specifications as test preparation reference......................................................125
5.2 Perceptions of Test Design Characteristics ................................................................129
5.2.1 Authenticity ......................................................................................................131
5.2.2 Provision of context..........................................................................................133
5.2.3 Test method ......................................................................................................134
5.2.4 Assessing language use ....................................................................................138
5.2.5 Perceptions of GVT design characteristics as measured in the student
survey ...............................................................................................................141
5.3 Affective Factors ........................................................................................................144
5.3.1 Test anxiety ......................................................................................................145
from A Learning Oriented Assessment Perspective v
5.3.2 Intrinsic motivation .......................................................................................... 148
5.3.3 Extrinsic motivation......................................................................................... 149
5.4 Test Preparation Materials ......................................................................................... 152
5.4.1 Exam-oriented test preparation materials ........................................................ 152
5.4.2 Non-exam oriented learning materials ............................................................. 157
5.5 Grammar and Vocabulary Learning Strategies .......................................................... 158
5.5.1 Test-use oriented grammar and vocabulary learning strategies ....................... 160
5.5.2 Language-use oriented grammar and vocabulary learning strategies .............. 167
5.6 Chapter Summary ...................................................................................................... 170
Chapter 6: Test Preparation: Washback Intensity ....................................... 173
6.1 Perception of Test Importance ................................................................................... 174
6.1.1 The GVT is perceived as highly important ...................................................... 175
6.1.2 The GVT is perceived as relatively unimportant ............................................. 177
6.1.3 Perceptions of test importance as measured in the student survey .................. 178
6.2 Perception of Test Difficulty...................................................................................... 179
6.3 Test Preparation Effort ............................................................................................... 182
6.3.1 In-class test preparation effort ......................................................................... 182
6.3.2 Extra-curricular test preparation effort ............................................................ 184
6.3.3 Test preparation effort as tested in the student survey ..................................... 185
6.4 Washback Intensity: Multiple Correspondence Analysis .......................................... 187
6.5 Washback Mechanism: Structural Equation Modelling............................................. 192
6.6 Chapter Summary ...................................................................................................... 199
Chapter 7: The Incorporation of LOA Principles: Opportunities and
Challenges in GVT Preparation............................................................................ 203
7.1 Beliefs about Opportunities for the Incorporation of LOA Principles in GVT
Preparation ........................................................................................................................... 203
7.1.1 Alignment with students’ EFL learning stage ................................................. 206
7.1.2 Developing communication abilities in real life .............................................. 207
7.1.3 Developing students’ learning skills in general ............................................... 208
7.1.4 Learning-oriented test design........................................................................... 209
7.1.5 Transferring language knowledge into performance on macroskills ............... 209
7.1.6 The level of challenge ...................................................................................... 210
7.2 Identifiable LOA Strategies and Activities ................................................................ 211
7.2.1 Interactive classroom activities ........................................................................ 212
7.2.2 Feedback .......................................................................................................... 219
7.2.3 Learning-oriented strategies ............................................................................ 223
7.3 Learning Oriented Assessment as a Dynamic Multidimensional Construct in GVT
Preparation ........................................................................................................................... 232
7.4 Learning Oriented Assessment Practices in GVT Preparation and Student Test
Performance ......................................................................................................................... 235
7.5 Challenges of the Incorporation of LOA Principles in GVT Preparation .................. 236
7.5.1 Efficient use of class time ................................................................................ 237
7.5.2 The consideration of high test stakes ............................................................... 239
7.5.3 Administrative influence.................................................................................. 241
7.5.4 Student language proficiency........................................................................... 242
7.5.5 Class size ......................................................................................................... 244
7.5.6 The concern over teaching performance .......................................................... 245
vi Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
7.5.7 Limited teaching experiences and expertise .....................................................245
7.5.8 Test method ......................................................................................................246
7.6 Chapter Summary .......................................................................................................248
Chapter 8: Discussion and Conclusion .......................................................... 251
8.1 Discussion ...................................................................................................................251
8.1.1 Washback value ................................................................................................251
8.1.2 Washback intensity...........................................................................................259
8.1.3 Washback mechanism ......................................................................................262
8.1.4 LOA opportunities and challenges ...................................................................264
8.1.5 Section summary ..............................................................................................269
8.2 Contributions and Implications...................................................................................270
8.2.1 Theoretical contributions ..................................................................................270
8.2.2 Methodological contributions ...........................................................................272
8.2.3 Implications for practice ...................................................................................273
8.2.4 Section summary ..............................................................................................278
8.3 Reflection....................................................................................................................279
8.4 Limitations ..................................................................................................................280
8.5 Future Directions ........................................................................................................281
8.6 Overall Conclusions of the Study ...............................................................................284
Bibliography ........................................................................................................... 287
Appendices .............................................................................................................. 313
Appendix A Language Knowledge Requirement at Level 5 in the ECSCE .........................313
Appendix B Test Item Examples of the GVT from the Authentic 2018 SHSEET Paper
(Chongqing, Paper A) ...........................................................................................................314
Appendix C Table of Empirical Washback Studies of High-stakes Standardised English
Tests in the International Context .........................................................................................319
Appendix D Table of Empirical Washback Studies of High-stakes Standardised English
Tests in China .......................................................................................................................322
Appendix E Classroom Observation Scheme .......................................................................325
Appendix F Semi-structured Interview Protocol ..................................................................326
Appendix G Focus Group Interview Protocol ......................................................................330
Appendix H Transcription Symbols Used in This Study (adapted from Powers (2005)) .....333
Appendix I Student Survey ...................................................................................................334
Appendix J Independent Samples T-test Results of the Main Study ....................................347
Appendix K Descriptive Statistics of Indicators in the Main Study Instrument ...................349
Appendix L Summary of Factor Analysis Results for Main Study Instrument ....................351
Appendix M Qualitative Results to RQ1a ............................................................................354
Appendix N Ethics Form for the Online Student Survey .....................................................355
from A Learning Oriented Assessment Perspective vii
List of Figures
Figure 1.1. The structure of curriculum objectives in the English Curriculum

Standards for Compulsory Education (ECSCE) ............................................ 9
Figure 2.1. A framework for vocabulary testing (Read & Chapelle, 2001, p.
10) ................................................................................................................ 31
Figure 2.2. Influential factors for washback from empirical studies ......................... 50
Figure 3.1. Washback: a curriculum innovation model (Burrows, 2004, p.126) ...... 55
Figure 3.2. Washback models of learning and teaching (Shih, 2007, p. 151;
2009, p. 199) ................................................................................................ 56
Figure 3.3. Model of washback, incorporating intensity and direction (Green,
2007a, p. 24)................................................................................................. 58
Figure 3.4. The washback model of the GVT (adapted from Green (2007a)) .......... 60
Figure 3.5. The LOA framework proposed by Carless (2007) .................................. 65
Figure 3.6. Evidence for learning – the LOA cycle (Jones & Saville, 2016) ............ 66
Figure 3.7. The LOA cycle in the SHSEET context (adapted from Jones and
Saville (2016)).............................................................................................. 70
Figure 3.8. A new washback model incorporating LOA (Green, 2007a;
Carless, 2007; Jones & Saville, 2016) ......................................................... 71
Figure 4.1. An exploratory sequential design in the present study ............................ 79
Figure 4.2. Flowchart of the exploratory sequential MMR design in this study ....... 81
Figure 4.3. The process of thematic analysis (Braun & Clarke, 2006)...................... 94
Figure 4.4. Test paper taken by participants ............................................................ 106
Figure 4.5. Gender distribution ................................................................................ 107
Figure 4.6. Distribution of school district ................................................................ 107
Figure 4.7. Distribution of participants’ SHSEET test scores ................................. 108
Figure 4.8. Scree plot of the motivation construct ................................................... 113
Figure 4.9. Measurement model for the motivation construct ................................. 114
Figure 4.10. Modified measurement model for the motivation construct ............... 115
Figure 4.11. Washback mechanism of the GVT ...................................................... 117
Figure 5.1. Focus of the new washback model in Chapter Five (Carless, 2007;
Green, 2007a; Jones & Saville, 2016)........................................................ 121
Figure 6.1. Focus of the new washback model in Chapter Six (Carless, 2007;
Green, 2007a; Jones & Saville, 2016)........................................................ 173
Figure 6.2. Indicators display (measures of discrimination) ................................... 189
Figure 6.3. Washback intensity patterns of the GVT .............................................. 189
Figure 6.4. Washback intensity by Green (2007a) .................................................. 191
viii Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
Figure 6.5. Structural model for the relationship within GVT washback
mechanism (N=922) .................................................................................. 194
Figure 6.6. The structural relationships within the measurement model of the
GVT washback mechanism ....................................................................... 199
Figure 6.7. The GVT washback model on teaching and learning ........................... 202
Figure 7.1. Structural model for the relationship within LOA practices in GVT
preparation (N=488) .................................................................................. 233
Figure 8.1. LOA dynamic in the GVT context ........................................................ 268
Figure 8.2. Washback model of the GVT ................................................................ 270
from A Learning Oriented Assessment Perspective ix
List of Tables
Table 1.1 Four types of academic junior high schools ................................................ 4

Table 1.2 Major standardised entrance tests in China ................................................ 5
Table 1.3 Composition of 2018 SHSEET (Chongqing) test paper ............................. 13
Table 3.1 Washback Hypothesis (Alderson & Hamp-Lyons, 1996; Alderson &
Wall, 1993)................................................................................................... 54
Table 4.1 Qualitative and quantitative methods in the present study ........................ 78
Table 4.2 Summary of participant involvement in qualitative phase ......................... 82
Table 4.3 Information on the participating schools ................................................... 83
Table 4.4 Information on the participating students from School A .......................... 89
Table 4.5 Information on the participating students from School B .......................... 90
Table 4.6 Information on the participating students from School C .......................... 90
Table 4.7 Methods for increasing reliability in the present study ............................. 96
Table 4.8 Methods for increasing reliability in the present study ............................. 96
Table 4.9 Co-coding results for qualitative transcripts ............................................. 97
Table 4.10 Summary results of internal consistency reliability test of the pilot
instrument................................................................................................... 102
Table 4.11 Item-Total Statistics of construct reliability of negative perceptions
of test design characteristics ...................................................................... 103
Table 4.12 Summary results of internal consistency reliability test of the main
instrument................................................................................................... 109
Table 4.13 Item-total statistics of construct reliability of extrinsic motivation ....... 109
Table 4.14 Correlation matrix for the indicators within the motivation
construct ..................................................................................................... 111
Table 4.15 KMO and Bartlett’s test for the indicators within the motivation
construct ..................................................................................................... 111
Table 4.16 Communalities for the indicators within the motivation construct ........ 112
Table 4.17 Rotated factor matrix of the motivation construct ................................. 112
Table 4.18 Total variance of the motivation construct explained by its
indicators ................................................................................................... 113
Table 4.19 Assessment of normality for the indicators within the construct
motivation................................................................................................... 114
Table 4.20 The overall research procedure for the present study ........................... 119
Table 5.1 Information on the participating teachers ............................................... 123
Table 5.2 Student Perceptions of GVT design characteristics(see instrument
reliability and validity in section 4.5.3) ..................................................... 142
x Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
Table 5.3 Indicators of test anxiety in the GVT context (see instrument
Table 5.4 Indicators of intrinsic motivation (see instrument reliability and
validity in section 4.5.3) ............................................................................. 149
Table 5.5 Indicators of extrinsic motivation (see instrument reliability and
Table 5.6 Indicators of test-use oriented learning strategies (see instrument
Table 5.7 Indicators of language-use oriented learning strategies (see
instrument reliability and validity in section 4.5.3) ................................... 169
Table 6.1 Indicators of test importance (see instrument reliability and validity
in section 4.5.3) .......................................................................................... 178
Table 6.2 GVT task types and perceptions of test difficulty (see instrument
Table 6.3 Number of test papers taken for GVT tasks (see instrument reliability
and validity in section 4.5.3) ...................................................................... 185
Table 6.4 Time spent on preparing for GVT tasks (see instrument reliability
Table 6.5 Model summary of washback intensity .................................................... 187
Table 6.6 Discrimination of variables for the dimensions ....................................... 188
Table 6.7 Standardised path coefficients of the structural model of the GVT
washback mechanism ................................................................................. 195
Table 6.8 Qualitative results to RQ1b ..................................................................... 200
Table 7.1 Indicators of classroom interaction (see instrument reliability and
Table 7.2 Indicators of feedback (see instrument reliability and validity in
section 4.5.3) .............................................................................................. 222
Table 7.3 Indicators of learner autonomy (see instrument reliability and
Table 7.4 Indicators of involvement in assessment (see instrument reliability
Table 7.5 Correlation coefficients between SHSEET score and LOA practices
(N=922) ..................................................................................................... 235
Table 7.6 Qualitative findings of RQ2 ..................................................................... 249
from A Learning Oriented Assessment Perspective xi
List of Abbreviations
AfL Assessment for Learning
CEFR Common European Framework of Reference for languages
CET College English Test
CFA Confirmatory Factor Analysis
CFI Comparative Fit Index
CLT communicative language teaching
CNKI China National Knowledge Infrastructure
COET Computerised Oral English Test
EAP English for Academic Purposes
ECSCE English Curriculum Standards for Compulsory Education
EFA Exploratory Factor Analysis
EFL English as a Foreign Language
ESL English as a Second Language
GEPT General English Proficiency Test
GP cloze Gap-filling cloze
GSEEE Graduate School Entrance English Examination
GVT the Grammar and Vocabulary Test in the SHSEET
Ha Alternative Hypothesis
H0 Null Hypothesis
IB CET-4 Internet-based College English Test (Band 4)
IELTS International English Language Testing System
IFI Incremental Fit Index
KMO Kaiser-Meyer-Olkin
L1 first language
xii Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
L2 second language
LOA Learning Oriented Assessment
LOLA Learning Oriented Language Assessment
MCA Multiple Correspondence Analysis
MCQ Multiple Choice Question
MMR mixed methods research
MOE Ministry of Education
NFI Normed Fit Index
NME National Matriculation Examination
NMET National Matriculation English Test
NPEE National Postgraduate Entrance Examination
PEP People’s Education Press
RFI Relative Fit Index
RMSEA Root Mean Square Error of Approximation
SEM Structural Equation Modelling
SHSEE Senior High School Entrance Examination
SHSEET Senior High School Entrance English Test
SMC Square multiple correlations
TEM (-8) Test for English Majors (Band 8)
TLI Tucker-Lewis Index
TLU target language use
TOEFL iBT TOEFL internet-based test
VKS Vocabulary Knowledge Scale
VLT Vocabulary Levels Test
WAT Word Associates Test
ZDP zone of proximal development
from A Learning Oriented Assessment Perspective xiii
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Signature: OUT Verified Signature
Date: 14- /o'd / ")-0),o
xiv Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: AWashback
Acknowledgements
Under the sponsorship of China Scholarship Council and Queensland University

of Technology (QUT) Top-up scholarships, I completed this wonderful PhD journey.
There are gains and pains during this process, but I appreciate my spirit along this
journey. Most importantly, as this thesis aimed to explore, my PhD study is also a
learning-oriented process, an unforgettable journey to myself!
First and foremost, I would like to express my special appreciation to my

supervisors, Dr. Lyn May and Dr. Guanglun Michael Mu. Thanks for bringing me into
the deeper engagement with the scholarship and the academic field. Thanks for
encouraging me to build up my confidence and giving me support and guidance
throughout this PhD journey. I feel lucky to have both of you on my supervisory team
and this PhD is a rewarding journey for me as I have learned a lot from you in both
expertise and supervision. I can still remember the first day when we met in person at
B Block. Your warm smiles and kind welcome inspired me at the beginning of this
journey. Now, I’m at the end of this journey, I would like to take this opportunity to
thank you, Lyn, for your encouragement and your insightful feedback on my thesis
and the PhD project. Thanks very much for guiding me through a quality qualitative
research phase. I am also grateful to Michael, thanks for your wisdom which guided
me through the quantitative analyses and triggered my great interest in qualitative
research. Thanks for your care, I have spent a lovely time with you and our Chinese
community at QUT and in Brisbane. Special thanks also go to my Master supervisor
Professor Xingdong Gu (辜向东), who kindly supported my PhD data collection and
keeps inspiring me to pursue higher in the academic life. Thank you all, my great
supervisors!
Secondly, I would also like to take this great opportunity to thank my participants
and those who helped me in data collection. I have learned and known more about
junior high school teaching and learning from your kind suggestions. Thanks, my three
participant teachers, without you, this study could not proceed smoothly. Special
thanks should go to Zhang (pseudonym) who always answered my questions during
thesis writing and provided me with important information. You are like a “big sister”
of mine and I feel grateful for that. Thank you to the English inspectors who helped
from A Learning Oriented Assessment Perspective xv
me contact schools and my survey participants who contributed their time to my study.
Thanks for your great support!
Thirdly, my sincere thanks are given to my friends and officemates who

supported me and brought me happiness during my PhD journey. I would like to
specifically express my appreciation to Zahid, my “big brother” from Bangladesh.
Thank you for sharing rich experiences of life and research, and thank you for your
delicious curry and pie during the past three years. Thanks to my great friends, Pauline,
Zinpai, Ayomi, and Gladys, thanks for being with me during this journey and feeding
me during my thesis writing. Thanks to Dr. Yue Yin, Dr. Liwei Liu, Dr. Jennifer Smith,
Dr. Lynn Downes, Mary, Pengfei, Danwei, Congcong, Santi, Imali, Michelle, Hoi, and
Aminath, who shared lunches and encouragement with me. Thanks to my friends from
afar, Huiling Qin and Xiaohui Dong, thanks for keeping me updated and checking on
me regularly. Specifically, thanks to my possum friends in the office whose joyful
steps on the roof accompanied me during my night writing. Thanks to my lovely
curlew friends outside J Block, who I always met on the way home at midnight. Thanks
all for your friendship and accompany, without you, this PhD journey would not have
been as wonderful as it is.
Next, I would like to thank the Faculty of Education at QUT, especially for
offering me the great opportunity to go to the University of Calgary with excellent
professors, Professor Suzanne Carrington and Professor Karen Dooley; and my
colleagues and friends, Bridget, Ayomi, and Jonathan. It was a lovely and
unforgettable trip and I will always remember these happy moments with you in
Canada. Thank you, my faculty, thanks for your culture, numerous workshops, and the
academic environment where I have learned a lot and been able to improve myself.
Last but not least, my greatest thanks go to my beloved families. Primarily, I

would like to thank my daughter Yiyang Geng (耿懿杨) for this important event in my
life. I am sorry that I left you before you turned half a year old, but I am always thinking
of you when I have happiness or sorrow in my PhD. I feel grateful and thankful for
you being my baby girl and growing up without me aside. Sometimes, you asked me
why mom is not at home and could not play with you, and you might have wondered
why mom cannot attend every important milestone in your life. I know it will take time
for you to understand the importance of this PhD study to me, but I won’t regret this
long journey for me as I know it is both important for me and for you and other
xvi Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback
families. Besides, I would like to thank my in-laws, thanks very much for your kind
and generous support and care for Yiyang. Without you, I could hardly feel so
determined to continue my PhD study. My husband Yuefei Geng (耿跃飞), thanks for
our love and support for each other over the past fourteen years. Thanks for respect
my choice and support my decisions. My PhD is completed under both your and
Yiyang’s moral support. Moreover, thanks to my own parents and siblings who
supported me and encouraged me all the time. Special thanks go to my father, who
motivated and inspired me to pursue for a PhD degree when I started my
undergraduate. Finally, I also owe many thanks to my grandmother who gave me great
care and warmest love in my childhood. You supported me mentally when I feel I am
going to collapse. Thanks to all my families, my achievement goes to all of you!
This PhD study offered me a lot. As my thesis title says, I have gone through a
learning-oriented PhD study project and I would like to keep learning and being
learning-oriented in my future life and profession. Thank you, Ruijin, thanks for being
learning-oriented for all those years and thanks for your hard work to make this PhD
dream come true!
from A Learning Oriented Assessment Perspective xvii
Chapter 1: Introduction
This study investigates the positive and negative washback of the grammar and
vocabulary testing in the Senior High School Entrance English Test (SHSEET) from
a Learning Oriented Assessment (LOA) perspective in junior high schools in China.
To begin with, washback refers to the test influence on teaching (McNamara, 1996),
learning (Shohamy et al., 1996) or on teaching and learning (Bailey, 1996). Generally,
when the Grammar and Vocabulary Test in the SHSEET (referred to as the GVT from
this point onwards) brings about positive influence such as motivating students to
spend more time on language learning, positive washback will occur; when the test
brings about negative influence such as spending excessive time on test-related
exercises, negative washback will occur. In the GVT context, when the emphasis of
teaching and learning English as a Foreign Language (EFL) in junior high schools is
on language use and promoting communicative language teaching, it is perceived as
an indication of positive washback and thus the fulfilment of the intended washback
of the GVT. However, if the classroom teaching and learning are dominated by a focus
on rote-learning language knowledge and test-driven practices which are regarded as
undermining the intentions of curriculum developers, then the exam is understood as
failing to direct the EFL teaching and learning in a positive way.
This opening chapter sets the scene for the study. Section 1.1 addresses the
research background. Section 1.2 presents the relevant contextual information. Section
1.3 delineates the research aims and questions, which is followed by the significance
of the study in section 1.4. The chapter concludes with an overview of the structure of
the thesis in section 1.5.
1.1 BACKGROUND
High-stakes standardised English tests can have a profound impact on EFL

teaching and learning practices (Cheng, 1998, 1999; Watanabe, 1996a). This impact
can create a dissonance in contexts such as the People’s Republic of China (hereafter
China), where communicative language teaching (CLT) has been stipulated and
prioritised in policy and curriculum, but the grammar-translation methods have been
predominant in EFL teaching for decades (Pan & Qian, 2017).
Chapter 1: Introduction 1
This study is situated in the high-stakes context of the Senior High School
Entrance Examination (SHSEE), success in which qualifies test-takers for entry to
senior high schools. English is one of the seven subjects tested in the SHSEE, and the
English test is called the Senior High School Entrance English Test (SHSEET).
Despite its selective nature, the SHSEET is intended to bring about changes to the
existing exam-oriented education system and facilitate the aims of learner-centred
English instruction and student development in junior high schools (Ministry of
Education, 2011). In practice, however, whether the SHSEET meets the Ministry’s
expectations remains largely unknown.
The CLT-oriented curriculum has seen the focus of test tasks shifting from a
simple display of language knowledge to the assessment of the ability to use language.
Against this backdrop, the present study aims to examine the washback of the GVT of
which the multiple-choice tasks have received particular attention and are criticised as
reported by Xu’s study (as cited in Pan & Qian, 2017) from a Learning Oriented
Assessment (LOA) perspective. In the proposed study, washback is explored through
LOA theory which stresses the synergy between instruction, testing, and learning
(Turner & Purpura, 2016) and intends to promote positive washback from an
examination provider’s viewpoint (Saville & Salamoura, 2014). To this end, teachers’
and students’ perceptions and the underlying factors influencing their perceptions are
explored to provide empirical evidence for test developers and education authorities
to reflect on the test design and EFL teaching and learning at junior high schools. The
present study therefore demonstrates the washback of the GVT and realises the
potential of using LOA theory in a high-stakes standardised English test context as a
lens to explore the potential for positive washback.
1.2 CONTEXT
English, as a lingua franca (Phillipson, 2009; Seidlhofer, 2005), has claimed

predominance in foreign language education in almost every non-English speaking
country. The spread of English is closely linked to globalisation (Fan, 2015; Salverda,
2002), thus the prioritisation of the teaching and learning of communicative English is
essential if countries are to prosper in the global context. This is no exception for China
where EFL teaching and learning have long been included in formal education. As
early as 1904, Chinese students had opportunities to learn a foreign language from
secondary school onwards, and later, they were required to learn a foreign language
2 Chapter 1: Introduction
(principally English this time) from primary school onwards in 1912 (Education
History Research Group in Teaching Materials Research Institute, 2008). Over the past
decades, different theories and approaches such as CLT (Yu, 2001) and the product-
oriented approach (POA) (Wen, 2018) have been imported or developed to inform
EFL teaching in China. This indicates that EFL teaching in China has endeavoured to
move towards a communicative-oriented direction and to promote learning. Under the
influence of the fast-developing English education in China, English tests continue to
be important, denoting that high-stakes tests exacerbate pressures such as
standardisation, measurement, and accountability (Barksdale-Ladd & Thomas, 2000).
Worldwide, China has the largest education market (Wang, 2003),

accommodating the largest population of EFL learners and test-takers (Cheng, 2008a).
Among the 400 million EFL learners in China (Zhi & Wang, 2019), 86 million are
high school students (Ministry of Education, 2019). They are studying a foreign
language as a compulsory subject, and English now precedes any other foreign
languages in Chinese education (Wu, 2017).
Selective examinations have long prevailed in the Chinese education system

(Cheng, 2008). Competition, accountability, marketisation, performance-based
evaluation, and standardised testing are becoming increasingly visible in Chinese
educational contexts (Wang, 2014; Wang & Mok, 2013), reflecting factors such as the
global impact of neoliberalism (Guo et al., 2019; McChesney, 1999). In this neoliberal
context, the high-stakes nature of the SHSEET and its impact on EFL teaching and
learning come to the fore. Accordingly, the study examines the washback of the GVT
on the EFL teaching and learning at junior high schools. The subsequent sections
provide an overview of the Chinese education system, EFL teaching and learning and
English curriculum for the compulsory education, as well as the role of high-stakes
standardised English tests in China. The SHSEET and the GVT are also introduced.
1.2.1 The Education system in China

In China, the Ministry of Education (MOE) gives the overall direction and
guidance, and the provincial-level education department or commission and county-
level education authorities implement strategies, policies, rules, and regulations
(Ministry of Education, 2015). According to different degree levels, the education
system can be divided into four stages: (1) higher education, including both
postgraduate (PhD and Master) and undergraduate levels, (2) senior high school level
from Grade 10 to Grade 12, (3) compulsory education level, which is composed of
junior high school and primary school education, and (4) pre-school level
(kindergarten). According to the Compulsory Education Law in China (Ministry of
Education, 2006), compulsory education consists of a total of nine years of schooling,
commonly including a six-year primary school education and a three-year junior high
school education. The present study investigates the washback of the GVT on teachers
and students who are preparing for the entry to academic senior high schools; hence,
adult junior high schools go beyond the scope of the study. In China, four types of
academic schools offer junior high school education (see Table 1.1).
Table 1.1
Four types of academic junior high schools
School type Years of program Grades of program

Regular junior high schools 3 Grade 7-9
9-year schools 9 Grade 1-9
12-year schools 12 Grade 1-12
Combined schools 6 Grade 7-12
Schools shown in Table 1.1 can also be divided into key schools and non-key
schools. Compared to non-key schools, key schools receive more government funds
and have more high-achieving students as well as teachers with higher education
background. However, according to the Compulsory Education Law of the People’s
Republic of China (Ministry of Education, 2006), the distinction between the terms of
key and the non-key is not suggested since the government is trying to narrow the
achievement gap between schools. Despite the policy-level requirement, the terms of
“key schools” and “non-key schools” are still used both in research and in practice.
As China has a long education history, testing in China can be traced back to the
Han Dynasty (202 BC - 220 AD) when exams were used to select civil servants (Tan,
2020). Since then, exams have played a major role in selecting qualified candidates to
enter the next level of education. Internationally, Chinese students have excellent
academic performance in high-stakes tests such as the Programme for International
Student Assessment (PISA). According to 2018 statistics, the Chinese mainland cities
of Beijing, Shanghai, Jiangsu, and Zhejiang ranked first in PISA results; however, the
well-being of students was lower than average (OECD, 2019). Domestically, students
are also engaged with countless exams in their school life. Currently, four major
standardised test batteries are administered across different education levels in China,
which are summarised in Table 1.2. The highly selective function of the education
system brings about the result that the higher the education level, the smaller the
number of students, and only a limited number of students can finally reach the highest
position in the system by achieving success in various competitive examinations (Qi,
2010).
Table 1.2
Major standardised entrance tests in China
Degree Test (paper test) Time Subjects tested in the main test
Admission units are in charge of
The PhD Student Twice in a year,
the test design and
Entrance every year in
PhD administration, common subjects:
Examination March and
Politics, Foreign language, and
(PSEE) October/November
Professional Subjects
The National The weekend
Four subjects: Politics, Foreign
Postgraduate before 23rd Dec.
Language,
Master Entrance in Chinese lunar
Mathematics/Professional
Examination calendar (two
Subject 1, Professional Subject 2
(NPEE) days), annually
Four subjects: Chinese,
The National Mathematics, Foreign Language
Matriculation 7th-8th June, (mainly Englisha), Social
Undergraduate
Examination annually Sciences (Politics, History and
(NME) Geography)/Natural Sciences
(Physics, Chemistry and Biology)
Flexible due to different region:
The Senior High mainly Chinese, Mathematics,
Senior high School Entrance Foreign Language (mainly
June, annually
school Examination English), Politics, History,
(SHSEE) Geography, Biology, Physics,
and Chemistry
Note. a Other Foreign Languages include Russian, French, Japanese, German, and Spanish.
The present study is situated in the SHSEE context. At the time of graduating
from junior high schools, students need to take two types of standardised tests: one for
obtaining the graduation certificate or school leaving purpose (huikao) and the other
being the Senior High School Entrance Examination (SHSEE, zhongkao). In fact, the
school-leaving certificate test and the SHSEE are undergoing major changes, and it is
common in some places that these two tests have long been combined as one (Ministry
of Education, 1999). Nonetheless, in some regions, the two tests are still organised
separately, and the SHSEE is administered by local education authorities at the
provincial- or county-level under the guidance of the MOE.
In effect, not every junior high school graduate can continue to three years of
formal study in academic senior high schools. According to the statistics from the
MOE, although the overall promotion rate of junior high school graduates has steadily
increased and now remains stable, many of them have to enter the job market or be
enrolled in vocational schools which are career-oriented. Taking Chongqing as an
example, the total number of SHSEE test-takers was 305,000 in 2018; however, the
planned enrolment for regular junior high schools was 195,000 students, and the rest
of them might attend vocational junior high schools (Chongqing Municipal People's
Government Network, 2018). In contrast to the National Matriculation Examination
(NME) which allows higher education institutes to recruit students nationally, junior
high school graduates usually attend a senior high school close to their residence,
known as the catchment area. The quality of schools in the same catchment area often
differs, sometimes dramatically. Students with better SHSEE outcomes are more
competitive for admission to better senior high schools. In this way, the SHSEE
becomes the yardstick for senior high school entrance. As acknowledged earlier,
English is one of the major subjects tested in the SHSEE, and the SHSEET is therefore
high-stakes in nature. By researching Grade 9 teachers’ and students’ (14 or 15 years
old) SHSEET test preparation experiences and test perceptions, the study is expected
to provide valuable insight into and implications for the study of grammar and
vocabulary during SHSEET preparation and compulsory education years.
1.2.2 English as a Foreign Language Education in China and the English

Curriculum Standards for Compulsory Education
The inclusion of English language as a subject in formal Chinese education has
gone through several main stages (Gu, 2012; Shi, 2001; Wang, 2007). The first is the
so-called Soviet Stage (1949-1956). Due to close political connection to the then
Soviet Union, Russian dominated the foreign language teaching and learning in
Chinese schools after the foundation of the new China in 1949; while during the same
era, EFL had a peripheral status and was even abolished because of the political
antagonism between China and the West. The second stage refers to the overall
exploration stage (1956-1966) when English gained renewed attention and replaced
Russian as the major foreign language in schools. Third, during the 10-year period of
the Cultural Revolution (1966-1976), higher education was disrupted, and English
language learning was neglected (Sun, 2010). Fourth, after the Cultural Revolution,
both overall education and EFL education in China were restored and have undergone
modernisation (from 1977 to 1993) and globalisation (from 1993 onwards) (Adamson,
2004).
From 1978 onwards, English language education has become a priority in

national education development since China launched a modernisation program under
the leadership of Deng Xiaoping (Hu, 2005a). This major decision has put foreign
language learning, especially English, as the priority in China’s education. As
explained by Jin (2017, p. 9), the Reform and Opening-up Policy set the aim of EFL
education as “developing learners’ communicative competence in English”, thus it
practically encouraged EFL learners to develop the ability to use English for authentic
communication. Furthermore, in recent years “quality education (suzhi jiaoyu)”, which
is the new focus of China’s education, is becoming significant at all levels (Dello-
Iacovo, 2009) since it has a crucial role in realising the modernisation of China (Hu,
2005b). Quality education “encompasses a range of educational ideas, but generally
refers to a more holistic style of education which centres on the whole person” (Dello-
Iacovo, 2009, p. 241). It puts students and learning at the centre of education and
becomes the overarching goal of curriculum development, EFL education included.
These education reforms relating to EFL are influenced by both political and
economic factors. To keep up with educational as well as policy changes and respond
to globalisation, the official commencement of foreign language (mainly English) as a
mandatory subject in compulsory education shifted from junior middle school level
(Grade 7, 12 years old) to primary school level (Grade 3, 9 years old) in 2001 (Wang,
2007). Currently, the importance of foreign language (mainly English) is clear at
various education levels (see Table 1.2). Most importantly, in contemporary China,
English ability is not only accentuated in the academic context but also greatly valued
by people working in government, education, and research sectors who need to seek
promotion opportunities in their professions (He, 2001).
In line with the changing status of EFL education in China, the English
curriculum/syllabus used in primary and secondary school education has experienced
more than ten changes and revisions during the past century (Education History
Research Group in Teaching Materials Research Institute, 2008). Together with other
18 curriculum standards for school subjects, the most recent version of the English
curriculum for the compulsory education is the English Curriculum Standards for
Compulsory Education (ECSCE), which was issued by the MOE in 2011 and
implemented from September 2012. The ECSCE, which the SHSEET has to abide by,
is a national curriculum and the major reference for test designers, classroom teachers,
and students to conduct English test designing, teaching, and learning activities during
compulsory education years (from Grade 3 to Grade 9). However, it is often perceived
that the lack of effective communication among curriculum developers, classroom
teachers, and test designers hampers the successful relationship between assessment
and instruction. This phenomenon is referred to as a “curriculum sandwich” (Gu, 2012,
p. 48). As a result, each group holds a different understanding of the curriculum and
thus implements the curriculum separately according to their own knowledge. The
“sandwich” separation model contributes to the discrepancy between those groups. In
this regard, understanding the major characteristics and requirements of the ECSCE is
a priority for this washback study to clarify the guiding principles for test design.
The overarching aim of the ECSCE is to promote students’ overall ability in

language use, which is exemplified in the role of formative assessment or Assessment
for Learning (AfL) and reflected in the communicative competence focus of a quality-
education-oriented curriculum (Gu, 2012; Hu, 2005b; Wang, 2007). To fulfil the
overarching objective, the ECSCE defines five specific objectives, namely language
skills, language knowledge, cultural understanding, affective attitudes, and learning
strategies, to be developed comprehensively (Wang, 2007). The relationship between
the five objectives and the overall objective of English education is depicted in the
ECSCE framework (see Figure 1.1 on the next page) (Ministry of Education, 2011).
The ECSCE clearly defines five levels of requirements to achieve each objective.
These levels run through the nine-year compulsory education, with Level 2 set as the
basic requirement for Grade 6 graduates in primary schools, and Level 3 to 5 set as the
requirement for the three grades in junior high schools respectively. Level 5 is
therefore regarded as the guiding interpretation for students’ performance in the
SHSEET. In addition, the ECSCE lists the requirements at each level, similar to the
“can-do” statements in the Common European Framework of Reference for languages
(CEFR). These requirements give test designers, teachers, and students a clear
reference to assess learners’ English abilities.
Figure 1.1. The structure of curriculum objectives in the English Curriculum Standards for
Compulsory Education (ECSCE)
Furthermore, the link between EFL education and assessment is intended to be

strengthened by encouraging both formative and summative assessment. Accordingly,
the ECSCE has developed a new assessment system that addresses the following nine
recommendations (Ministry of Education, 2011, pp. 34-38):
• Making full use of the leading role of assessment to achieve positive results
for different stakeholder groups;
• Reflecting students’ central role in the assessment;
• Developing assessment content and criteria based on ECSCE;
• Paying attention to assessment appropriateness and diversity;
• Helping monitor and improve the process of teaching and learning by using
formative assessment;
• Focusing on the examination of students’ overall ability to use language by

using summative assessment;
• Paying attention to the relationship between instruction and assessment;
• Prioritising student motivation in primary schools;
• Designing and implementing an appropriate graduation exam for junior high

school graduates (e.g., the SHSEET).
Thus, performance on formative assessment and learning progress rather than

the single use of an exam is assumed to be central (Wang, 2007). As a result, the
ECSCE promotes a quality education and reduces the selective function of summative
assessment (the SHSEET in this case), aiming to achieve a positive impact on teaching
and learning (Gu, 2012). In order to achieve the goal of quality education, both students
and learning have become the foci of implementing the ECSCE.
Furthermore, to explain the ninth suggestion regarding the SHSEET, another

four principles are stipulated in the ECSCE to guide test design (Ministry of Education,
2011, p. 38):
• Determining the test content and criteria according to the ECSCE;
• Emphasising the assessment of students’ overall ability to use language and

avoiding a simple testing of language knowledge;
• Taking full consideration of students’ lives and their wellbeing;
• Selecting real and authentic language texts and designing test tasks
according to the real language use context.
The emphasis on the overall ability to use language coincides with international
CLT trends as well as the concept of AfL and LOA theories, indicating the learner-
centred orientation of the ECSCE. It is thus necessary to investigate the real classroom
context and examine the implementation of those test design standards through key
stakeholders’ perspectives. Therein lies the focus of the current study: the actual
washback of the GVT and the potential for the GVT preparation to incorporate LOA
principles from teachers’ and students’ perspectives. The reason for focusing on the
GVT will then be discussed in section 1.2.5.
Moreover, it is believed that an essential aspect to consider when researching the

washback of an exam is to check what is required and stated in “the official statements
about the goals of the examination and of the textbook series it was meant to reinforce”
(Wall & Alderson, 1993, p. 44). Therefore, in this study, the requirements in ECSCE
and Test Specifications for SHSEET (hereafter Test Specifications) are essential for
examining SHSEET washback. As Test Specifications takes ECSCE as its reference,
it therefore follows the curriculum to focus more on students and their learning.
Additionally, SHSEET grammar and vocabulary test scope in Test Specifications has
been drawn up to meet the language knowledge objective defined by ECSCE Level 5
(see Appendix A). In sum, the intention of the curriculum developers is to bring about
a positive influence on EFL teaching and learning by shifting the teaching and
assessment emphases from formal linguistic knowledge to language use and practice.
Therefore, it is crucial to obtain empirical data to understand the extent to which this
intention has been realised and further explore the learning-oriented possibility of the
GVT.
1.2.3 The role of standardised English exams in China

The widespread teaching of EFL in schools has been accompanied by an
increasing number of language tests in China. The importance of English exams in
China is reflected through the assessment of a foreign language (mainly English) in all
high-stakes standardised test batteries at different education levels. The most well-
known English tests with a selective function are the Graduate School Entrance
English Examination (GSEEE) for postgraduates at the national level, the National
Matriculation English Test (NMET) for senior high school graduates at the national
level, and the SHSEET for junior high school graduates at the municipal level. In
addition, other tests are designed to examine the language proficiency of EFL learners,
such as the College English Test (CET) for non-English major undergraduate and
postgraduate students, the Test for English Majors (TEM) for English major students
in universities, and the Public English Test (PET) for the general public. Irrespective
of their differences in form and content, these tests function to evaluate language
proficiency and/or provide a criterion for promotion in both academic and professional
contexts (Cheng, 2008a; Cheng & Curtis, 2010; Dello-Iacovo, 2009). Regardless of
the various functions of those standardised English tests, it is believed that test
designers, such as the NMET designers, intended to bring about positive washback by
shifting EFL teaching and learning from a focus on linguistic knowledge to a focus on
the ability to use language (Li, 1990; Qi, 2005).
Nonetheless, these English tests, due to their gatekeeping roles in the education
system, have been criticised for bringing about negative washback through impeding
curriculum implementation and reforms (Dello-Iacovo, 2009). As mentioned above,
tests like the SHSEET are an important symbol of the exam-oriented education in
China. These tests are at loggerheads with the nationwide implementation of
significant curriculum reforms that aim to shift test focus to the ability to use language
rather than rote memorisation. As Spolsky (1995) pointed out, test designers expect to
use exams for directing classroom activities; however, exams narrow down the
education process, limiting the focus of teaching and learning to what is to be tested.
Therefore, to achieve desirable test results, test preparation practices such as rote
memorisation and mechanical drilling often dominate the teaching process and finally
result in a failure in fulfilling the positive intentions of the curriculum standards (Qi,
2005) such as the ECSCE. Herein lies the significance of the present study, which
investigates the washback of the GVT on teaching and learning in order to determine
the opportunities and challenges concerning English instruction and assessment. The
next section provides a focused introduction to the SHSEET that is pertinent to various
stakeholders including policymakers, curriculum developers, test designers, classroom
teachers, and students.
1.2.4 Overview of the SHSEET

The nine-year compulsory schooling requires students to study English for at
least seven years and expects students’ overall English ability to reach Level 5 as set
in the ECSCE. In this sense, the SHSEET not only evaluates English learning during
compulsory education years but also assesses students’ English language proficiency.
Outcomes of the SHSEET are crucial for students, especially those who aim to pursue
their studies in academic senior high schools. Generally, these students only have one
chance to sit for the SHSEET. Therefore, the SHSEET not only is an achievement test
but also has a gatekeeping function for these students.
The structure of the SHSEET in Chongqing – the data collection site for the
current PhD project – is presented in Table 1.3. The total score for the SHSEET in
Chongqing is 150 marks, with 98 marks allocated to Paper I and 52 marks allocated to
Paper II. The test duration is two hours.
Table 1.3
Composition of 2018 SHSEET (Chongqing) test paper
No. of Weighting
Components Test content Test method Marks
items (%)
Paper I
Listening Multiple choice
I. Listening question 20 30 20
(MCQ)
II. Multiple Grammar and
Choice Questions vocabulary MCQ 15a 15 10
(MCQ)
Grammar and
III. Cloze MCQ 10 15 10
vocabulary
IV. Reading Reading
MCQ 15 30 20
Comprehension
V. Oral Test Speaking MCQ 5 5 3
Paper II
VI. Task-based Reading Open-ended
4 9 6
reading questions
VII. Sentence Grammar and
Gap filling 5 10 7
Completion vocabulary
VIII. Gap-filling Grammar and Open-ended
8 16 11
cloze vocabulary gap filling
IX. Writing Task Writing Guided writing 1 20 13
Total 83 150 100
Note. a The total mark for this task started to change from 2017, previously it was 20 marks in total. In
2017, it was 18 marks.
Although test items in the SHSEET test papers in each province or county vary,
the tests are all guided by the ECSCE Level 5 requirements. They all have similar test
components as depicted in Table 1.3. Paper I mainly contains fixed multiple-choice
tasks and Paper II contains constructed-response tasks. The four macro language skills,
namely listening, speaking, reading, and writing, are tested in the SHSEET, and the
test tasks are assumed to evaluate learners’ overall ability to use language according
to the ECSCE.
This washback study takes Chongqing as the research site due to two major
reasons. First, the researcher obtained her Master’s degree in Chongqing University,
which ensures her familiarity with the city and access to schools. Second, as one of the
four municipalities in China, Chongqing has a massive education population and rapid
economic development. As the largest and most populous municipality, Chongqing is
located in the southwest of China. It has 26 districts, eight counties, and four
autonomous counties (Chongqing Municipal People’s Government, 2015), with a total
population of 33.89 million and about 19.70 million people live in the urban areas in
2017 (National Bureau of Statistics of China, 2018). In 2017, 98.69% of the SHSEE
test-takers moved on to the next education level, with 63.9% attending academic senior
high schools (Chongqing Municipal People's Government Network, 2018).
The test administration time in Chongqing is from 12th to 14th June every year,
and English is tested on the morning of 14th June. Two sets of SHSEET test papers
(Paper A and Paper B) are designed every year in Chongqing. The nine main districts
and seven districts or counties that take Paper A in 2017 constitute the so-called “joint
area”, and the other districts and counties which use Paper B are not included in the
joint area (Chongqing Zhongkao, 2017). Since the study mainly collected qualitative
data from the joint area, Paper A was first analysed before data collection started.
Nonetheless, Paper B was also consulted as these two test papers remain similar in
most aspects.
1.2.5 Grammar and Vocabulary Testing in the SHSEET

As the two major building blocks of any human language, the instruction of
grammar and vocabulary skills are regarded as essential in language learning. In fact,
the teaching and assessment of grammar and vocabulary have long been emphasised
in the Chinese EFL context. The knowledge of English grammar and vocabulary
required for compulsory education years has been explicitly defined in the ECSCE at
two levels (Level 2 and Level 5), which are assumed to be used as the aim for
classroom instruction and the criteria for English assessment. This washback study has
chosen the GVT mainly due to its potential impact on teaching and learning, test
purpose and content, and test methods.
First, whether the GVT mitigates against the CLT objective and learning-
oriented aims of the ECSCE and whether it directs EFL teaching and learning towards
a greater focus on grammatical accuracy in classrooms remain unclear. As required by
the ECSCE, summative assessment (the SHSEET in this case) should focus on testing
students’ integrated language use and avoid the discrete testing of language
knowledge. However, the SHSEET still reflects the importance of the separate testing
of grammar and vocabulary through four different tasks of MCQ, Cloze, Sentence
Completion, and Gap-filling cloze which account for 38% of the total SHSEET score
(see Table 1.3). In this thesis, these four different GVT tasks are referred to according
to the names of these test tasks in the authentic SHSEET papers. In the GVT, MCQ is
the task that uses selected-response items and the Sentence Completion is the task
which uses constructed-response items, and both tasks are sentence-based.
Furthermore, the Cloze has selected-response items as choices are provided and the
Gap-filling cloze has constructed-response items which require written answers, and
both tasks are passage-based. Examples of each of these tasks are given in Appendix
B.
Second, the present research is situated at a time of ongoing discussion in the

field regarding the meaningfulness of testing grammar and vocabulary separately from
the four macroskills. Historically, to assess grammar, scholars such as Rea-Dickins
(1997) stress the importance of testing grammar in an integrated way (as embedded
and reflected in other skill-based tests like reading or writing) rather than in a
decontextualised manner. Purpura (2004) even raised doubts about the nature of
grammar testing (i.e., what the grammar testing actually tests). In a similar vein, Read
(1993) recommended the integrated testing of vocabulary knowledge in writing and/or
speaking texts and Bachman and Palmer (1996) critiqued the narrow nature of
grammatical knowledge (i.e., vocabulary, syntax, and phonology in their perspectives)
in testing language use.
Third, to achieve communicative aims, researchers have suggested different test

methods (e.g., paraphrase, completion, cloze) for testing grammar and vocabulary
(Hughes, 1989; Rea-Dickins, 1997) which are discourse-based (Celce-Murcia, 2007;
Celce-Murcia & Larsen-Freeman, 1999). The testing of grammar and vocabulary in
the SHSEET incorporates four different task types (see Appendix B). However, except
for the passage-based tasks of Cloze and Gap-filling cloze, the SHSEET still adopts a
traditional approach through the use of discrete-point tasks of MCQ and Sentence
Completion to assess grammar and vocabulary, the impact of which is of particular
interest to the researcher.
To sum up, as grammar and vocabulary have a significant role in language

learning, it is necessary to investigate the potential impact of the GVT on EFL teaching
and learning. As such, the GVT became the focus of this study by taking account of
three reasons: first, the CLT-featured as well as learning-oriented ECSCE emphasises
the teaching and assessment of integrated language use; second, the test purposes of
using GVT tasks to test grammar and vocabulary have been criticised; and third, the
particular test methods of GVT tasks to test grammar and vocabulary are of interest.
1.3 AIMS OF THE STUDY
Compared with the previous English curriculum (Ministry of Education, 2001),

the ECSCE has changed to focus more on students and communicative teaching and
learning. This reflects both educational trends and the public call to focus on language
competence and use rather than language knowledge alone. According to the ECSCE,
English assessment and instruction should support each other and EFL instruction is
positioned as learning-oriented. However, classroom teachers and researchers feel that
assessment can conflict with the stated aim of EFL teaching and learning (Qi, 2005,
2007). More significantly, as perceived by other scholars, test preparation practices
such as taking model tests should not be taken for granted as the indication of negative
washback (Watanabe, 1996b), and test washback should be studied, not simply
asserted (Wall & Alderson, 1993). Therefore, whether the learner-centred and thus
learning-oriented intentions of the ECSCE are fulfilled in actual classrooms is yet to
be empirically explored at the junior high school level. To this end, the current study,
starting from the stated positive intentions envisaged by curriculum developers,
investigates the washback of the GVT from both teachers’ and students’ perspectives.
1.3.1 Research objective

The research objective of this study is two-fold. First, the study explores the
positive and negative as well as intended and unintended washback (see Chapter Two)
from both teachers’ and students’ perceptions to reveal the actual washback of the
GVT. Second, the study explores the use of LOA theory (see Chapter Three) in the
GVT context and its possible contribution to positive washback. In this way, the extent
to which the communicative features and learner-centred focus of the ECSCE are
implemented in the preparation leading up to the test is examined.
1.3.2 Research questions

Foregrounding an LOA perspective, this study aims to investigate the washback
of the GVT. Accordingly, there are two research questions proposed in this study:
RQ 1: What is the washback of the Grammar and Vocabulary Test in the

Senior High School Entrance English Test (the GVT)?
RQ 1a: What is the washback value of the GVT?
RQ 1b: What is the washback intensity of the GVT?
RQ 2: What are the opportunities for and challenges of the incorporation of
Learning Oriented Assessment (LOA) principles in GVT preparation?
In order to address these two research questions, this study applied an
exploratory sequential mixed method research (MMR) design. The qualitative data
were collected through classroom observations, semi-structured interviews and focus
groups, while the quantitative data were collected through the administration of a
student survey to Grade 9 students in Chongqing.
1.4 SIGNIFICANCE OF THE RESEARCH
This study, which focuses on the washback of the GVT, first provides empirical
evidence of the test washback. Contextualised in China, specifically Chongqing, the
study sheds light on EFL grammar and vocabulary teaching, learning, and testing at
the compulsory education level. It can address the scarcity of research on the washback
of junior high school English tests in China, and therefore, potentially benefit Grade 9
teachers and students in terms of identifying challenges and opportunities in the
instruction and assessment of English grammar and vocabulary and providing
suggestions for more learning-oriented pedagogy and assessment.
In light of incorporating LOA theories (Carless, 2007; Jones & Saville, 2016)
into washback study, this washback study has a potential to make a theoretical
contribution to promoting positive washback in EFL teaching and learning. Further, as
the separate testing of grammar and vocabulary is also common to other high-stakes
standardised English tests in China (e.g., the NMET, the TEM), this study is of interest
and value to EFL teachers and test designers in similar contexts at different education
levels. Likewise, the high-stakes nature of the SHSEET also provides implications to
a wider international context. Most significantly, the present study has the potential to
help policymakers, curriculum developers, and test designers to realise the potential of
or difficulties in curriculum implementation and test design.
1.5 THESIS OUTLINE
The thesis constitutes eight chapters. Each chapter is organised based on the
research questions presented in section 1.3.2. This first chapter has introduced the
research background and contextual information on EFL exams and the SHSEET,
which sets the scene for the current study. Moreover, it also explained the aim, research
questions, and significance of the study. Chapter Two reviews three bodies of the
literature: standardised English testing, grammar and vocabulary testing, and
washback studies. Chapter Three introduces washback models of teaching and
learning, and specifically depicts the LOA cycle which is combined with the washback
model as a new theoretical framework for this study. Chapter Four delineates the
exploratory sequential MMR design. Research findings are reported in Chapters Five,
Six, and Seven. In particular, the qualitative and quantitative results are integrated to
address the two proposed research questions in each chapter. Finally, Chapter Eight
discusses the overall research findings in connection to the research questions and
concludes the study with contributions, implications, reflections, limitations, and
future directions.
Chapter 2: Literature Review
This chapter presents research related to the current washback study regarding
the Grammar and Vocabulary Test in the SHSEET (the GVT). To start with, the two
research questions introduced in section 1.3.2, which are in relation to test washback
(both washback value and washback intensity) and the opportunities for as well as
challenges of the incorporation of Learning Oriented Assessment (LOA) principles in
GVT preparation, were informed by this literature review.
Studies relevant to the two research questions are presented in three sections.
First, considering the nature of the SHSEET, standardised English language tests are
first briefly reviewed in section 2.1. Second, the testing of grammar and vocabulary,
as aspects of language which are assessed separately in the SHSEET, are addressed in
section 2.2. As the major foundation of this study, washback is given substantial
attention in section 2.3 from both theoretical and empirical perspectives. The chapter
is concluded in section 2.4 with the identification of potential research gaps.
2.1 STANDARDISED ENGLISH LANGUAGE TESTS
A standardised test refers to “any examination that’s administered and scored in

a predetermined, standard manner” (Popham, 1999, p. 8). These tests are often high-
stakes, as the results can be used to inform important decisions and opportunities, such
as enrolment, promotion, or graduation, which directly or immediately affect
stakeholders (Madaus, 1988). High-stakes standardised tests have also been used to
evaluate schools as well as teachers and teaching (Popham, 1999). Therefore,
standardised testing, which the term ‘high-stakes’ is commonly associated with,
involves a form of examination linking results on one set of standardised tests to broad
sets of practices (Bousfield & Ragusa, 2014).
Empirical studies on high-stakes standardised tests, the current SHSEET

included, focus on achievement and proficiency tests. The former, used mainly for
graduation purposes, assesses graduates’ academic achievement, such as the
Scholastic Assessment Test (SAT) for college or university admission test in the
United State of America (U.S.). The latter, aiming to identify test-takers’ language
proficiency levels, included tests such as the Japanese-Language Proficiency Test
Chapter 2: Literature Review 19

(JLPT). This distinction is also applicable to the EFL context (i.e., countries where
English is regarded as a foreign language) where English language is taken as a subject
in schools. For example, entrance or graduation English tests include Japanese
university entrance examinations and the Hong Kong Certificate of Education
Examination (HKCEE), while language proficiency tests include the International
English Language Testing System (IELTS) and the Test of English as a Foreign
Language (TOEFL). Envisaging such a distinction and considering the test purpose of
SHSEET, the current review on high-stakes standardised English tests will include
relevant tests for entrance or graduation purposes in both international and Chinese
contexts.
2.1.1 High-stakes standardised English language tests in the international

context
Negative impacts, including the failure to encourage motivation to learn, have
long been associated with high-stakes standardised English tests (Berwick & Ross,
1989). Internationally, empirical studies found matriculation tests such as the Iranian
high school final English exam (Damankesh & Babaii, 2015) and the Nepali School
Leaving Certificate English examination (SLC) fail to bring about positive results or
to motivate students to continue to learn English and further support the generally
negative perceptions of those tests. However, the consistency between educational
objectives and test influence rather than the test itself needs more scholarly scrutiny.
Internationally, empirical studies present that the test content of English

language examinations sometimes fails to align with educational aims, as outlined in
the syllabus. For example, in the contexts of Sudan and Jordan, the secondary school
certificate English language examination, which aimed at bringing about positive
changes to the teaching and learning of reading and writing skills, failed its designed
intentions (Sharif & Siddiek, 2017). By analysing past test papers and documenting
students’ cognitive processes during test taking, researchers found that reading
questions elicited language knowledge and literal comprehension, but young learners’
critical thinking was neglected. More importantly, overlooking speaking and listening
skills led to a devaluing of these skills and failed to represent the importance of all
macroskills in language learning. As a result, there was an inconsistency between
educational objectives of the language syllabus and the content of language tests.
Informative as this study is, comparison of the syllabus and test content cannot fully
20 Chapter 2: Literature Review

reveal the comprehensive influence of the test, since the exploration of the alignment
between learning objectives and the influence of the test on teaching and learning is
more critical. From this viewpoint, the current investigation of the test influence of the
SHSEET, required to promote positive learning by curriculum design, is significant.
2.1.2 High-stakes standardised English language tests in China

Traditionally, China has a long history of using a standardised testing system
(Spolsky, 1995) and English tests have flourished from the late 1980s (Jiang, 2003).
Taking the large number of Chinese EFL learners into consideration, high-stakes
standardised English tests inevitably play an essential role in the education system.
Extant research has revealed both the negative and the positive influence of these tests,
although the former was more commonly reported.
The negative influence of high stakes English language tests in China is reflected
by test-takers’ extrinsic motivation as reported, which results in intense test-driven
practices in test preparation. For example, participants’ extrinsic motivation to succeed
in the GSEEE negatively impacts on the education system and teaching, leading to
teaching to the test by coaching schools. An impact study of the GSEEE test
preparation program looked into test-takers’ intentions, reasons for attending test
preparation programs, and their expectations of the coaching programs (He, 2010).
Findings indicated that most test participants aimed for better job opportunities and/or
to secure their current jobs by enrolling in a Master’s degree. They attended the
coaching programs with the expectation of a rapid increase in test scores rather than
improving their language proficiency. These goals led to an exclusive focus on the test
content and format in coaching centres.
Negative test influence also occurs when different test functions (i.e., the
selection function and the educational change function) conflict with each other. For
example, a washback study of the editing task in the NMET found that the task failed
to achieve its positive intentions (Qi, 2010). Although all stakeholders agreed that the
editing task discriminated among different levels of language proficiency, which
contributes to the selection function of the NMET, test designers and teaching experts
expressed their negative perceptions of the editing task. Test preparation activities
were evident in classes, where the teaching of test-taking strategies and grammar
points and time spent on mock tests were contrary to the expected impact of enabling
students to identify and correct errors in writing. As a result, the test designers’

intentions were not realised. As the NMET has the same function as the SHSEET, and
the editing task also tests grammar and vocabulary, it is important to ascertain whether
the washback of the grammar and vocabulary section can reflect and contribute to the
selection function of the SHSEET and whether its actual impact accords with the
intended washback or not.
Positive test influence happens when the test design and interpretation of
performance on the test meet design intentions. For instance, the construct validity
(i.e., test designers’ intended aim of fulfilling communicative purpose) of the
Computerised Oral English Test (COET) of the NMET in Guangdong was the focus
of research by Zeng (2010). Despite the absence of interactional features, the COET
did measure key features of students’ oral competence since it reflected the test
construct (factors of pronunciation and intonation, translation, and comprehension and
oral production). Likewise, although some grammar items were not identical with the
authentic use of first language (L1) speakers, Pan and Qian (2017) found that the
content validity of the grammar subtest in the NMET (Shanghai) met the intended
design purpose.
To conclude, empirical studies of high-stakes standardised English tests in China

are abundant (see, for example, Dong, 2020; Jin, 2000; Pan, 2014; Ren, 2011; Xiao,
2014; Zhang, 2019) but most have been in the context of tertiary education and a
relatively limited number of studies have investigated test washback in junior high
schools (see, for example, Chou, 2019; Teng & Fu, 2019). Further, similar to those in
Australia (Bousfield & Ragusa, 2014), the U.S., and the United Kingdom (U.K.),
standardised tests in China have often led to a negative washback in terms of schooling
practices and the quality of education. In addition, researchers have ascertained the
washback value by comparing the actual test influence with test constructors’
intentions, and test quality by examining the test design stipulations.
2.1.3 Empirical studies of the SHSEET

Reflecting the long history of examinations in China, high-stakes standardised
tests remain widely used. However, compared with the many studies of high-stakes
standardised tests at higher education levels (the NMET in particular), there has been
less attention paid to the SHSEET. This section summarises theses and empirical
studies indexed in the China National Knowledge Infrastructure (CNKI), which is the
largest and most widely-used academic database in China.

In terms of language macroskills, the focus of SHSEET studies has thus far been
reading (Li, 2017; Li, 2018; Song, 2013), writing (Hu, 2015), listening (Geng, 2013;
Ma, 2018), speaking (Chen, 2007; Liu, 2012), and grammar (Yang, 2015). The most
frequent topics include test item writing and analysis (Cui, 2006; Li et al., 2019), test
preparation strategies (Li, 2009), content validity or analysis of test papers (Ding,
2014), specific test methods such as the use of cloze (Shi, 2013), and validity studies
on test usefulness (He, 2015) as well as washback (Deng, 2018; Yang, 2015; Zeng,
2008). In the latter study of Yang (2015), the triangulated data showed both positive
and negative washback contributed to the complexity of washback phenomena.
Results showed that the grammar task of MCQ influenced instruction and learning
positively, and teachers’ and students’ positive perceptions of the test motivated
learning and exerted an influence on the teaching and learning practices. Furthermore,
washback varied from person to person and to a different degree. This study revealed
some of the current concerns about the teaching and learning of grammar in junior
high schools, which is relevant to the present study. However, the participant groups
(students from Grade 7 to Grade 9) were an issue. The wider participant groups made
it hard to pay more specific attention to the Grade 9 cohort who are believed to
experience the most intense SHSEET influence in test preparation and other test-
related practices such as time investment on test tasks in the class (Bailey, 1999;
Cheng, 2005; Cheng et al., 2011) (see further details in section 4.4.2).
In summary, although the SHSEET is a high-stakes test in the context of Chinese

high schools, there has been relatively less research carried out compared to tests like
the NMET, particularly in terms of the separate sections testing grammar and
vocabulary. It is important to note that while most SHSEET studies aim at improving
practical pedagogical practices, they are confined to specific research sites since the
test is administered at a provincial- or county-level. Nonetheless, these studies can also
shed light on the impact of the test in other regions, since the SHSEET design at the
national level is under the uniform guidance of the ECSCE. However, it is insufficient
to mainly pay attention to the test quality and test preparation strategies to improve test
scores and enhance test-taking strategies; more studies should be conducted to
investigate the actual influence that tests exert upon teaching and learning.

2.1.4 Section summary
Several issues can be identified in the empirical studies reviewed. First,
researchers have focused on tests for entrance or graduation purposes at the tertiary
level or above. As one of the major English graduation exams in China, the SHSEET
requires further exploration to meet the needs of stakeholders, especially young
learners. Second, most tests have not succeeded in bringing about positive or intended
effects. However, a study of washback cannot be confined to checking the alignment
between educational goals and the tests themselves. Third, most SHSEET studies
published in China aim to improve test scores and enhance test-taking strategies by
examining the test quality and test preparation. Although teaching to the test (Popham,
2001) or teaching to raise test scores (Qi, 2007) is claimed to be common due to the
instrumental purpose of high-stakes standardised English tests, it is necessary to
document the actual classroom teaching and learning practices associated with the
SHSEET. This would enable a more comprehensive exploration of the washback of
testing grammar and vocabulary so that it can be clearly understood by stakeholders.
2.2 ENGLISH GRAMMAR AND VOCABULARY TESTING
As documented in Chapter One, the GVT has tested language knowledge

through both selected-response items (MCQ, Cloze) and constructed-response items
(Sentence Completion, Gap-filling cloze). A discrete-point approach to testing
(Madsen, 1983), which is context-independent in nature, involves the use of selected-
response items such as MCQs1 at the sentential level. This approach was used to assess
a learner’s knowledge of grammar and vocabulary and measured candidates’
recognition and accuracy of language forms or meanings, which is different to
assessing language use at the macroskill level such as speaking and writing tasks,
where candidates need to produce language that is appropriate for a given context and
purpose.
Further, as the two building blocks of any language, grammar and vocabulary
are viewed as fundamental for language acquisition and communication. However,
they have traditionally been viewed as a body of knowledge which is static in nature
and reflects the unchanging knowledge of rules, especially grammar. In contrast, a
1
When “MCQs” is used, it means test items with an MCQ format, and MCQ in this study refers
specifically to the first GVT task in the SHSEET.

more communicative perspective positions grammar and vocabulary as resources
which learners use to successfully communicate (Canale, 1983a, 1983b; Canale &
Swain, 1980; Larsen-Freeman, 2003). Communicative competence is thus seen as
involving language knowledge and skills to achieve effective communication. Hence,
learners should be equipped with both language knowledge and the ability to use
language as intended by the curriculum. For a language test with communicative
characteristics such as the SHSEET, both the knowledge of the language and the ability
to use language should be included in the test content to measure learners’
communicative competence. Therefore, it is essential to review second language (L2)
grammar and vocabulary testing literature to conceptualise and differentiate the
decontextualised and atomised testing of grammar and vocabulary from the integrated
testing of macroskills.
2.2.1 English grammar testing

Historically, the knowledge of grammar has been extensively employed as a
component of EFL/English as a Second Language (ESL) testing (Purpura, 2004).
Grammar assessment has remained a typical feature of standardised language
proficiency tests and many school- or classroom-based assessments (Rea-Dickins,
1997), particularly in the Chinese context (Pan & Feng, 2015; Pan & Qian, 2017). In
language teaching and linguistics, grammar is regarded as a set of rules that govern
linguistic components of words, phrases, and clauses to make sentences in a structural
way. Moreover, grammar considers the meanings and functions of those sentences in
the overall language system (Richards & Schmidt, 2013). The role of the grammar
instruction has long been the subject of controversy in language teaching and learning
(Ellis, 2002; Nassaji & Fotos, 2004; Pazaver & Wang, 2009) and the debate on the role
and usefulness of testing L2 grammar has been longstanding.
It is therefore unsurprising that test methods used in grammar assessment have

become the focus of controversy. First, decontextualised MCQs are most frequently
used to test L2 grammar (Rea-Dickins, 1997). The discrete-point approach, including
MCQs, emphasises the accuracy of linguistic features and is thus criticised for failing
to assess communicative ability adequately (Halleck, 1992). Researchers have also
focused on exploring the extent to which the use of different test methods may result
in the more effective assessment of students’ grammatical knowledge. For example,
Alemi and Miraghaee (2011) compared students’ grammar test performance on cloze

and multiple-choice tasks. Statistically, it was proved that the average test performance
of the two groups did not show any major difference since using cloze did not
necessarily improve students’ grammar knowledge more than administering the
multiple-choice grammar tests. In addition, newer test methods have also been used to
assess grammatical knowledge. For example, in the Hungarian educational context,
the “multitrak” test method was a modified multiple-choice format used in the national
admission English test for future English majors from 1993 to 2004. To answer
“multitrak” items, test-takers had to select the one incorrect answer from four given
choices. To validate and evaluate this item type, Dávid (2007) compared “multitrak”
with other three multiple-choice test methods (text-based multiple-choice task in a
cloze, standard four-choice sentence-based items, and double-blank four-choice
sentence-based items), and found that the “multitrak” tasks provided more information
about students’ grammatical ability range and contained more difficult grammatical
content. Nonetheless, these item types mentioned above are still all discrete-point in
nature, which means they assess learners’ receptive knowledge and are criticised
because they undermine communicative language use (Prodromou, 1995).
More recently, grammatical knowledge has been regarded as a resource through

which learners make meaning (Larsen-Freeman, 2003) and therefore, the ability to use
this knowledge has been widely tested in the context of macroskills including writing
and speaking (Purpura, 2004). For example, grammar is seen as a resource for
enhancing speaking performance since it can raise student awareness and encourage
autonomous learning (Wallace, 2014), and also enhance communicative effectiveness
(Turner & Upshur, 1995). Hence, grammar can be positioned as a resource upon which
learners can draw flexibly and appropriately when engaging in macroskills and the
assessment of the ability to use grammatical knowledge can be integrated within the
testing of a particular macroskill.
Although integrated tasks are frequently adopted in large-scale standardised

English tests like the TOEFL internet-based test (TOEFL iBT), other tests still choose
to separately examine test-takers’ grammatical knowledge, such as the SHSEET
(Yang, 2015) and the NMET (Pan & Qian, 2017). Studies that have examined the
influence of grammar tests on EFL teaching and learning found that the teaching and
learning of grammar can be positive for students’ overall language ability

development, but they can also exert negative influences such as the passive intake of
grammatical instruction (Macmillan et al., 2014; Yang, 2015).
Using corpora to design and develop grammar tests can be an efficient method
to maintain content validity (Argüelles Álvarez, 2013; Macmillan et al., 2014; Pan &
Qian, 2017). Through a corpus-based approach, the content of the grammar section in
the NMET delivered in Shanghai was validated, and the benefits of considering the
incorporation of corpora into test design and development phase were confirmed (Pan
& Qian, 2017). Findings indicated that test items generally covered the grammatical
domains listed in test specifications, but certain drawbacks remained. First, due to
practical item writing constraints, it is difficult to test articles (a/an/the) since no
sufficient context is provided in the test items. Second, not all the listed grammatical
features had been tested in the test, especially some crucial grammatical features (e.g.,
cleft sentence, appositive clause). Therefore, there was a lack of content
representativeness. Further, to avoid testing grammatical structures with low
frequency in high-stakes tests, the reference to corpora during test design could be
useful.
Researchers have endeavoured to find a way of meaningfully and effectively

assessing grammatical knowledge. Grammar consciousness-raising tasks, which
combine the development of the targeted L2 grammatical knowledge “with the
provision for meaning-focused use of the target language” (Fotos, 1994, p. 323),
proved to be effective (Fotos & Ellis, 1991). Fotos (1994) argued further that formal
grammatical instruction can be integrated within a communicative framework by using
these types of grammar tasks. Moreover, communicative grammar testing should
contain contextualised test items and direct teaching focus towards not only word
forms, but also word meanings (Rea-Dickins, 1991). In addition, Purpura (2004)
looked into methods of assessing grammatical ability in a learning-oriented way, since
he believed that teaching problematic grammatical aspects is the key to pedagogical
innovation. He proposed four different types of teaching techniques, of which the
form-focused type includes implicit inductive and explicit deductive teaching of
grammatical knowledge, and contains consciousness-raising activities; the input-based
type, requiring learners to use their instructional input to construct grammatical form
and meaning; the feedback-based type takes advantage of negative evidence of
learners’ grammatical gains and outcomes; while the practice-based type illustrates the

facts of grammar teaching and grammar use. The identification of these techniques
enables more learning-oriented grammar teaching and assessment.
In conclusion, issues remain in the testing of grammar regarding both test

methods (e.g., multiple-choice tasks, cloze) and test focus (i.e., knowledge or use of
grammar). Further, the assessment of grammar in high-stakes standardised English
tests in China is problematic (e.g., lack of contextulisation and content
representativeness) and can exert both negative and positive influences on teaching
and learning. Most importantly, possible methods to promote communicative features
and learning-orientedness of grammar tests can certainly shed light on the current
washback study. Therefore, it is important to examine the effects of grammar tests on
EFL teaching and learning and whether the testing of grammatical knowledge helps to
improve grammar learning.
2.2.2 English vocabulary testing

Vocabulary, a building block for any language, is defined as “a set of lexemes,
including single words, compound words, and idioms” (Richards & Schmidt, 2013, p.
629) and has been widely researched in the field of applied linguistics (Nation, 2018;
Schmitt, 2019). The importance of vocabulary in EFL learning has long been
recognised, with vocabulary tests used since the 1920s in the U.S. (Read, 1997;
Spolsky, 1995). Traditionally, vocabulary testing originated from objective testing,
which adopts a discrete-point approach to measure learners’ mastery of vocabulary
knowledge in a decontextualised, selected-response item. Existing vocabulary testing
literature has focused on test validity by examining vocabulary breadth (how many
words learners know) through tests such as the Vocabulary Levels Test (VLT) (Nation,
2001; Schmitt, 2000), the Updated Vocabulary Levels Test (Webb et al., 2017), and
Vocabulary Size Test (Nation & Beglar, 2007), and examining vocabulary depth (how
well a learner knows and can use a word appropriately) through tests such as the Word
Associates Test (WAT) (Read, 1993; Verplanck, 1992) and the Vocabulary
Knowledge Scale (VKS) (Paribakht & Wesche, 1997). Moreover, issues in the
separate and integrated testing of vocabulary knowledge in standardised English tests
continue to be the focus of research.
Vocabulary tests which use MCQs are mostly preferred by both L1 and L2
speakers (Read, 2000, 2019). The invention and widespread use of vocabulary tests,
scaling and checklists such as the VLT (Nation, 1990; Schmitt, 2000) exemplify this

trend. Although widely used, the MCQs are not without flaws. For example, test
performance can be significantly influenced by test-takers’ willingness to guess
(Nurweni & Read, 1999; Read, 2019; Schmitt et al., 2020), which may result in
overestimation of examinees’ vocabulary size (Gyllstad et al., 2015). Hence, assessing
the ‘breadth’ of vocabulary knowledge through using MCQs may constitute a threat to
reliability.
In addition to MCQs, alternative methods have also been developed to test

learners’ vocabulary size. For instance, the yes/no technique (a dichotomous item) has
been claimed to be more effective than MCQs (Meara & Buxton, 1987). This test
method is characterised by ease of construction and is timesaving in tests. Therefore,
compared to the traditional multiple-choice format, the yes/no technique can test more
aspects and report quantified vocabulary knowledge results. Nevertheless, the timed
yes/no test method, which tests the vocabulary as a discrete-point and context-
independent construct (Harrington, 2018) through a selected-response item, only
predicts certain aspects of vocabulary knowledge and cannot test learners’ vocabulary
knowledge in a communicative language context.
Although testing vocabulary size can indicate students’ overall vocabulary

knowledge, it is rather superficial in that it cannot indicate the ‘depth’ or quality of any
specific word; that is, how well the word is understood by learners (Read, 2000). First,
researchers measure vocabulary proficiency through different test methods. In effect,
scholars tried to examine word depth using different test methods rather than the single
use of discrete-point items (e.g., multiple-choice task, the yes/no technique). For
example, WAT (Read, 1993; Verplanck, 1992) is more contextulised since both
different meanings and use of the target words were included in the test and VKS
(Paribakht & Wesche, 1997) can check how well learners know a particular word to
provide further evidence about vocabulary depth. However, findings indicate that
guessing, which also influences the testing of vocabulary size with a multiple-choice
format (Gyllstad et al., 2015; Nurweni & Read, 1999; Read, 1993, 1995), affected the
final results which brought about uncertainties regarding the interpretation of
performance on the test. Therefore, although the WAT format seems to measure the
vocabulary depth and is more effective than multiple-choice items, issues of
guessability and the limited number of items remain. Additionally, the performance of
vocabulary ‘breadth’ and ‘depth’ varies according to different language proficiency

levels, and learners with a low language proficiency perform the poorest on both
(Nurweni & Read, 1999). Therefore, it is necessary to incorporate new ways to
meaningfully assess students’ vocabulary knowledge both in tests and daily learning.
Despite a separate component of vocabulary knowledge in tests like earlier

versions of the TOEFL (Henning, 1991), vocabulary has been perceived as a crucial
resource in the testing of macroskills. More significantly, scholars argue that
vocabulary should be best measured in context (Laufer et al., 2004). In effect,
presenting vocabulary test items in authentic contexts has long been preferred (Oller,
1979) and studies indicate that the vocabulary assessment should combine specific
contexts to measure whether students have the adequate vocabulary knowledge to
achieve a communicative purpose. For example, vocabulary knowledge is viewed as
a resource to answer listening tasks and the vocabulary use in the Cambridge English:
First (FCE, now known as B2 First2) listening subtest. This was proved to resemble
real-world listening content by comparing the exam-based corpus and the real-world
text corpus with the British National Corpus (BNC) (Rose, 2008).
In addition, scholars have also endeavoured to incorporate test purpose into test
design to assess vocabulary. Read and Chapelle (2001, p. 10) provide a framework for
realising this incorporation. From Figure 2.1 on the next page, in between the intended
test purpose and test design, three factors mediate the process: construct definition,
performance summary and reporting, and test presentation. The presence of these
mediating factors influences the desired impacts on stakeholders. Moreover, the
consideration of test purpose in test design and the validation of vocabulary tests
through the listed process in Figure 2.1 will allow opportunities for positive
consequences.
To summarise, existing research mainly focuses on how to measure vocabulary

size or proficiency with specific test methods employed, and the use of vocabulary
knowledge in macroskills or specific contexts to realise a communicative purpose.
However, whether vocabulary tests could be learning-oriented to promote more
2
Although the name of the test was changed, this study adopts the name used by the researchers whose
work is cited (i.e., the former name of FCE). This also applies to the other Cambridge tests appear later
in the thesis (i.e., CAE, CPE).

effective and engaging vocabulary teaching and learning practices requires more
empirical investigation.
Figure 2.1. A framework for vocabulary testing (Read & Chapelle, 2001, p. 10)
2.2.3 The testing of English grammar and vocabulary

As the fundamental pillars of any language, grammar and vocabulary are often
discussed together in the EFL context. The attempt to separate grammar and
vocabulary into two independent components is challenging and may be impossible in
both learning and assessment contexts (Celce-Murcia & Larsen-Freeman, 1999;
Halliday, 2004). In a learning context, learners’ grammatical development is always
pertinent to their vocabulary development. Specifically, the learning of lexical
meanings comes before the learning of language forms, and in turn, the range of lexical
items develops simultaneously with the increase in learners’ grammatical structures
(Harrison, 2015). Traditionally, grammar and vocabulary are tested as separate
language constructs, with a focus on recognising language forms and/or meanings, and
language depth or appropriateness in authentic contexts is not touched upon.
Communicatively, they can be learned and tested to successfully carry out tasks
involving language macroskills. In this way, the language use is tested through
contextualisation and in depth.
The testing of grammar and vocabulary as an actual test component can be

generalised into three perspectives. First, some test developers regard them as
constructs that can be assessed through knowledge, such as VLT (Petrescu et al., 2017)

and WAS (Nurweni & Read, 1999). Second, most language proficiency tests view
them as a resource which is utilised by candidates to fulfill tasks involving language
macroskills of speaking, listening, writing, and reading (Jamieson et al., 2000). To this
end, they can successfully achieve a communicative purpose. Such tests include the
IELTS (Ostovar-Namaghi & Safaee, 2017) and the TOEFL iBT (Biber & Gray, 2013).
Third, other tests assess grammar and vocabulary as both resources and separable
constructs, such as the NMET (Pan & Qian, 2017), the FCE (Rose, 2008), the
Cambridge English: Advanced (CAE, now known as C1 Advanced) (Docherty, 2015),
and the Cambridge English: Proficiency (CPE, now known as C2 Proficiency) (Booth
& Saville, 2000). In those tests, besides the inclusion of grammar and vocabulary use
in the testing of macroskills, a subsection is incorporated to test grammar and
vocabulary knowledge, such as the Structure and Written Expression in the TOEFL
paper-based test (PBT) and the Reading and Use of English in the FCE, the CAE, and
the CPE. For example, Zhuang (2008) found that, contradicting the “more reflective
of communicative competence models” claim (Jamieson et al., 2000, p. 3), every part
of the TOEFL PBT emphasised grammar and vocabulary skills, of which the Structure
and Written Expression was top of the list. Through attempting to test writing
competence indirectly, the Structure and Written Expression part overlooked the
general characteristics of the TOEFL PBT, which is positioned as a communicative
test in the academic context. Its limited item types and the testing of grammatical
knowledge posed threats to construct validity and changes were required.
The design and revision of the Use of English paper in tests of the Cambridge
Main Suite Exams (MSE) can be a good reference for designing and testing grammar
and vocabulary. The Use of English component was first introduced into the
Cambridge MSE in the 1950s and the testing focus has progressed according to
teaching and testing changes over time (Weir, 2013). Generally, to align with
communicative teaching and assessment, tests in the Cambridge MSE take a lexico-
grammatical approach to accentuate the relationship between grammar and
vocabulary, which requires not only basic language knowledge but also the ability to
use language in context. Originally, gapped-sentence tasks in the CPE Use of English
paper proved to be able to demonstrate test-takers’ full linguistic repertoire (Docherty,
2015). Compared to the difficulty in designing multiple-choice distractors, gapped-
sentence tasks enable the examination of candidates’ productive knowledge and

linguistic competence (Booth & Saville, 2000). However, this task type was removed
based on Structural Equation Modelling (SEM) results (i.e., checking testing
constructs) in CAE test papers, due to its overlapping with the testing focus of the
Reading paper (i.e., the cohesion and coherence test focus) (Geranpayeh, 2007). Most
recently, the Use of English paper and the Reading paper have been combined into one
paper titled Reading and Use of English in the three tests (Docherty, 2015). The revised
Use of English section includes tasks of multiple-choice cloze, open cloze, word
formation, and keyword transformations. All those test tasks jointly examine learners’
knowledge and ability to use language, which typically links form, meaning, and use,
and put more emphases on productive tasks over discrete-point items. Therefore, they
are in line with the curriculum focus on communicative competence development and
thus assumed to bring about positive washback in classrooms since language use is
also valued.

In sum, studies in either grammar testing and/or vocabulary testing have focused
on test content (i.e., knowledge or communicative competence) and different test
methods (i.e., how to test grammar and vocabulary). It seems that finding
contextualised test methods (e.g., the lexico-grammatical approach in the Cambridge
MSE) can help assess learners’ language knowledge and the ability to use the
language, which is claimed to bring about positive washback in classrooms. However,
it should be noted that test methods influence both content and methods for teaching
and learning (Green, 2007a), which, in turn, affect the test influence since even in
integrated tasks, rote memorisation can also exert a negative influence (Linn et al.,
1991). As such, empirical evidence is needed to clarify the actual influence of the four
GVT tasks, which test grammar and vocabulary through different test methods, on
both teaching and learning.
2.3 WASHBACK
Based on the previous review and considering present research objectives, this
section discusses the theoretical foundations of washback, the movement towards
positive washback and LOA, washback stakeholders, and significant washback
studies.

2.3.1 Washback concepts and dimensions
Since Alderson and Wall (1993) and Wall and Alderson (1993) conducted
seminal research on washback, it has received considerable attention in language
testing and assessment (So, 2014). Existing washback studies have mainly focused on
test effects in the teaching and learning context, which is considered to represent only
one aspect of consequential validity in Messick’s validity theory (McNamara, 1996;
Messick, 1989). Nevertheless, different conceptualisations of washback have evolved
over time. It has been defined as the connections between testing and learning by some
researchers (see, for example, Shohamy et al., 1996) and as the impact of the test on
the teaching practices aiming at test preparation for others (see, for example,
McNamara, 1996). Furthermore, washback is regarded as the influence of language
testing on both teaching and learning for some other scholars (see, for example, Bailey,
1996; Brown, 2000; Gates, 1995; Khaniya, 1990), irrespective of positive or negative
effects (Messick, 1996). Language teachers and learners may engage in certain test-
informed educational practices that they would not necessarily do without the test,
which is known as washback (Messick, 1996; Wall & Alderson, 1993). Washback is
now understood as encompassing the close relationship of testing, teaching, and
learning in language classrooms (Brown, 2000) which includes the impact of a test on
teaching and learning during the time leading up to it (Green, 2013). In the current
study, ‘washback’ is defined as the effects of a test on teaching, learning, teachers, and
learners in test preparation classes. Therefore, the present study examines the
washback of the GVT on teaching and learning in Grade 9 classes in China from the
perspectives of the most directly impacted stakeholders: teachers and students.
Significantly, washback is believed to vary according to contexts and

participants in at least two dimensions (Green, 2013): washback intensity and
washback value. Washback intensity (Cheng, 2005), also washback extent (Bachman
& Palmer, 1996), refers to the amount of effort made by participants to meet test
requirements; therefore, participants are envisaged to change their behaviours because
of the test. Traditionally, test importance or the perceived “stakes” of the test (Madaus,
1988) and test difficulty (Green, 2007a) have been viewed as the driving force for
washback intensity (Green, 2013). As a result, participants’ perceptions of tests can
lead to strong or weak washback. In effect, washback intensity can vary among
different tests (Shohamy et al., 1996), participants (Ferman, 2004; Watanabe, 1996a),

processes (Cheng, 2005; Wall & Alderson, 1993), and time periods (Shohamy et al.,
1996). To achieve visible washback, tests need to be recognised as important, and
participants should have sufficient motivation to learn and substantial resources to
succeed (Hughes, 1993). Otherwise, little effort will be devoted to test preparation
(Green, 2013).
Further, washback is traditionally categorised as either beneficial or negative

(Alderson & Wall, 1993; Hughes, 1989) since the existence of tests can encourage or
hamper the teaching and learning leading up to them. To judge the value or direction
of washback, participants, the context, the investigation time, the rationale, and even
the process should all be considered (Cheng & Curtis, 2004). The following section
will explain the washback phenomenon by first addressing intended and unintended
washback, and then positive and negative washback.
Intended and unintended washback

Washback plays a central role in any new testing system, and it can be a major
focus of the validation of test uses and interpretations (Messick, 1989). To achieve the
intended washback, education authorities should take a range of factors other than the
exam itself into consideration (Wall & Alderson, 1993). The intended washback
occurs when test effects are used for curriculum administration or desirable
educational changes. Similarly, the distinction between intended and unintended
consequences has also been identified by Linn (1993), which is closely linked with the
successful implementation of positive effects intentionally brought about by the
assessment system. In the context of exploring pedagogical changes in response to
changes in the NMET writing tasks, Qi (2007) defined intended washback as the
teaching and learning approaches which could be brought about by testing from test
constructors’ and policymakers’ perspectives. Needless to say, the extent of
successfully implementing the curriculum and achieving desirable educational
changes is the core essence of ‘intended washback’ and ‘unintended washback’. To
understand the intended and unintended washback of the GVT, the present study
examined the stated intentions of the curriculum (i.e., the ECSCE) and Test
Specifications, the actual test content (i.e., the GVT), and teachers’ and students’ goals
and their perceptions.

Positive and negative washback
The value, direction or quality of washback generally refers to positive or
negative test effects on teaching and learning. Whether washback is positive or
negative depends on the test results and test uses. For teachers and students, if the test
encourages them to attain their educational goals, it can be viewed as positive
washback; otherwise, if the test hampers or discourages the achievement of teaching
and learning goals, negative washback is believed to occur (Bailey, 1996; Green,
2007a). However, the extent of positive or negative washback is also subject to test
constructors’ or policymakers’ original test design intentions. For them, once the
educational goals are attained or students pass the test, the washback can be generally
viewed as positive (Green, 2007a); otherwise, the washback will move in a negative
direction. In such cases, the definitions of positive and negative washback overlap with
the intended and unintended washback, since the test design goals and educational
goals are both perceived as the target to be achieved. Nonetheless, the ultimate goals
of students, teachers, test designers and other stakeholders vary, which can explain
their different perceptions of the washback value (Green, 2007a).
Drawing implications from the definitions of different categories of washback

value (e.g., intended and unintended; positive and negative), it is perceived that
(un)intended washback is understood mainly from a test designer’s and policy maker’s
perspective, whereas positive or negative washback is generally comprehended by
different stakeholders. Those stakeholders include wider participant cohorts who are
likely to be impacted by the test, such as teachers, students, educational professionals,
employers, and so on. As such, (un)intended washback indicates a top-down approach
but positive as well as negative washback imply a multi-faceted approach.
It is true that tests can exert influence on some participants, but not all (Linn,
1993), or a test may have positive or negative washback or even neutral washback
(Alderson & Wall, 1993). To investigate the washback of one particular test, the
washback of test outcomes and uses should be demonstrated and understood, not
simply asserted (Linn et al., 1991; Wall & Alderson, 1993). Therefore, researchers
should not take it for granted that a good test can certainly bring about positive
washback, or that a poor test will result in negative washback (Messick, 1996).
In the present study, the positive and negative washback as well as the intended
and unintended washback, which all indicate the direction or value of washback, are

used to discuss the washback of the GVT. To some extent, they are perceived to be
overlapping with each other. This perception has been discussed and agreed by
scholars in the field, and they claim that the achievement of intended washback is
positive and unintended washback is usually negative (Ali & Hamid, 2020; Cheng,
1997). However, considering the dominant factors (i.e., participants, context, time,
rationale, process) involved in the phenomenon of washback (Cheng & Curtis, 2004)
and the research participants of this study (teachers and students), positive washback
includes the desirable or intended teaching and learning goals which the curriculum
(i.e., the ECSCE) stipulates to be implemented in classroom teaching and learning
through assessment, and also desirable or intended teaching and learning goals with
regard to classroom teachers’ and learners’ perceptions. To this end, the positive
washback in this study aims to explore both the intended washback from a curriculum
aspect and the positive test influence on teaching as well as learning.
2.3.2 Working towards positive washback

The question of how to promote positive washback was the focus of Bailey’s
(1996) study, in which after reviewing available literature, Bailey concluded that four
factors are necessary to achieve beneficial washback. First, whether the educational
goals of learners and programme members are achieved or hampered. As Buck (1988)
has claimed, the concentration on and achievement of language learning goals in
classroom instruction can generate beneficial washback, but the difficulty emerges as
students usually have the short-term goal of attaining a higher score and the long-term
goal of enhancing language proficiency. Thus, negative washback occurs once those
conflicting goals are perceived differently by learners (Bailey, 1996). Second, whether
a test is authentic or not. Following Carroll (1980), Morrow (1991) clearly addresses
the distinction between text authenticity and task authenticity, thus the input (text) and
the process of input (task) should correspond to the real-life situation or target
language use (TLU) domain. Tests with beneficial washback should be authentic and
represent communicative language skills (Messick, 1996), but should first minimise
“construct under-representation” and “construct-irrelevance” which are two main
threats to validity (Messick, 1996, p. 252). Third, positive washback can emerge
through learner autonomy and self-assessment. Fourth, test reporting methods are also
key to the achievement of positive washback. Instead of a general score as the final
report for students’ achievement and progress, Shohamy (1992) and Spolsky (1990)

suggest a detailed score reporting to guard against the superficial interpretation of the
test results. In addition, although Shih (2009) pointed out the significance of promoting
positive washback, how to incorporate stakeholders in one systematic positive
washback model remains theoretically unresolved to a large extent.
2.3.3 A new approach to positive washback: Learning Oriented Assessment

Although researchers in the field of language testing and assessment have
incorporated theories such as language learning, motivation, and intention into their
conceptualisation of washback, they did not necessarily start from a positive
viewpoint. As the test design of the SHSEET is required to follow the ECSCE, which
advocates “learner-centredness” and CLT, the washback model employed in the
current study should pay particular attention to students and their learning. In this case,
the LOA cycle, a framework proposed by Cambridge English Language Assessment
(Jones & Saville, 2016) can provide a systematic model to underpin research into
positive washback, the current study included.
Underpinning the current evolution of integrated assessment in language testing

(i.e., viewing learning as a process, emphasising the testing of integrated macroskills)
(Uyaniker, 2017) and seeking to reconcile formative and summative assessments
(Carless, 2007; Carless et al., 2006), LOA theory aims to promote positive washback
(Jones & Saville, 2016) and focuses on students’ learning. It involves documenting
and interpreting evidence of test performance for further language development
decisions and test construction implications (Purpura, 2004). Further, it gives priority
to the centrality of L2 processing as well as learning outcomes which come from both
planned and unplanned assessments (Purpura & Turner, 2013). Therefore, it prioritises
learning in any context and aims to raise standards over time (Saville & Salamoura,
2014), acknowledging the synergy between instruction, assessment, and learning
(Turner & Purpura, 2016).
Theoretically, LOA is generated from formative classroom assessment contexts

(Jones & Saville, 2016; Turner & Purpura, 2016). However, since both formative
assessment (formal and informal assessments) and summative assessment (including
large-scale standardised tests like the SHSEET) are directed to improve learning
outcomes, once the central theme of assessment is to engineer appropriate student
learning, the LOA purpose of promoting learning in assessment can be achieved in
either form (Carless, 2007). For summative assessment, it can be learning-oriented in

certain conditions such as encouraging deep rather than surface approaches to learning
and promoting a high level of cognitive engagement throughout a test (Carless, 2015).
Significantly, working towards well-designed summative assessments can also
provide opportunities for employing formative assessment strategies (e.g., peer
feedback, student self-assessment) (Carless, 2007). In a similar vein, an LOA
framework with five agents and with seven dimensions proposed by Turner and
Purpura (2016) can reflect the synergy between formative and summative assessment.
The five agents are learners, peers, language teachers, curriculum materials and
standards, and technology. The seven dimensions contain contextual (socio-political
forces, teacher and student attributes), elicitation (planned and unplanned language
elicitation activities in classrooms), proficiency (what students are expected to learn
and what is the criteria for success), learning (learning and cognition, feedback and
assistance, self-regulation), instructional (teachers’ L2, topical and pedagogical
content knowledge), interactional (positive and negative turn exchanges related to
learning goals), and affective (learners’ socio-psychological predispositions, such as
emotions, beliefs, attitudes, motivation).
Empirically, inspired by the LOA theory, Hamp-Lyons and Green (2014)

demonstrated the application of LOA theory to summative assessments, using the term
Learning Oriented Language Assessment (LOLA) to investigate the FCE Speaking
test. Tsagari (2014) presented findings from unplanned LOA in EFL classrooms. Both
studies suggest that LOA/LOLA training for teachers and its principles and practices
are necessary to integrate exams into learning, thus providing an empirical basis for
LOA application in specific summative examinations.
Nevertheless, the understanding of how LOA is applicable to summative

assessments is still limited. Few studies have yet focused on applying LOA theory and
frameworks to large-scale and high-stakes standardised tests which are summative in
nature. Therefore, the proposed study aims to make a contribution in this regard, and
the theoretical application of LOA in the present study is delineated in Chapter Three.
In the following sections, empirical washback studies from stakeholder and high-
stakes standardised English tests perspectives will be reviewed.
2.3.4 Washback and stakeholders

In educational assessment, stakeholders are defined as any individual or group
of individuals whose lives are influenced by a test (So, 2014). The concern of

stakeholder groups with assessment is not new, as it has been recorded as early as in
1877 (Latham, 1886) since they are both directly and indirectly influenced by the tests.
Specifically, a full range of stakeholders and their opinions should be consulted while
monitoring the impact of the test on language materials and classroom activities (Weir,
2005). However, it was only after Alderson and Wall (1993) proposed their washback
hypotheses (see further details in Section 3.1) that empirical washback studies began
to flourish and stakeholders were prioritised. The stakeholder groups focused upon in
washback theories are different. For example, apart from teachers and students,
Hughes (1993) considers administrators, materials developers, and publishers as
influential participants in test production, while Bailey (1996) identifies materials
writers, curriculum designers, and researchers as important to test processes. Broader
stakeholder groups of students, teachers, publishers, materials writers, course
providers, and other stakeholders were included in Green’s (2007a) washback model
which incorporates washback value and intensity. Although various stakeholder
groups have been addressed, the key stakeholder groups of students and teachers are
at the centre of every washback model, and they are the major participants in the
present study.
A review of the relevant literature shows that unequal attention has been paid to
different key stakeholders. As students constitute the most direct and ultimate
stakeholder of any assessment, scholars (see, for example, Hamp-Lyons, 1997)
advocate that more studies should foreground their perceptions of washback, tests, and
test results as there are still a paucity of learner washabck studies (Damankesh &
Babaii, 2015; Pan, 2014; Xie & Andrews, 2013). Therefore, students and their learning
practices have recently become the focus of research (see, for example, Andrews et
al., 2002; Cheng et al., 2011; Reynolds et al., 2018; Saglam & Farhady, 2019; Shih,
2007; Xie & Andrews, 2013; Zhan & Andrews, 2014). Noticeably, students are mainly
included in current washback studies, either researched together with teachers (Green,
2006a, 2006b; Pan & Newfields, 2011) or other stakeholders such as parents (Cheng
et al., 2011) or test constructors (Qi, 2004a; 2005; 2007). For example, in the Hong
Kong secondary school context, Cheng et al. (2011) conducted impact studies among
students and their parents during the test innovation period. An investigation of their
perceptions of the impact of school-based assessment (SBA), identified that students’
perceptions of test learning activities related to their language proficiency awareness.

Further, significant differences were also identified in students’ perceptions of learning
activities before and after the introduction of SBA. Moreover, parents’ perceptions of
and knowledge about the test related to their support for students’ test preparation
activities. Most significantly, there was a direct and significant relationship between
the two stakeholder groups’ perceptions regarding SBA impact.
In sum, as the two essential and most basic stakeholder groups, it is necessary to
investigate both teachers and students, who are more directly affected by the test than
any other stakeholder groups, since the testing process is closely linked with both
teaching and learning. To this end, a comprehensive understanding of the classroom
practices leading up to the test is important to attain.
2.3.5 Washback of high-stakes standardised English tests

It is often argued that high-stakes standardised English tests dominate the
corresponding EFL teaching and learning processes (Watanabe, 1996a) and national
tests are usually employed as tools to introduce changes into centralised educational
systems (Shohamy et al., 1996). Empirically, studies pertaining to high-stakes
standardised English tests and large-scale entrance English exams are abundant in the
washback literature (Hung, 2012). In this section, the washback studies of high-stakes
standardised English tests, the current SHSEET included, are documented, and both
the international and Chinese contexts are explored. It should be noted that the studies
listed in this section are not exhaustive (see Appendix C and Appendix D), but they
have been selected for their salience to the current study.
Empirical studies in the international context

In the international context (see Appendix C), the washback evidence of high-
stakes standardised English tests is not uniform, illustrating the complexity of the
phenomenon itself. Hence, different patterns of washback value and washback
intensity have been identified.
Positive washback is first found in terms of test factors such as test

characteristics of the IELTS (Erfani, 2012; Hawkey, 2006) and test content of the O-
Level exam (Wall & Alderson, 1993). For example, the communicative feature of the
IELTS can bring about positive washback on teaching content or activities, teaching
methods, and test preparation materials (Erfani, 2012; Hawkey, 2006). In a washback
study on the IELTS, Hawkey (2006) found stakeholders perceived test washback

positively due to the test quality and teachers viewed the IELTS as an authentic and
fair test. In turn, this characteristic further encouraged teachers to adopt more varied
teaching methods, materials, and activities focusing on tasks and macroskills with
communicative features. Additionally, the emphases of reading and writing in the O-
Level exam in Sri Lanka directed the teaching practices to focus more on reading and
writing skills rather than only on grammar, and test items which resembled those tasks
in textbooks also positively influenced teaching and learning (Wall & Alderson, 1993).
Positive washback is also identified through teaching factors such as teaching

content in studies of the O-Level exam (Wall & Alderson, 1993), the TOEFL
(Alderson & Hamp-Lyons, 1996), and the International Teaching Assistant (ITA)
program (Saif, 2006). The change of teaching content positively affected teaching
methodology in the ITA program (Saif, 2006), as teachers adjusted teaching objectives
to align with the exam aims and designed teaching content as well as activities
according to the exam to be used in their teaching. Therefore, additional materials were
introduced to the classroom teaching, extra time was allocated to students’
presentations and feedback activities, and teachers made use of the test’s rating
instrument for in-class evaluations and feedback.
Positive washback is further reflected in terms of participant factors such as the

teacher factor in the studies of the IELTS (Green, 2006b; Hawkey, 2006) and the ITA
program (Saif, 2006). In the IELTS washback studies, Hawkey (2006) found that
teacher factors, including their attitudes to the test, contributed to the opportunity for
students’ communicative learning (e.g., macroskill activities), and Green (2006b)
found that teachers’ beliefs in editing and redrafting could compensate for what was
missing in the writing test design of the IELTS.
In contrast, negative washback is reflected in test factors such as test

characteristics of the TOEFL (Alderson & Hamp-Lyons, 1996) and the IELTS (Green,
2006b), and test stakes of the IELTS (Hawkey, 2006). For instance, compared to non-
TOEFL classes, since the nature of TOEFL did not allow for communicative teaching
at the discourse level (this was the former version of the TOEFL, which mainly
contained discrete-point items focusing on language knowledge), teachers used
different teaching methods (e.g., less student questioning, less interaction between
student with peers and/or teacher) in TOEFL preparation classes (Alderson & Hamp-
Lyons, 1996). In a similar vein, through comparing English for Academic Purposes

(EAP) and IELTS writing classes, Green (2006b) found that since test design features
of the IELTS required timed, relatively short responses to writing prompts, preparation
classes failed to provide the required skills and knowledge as specified in EAP
programs. In addition, the perceived test stakes have been found to negatively
influence stakeholders. For instance, the difficulty of reading and writing subtests of
the IELTS made students feel anxious, and high test stakes of the IELTS generated
minor negative washback (Hawkey, 2006).
Negative washback is reflected in the teaching and learning content in relation

to various test preparations such as the O-Level exam (Wall & Alderson, 1993), the
ITA program (Saif, 2006), FCE (Tsagari, 2011), the IELTS as well as the TOEFL
(iBT) (Erfani, 2012; Zafarghandi & Nemati, 2015), and locally designed high-stakes
language proficiency test of Test of Readiness for Academic English (TRACE) in
Turkey (Saglam & Farhady, 2019). Since classroom activities are dominated by the
skills tested, teachers and students pay more attention to the test focus during
classroom instruction. In the Sri Lankan context, Wall and Alderson’s (1993)
washback study of the O-Level test found that in the third term of the academic year,
a “narrowing of the curriculum” (Madaus, 1988; Shohamy et al., 1996) occurred since
classroom teachers started to prepare students for the exam by ignoring textbooks and
relied heavily on past exam papers and commercial test preparation materials.
Similarly, Saif (2006) found that teachers abandoned textbook content including
cultural topics and oral skills which were not tested in the ITA test. Moreover, the
ordinary teaching was disrupted since teachers started to teach more grammar and
vocabulary, which was considered by them as the key to success in the FCE papers
(particularly Use of English) in Greece (Tsagari, 2011). Therefore, the intended
positive washback claim was not realised since teachers were found to ‘teach to the
test’ and students were found to ‘study for the test’ (Zafarghandi & Nemati, 2015)
when students tended to prioritise test-oriented practice (Saglam & Farhady, 2019) and
little attention was paid by students to materials perceived to be not relevant to the test
(Damankesh & Babaii, 2015; Erfani, 2012).
Similarly, negative washback has also been discovered in teaching methodology

and learning strategies in preparation for the IELTS (Green, 2006b), the FCE (Tsagari,
2011), the high school final English exam in Iran (Damankesh & Babaii, 2015), and
the TRACE (Saglam & Farhady, 2019); and time investment in TOEFL preparation

(Alderson & Hamp-Lyons, 1996). For instance, in the comparative study of IELTS
academic writing classes (Green, 2006b), rather than language proficiency
development and language skill acquisition, students and teachers focused on using
test-taking strategies and test memorisation. Broad similarities regarding teaching
methods were also found, such as classroom talk that featured teacher-dominated
interactions and student modality featured as more listening than writing activities. In
a study set in Greece, Tsagari (2011) found that since teachers perceived that FCE
grammar and vocabulary items lacked communicative features, they still adopted
traditional teaching methods, and students tended to memorise repeated test points and
standardise their learning focus, which is unproductive. Moreover, tests are regarded
negatively since teachers and students invest a great amount of time in classroom
teaching and learning for test preparation. For example, it was found that teachers and
students neglected regular class activities to compensate for TOEFL preparation, or
they invested extra-curricular time in test preparation (Alderson & Hamp-Lyons,
1996).
Additionally, negative washback was found in participant characteristics such as

student and teacher attitudes towards the FCE (Tsagari, 2011) and teacher factors in
the study of Japanese entrance exams (Watanabe, 1996a). In the case of the FCE,
teachers felt stressed and anxious because their students’ success in the exam became
a criterion for evaluating their professional competence, and the exam demotivated
students’ language learning, increased their anxiety, and made them feel bored.
Students disliked communicative activities and materials since they did not see any
potential to enhance their test performance (Tsagari, 2011).
Intriguingly, neutral or no washback was found regarding educational changes

at the macro level in the study of the ITA program (Saif, 2006) and teaching
methodology of the TOEFL (Alderson & Hamp-Lyons, 1996) as well as the O-Level
exam (Wall & Alderson, 1993). The failure of achieving the intended washback is
evidenced by no washback on policy or educational changes. In the ITA washback
study of Saif (2006), although changes took place at the micro classroom level (e.g.,
teaching materials and methodology), no evidential link was found between the test
and educational changes or teaching methodology at a Canadian institutional level.
Although tests are theoretically assumed to bring about changes to both what and how
to teach and to learn (Alderson & Wall, 1993), empirically, despite the few studies

which found limited changes (Saif, 2006), most studies reported no obvious washback
on teaching methodology (Alderson & Hamp-Lyons, 1996; Wall & Alderson, 1993).
For example, Wall and Alderson (1993) compared the actual teaching methodology
with the teaching methods recommended by the Teacher’s guides. In the textbook-
based classroom teaching, teachers’ content presentation and student exercises were
not identical with the O-Level exam or textbooks and might go against the textbook
principles, but neither positive nor negative washback evidence was shown on teaching
methodology. The main reason for this phenomenon was that teachers were not certain
about the exam objectives or the textbook teaching, and not clear about how to teach
since no suggestions were provided by the Teacher’s guide or exam-support materials.
Therefore, the teaching methods remained the same throughout the academic year.
Furthermore, washback can have different degrees of intensity in that it varies

among participants and time periods in studies of the IELTS and the TOEFL iBT
(Erfani, 2012), the Arabic as a Second Language (ASL) and the English as a Foreign
Language (EFL) oral exam (Shohamy et al., 1996), Japanese university entrance
exams (Watanabe, 1996a), and the TOEFL (Alderson & Hamp-Lyons, 1996). For
example, in the Israeli context, different stakeholders perceive washback differently.
Education authorities or bureaucrats who usually use the tests as instruments for
demonstrating political power or gaining control over the education system (e.g., to
force teachers and students to teach and learn Arabic, to raise the prestige of the Arabic
language) usually view the ASL and the EFL oral exams as having the potential to
bring about positive washback to influence teaching and learning. Teachers and
students expressed their negative feelings towards the test. Further, for teachers and
students, washback of the low-stakes ASL test was restricted and decreasing yearly,
but washback of the high-stakes oral test of EFL was strong and increasing yearly
(Shohamy et al., 1996). Therefore, washback intensity also varies over time, and
several factors contribute to this phenomenon: test stakes, language status, test
purpose, test format, and skills tested.
In sum, internationally, tests could not induce washback by themselves.

Participant characteristics including attitudes, and teaching and learning factors like
content, methodology, and time investment, can all influence washback value; while
time, participants, and tests impact on washback intensity. In addition, the same test
may have different washback values on teaching and learning, such as the negative test

investment for TOEFL preparation but neutral washback on teaching methodology
(Alderson & Hamp-Lyons, 1996), and negative washback on teaching and learning
content but neutral washback on teaching methodology for the O-Level exam (Wall &
Alderson, 1993). However, the complexity of those findings is subject to different
contexts, tests, and even participants. Therefore, more research is needed to reflect the
different extent and types of washback (Spratt, 2005) and to help generalise the
findings to other EFL teaching and learning contexts (Tsagari, 2011).
Empirical studies in China

In China, different patterns of washback value and intensity have been reported
in empirical studies (see Appendix D).
Positive washback is identified in teaching and learning factors such as teaching

and learning content and time investment in the context of the NMET (Li, 1990), and
teaching methodology and learning strategies of the Internet-based CET-Band 4 (IB
CET-4) (Wang et al., 2014), the NMET (Zhi & Wang, 2019), and the SHSEET (Yang,
2015). Teaching content and time investment can be positively influenced by tests. For
example, in Li’s (1990) washback study of the NMET, the increased use of imported
and teacher-developed materials affected teaching and learning positively. Therefore,
the test shifted the formal teaching of linguistic knowledge to language skills practice
and use and teachers as well as students spent more time practising language skills
both inside and outside classrooms, which contradicted the generally negative
perception of high-stakes tests (see, for example, Alderson & Hamp-Lyons, 1996;
Shohamy et al., 1996). Further, in the IB CET-4 washback study, the test was found to
help establish a new computer and internet-based teaching model and develop
students’ autonomous learning abilities (Wang et al., 2014). Further, language-use
strategies such as recalling grammar knowledge and reading strategies utilised by test-
takers are perceived as useful in real-life tasks, which indicate positive washback in
the NMET (Zhi & Wang, 2019). Likewise, the grammar section (i.e., the MCQ task)
of the SHSEET also exerted positive washback on teaching approaches such as
teaching problem-solving techniques, and learning strategies including learners
summarising problem-solving techniques by themselves (Yang, 2015). Additionally,
tests can exert a positive influence on students’ affective factors such as their
motivation to learn; for example, students developed an enthusiasm to learn English
in extra-curricular time because of the NMET (Li, 1990), and test preparation

encouraged students to learn English (Gan, 2009; Teng & Fu, 2019; Zeng, 2008).
Another case of positive washback was identified in the NMET context where
students’ test anxiety was mitigated due to the new policy of two tests administered a
year from 2016 onwards; as a result, the test had a positive influence due to the
decrease of test anxiety which further impacted on learning positively (Chen et al.,
2018).
In contrast, negative washback is found in test aspects such as test characteristics

of the NMET (Qi, 2007) and the CET-4 (Xie & Andrews, 2013), test use of the NMET
(Qi, 2004b, 2005, 2007) and the TEM-Band 8 (TEM-8) (Zou & Xu, 2017), and test
method of the IB CET-4 (Wang et al., 2014) and the SHSEET (Yang, 2015). For
example, the absence of communicative tasks in writing practice made it impossible
to bring about the pedagogical change intended by NMET constructors (Qi, 2007).
Similarly, the failure to achieve the communicative language skills in the CET-4 test
preparation proved that test constructors’ intentions to bring about innovation in
teaching were unsuccessful (Xie & Andrews, 2013). Further, test use or purpose may
induce negative washback. For example, as a gatekeeping test for recruiting senior
high school graduates, the NMET failed to promote educational or pedagogical
changes intended by test constructors since it conflicted with the selection function of
the exam (Qi, 2004b, 2005, 2007). Likewise, the misuse of test results can produce
negative washback. For example, as a language proficiency test for English majors,
the TEM-8 was found to have unintended uses such as including test scores as
graduation requirements, assessing teachers and program administrators, and even
ranking universities and departments (Zou & Xu, 2017). Moreover, test method
influences teaching and learning negatively. For example, the overemphasis of
computer ability in the IB CET-4 resulted in the lack of spelling and grammatical
knowledge for students’ written abilities (Wang et al., 2014), and the multiple-choice
exercises could become the test preparation focus due to the multiple-choice format in
the test (Yang, 2015).
Negative washback is also identified regarding teaching and learning content and
teaching and learning methodology of the SHSEET (Yang, 2015; Zeng, 2008), the
CET-4 (Zhan & Andrews, 2014), and the NMET (Qi, 2005). A SHSEET washback
study by Zeng (2008) found that for teachers, tests disrupted the teaching routine and
limited their choices of teaching content such as neglecting speaking as well as the

curriculum requirements and they felt challenged in applying communicative language
teaching; while for students, test preparation materials dominated their self-study time.
Similarly, in the context of the CET-4, students shifted from authentic materials like
newspapers to the specific use of past exam papers, test-related websites, and
vocabulary books when the exam date was approaching (Zhan & Andrews, 2014).
Therefore, tests limited teaching materials and teachers were prone to ‘teach to the
test’ rather than promoting teaching and learning in an intended way (Qi, 2005; Yang,
2015). Further, when the CET-4 was approaching, instead of using self-management
learning strategies, test-takers still adopted traditional test preparation methods such
as rote memorisation, doing past exam papers and exam-like exercises (Zhan &
Andrews, 2014). In addition, it was possible that grammar teaching was isolated from
the entire teaching process, and rather than developing language competence, students
tried to master test-taking strategies and did more multiple-choice exercises for test
preparation due to fewer opportunities to use English for communication even inside
classrooms as teachers rarely used English during teaching (Yang, 2015).
Negative washback can also be identified through participant characteristics

such as attitudes towards the SHSEET (Zeng, 2008) and learning motivation of the
General English Proficiency Test (GEPT) (Shih, 2007). Especially for students, high-
stakes English tests like the SHSEET aroused their anxiety and stress levels which led
them to ‘learn to the test’ and principally adopted test preparation materials even in
their self-study time (Zeng, 2008). This, in turn, hinders their learning motivation and
achievement of language competence. Moreover, in a Taiwanese private higher
institute, a damaging influence of the GEPT on students’ learning motivation and self-
confidence was found, because the exam content did not align with school learning
(Shih, 2007). Hence, students could not apply their classroom learning to the test,
which made them lose self-confidence and decreased their motivation to learn.
It is important to note that no obvious washback has been found regarding

teaching methodology and learning strategies in studies of the HKCEE (Cheng, 1997,
1998) and the CET-4 (Zhan & Andrews, 2014). In Hong Kong, after the introduction
of the new HKCEE, there were obvious, quick and efficient changes in teaching
materials, but changes in teaching methods were limited, superficial, slow and
reluctant. Therefore, teaching methods remained the same no matter before or after the
test innovation (Cheng, 1997, 1998). Likewise, diverse washback did exist on the

content of CET-4 test-takers’ studies, but not in the way they learnt (Zhan & Andrews,
2014).
Significantly, the washback intensity is different among stakeholders of the “Use

of English” (UE) Oral Examination (Andrews et al., 2002), the SBA (Cheng et al.,
2011), and the SHSEET (Yang, 2015), and time periods in the study of the UE Oral
(Andrews et al., 2002). On the one hand, washback varies among students. For some
students, their learning outcomes might be improved through preparing for the test,
while for others, the test might only result in superficial learning performance such as
producing memorised content and coping with specific test methods and requirements
(Andrews et al., 2002). On the other hand, washback varies among students with
different language competence (Cheng et al., 2011). Moreover, washback is not
immediate. For example, the introduction of the UE Oral did exert some influence on
students’ oral performance, but the washback was delayed and more visible in the
second year of test administration since teachers gradually became familiar with the
test and its requirements (Andrews et al., 2002).
To conclude, in China, the washback of high-stakes standardised English tests is

found to have different patterns of value and degrees of intensity in three main aspects.
For washback values, influential factors include test factors such as test characteristics,
test use and test methods, teaching and learning factors such as content, methodology,
and time investment; and participant factors such as learning motivation and
participant beliefs. Moreover, the washback intensity is influenced by different
stakeholders and time periods. Despite the different factors involved, it is found that
the same factor (such as teaching methods) can positively, negatively, or not influence
the test-related teaching and learning. In common with other international studies, the
complexity of washback findings in China is not only reflected through the test itself.
Therefore, the present washback study could help to contribute to the evidence of
washback value and intensity in the Chinese junior high school EFL teaching and
learning context.
Summary
Concluding from the above empirical washback literature in both the
international and Chinese contexts, the complexity of washback phenomenon is seen
through the different patterns of washback value and washback intensity among
stakeholders, tests, and over time. Major similarities are as follows: first, both positive

and negative washback are prevalent in those high-stakes standardised English tests.
Second, more negative washback has been found than the positive and neutral or no
washback value patterns. Third, negative washback occurs once the test fails to achieve
its intended purpose to bring about positive educational changes as envisaged by test
constructors. In sum, all the potential factors for washback value and washback
intensity, which influence or are influenced by the test-related teaching and learning,
are depicted in Figure 2.2.
Figure 2.2. Influential factors for washback from empirical studies
As shown in Figure 2.2, the complexity of washback is composed of value and

intensity. For washback value, test factors and participant factors may influence
washback value perceptions, which in turn exerts effects on teaching and learning
process. For washback intensity, washback may vary over time and among different
tests as well as participants. Therefore, these two major aspects explain the washback
of high-stakes standardised English tests.

From this literature review, it is clear that empirical washback studies conducted
are as complex as the washback phenomenon itself. The following research gaps have
been identified.
First, in order to explore positive washback, the employment of a substantial

washback model is necessary, for which the LOA framework (see further details in
section 3.4) seems relevant. Second, various stakeholder groups have been
investigated to identify the complexity of washback from different perspectives. The
exploration of washback from both parties (teachers and learners) can provide a
comprehensive analysis. Third, controversy over the washback value remains.
Therefore, it is important that evidence of positive and negative washback needs to be
collected not only through test preparation of teachers and students, but also through
their understanding and perceptions of educational goals (Linn et al., 1991). To
conclude, although test constructors and education authorities claim that the test is to
bring about beneficial washback, it should be demonstrated and studied, rather than
asserted and assumed (Haertel, 1992; Linn et al., 1991; McNamara, 1996; Messick,
1994; Wall & Alderson, 1993).
2.4 SUMMARY AND IMPLICATIONS
The two-fold research objective mentioned in section 1.3.1 is situated in the

current literature. The first objective is to explore the positive and negative washback
as well as intended and unintended washback of the GVT on teaching and learning
from teachers’ and students’ perspectives. As the literature review demonstrates, high-
stakes standardised English tests are usually intended by education authorities or test
designers to achieve positive washback (Sharif & Siddiek, 2017) or bring about a
change in teaching and learning (Qi, 2004a, 2004b, 2005, 2007, 2010). However, the
separate testing of grammar and vocabulary through discrete-point items poses
questions regarding the test validity and washback. Furthermore, there is a dearth of
washback research specifically on grammar and vocabulary tasks. Therefore,
analysing the test itself or assembling washback information from test constructors to
compare with the actual washback cannot fully depict and reveal the washback
complexity. To this end, teachers and students, as the most direct stakeholder groups
(Hamp-Lyons, 1997; Watanabe, 2004), should be investigated to probe their washback

understandings and test preparation practices. Second, although researchers have
raised the possibility of using an LOA theory with summative assessments (Carless,
2007, 2015; Jones & Saville, 2016), existing studies have not yet convincingly
demonstrated this potential.
In sum, the general research gaps lead to the employment of LOA theory into
the current washback study. Therefore, this washback study of the GVT will utilise the
LOA theory to explore the positive and negative as well as intended and unintended
washback on teaching and learning from teachers’ and students’ perspectives in the
junior high school context in China. The application of LOA theory is further
explained in Chapter Three.

Chapter 3: Theoretical Framework
In the previous chapter, empirical studies on standardised English tests, grammar

and vocabulary testing, and washback were reviewed. In the current chapter, key
developments in washback theorisation are first presented in section 3.1. Green’s
(2007a) washback model is then explained and used to theorise the current washback
study in section 3.2. Following this, washback mechanism is delineated in section 3.3,
and the conceptualisations of LOA theories and frameworks are presented in section
3.4. In section 3.5, the application of Green’s (2007a) washback model and the LOA
cycle (Jones & Saville, 2016) to the present research is explained. The chapter is then
concluded in section 3.6.
3.1 KEY DEVELOPMENTS IN WASHBACK THEORISATION
During the past decades, in order to clarify the mechanism of washback, various
models and frameworks have been developed (see, for example, Alderson & Wall,
1993; Bailey, 1996; Burrows, 2004; Green, 2007a; Hughes, 1993; Shih, 2009). In this
section, three washback frameworks targeting learning and teaching are presented.
However, the model developed by Green (2007a) was adopted due to its alignment
with the first objective of the current study, which is to explore teachers’ and students’
perceptions of the washback.
3.1.1 Washback Hypothesis

The first classic washback model was proposed by Alderson and Wall (1993, pp.
120-121) and further supplemented by Alderson and Hamp-Lyons (1996, p. 296).
Their “Washback Hypothesis” demonstrates that washback will inevitably occur
whenever a test is introduced. The 16 hypotheses, grouped by the researcher according
to the impact on teaching, learning, and intensity are listed in Table 3.1. The 16
washback hypotheses take most of the possible washback on teaching and learning
into consideration and Hypotheses 12 to 16 have also pointed to the intensity of
washback. For washback on teaching and learning, content, methods, rate and
sequence, degree and depth, and teachers’ as well as learners’ attitudes are all
considered. For washback intensity, the major considerations are different tests and
different participants (i.e., teachers and students).
Chapter 3: Theoretical Framework 53

Table 3.1
Washback Hypothesis (Alderson & Hamp-Lyons, 1996; Alderson & Wall, 1993)
Note. The numbering of hypotheses in this table is not consecutive due to the grouping under different
categories.
However, not all these hypotheses are reflected in reality. For example, in their
Sri Lankan washback study, Wall and Alderson (1993) found that the introduction of
O-Level examinations did not result in changes in teachers’ pedagogy. Therefore,
Hypothesis 4 is not verified. In fact, the “Washback Hypothesis” by Alderson and Wall
(1993) is not exhaustive and it has been criticised as being overly simplistic and
general, since it mainly focuses on the linear relationship between tests and test-related
teaching as well as learning aspects (Shih, 2007). Considering the complexity of
washback, the “Washback Hypothesis”, while incomplete, attempts to clarify the
washback concept and has provided a foundation for both empirical studies and more
comprehensive washback models which followed.
54 Chapter 3: Theoretical Framework

3.1.2 A curriculum innovation model
Teachers were foregrounded in a curriculum innovation model by Burrows
(2004), which was developed on the basis of traditional washback theory and a “black
box” model. The former depicts a linear process linking testing with a response and
originates from the assumption, as outlined in the “Washback Hypothesis” (Alderson
& Wall, 1993), that whenever a new test is introduced, there will be washback related
to the test. Moreover, test qualities rather than teacher factors play a major role. The
latter contests the claim that a new test will result in a single washback response and
further incorporates variables such as teachers’ beliefs, assumptions, and knowledge
since studies such as Wall (1996) found that each individual teacher’s response to a
test differs. The new model (see Figure 3.1) takes qualitative data into consideration
and asserts that washback studies and curriculum innovation are the same, because
new exams which bring about educational changes will have washback on teaching.
Moreover, different patterns of teacher response (i.e., resister, adopter, partial adopter,
and adapter) occur since teachers play different roles in response to the new test.
Figure 3.1. Washback: a curriculum innovation model (Burrows, 2004, p.126)
In conclusion, Burrows’ (2004) washback model, which aligns washback with

curriculum innovation, views teachers’ beliefs, assumptions, and knowledge as the
main factors that generate different washback responses. However, this model is
limited in the sense that it is only teacher-focused and other influential factors related
to teachers besides beliefs, assumptions, and knowledge are neglected. This drawback
is also pointed out by Shih (2009), as he claims that other factors, especially contextual
factors at different education levels, should be included in the washback mechanism.
3.1.3 Washback models of learning and teaching

Situated in the context of Taiwan, Shih (2007, 2009) provided new washback
models based on the General English Proficiency Test (GEPT) washback data. In line
with many other colleagues (Alderson & Hamp-Lyons, 1996; Saif, 2006; Tsagari,
2011; Zhan & Andrews, 2014), Shih points out that the phenomenon of washback is
intricate, as not only a test itself, but also various factors influence learning and

teaching during test preparation. Considering extrinsic factors, intrinsic factors, and
test factors, Shih (2007, p. 25) attempted to encompass the full domain of washback
on learning. Further, based on the assumption that policymakers should take both
teacher factors and micro-level contextual factors into consideration in order to fulfill
the test function of bringing about educational changes, Shih (2009, p. 199) proposed
the teaching washback model including contextual factors, test factors, and teacher
factors. The two models are combined and presented in Figure 3.2.
Learning Teaching
Figure 3.2. Washback models of learning and teaching (Shih, 2007, p. 151; 2009, p. 199)
From Figure 3.2, it is evident that these two models also consider the time factor
(the axis and (t) symbol) to point out that washback varies over time (Shohamy et al.,
1996; Shohamy et al., 1986). Similar to Burrows (2004), these two models also map
the flow of influences which brings about washback in dotted lines. Although the
factors and relationships seem to be exhaustive and comprehensive, they fail to map
teaching and learning into a comprehensive picture since teaching and learning are

always integrated. In addition, although Shih explains that those factors can influence
the degree of washback (Shih, 2009, p. 200), it is yet unclear how the degree will vary.
To summarise, these three washback theories and models try to depict the
complexity of the washback phenomenon, either from the whole teaching and learning
system (Alderson & Wall, 1993) or from specific teaching (Burrows, 2004; Shih,
2009) and learning (Shih, 2007) aspects. Insightful and informative as they are, they
all fail to touch upon the specific issue of the positive or negative washback value and
need clearer identification of washback intensity. This was addressed in a more
comprehensive washback model by Green (2007a), outlined in the next section.
3.2 A WASHBACK MODEL INCORPORATING INTENSITY AND

DIRECTION
Taking the complexity of washback into consideration, Green (2007a)

progressively developed three washback models, from simple to complex, to examine
the direction (i.e., value), test stakes, and intensity of washback. The final framework
(see Figure 3.3), which reflects the general findings from high-stakes standardised
English tests in Chapter Two (see Figure 2.2), is the washback model guiding the
proposed study. In brief, this washback model clearly indicates the washback value
and washback intensity which can help answer the first research question and points
out the positive assessment direction as emphasised in the ECSCE guidelines in section
1.2.2 (Ministry of Education, 2011).
Green’s model, incorporating intensity and direction, depicts the complexity of

washback. Differing from previous views that washback is an external force to reform
instruction, this new model recognises the central place of washback. The direction or
value of washback is first dependent on the relationship between test content and
curriculum requirements to align with the trend to move classroom activities towards
CLT. It is necessary that the test content should be matched with teaching and learning
as closely as possible to bring about positive direction or value of washback. Further,
as Green (2007a) maintains, positive washback is difficult to achieve in its basic model
(i.e., the overlap between focal construct and test design characteristics, see Figure
3.3), for not only test design, but also many other factors can affect the washback
direction. Those factors, as indicated in washback variability, specifically refer to the
knowledge or understanding of test demands, and the resources as well as acceptance
of test demands by participants. In addition to washback direction, washback intensity

also plays a role in the complexity of washback. Similar to the “Washback Hypothesis”
by Alderson and Wall (1993) who considered different test consequences and
Alderson and Hamp-Lyons (1996) who added the factor of different stakeholders in
the washback intensity, Green (2007a) views it as mainly influenced by participants’
perceptions of test importance and test difficulty. Further, the perceived test
importance and test difficulty by participants result in the variation of washback
intensity. Test importance, as understood by Green (2007a), incorporates test stakes
and plays a deterministic role in washback intensity. In other words, if more
significance is attached to the test by stakeholders, there will be a greater possibility
of intense washback (Madaus, 1988; Popham, 1987). However, intense washback will
not occur if the test is too easy or too difficult.
Figure 3.3. Model of washback, incorporating intensity and direction (Green, 2007a, p. 24)
As a result, in order to achieve the most intense washback, the following

conditions should be met by test stakeholders (Green, 2007a, p. 25).
• value success on the test above developing skills for the target language use
domain;

• consider success on the tests to be challenging (but both attainable and
amenable to preparation);
• work in a context where these perceptions are shared (or dictated) by other
participants.
Taking RQ 1 of the present study (What is the washback of the Grammar and
Vocabulary Test in the SHSEET (the GVT)?) as an example, the ‘overlap’ between
test design characteristics (the GVT) and the focal construct (the development of
communicative language use) in the current model can guide the exploration of
positive, negative, intended, and unintended washback. The focal construct, as
explained by Green (2007a), also refers to the characteristics of the TLU domain (i.e.,
how the target language is used outside of the test itself). To decide the direction or
value of the test washback, the ‘overlap’ can first be analysed through the examination
of content of the tasks (i.e., comparing the test tasks with the TLU domain in actual
teaching). However, the ‘overlap’ should be considered from participant
characteristics and values; that is, how participants interpret the test demands, how
they understand the test content and the relationship between test tasks and the TLU
domain. It thus proposes the necessity of gathering information from participants to
understand their opinions about the ‘focal construct’. It is also specified in the model
that the washback variability of participant characteristics and values play a major role
in the realisation of washback. In this way, differing from previous studies which
mainly collected data from test designers (see, for example, Qi, 2004a, 2004b),
teachers and students should be consulted to enable the researcher to gather
information about the positive and negative as well as intended and unintended
washback. Therefore, in this study, their perceptions were elicited through interviews
with teachers and interviews as well as surveys with students to find out the washback
value of the GVT.
In summary, of the proposed washback models presented in this chapter, Green

(2007a) depicts the most intricate relationship between test stakes and test washback,
and in order to move towards a positive washback direction, test construct, test design,
and stakeholder characteristics are considered. However, comprehensive and
penetrating as his model is, it only touches upon the macro context of participants’
perceptions and test characteristics. The ways in which actual teaching and learning
activities lead stakeholders to promote positive washback still remains a question to

be answered. In addition, test design itself cannot decide the value or direction of
washback. Therefore, although Green (2007a) indicates in the model that washback
direction is also subject to participant characteristics and values, more explicit factors
such as the teaching and learning practices inside classrooms should also be
considered. Most significantly, the time factor is not considered in this washback
model, but previous studies reported that washback varies not only among different
stakeholders and tests, but also over time (Alderson & Hamp-Lyons, 1996; Andrews
et al., 2002; Shohamy et al., 1996; Shohamy et al., 1986).While acknowledging these
drawbacks, Green’s (2007) model is suitable to guide the current SHSEET washback
study theoretically since both the direction and intensity of washback are considered.
The alignment of this model with the current study is then depicted in Figure 3.4.
Figure 3.4. The washback model of the GVT (adapted from Green (2007a))
As shown in Figure 3.4, the focal construct of the SHSEET is the development
of communicative language use as the ECSCE has indicated (Ministry of Education,
2011). However, since the study object is the GVT, the test features (use of MCQs,
decontextualised testing of grammar and vocabulary knowledge, etc.) have to be taken
into consideration, so a further content analysis of the test tasks was conducted before
entering the field (i.e., conducting classroom observation and interviews). Moreover,
to encompass a comprehensive view of washback direction, participant characteristics

and values (i.e., teachers and students in this study) were explored. In the present study,
teachers’ and students’ knowledge and beliefs about the test requirement (mainly about
the ECSCE), and the ways in which they prepare for the test were all considered. The
latter, which is not indicated in the model (i.e., the lack of explicit micro level of
classroom practices), has been proven to be important in the literature (see Figure 2.2).
This information is related to the focal construct of communicative language
development. In addition, the washback intensity of the GVT was also reflected in
participants’ perceptions of the test. In sum, this model guided the research design in
exploring washback.
As high-stakes standardised English tests, the current SHSEET included, are

generally used as agents for educational reform and with an intention to bring about
positive washback, this study should include a learning-oriented perspective.
However, most traditional washback theories and models originate from a general
washback perspective and thus no washback value is indicated (Alderson & Wall,
1993; Burrows, 2004; Shih, 2007, 2009). Further, considering the perceived
drawbacks of Green’s (2007a) washback model (i.e., the lack of explicit micro level
of classroom practices), it is unclear how to direct the test washback in a positive way.
Thus, the inclusion of an LOA framework, which aims to bring about positive
washback, is vital for this study.
Against this background and the limitation of Green’s model, the relationships
among different washback components are explored before moving on to LOA
theories. Therefore, the washback mechanism is reviewed, and a key summary is
provided as follows.
3.3 WASHBACK MECHANISM
As summarised in section 2.3.5, the various influential factors for washback

value and intensity included test factors and participant factors, which can then exert
an influence on teaching and learning factors in test preparation. Based on this
literature summary and drawing implications from Green’s (2007a) washback model,
this study not only presents all washback evidence of the GVT, but also tries to depict
the relationship between those factors. To this end, washback mechanism within GVT
preparation is of concern.

Washback mechanism has been generally discussed by scholars. To start with,
Hughes’ (1993) “participants-process-product” trichotomy model illustrates that the
nature of a test influences participants’ perceptions and attitudes which, in turn, affect
their test preparation practices (i.e., process), and finally influences their learning
outcomes which are perceived as product. Following suit, Bailey’s (1996) washback
model contains washback to learners and washback to the programme. It further
develops the trichotomy washback model (Hughes, 1993) and combines this with the
Washback Hypothesis (Wall & Alderson, 1993). Moreover, Green (2007a) expands
on these two washback models by outlining the relationships between test design
considerations and their mediators of participant characteristics as well as perceptions
of test importance and difficulty (Green, 2013). Further development of the
conceptualisation of the washback mechanism have been proposed, such as Xie and
Andrew’s (2013) washback mechanism with expectancy-value theory.
Empirically, different models attempt to explain and explore the relationship

between test perceptions and test preparation practices or test perceptions and learning
outcomes. For example, participants’ test perceptions or attitudes can affect their test-
related behaviours (Chapman & Snyder, 2000; Dong, 2020; Mizutani, 2009; Xie,
2015a; Xie & Andrews, 2013) and their test preparation behaviours further influence
their products or learning outcomes (Dong, 2020; Xie, 2013). However, washback
mechanism has not yet been fully gauged since few systematic studies have explored
the relationship between perceptions, test preparation practices, and learning outcomes
in a single model (Dong, 2020).
Based on insights from both Green’s model (2007a) and the literature summary
(see section 2.3.5), the researcher posits that test perceptions eventually influence
students’ learning outcomes through the indication of both participants’ affective
factors such as motivation and test preparation practices. This assumption further takes
reference from theoretical conceptualisations from Hughes (1993), Green (2007a), Xie
and Andrew (2013), and Wolf and Smith (1995). According to Hughes (1993),
participants’ test perceptions influence learning outcomes through test preparation
processes. Green (2007a) contends that participants’ “understanding of test demand”
influences their learning outcomes more than learning content. Furthermore, Xie and
Andrew (2013) suggest that test-takers’ perceptions of test use and test design
positively correlate and they influence test preparation through participants’

expectations as well as their evaluations of test importance. Further, Wolf and Smith
(1995) explored the relationships among test consequence, motivation and anxiety,
and test performance; likewise, Jin and Cheng’s (2013) study found that test
importance of CET-4 had a significant influence on test-takers’ anxiety and
motivation. Therefore, from these two studies, it was assumed that participants’
affective factors of motivation and test anxiety could be possible factors to indicate the
relationship between test perceptions and test preparation practices.
Drawing implications from all aforementioned theories and models, this study
conceptualises the washback mechanism of the GVT. In detail, test perceptions include
participants’ perceptions of test design characteristics and test importance, and
participants’ affective factors contain motivation and test anxiety. Further, test
preparation practices mainly refer to participants’ learning strategy use and test
preparation effort. Finally, test performance (i.e., students’ learning outcomes) is
represented by final SHSEET test scores reported by students. To conclude, test
perceptions are assumed to influence participants’ characteristics, which in turn affect
their test preparation practices. As a result, perceptions regarding the test may
influence students’ final learning outcomes. The detailed components of this washback
model will be explored in following chapters and the conceptualisation of washback
mechanism will then be tested through statistical modelling (see section 4.5.3.6).
3.4 KEY LEARNING ORIENTED ASSESSMENT FRAMEWORKS
The inclusion of an LOA model in the present washback study is both possible
and timely on theoretical and empirical bases (see section 2.3.3). As Chapter Two and
previous sections have indicated, the value or direction of washback is widely
discussed in the field of language testing and assessment. However, how to bring about
positive washback to encourage students’ learning has, until recently, not been the
focus of research. Theoretically, the LOA frameworks, reflecting the centrality of
students and learning, and aiming to promote learning, align with the assessment
guidelines and SHSEET design principles (see section 1.2.2) according to the ECSCE
(Ministry of Education, 2011). Empirically, frameworks of LOA are ideal since they
offer practical guidance for the preparation activities leading up to the test. To this end,
this section mainly focuses on key LOA frameworks, which can help answer RQ 2
(What are the opportunities for and challenges of the incorporation of Learning
Oriented Assessment (LOA) principles in GVT preparation?).

Aiming to promote positive washback, LOA places learning tasks at the centre
both inside and outside the classroom (Hamp-Lyons & Green, 2014; Jones & Saville,
2016). The development of LOA has undergone several stages. It originated from the
terms Assessment for Learning (AfL) or formative assessment used by the
Assessment Reform Group (ARG, from 1989 to 2010) in England, and is closely
related to Black and Wiliam’s (2009) concept of formative assessment and Shepard’s
(2000) classroom-based assessment. Further, it was influenced by the criticism
relating to psychometric testing (e.g., the predominant use of MCQs) in the U.S., and
the development and implementation of teacher-based assessment (i.e., teacher-
centred, teachers assess learning objectives in a non-exam situation) in Australia, New
Zealand, Hong Kong, and Scotland (Jones & Saville, 2016). However, in order to
accentuate the traits of learning in assessment and further develop them, Carless
(2007) brought the term of LOA into a wider usage. Progressively, building on the
work of Purpura (2004), Turner and Purpura (2016) have established their own
working framework for LOA which centres on agents in the assessment context and
lists interrelated dimensions in LOA. More recently, LOA has been adopted by
Cambridge English Language Assessment to enhance the synergy between formal
assessment and classroom-based assessment, in order to achieve the dual goals of
assessment to promote learning and to measure as well as interpret learning outcomes
(Jones & Saville, 2016). In sum, four main LOA theories and frameworks (see section
2.3.3) are proposed in the field of language testing and assessment (Carless, 2007;
Hamp-Lyons & Green, 2014; Jones & Saville, 2016; Turner & Purpura, 2016). To
augment Green’s (2007a) washback model at the micro level, two LOA frameworks
of Carless (2007) and Jones and Saville (2016) are elaborated and the alignment with
the present study is also discussed.
3.4.1 The LOA framework of Carless

The LOA framework put forward by Carless in 2007 is the first of its kind which
focuses on the potential of the learning aspects of assessment (Carless, 2007; Carless
et al., 2006). According to Carless (2007), the first and crucial component of LOA
framework is learning, and the starting point for LOA is the assessment processes in
which the significance of the learning exceeds that of the assessment function. In the
learning-oriented assessment project, the framework, adapted from Joughin (2005), is
promoted and summarised by three principles and intends to clarify the current

thinking about assessment and learning in a productive way. The framework is
depicted in the following figure.
Figure 3.5. The LOA framework proposed by Carless (2007)
From Figure 3.5, the link between LOA and positive washback is evident. In
Green’s (2007a) washback model which incorporates intensity and direction (Figure
3.3), the overlap between the focal construct and test characteristics can lead teaching
and learning in the direction of positive washback. For Carless (2007), LOA occurs
when the certification purpose and the learning purpose of assessment overlap with
each other. As for the SHSEET, guided by the ECSCE’s key tenet of learner-
centredness, once the teaching and learning purpose overlaps with the test objective,
LOA opportunities will be achieved. In this sense, LOA coincides with the positive
washback concept in that the relationship between the test objective and teaching and
learning purposes is put in the same position as the ‘overlap’ between the focal
construct and test characteristics (Green, 2007a) to encourage students’ learning
practices.
Further, LOA aims to strengthen learning processes and can be achieved through
both formative and summative assessments (Carless, 2007). The realisation of LOA
can be achieved by following three principles (Carless, 2007, pp. 59-60). First, tasks
for assessment should promote students’ learning in a productive and comprehensive
way. Second, students should be involved in the assessment system through different
ways such as participating in the development of the assessment criteria and quality
and taking part in self- or peer-assessment. Third, teachers should provide timely and
forward-looking feedback to help students’ current as well as future learning.
Although it clarifies the possibility that LOA can promote positive washback,
the LOA framework proposed by Carless (2007) is set at the tertiary institutional level

and from a higher education perspective. Insightful as it is, whether this framework is
applicable to other contexts such as teenage learners in junior high schools is unknown.
3.4.2 The LOA cycle developed by Cambridge English Language Assessment

Conceptually, the LOA cycle is introduced by Jones and Saville (2016) who
claim LOA to be an action theory aiming at bringing about positive washback by
design. In effect, this cycle depicts the detailed process of LOA implementation and
operationalisation. According to Jones and Saville (2016), LOA sits in a much wider
theory of social constructivism and views both effective learning and cognitive
development in a social context. Social constructivism, differing from cognitive
constructivism which focuses on the individual cognition, puts more emphases on the
effective learning in a social and cultural context (Jones & Saville, 2016). In this sense,
the “zone of proximal development” (ZDP) proposed by Vygotsky comes to the front,
and interaction becomes learning rather than the context for learning (Vygotsky,
1986). Moreover, ZPD places effective learning in a social and cultural context rather
than positioning it as the simple development of knowledge (Jones & Saville, 2016;
Sjøberg, 2007). Based on social constructivism and endeavouring to align large-scale
tests (the overall assessment system and the education system of curriculum and
content) with classroom assessment, the following LOA cycle (see Figure 3.6) has
been proposed with the aim to achieve positive washback (Jones & Saville, 2016, p.
13).
Figure 3.6. Evidence for learning – the LOA cycle (Jones & Saville, 2016)

As shown in Figure 3.6, the LOA cycle is reflected in both micro and macro
contexts. At the macro educational system level, all Cambridge English tests are
aligned with the Common European Framework for Languages (CEFR) which defines
macro-level goals and fosters micro-level content which, in turn, leads to an LOA
syllabus. Further, through summative monitoring, the LOA syllabus interacts with the
achievement records which are produced by an external exam. Finally, the
achievement record from the external exam will be interpreted according to the CEFR.
In this sense, it roughly aligns with the washback value dimension (Green, 2007a) of
which the test design characteristics and focal construct from curriculum requirements
are presented in the forms of the external exam (Cambridge exams) and a frame of
reference (CEFR). Most interestingly, the ‘overlap’ between the focal construct and
test characteristics by Green (2007a) is represented by the LOA syllabus in this cycle,
which is produced by the overlap between macro-level objectives and micro-level
content.
At the micro level, learning practices take place in the classroom context.
Generally, tasks can be learning-oriented once the LOA cycle is implemented in a
specific way of where teachers and students are all fully involved and play their roles.
During the LOA cycle inside the classroom, the records generated from teachers’
observation of students’ language activities, both learner-centred and content-centred
activities, are integrated into an informal record which will be jointly interpreted with
the achievement record from the external exam. More importantly, although the LOA
cycle clearly positions learning and learners at the centre, the teaching and teachers’
roles are not overlooked. The teacher’s role can be best depicted by the classroom
LOA circle since they design tasks to inspire and encourage language activities for
students, which are again recorded informally after observation. Further, the records
are used by teachers to make decisions and provide feedback to revise learning
objectives as well as examine prior knowledge. To this end, similar to Carless (2007),
the LOA cycle values the three major principles of learning tasks, classroom
interactions, and feedback at the centre of the micro classroom context. Moreover, the
micro level can also align with and compensate for the washback variability of
participant characteristics and values in the washback model of Green (2007a).
Therefore, participants’ knowledge or understandings of, resources to meet, and
acceptance of test demands are affected by the major principles of learning tasks,

classroom interactions, and feedback in a classroom context. In this way, the washback
variability, which is implicit in Green’s (2007) model, becomes explicit in the LOA
cycle. Before moving on to the strengths of using the LOA cycle in this study, it is
necessary to unpack the relationships between key LOA principles or practices, which
will be investigated in the current study.
The LOA cycle is an ecological and systematic model for AfL (Jones & Saville,
2016). However, in addition to the aforementioned key principles in LOA
(involvement in assessment, classroom interaction, and feedback), learner autonomy
also needs to be considered. In particular, although the importance of learner autonomy
has been recognised by researchers (Salamoura & Unsworth, 2015), it has not been
foregrounded in LOA theoretical frameworks (Carless, 2007; Jones & Saville, 2016).
Nonetheless, it has been claimed that autonomous behaviours can promote assessment
for learning (Lamb, 2010) and are closely related to self-assessment and peer-
assessment (Dam, 1995; Tassinari, 2012). Most importantly, the deeper the
involvement in (self-) assessment, the greater extent of autonomous practices (Bell &
Harris, 2013; Everhard, 2015). Indeed, self-assessment and peer-assessment are
important to cultivating learner autonomy (Little, 1996). These claims provide
theoretical and empirical evidence for including learner autonomy in LOA frameworks
and support its relationship with involvement in assessment and feedback.
To clarify, learner autonomy refers to learners’ “ability to manage one’s own

learning” (Holec, 1981, p. 7). It originates from adult education and self-access
learning systems (Little, 2007) and studies point out that learner autonomy should be
enhanced in children and teenage students to improve AfL practices (Lamb, 2010).
Further, language proficiency has been found to correlate with learner autonomy
(Little, 2007), which can be realised through learners’ involvement in assessment
practices such as self-assessment and peer-assessment with the latter contributing more
than the former to learners’ test performance (Birjandi & Siyyari, 2010).
The macro LOA cycle is closely linked with the reference for interpreting
achievement records or learning outcomes from both the summative external exam and
the formative structured records by teachers. The micro LOA cycle is fulfilled by
teachers and students who go through language activities and provide informal records
for further interpretation. In addition, this LOA framework can be applied to different
CEFR levels of English proficiency, from basic users at the Breakthrough or Beginner

stage (A1) to proficient users at the Mastery or Proficiency level (C2). To this end, it
is also applicable to the junior high school EFL teaching and learning context.
Furthermore, similar to the SHSEET, the Cambridge exams are also large-scale
assessments (Jones & Saville, 2016), which theoretically support the application of the
LOA cycle in the current washback study.
The strengths of this LOA cycle are two-fold. On the one hand, it seeks the
potential of aligning the large-scale summative assessment with a classroom-based
formative assessment by exploring the common ground between them. The external
exam of the SHSEET and internal assessment of classroom activities jointly provide
records of both formal and informal assessment-related activities. On the other hand,
the core and major stakeholder groups are considered at both macro and micro levels.
At the macro level, the higher-level objectives and lower-level assessment content are
decided by curriculum developers and test constructors. At the micro level, LOA
activities are carried out by key stakeholder groups of teachers and students.
Although numerous strengths and possible alignment with the washback model
are clear, the LOA cycle is not without flaws and researchers should consider the
following aspects. First, the reference frame of CEFR for learning outcome
interpretation and the LOA cycle are envisaged in Cambridge English exams in the
European context, while the SHSEET as a yardstick for junior high school graduates
is in a Chinese high-stakes and large-scale standardised testing context. Second, as
Jones and Saville (2016) elaborated, the implementation of the LOA system entails
considerable effort. For example, the development of curriculum and materials at the
macro level, teacher training at the micro level to enhance their ability to understand
and identify students’ strengths and weaknesses, and also the approach which links
classroom assessment and external assessment should all be considered. Despite the
perceived shortcomings, as a positive learning promotion system, the LOA cycle can
be applied in the current SHSEET study.
3.4.3 The LOA cycle in the SHSEET context

The following figure depicts how to align the LOA cycle with the SHSEET (with
the GVT as the focus).

Note. a According to the ECSCE, can-do statements of language knowledge (phonetics, grammar,
vocabulary, functions, and topics) have two levels (Level 2 for Grade 6 and Level 5 for Grade 9).
Figure 3.7. The LOA cycle in the SHSEET context (adapted from Jones and Saville (2016))
In Figure 3.7, at the macro level, the frame of reference for checking students’
learning outcomes is the ECSCE, which sets the proficiency level of the SHSEET at
Level 5. The macro-level learning objectives from the ECSCE (the development of
communicative language use) and the micro-level content from the test (test
characteristics) are reflected in the SHSEET Test Specifications. In other words, the
SHSEET Test Specifications resembles and seeks reference from the ECSCE but
reflects the actual test content. In the current study, the GVT acts as the external exam
of which the learning outcomes will be interpreted in accordance with Level 5 (see
Appendix A) set in the ECSCE in the macro context. As for the classroom learning,
SHSEET-related activities are assumed to be learning-oriented as the ECSCE
prescribes (see section 1.2.2). Therefore, the whole SHSEET LOA cycle is envisaged
to be moving towards a positive direction, which can be explored through classroom
observations.
The SHSEET, which has a major selection function, first needs to bring about
educational changes to meet the requirement of the “quality education” policy
proposed by the Chinese government from the 1990s onwards (Dello-Iacovo, 2009).
Through “quality education”, in contrast to the “exam-oriented education”, student

learning is put at the centre. Second, the SHSEET is further informed by CLT, as
documented in the ECSCE. The LOA cycle can be used to explore the actual test
effects of the SHSEET on EFL teaching and learning both from the macro curriculum
level and from the micro classroom context. Therefore, the current research design
makes particular use of the LOA cycle to explore the classroom practices of the
SHSEET test preparation. The SHSEET LOA cycle is thus useful in discovering the
value of washback and the existence of LOA in the qualitative designs of classroom
observation and interview. Further, the LOA cycle was considered in the quantitative
instrument design relevant to the RQ 2 (What are the opportunities for and challenges
of the incorporation of LOA principles in GVT preparation?) as well. To this end, a
new washback model which links the LOA cycle (Jones & Saville, 2016) with the
washback model (Green, 2007a) needs to be clearly depicted to reflect the research
design.
3.5 A NEW WASHBACK MODEL INCORPORATING LOA
The potential of using the LOA cycle (Jones & Saville, 2016) in the chosen
washback model (Green, 2007a) has been explained in the previous sections.
Moreover, the key concepts in Carless (2007) are also considered in this new washback
model, which are indicated in LOA practices. Figure 3.8 presents the relationship
between and the alignment of the two models in one diagram.
Figure 3.8. A new washback model incorporating LOA (Green, 2007a; Carless, 2007; Jones &
Saville, 2016)
As denoted in Figure 3.8, the macro level and micro level of the LOA cycle are
embedded into the washback dimension of value (or direction). The key elements of
the macro value comprise both the curriculum reference of the ECSCE (focal

construct) and the external exam (GVT design characteristics). As for the micro value,
the major factors include both the participant characteristics and the teaching and
learning activities (including teaching methods/learning strategies and LOA practices
such as classroom interaction, involvement in assessment, feedback, and learner
autonomy). In addition, the participant perceptions of test importance and difficulty
remain unchanged from the original washback model. Therefore, this new washback
model incorporating LOA (Figure 3.8) guided the entire research design and thesis
writing processes. Nevertheless, it is necessary to point out that the detailed factors at
both the macro level (e.g., the GVT design characteristics) and the micro level (e.g.,
perceptions of test characteristics, affective factors, teaching methods/learning
strategies) are not listed at this stage, since the literature findings (see Figure 2.2) and
the analysis of the qualitative data were used to help the researcher to make the final
decision on their potential factors.
The advantage of this proposed model is two-fold. On the one hand, the model
is comprehensive in that it provides all possible factors for the GVT washback on
teaching and learning. On the other hand, the model provides relevant LOA practices
to guide the data collection and analysis. In this way, it enables the researcher to
explore the LOA dynamic in the actual summative test preparation stages. However,
one limitation exists regarding the proposed model, that is, the inter-relationships
among different factors are not depicted. For example, the relationships among
different washback variability and washback intensity factors. Nonetheless, the
proposed model contributes to the extensive knowledge of washback in the GVT
context and thus provides theoretical as well as empirical directions for the proposed
study. To overcome the limitation, the researcher has made an attempt to investigate
the inter-relationships between those factors by exploring their structural relationships.
Thus, the washback mechanism of the GVT will be presented in Chapter Six. Most
importantly, a revised washback model incorporating LOA will be finally depicted
after the data analysis.
3.6 CHAPTER SUMMARY
To sum up, based on the empirical findings from Chapter Two and the theoretical
discussion in this chapter, the present washback study of the GVT used the washback
model with two dimensions of the value and intensity (Green, 2007a) and the LOA
cycle with both micro and macro levels (Jones & Saville, 2016). As has been

discussed, the synergy and differences between Green’s (2007a) washback model and
the general washback theories and models (Alderson & Wall, 1993; Burrows, 2004;
Shih, 2007, 2009) make the former distinctive in that it better depicts the complexity
of the phenomenon of washback. Moreover, how to operationalise teaching and
learning to achieve a positive result is further suggested by the LOA cycle. Jointly, the
combined model helped to answer the study’s research questions and guided the
exploratory sequential mixed methods research (MMR) design which is detailed in the
next chapter.

Chapter 4: Methodology
The review of literature in Chapter Two and the theoretical framework in

Chapter Three established the empirical and conceptual basis of the current study. This
chapter provides the rationale for the chosen methodology and articulates the research
process. In order to explore the washback of the Grammar and Vocabulary Test in the
Senior High School Entrance English Test (the GVT) from both teachers’ and
students’ perspectives, two main research questions and their attendant sub-research
questions are addressed in the present study:
RQ 1: What is the washback of the GVT?

Learning Oriented Assessment (LOA) principles in GVT preparation?
In order to answer the above research questions, a mixed methods research

(MMR) design was adopted. This chapter is arranged in the following sequence.
Section 4.1 discusses the methodological foundation of washback studies and section
4.2 explains MMR design. Section 4.3 presents the exploratory sequential MMR
design adopted in the current study. Section 4.4 and section 4.5 explain the initial
qualitative phase and the following quantitative phase respectively to answer the
research questions. Section 4.6 addresses ethical considerations and section 4.7
concludes the chapter.
4.1 THE METHODOLOGICAL REVIEW OF WASHBACK STUDIES
Chapter Two and Chapter Three have demonstrated the abundance of washback
studies and theories. However, it is important to point out that as a well-established
concept, washback is not only theoretically rich but also methodologically fruitful. An
overview of the literature establishes that a wide range of methods has been employed
in empirical washback studies, among which interviews, questionnaires, test
administration, and classroom observations are most common. Structured
questionnaire is a widely used quantitative method (Hawkey, 2006; Xie & Andrews,
2013). Furthermore, questionnaires are also used in combination with qualitative
Chapter 4: Methodology 75
methods such as interviews (Gu & Saville, 2012; Qi, 2007) and classroom observations
(Hawkey, 2006; Saif, 2006) to employ MMR approaches in washback studies
(Burrows, 2004; Green, 2007b). Other methods such as textbook or material analysis
(Hawkey, 2006; Tsagari, 2009), administration of tests (Andrews et al., 2002; Green,
2007b), and student learning diaries (Gu et al., 2014; Zhan & Andrews, 2014) are also
used.
Quantitative research methods, questionnaire in particular, are commonly used

in empirical washback studies. For example, in the context of the College English Test
Band 4 (CET-4), Xie and Andrews (2013) examined the relationship between test
perceptions and test preparation through structural equation modelling (SEM).
Importantly, the test perception questionnaire and test preparation questionnaire were
distributed to 870 sophomores ten weeks and two weeks before the test administration
respectively. Such periods are claimed by many other studies (Cheng, 2005; Cheng et
al., 2011; Watanabe, 1996a) to be a crucial stage when washback at its most intense
can be observed. The study of Xie and Andrews (2013) influenced the current research
design in regard to questionnaire distribution timing.
In addition to quantitative investigations, empirical qualitative research has been

abundant in washback studies over time (see, for example, Alderson & Hamp-Lyons,
1996; Green, 2006b; Pan & Roever, 2016; Tsagari, 2011; Wall & Alderson, 1993;
Watanabe, 2004; Zhan & Andrews, 2014). For example, Pan and Roever (2016)
conducted a social impact study of English certification exit requirements in Taiwan.
Through interviewing 19 potential employers, they found that the certification was
viewed by employers as proof of students’ future professional capabilities rather than
a representation of their English language skills. Therefore, unintended test
consequences were verified.
Considering the complexity of washback phenomena, it is beneficial to use both

quantitative and qualitative research methods to reveal the washback in different
contexts. For example, to investigate the washback of the National Matriculation
English Test (NMET) writing task, Qi (2007) employed teacher and student
questionnaires, interview, and classroom observation to triangulate findings on the
complexity of washback. Other washback researchers have also combined quantitative
data collected through questionnaires and/or test administrations with qualitative data
from interviews and/or classroom observations (see, for example, Andrews et al.,
76 Chapter 4: Methodology
2002; Erfani, 2012; Fan & Ji, 2014; Özmen, 2011; Qi, 2005, 2007; Yang et al., 2013).
For instance, an exploratory sequential MMR design was adopted by Qi (2005) to
explore what factors influenced the intended washback of the NMET. The first phase
of the study collected qualitative data from interviews with test inspectors, test
constructors, teachers, and students. Qualitative data were then analysed to inform the
design of teachers’ and students’ questionnaires. Quantitative data, in turn, were used
to generalise findings from qualitative data.
In summary, considering the complexity of washback, methodological

triangulation can compensate for the shortcomings of the employment of only
quantitative or qualitative methods. More importantly, the results from the initial phase
can inform the subsequent phase, thus constituting an integrated MMR design for
understanding complex research phenomena. As such, the current study adopts an
exploratory sequential MMR design to investigate the test washback from both
students’ and teachers’ perceptions, and the opportunities for and challenges of
implementing LOA principles in GVT preparation.
4.2 MIXED METHODS RESEARCH
In MMR, worldviews and research questions are perceived from multiple

perspectives and positions (Johnson et al., 2007). Taking advantage of plural
worldviews, MMR enables a synergy of the strength of both quantitative and
qualitative methods (Creswell, 2011; Nastasi et al., 2010; Ziegler & Kang, 2016). To
achieve a synergic effect (Ziegler & Kang, 2016), MMR adopts abductive reasoning
which moves in between deduction (testing theory and assumption) and induction
(discovering pattern/construct) (Teddlie & Tashakkori, 2012). In this vein, MMR
“combines elements of qualitative and quantitative research approaches (e.g., the use
of qualitative and quantitative viewpoints, data collections, analysis, inference
techniques) for the broad purposes of breadth and depth of understanding and
corroboration” (Johnson et al., 2007, p. 123). The focus of MMR has thus shifted to
an inclusive research process of data collection, findings integration, and implications
drawing from both quantitative and qualitative methods in a single study (Tashakkori
& Creswell, 2007). As MMR may provide researchers with a deeper understanding of
the research phenomena than the single use of either method, it is becoming
increasingly powerful in social science research in general (Creswell & Plano Clark,
2011; Moeller et al., 2016) and particularly in language testing and assessment studies
(Moeller et al., 2016).
Taking the present study as an example, the MMR design considered

triangulation, complementarity, and development (Greene et al., 1989). First,
triangulation of both qualitative and quantitative data helped to verify washback
practices, participants’ perceptions, and LOA opportunities. Second, qualitative data
(from classroom observations and interviews) collected from a small-sized sample
were complemented by quantitative data (survey). Third, development of a
quantitative instrument (survey in this study) was based on the classroom observation
and interview results, as previous literature provided insufficient information about
LOA practices and the actual practices inside the classroom. This design helped the
researcher to interpret the research findings, validate the constructs, and counteract the
biases or shortcomings of the single use of the quantitative or qualitative method.
Of all the different models of MMR, the present study adopted an exploratory
sequential MMR design to first collect qualitative data to explore factors for washback
values, washback intensity, and LOA opportunities and challenges. Findings from
qualitative data analysis informed the quantitative phase. The quantitative data
collection and analysis in turn provided further insight into the qualitative findings. As
a result, the research problem and questions were more comprehensively understood,
and both quantitative and qualitative methods were considered indispensable in the
present study. Table 4.1 depicts the current MMR design in detail.
Table 4.1
Qualitative and quantitative methods in the present study
Research methods Aims to investigate RQs to answer

Classroom • Teaching and learning practices
observation • LOA practices
Qualitative • Perceptions of the GVT washback
Interview • Perceptions of LOA RQ1
• Reasons/factors for perceptions +
RQ2
• LOA practices
Quantitative Student Survey • Perceptions of the GVT washback
• Test preparation practices
To summarise, the MMR design of this study addressed different aims in

different stages, but the results were intended to answer the research questions by
combining both the qualitative and quantitative findings. The classroom observations
recorded and discovered the teaching and learning practices regarding the GVT
washback and LOA possibilities. The interviews further asked for student and teacher
perceptions about the GVT washback and LOA possibilities, especially exploring the
factors that influence their perceptions. The student survey was designed to address
the same questions regarding learners’ perceptions of the GVT washback and LOA
practices. In addition, the student survey sought to explore the qualitative findings in
a larger sample. The exploratory sequential MMR design and procedure are detailed
in the next section.
4.3 AN EXPLORATORY SEQUENTIAL MIXED METHODS RESEARCH

DESIGN
This exploratory sequential MMR study unfolded in four stages (see Figure 4.1).
In brief, Stage 1 informs Stage 2, Stage 2 leads to Stage 3, and Stages 1, 2, and 3
necessitate Stage 4 by integrating the final interpretation to answer the research
questions. Therefore, the four stages were accorded with equal importance throughout
the research.
STAGE 2 STAGE 4
• qualitative data • quantitative data
collection • instrument collection • interpretation
• qualitative data design • quantitative data
analysis analysis
STAGE 1 STAGE 3
Figure 4.1. An exploratory sequential design in the present study
In Stage 1, through classroom observations and interviews, actual classroom

practices were observed, and participants’ perceptions were consulted from a small
participant group (see details in section 4.4). The qualitative data analysis provided
preliminary answers to research questions, which in turn paved the way for instrument
design in Stage 2. In Stage 2, three sources informed the quantitative instrument
design: selected theoretical frameworks of washback and LOA (see Figure 3.8), extant
knowledge discussed in the literature review (see Figure 2.2), and preliminary results
of the qualitative phase (Stage 1). In Stage 3, quantitative data were collected and
analysed to examine, generalise, or problematise findings from the initial qualitative
phase (Stage 1). In Stage 4, both the qualitative dataset and the quantitative dataset
were integrated to answer the research questions. Details of Stages 2, 3, and 4 are
provided in Section 4.5.
The entire research procedure is depicted in Figure 4.2 (see next page). The
credibility and accuracy of the findings were maximised through data triangulation
throughout the whole research process. The research procedure is detailed in following
sections.
4.4 QUALITATIVE PHASE
Qualitative approaches, interpretative in nature, are used to develop an in-depth

understanding of a phenomenon and treat the meaning of the occurring phenomenon
as the focal point (Hatch, 2002). In this study, the qualitative findings revealed the
general practices and factors which influence the GVT washback, and which the latter
quantitative phase helped to clarify and generalise. Moreover, these findings were used
to construct the quantitative instrument.
Of all the major types of qualitative data collection, observations and interviews
are recommended for exploratory ends (Johnson & Christensen, 2012; Merriam,
2016). In this study, the qualitative phase involved the collection of data from three
sources: classroom observations of three Grade 9 English teachers’ classes (15
sessions in total, consisting of five 40 minute lessons with each class), semi-structured
individual interviews with the three class teachers, and three focus groups with 18
students (six students in each group). Qualitative data sources are documented in Table
4.2.
Table 4.2
Summary of participant involvement in qualitative phase
Classroom Semi-structured
Participants Focus group interviews
observations interviews
Teachers √ √
Students √ √
4.4.1 Site selection

As elaborated in Chapter One (see Table 1.1), the major research site was in
Chongqing where four types of academic junior high schools (regular junior high
schools, 9-Year schools, 12-Year schools, and combined high schools) are situated.
Three junior high schools in Chongqing were chosen, two were combined high schools
identified as School A and School B, and the other was a regular junior high school,
identified as School C. Relevant school information is listed in Table 4.3.
Table 4.3
Information on the participating schools
School A * School B School C

Combined high Regular junior high
Combined high school
School type school (Grade 7 to school (Grade 7 to
(Grade 7 to 12)
Grade 12)/key school Grade 9)/non-key school
About 600 (55
No. of Grade 9 About 930 (47 students on 103 (35 students on
students on average
students average in each class) average in each class)
in each class)
No. of Grade 9
18 6 2
English teachers
No. of grade 9
20 11 3
classes
Number of 9 classes per week (40
7 classes per week (40 6 classes per week (40
English class per minutes each, 2
minutes each) minutes each)
week classes on Saturday)
Textbooks introduced from Textbooks published
overseas (before Grade 9); by PEP; Ba Shu
Textbooks published by
Teaching Textbooks published by Talents SHSEET
PEP; New Pivot (新支
materials People’s Education Press Final Review (巴蜀
点)
(PEP) in Grade 9; New 英才中考总复习方
Direction (新方向) 案)
*
As suggested by the Teaching and Research Officer (TRO), both School A and School B are key
schools, and School C is a non-key school.
The researcher was able to implement classroom observations and interviews in

these three schools through the introduction of a Teaching and Research Officer
(hereafter TRO) in Chongqing, who facilitated the initial contact with and access to
the participating schools. The introduction from the TRO helped the researcher to
obtain consent from the Head of English Curriculum as well as teachers and students
involved in classroom observations and interviews. Therefore, before entering the
field, the researcher had already established some connection with the schools.
Although these three schools differed in aspects like facilities, funding, and teacher
resources, they had similarities in terms of classroom intake, curriculum, Test
Specifications, and teaching materials. Moreover, all students took the same SHSEE
test battery. School A is a more prestigious junior high school, School B is an average
performing junior high school, while School C is a low-performing junior high school.
This allowed data variety and a comprehensive understanding of the washback
phenomenon.
4.4.2 Participants
For classroom observations, one Grade 9 class from School A (48 students), one
Grade 9 class from School B (52 students), and one Grade 9 class from School C (36
students) were chosen. The three English teachers of these classes were then
interviewed individually after classroom observations, and 18 students (six from each
class) were interviewed in three focus groups. Grade 9 was chosen because the students
in this grade were about to sit for the SHSEET. Specifically, the second semester of
Grade 9 was chosen because it was close to the examination date (each year, the
SHSEET is tested on the morning of 14th June from 9am to 11am in Chongqing), a
time relating to seasonality of washback (Bailey, 1999; Cheng, 2005), which has been
demonstrated by previous washback studies to be critical. Hence, the closer the time
of examination, the more intensive the test washback will be observed (Bailey, 1999;
Cheng, 2005; Cheng et al., 2011). It was beneficial to focus on Grade 9 in this study
to identify the extent the GVT might influence the teaching and learning practices, and
the opportunities that the GVT allowed for learning-oriented practices in test
preparation classes.
Both the participants in classroom observations and semi-structured interviews

were selected by using convenience sampling (Creswell, 2015; Fink, 2009; Johnson &
Christensen, 2012). As expressed in site selection, the personal connection with the
schools influenced the researcher’s final choice, thus it aligned with the characteristics
of convenience sampling that data are collected from participants who are available
and willing to participate, and also easily recruited from the population. In this study,
since the time for data collection was quite limited, convenience sampling saved the
researcher’s time.
Practical constraints were believed to force researchers to choose the

convenience sampling (Johnson & Christensen, 2012) for classroom observations. In
this study, the convenience sampling for classroom observations was quite limited in
the sense that data from only three classrooms might be biased and not be able to record
the totality of washback. However, on the positive side, it could enable an in-depth
understanding of the washback phenomenon, which, in turn, generated more precise
questions to be asked in the interview stage and added to the refinement of measures
in the quantitative instrument design. More importantly, staying in one classroom for
a longer time helped the researcher to establish trust and good communication with the
classroom teacher and students, which was also a prerequisite for running smooth
interviews.
As for semi-structured interviews, three teachers were included, each from one
of the three observed classrooms. Elsewhere, researchers suggest varied numbers of
interviewees, from five to 25 (Creswell, 2013), or from 12 to 20 (Kuzel, 1992), with
saturation commonly occurring after the first 12 interviews (Guest et al., 2006; Lincoln
& Guba, 1985). However, saturation was not the purpose of the interviews in the
current study. The purpose was to “pair” classroom observation and teacher interview.
Purposive sampling was applied for conducting focus groups. According to

Johnson and Christensen (2012, p. 231), purposive sampling is used when researchers
intend to specify “the characteristics of a population of interest” and attempt to “locate
individuals who have those characteristics”. In the present study, there were three
focus groups in total, selected from each of the three observed classes. After
conducting the classroom observations, the researcher followed teachers’
recommendations and purposively selected six students from each class to form the
focus groups. Six participants within each focus group aligns with the suggestion of
Chrzanowska (2002) and Johnson and Christensen (2012). Although it was impossible
for those students to represent the entire student population, the selected students were
characterised as including high-achieving, intermediate-achieving, and low-achieving
students. Classroom teachers, who knew their students better than the researcher,
helped to select appropriate participants.
4.4.3 Classroom observations

Observation entails watching people’s behaviours in specific situations to collect
genuine information of interest (Johnson & Christensen, 2012). Theoretically, the
implementation of classroom observation reflects the claim that attitudes and
behavioural acts are not always identical (Johnson & Christensen, 2012). Empirically,
according to Alderson and Wall (1993) and Bailey (1996), interviews and
questionnaires are not completely reliable or comprehensive; thus, they argue that
observing genuine classroom practices helps to clarify the phenomenon of washback.
The initial qualitative observation was conducted to collect information about teaching
and learning practices regarding the GVT washback in classrooms. More significantly,
the researcher explored the extent to which LOA practices were enacted and the
potential for LOA practices in test preparation courses.
Before entering the classrooms, the researcher obtained consent from schools,
teachers, and students. Traditionally, qualitative observation is always conducted in
natural settings with an exploratory purpose, and four types of observers’ roles are
classified: the complete participant, the participant-as-observer, the observer-as-
participant, and the complete observer (Johnson & Christensen, 2012). In order to
minimise unexpected effects or “frontstage behaviour” (Goffman, 1971), the
researcher acted as a “complete observer” (Johnson & Christensen, 2012) who took on
the role of “outsider” and nonparticipant observer (Creswell, 2015; Patton, 2015). In
this sense, the researcher entered the classrooms from the back door and sat in the back
of the class.
Classroom teachers’ information were obtained before conducting the classroom

observations. Among the three observed teachers, Lan and Hu were comparatively
new teachers, while Zhang was an experienced teacher who had 18 years’ teaching
experience. Moreover, Lan and Hu were teaching Grade 9 for the first time. The three
teachers had different education background and enjoyed divergent school resources
(e.g., professional development programs, teaching materials), ranging from high level
to low level. It is important to note that Zhang was awarded a Bachelor’s degree
through a correspondence education program in 2000, but she started teaching in 1994
in her current workplace. More information of these three teachers is summarised in
section 5.1.
In total, 15 teaching sessions (five sessions for each teacher) were observed from
April to May in 2018 during the second semester of the final junior high school year.
The observation timetable was based on the teachers’ willingness and availability.
Although each school had a different number of English lessons per week, each lesson
lasted 40 minutes, which enabled the collection of a comparable amount of data across
schools. Besides, the observed classroom sessions were also audio-recorded with a
digital recorder, and extensive field notes were taken during and after observations
since it was necessary that field notes should be taken down, corrected, and edited in
time for memorising important details for later analysis (Johnson & Christensen,
2012).
In order to better record the important phenomena, the researcher designed an

observation scheme (Appendix E) for recording and keeping observations in an
organised manner (Creswell, 2015). The strength of the scheme is that it was open-
ended; therefore, any idea that was omitted in design could be easily added after each
observation session. Generally, the scheme considered five major aspects. The first
part recorded demographic information. Part A and Part B were designed by referring
to the Communicative Orientation Language Teaching (COLT) (Sinclair & Coulthard,
1975), a widely used observation instrument in washback studies (Green, 2006b), and
the theoretical framework of the LOA cycle adopted (see Figure 3.7). Therefore, both
parts were used to document test preparation practices. Part C mainly dealt with the
omitted information at the design stage; therefore, anything interesting and worthy of
attention was added to the scheme. In addition, nonverbal behaviours were also
recorded in this part. After each classroom observation, the comment for the
observation and questions to be asked in the follow-up interviews were reflected in
Part D.
In order to conduct efficient observations, background information about the

class, teacher, and students was also obtained. A short informal conversation with each
classroom teacher before the first observation was conducted to help the researcher
understand more about the observed classes. In fact, the classroom observation scheme
was mainly used for the researcher to comprehend the nuanced understandings of the
ecology of LOA (Jones & Saville, 2016), which facilitated the researcher’s analysis of
classroom interactions afterwards. Therefore, although the scheme was not included
in the qualitative analysis, it helped the researcher to keep a record and thus understand
the dynamics of the classes and LOA practices in classes.
4.4.4 Interviews
Interview, as a widely-used data collection method, is often preferred by
researchers to conduct qualitative studies, during which the interviewees are asked to
answer general and open-ended questions, and the interviewer records the answers
(Creswell, 2015; Johnson & Christensen, 2012; Patton, 2015). As indicated in Table
4.1 and Figure 4.2, the post-observational phase contained interviews to ascertain and
explore participants’ perceptions of the GVT washback and opportunities for and
challenges of the incorporation of LOA principles in test preparation.
Specifically, this study adopted semi-structured one-on-one interviews with

teachers and focus group interviews with students. To ensure the efficiency of
interviews, the interviewer (i.e., the researcher herself) considered seven main aspects.
First, the interview consent was obtained before the interview started. Second, the
researcher tried to keep impartial (Johnson & Christensen, 2012). Third, the researcher
built trust with interviewees by 1) orally introducing herself and this PhD project; 2)
explaining the significance of this project and the value of their participation in the
integrity of this washback study; and 3) assuring interviewees that their information
would be kept confidential. Fourth, all the interviews were audio-recorded to secure
the data integrity for future retrieval in the data analysis phase. Fifth, separate
interview protocols for individual and focus group interviews were used. Sixth, since
all the participants were Chinese speakers and Grade 9 students might be shy to speak
in English, Chinese was used during interview processes to enable better
communication and eliminate misunderstanding. Seventh, to eliminate
misunderstanding and difficulty in answering interview questions, specific terms such
as “washback” were avoided. Instead, “test influence on teaching and learning” was
adopted.
It is important to note that the term “Learning Oriented Assessment” was used
in teachers’ semi-structured interviews, but not in students’ focus groups. The rationale
behind this decision was the actual consideration of participants’ assessment literacy
and age factors. This decision, made after experiencing actual classroom sessions and
interview discussions, was due to students’ difficulty in comprehending such a
technical term and the consideration of data collection efficiency. Nonetheless, to
better explore teachers’ perspectives, an LOA information sheet (e.g., a brief definition
of LOA) was provided to teachers after they gave their understanding the term.
Semi-structured interviews
Semi-structured one-on-one interviews were conducted with the three observed
English teachers. The one-on-one interviews in this study were semi-structured mainly
due to question design; that is, the researchers asked open-ended questions with broad
topics in a flexible wording or order (Minichiello et al., 2008). Semi-structured
interviews in the present study offered guidelines for the interview questions,
permitted flexibility to interviewees, and allowed possibilities for the interviewer to
probe new ideas (Merriam, 2016; Simons, 2009). After collecting and briefly analysing
classroom observation data, the individual semi-structured interviews were conducted
face-to-face in April and May 2018.
An interview protocol was used (Appendix F): except for some demographic
information such as date, place, and interviewer, the protocol for the interviews
contained open-ended questions developed around previous relevant studies, the LOA
framework, and the research questions. The interview questions were designed to elicit
teachers’ perceptions and their teaching experiences in terms of test washback and
LOA. Since those teachers were all experienced and quite familiar with the researcher
after going through the classroom observation stage, they were expected to articulate
and share their opinions comfortably.
Focus groups
As part of the qualitative phase of this exploratory sequential MMR study, focus
groups were immediately scheduled after asking for teachers’ permission and students’
availability. Focus groups were conducted to explore similar issues from the
perspective of students’ learning experiences and perceptions of the GVT as well as
LOA. As indicated in section 4.4.2, three groups of six students were purposively
selected to take part in the interviews. The information on student participants from
three schools is shown in Table 4.4, Table 4.5, and Table 4.6.
As shown in Table 4.4, Fei-SA and Chao-SA were students with a high language
proficiency from Lan’s class, Ming-SA and Ling-SA were ranked next, while Wei-SA
and Xia-SA had scored comparatively lower. Therefore, participants in School A were
regarded as students with very high (Fei-SA, Chao-SA) and intermediate language
proficiency levels. The reason for this categorisation was to keep student language
proficiency across schools in mind.
Table 4.4
Information on the participating students from School A
Student Pseudonyms Gender Age (years English Mock SHSEET
old) learning from test score
SA-S1* Ming-SA Male 15 Grade 7 120
SA-S2 Wei-SA Male 15 Not given 110
SA-S3 Fei-SA Male 15 Not given 140
SA-S4 Xia-SA Female 15 Grade 3 110
SA-S5 Ling-SA Female 15 Grade 3 120
SA-S6 Chao-SA Male 15 Not given 140
* SA-S1 indicates that this is Student 1 from School A.
The six students suggested by Hu were generally at a similar language

proficiency level. The reason that Hu chose those six students was “students who have
signed the contract3 [for a senior high school enrolment] may feel fine to have their
one-hour study time taken”. She further considered it necessary for students with low
language proficiency level to have more time to study as participating in the focus
group might affect their mood. Therefore, School B students had an intermediate level
of language proficiency (see Table 4.5).
Table 4.5
Information on the participating students from School B
Age (years English Mock SHSEET

Student Pseudonyms Gender
SB-S1* Long-SB Male 14 Grade 3 121
SB-S2 Xun-SB Male 15 Grade 4 121
SB-S3 Hui-SB Male 15 Grade 3 106.5
SB-S4 Yao-SB Female 15 Grade 3 111
SB-S5 Shu-SB Female 14 Grade 3 117.5
SB-S6 Na-SB Female 15 Grade 3 127
* SB-S1 indicates that this is Student 1 from School B.
Similarly, Zhang recommended six students to participate in the focus group. As

indicated in Table 4.6, School C students were categorised as participants with three
levels of language proficiency. Therefore, it was found that students in Zhang’s class
had a wider range of English proficiency. This phenomenon aligned with their school
background, that is, a low performing non-key school in the district. As a result,
students with language proficiency levels from high (Hua-SC) to low (Meng-SC, Kai-
SC) were included in this focus group.
Table 4.6
Information on the participating students from School C
Student Pseudonyms Gender Age (years English Mock SHSEET

SC-S1* Fang-SC Female 14 Grade 6 106.5
SC-S2 Ping-SC Female 15 Grade 1 127
SC-S3 Jing-SC Female 15 Grade 6 100
SC-S4 Meng-SC Male 16 Grade 7 61
SC-S5 Kai-SC Male 15 Grade 3 73
SC-S6 Hua-SC Female 15 Grade 3 135
* SC-S1 indicates that this is Student 1 from School C.
3
Contract signing is a common phenomenon in the SHSEE in Chongqing. According to teachers, there
is one large-scale mock SHSEE test in April each year, and after the release of the test scores, senior
high schools will pre-enrol students that meet up with their enrolment standards. Therefore, after signing
this senior high school enrolment contract, students are expected to attend the school, if the actual
SHSEE score of a student finally meets their requirement.
Theoretically, the idea of a ‘focus group’ originated from sociology (Merton et
al., 1956; Merton & Kendall, 1946), which means that the moderator leads the
discussion to collect data on the same topics or questions from a group of individuals
(Kamberelis & Dimitriadis, 2013). Therefore, in the current study, the moderator (i.e.,
the researcher) led the group discussion to elicit responses from all individuals but not
intrude when interviewees were talking. Once relevant ideas were expressed by the
interviewees, the researcher quickly jotted them down and made a quick decision of
whether to ask further questions about those ideas.
On the one hand, it is claimed that a focus group should usually include
homogeneous participants to improve discussion or heterogeneous participants based
on the research purpose (Johnson & Christensen, 2012). Although resources were
limited, the researcher tried to have heterogeneous interviewees in the focus groups.
Therefore, students with different English proficiency levels from low to high were
included. Moreover, interaction and cooperation could help to exchange and build
upon ideas smoothly in groups.
On the other hand, it is quite challenging for the interviewer to lead and keep all
the participants focus on the interview topic (Johnson & Christensen, 2012) and not be
distracted by others’ opinions or talking about other irrelevant ideas. To address this
concern, an interview protocol was used. The protocol design for the focus group
contained two sections (see Appendix G): the interview record (demographic
information) and interview questions.
Although focus groups are useful to collect shared understanding from several
individuals at a time and yield the best information through active interactions, it is
possible that students generate a ‘group think’ (Simons, 2009) and some students feel
hesitant to express themselves (Creswell, 2015). These challenges were addressed
through the seven steps mentioned at the beginning of section 4.4.4. Furthermore, in
order to recognise individual voices in audio-recordings, the researcher identified each
student by numbering them as SA-S1, SA-S2, SB-S1, SC-S1, etc. before starting the
interview. Thus, students responded to the interview questions by taking turns.
However, pseudonyms were used in data transcription and analyses in the main study.
4.4.5 Transcription and translation
Before data analysis, the audio-taped excerpts from classroom observations and
interviews were transcribed. Due to funding limitations, the researcher transcribed all
the recordings by herself, which was considered as an effective measure since the
researcher knew the materials better after conducting all the observations and
interviews in person. Therefore, the firsthand experience of the research process
helped transcription (Halcomb & Davidson, 2006). Moreover, field notes, observation
schemes, and interview protocols directly supported the transcripts and helped to better
understand the recordings. Therefore, during transcription, the researcher combined
the voice recordings with the field notes (i.e., Part C in the observation scheme) and
kept a transcription log when she made changes to transcripts. In this way, the
comprehensiveness of the qualitative data was ensured. Moreover, a transcription
symbol list was adapted from Powers (2005) to guide the transcription (see Appendix
H).
As stated in the previous sections, all the participants were Mandarin speakers
and the study was conducted by using Mandarin. Therefore, language translation was
a necessary and crucial process during the instrument design, data analysis and for the
final report. At the research design stage, for the predesigned interview protocols, the
researcher offered both the English and Chinese versions for the supervisory team to
check the reliability and translation. The supervisory team includes a Chinese-English
bilingual scholar who has studied and researched in English speaking countries for
many years. In addition, the researcher herself is familiar with bilingual translation as
an English major and has studied as well as used English for more than fifteen years.
Before entering the filed, all protocols were discussed with the supervisory team.
After finalising qualitative instruments, the researcher then started to collect data
in Chinese. The use of the shared first language enabled better communication and
understanding between the researcher and participants, especially the teenage students.
At the data analysis stage, the original analysis was in the source language of Chinese
(Squires, 2009), but the researcher kept communication with the supervisory team in
English to get advice on analysis and writing up of the project in English.
At the final stage of project report, a Chinese-English translation was done by

the researcher first. Although cross-checking by different translators, especially
competent bilingual professionals is beneficial (Jaeyoung et al., 2012; Regmi et al.,
2010), this was unrealistic because of the limited funding and resources for the current
project. Nonetheless, dynamic equivalence, which seeks the most natural way to
enable target language users to comprehend the information reproduced and translated
by source language users, was used as the translation principle (Nida, 1977; Sutrisno
et al., 2014). Dynamic equivalence was essential in that the collected data reflected
Chinese culture, but the report in English demands translation from Chinese to English
which brought about cultural change (Halai, 2007). Although it was demanding, the
researcher’s capability in Chinese-English translation, the bilingual capacity of the
supervisory team, plus cross-checking of the reported scripts by several English
speakers enabled the integrity of the translation to be established.
4.4.6 Thematic analysis

Data analysis started after completing the transcription and translation. At this
stage, thematic analysis, which is widely used in most qualitative studies, was used for
identifying, generating, analysing, and interpreting themes from the collected data
(Braun & Clarke, 2006; Clarke & Braun, 2017). A theme derived from data represents
a pattern of meaning and is related to research questions (Braun & Clarke, 2006).
Two types of thematic analysis have been identified. One is inductive (thematic)
analysis (Braun & Clarke, 2006; Patton, 2015) or data-driven thematic analysis
(Boyatzis, 1998). As the name indicates, this analysis mainly uses qualitative data to
generate new ideas, identify themes, and/or theories. The other type of thematic
analysis is deductive (Braun & Clarke, 2006; Patton, 2015), known as theory-driven
or prior-research-driven (Boyatzis, 1998). In contrast to inductive thematic analysis,
the deductive thematic analysis is usually driven by the theoretical framework (a new
washback model incorporating LOA, Figure 3.8) or analytic interest, and it determines
how qualitative data collected support the theory or framework being used. The
flexibility of thematic analysis allowed the researcher to identify key features of the
abundant qualitative data and unpredictable insights (Braun & Clarke, 2006). On the
one hand, using the deductive thematic analysis, which built on the theoretical
framework of the new washback model (see Figure 3.8), could elicit explicitly specific
and significant data aspects regarding the washback of the GVT. On the other hand, it
was also important to be open to new themes emerging from the data, which could add
new knowledge to the study and the model. Therefore, both deductive and inductive
thematic analyses were adopted in the data analysis. The process of applying thematic
analysis is depicted in Figure 4.3.
Figure 4.3. The process of thematic analysis (Braun & Clarke, 2006)
As indicated in Figure 4.3, six stages are necessary. In the current study, the
researcher generated two coding schemes based on the proposed washback model (see
Figure 3.8). The teacher coding scheme was constructed based on classroom
observations and teacher interviews, and the student coding scheme was constructed
based on focus group data.
Following transcription, segmenting the transcripts were done before sending

out the data to co-coders. The criterion for segmenting is that the “chunk” of words
can represent a single whole idea unit, which is complete in its meaning and can be
coded into a same macro code with the segment next to it. An example of segmenting
an excerpt is shown below:
Ming-SA: In my opinion, its test items are mainly testing our grammar
knowledge,/
which are totally irrelevant to our daily life./
In fact, I think that MCQ in the GVT has so many problems./
Besides, some of the items are, one and two options of one test item,
according to the meaning, are correct, but the grammar knowledge…/
(FG-SA)
Following the above procedure, the qualitative dataset was segmented, and two
coding schemes were constructed. After constructing the teacher and student coding
schemes, co-coding was conducted. In effect, using co-coding was of concern as Braun
et al. (2019) argue that co-coding works at odds with fully qualitative paradigms and
aligns more with a (post)positivism paradigm. While acknowledging their caveat, the
philosophical position of this MMR study is to think through qualitative and
quantitative paradigms. In this vein, co-coding enables methodological values sharing
between the two paradigms. As Boyatzis (1998, p. vii), a seminal researcher on coding
reliability approach, explains, coding reliability “is a translator of those speaking the
language of qualitative analysis and those speaking the language of quantitative
analysis”. As such, the co-coding was completed before writing up the thesis, the
details of co-coding are reported later in section 4.4.7.
In this exploratory sequential MMR study, qualitative analysis was crucial for
the quantitative instrument design. However, the time for transcription, translation,
and analysis was relatively limited (from end of May to end of July 2018) since the
survey was planned to be distributed shortly after students took SHSEET on 14th June
2018. Therefore, it was impractical to complete the qualitative analysis in such a short
time. Nonetheless, the researcher managed to preliminarily transcribe and analyse the
data to facilitate the survey design. This process was feasible and durable since the
survey design was also guided by literature findings (see Figure 2.2) and the proposed
washback model (see Figure 3.8) in relation to research questions. Therefore, the key
ideas from the coding of the qualitative data were adopted in instrument design for
both the pilot survey (August 2018) and finalised survey (September and October
2018).
4.4.7 Validity and reliability

Validity and reliability are key considerations for any study and were thoroughly
considered throughout the process of this PhD project. Steps taken to strengthen
validity and reliability were discussed in the previous sections, and the current section
summarises these.
Literally, validity means how valid or truthful the study is. To ensure the
accuracy of the study, “qualitative inquirers often employ validation procedures”
(Creswell, 2015, p. 261). Various scholars have proposed methods to enhance both
internal and external validity in qualitative research (Creswell, 2015; Franklin &
Ballan, 2001; Merriam, 2016). Table 4.7 lists ways to enhance validity in the
qualitative phase through synthesising suggestions from Franklin and Ballan (2001)
and Creswell (2015).
Table 4.7
Methods for increasing reliability in the present study
Method Application in the present study

Classroom observations and interviews were carried out in
Using prolonged engagement
three schools for over one month.
Triangulation
Data were collected from multiple sources.
(methodological)
Using a guiding theory to The current study was guided by established washback
verify findings theories, especially the LOA framework.
The qualitative study was reported as detailed as possible and
Using thick descriptions in
available materials were presented in the appendices to
write-up
increase the transferability of this study.
Reliability refers to “whether the results are consistent to the data collected”
(Merriam, 2016, p. 251), which means consistency or dependability (Lincoln & Guba,
1985). In order to maintain the internal and external reliability, different methods are
suggested (Franklin & Ballan, 2001; Merriam, 2016). Table 4.8 lists ways suggested
by Franklin and Ballan (2001) to enhance research reliability.
Table 4.8
Methods for increasing reliability in the present study
Method Application in the present study

Classroom observations and interviews were audio-
Adopting recording with field notes
recorded and conducted with protocols (field notes).
The collected data, interpretations, and transcriptions were
Cross-checking
cross-checked by the supervisory team.
The researcher participated in every stage of this research
Staying close to the empirical data
from instrument design, data collection, to data analysis.
The researcher kept a research diary to reflect the research
Developing an audit trail
processes.
Applying a consistent analytic
The deductive and inductive thematic analyses were used.
method
The researcher learned and used NVivo 12 Pro for
Using computer software
qualitative data coding and analysis.
The reliability of this qualitative stage was enhanced by establishing both intra-
coder reliability and inter-coder reliability. For intra-coder reliability, the researcher
herself went back to check the coded segments several times and revised the coding
scheme and coded segments accordingly. For inter-coder reliability, the researcher
invited two independent third parties to co-code 30% of the data, as recommended by
Gass and Mackey (2000). The student focus group data were co-coded by a junior high
school English teacher who was not involved in the data collection. Thus, one out of
the three focus group transcripts was co-coded. For the teacher data, four classroom
observations4 were co-coded and for semi-structured teacher interviews, one out of the
three transcripts was co-coded. The co-coding results are documented in Table 4.9.
Table 4.9
Co-coding results for qualitative transcripts
File Total number of Number of Agreement

segments disagreed segments rate (%)
School A Focus Group 268 segments 28 segments 89.55
Teacher C semi-structured interview 232 segments 37 segments 84.05
Teacher A Classroom Observation 3 211 segments 22 segments 89.57
Teacher B Classroom Observation 2 153 segments 12 segments 92.16
Teacher C Classroom Observation 1 270 segments 38 segments 85.93
Table 4.9 shows that the overall co-coding agreement for student data is 89.55%,
and the average co-coding agreement for teacher data (both interviews and classroom
observations) is 87.93%, which are all above the suggested co-coding rate of 80%
(Braun et al., 2019). These levels are acceptable, and most importantly, the researcher
discussed with the co-coders regarding coded segments that were coded differently.
The researcher first asked why co-coders coded certain segments differently, and then
reached an agreement after going through all the relevant segments. Co-coding was an
important instrument to establish the reliability of the qualitative coding analysis.
Although this was a time-consuming process, since the co-coders had to spend some
time to discuss with the researcher and detailed reasons had to be explored and codes
amended, it was worthwhile.
Taking the issues of validity and reliability seriously in the qualitative data
collection and analysis, a series of themes were identified and located. In detail, those
themes were related to washback and LOA practices, participants’ perceptions of the
GVT, participants’ perceptions of incorporating LOA principles in test preparation,
and possible reasons for their perceptions regarding the LOA incorporation. Therefore,
the qualitative phase laid the foundation for the subsequent quantitative phase,
especially the instrument design.
4
Although 15 classroom sessions were observed, three of them (two from Hu, and the other one from
Zhang) were focusing on either reading comprehension or listening comprehension, thus only classroom
sessions related to grammar and vocabulary was coded for main study analysis. Therefore, three
observed sessions were not counted and analysed in the qualitative phase.
4.5 QUANTITATIVE PHASE
Quantitative research, relying on quantifiable data (Johnson & Christensen,

2012), aims to identify trends or test theories, and further explain the reasons through
using variables (Creswell, 2015). The quantitative phase in this study aimed to test and
generalise qualitative findings from the previous phase. Specifically, qualitative
findings regarding washback and LOA practices as well as perceptions were
quantitatively examined. By so doing, the quantitative phase complemented the
qualitative phase.
To implement the quantitative phase, a student survey was designed. As a

quantitative approach for examining behaviours, perceptions, or characteristics of
certain populations, the survey is widely used for testing hypotheses and analysing
tendencies. In a survey design, questionnaire is usually preferred to collect information
from participants (Creswell, 2015; Johnson & Christensen, 2012). In addition to the
measurement of variables to answer research questions, basic demographic
information of participants was also collected. There are mainly two types of
questionnaires: mailed questionnaires and web-based questionnaires (Creswell, 2015).
Given the wide access to the internet in the current research context, the study initially
planned to adopt a web-based questionnaire to distribute student questionnaires online.
“Questionnaire star”, running by Sojump team, is an online platform for
questionnaires, examinations, and voting in the Chinese internet environment. It is
widely used and regarded by the public as the largest Chinese online survey tool.
Therefore, “questionnaire star” was adopted in the current study to create an online
survey link for participants to access the questionnaire.
Compared with paper-based questionnaires, online questionnaires are more

convenient for respondents to complete. The fast-developing technology and improved
living conditions enable students and teachers to easily access internet on electronic
devices such as computer, mobile, and iPad. Even if students left the original junior
high school after the SHSEET, the link to the online questionnaire was distributed to
them through WeChat 5 by disseminating the online flyer to potential schools and
participants. Respondents were able to fill in the questionnaires at their convenience.
Besides, data collected through an online questionnaire were transferred automatically
5
WeChat is a very popular social media app in China.
to statistical software (e.g., SPSS) for analysis, which saved time for data entry and
data entry error was controlled.
However, although the original plan was to collect online questionnaires only, it
turned out that some participant schools were reluctant to forward the online flyer due
to their school management considerations 6 . Therefore, the final data collection
included both online and paper questionnaires, and the paper questionnaires were
mailed back to the researcher. Nonetheless, no difference existed in the two versions,
as the online flyer was also printed out to students when they were asked to fill in the
paper survey. Further, as the ethical clearance for the quantitative phase did not
anticipate this change, a later report of this incident was submitted online to the Office
of Research Ethics and Integrity at QUT and this incident will be reflected in section
8.3.
4.5.1 Instrument design and development

Considering the time restriction for the survey design and distribution, it was
decided that only a student survey would be designed and applied in the quantitative
phase. Although this meant that teachers’ perceptions and practices were not
quantitatively measured, the student survey could provide information about teaching
practices from students’ perspectives. This could provide useful information, since
students were the recipients of classroom teaching. For example, relevant questions
such as “In the time leading up to the test, we did the following grammar and
vocabulary activities in class …” could help to disclose relevant information about
teaching practices.
Before instrument design started, the language of the survey was carefully
considered. Firstly, the participants’ mother tongue was Chinese, and secondly,
considering the age factor of participants, Chinese was chosen as the research language.
Therefore, the survey was designed in Chinese but translated into English for the
purposes of supervision, ethics application, and thesis reporting. After these
fundamental considerations, the process of instrument design and development was
undertaken from the following four aspects.
6
In some schools, mobile phones were restricted, and some teachers still preferred a traditional paper
survey.
First, preliminarily analyses of the qualitative data informed the survey design.
As the researcher was also the observer and interviewer during the qualitative stage,
she was very familiar with the data. As a result, key items of demographic information,
perceptions of test design characteristics, test anxiety, classroom interaction,
involvement in assessment, and feedback were designed based on the observation and
interview data. Moreover, single-item measure such as test difficulty for each of the
four task types was overt to design.
Second, during the design of student survey, existing instruments in the literature
were referred to for certain constructs. For example, knowledge of test design
characteristics (Xie, 2010), motivation (Qi, 2004b; Xie, 2010), time investment (Qi,
2004b; Xie, 2010), learning strategy (Qi, 2004b; Xie, 2010), learner autonomy (Zhang
& Li, 2004), and test importance (Jin & Cheng, 2013).
Third, the new washback model (Figure 3.8) and the LOA cycle in the GVT
context (Figure 3.7) were also the reference for instrument design.
Fourth, face validity was ensured prior to pilot study. As the researcher found in
focus groups, academic or complicated words could not be understood by students.
Once the survey was drafted by the researcher, the supervisory team worked together
to check whether the survey items were related to the research questions. As survey
participants were junior high school graduates, who were teenagers, the intelligibility
of the wording was the priority for answering the survey. Therefore, before the pilot,
face validity of the survey was ensured by different stakeholder cohorts.
The first cohort was ten Grade 9 English teachers from Chongqing and
Guangdong, Gansu, and Yunnan provinces. They suggested that the survey was clear
and easy to understand, but further suggestions regarding the survey format and
wording were provided and adopted. The second cohort included two Chinese
language teachers and one undergraduate who majored in Chinese language. As the
student survey would be mainly implemented in Chinese, these stakeholders provided
professional language editing advice. The third cohort was composed of eight Grade 9
graduates and three Grade 9 to-be students from Guangdong and Henan provinces.
Interestingly, they provided comments that items like “classroom interaction” and
“involvement in assessment” were not clear enough, which was edited accordingly.
The last cohort was the research team and researcher’s colleagues. The researcher had
been working closely with the supervisory team during the survey design and revising

the original design according to supervisors’ feedback. Moreover, three of the
researcher’s PhD colleagues also read through the whole survey and provided
corresponding suggestions to the survey revision. As a result, the survey for the pilot
study was revised by incorporating all those four cohorts’ comments and the
researcher’s recursive self-reflection and revision of the survey.
The survey comprised two main parts. The first part was about students’
demographic and background information (i.e., gender, SHSEET test score, school
district, school name, class number). The second part was the washback questionnaire
relating to the GVT. Specifically, the washback questionnaire contained two sections:
washback value and washback intensity. It is noteworthy that the macro level of
washback value was not included since students could hardly provide any information
regarding curriculum and/or Test Specifications. In total, seven demographic
information items and 73 washback items were included in the pilot study; and six
demographic items and 77 washback items were included in the main study. A five-
point Likert scale (e.g., 1=Never, 2=Seldom, 3=Sometimes, 4=Often, 5=Always) was
adopted. The detailed items within the pilot version and the final version of the survey
are listed in Appendix I. The process of piloting is discussed in the following section.
4.5.2 Pilot study

A pilot study, which is the pre-testing of the research instrument, is always an
important process for ensuring the validity and reliability of the main study (Baker,
1994; Van Teijlingen & Hundley, 2001). Usually, the pilot study is a process in which
researchers collect data from a small sample and then revise or fine-tune the instrument
based on feedback and evaluation of the pilot results (Creswell, 2015). Therefore, the
survey design in the current study included a pilot stage to further evaluate its face
validity and examine its internal consistency reliability.
The researcher distributed the pilot survey first online mainly to students in two
observed schools and students in another unobserved school in Chongqing. The survey
link created through Sojump was sent to Grade 9 graduates in those schools. In total,
192 students responded to the pilot survey. Among the 192 respondents, two of them
were non-Chongqing students. Another four questionnaires were invalid due to
students’ using of same answers to the whole scale. Therefore, 186 valid
questionnaires were collected in the pilot stage. This sample size was considered as

The detailed results of each sub-scale helped to identify unreliable item(s), which
were further removed or reworded to improve the internal consistency reliability of the
survey. Before running the reliability test, v2-v4 were reversely coded because these
indicators were negatively worded as designed by the researcher. As shown in Table
4.10, the Cronbach’s alpha value of most constructs was above the cut-off value of 0.7
(Field, 2009) except “test importance” (alpha=.514), “classroom interaction”
(alpha=.674), “negative perception” (alpha=.674), and “negative strategy”
(alpha=.666). Therefore, taking “negative perception of test design characteristics” as
an example, modification of these constructs was performed.
Table 4.11
Item-Total Statistics of construct reliability of negative perceptions of test design characteristics
Item-Total Statistics
Scale Mean if Scale Variance Corrected Item- Squared Multiple Cronbach's Alpha
Indicators Item Deleted if Item Deleted Total Correlation Correlation if Item Deleted
v2 4.6290 2.732 .382 .164 .703
v3 4.6505 2.077 .596 .363 .427
v4 4.8710 2.156 .496 .299 .569
As shown in Table 4.11, the reliability of “negative perceptions of test design

characteristics” could be improved if “v2 As long as I rote-memorise grammar rules
and vocabulary knowledge, I can get a high score” were deleted (alpha=.674 before
deletion, see Table 4.10; alpha=.703 if deleted, see Table 4.11). However, as is
common practice, at least three items for measuring a construct is ideal to avoid the
problem of an under-identified model. Further, according to Field (2009, p. 673), “any
items that result in substantially greater values of alpha than the overall alpha may
need to be deleted from the scale”. As the increase from .674 to .703 is not substantial,
deleting v2 should be cautious.
Another indicator of the item reliability was the value of corrected item total
correlation to all items. If the value was below the cut-off value of .33, it means that
the corresponding item can account for less than 10% of the variance of the scale and
is therefore not reliable enough (Ho, 2006). The corrected item total correlation value
of .382 for v2 justified that v2 explained more than 10% of the scale variance of
“negative perception” construct (Ho, 2006). Therefore, rather than deleting v2, a
decision was made to revise the item. Based on the suggestions of the supervisory team
and survey respondents, this item was changed to “v1 The GVT only aims to test

students’ ability of rote-memorising vocabulary and fixed collocations” in the main
study. Furthermore, another three items were added to this sub-construct because of
the further reading of literature (Xie, 2010) and findings from qualitative data.
Following suit, constructs and sub-constructs of “extrinsic motivation”,

“negative strategy”, “classroom interaction”, “test importance”, and “test difficulty”
were all examined. The finalised version for main study is included in Appendix I.
4.5.3 Main study

After revising the pilot survey, the main survey was distributed to a much larger
sample. The sample size was to meet the minimal sample size requirement for
performing Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis
(CFA). For factor analysis, the minimum sample size of 100 to 200 is used as a rule of
thumb (Bentler & Chou, 1987; Brown, 2015; Harrington, 2009), and a sample size of
300 is regarded as “large enough” by many studies (Comrey & Lee, 1992; Tabachnick
& Fidell, 2007). In addition to rules of thumb, the number of survey items should also
be considered due to Nunnally (1978) suggesting that the number of participants
should be ten times as many as the items. Therefore, considering the rules of thumb
and the number of items, at least 700 Grade 9 students were needed.
Survey distribution
To approach potential student participants, the researcher first contacted the
schools observed and an TRO who was working in Chongqing educational department
to distribute the online questionnaire mainly to former Grade 9 (Grade 10 at the time
of data collection) students in the nine districts and seven counties in Chongqing (the
joint area, see section 1.2.3). The time for carrying out the main survey was at the
beginning of September 2018 when students had known their SHSEET scores and
restarted school. This was an ideal time to collect quantitative data as students were no
longer coping with the stress of test preparation, but their test experiences were still
fresh. It was thus feasible to ask them to reflect on their perceptions and activities
regarding SHSEET test preparation.
As indicated at the beginning of section 4.5, quantitative data were collected both
online and from a paper version. Further, the collected data were from both the
expected districts/counties and regions not included in the joint area. Since respondents
were from different educational jurisdictions, when they responded to the survey

items, they related to different versions of the SHSEET (paper A and paper B).
Although this did not seem to match the original plan, the data were still usable in that
these two versions of test papers were with no major difference.
The total number of valid surveys was 922. For paper version, 538 students’ data
were collected, among which seven were from non-Chongqing graduates, two students
only wrote down their demographic information, and another 29 questionnaires used
the same answer to the whole survey (mainly choosing 3 on the five-Likert scale).
After the removal of these invalid responses, 500 valid surveys were used as the data
source for analysis. As for the online survey, 541 students accessed the online survey
through Sojump. However, 116 students either participated in the pilot version or did
not graduate from a Chongqing school. Therefore, they were not eligible for the survey
research. Besides, another three students gave the same answer to all the survey items,
so ultimately there were 422 valid responses to the online survey. In order to make
sure that the survey remained statistically “identical” to the two student cohorts (i.e.,
paper survey participants and online survey participants), the results of data screening
and comparison are reported in the following section.
Data screening and comparison

The purpose of data cleaning was to ensure the validity of the data and minimise
data entry errors. Data entry errors were a potential concern since the 522 paper-based
survey data were manually entered into SPSS. After data entry, the researcher decided
to randomly check 20 questionnaires. It was found the entry error rate was quite high.
Therefore, all paper survey data were checked a second time.
The missing values of paper-based survey data were replaced with “series mean”.
The reasoning for this is the missing value percentage was minimal (0.562%, far less
than 5%) (Graham, 2009). Further, both paper survey and online survey were
compared through applying Independent Samples T-test and results are attached in
Appendix J. In general, although paper participants (N=500) and online participants
(N=422) responded to most items in a significant different way (p<.05), the effect size
of the difference was rather small (r<.30). As such, it could be argued that despite the
statistical difference between the two cohorts, their difference, in practice, was not
meaningful (Field, 2009). Therefore, responses from both paper and online survey
participants were combined as a whole dataset. The dataset was randomly split into

halves for factor analysis in SPSS. In this vein, sample one (N=434) was used for EFA
and sample two (N=488) was adopted for CFA.
Demographic information and descriptive statistics

Demographic information was collected at the beginning of the survey. It was
used to understand participants’ attributes. One concern was that age was not included
in the demographic variables. The reason was that Grade 9 students were generally at
the age of 14 to 16, which was the common age for this compulsory education stage.
Besides, different ages were not considered as a prominent factor in the current
washback study among students. Therefore, test paper types, gender, districts of the
survey collected, and self-reported SHSEET scores are reported in this section.
First, the test paper types were analysed. It was found that 805 students (87.31%)
used Paper A and 115 students (12.47%) used Paper B. Two students (0.22%) did not
respond to this item. Although Paper A and Paper B were two different SHSEET test
papers, they were of the same content and characteristics.
900 805
800
700
600
Frequency
500
400
300
200 115
100 2
0
N/G Paper A Paper B
Figure 4.4. Test paper taken by participants
Nonetheless, to test that Paper A and Paper B participants responded to the

survey in a statistically identical way, Independent Samples T-test was applied and the
results are attached in Appendix J. In general, across all the 77 items, there were only
six items that demonstrated statistically significant differences between Paper A and
Paper B participants (p<.05). However, the effect size of these six variables remained
small (r<.30) and thus there were no need of concern at this stage (Field, 2009). It
could be argued that both Paper A and Paper B participants responded to the survey in
the same way and combining these two cohorts into one dataset for analysis was thus
reliable.

The second demographic information from students was their gender. As shown
in Figure 4.5, except for the three missing values (0.33%), 518 females (56.18%) and
401 males (43.49%) participated in the quantitative phase.
600
518
500
401
400
Frequency
300
200
100
3
0
N/G Female Male
Figure 4.5. Gender distribution
Further, except for six students who did not report their district (see Figure 4.6),
both the online survey and paper survey were mainly completed by schools in Nanan
District (N=307, 33.30%), Changshou District (N=166, 18.00%), Jiangbei District
(N=145, 15.73%), Jiulongpo District (N=69, 7.48%), Jiangjin District (N=42, 4.56%),
and Shapingba District (N=33, 3.58%). As a municipality, Chongqing has 26 districts,
8 counties, and 4 autonomous counties. Hence, the participants in the main study came
from a wide range of districts and counties in Chongqing, although not equal in
numbers.
350
307
300
250
Frequency
200 166
145
150
100 69
42 33
50 14 23 21
6 2 5 4 1 2 5 1 1 3 12 4 1 2 3 4 3 10 3 1 9 6 1 10 3
0
0
1
7
3
4
2
17
21
28
10
15
29
32
33
34
12
24
38
30
25
26
16
14
11
22
35
18
23
37
13
Note. 8=Jiulongpo District; 9=Nanan District; 5=Dadukou District; 12=Banan District; 38=Pengshui
Miao and Tujia Autonomous County; 6=Jiangbei District; 14=Jiangjin District; 7=Shapingba District;
4=Yuzhong District; 11=Yubei District; 18=Qijiang District; 23=Rongchang District; 21=Tongliang
District; 13=Changshou District.
Figure 4.6. Distribution of school district
As shown in Figure 4.7, student respondents reported a wide range of test

performance. However, except for 19 students who did not give their test scores

(2.1%), more students with higher language proficiency levels (SHSEET scores above
90, 78.3%) responded to the survey. This indicates a non-normal distribution and low-
achieving students were underrepresented in the sample. Further, to clarify, the student
survey collected their overall SHSEET scores. Those scores which were self-reported
were thus applied in the statistical analyses. Although students’ self-reported SHSEET
scores might not be their actual scores and were not equal to their GVT scores, those
scores were highly related to their actual test scores and were the only score data that
the researcher could collect.
40
35
30
25
Frequency
20
15
10
0
21
30
33
38
51
56
63
71
77
81
86
95
42.5
46.5
67.5
90.5
101
105
108
116
121
124
129
131
134
139
144
148
112.5
118.5
126.5
136.5
141.5
Figure 4.7. Distribution of participants’ SHSEET test scores
Before running EFA, CFA, and main statistical analyses, descriptive analysis
was conducted, and results are shown in Appendix K. From the results, the standard
deviations of the items range from 0.78 to 1.21 on a five-point Likert scale. Therefore,
the results suggested an adequate variance in participant responses except the ones
identified as problematic by the item analysis reported earlier.
Reliability of the scale

The reliability of the main study scale was calculated similarly as the pilot
survey. Before running the reliability test, v1-v6 were reversely coded as the pilot
survey. The reason is that those six items were negatively worded in the student survey.
From Table 4.12, reliability of motivation, intrinsic motivation, extrinsic motivation,
learning strategy, and test difficulty had potential to improve because removal of
certain items from their corresponding construct may increase the alpha value.

In a similar vein, reliability check was also applied to motivation, intrinsic
motivation, learning strategy, and test difficulty. Overall, no major reliability issue was
observed from the main study instrument, but variables of concern were v19, v20, and
v41. Further decisions on item deletion of the main instrument considered validity
issues.
Validity of the scale

To examine the scale validity, EFA and CFA were applied to each construct. As
two major classes of factors analysis, EFA was adopted to identify the underlying
patterns within each of the constructs, and CFA was applied to investigate to what
extent the indicators reflected their corresponding construct (Brown, 2015;
Tabachnick & Fidell, 2007). In factor analysis, the minimum sample size of 100 to 200
was used as a rule of thumb (Bentler & Chou, 1987; Brown, 2015; Harrington, 2009).
Thus, the current two research samples (N=434, N=488) were considered as adequate
for applying both EFA and CFA.
In order to verify the construct validity of the instrument, both EFA and CFA
results were demonstrated for each construct of the main study. To repeat, variables in
test preparation effort and test difficulty was not analysed through factor analysis since
they are single-item measures (see section 4.5.1). For EFA results, both correlation
matrix for indicators, Kaise-Meyer-Olkin (KMO) measure, communalities, total
variance, scree plot, and assessment of normality of each construct were explored to
check the validity of constructs in the instrument. For CFA results, main model fit
indices of each construct were examined. Most importantly, before starting factor
analysis, the researcher decided to use maximum-likelihood method in factor analysis.
The reason was that the current sampling was expected to be able to generalise the
results to a larger population (Field, 2009).
For the sake of conciseness, this study mainly took the motivation construct as
an example for conducting EFA, which demonstrates a theory-driven method
(extraction of a fixed number of two factors from motivation construct according to
motivation theory). Therefore, instead of extracting factors based on eigenvalues
greater than 1, the researcher set a fixed number of two factors to be extracted
according to the theoretical dimension of intrinsic and extrinsic motivation. Further,
CFA results are also discussed to provide an example for the validation of the
motivation construct.

The inter-item correlation was first examined. As presented in Table 4.14, values
presented in the correlation matrix suggest that except for v20, other items are
statistically significantly correlated with each other (p<.001). For v20 “Learning
English grammar and vocabulary is to meet the requirement of school courses.”, its
correlations with other nine indicators did not fall within the cut-off range between .30
and .90 (Field, 2009). Therefore, v20 was of concern and was further explored.
Table 4.14
Correlation matrix for the indicators within the motivation construct
Indicators v15 v16 v17 v18 v19 v20 v21 v22 v23 v24
v15 1.000 .651 .608 .618 .388 -.187 .314 .356 .367 .380
v16 .651 1.000 .767 .724 .408 -.150 .311 .407 .476 .477
v17 .608 .767 1.000 .733 .443 -.119 .287 .414 .478 .467
Correlation
v18 .618 .724 .733 1.000 .516 -.084 .364 .482 .517 .547
v19 .388 .408 .443 .516 1.000 -.165 .176 .320 .498 .343
v20 -.187 -.150 -.119 -.084 -.165 1.000 .260 .219 .083 .152
v21 .314 .311 .287 .364 .176 .260 1.000 .566 .453 .489
v22 .356 .407 .414 .482 .320 .219 .566 1.000 .547 .589
v23 .367 .476 .478 .517 .498 .083 .453 .547 1.000 .654
v24 .380 .477 .467 .547 .343 .152 .489 .589 .654 1.000
v15 .000 .000 .000 .000 .000 .000 .000 .000 .000
v16 .000 .000 .000 .000 .001 .000 .000 .000 .000
v17 .000 .000 .000 .000 .007 .000 .000 .000 .000
Sig. (1-tailed)
v18 .000 .000 .000 .000 .040 .000 .000 .000 .000
v19 .000 .000 .000 .000 .000 .000 .000 .000 .000
v20 .000 .001 .007 .040 .000 .000 .000 .042 .001
v21 .000 .000 .000 .000 .000 .000 .000 .000 .000
v22 .000 .000 .000 .000 .000 .000 .000 .000 .000
v23 .000 .000 .000 .000 .000 .042 .000 .000 .000
v24 .000 .000 .000 .000 .000 .001 .000 .000 .000
As shown in Table 4.15, KMO value of the motivation construct was .886, which
was well above the cut-off value of .50 (Kaiser, 1974). Moreover, Bartlett’s test of
sphericity confirmed the adequacy of the magnitude of the correlations by presenting
a Chi-square value of 2220.687 (p<.001), which was statistically significant.
Therefore, these two results indicated that EFA should generate distinct and
predictable factors.
Table 4.15
KMO and Bartlett’s test for the indicators within the motivation construct
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .886
Bartlett's Test of Sphericity Approx. Chi-Square 2220.687
df 45
Sig. .000

The eigenvalues and total variance explained by the indicators were examined.
As indicated in Table 4.18, two factors with an eigenvalue above 1 were derived and
thus retained (Kaiser, 1960). Cumulatively, these two factors accounted for 55.631%
of the total variance of those ten indicators, which was greater than 50% (Field, 2009).
Table 4.18
Total variance of the motivation construct explained by its indicators
Initial Eigenvalues Extraction Sums of Squared Loadings

Factor Total % of Variance Cumulative % Total % of Variance Cumulative %
1 4.865 48.649 48.649 4.456 44.559 44.559
2 1.617 16.174 64.823 1.107 11.072 55.631
3 .799 7.990 72.813
4 .609 6.089 78.902
5 .521 5.212 84.114
6 .409 4.094 88.208
7 .381 3.815 92.022
8 .327 3.269 95.291
9 .245 2.447 97.738
10 .226 2.262 100.000
Extraction Method: Maximum Likelihood.
Scree plot of the two-factor solution of motivation construct is shown in Figure
4.8. As a sharp descent in the curve fallowed by a tailing-off started after the third
factor in the curve (Cattell, 1966), it proved the two-factor solution.
Figure 4.8. Scree plot of the motivation construct
Finally, univariate and multivariate normality were examined before proceeding

to the CFA analysis of “motivation” construct. To test normality, SPSS Amos 25 was
used, and its results are listed in Table 4.19. As highlighted in Table 4.26, the critical
ratio of skewness, kurtosis, or both were beyond the cut-off value of |2| (Field, 2009),
thus it proved that all the indicators deviated substantially from univariate normal

distribution. As for multivariate normality, the measure of multivariate kurtosis
(73.100) was greater than the cut-off value of 3 (Yuan et al., 2002), which
demonstrated a significant deviation from multivariate normal distribution. As a result,
bootstrapping methods were adopted in CFA to grapple with the non-normally
distributed data.
Table 4.19
Assessment of normality for the indicators within the construct motivation
Variable min max skew c.r. kurtosis c.r.

v24 1.000 5.000 -.579 -5.220 .272 1.228
v23 1.000 5.000 -.291 -2.621 -.047 -.210
v22 1.000 5.000 -.558 -5.036 .448 2.020
v21 1.000 5.000 -.675 -6.089 .683 3.081
v19 1.000 5.000 -.312 -2.818 -.328 -1.480
v18 1.000 5.000 -.752 -6.780 .479 2.159
v17 1.000 5.000 -.985 -8.888 1.195 5.390
v16 1.000 5.000 -1.055 -9.518 1.491 6.724
v15 1.000 5.000 -.866 -7.813 1.241 5.595
Multivariate 73.100 57.380
A two-factor measurement model with nine indicators for motivation construct

was established through CFA. The model with its standardised parameters is shown in
Figure 4.9.
Figure 4.9. Measurement model for the motivation construct

From Figure 4.9, five indicators are loaded on the factor “intrinsic motivation”,
and four indicators are loaded on the factor “extrinsic motivation”, and the model
reflects the data well. All the values of standardised regression weight were above the
preferred cut-off value of .50 (Hair et al., 2006), and most of them were above the
preferred value of .70 (Hair et al., 2006). Moreover, the squared multiple correlations
(SMC), which indicates to what proportion of variance that each indicator could be
explained by the factor, should be greater than .50, but an SMC above .30 is also
considered to be acceptable for a factor to well map the indicator (Jöreskog & Sörbom,
1989). Figure 4.9 shows that all the nine indicators meet this criterion. However, the
distinctively lower value of SMC of v19 (.33) re-raised the issue of the inclusion of
v19 “Learning English grammar and vocabulary is really interesting”. As recalled, v19
was also of concern in reliability test as the deletion of this item could increase alpha
value from .870 to .896 (see Table 4.14). Although this increase was not dramatic,
survey items in intrinsic motivation were revisited and it was found that compared to
other items, v19 was much more general without indicating a specific meaning.
Therefore, v19 was deleted from the construct and thus CFA was performed again.
The new measurement model with eight indicators is shown in Figure 4.10.
Figure 4.10. Modified measurement model for the motivation construct
In the two-factor measurement model with eight indicators, all standardised

regression weight values were above the preferred value of .70 (Hair et al., 2006) and

all SMC values were above the preferred value of .50 (Jöreskog & Sörbom, 1989). The
CFA results of the construct of motivation with eight variables showed a good model
fit (CMIN/DF=6.088, df=19, p=.000, SRMR=.045, RMSEA=.102; 90% Confidence
Interval (CI) [.085, .121]; TLI=.937; CFI=.957). Although the model had a significant
chi-square value of 115.672, the ratio between the chi-square value and the degree of
freedom (115.672/19=6.088) was not very high. Standardised root mean square
residual (SRMR) was .045 which was below the cut-off value of .08 (Hu & Bentler,
1999); baseline fit indices of Tucker-Lewis Index (TLI=.937) and Comparative Fit
Index (CFI=.957) were above the cut-off value of .90 (Bentler, 1990); but the value of
Root Mean Square Error of Approximation (RMSEA=.102) was above the cut-off
value of .08 (Ho, 2006; Schreiber et al., 2006). Given the complexity of the model, it
can be argued that the model had a reasonably good fit.
To sum up, the validity and reliability of the designed washback scale has been
tested and verified. Each construct was modified and validated through EFA and CFA.
Indicators deleted from the constructs include v19 and v20 in motivation scale; v38,
v42, and v43 in learning strategy scale; v49 and v50 in interaction scale; and v54 and
v56 in involvement in assessment scale. The decisions for deleting those variables
remained consistent across all constructs. The detailed EFA and CFA results of each
construct after deleting those variables are attached to Appendix L. Further
quantitative analyses in this study were conducted without the deleted indicators. The
statistical hypotheses were tested and thus helped to address the corresponding
research questions in Chapter Five, Chapter Six, and Chapter Seven.
Data analysis
In Chapter Five, Chapter Six, and Chapter Seven, descriptive statistics
(percentage distribution), Multiple Correspondence Analysis (MCA), SEM, and CFA
were applied.
MCA was applied to test the theoretical assumption of washback intensity

(Green, 2007a). As it has been conceptualised that the relationships between
perceptions of test importance, test difficulty, and washback intensity (i.e., test
preparation effort in the current data) are non-linear and multi-modal, MCA was
chosen to explore their relationships. In fact, MCA is part of the descriptive analysis
family which demonstrates patterning of different categories in complex data. MCA
can represent and model the complex dataset and show “clouds” of points in a

multidimensional Euclidean space (Greenacre, 1991, 2017; Tabachnick & Fidell,
2007). The MCA results can show patterns based on the relative positions and
distributions of variables and similar categories stay closer in a “cloud”. Results will
be reported in Chapter Six.
In addition to MCA, SEM was performed to model washback mechanism

suggested by theoretical conceptualisations (Green, 2007a; Hughes, 1993; Wolf &
Smith, 1995), empirical studies (Dong, 2020; Xie, 2013, 2015a; Xie & Andrews,
2013), and qualitative findings. Therefore, it was assumed that students’ perceptions
of test design characteristics and test importance affect their test preparation practices
through motivation and test anxiety. Further, the test preparation practices also exert
influence on students’ learning outcomes (i.e., students’ self-reported SHSEET
scores). The conceptual model is proposed in Figure 4.11 and results will be reported
in Chapter Six.
Figure 4.11. Washback mechanism of the GVT
Modelling LOA practices through CFA took both theories (Carless, 2007; Jones
& Saville, 2016; Lamb, 2010) and qualitative findings into consideration. Specifically,
the proposed key LOA practices consist of four constructs: classroom interaction,
involvement in assessment, feedback, and learner autonomy (see section 3.4). All these
four constructs were explored in the student survey. In addition, the correlation
between those constructed factors were posited due to reasons that the constructs were
seemingly correlated in qualitative data and the claim that LOA cycle is systematic
and ecological (Jones & Saville, 2016). All these assumptions were tested through
CFA, with an examination of whether LOA practices constitute a four-dimensional
model. Details of this CFA model will be presented in Chapter Seven.

4.6 ETHICAL CONSIDERATIONS
Although the study is considered as low-risk, ethical issues were considered

throughout the research process (Creswell, 2015; Johnson & Christensen, 2012). After
Confirmation, the researcher applied for ethical approval from the University Research
Ethics Unit (QUT Ethics Approval Number 1800000194) before collecting qualitative
data and one variation was approved before collecting quantitative data. Further
collection of paper survey was also reported to the University Research Ethics Unit.
In the qualitative phase, informed consent was gained from the schools, teachers,
and students before conducting the classroom observations and interviews. The
participants were informed of the research objectives, project information, and audio-
recording requirements. They were assured of confidentiality during the research
processes and final report. Interviews, in particular, were completed in a quiet,
spacious, and private office located in schools. In addition, each participant was
assured of the freedom to withdraw from the research (Johnson & Christensen, 2012;
Minichiello et al., 2008) if there was anything they felt uncomfortable about, although
no one chose to withdraw. The audio recordings were stored on the QUT network
which could only be accessed by the researcher herself. The written protocols were
locked in the researcher’s office at QUT.
In the quantitative phase, both the online survey and the paper survey were
distributed anonymously. For the online survey, a brief introduction of the research
and survey (i.e., the survey flyer attached in Appendix N) was presented to participants
once they opened the online survey link. For the paper version, the participant
information sheet was provided along with the survey. Thus, participants were assured
of the anonymous and voluntary nature of the quantitative phase. Moreover,
participants were free to withdraw. Clicking “leave the survey” button allows
withdrawal from the online survey. Not returning the survey or returning the survey
without answering it allows withdrawal from the paper version. The collected data
were also stored on the QUT network with the recordings and in the researcher’s office
locker at QUT to keep safety and confidentiality. Submission of the online survey and
the paper survey was regarded as participants’ voluntary participation in this research.
At the data analysis and reporting stages, all the participants, specifically the
interview and classroom observation participants were de-identified by using

pseudonyms. In sum, ethical considerations were thought through over the whole
research process of research design, data collection, data analysis, and data reporting.
4.7 CHAPTER SUMMARY
Chapter Four illustrates the research processes of the current study. To sum up,
the current study employed an exploratory sequential MMR design to investigate the
GVT washback and LOA practices, perceptions of the GVT washback and LOA
opportunities as well as challenges, and influential factors for participants’
perceptions. The research procedure and research design are summarised in Table
4.20, and the findings of data analysis are presented subsequently in Chapters Five,
Six, and Seven.
Table 4.20
The overall research procedure for the present study
Phase Research method Sampling Time (2018) No. of participants

classroom observation convenience April to May three classes
semi-structured
Qualitative convenience April to May three teachers
interview
focus group purposive April to May three groups of students
pilot survey convenience August 186 student questionnaires
Quantitative September to
main survey convenience 922 student questionnaires
October

Chapter 5: Test Preparation: Washback
Value
Chapter Five presents the data analysis and findings with regard to the washback
value (i.e., negative or positive) from the perspectives and practices of Grade 9
teachers and students as they prepared for the Grammar and Vocabulary Test in the
Senior High School Entrance English Test (the GVT). Through the analysis of data
obtained by classroom observations of teaching and learning practices, interviews with
teachers and students, and a student online survey, this chapter addresses the first sub-
research question of RQ1:
RQ 1: What is the washback of the GVT?

Guided by the washback model incorporating Learning Oriented Assessment

(LOA), this chapter will focus mainly on the washback value of the GVT (as
highlighted by the dotted frame in Figure 5.1). In particular, the washback value at
both macro and micro levels (not including LOA practices) is reported from both
qualitative and quantitative data.
Figure 5.1. Focus of the new washback model in Chapter Five (Carless, 2007; Green, 2007a; Jones &
Saville, 2016)
This chapter includes six sections. Section 5.1 describes participants’

understanding (section 5.1.1) and use of official test reference documents (section
5.1.2). Section 5.2 reveals participants’ perceptions of GVT design characteristics
regarding authenticity (section 5.2.1), provision of context (section 5.2.2), task method
Chapter 5: Test Preparation: Washback Value 121

(section 5.2.3), assessing language use (section 5.2.4), and students’ perceptions of
GVT design characteristics as measured in the survey (section 5.2.5). Section 5.3
focusses on participants’ affective responses to the GVT, which includes test anxiety
(section 5.3.1), intrinsic motivation (section 5.3.2), and extrinsic motivation (section
5.3.3). Section 5.4 focusses on the materials that participants used to prepare for the
GVT, including both exam-oriented test preparation materials (section 5.4.1) and non-
exam oriented learning materials (section 5.4.2). Section 5.5 presents the grammar and
vocabulary learning strategies taught by teachers and adopted by students during GVT
preparation, which contain both test-use oriented (section 5.5.1) and language-use
oriented learning strategies (section 5.5.2). Most importantly, corresponding
quantitative results are jointly reported in relevant sections. Finally, all these five
topics are summarised in section 5.6.
5.1 UNDERSTANDING AND USE OF OFFICIAL TEST REFERENCE

DOCUMENTS
The official test reference documents in this study refer to both the English
Curriculum Standards for Compulsory Education (ECSCE) and the Test Specifications
for SHSEET (Test Specifications). How participants understand and implement
ECSCE principles and use Test Specifications as test preparation reference are viewed
as a key factor at the starting point for test washback, which is perceived by researchers
as intended as well as unintended purpose during the test design stage (Linn, 1993; Qi,
2007). Although teachers and students are not test designers, as implementers of
curriculum and test design ideas (i.e., teachers) and receivers of tests and test
preparation (i.e., students), their opinions are helpful to understand the intended
washback.
The ECSCE designates the teaching and testing scope for compulsory education
(see Chapter One, section 1.2.2). Moreover, the Test Specifications reflects the ECSCE
language learning objectives, since both documents emphasise learner-centred
teaching and learning. As conceptualised in Chapter Three, both the ECSCE and Test
Specifications were positioned at the macro level of the current new washback model
which incorporates LOA. Therefore, in this section, participants’ understanding of
these two official test reference documents, teachers’ implementation of the ECSCE
principles, and participants’ use of Test Specifications are reported.
122 Chapter 5: Test Preparation: Washback Value

At the macro level of washback value, the curriculum reference for the GVT
(i.e., the ECSCE) is one of its indicators, and how it links to the interpretation of
student achievements and Test Specifications is crucial to the macro level of LOA. In
order to understand the role of this indicator, in addition to classroom observations,
questions regarding the ECSCE and Test Specifications were explored in teacher
interviews and student focus groups. Teachers and students across schools reportedly
acknowledged the important role of those official documents in guidance, especially
to teachers in their teaching practices. However, as shown in the qualitative data, the
use of those documents differed.
Before moving on the the qualittaive findings, the three teachers’ information is
summarised in Table 5.1.
Table 5.1
Information on the participating teachers
Teacher SA-TA SB-TB SC-TC

Pseudonyms Lan Hu Zhang
Gender Female Female Female
Age range (years) 25-30 25-30 35-40
Class size 48 students 52 students 36 students
Teaching Three years’ teaching Three years’ teaching 18 years’ teaching
experience experiences; first year experiences; first year experiences, Director of
Grade 9 teacher, no Grade 9 teacher; head Teaching Affairs
administrative position; teacher
subject teacher
Education Master of Translation Part-time Master Bachelor degree
and Interpreting (MTI) candidate (correspondence, in 2000)
English
TEM8 TEM8 Not given
proficiency
5.1.1 Understanding the role of official test reference documents

To begin with, teachers provided valuable insights regarding their knowledge as
well as their teaching philosophy when responding to questions on the two official
documents. All three teachers agreed that the key ECSCE principles of learner-centred
characteristics (which values the learning process and emphasises the ability to use
language), differentiating students with different language proficiency levels, and
testing students’ overall ability to use language, were reflected in Test Specifications.
Furthermore, these two documents supplement each other in both content and
principles, as Hu explained:

Hu: The methods used may vary, but the principle is the same, the core ideas of
ECSCE and Test Specifications are the same. (Interview)
According to Hu, teachers were aware of the synergy between these two
documents. That is, Test Specifications was assumed to have the same learner-centred
characteristic as the ECSCE. This finding, to some extent, aligned with the assumption
of an LOA cycle that the higher-level objective from the curriculum should be applied
in creating an LOA syllabus (Jones & Saville, 2016). Therefore, it provides some
evidence of incorporating LOA principles at the macro exam level in the GVT context.
Additionally, all three teachers agreed that the GVT design reflected ECSCE
learning objectives (i.e., the overall ability to use language and the five learning
objectives of affective attitudes, cultural understanding, language knowledge,
language skills, and learning strategies). For example, while Lan recognised the testing
of language knowledge through specific items, she felt that the test paper as a whole
reflected learning objectives in the ECSCE.
Lan: I mean, actually, this overall learning objective is not realised through any
single test item. … For example, MCQ in the GVT, you can’t link it alone with
those five sub-objectives and discuss how much importance it has to these
objectives. Instead, this thing, together with other tasks and at different
teaching stages, what it can accomplish with others. (Interview)
To Zhang, however, even a single item could reflect different learning

objectives, and the GVT design followed the communicative language use orientation
and overall language use ability stipulated in the ECSCE. Therefore, although the
implementation of teaching and assessment principles advocated in the ECSCE was
perceived by Zhang as difficult, she held a positive attitude towards GVT tasks. For
instance, she agreed that test tasks could help develop students’ communicative
language use.
In a similar vein, students agreed with their teachers’ comments on the crucial
role of test reference documents. In fact, to explore the macro washback value from
the students’ perspective, mainly use of Test Specifications was reported. In focus
groups, students (e.g., Kai-SC) indicated that they did recognise the crucial role of Test
Specifications in their test preparation. However, their understanding of the document
was restricted as their focal point was the explicit knowledge designated by the
documents to be the GVT scope. Therefore, at a macro level, the evidence of students’

understanding of the official test reference document (i.e., Test Specifications) was
limited and largely restricted to instrumental purposes.
Moreover, Zhang’s accounts proved that the GVT design followed what Green
(2007a) would call the “focal construct” of the curriculum. This finding thus
conformed to the LOA cycle assumption that the test design characteristics should also
contribute to an LOA syllabus (Test Specifications in this context), which followed
the key ideas in the curriculum at a macro level (Jones & Saville, 2016). Therefore,
the test design was believed to reflect curriculum stipulated learning objectives. For
example, Zhang commented that, when designing the Cloze, test designers’ intentions
were to “test higher-order language skills, thinking, language use abilities, and
knowledge” and to “weaken the testing of grammatical knowledge” (Interview). Thus,
it was confirmed that the GVT was designed to have an overlap with the focal construct
of the curriculum, providing a potential for positive influence (Green, 2007a) and
reflecting test designers’ positive washback intentions (Sharif & Siddiek, 2017).
This part of the section reported on the participants’ understanding of the two
official test reference documents. Summarising from the qualitative findings, all three
teachers agreed on the learner-centred and thus learning-oriented characteristics of the
ECSCE and Test Specifications. Further, teachers as well as students agreed that these
two documents had an important role in test preparation. Most importantly, all three
teachers agreed that the test design of the GVT reflected the ECSCE learning
objectives and Zhang particularly commented on the alignment between focal
construct and test design intentions. Therefore, it was assumed that at the macro level,
the GVT could generate positive washback due to curriculum developers’ and test
designers’ intentions to align the test with key curriculum principles.
5.1.2 Implementing the principles in the ECSCE and using the Test
Specifications as test preparation reference
From section 5.1.1, although teachers stated that they were aware of what the
curriculum specified for teaching and assessment, their implementation of ECSCE
principles was restricted in the test preparation context. While Lan’s teaching was still
guided by those principles during test preparation, the other two teachers admitted that
their teaching had originally followed these principles when teaching lower grades,
but their adherence to them decreased according to the proximity of the test, especially
in Grade 9. Their comments on this, however, were related to the SHSEET test

preparation as a whole, not specifically to the grammar and vocabulary preparation
context. The possible reason for such comments could be that, in teachers’
understanding and beliefs, although communicative language teaching (CLT)
curriculum intentions were positive, “the traditional way was the most efficient to get
the content transferred to the students” (Huang, 2006, p. 184). Nonetheless, from a
theoretical level, the absence or weaker implementation of core curriculum principles
in test preparation teaching could lead to a negative washback result (Green, 2007a).
Although teachers held similar understandings of the ECSCE and/or Test

Specifications and they recognised their learner-centred characteristics as well as
crucial role in test preparation, their attitudes towards the implementation of those
teaching and assessment principles in the ECSCE varied. Lan held a positive attitude
towards the implementation of teaching principles in test preparation, while the other
two teachers expressed difficulties in implementing teaching and assessment
principles in the ECSCE. The reason for this difference was partly explained by Zhang
who regarded the ECSCE as “impractical” in guiding teaching (Interview), especially
during test preparation, which could be seen as a challenge for her in applying
communicative pedagogy, a curriculum requirement (Zeng, 2008).
In addition, the significance of using Test Specifications for teaching guidance

was confirmed by all teachers. However, their use of the document differed. For
example, in addition to admitting “I think we use more of Test Specifications [than the
ECSCE]” (Interview, Lan), Hu and Zhang printed the vocabulary lists for students’
reference. Lan did not mention Test Specifications during classroom observations;
however, Hu informed students of some key points from Test Specifications, but
mainly “instructed and communicated [the Test Specifications] to them” according to
her interview accounts (i.e., the teacher studies the document, and then communicates
the information to students). For instance, when encountering test-related topics in
MCQ, Hu reminded students to pay attention to them during her test preparation
teaching:
Hu: But topics such as this, for example, topics like suggestions for study or exam
anxiety, doesn’t this belong to one of the 24 topics designated by the SHSEET?
Topics like learning approaches, etc. (SB-CO29)
9
SB-CO2 means “School B, Classroom Observation 2”.

This was further supported by student’ accounts in focus groups as students
learned grammar and vocabulary by referring to what was required by the ECSCE or
Test Specifications during test preparation. For example, Kai-SC thought the Test
Specifications was very important for students as this document was foundational for
junior high school English learning.
Kai-SC: It is the major essence of our junior high school English learning; it synthesises
those most important as well as common phrases and words and records them
in this one document. (FG-SC)
In contrast, when asked about students’ use of Test Specifications, Hu said

students knew “nothing about the document” and felt that “it’s nonsense to ask
students to study this document” since “they really do not have the time”. Therefore,
she studied and paid special attention to those yearly changes in test scores and new
language knowledge in Test Specifications, which was later instructed and
communicated to her students. For example, she was aware that “no more than five
marks” were changed in the new edition of Test Specifications (Interview). Thus, both
students’ and teachers’ use of Test Specifications was restricted to some extent.
Among those three teachers, Zhang most emphasised the value of Test
Specifications in her teaching and to her students, whose proficiency was lower than
that of students in other schools. For example, she commented on the crucial role of
this document to her students.
Zhang: For them (i.e., students in another high-level school), it is voluntary [to buy
Test Specifications]. But for my students, I required them to buy. … Those
students with high language proficiency levels, Test Specifications for them is
not really such a useful guidance. It [Test Specifications] is a very basic and
general public thing. It specifies vocabulary that should be mastered by
students. … Okay, then we have to use that as our reference and guidance.
(Interview)
After explaining the fundamental role of Test Specifications in test preparation

teaching, Zhang further expressed her preference for this document to guide her
teaching in contrast to the ECSCE:
Zhang: We still prefer the narrower document [Test Specifications] better, as it

concentrates, it helps easier concentration [of teaching and learning], and it’s
less broad as well as more practical, therefore, it is more operable. (Interview)

Additionally, even though teachers (Lan, Hu) did not clearly remind students of
the existence of Test Specifications, students did take note of the weighting and the
purpose of certain tasks in the SHSEET through referring to Test Specifications. For
example, Fei-SA regarded the decrease in marks allocated to the MCQ task in the GVT
was to focus attention on assessing writing abilities, which was emphasised in teaching
as well as the curriculum.
Fei-SA: The … decrease of the test score and weight for this section is to avoid testing
pure or simple grammar and language knowledge, and to promote students’
writing abilities and the ability of writing down one’s ideas. (FG-SA)
In sum, this part of the section reported on the participants’ use of the two official
test reference documents. The overall findings indicated that although every teacher
was familiar with teaching and assessment principles in the ECSCE, only Lan
implemented these in test preparation; further, although Lan was positive regarding
the implementation of curriculum principles in teaching, the other two teachers
expressed difficulties in Grade 9; and although all three teachers used Test
Specifications as their teaching reference, only Zhang especially emphasised its
importance to her students. However, even though participants recognised the
important role of test reference documents, their use of these documents was restricted
in that teachers rarely implemented teaching as well as assessment principles in the
ECSCE during test preparation and teachers as well as students mainly took Test
Specifications as test preparation reference by focusing on test score change and test
scope designated in the document. Results from this section indicated that there may
be the potential for both negative (Hu, Zhang, and students) and positive washback
(Lan) at the macro level, since teachers implemented ECSCE principles differently
and students used Test Specifications differently.
To summarise, findings presented in section 5.1 suggest that although the

positive washback potential was assumed to happen due to the overlap between focal
construct and test design characteristics (Green, 2007a), the fact that curriculum
principles were not implemented by Hu as well as Zhang and the exclusive focus on
using test scope knowledge in official test reference documents (Luo, 2012) by Hu,
Zhang, and students indicated a negative washback value at the macro level of the new
washback model in this study. It further implied that the positive washback intention
from curriculum developers and test designers failed to some extent. In contrast, Lan’s

intention of following curriculum principles in test preparation was the evidence for
positive or intended washback. However, the limited evidence could not fully gauge
participants’ answers to macro washback value in the GVT context, which needs
further investigation from other stakeholders, including test designers and curriculum
developers.
5.2 PERCEPTIONS OF TEST DESIGN CHARACTERISTICS
As reported in section 1.2.5, four tasks are included in the GVT and examples
are presented in Appendix B. This section presents participants’ perceptions of
characteristics of the GVT, which is one major factor in the washback value dimension
of the new washback model incorporating LOA. According to Green’s (2007a)
washback variability, participant characteristics and values are assumed to be
influenced by the potential for both negative and positive washback generated from
the macro level (i.e., the overlap between focal construct and test design
characteristics). To explore the micro washback value, teachers’ and students’
perceptions of test design characteristics were first explored.
It was noticeable that teachers and students generally commented on the lack of
communicative language skills tested in the GVT. In their opinion, these skills were
only tested through specific communicative language use tasks like the “Oral Test”
task in the SHSEET paper or written dialogues in MCQ items (see item 35 in Appendix
B for example). Due to the decreasing number of those items, teachers as well as
students felt that the GVT was no longer assessing students’ communicative language
ability. Lan clearly explained this in the interview:
Lan: Communicative language? Well, for communicative language, things are like
this, previously there were probably three items, it seems to be one or two
items in MCQ to test communicative language, which are special
communicative language testing items. However, since they (i.e., test
designers) thought that communicative language has already been tested in
Listening Comprehension, and also Oral Test task. As a result, this part was
decreased from MCQ in the GVT. (Interview)
From the above quote, it was interesting to note that teachers felt communicative
language was now tested less in GVT tasks. This perception might be due to their
understanding of the concept of communicative language; that is, communicative
language was perceived by participants as involving speaking skills only, relevant to

daily life, and as authentic language use (i.e., actual communication). For example, it
was relevant to daily communications such as greetings, restaurant reservations, and
room as well as flight ticket bookings (Lan, Interview). Therefore, as there was limited
inclusion of conversations, teachers regarded GVT tasks to be non-communicative in
nature (Zhang, Hu, Interview). This finding, due to participants’ narrow understanding
of communicative language use, was perceived as a general negative perception
towards the GVT.
As proposed in the new washback model incorporating LOA, participants’

perception of test design characteristics is a key factor in micro washback value. From
the semi-structured interviews and focus groups, various perceptions with regard to
the GVT characteristics were expressed by participants. This section reports both
teachers’ and students’ perceptions according to topics. Therefore, it is possible that
participants might have varied opinions on the same topic, or they might have
conflicting perceptions regarding the similar topic. To probe those qualitative findings
in a larger scale, survey results are also included in this section.
To begin with, differing ideas among participants were identified from their
interview accounts. Generally, there were four major features of GVT tasks that
participants commented on.
• Authenticity: Some participants felt GVT tasks of MCQ and Sentence

Completion lacked authentic language, while others thought that the topics
and language involved in GVT tasks were relevant to real life;
• Provision of context: Some participants felt that there was insufficient

context in test items of MCQ and Sentence Completion, while others
considered the test items in Cloze and Gap-filling cloze as providing a rich
context;
• Test method: Some participants regarded that GVT items of MCQs were
guessable and MCQ as well as Sentence Completion tested rote-
memorisation, while others acknowledged that those tasks tested a wide
range of language knowledge. Moreover, the test method of MCQs reflected
unchanged test content in GVT tasks;
• Assessing language use: Some participants criticised that the overall ability
to use language was not tested in GVT tasks of MCQ and Sentence

Completion, but others perceived that Cloze and Gap-filling cloze more
effectively tested language use.
Among all these addressed features of the GVT tasks, participants had quite
conflicting opinions. Each feature is thus reported with corresponding interview
accounts in this section.
5.2.1 Authenticity
Both teachers and students expressed their concerns over the authenticity of
language in GVT tasks. On the one hand, some participants agreed that GVT tasks,
mainly MCQ and Sentence Completion, were lack of authentic language; on the other
hand, other participants perceived that GVT tasks, especially Cloze and Gap-filling
cloze, were relevant to real-life experiences and language use.
To begin with, participants had differing views on the language used in GVT
tasks with regard to its relevance to real-life use. All three teachers and most students
commented that the type of language required in MCQ was irrelevant to either real-
life use or future English study. Lan explains this view below:
Lan: Well, that is to say, after students learned English… regarding the context
involved in test items [of the GVT], I think it does not conform, well, to
students’ use in real life, especially for their language use overseas. (Interview)
In Lan’s opinion, the language context involved in both teaching and testing was
“the same”, however, it was “impractical” in authentic communication (Interview).
Later on, Lan claimed that “what you can say in English in schools is what others do
not use in real life” and she was astonished when she had to teach old-fashioned
phrases such as “How do you do?” in junior high school (Interview). Similarly, Hu
shared the view that language knowledge learned and tested in junior high school was
irrelevant to that used outside the class, such as in American TV series (Interview) and
Zhang felt that “there is still some disparity” between real English use and the test
content of the GVT (Interview).
This perception of lack of language authenticity was shared mainly by School A

students and Na-SB, who generally had a high language proficiency. For example,
both Lan and School A students believed that some grammar and vocabulary was
actually used in daily life and by native English speakers; however, the test design and
the designated answers made those tasks irrelevant to the real-life language use of

students in China (FG-SA). In their opinion, the mismatch between real language use
and designated test answers in GVT tasks like MCQ limited their language learning
and confused them. The following comment from Xia-SA explains their confusion:
Xia-SA: However, I still think, besides, well, some options … are hard to be
differentiated. For example, modal verbs, well, I feel that every option could
be possible. That is to say, to put them into daily oral communication, every
option could be correct. Therefore, it is really hard to differentiate which one
is the best. (FG-SA)
Therefore, both teachers and students regarded GVT test content, MCQ in
particular, to be irrelevant to real-life use or future study contexts, which indicated a
lack of authenticity of language. This negative perception was similar to Zhi and
Wang’s (2019) findings in the NMET context where the perceived irrelevance of test
content to real-life English threatened test authenticity. To this end, it indicated an
influence of negative washback potential of test design characteristics on students
(Green, 2007a).
In contrast to teachers and School A students who felt that GVT tasks, MCQ in
particular, did not really reflect real-life experiences and language use, students from
School B and School C commented that the test content of GVT tasks were relevant
to real life. They commented that both Cloze and Gap-filling cloze were “connected
with real life” (Shu-SB) and used real-life topics (Na-SB). The following example
from Fang-SC illustrates this perception.
Fang-SC: That is, … it is not only what we learned in the classroom, and also outside the
classroom, some extra-curricular knowledge. For example, Gap-filling cloze,
it relates to our real life [topics], such as artificial intelligence (AI), something
like sharing bicycle, that is topic, that is, a lot of topics [are relevant to real
life]. (FG-S1)
Although conflicting ideas were expressed across schools, participants

expressed their ideas in response to different GVT tasks. In general, it was clear that
participants thought MCQ and Sentence Completion lacked authentic language, while
Cloze and Gap-filling cloze had topics that were relevant to real-life experiences. From
these findings, it was clear that students expected authentic tasks in the GVT. In order
to achieve this, it was necessary for both text and tasks to be linked to real-life
situations and the target language use (TLU) domain (Bachman & Palmer, 1996;
Carroll, 1980; Morrow, 1991). In this way, authentic tasks which represented

communicative language skills could generate a positive washback potential (Messick,
1996). Therefore, although the use of written dialogues was decreased in the GVT,
using authentic language and reflecting real-life experiences can help GVT tasks to be
more communicative.
5.2.2 Provision of context

Participants also had differing perceptions regarding the context used in the
testing of students’ grammar and vocabulary knowledge. For some participants, GVT
tasks of MCQ and Sentence Completion were lack of context; while for others, Cloze
and Gap-filling cloze tasks provided a rich context for testing grammar and
vocabulary.
On the one hand, Hu felt that the lack of context provided in MCQ and Sentence
Completion tasks was a concern. In her opinion, MCQ items did have some
background information, but this was only given in one or two sentences (Interview).
This insufficient context in test items created difficulties in explaining answers to
students.
Hu: Some of the tasks [in the GVT], if they lack some language background or
context, in fact, even if you choose the right answers, it is hard to persuade
[test-takers about why this is the right answer]. (Interview)
Likewise, Chao-SA and Ping-SC shared a similar opinion on MCQ and Sentence
Completion. Chao-SA commented that the practice of including only one sentence in
these tasks made it hard for students to “infer from its context” (FG-SA). In a similar
vein, Ping-SC recognised that there was no meaningful context provided in Sentence
Completion (FG-SC). As a result, this finding echoed those of studies which claimed
that decontextualised and discrete-point items cannot fulfil the purpose of
communicative language testing of grammar and vocabulary (Alderson & Hamp-
Lyons, 1996; Harrington, 2018).
From qualitative data, participants’ negative perception of insufficient context

in test items mainly centred on the sentence-based tasks of MCQ and Sentence
Completion. In fact, this negative perception was not unique to the GVT, as researchers
claimed that testing language knowledge through decontextualised test methods such
as yes/no questions and discrete-point items failed to measure students’ grammar and
vocabulary knowledge in a communicative language context (Alderson & Hamp-

Lyons, 1996; Harrington, 2018). In this vein, this perception of Hu and some students
was thus regarded as a negative washback of test design characteristics on teachers.
On the other hand, Lan and Hu agreed that certain tasks in the GVT provided a
rich language context. Nonetheless, this perception was closely related to Cloze and
Gap-filling cloze. In particular, Lan thought that Cloze and Gap-filling cloze provided
a richer context since they were passage-based tasks (Interview). Further, compared
with Sentence Completion, Hu agreed that richer context was involved in MCQ.
Hu: Yes, it (i.e., MCQ) is richer than Sentence Completion. The reason is that it
involves things like more language situation of context and the like. Some new
things like ideas or current affairs or issues can be designed in those MCQ
items, which offers more helpful information. (Interview)
In fact, this idea of “providing a rich context for language use” indicted a positive
washback of GVT design characteristics on participants. Despite Hu’s latter
comparison, it was generally believed by participants that passage-based grammar and
vocabulary tasks were able to involve context at the discourse level and thus better
assess students’ ability to use language. This finding further highlighted the need for
contextualised test items in communicative grammar testing (Rea-Dickins, 1991).
5.2.3 Test method

In relation to the test methods of the four GVT tasks, participants expressed
different opinions. In detail, some participants perceived that the test method of MCQs
(i.e., multiple-choice items in MCQ and Cloze) allowed guessability, and sentence-
based tasks of MCQ and Sentence Completion induced rote-memorisation; while for
others, MCQs tested a wide range of language knowledge. Moreover, the MCQs also
reflected that there was unchanged test content in GVT tasks. Details are reported as
follows.
First, as reported by Hu and Zhang, GVT tasks using MCQs (i.e., MCQ and
Cloze) were guessable. According to them, this could explain for the decreasing use
of MCQs in test design.
Zhang: Hmm, perhaps why it (i.e., MCQ) is decreasing gradually? Because it has a
great opportunistic nature. That is to say, I may not understand anything of this
item, I guess, I could have a twenty-five percent chance of choosing the right
answer. (Interview)

This comment was in agreement with the finding from a range of studies that
using an multiple-choice format to test language knowledge such as vocabulary could
negatively influence test-takers’ performance due to their willingness to guess
(Gyllstad et al., 2015; Nurweni & Read, 1999; Read, 1993, 1995). Therefore, guessing
might negatively influence test performance.
Zhang’s belief about guessing on MCQs accorded with her teaching of this type
of item. For example, she encouraged students to randomly select an answer when they
did not know the right one and even complimented students when they chose the right
answer through guessing. The following example from Zhang’s class is illustrative:
Zhang: This, okay, the second pitfall appears here. “Fine”, what does “fine” mean?
Okay, George.
George: 罚单 [The student was translating “fine” into Chinese.]
Zhang: 罚单, how do you know?
George: I guessed.
Zhang: Guessing. Give him a big hand [applause]. (SC-CO3)
Although Zhang praised George, she was encouraging the use of test-wiseness
strategies. Therefore, this negative perception of “guessability” in MCQ items
undermined the reliability and validity of the test and thus was an indication of
negative washback of the GVT.
Moreover, this finding was in line with students’ impression that the sentence-
based tasks of MCQ and Sentence Completion tested a fixed body of content and thus
measured their memorisation of language knowledge (Fei-SA, Ling-SA, Long-SB,
Xun-SB, Na-SB, Ping-SC). For example, Long-SB noted that MCQ tested fixed
collocations and vocabulary which he learnt by rote-memorisation. Na-SB commented
that she did not need to read the whole sentence of MCQ items but could provide the
correct answer by recalling her knowledge of fixed collocations, which was agreed by
Xun-SB. This negative perception focused on the discrete-point MCQ items and thus
reflected the negative influence of test design characteristics on learning. It further
indicated negative washback that test design could have on students as indicated by
Green (2007a).
Further, reflecting a differing view on multiple-choice test method, teachers and

students maintained that the GVT assessed a wide range of language knowledge, with

participants commenting that tasks like MCQ could test “extensive and rich content”
(Hu, Interview) since there were 15 items included in the MCQ task (Fang-SC). As all
language knowledge gained throughout three years of junior high school study should
be tested through the SHSEET, teachers felt that a large number of items in the
grammar and vocabulary section enabled a thorough assessment of students’ language
knowledge. This reflected the advantage of using MCQs in large-scale tests, since this
test method was traditionally preferred to test both grammar (Rea-Dickins, 1997) and
vocabulary knowledge (Nation, 1990; Read, 2000; Schmitt, 2000). It was further
acknowledged by Moss (2001) who claimed that MCQ tasks could test a wide range
of language knowledge in a short time. Likewise, due to the number of test items in
both Cloze and Gap-filling cloze (i.e., ten for each), Fang-SC also felt that those tasks
tested a wide range of students’ language knowledge (FG-SC).
In fact, this conflicting opinion regarding the task method of MCQs echoed the
debate on the advantages and disadvantages of multiple-choice items in testing
grammar and vocabulary knowledge. As a result, this perception from participants
indicated both the negative and positive washback potential of test design
characteristics on participants as conceptualised in the framework (Green, 2007a).
Additionally, the test methods in the GVT revealed that the test content of certain
GVT tasks were not changed since the original tests. For example, Lan, Hu, and Yao-
SC commented that what was tested in those tasks was fixed, which mainly related to
MCQ, Cloze, and Sentence Completion tasks. For example, Lan regarded MCQ as
testing certain language knowledge points, which were explicit and unchanged year
after year.
Lan: Because every MCQ item tests a certain aspect of language or vocabulary
knowledge. Taking vocabulary as an example, such as articles, comparative
degree, it tests a very specific language knowledge point. (Interview)
Additionally, all students shared the same perception. For example, Chao-SA
felt frustrated with MCQ as he knew what the item was trying to measure once he read
the stem.
Chao-SA: Because, sometimes, doing MCQ is really annoying, it, because it has obvious
tricks. That is to say, … I can guess what it indeed wants to test. Because
sometimes when I see, for example, “future tense of subject and present tense

of subordinate clause”, basically you can spot this structure at a glance. (FG-
SA)
Further, due to the enduring use of certain test tasks, Hu constantly reminded
students of common and frequent language knowledge points tested in GVT tasks in
her class.
Hu: MCQ in the GVT, there must be one item which tests? What kind of
knowledge of object clause?
Students: Declarative sentence order. (SB-CO2)
Moreover, other tasks such as Cloze also measured fixed content like
differentiating adjectives or adverbs (SB-CO4), and Sentence Completion was viewed
as the most fixed. According to Hu, Sentence Completion was full of “tricks” such as
the testing of interrogatives; while for Ping-SC, Sentence Completion always tested
the same type of items such as “negative sentence” and “transformation of
synonymous sentences”. Likewise, students commented that the topics in MCQ did
not change from year to year, which could be boring for test-takers. The following
explanation from Fei-SA illustrates this:
Fei-SA: In my opinion, it (i.e., MCQ) can, that is to say, towards that task, its topics
can be not only, well, like what we were tested in every exam, like topics of “I
borrow one pencil from you”, “I borrow one pencil-box from you”, “I write
one essay”, etc. [laughed] I feel these are very… I feel, well, these topics are
used too frequently and become very boring. (FG-SA)
According to teachers, the unchanged content of the GVT was the evidence of
the simple testing of basic language knowledge in MCQ and Sentence Completion
(Lan, Hu), and that some items focused “too much on grammar knowledge itself” (Hu,
Interview). Further, this testing of basic language knowledge was perceived by Zhang
as reflecting a lack of overall English ability tested in these tasks (Interview).
Therefore, once students knew how to change the “be-verb” pattern in the sentence,
this grammar task could be completed (Hu, interview).
In general, both teachers and students generally regarded the focus of GVT tasks
as unchanged and simply tested students’ basic language knowledge of grammar. In
their opinion, MCQ, Cloze, and Sentence Completion tested fixed content of language
knowledge, and the test content in MCQ and Sentence Completion was too basic and
could be mastered by students through rote-memorising knowledge such as grammar

structures. This finding was thus perceived as negative in measuring students’
language abilities since this test focus could negatively influence participants’
receptiveness to learning grammar in context for meaningful use (Macmillan et al.,
2014; Yang, 2015). As a result, test design characteristics of the GVT indicated a
negative washback potential (Green, 2007a). Further, it reflected students’ confidence
in their competence and also indicated that students expected more interesting and non-
repetitive tests which could test their communicative ability.
5.2.4 Assessing language use

Besides testing language knowledge, the extent to which GVT tasks tested
students’ overall ability to use language was also perceived differently by participants.
On the one hand, some participants commented that GVT tasks of MCQ and Sentence
Completion did not assess language use; on the other hand, the other participants
perceived that GVT tasks of Cloze and Gap-filling cloze more effectively tested
students’ language use than MCQ and Sentence Completion.
As reported at the beginning of section 5.2, students considered that no

communicative language use was measured in the GVT, as written dialogues were no
longer included (FG-SB, Long-SB). Although this general negative perception was
due to their specific understanding of the “communicative language” concept, some
students did regard MCQ as incapable of testing students’ ability to use language (Fei-
SA, Long-SB, Xun-SB, Hui-SB). According to Xun-SB, by using his knowledge of
language, he could easily select correct answers to most MCQ. Further, Hui-SB
commented on the lack of testing of meaningful language use in MCQ.
Hui-SB: Well, in my opinion, MCQ in the GVT does not test, well, does not test us
students’ ability to use language, because it contains all basic, relatively
elementary things. … What it (i.e., MCQ) does is to help use to lay the
foundation, to further set up our future language learning. Therefore, I think it
does not test [our overall ability to use language]. (FG-SB)
As perceived by Hui-SB, MCQ did not test his ability to use language. This
perception was in relation to the fact that MCQ and Sentence Completion tasks
contained easy tasks which tested more language knowledge rather than the overall
ability to use language.
In contrast, teachers did agree that the GVT measured students’ overall ability
to use language. The term “the overall ability to use language” has connotations of

“having communicative competence”, that is, knowing how to use the language
knowledge. For example, Zhang explained that language use was included in
responding successfully to Cloze tasks.
Zhang: Okay, from my viewpoint, how does that task have a communicative
characteristic? Well, in my opinion, it is still a higher requirement in students’
logic and understanding abilities, how do they use the knowledge. Also, it is
the same to Cloze, … Gap-filling cloze tests students’ ability to use language.
(Interview)
The above explanation from Zhang indicated that passage-based tasks such as
Gap-filling cloze did test students’ language use ability since it required students to
use language knowledge to complete the task. This finding aligns with the use of
gapped-sentence tasks in the Use of English section of the Cambridge English:
Proficiency (CPE) test, which was designed to reflect the synergy between
communicative teaching and assessment and was claimed to measure candidates’
productive language knowledge and linguistic competence (Booth & Saville, 2000;
Docherty, 2015). In addition, both Hu and Zhang agreed that the Gap-filling cloze also
tested other language abilities such as making inferences, which was perceived as a
part of overall language use ability for students. This positive test perception from
teachers seemed to conflict with their general negative perception that the GVT did
not test communicative language ability.
Likewise, all students agreed that Cloze and Gap-filling cloze, especially the
latter, were comparatively more effective tasks than MCQ and Sentence Completion
to test their overall ability to use language. For example, Fang-SC commented that
Gap-filling cloze tested more abilities to use language and a wider scope of language
knowledge (FG-SC). By “more abilities”, she meant that Gap-filling cloze was not
only testing language knowledge but also involved skills such as problem-solving (FG-
SC). This perception thus contrasted with students’ negative perception of MCQ and
Sentence Completion, which were perceived as not testing students’ ability to use
language. However, according to students, although the number of MCQ items with
written dialogues decreased, the retention of such items in the MCQ was the evidence
of testing their overall ability to use language. To recall, the written dialogue tasks, as
reported by Zhang, were decreased in MCQ, but one still remained (see Appendix B).
Further, Meng-SC considered that applying correct word meaning in MCQ was also
an indication of testing the overall ability to use language (FG-SC). This comment thus

reflected the understanding that communicative grammar testing calls for teaching
attention to both form and meaning (Rea-Dickins, 1991).
Further, Zhang felt that traditional MCQ items in the GVT also tested students’
application of foundational language knowledge and Hu stated that MCQ and Sentence
Completion items were also able to test students’ reading ability. Zhang’s response is
presented below:
Zhang: I think it (i.e., MCQ) is testing both students’ mastery and application of
language knowledge. … In my opinion, first, you [students] should master
these, these basic vocabulary and grammar. And then, you then try to apply
these knowledge and skills to solve the problem and apply them in the created
situations by test designers. Right, I think it is generally about this, about the
testing of language use ability, I think so. (Interview)
Similarly, all students across three schools agreed that the GVT assessed
students’ language skills other than language knowledge itself. Students agreed that
reading comprehension and understanding passage meaning were tested in Cloze and
Gap-filling cloze (Ming-SA, Xun-SB, Ping-SC, JING-SC, Meng-SC). In addition,
translation ability was tested in Sentence Completion as students needed to write down
answers (Ming-SA); and writing as well as spelling ability was tested in Sentence
Completion and Gap-filling cloze tasks (Fei-SA, Jing-SC, Hua-SC). Students also felt
that Cloze and Gap-filling cloze tasks tested logic and language intuition (Yao-SB,
Ping-SC).
From participants’ interview accounts, although they perceived that no actual

communication was measured in GVT tasks, these tasks did test some aspects of
communicative competence (i.e., the overall ability to use language). To this end, as
stipulated in the ECSCE, this perception indicate that the GVT design met the overall
learning-oriented objective in assessment principles. As a result, this indicated a
positive washback from test design characteristics on participants when referring back
to the theoretical framework (Green, 2007a).
Moreover, regardless of the negative perception of MCQ and Sentence

Completion did not test language use, all three teachers considered the test quality had
improved in accordance with the intention of test designers.

Hu: It might be that they relaxed their exclusive focus on grammar testing. … Not
purely testing a blunt knowledge of grammar, so that it improves students’
ability to use the English language. (Interview)
From the above quote, it was clear that Hu thought the GVT was changing from
assessing language knowledge to improving students’ ability to use language. This
positive shift in test design was also perceived by Lan and Zhang, who categorised the
current GVT as moving closer to the inclusion of more authentic context in test items.
These perspectives on improved test quality indicated the positive washback of the
GVT on participants.
5.2.5 Perceptions of GVT design characteristics as measured in the student

survey
In order to align with qualitative findings, the survey items were designed with
both negative and positive perceptions of GVT design characteristics. After applying
the Exploratory Factor Analysis (EFA), three constructs were extracted from the
student survey (see Appendix L). The first construct of negative perception (v1-v6)
was to gauge students’ negative perceptions of the GVT tasks such as the negative
perceptions in relation to the test method (v1, v3, v5) and no testing of language use
(v2, v4, v6). The second construct of positive perception (v7-v10) was to gauge
students’ positive perceptions of GVT tasks regarding test methods. The third
construct of positive perception (v11-v14) was thus to gauge students’ perceptions of
assessing language use of GVT tasks.
To further explore students’ perceptions of test design characteristics, results

from the student survey are presented in Table 5.2 (see next page). It should be noted
that the student survey was a five-point Likert scale; however, for the convenience of
reporting, the researcher calculated and reported the combined data of the first two
points (1 and 2) and the last two points (4 and 5). Therefore, each variable with three
sets of frequency in percentage is reported. Moreover, indicators v1 to v6 were
reversely coded as they were negatively worded in the survey (see section 4.5.3).
As shown in Table 5.2, the proportion of students who disagreed that the GVT
only measured their rote-memorising ability was remarkably higher than those who
agreed (52.8% versus 17.7%); the same tendency was evident for students’
disagreement versus agreement with Sentence Completion and Gap-filling cloze only
testing their spelling (67.3% versus 15.4%), MCQ format tasks only testing guessing

ability (72.4% versus 10.6%), the GVT measuring their ability of rote-applying
grammar rules (63.0% versus 13.6%), MCQ and Cloze only testing their ability to
eliminate distracting options (52.5% versus 18.2%), and the test content in the GVT
focusing on learning (58.0% versus 18.2%).
Table 5.2
Student Perceptions of GVT design characteristics10(see instrument reliability and validity in section
4.5.3)
10
The overall percentage of each variable was 100 ±0.1 because of the rounding error.

Further, except the indicator v11 which showed that the proportion of students
who agreed that language situations in GVT tasks were in line with real-life situations
was only slightly higher than those who disagreed (27.2% versus 22.8%), a different
tendency was identified for the other positive perceptions. In detail, the proportion of
students who agreed that they read the whole sentence in MCQ and Sentence
Completion tasks was higher than those who disagreed (41.4% versus 27.3%), which
was in contrast to Na-SB’s experience of selecting correct answers without reading the
whole sentence of MCQ items. The same tendency was evident for students’
agreement versus disagreement percentage for reading whole passage in Cloze and
Gap-filling cloze (38.9% versus 30.3%), understanding the sentence context in MCQ
and Sentence Completion (57.3% versus 14.1%), grasping the gist of passage in Cloze
and Gap-filling cloze (61.7% versus 11.9%), testing various topics (50.4% versus
15.0%), testing overall ability to use language (59.5% versus 12.2%), and testing
language use in different language situations (60.6% versus 10.7%). In general, these
quantitative results were in line with the qualitative findings. They (e.g., v13) further
echoed students’ interview accounts such as School C students agreed that
understanding the context of situation in MCQ and Sentence Completion was an
indication of testing ability to use language.
To conclude, students’ perceptions of the GVT were gauged in the student

survey, which included both negative and positive perceptions, with negative
perceptions mainly expressed towards MCQ and Sentence Completion tasks.
Although both qualitative findings and quantitative results showed participants’
concerns over similar topics, the quantitative results showed that generally positive
perceptions were reported by survey participants. To clarify, survey results showed
quite opposite findings to the qualitative results, as the majority of survey participants
disagreed with those negative perceptions expressed by interview participants.
However, students in focus groups who commented negatively on MCQ and Sentence
Completion were from a smaller sample than survey participants. Therefore, although
conflicting ideas existed in participants’ perceptions, their perceptions towards the
GVT as a whole were positive. Nonetheless, the existence of those negative
perceptions appeared to exert no strong washback on students.
In section 5.2, both negative perceptions and positive perceptions from

participants were reported. The conflicting perceptions (i.e., both negative and positive

perceptions) centred on the authenticity of language, provision of context, test method,
and assessing language use. To highlight, negative perceptions mainly focused on
sentence-based tasks of MCQ and Sentence Completion and/or the test method of
MCQs, which reflected students’ current needs in language learning and assessment;
that is, to learn and test language knowledge within a richer context and at the passage
rather than sentence level and it also highlighted the need to use authentic texts and
design meaningful tasks in grammar and vocabulary testing (Messick, 1996; Oller,
1979). It is important to note that quantitative data showed a general tendency of
deviation from negative perceptions and a general tendency towards positive
perceptions. Considering both qualitative and quantitative data, GVT design
characteristics showed both negative and positive washback of the test on teaching and
learning, but a more positive washback was found in students’ quantitative data as they
tended to have more positive than negative perceptions towards GVT tasks.
5.3 AFFECTIVE FACTORS
In addition to participants’ perceptions of test design features, their affective

characteristics were also assumed to influence test preparation as indicated both in
literature findings (see Figure 2.2) and washback variability (Green, 2007a).
Moreover, to theorise LOA, learners’ socio-psychological predispositions such as
emotions and motivation are considered as one of the key dimensions (Turner &
Purpura, 2016). Such affective factors, also extensively discussed in washback
literature (see, for example, Berwick & Ross, 1989; Chen et al., 2018; He, 2010;
Tsagari, 2011), constitute a key point to investigate since they were part of
participants’ characteristics and values at the micro level of washback value.
Therefore, this section explores participants’ affective responses to the GVT and
towards English grammar and vocabulary learning. Through this exploration, the
impact of the GVT exerted on teaching and learning is revealed.
Among participants’ expressions of affective factors, both motivation and test

anxiety were identified, which were thus incorporated in the quantitative survey
design. As such, in this section, the qualitative results from participants are first
reported, which is followed by the results from the student survey. To begin with, this
section is constituted of three topics as identified from qualitative data and further
investigated through the student survey. The three topics of test anxiety during GVT

preparation, intrinsic motivation, and extrinsic motivation for learning English
grammar and vocabulary are reported and discussed subsequently in this section.
5.3.1 Test anxiety

Under the pressure of the GVT, participants often expressed their feelings and
emotions during classroom observations and interviews. The common affective factor
conveyed as well as widely discussed by both teachers and students was test anxiety.
However, even though participants felt anxious during test preparation, the source of
anxiety differed among teachers and students. In this section, findings from qualitative
and quantitative data regarding participants’ test anxiety are documented.
Teachers’ test anxiety about the SHSEET was explicitly shown in their
classroom interactions. A common phenomenon was that teachers expressed their
anxiety after calculating students’ test scores or reviewing test answers in the class.
For example, when she was about to announce test answers in the class, Lan reminded
students to calm down and asked them to not scream when their answers were right or
sigh when their answers were incorrect (SA-CO1). Likewise, Lan was observed trying
to persuade students to make a compromise rather than arguing for an alternative
answer to a GVT exercise (SA-CO2). However, this worry about students’
argumentative behaviour was unique to Lan’s class since her students were generally
high-achieving. Zhang’s test anxiety was mainly centred on her students’ test scores.
Due to the generally low language proficiency of her students, she frequently and
explicitly pointed out her students’ problems and explained her concern in her classes.
For example, when Zhang found that students made mistakes on even the easiest MCQ,
she asked students to stop doing the exercises and commented that it was “meaningless”
to continue (SC-CO1).
Compared to teachers who generally felt anxious about the test, students from
all three schools reported experiencing anxiety, but to a differing extent and for a
variety of reasons. To highlight, students from these three schools reported different
patterns of test anxiety.
In the School A focus group, only Fei-SA and Ling-SA reported feeling anxious
about either the GVT or the SHSEET as a whole, which was different from other
students who reported that they did not experience any change in emotions. For
instance, Ming-SA explained that he actively tried to relax himself so that he wouldn’t

feel nervous as the SHSEET date approached. He was aware that worrying about the
test would impact on test performance. Therefore, he claimed that he spent more time
on entertainment during the test preparation period. This method of self-regulation
explained why students like Ming-SA did not feel anxious about the high-stakes test.
Several of his peers (i.e., Wei-SA, Fei-SA, and Chao-SA) also shared a similar
understanding, which helped them to ease their test anxiety. This finding echoed
Harlen and Deakin-Crick (2003) who found that anxiety was highly related to
participant characteristics, including self-regulation. It further resonated with what
studies claimed about the relationship between good language learners and affective
learning strategies they adopted, such as anxiety reduction and self-encouragement to
control their emotions as well as attitudes (Oxford, 1990). In other words, when
approaching the SHSEET date, students tried to “change the mindset” to face the exam
in a relaxed way, or even “communicate with English as a friend rather than an enemy”
(Chao-SA).
However, for Ming-SA, although he claimed that he was not anxious about the
test, he was quite emotional when talking about the test items. In particular, he
mentioned that if he made a small mistake in MCQ, he felt that “I will collapse” (FG-
SA), which indeed revealed his anxiety. This further indicated that even high-
achieving students from School A felt anxious when approaching the test date. For
example, Fei-SA worried about making mistakes in easy tasks like MCQ in the GVT.
Like the majority of School A students, School B students expressed that they
were not anxious at all. Generally, most students claimed there was “no pressure”
regarding the GVT and the SHSEET as a whole. For example, Long-SB mentioned
the strategy of “treating common tests as high-stakes tests and treating high-stakes
tests as common ones”. However, students’ anxious feelings were influenced by Yi
Zhen which was a pre-SHSEET test that enabled students to be pre-enrolled into their
expected senior high schools based on their Yi Zhen test scores. Hence, this test seemed
to mitigate their test anxieties towards the GVT and the SHSEET. It was thus perceived
as positive which was congruent with the positive washback of the exam reform of
NMET on test anxiety as two tests a year were found to lower students’ test anxiety
(Chen et al., 2018). However, the use and impact of such mock tests before actual test
preparation was unexplored in this SHSEET context since it was beyond the current
research scope.

In contrast, most students in the School C focus group, no matter from which
language proficiency level, expressed that they were “very anxious” towards the
SHSEET (Kai-SC). This was also evidenced by their exam-oriented learning practices
during test preparation. As a result, it could be argued that the affective factor of test
anxiety indicated negative washback of the test on students’ learning. This damaging
influence from the SHSEET in School C was similar to Zeng’s (2008) findings; that
is, the SHSEET aroused students’ anxiety as well as stress which led them to ‘learn to
the test’.
To summarise from the qualitative data, teachers expressed their test anxiety
towards the SHSEET in general and they felt anxious about students’ GVT
performance during test preparation. This anxiety was also felt by Fei-SA, Ling-SA,
and School C students. It thus indicated a negative washback of the GVT on teaching
and learning. Nonetheless, students’ opinions on test anxiety varied from school to
school. School A students were generally not anxious since they felt the GVT was not
challenging to them. Further, although School C students were most anxious which
echoed their teacher’s views, the existence of Yi Zhen was found to mitigate students’
affective factor of test anxiety among School B students. However, it is impossible to
generalise from these findings as the qualitative sample in this study was small.
Therefore, even though all three teachers experienced anxiety, the test brought anxiety
to some students, but not to others (Alderson & Wall, 1993). This finding indicated a
complex washback result regarding students’ test anxiety.
In order to further investigate students’ test anxiety in a larger sample, student

survey data were analysed, and the results are shown in Table 5.3.
Table 5.3
Indicators of test anxiety in the GVT context (see instrument reliability and validity in section 4.5.3)

As shown in Table 5.3, the proportion of students who agreed that their appetite
was unchanged during test preparation was much higher than those who disagreed
(59.5% versus 14.9%). The same tendency was evident for not feeling afraid of
comparing test scores with others (41.1% versus 28.7%) and feeling relaxed (38.4%
versus 27.1%) but reversed for students who worried about teachers’ and parents’
criticism (27.9% versus 39.9%). Except for students’ test anxiety relating to teachers’
and parental pressure, the general quantitative results aligned with the majority of
students’ comments in focus groups. Therefore, generally more students were not
seriously affected by test anxiety of the GVT than those who were.
5.3.2 Intrinsic motivation

Motivation, a decisive factor in successful L2 learning (Gardner, 1985), includes
three categories: intrinsic, extrinsic, and amotivation (Deci & Ryan, 2010). Students
described feeling intrinsically motivated, because they enjoyed learning English
grammar and vocabulary due to their inherent satisfaction or interest rather than
achieving certain utilitarian purposes or outcomes (Deci & Ryan, 1985; Gardner &
Lambert, 1972; Ryan & Deci, 2000; You & Dörnyei, 2014). In fact, interest, which
was found to be highly associated with motivation through intrinsic means (Weber,
2003), was expressed by some student participants. In focus groups, mainly students
with comparatively higher language proficiency (Ling-SA, Chao-SA, Na-SB)
emphasised the importance of having an interest in language learning. For example, in
response to whether the GVT could promote students’ learning, Chao-SA believed that
cultivating students’ interest in English learning should be the priority in junior high
school and revealed that he was intrinsically motivated to study English.
Chao-SA: Well, I also think that for English, the most important thing is, currently at the
junior high school level, to cultivate that, students’ interest in English, because
interest is the motivation. Such as, usually, if I do exercises, I will feel that
doing, reading English while doing exercises is really interesting, and thus I
will keep on doing the exercises. …… I felt like copying those words, I think
they are very, very, very “interesting”. Anyway, I just feel like that I am not
going to be tired. (FG-SA)
Due to the fact that the aforementioned students (Lin-SA, Chao-SA, Na-SB) who
had an intrinsic motivation in learning English were mainly high-achieving students,
it could thus support a relationship between motivation and learners’ language
achievements as claimed by researchers such as Gardner et al. (1985). To further

explore students’ intrinsic motivation in a larger sample, survey data are presented in
Table 5.4.
Table 5.4
Indicators of intrinsic motivation (see instrument reliability and validity in section 4.5.3)
As shown in Table 5.4, the proportion of students who agreed that they learned
English grammar and vocabulary for the purpose of promoting future learning was
much higher than those who disagreed (83.3% versus 2.3%); the same tendency
remained for reading books and surfing the Internet (80.7% versus 5.2%), helping
English language communication (78.4% versus 5.3%), and using resources to
understand foreign cultures (73.4% versus 6.5%). The results showed that the survey
respondents were overall intrinsically motivated to learn English grammar and
vocabulary during GVT preparation.
5.3.3 Extrinsic motivation

Students also expressed their extrinsic motivation in learning English for the
purpose of achieving certain outcomes (Deci & Ryan, 1985; Ryan & Deci, 2000; You
& Dörnyei, 2014). From focus group data, it was found that students had three types
of extrinsic motivation driven by instrumental purposes of language use, monitoring
learning progress, and using test scores to compete with peers. These purposes were
more about English learning in general rather than the GVT specifically. First, using
English or grammar and vocabulary knowledge in communicating in life and society
was considered by some students as essential. For example, Xia-SA expressed that
learning English was to prepare herself for joining the society and using the language
for international communication in the future, and for Wei-SA, English learning was

to cultivate an international insight. In addition, Yao-SB and Shu-SB felt the test could
improve their learning in that it helped them to do exercises. For example, regarding
their intention of using English in class, Yao-SB replied that “using more English
language may provide… a lot of help when I do exercises” (FG-SB). Therefore, Yao-
SB hoped to use the language for the purpose of doing exercises and improving her
study. This finding aligned with that of He (2010) that the participants’ instrumental
motivation of being successful in the GSEEE was an indication of negative washback.
Second, students from all three schools agreed that through taking GVT
exercises, they could monitor their grammar and vocabulary learning progress as well
as learning outcomes, as a student explained below:
Xia-SA: Because, normally we learn a lot of grammar knowledge, and then, doing this
task (i.e., the MCQ task in the GVT) is like assessing the normal [grammar
and vocabulary learning]. So, it checks whether you truly learn this knowledge
well or not, otherwise, what you do not know, you can still, that is, to fill in
the gap gradually. This is somehow helpful to the difficult tasks that follow
afterwards. (FG-SA)
According to Xia-SA, the purpose of taking GVT tasks was to monitor her
learning progress, detect learning gaps, and facilitate more challenging task
completion in the test. This intention of using the GVT to monitor learning progress
and diagnose learning gaps proved that students were extrinsically motivated to learn
English grammar and vocabulary in the GVT preparation stage. This finding coincides
with research (Popham, 2001; Qi, 2007) which found an instrumental purpose of tests
brought about a negative washback result.
Third, competing with peers was also regarded by students as extrinsically

motivating their English learning during GVT preparation. For example, Wei-SA
argued that in the test preparation stage, he kept comparing himself to peers, which
motivated him to spend more effort in study. This extrinsic motivation for English
study was also expressed by Yao-SB and Shu-SB, who thought ranking could motivate
them to compete with others and study harder.
From qualitative findings, it was common that students usually had short-term
goals of achieving higher test scores (Buck, 1988) which was considered as an
extrinsic motivation in this study. Therefore, the extrinsic motivation in regard to the

achievement on the GVT motivated students towards exam-oriented language learning
during test preparation. Further evidence was explored through student survey results.
Table 5.5
Indicators of extrinsic motivation (see instrument reliability and validity in section 4.5.3)
As shown in Table 5.5, regarding participants’ extrinsic motivation, the

proportion of students who agreed that learning English grammar and vocabulary was
to help their senior high school enrolment was much higher than those who disagreed
(64.7% versus 7.6); the same tendency was evident for getting a higher test score in
various tests (64.0% versus 7.0%), becoming a successful member of society (53.0%
versus 10.5%), and helping pass language evaluations in their future career (64.2%
versus 7.9%). These findings indicated that the survey participants were generally
extrinsically motivated in English grammar and vocabulary learning during GVT
preparation.
In order to achieve visible washback, participants’ motivation should be

sufficient to influence teaching and learning (Hughes, 1993) and to motivate test
preparation effort (Green, 2013). To summarise section 5.3, both qualitative and
quantitative results of affective factors from participants were reported. Generally,
teachers felt anxious about the GVT and the SHSEET, which indicated a negative
washback of the test. Likewise, some focus group participants were found to be
anxious (mainly Fei-SA, Ling-SA, and School C students) while others were more
relaxed. In addition, students were also found to be intrinsically motivated (Ling-SA,
Chao-SA, Na-SB) and extrinsically motivated (Wei-SA, Xia-SA, Ling-SA, Yao-SB,
Shu-SB) to learn English grammar and vocabulary during test preparation. These
qualitative findings were further supported by quantitative results which revealed that
the majority of survey participants did not feel anxious during GVT preparation, but

students were concerned about criticism from teachers and parents for their test
achievements. The data further showed that students were highly motivated in learning
English grammar and vocabulary during GVT preparation. As intrinsic motivation
promoted students’ language learning and test anxiety as well as extrinsic motivation
encouraged exam-oriented learning, it was thus evident that the test exerted both
positive and negative washback on participants.
5.4 TEST PREPARATION MATERIALS
At the micro level of washback value in the current proposed washback model,
the “teaching and learning” factor includes various teaching- as well as learning-
related aspects of which test preparation materials are an indicator. Test preparation
materials were categorised as participants’ resources to meet test demands (Green,
2007a). In this study, it was thus viewed as teaching and learning materials for
grammar and vocabulary. From classroom observations and interviews, participants
bought test preparation materials (either designated by the school or purchased
individually) to undertake test review practices and better prepare for the exam. It was
found that the materials used both in and outside class for English language teaching
and learning during test preparation were mainly exam-oriented. In this study, the term
“exam-oriented” conveys a similar meaning as “test-use oriented” which will be
discussed with learning strategies in section 5.5. Therefore, exam-oriented materials
and test-use oriented strategies refer to materials and strategies that were adopted by
participants to exclusively prepare for the test. Qualitative data showed that different
kinds of exam-oriented test preparation materials were adopted by teachers and
students; however, it is noticeable that students with high language proficiency levels
(mainly those high-achieving students from School A) also utilised non-exam oriented
learning materials during test preparation. Findings are reported as follows.
5.4.1 Exam-oriented test preparation materials

From the qualitative data, it emerged that four types of exam-oriented materials
were used by participants: test-based textbooks, test review coaching books, grammar
and vocabulary lists in the ECSCE and Test Specifications, and self-selcted test
preparation materials such as mock test papers. Each of these four materials is
presented with examples in this section.

Test-based textbooks
The first resource was the textbooks published by People’s Education Press
(PEP) as mentioned by mainly teachers, which reflected the test scope stipulated by
the ECSCE. However, teachers’ use of textbooks varied to some extent. Generally,
both Hu and Zhang used this series of PEP textbooks in their junior high school
teaching. Although Lan used these textbooks in the first semester of Grade 9 when she
started SHSEET test preparation, she was the only one who mentioned the change of
textbooks in Grade 9:
Lan: …. in Grade 7 and 8, we used [authentic] textbooks which have a higher

difficulty level since we hope that our students can reach a better learning
level. Therefore, we chose more difficult textbooks in teaching. … That is to
say, during SHSEET test preparation, we need to go back to use the textbooks
that cover the test scope of SHSEET. Therefore, we went back to use the
textbooks published by PEP in Grade 9. (Interview)
According to Lan’s account, the school’s decision to change from more authentic
and challenging textbooks to the test-related textbooks in Grade 9 could be to support
students to better prepare for the exam when the test date was approaching. This was
similar to findings from other test preparation contexts, for example, where passing
the test became the major goal in the time leading up to the CET-4 (Zhan & Andrews,
2014). Further, although not explicitly mentioned by Lan, her students in the focus
group mentioned their practice of “going through the vocabulary and grammar
content” in all the test-based textbooks for test review in the first semester of Grade 9.
However, this change to less authentic textbooks and the focus on grammar and
vocabulary knowledge in the textbooks indicated negative washback and an intense
washback on teaching and learning. This finding aligned with those from other studies,
where teachers’ textbook use during test preparation only focussed on test-related
content (Saif, 2006).
Test review coaching books

The second resource was the test review coaching books which were designed
for test teaching and preparation11, and varied from school to school. According to
11
The test review coaching book is a typical material designed for test review and test preparation. In
spite of the different names for various coaching books, they have almost the same structure; that is,
language knowledge summarised in a systematic way, past test items on certain language knowledge,
and mock exercises for students to strengthen their knowledge learned.

teachers, they were mainly adopters of the decision or implementors of the decided
teaching materials, since it was the school (i.e., Head of Curriculum) who chose what
coaching books to use. Nevertheless, they described the crucial role of test review
coaching books in Grade 9. This was not unique to the Chinese test preparation
context, as Wall and Alderson (1993) found similar phenomena in the third term of the
academic year when teachers ignored textbooks and instead taught from test
preparation materials for the O-Level examination in Sri Lanka. As the use of this kind
of test review coaching materials resulted in the narrowing of learning and teaching
scope, it was thus believed by researchers (see, for example, Damankesh & Babaii,
2015) to bring about negative washback. Therefore, it is evident that the phenomenon
of “narrowing of the curriculum” (Madaus, 1988) occurred in the current SHSEET test
preparation context. For instance, when responding to an interview question on
teaching materials in test preparation, Hu emphasised the use of teaching guidance
materials:
Hu: In Grade 9, the teaching is definitely guided by this teaching guidance

materials. It’s not possible that you buy this material but do not use it, right?
(Interview)
According to Hu, their school chose Ba Shu Talents SHSEET Final Review (巴
蜀英才中考总复习方案) as the main test preparation guidance book, but the decision
was made by the Head of Curriculum (Interview). Therefore, it was obvious that
teachers had little power or authority in decision-making regarding test preparation
materials, even for the classes that they were responsible for. This phenomenon was
further illustrated by Zhang, who had the role of Director of Teaching Affairs in her
school, when she expressed her regret regarding her choice of major teaching guidance
materials for test preparation: “this time, I did not choose very well.” (Interview)
According to her, the quality of the material was not satisfactory.
Similarly, students also mentioned the use of school-designated test review

coaching books to learn English grammar and vocabulary. In particular, students
confirmed the same books that teachers mentioned in the interview, which included:
New Direction (新方向) for School A students, Ba Shu Talents SHSEET Final Review
(巴蜀英才中考总复习方案) for School B students, and New Pivot (新支点) for
School C students.

Therefore, regarding the teaching guidance material, each school used mainly
test review coaching books purchased school wide. This commercial test preparation
material was exam-oriented and was the main resource for the SHSEET, especially
language knowledge, test review, and preparation. Therefore, the heavy reliance on
these books was an indication of negative washback (Wall & Alderson, 1993).
Grammar and vocabulary lists in official test reference documents

The third resource for test preparation was the vocabulary lists in the ECSCE
and/or Test Specifications. As this study focuses on investigating grammar and
vocabualry test preparation and these two kinds of language knowledge are the
foundation of language learning, it was found that teachers’ test preparation for the
GVT was in alignment with the ECSCE and Test Specifications required knowledge
scope. In detail, Lan mentioned that teachers prepared vocabulary knowledge
according to the ECSCE before Test Specifications were released in April 2018
(Interview). This finding aligned with teachers’ use of official test reference
documents when they focused mainly on their designated test scope (see section 5.1.2).
This negative influence from the exam was not unique to the SHSEET context, as
studies (see, for example, Luo, 2012) found that teachers tended to focus on mainly
what was relevant to test content from the curriculum, so that areas not included in the
final exam were deliberately reduced or absent in classroom teaching.
This was further supported by students’ interview accounts. Although students

did not primarily use the ECSCE, they endeavoured to use the test-related content such
as grammar and vocabulary lists included in Test Specifications. Interestingly, in
contrast to Hu’s comment that “some students even do not know what Test
Specifications is,”, it was found that School C students (Jing-SC, Kai-SC, and Hua-
SC) referred to their use of Test Specifications during test preparation. This finding
was thus in alignment with Zhang’s reminder that she asked her students to study Test
Specifications. For example, Jing-SC viewed their purpose in a similar way to Zhang;
that is, Test Specifications listed the scope for the SHSEET, and grammar and
vocabulary learning should abide by this:
Jing-SC: Now, as the textbook teaching has all finished, [we learn grammar and
vocabulary] according to, currently, what is in Test Specifications. Well, there
are clues, for example, clues like what will be tested in MCQ in the GVT. (FG-
SC)

Self-selected test preparation materials
The fourth material resource was self-selected test preparation materials, such as
SHSEET mock tests and English study newspapers. For example, Lan mentioned the
use of 21st Century English study newspapers when explaining the choice of test
preparation materials in her school:
Lan: We also subscribed 21st Century English study newspapers. The reason is that
the passages in that newspapers, the topics, are before, comparatively before,
well, what should I say, avant-garde, very new [laughed]. Yes, they are linked
to the topics with current affairs. (Interview)
In contrast, Hu and Zhang frequently used test-driven exercises as well as mock

test papers in classroom teaching. For example, Hu used mock test papers in her class
(SHSEET practice test papers) and Zhang used test-related exercises which were part
of her teaching guidance materials (i.e., New Pivot, 新支点) in the class. In addition,
School C also utilised a computer-supported learning system to deliver mock test
papers for students’ self-study.
Zhang: Well, we now, I mean, the school, spent 6,000 RMB and built an online self-
study website. Although it is a computerised exam marking system, well, we
can still check individual students’ learning progress through any individual
test item. (Interview)
Nonetheless, this use of technology in test preparation was still exam-oriented.

Therefore, to conclude, all materials used in class in the time leading up to the test
were exam-oriented. Linking to the theoretical component of participants’ resources
used to meet test demands (Green, 2007a), such use of exam-oriented test preparation
materials in classes thus proved a negative washback value at the micro level.
In students’ extra-curricular time, to strengthen their grammar and vocabulary

learning during test preparation, students generally made use of a variety of
commercial test papers. These commercial test papers were mainly purchased either
under teachers’ guidance or by students’ own choice. The most common ones were
different sets of test papers used across schools, including Tianli Test Papers (天利 38
套) (Ling-SA, Xun-SB, Yao-SB, Jing-SC, Meng-SC, Kai-SC, Hua-SC) and SHSEET
Test Papers (五年中考三年模拟) (Shu-SB, Na-SB), Spark English (星火英语)
(Ling-SA, Hui-SB), and Test Study (试题研究) (Chao-SA). These publications usually
included authentic test papers from past years, a series of mock test papers, and/or

some adapted test papers. Other students (Fei-SA, Ling-SA) used commercial exercise
textbooks which classified different test tasks according to same topics (i.e., topics
stipulated in the ECSCE to be within test scope for SHSEET). Similar to teachers, all
these materials mentioned by students indicated that they paid great attention to the
test-designated content and testing materials, which reflected findings by Erfani (2012)
in the context of IELTS and TOEFL iBT test preparation. Moreover, compared to
teachers, students listed a wider variety of test preparation materials they used in self-
study time. In addition, some students referred to their experiences of searching online
for learning materials or mock test papers during test preparation. For example, in his
response to the interview question on his experience in preparing for the GVT, Kai-
SC mentioned his choice of doing online authentic SHSEET test papers through
searching and collecting previous SHSEET test papers (FG-SC). Similarly, Long-SB
also mentioned the utilisation of online resources during test preparation. However, as
the only student who mentioned private tutoring during test preparation, his search for
test preparation materials occurred in self-study time and mostly under parents’ or a
tutor’s supervision.
Long-SB: I mean searching online, or, well, sometimes, my tutor gives me when I go to
private tutoring classes. Or when I stay at home, my parents search online since
they have nothing else to do, and then ask me to finish [the test-driven
exercises/tasks]. Because sometimes I am idling around, so they find test
preparation materials for me to complete. (FG-SB)
To conclude, teachers and students principally adopted exam-oriented test

preparation materials both in classes and in self-study time, which reflected the
findings of other SHSEET studies (see, for example, Zeng, 2008). Further, similar to
the CET-4 test preparation (Zhan & Andrews, 2014), students’ choice of learning
materials shifted to the specific use of past exam papers, test-related websites, and
vocabulary lists as the test date approached. These findings resonate with those of
studies in other countries in which students were found to ‘study for the test’
(Zafarghandi & Nemati, 2015) and little attention was paid to non-test related materials
by students (Erfani, 2012).
5.4.2 Non-exam oriented learning materials

In the GVT preparation stage, even though all participants primarily adopted
exam-oriented materials, non-exam oriented learning materials were utilised by

students with high language proficiency levels. Notably, School A students expressed
slightly different ideas in choosing learning materials from others. For example, Fei-
SA mentioned his reading from the Oxford Bookworm Library, and other students
(Ming-SA, Ling-SA, and Chao-SA) mentioned reading and learning from English
dictionaries to accumulate more vocabulary knowledge in their self-study time. This
finding thus proved evidence for positive washback. Nonetheless, although it was
possible for high-achieving students to adopt learning-oriented materials as resources
to meet test demands, the overwhelming use of exam-oriented materials amongst other
students indicated that negative washback outweighed positive washback regarding
the use of test preparation materials among participants.
Section 5.4 has mainly reported the test preparation materials that participants
used. The common materials used by both teachers and students and across schools
were: test-based textbooks, school-designated test review books, language knowledge
such as vocabulary lists in official test reference documents of the ECSCE and Test
Specifications, and self-selected test preparation materials (mainly mock test papers
and exercises). Nonetheless, students with high language proficiency levels sometimes
tended to choose non-exam oriented learning materials to learn English grammar and
vocabulary. However, few students used such learning materials, with the majority of
participants using exam-oriented test preparation materials for learning grammar and
vocabulary. Therefore, those exam-oriented test preparation materials, aiming to help
students better prepare for the exam, were evident signs of a negative washback on
teaching and learning (Alderson & Wall, 1993; Zhan & Andrews, 2014). To clarify,
participants’ use of test preparation materials was not included and generalised in the
student survey. The reason was that the interviewed participants had provided ample
evidence, so that there was no need to collect additional data on this topic.
5.5 GRAMMAR AND VOCABULARY LEARNING STRATEGIES
After analysing the qualitative data of classroom observations, semi-structured

interviews, and focus groups, it was found that divergent learning strategies were both
taught and adopted by participants during GVT preparation. Since those strategies
were commonly used by teachers and students, this section reports those teaching
methods and learning strategies related to grammar and vocabulary jointly.

To begin with, although teachers’ education and teaching background varied, the
grammar and vocabulary learning strategies they taught during test preparation were
quite similar. Teachers, especially Hu, used a teaching pattern of “lecture-evaluate-do
exercises” during test preparation.
Hu: Well, the normal skill is the trilogy of lecture-evaluate-do exercises. Of course,
when you actually apply it, you certainly need to use some tricks. Otherwise,
if you do lecture, do exercises, and evaluate every day, the student will die,
and you yourself can die. … That’s it. Anyway, when you actually apply [the
trilogy], use more, more, more, richer tricks. But the key is for sure, the key is
certainly lecture-evaluate-do exercises. (Interview)
According to Hu, the key principle for test preparation teaching was to follow
the trilogy of “lecture-evaluate-do exercises” and she did not think there were other
possible or effective test preparation pedagogy. Likewise, when doing test exercises,
Lan mentioned her normal test preparation teaching of “do exercises-check answers-
give feedback or instructions on students’ problems only” (Interview). These patterns
were found in all three teachers’ teaching practices. Hence, it indicated the
predominance of teacher-dominated test preparation activities, which was also evident
in other test preparation contexts such as the NMET (Qi, 2010) and IELTS academic
writing (Green, 2006b).
It is important to note that teachers also mentioned the changes they made in
Grade 9 to teaching grammar and vocabulary learning strategies. The major difference
was that teachers used communicative language teaching such as providing an
authentic context for students to learn a word or a certain grammar structure in Grade
7 and Grade 8. However, in the test preparation context in Grade 9, they mainly used
exercises to explicitly teach grammar and vocabulary. According to Zhang, this shift
in teaching method from creating a meaningful situational context to choosing a
correct answer in exercises was due to the tasks used in the GVT. The rationale for
this change, explained by Zhang, was that students were at different stages of learning
language knowledge. In other words, in lower grades, certain grammar structures and
vocabulary were new knowledge to them; whereas in Grade 9, the focus was to
consolidate their knowledge and review it to prepare for the final exam.
In contrast to teachers, students reported that the grammar and vocabulary

learning strategies for GVT preparation (Fei-SA, Xia-SA) and even for the whole
junior high school stage (Long-SB) were not changed. In their opinion, there was no

need to change their learning strategies since it could take a longer time for them to
rethink a new way of test review which might disturb their usual way of learning (Xia-
SA, Chao-SA). As a result, they kept using the same learning strategies in test
preparation stage.
Therefore, the test preparation teaching model was the same and the grammar
and vocabulary learning strategies remained similar across the observed classes. The
objective of this study was not to compare teaching methods and learning strategies;
instead, it adopts the term “learning strategy” to jointly report the teacher and student
data. To this end, “grammar and vocabulary learning strategies” referred to both
teachers’ teaching and students’ use of grammar and vocabulary strategies. As the
study aims to determine the washback value of the GVT, the qualitative data were
categorised into two types: test-use oriented learning strategies and learning-use
oriented learning strategies, which followed Zhi and Wang (2019). Further, the
categorisation and definition of these two strategies also modified those of Doe and
Fox (2011). In detail, test-use oriented strategies refer to those “used for a specific
testing activity”, which “are test-dependent but language independent”; and language-
use oriented strategies are strategies that “are activated to support engagement in or
with language itself” (Doe & Fox, 2011, p. 31). In applying test-use oriented strategies,
participants expected to improve test scores as their main purpose.
5.5.1 Test-use oriented grammar and vocabulary learning strategies

Summarising from both teachers’ and students’ data, various types of language
strategies were used to prepare for GVT tasks. Initially, as three teachers had
commented in interviews, their teaching methods in test preparation in Grade 9 were
different from Grade 7 and Grade 8. To clarify, Hu’s teaching in Grade 9 was about
systematic review for test preparation, but in Grade 7 and Grade 8, she taught students
according to their interests. For example, when talking about the topic of “winner
winner, chicken dinner (“大吉大利，今晚吃鸡” in Chinese)”, she would tell students
more about the cultural background of this saying if it were brought up in Grade 7 and
Grade 8. However, in Grade 9 teaching, detailed cultural background information was
not included. Therefore, her teaching shifted from teaching for interest to teaching for
test preparation as non-tested topics were neglected. This shift was also found in other
studies (Erfani, 2012; Saif, 2006). Further, Zhang mentioned her teaching of
vocabulary in Grade 7 and Grade 8 was mainly through the teaching of macroskills of

listening, speaking, reading, and writing. However, when test review started in Grade
9, vocabulary was taught less through writing and listening. Instead, she mainly taught
to the test by teaching test-taking strategies and techniques (Tsagari, 2011;
Zafarghandi & Nemati, 2015) focusing on tasks like MCQ.
Against this backdrop, both teachers and students used and reported their use of
the following test-use oriented grammar and vocabulary learning strategies.
• Using test-wiseness strategies (strategies that help improve test scores

exclusively by making use of test design characteristics rather than language
knowledge);
• Selective attention (deciding what to pay attention to for a test-taking

purpose exclusively);
• Rote-memorisation (memorising grammar and vocabulary knowledge

without necessarily understanding them).
• Drilling (doing mock tests and test-driven exercises repetitively);
In addition, teachers also reported their use of a fifth grammar and vocabulary
learning strategy to prepare students for the test. This strategy is:
• Anticipating challenges that students might encounter (thinking from a

student’s perspective to assume their problems in solving test tasks).
Despite various strategies, this section lists examples of using test-wiseness

strategies, selective attention, rote-memorisation, and anticipating challenges that
students might encounter to reveal the way how participants learn English grammar
and vocabulary during GVT preparation.
Using test-wiseness strategies

In brief, test-wiseness strategies were in relation to the utilisation of test
characteristics or formats with an intention to achieve a higher test score. It was found
that all three teachers and students adopted test-wiseness strategies to prepare students
to gain a higher score in the test. “Test-wiseness strategies” in this study refer to those
mainly aiming for making use of test characteristics and formats to obtain a higher
score (Cohen, 2013; Millman et al., 1965) and is both task-specific (Xie, 2015b) and
construct-irrelevant (Yang & Plakans, 2012). As such, test-wiseness was perceived as
independent from test-takers’ subject knowledge about what was assessed, which was

used specifically in the test preparation stage to improve test scores rather than
language learning.
Main categories of test-wiseness strategies commonly utilised by teachers and

students included but were not limited to: guessing test designers’ intentions, error-
avoidance strategy, excluding, guessing, finding keywords, searching for answers, and
applying specific test-taking skills. Those strategies were found to be commonly and
directly linked to students’ subject knowledge and they viewed gaining higher test
scores as the goal. Therefore, they were classified as test-use oriented grammar and
vocabulary learning strategies. Examples of those strategies are listed below.
First of all, all three teachers were found to anticipate the possible test topics or
items in their teaching. In order to persuade her students who frequently challenged
the answers provided by test designers, Lan emphasised the importance of guessing
the answers that test designers expected (SA-CO2). This idea of guessing test
designers’ intentions was also apparent in the other two teachers’ data. In fact, during
test preparation, it was common for the teachers to anticipate possible test items or
topics and to remind students that they could guess what would be tested in the actual
exam to prepare their learning accordingly. For example, after summarising passive
voice items in the past three-year’s test papers, Zhang asked her students about the
likelihood of a related item being tested in the 2018 paper.
Zhang: So for 2018 test paper, can you guess?
Ss: Future, simple future tense.
Zhang: Absolutely possible. And it might also have modal verbs?
Ss & Zhang: Passive voice.
Zhang: So this you can guess by yourself, you analyse those test items, and
then you can accurately grasp, understanding? (SC-CO3)
Likewise, School A students mentioned the learning strategy of guessing test

designers’ intentions. According to them, when dealing with MCQs in the GVT, if it
was hard to choose an answer from four seemingly-correct options, they then needed
to “talk to the test designers” as if they were mentally communicating with them (FG-
SA). In fact, guessing test design intentions was similar to teachers’ anticipation of test
topics, which were all test-use oriented.

Moreover, guessing was widely admitted by students to be useful in dealing with
MCQ and Cloze tasks. According to some students (Ling-SA, Ping-SC, Kai-SC), if
they did not know the answer to an MCQ item, they could have a 25% chance of
choosing the right answer. Hence, compared to constructed-response test items in
Sentence Completion and Gap-filling cloze, selected-response test items enabled
students to guess, even though they did not know the answer or corresponding
knowledge. This finding echoed the research finding that objective test items such as
MCQs closely linked with test-wiseness strategy use (Cohen, 2006; Millman, 1965).
Another test-wiseness strategy used by all three teachers was error-avoidance

strategy, namely, avoiding using vocabulary that was beyond the test scope. For
example, in her class, Lan regularly said “I don’t suggest you use this word” which
was beyond the test scope (SC-CO3). The following quote illustrates her concern:
Lan: Various. [pause] The word “various” is also not an SHSEET word, so students,
you not, in Gap-filling cloze, for goodness’s sake, do not use non-SHSEET
words, like that word “better”, last time you used that word, originally I did
not want to give you a score on using that word, but I considered that the
meaning was still right, so don’t use this kind [of non-SHSEET words]. (SA-
CO3)
Lan’s comment indicated that students should avoid using ‘non-SHSEET words’
(those that did not appear in the SHSEET vocabulary list) in the exam. These reference
lists, taken from the ECSCE and Test Specifications, clearly listed the vocabulary
scope for the SHSEET.
Further, teachers taught the use of specific test-taking principles and strategies
that were explicit to help test preparation. For example, test-taking principles such as
“do not be too happy to recall the correct word form (“得意而不忘形” in Chinese)”
and “no sentence without a verb (“无动不成句” in Chinese)” were widely used by
teachers. According to them, the former principle meant to pay close attention to
correct forms of words when responding to tasks such as Gap-filling cloze. The use of
this principle is explained below:
Zhang: when we some-[times] do Gap-filling cloze, we teachers will talk about skills.
The last skill is definitely “do not be too happy to recall the correct word form”,
I know this answer, I use “house”, but I might need to use the plural form of
“house”. This word I need “do”, but maybe it should be a past tense, or maybe
perfect tense, or maybe even present progressive tense. This word is thus “do

not be too happy to recall the correct word form”, that is to say, to use its
correct form. (Interview)
In sum, test-wiseness strategies were widely acknowledged and used in test

preparation across all three observed classes. This was with an aim to improve students’
test scores (Green, 2006b). Therefore, it could be inferred that those test-wiseness
strategies used were all test-use oriented, and thus supported the claim that the GVT
brought about a negative washback on the teaching and learning of grammar and
vocabulary.
Selective attention
Under teachers’ teaching guidance, students selectively prepared for the GVT
by focusing on filling in their test preparation gaps such as reviewing common
mistakes or strategically changing their learning foci. Results showed that teachers
taught and mainly School B as well as School C students adopted this learning strategy
during test preparation. According to them, this learning strategy could help them
improve test scores and learn specific language knowledge. Ping-SC’s quote below
exemplifies this:
Ping-SC: But when the test is approaching, definitely grasp the weaknesses, quite weak
sections, which are quite easy to improve test scores. Then that is, now, we are
not focusing on everything, but focusing on those sections that are [our] own
deficiencies. (FG-SC)
By paying attention to her weaknesses, Ping-SC hoped to improve test scores in

the coming exam. In fact, both students and teachers targeted at making students
“specialise in one point” such as the object clause and asked them to “have many and
repetitive exercises, in order to make students fully comprehend this knowledge” (Kai-
SC). Therefore, this behaviour of selective attention indicated negative washback on
students’ learning since it was exam-oriented, which was similarly found in other
SHSEET studies like that of Zeng (2008).
Rote-memorisation
Rote-memorisng grammar rules and vocabulary was found to occasionally
appear in Hu and Zhang’s classroom teaching. For example, when she was trying to
explain “the responsive principle (“呼应性原则” in Chinese)” of subject-predicate
consistency, Zhang emphasised to students the importance of rote-memorising this
principle.

Zhang: Can you understand? Can you memorise? Can you memorise? Remember?
Now, one minute for you to look at, look at it and remember. [pause] (SC-
CO1)
In addition to the in-class emphasis, participants also reported the use of rote-
memorising in interviews. According to Hu, at the beginning of test preparation, she
required students to rote-memorise vocabulary lists from the ECSCE and Test
Specifications, however, as they were getting closer to the test date, she felt this
method was inappropriate since the time was limited. Instead, she asked students to
rote-memorise vocabulary through reading tasks in test papers or drills in classrooms.
Rote-memorisation is not uncommon in the literature, with studies found that the
use of rote memorisation, even in integrated tasks, could exert a negative influence on
language learning (Green, 2006b; Linn et al., 1991; Tsagari, 2011). Therefore, this
teaching method of rote-memorising or memorising repeated test points revealed
negative washback on School B’s and School C’s grammar and vocabulary study.
Anticipating challenges that students might encounter

Interview data showed that teachers sometimes chose to anticipate potential
challenges that students might encounter during test preparation. This was mainly
adopted by both Hu and Zhang in their teaching when they did test-driven exercises.
Their intention was to more effectively prepare students for test tasks. Therefore,
Zhang first did the exercises herself in order to give further instructions to students.
Zhang: All right, then, certainly I sometimes anticipate during the process of doing the
exercise. But in fact, sometimes my anticipation can be wrong. I guessed that
students, this, could be their difficulty, so that many students would make a
mistake here. However, it did not turn out to be the case. Sometimes I feel this,
of course, because this is because that my comprehension of the test item is
different from students’ understanding. (Interview)
According to Zhang, she could made mistakes in predicting difficulties due to

the difference between her and her students’ understanding. Further, due to her limited
English language proficiency, to better teach students, she tried to put herself in a
student’s position to understand and anticipate challenges they may face in test-taking.
With the purpose of improving test scores, this teaching method was thus viewed as
exam-oriented.

In addition to the aforementioned test-use oriented learning strategies adopted
by participants during GVT preparation, the study probed these findings in a wider
participant group of the student survey. The results of the student survey are thus
presented in Table 5.6.
Table 5.6
Indicators of test-use oriented learning strategies (see instrument reliability and validity in section
4.5.3)
As revealed in Table 5.6, the proportion of students who reported being reliant
on supplementary learning materials was lower than those who reported they did not
(27.3% versus 36.2%); the same tendency was evident for repetitively doing test-
driven exercises (25.9% versus 42.4%) and rote-memorisation (24.3% versus 47.6%).
Thus, survey participants were found to not predominately adopt those three test-use
oriented strategies during GVT preparation.
To summarise, a variety of test-use oriented grammar and vocabulary learning

strategies were adopted by participants during GVT preparation. From qualitative data,
those strategies mainly included: 1) using test-wiseness strategies, 2) selective
attention, 3) rote-memorisation, 4) drilling, and 5) anticipating challenges that students
might encounter. Moreover, more test-use oriented learning strategies were reported
by students from School B and School C than those from School A. However,
contrasting results were found from the quantitative data, because more surveyed
students tended not to use those test-use oriented learning strategies during test
preparation. Nonetheless, all these strategies, applied during GVT preparation,
indicated the negative washback of the GVT on grammar and vocabulary teaching and
learning.

5.5.2 Language-use oriented grammar and vocabulary learning strategies
Strategies that participants used to improve English grammar and vocabulary
learning rather than exclusively gaining a higher score in the GVT were language-use
oriented. To this end, those strategies indicated positive washback on learning during
test preparation. From qualitative data, various learning-oriented grammar and
vocabulary learning strategies were taught by teachers and adopted by students both
in class and in students’ self-study time. In total, two language-use oriented strategies
were commonly reported by both teachers and students during test preparation:
• Taking notes (keeping a class note or self-study note of knowledge learned);
• Transfer (using learned language knowledge to facilitate language

communication and knowledge comprehension);
In addition, both classroom observations and interviews showed that teachers

also taught the following learning strategies during GVT preparation:
• Elaboration (expanding information in order to explain language tasks, such

as drawing attention from past learning experiences, finding key
information, and giving examples);
• Repetition (repeating what one says or repeating for better memorisation);
• Spelling (spelling or commenting on the spelling of words for students to

pay attention to);
• Summarisation (summarising knowledge learned through language tasks);
• Translation (translating the target language into Chinese or vice versa);
• Using word association (recalling students’ knowledge of different forms of

a word).
Nonetheless, students further mentioned two other learning strategies that they
used in extra-curricular time to learn English grammar and vocabulary. These
strategies are:
• Reading extensively to accumulate language knowledge (using various

reference resources for English learning and accumulate interesting or
useful language knowledge);

• Identifying and solving language learning problems (knowing one’s
weaknesses in learning and take actions to fill in the learning gaps, knowing
how to accommodate one’s language learning needs).
To be concise, this section only takes “reading extensively to accumulate

language knowledge” as an example to explain the language-use oriented grammar
and vocabulary learning strategies during GVT preparation.
Reading extensively to accumulate language knowledge

During test preparation, students were found to learn English grammar and
vocabulary through reading and accumulating language knowledge. Generally, all
School A students and comparatively high-achieving students from both School B
(Xun-SB, Hui-SB, Shu-SB, Na-SB) and School C (Fang-SC, Hua-SC) reported the use
of this learning strategy. For example, for Ming-SA who thought test preparation was
unnecessary, learning grammar and vocabulary was about reading passages and
experiencing the language.
Ming-SA: Talking about MCQ in the GVT, actually it’s the same, that is, I think for me,
I can’t form any habit, rather, I prefer reading English passages more, let, to
experience the language use in the passage. This, reading this kind of whole
paragraph and sentences, I can understand more easily, so I prefer
accumulating those exemplary words and sentences. (FG-SA)
According to students, reading passages frequently could help them easily

recognise the use of language (Shu-SB). In turn, this experience of language use in
context could further build their language intuition (Ling-SA). Therefore, they read
English passages (Ming-SA), browsed English dictionaries (Ming-SA, Chao-SA), and
read English storybooks (Fei-SA) and newspapers (Chao-SA) during test preparation.
Most importantly, they kept a notebook of useful sentences for daily learning (Fang-
SC, Hua-SC) to deepen their grammar and vocabulary knowledge, which was then
applied in their writing and reading (School A students).
This grammar and vocabulary learning strategy of reading and developing

language knowledge was found to link closely to students’ language proficiency, as
this learning strategy was mainly adopted by high-achieving students who were found
to be more capable of handling both test preparation and their own learning
improvement. As a result, despite of the test, students aimed at improving their English
learning rather than purely targeting at achieving higher test scores in the exam. In this

regard, it indicated a positive washback on students’ learning. To further explore the
language-use oriented learning strategies, quantitative results are demonstrated in
Table 5.7.
Table 5.7
Indicators of language-use oriented learning strategies (see instrument reliability and validity in
section 4.5.3)
As shown in Table 5.7, the proportion of students who reported keeping a

notebook of exemplary language knowledge was slightly lower than those who
reported they did not (34.1% versus 35.0%); the same pattern was evident for reading
extensively (31.1% versus 36.9%) but reversed for summarising as well as reviewing
common mistakes (35.1% versus 32.3%). The proportion of students who reported
summarising grammar rules roughly equalled that of students who reported they did
not (34.9% versus 34.7%).
To conclude, from qualitative data, this section reported ten language-use

oriented learning strategies of 1) taking notes, 2) transfer, 3) elaboration, 4) repetition,
5) spelling, 6) summarisation, 7) translation, 8) using word association, 9) reading
extensively to accumulate language knowledge, and 10) identifying and solving
learning problems. These positive strategies were found to be commonly adopted by
all teachers and high-achieving students across schools during GVT preparation. In
addition, quantitative results were reported, and the surveyed students were found to
have quite balanced practices of using or not using those strategies. Therefore, both
qualitative and quantitative results showed that language-use oriented learning

strategies were taught by teachers and adopted by students, which indicated a positive
washback of the GVT on grammar and vocabulary teaching as well as learning.
Section 5.5 reported the employment of test-use oriented and language-use

oriented learning strategies by participants. In this section, only key examples from
qualitative data were listed. However, the rich data implied that learning strategies
taught by teachers were similar among the observed teachers; while students with high
language proficiency tended to use more language-use oriented learning strategies, but
the test-use oriented learning strategies were commonly adopted by all students.
5.6 CHAPTER SUMMARY
In summary, this chapter contained five sections on washback value and both
qualitative and quantitative results were reported to answer RQ1a, which implied the
complexity of washback value.
Section 5.1 reported that negative washback outweighed positive washback in

relation to participants’ understanding and use of official test reference documents.
Principally, teachers agreed that GVT tasks reflected the learning-oriented intention in
those official test reference documents. Under this belief, teachers as well as students
acknowledged the crucial role of those documents in guiding their test preparation.
However, although Lan confirmed her positive implementation in test preparation
teaching, the implementation of the teaching and assessment principles in the ECSCE
in Grade 9 was regarded by Hu and Zhang as impractical. In addition, teachers and
mainly School C students tended to focus more on the designated test scope of
grammar and vocabulary and SHSEET-related topics in these documents. Therefore,
even though the positive implementation of ECSCE principles in Lan’s teaching
indicated a potential for positive washback, this “narrowing of the curriculum”
(Madaus, 1988) were perceived to exert negative influence on test preparation at the
macro level of washback value.
At the micro washback value, sections 5.2, 5.3, 5.4, and 5.5 revealed that both
positive and negative washback of the GVT were identified regarding GVT design
characteristics, affective factors, test preparation materials, and grammar and
vocabulary learning strategies. In section 5.2, teachers and students reported various
conflicting perceptions indicated that their positive perceptions generally centred on
Cloze and Gap-filling cloze, and their negative perceptions mainly focused on MCQ

and Sentence Completion. However, the quantitative results showed quite different
results from the qualitative findings, since students largely disagreed with the negative
perceptions but agreed with positive perceptions regarding the GVT. In section 5.3,
participants expressed affective feelings of test anxiety, intrinsic motivation, and
extrinsic motivation towards the test and grammar and vocabulary teaching and
learning. In general, teachers and mainly School C students (particularly low-
achieving students) experienced anxiety about the GVT and the SHSEET, considering
students’ test achievements. Further, students were intrinsically and extrinsically
motivated in learning English grammar and vocabulary, but the affective factor of
motivation was found to be evident in high-achieving students, especially intrinsic
motivation (Ling-SA, Chao-SA, Na-SB). In addition, quantitative findings showed
that students were generally not anxious about the GVT, but they did feel worried
about receiving criticism if they could not do well in the GVT; besides, students were
found to be both intrinsically and extrinsically motivated to learn English grammar
and vocabulary in GVT preparation to a great extent. In section 5.4, test preparation
materials used by teachers and students from qualitative data were listed, which were
predominantly exam-oriented. Even though School A students were more likely to
adopt non-exam oriented test preparation materials, the qualitative findings pointed to
the dominant use of exam-oriented or commercial test preparation materials in test
preparation (Saif, 2006; Wall & Alderson, 1993; Zeng, 2008; Zhan & Andrews, 2014)
and the neglect of non-tested materials during test preparation (Erfani, 2012). In
section 5.5, both test-use oriented and language-use oriented grammar and vocabulary
learning strategies were taught by teachers and employed by students. Generally, even
though various language-use oriented learning strategies were reported in qualitative
data, participants’ heavy reliance on test-use oriented learning strategies, particularly
using test-wiseness strategies to prepare for the GVT, indicated that the GVT brought
negative washback on teaching and learning. Moreover, test-use oriented learning
strategies were commonly evident across the observed classes. In contrast, high-
achieving students used more language-use oriented learning strategies. However,
quantitative results showed a slightly different situation, as students agreed that they
used more language-use oriented but less exam-oriented learning strategies. All these
findings reported indicated that the GVT brought about both positive and negative
washback on teaching and learning of grammar and vocabulary during the time leading
up to the test.

To conclude, the washback value results seemed to be roughly similar across the
three observed classes and the GVT as well as the SHSEET exerted both positive and
negative washback regarding the macro level of understanding and use of official test
reference documents and the micro level of test perceptions of GVT design
characteristics, affective factors, test preparation materials, and learning strategies.
Recalling the new washback model incorporating LOA in this study (Carless, 2007;
Green, 2007a; Jones & Saville, 2016), the findings of qualitative data of this chapter
in relation to RQ1a are further summarised in a table in Appendix M. Further results
in relation to GVT washback intensity will be reported momentarily in Chapter Six.

Chapter 6: Test Preparation: Washback
Intensity
Chapter Six presents the data analysis and findings with regard to the washback
intensity from the perspectives and practices of Grade 9 teachers and students as they
prepared for the Grammar and Vocabulary Test in the Senior High School Entrance
English Test (the GVT). Through the analysis of data obtained by observing the
teaching and learning practices in classrooms, eliciting participants’ perceptions
through interviews with teachers and students, and administering a student survey, this
chapter addresses the second sub-question of RQ1:
Guided by the washback model incorporating Learning Oriented Assessment

(LOA), this chapter focuses mainly on washback intensity from both teachers’ and
students’ perspectives (as highlighted by the dotted frame in Figure 6.1). Further, the
GVT washback model including both washback value and washback intensity is re-
envisaged at the end of this chapter.
Figure 6.1. Focus of the new washback model in Chapter Six (Carless, 2007; Green, 2007a; Jones &
Saville, 2016)
Washback intensity refers to the degree of washback associated with a test or the
extent to which participants will adjust to the test demands (Cheng, 2005; Green,
2007a). It further indicates to what extent stakeholders’ perceptions of test importance
and test difficulty influence the intensity of washback to them (Green, 2007a). On this
theoretical assumption, three main factors are considered in the dimension of
Chapter 6: Test Preparation: Washback Intensity 173

washback intensity in the new washback model which incorporates LOA. These three
factors are: perceptions of test importance, perceptions of test difficulty, and test
preparation effort. Results of the washback intensity of the GVT are reported
according to those factors.
To answer RQ1b, this chapter is composed of four sections. Section 6.1 reports
both qualitative interviews and quantitative survey results of the test importance of the
GVT as perceived by participants. Similarly, section 6.2 documents qualitative and
quantitative results of participants’ perceptions of test difficulty of the four GVT tasks.
Based on those perceptions, participants spent corresponding effort on test preparation,
which is presented in section 6.3. To present the overall picture of washback intensity
of the GVT, section 6.4 uses Multiple Correspondence Analysis (MCA) and reports
washback intensity patterns from the quantitative survey data. Likewise, section 6.5
investigates the relationship among students’ test perceptions, affective factors, test
preparation practices, and test performance through Structural Equation Modelling
(SEM). Finally, section 6.6 summarises the results of washback intensity and the GVT
washback model.
6.1 PERCEPTION OF TEST IMPORTANCE
Test importance literally means how important the test is perceived to be by its
relevant stakeholders. It is closely linked with test stakes which are regarded as
strongly indicating washback intensity (Madaus, 1988) and test use purposes that can
be both intended and unintended (Jin & Cheng, 2013). Therefore, stakeholders’
perceptions of test importance are also influenced by their awareness of test stakes and
the purposes of test use (i.e., how test results are interpreted and used by stakeholders).
In the current study, the SHSEET, as a high-stakes standardised English test, was
assumed to have a strong washback intensity. As part of the SHSEET, the GVT was
assumed to have a similar influence.
To start with, participants’ general impressions of the test importance (i.e., test
use purpose) were explored. When questioned about the test importance of the GVT,
Lan believed that teachers should find a way of keeping a balance in grammar teaching.
Due to her understanding of the nature of second language acquisition, she viewed
grammar learning to be essential; however, she disagreed with the predominant
grammar-focused approach to teaching EFL in China (Pan & Qian, 2017). In other
174 Chapter 6: Test Preparation: Washback Intensity

words, Lan believed that grammar teaching was essential but teaching grammar should
not be viewed as an end in itself. This perception was also shared by Zhang and Hu,
which thus led to somewhat conflicting ideas regarding the importance of some GVT
tasks among teachers. In addition, students’ awareness of test stakes which were
indicated by the use and interpretation of test results was also checked. In focus groups,
School A students explicitly mentioned the dual test use purposes of the SHSEET: one
is for graduation and the other is for senior high school enrolment. In other words,
students were clear about the stakes that were attached to the test. Therefore, it was
not surprising that both teachers and students perceived the GVT as both highly
important in some respects and relatively unimportant in others. Detailed findings are
presented below.
6.1.1 The GVT is perceived as highly important

Generally, all three teachers and most students regarded the GVT as highly
important due to the following considerations.
Firstly, the GVT was important due to the foundational role of grammar and
vocabulary in language learning. This was particularly relevant to junior high school
students, many of whom were categorised as beginners in terms of their English
proficiency level. To this end, Lan commented that the GVT was suitable for these
students. Similarly, from Hu’s perspective, GVT tasks, especially the easy tasks like
MCQ and Sentence Completion, played an essential role in junior high school
students’ learning as such basic language knowledge was “taking students’ arms and
being helpful in their future learning and life, such as overseas study” (Interview). In
a similar vein, students viewed the GVT as a whole to be highly important due to the
fact that the grammar and vocabulary knowledge assessed in the test was the
foundation of language use. For example, Shu-SB commented that the broad testing
of language knowledge in the GVT could reflect students’ language proficiency to
some extent. From their perspectives, “easy tasks assessing the foundation of language
learning” (Fei-SA, Hui-SB, Yao-SB, Fang-SC, Ping-SC, Kai-SC, Hua-SC) were
crucial for “English beginners” (Ling-SA, Chao-SA). This indicated the same
perception as their teachers Lan and Hu; that is, students thought easy tasks were
suitable for their learning stage.
In addition, the GVT was important mainly due to the use of SHSEET results
for graduation and senior high school enrolment (i.e., designated test use purposes). In

fact, this perception was particularly mentioned by participants who had a low
language proficiency background (i.e., Hu, Zhang, and low-achieving students).
According to students from School B and School C, the GVT, particularly the easy
tasks of MCQ and Sentence Completion, was highly important to their learning
because of its weighting in relation to the whole test (see section 1.2.5). In view of this
weighting, which was significantly linked with the dual test use purposes of graduation
and senior high school enrolment, participants tended to perceive the GVT to be highly
important to them. As such, obtaining more marks from easy tasks could help students,
especially low-achieving students, in overall test score gains. For example, Hu and
Zhang frequently mentioned the importance of MCQ, Cloze, and Sentence Completion
in their classes. Hu told her class that “students who aimed higher in senior high school
enrolment should secure the full 15 marks of the Cloze task” (SB-CO4); and in
classroom observations, Zhang was seen to repeatedly remind her students about
gaining marks on certain grammar tasks (SC-CO2). Zhang emphasised that obtaining
full marks on easy GVT tasks was very important for her students and her school,
especially when the student proficiency was low (Interview). Thus, it was noticeable
that the use of test results for graduation and senior high school enrolment influenced
Hu’s and Zhang’s perceptions. Furthermore, the test was also important for students’
job seeking needs after graduation (Ling-SA). This finding resonates with the CET
context, where students perceived that the instrumental function of pursuing
postgraduate study and future job application were important (Jin & Cheng, 2013).
Moreover, the GVT motivated students’ grammar and vocabulary learning, as teachers
commented that some students might not learn English language knowledge if it was
not included in high-stakes tests such as the SHSEET (Hu). This was also mentioned
by students who thought the GVT was significant in maintaining language learning
confidence, interest, and motivation (Fei-SA, Xia-SA, Long-SB, Hui-SB, Yao-SB,
Ping-SC).
Additionally, the GVT was important to retain because it followed the tradition
of EFL testing in China. In both Lan’s and Zhang’s opinions, GVT tasks (MCQ in
particular) were traditionally kept for both provincial and local tests. Due to this
consideration, GVT tasks were normally included in EFL tests, particularly in the
SHSEET, which is designed and administrated at the province level.

Another consideration shared by students was that the test was important since
the GVT tested knowledge helped their ability to use language as well as
communication (Fang-SC, Ping-SC). In this way, the test was able to reflect students’
language proficiency and thus students recognised the significant role of the GVT in
enabling students to improve their language learning.
Nonetheless, although the GVT was perceived by participants as highly

important according to the above reasons, some students commented that the GVT was
unimportant. Their perceptions are summarised and reported in the following section.
6.1.2 The GVT is perceived as relatively unimportant

Although teachers and students perceived the GVT to be important for teaching
and learning due to stakes attending the test and test use purposes attached to the test,
there was a sense in which they regarded easy GVT tasks such as MCQ and Sentence
Completion to be relatively unimportant.
Teachers like Hu pointed out that as the marks for the MCQ task of the GVT had
decreased, she no longer considered it to be as important as it used to be. In fact, the
marks allocated for the MCQ section were reduced to a total of 15 marks in the 2018
SHSEET test paper (originally 20 marks in 2016, so the total mark for GVT tasks in
2018 also decreased accordingly). Hu felt that the reduction in marks allocated to this
section indicated a lower test importance. As such, the mark allocation related to test
design decisions influenced teachers’ perceptions of test importance of the GVT,
which further impacted their decisions in teaching. Likewise, these tasks were
unimportant since they were too easy (Ming-SA, Long-SB, Na-SB, Jing-SC),
especially for high-achieving students (Fei-SA, Ling-SA). For example, every student
scored similarly in MCQ, which contributed less to discrimination in their overall test
scores (Meng-SC).
In addition, students regarded the inclusion of easy tasks such as MCQ which
focused on decontextualised points of grammar and vocabulary knowledge as
unnecessary. In their opinion, grammar and vocabulary knowledge could also be
assessed through and combined with other tasks such as Gap-filling cloze (Ming-SA),
writing (Long-SB), and reading (Jing-SC). This indicated that participants felt the
separate testing of grammar and vocabulary as unimportant.

Concluding from the qualitative data, participants expressed quite conflicting
perceptions of the test importance of the GVT. However, although teachers and
students expressed doubts regarding the inclusion of easy tasks like MCQ and
Sentence Completion, the overall importance of the GVT was perceived as high. Most
importantly, participants’ perceptions of high or low importance to teaching was
closely related to test stakes and test difficulty. Therefore, considering mainly the
learning stage and test use purpose for graduation as well as senior high school
enrolment, participants regarded the GVT to be highly important. This finding was
thus in accordance with what Madaus (1988) and Green (2007a) pointed out; that is,
test importance was associated with test stakes, and the test stakes of the GVT were
high due to the selection and graduation functions of the SHSEET.
6.1.3 Perceptions of test importance as measured in the student survey

Following the qualitative stage, participants’ perception of the test importance
was further investigated through students’ survey responses to the test importance
construct. Taking reference from Jin and Cheng (2013) and gaining insight from
qualitative findings, five items were used to quantify students’ perceptions of test
importance. As mentioned in Chapter Five, the data reported from the student survey
was the combined data, which has three sets of frequency for each indicator. Thus,
results of students’ perceptions of test importance are presented in Table 6.1.
Table 6.1
Indicators of test importance (see instrument reliability and validity in section 4.5.3)
From Table 6.1, it was found that most survey participants agreed that the GVT
was highly important to them. With regard to the instrumental test use purpose of the
GVT to junior high school graduation, the proportion of students who regarded it as
highly important to them accounted for 76.5%, which was much higher than those who

did not (5.7%). The same tendency was evident for students’ perceptions of GVT’s
purpose of senior high school enrolment (82.6% versus 3.1%), proving English
grammar and vocabulary proficiency (64.7% versus 7.4%), developing English
language use ability (71.0% versus 6.4%), and helping future English learning (76.6%
versus 5.5%). Those results thus aligned with general qualitative findings that the
majority of teacher and student participants regarded the GVT as a whole to be highly
important to them.
In summary, although conflicting views on MCQ and Sentence Completion

tasks were expressed by participants, teachers and students considered the GVT to be
highly important due to various reasons pertaining to its test stakes (Madaus, 1988)
and test use purposes (Jin & Cheng, 2013). The quantitative results of student survey
further supported this perception particularly from a student perspective. It thus met a
prerequisite for generating visible washback intensity (Green, 2007a).
6.2 PERCEPTION OF TEST DIFFICULTY
In addition to the factor of test importance, test difficulty is also viewed as the
driving force for washback intensity (Green, 2007a, 2013) which could lead to strong
or weak washback. Summarising from the qualitative data, it was found that both
teachers and students ranked the difficulty of the GVT by virtue of task types. In
general, MCQ and Sentence Completion were considered as easy tasks, Cloze had a
higher level of difficulty, but Gap-filing cloze was perceived as the most difficult by
all participants.
First, all three teachers and most students (for example, Xia-SA, Chao-SA)
perceived MCQ and Sentence Completion as the easiest tasks since they tested the
most basic grammar knowledge. Teachers tended to remind students about the easy
characteristics of GVT tasks, such as “the word order of declarative sentences was
very easy” (Hu, SB-CO2), in classes.
Further, Gap-filling cloze, which had more than one correct answer (Ming-SA),
was widely recognised by all teachers and students to be the most challenging task. As
such, they agreed that the most difficult task of Gap-filling cloze could test students’
overall ability to use language and prove their language proficiency. This perception
resonated with a study finding that gapped-sentence tasks were able to demonstrate
test-takers’ full linguistic repertoire (Docherty, 2015) and were thus difficult for

students. Further, according to teachers, the test difficulty of Gap-filling cloze was in
accordance with a selective test function (Hu). In other words, the difficult Gap-filling
cloze task was used to discriminate between students. This selective function was more
related to the competitive senior high school enrolment, which required higher test
scores than graduation. For example, even Lan, who had mainly high-achieving
students in her class, regarded this task as difficult and felt that it could discriminate
among students.
Lan: Gap-filling cloze is definitely a task that can differentiate students’ language
proficiency levels. Gap-filling cloze is a task that not only [in the GVT], but
also in the whole SHSEET test paper, it widens the gap [between low-
achieving and high-achieving students]. This task is a task that tests students’
abilities. (Interview)
In contrast, easy GVT tasks were perceived as not useful for discrimination but
enabled the majority of students to gain higher scores than they would otherwise have
received.
Hu: Because [MCQ] per se, according to the English inspector’s meaning, I feel,
anyway, this is basically a benefit for general, general students, which is not
used for selection. (Interview)
In general, teachers and students agreed that the overall test difficulty of the GVT
was not high (Ming-SA, Fei-SA) but was increasing each year. For instance, according
to all three teachers and students with an intermediate level of language proficiency
(Wei-SA, Xia-SA, Long-SB, Shu-SB), Cloze was more difficult than MCQ but easier
than Gap-filling cloze. Furthermore, despite the inclusion of relatively easy items such
as MCQ, all three teachers agreed that the overall test difficulty of the SHSEET and
the GVT was increasing each year. This could be seen in the decreasing mark
allocation given to MCQ in recent years. This perception regarding the change in test
difficulty reflected test designers’ intention of using the test for a selective purpose to
send high achieving students to prestigious senior high schools.
In addition to ranking GVT tasks according to task types, participants’ reasons

for their perceptions were also given. Generally, their perceptions of test difficulty
were influenced by both task methods and the test scope of the GVT. On the one hand,
teachers and students regarded the difficulty of the GVT to be different according to
test methods. To clarify, students thought that although similar to Gap-filling cloze
which had a passage-based format, Cloze was easier in that it listed options for each

item and thus it provided hints for students to reach the right answers (Wei-SA, Fei-
SA, Xia-SA, Jing-SC). On the other hand, the scope of what was tested in different
tasks influenced students’ perceptions of test difficulty. For instance, Gap-filling cloze
was perceived to be the most challenging because it tested students’ grammar
knowledge, language use, and test-taking skills (Ping-SC, Meng-SC, Hua-SC).
In fact, qualitative data presented complex results regarding the perception of

test difficulty. Although the test difficulty of the GVT was perceived to be relatively
low, according to students (Kai-SC, Hua-SC), even MCQ and Sentence Completion
tasks could have one or two difficult items. Therefore, to further probe this qualitative
finding in the student survey, a single-indicator item was designed for each of the four
GVT tasks to gauge students’ perceptions of their difficulty. Results are displayed in
Table 6.2.
Table 6.2
GVT task types and perceptions of test difficulty (see instrument reliability and validity in section
4.5.3)
From Table 6.2, the proportion of students who reported MCQ as absolutely
unchallenging and unchallenging was higher than those who perceived MCQ as
challenging (34.6% versus 25.3%); the same tendency appeared for Sentence
Completion (34.6% versus 31.2%) but reversed for Cloze (16.2% versus 42.8%) and
particularly Gap-filling cloze (5.0% versus 74.0%). These results aligned with
qualitative findings since all participants recognised that Gap-filling cloze was the
most challenging task.
To sum up, the test difficulty of the GVT was not definite; rather, its difficulty
varied when taking the test methods as well as test scope into consideration. To
conclude, as perceived by participants, MCQ was the easiest task, Sentence

Completion ranked the second; while Cloze was somewhat difficult, Gap-filling cloze
was the most challenging. Hence, both qualitative and quantitative data showed that
the difficulty of four GVT tasks ranged from low to high.
6.3 TEST PREPARATION EFFORT
Under the above perceptions of test importance and test difficulty, teachers and
students spent corresponding effort on test preparation. By including test preparation
data in the new washback model which incorporates LOA, the effort each participant
put towards test preparation was explored. To clarify, classroom observations and
teacher interviews disclosed the in-class test preparation effort, while student focus
groups revealed the extra-curricular test preparation effort. Further, students’ test
preparation effort was largely investigated in the student survey. Findings are reported
in this section.
6.3.1 In-class test preparation effort

Classroom observation and teacher interview findings showed that teachers
tended to think of and prepare for the SHSEET as a whole, but they did have specific
grammar and vocabulary test preparation practices in Grade 9.
Initially, the change of class schedule because of the SHSEET indicated the
influence of test on teaching and learning. From teacher interviews, Lan mentioned
that School A students had eight sessions of English classes in Grade 7 and Grade 8.
Among the eight classes, except for the seven classes mentioned in Chapter Four (see
Table 4.3), they also had one class session taught by a foreign language teacher which
focused on improving students’ speaking skills. However, in Grade 9, due to course
design and test review considerations, this foreign teacher’s class was not given to
students anymore. This phenomenon thus reflected a certain extent of washback
intensity, as schools also put more effort in test preparation by deliberately decreasing
certain activities. In addition to the reduction of specific classes in School A, all
classroom observation data also showed intense washback as the test preparation of
the GVT and the SHSEET took up all in-class teaching time of the three teachers.
Further, the teachers at all three schools reported using the same test preparation
model. According to teachers, the test preparation went across at least the entire second
semester of Grade 9, and roughly included three stages. The first round of test
preparation was mainly about reviewing textbooks which covered the SHSEET test

scope and were published by People’s Education Press (PEP), which in reality meant
focusing on reviewing vocabulary for about two months (Lan). The second stage was
specifically reviewing grammar knowledge: this stage took two weeks in which
teachers systematically reviewed the grammar knowledge learned in junior high
school. The last stage involved reviewing specific topics according to the ECSCE and
Test Specifications for the SHSEET. According to teachers, in the second stage of test
review, they did mostly MCQ tasks which exclusively focused on grammar knowledge
to prepare for the GVT. Lan felt that test preparation for the GVT was most intense in
this second stage. Most importantly, test preparation started from the beginning of
Grade 9, but became most intense from the beginning of second semester. Similar to
other study findings, this timing for test preparation also proved that washback could
be seasonal (Andrews et al., 2002; Cheng, 2005; Cheng et al., 2011) and longitudinal
(Bailey, 1996; Cheng, 2008b). Therefore, the closer to the test date, the more intense
the washback on teaching and learning.
Moreover, throughout test preparation, the teaching of grammar and vocabulary

knowledge was extensive, but teachers tended not to specifically prepare for the GVT
tasks separately. In fact, it was hard to judge the exact effort put in test preparation, as
teachers explained that grammar and vocabulary were always touched upon in test
preparation (Hu). For example, the test task of reading comprehension could also
prepare students for vocabulary testing (Hu). Further, even though teachers asked
students to do specific grammar and vocabulary exercises such as MCQs, it was not
very intense (for example, two sets of MCQ tasks every week or 20 items a day for a
certain period (Zhang)). Interestingly, Hu explained that the intensity of test
preparation regarding GVT tasks should be considered by their test weighting, which
determined how much effort they should put in test preparation. Therefore, taking
MCQ as an example, which accounted for 15 marks in 2018, it should thus be allocated
15 minutes’ time in test preparation (Hu).
In sum, teachers’ data presented an intense in-class test preparation effort

regarding the GVT and the SHSEET. Significantly, both the change of class schedule
and the in-class test preparation teaching indicated intense washback. Findings also
indicated that the washback intensity of the GVT on teaching appeared to be generally
at a certain level but became intense when approaching the test date. Moreover, test
preparation was organised in a similar way across schools and teachers took task types

and the weighting of tasks as the criteria for the amount of time spent on particular
tasks in test preparation. However, viewing the SHSEET as a whole, the intensity of
grammar and vocabulary teaching in test preparation classes was high, as more than
half of the semester (around two and a half months in Grade 9) was spent specifically
on grammar and vocabulary test review. It was thus appropriate to conclude that the
washback intensity of grammar and vocabulary test preparation in class was high.
6.3.2 Extra-curricular test preparation effort

Focus group data revealed that students’ GVT preparation could be categorised
according to three considerations. The general test preparation of the GVT was not
intense; but students tended to prepare for the test differently as low-achieving students
had more intense test preparation when the test was approaching; and students’
considerations of task types as well as test difficulty influenced their test preparation
efforts in extra-curricular time.
In general, students’ GVT preparation in extra-curricular time was not intense.

For example, most students did not prepare specifically for MCQ and Sentence
Completion tasks (Xia-SA, Chao-SA, Long-SB, Xun-SB, Hui-SB) since they regarded
the language tested in these easy tasks could be practiced in preparation for other tasks.
Therefore, they did the least test preparation on MCQ, such as spending half an hour
or less on it each week (Ling-SA, Long-SB, Yao-SB, Shu-SB, Meng-SC). In contrast
to those students who did little test preparation, Ming-SA commented that test
preparation was unnecessary. As a result, he did not prepare for the GVT or even for
the English subject. These findings support the research finding that the same test
might exert different amounts of washback intensity among participants (Alderson &
Hamp-Lyons, 1996; Erfani, 2012; Ferman, 2004; Watanabe, 1996a).
In contrast, Fei-SA carried out intense test preparation for the GVT, including
MCQ, in the first semester of Grade 9. As he argued, it was hard to make significant
progress in English within a short time, because language learning requires a long-
term commitment. Therefore, he suggested students who were not good at English
might need to spend more effort on test preparation. Likewise, Kai-SC reported that
he did intense GVT preparation even on MCQ when the test was approaching, because
he aimed to gain more marks in easy tasks. Moreover, when approaching the test date,
School C students tended to have a more intense test preparation since they were asked
by their teacher to do MCQ tasks every day. These findings of washback intensity

support the concept of the seasonality (Cheng, 2005; Cheng et al., 2011) or influence
of time periods (Andrews et al., 2002; Shohamy et al., 1996).
In general, compared to the in-class test preparation, students’ extra-curricular

test preparation was to a lower degree. Nonetheless, although most students spent
relatively little effort in preparing for MCQ and Sentence Completion tasks, they
prepared for more difficult tasks of Cloze and Gap-filling cloze (Wei-SA, Xun-SB,
Na-SB) in extra-curricular time. The different degrees of test preparation effort
indicated that washback intensity varied according to different tasks or tests (Shohamy
et al., 1996).
6.3.3 Test preparation effort as tested in the student survey

Besides the complex qualitative results regarding GVT preparation from both
in-class and extra-curricular time, students’ extra-curricular test preparation was
further examined in the student survey. The survey primarily quantified extra-
curricular test preparation since in-class test preparation was found to be similar across
the observed classes; that is, teachers allocated all in-class teaching time to test
preparation. In this way, the evidence of in-class test preparation was explicit and there
was no need to further quantify it. Therefore, taking reference from extant literature
(Qi, 2004b; Xie, 2010) and gaining insights from the qualitative findings, students’
test preparation effort was further investigated in the survey with a focus on the number
of sets of papers they took in relation to each GVT task after school hours. In this way,
the survey could gauge students’ test preparation on each GVT task. Results are shown
in Table 6.3.
Table 6.3
Number of test papers taken for GVT tasks (see instrument reliability and validity in section 4.5.3)

As shown in Table 6.3, the proportion of students who reported doing less than
three MCQ test papers per week was much higher than those who reported they did
more than seven MCQ test papers per week (56.4% versus 16.6%); the same tendency
was evident for Cloze (52.9% versus 18.4%), Sentence Completion (59.2% versus
17.1%), and Gap-filling cloze (48.9% versus 23.5%). The same method was also
applied to their time investment after school hours, which is shown in Table 6.4.
Table 6.4
Time spent on preparing for GVT tasks (see instrument reliability and validity in section 4.5.3)
From Table 6.4, it was found that the proportion of students who reported
spending less than one hour per week on MCQ exceeded those who spent more than
two or three hours on the task (59% versus 11.4%); the same tendency was evident for
Cloze (47.1% versus 16.6%), Sentence Completion (63.3% versus 12.0%), and Gap-
filling cloze (40.0% versus 27.2%).
To conclude, both qualitative and quantitative findings indicated a lower degree

of test preparation effort by students in extra-curricular time. However, students’ GVT
preparation was basically in line with their perceptions of test difficulty and different
test tasks (Shohamy et al., 1996). In particular, compared to the passage-based tasks
of Cloze and Gap-filling cloze, sentence-based tasks of MCQ and Sentence
Completion received less attention during test preparation.
To summarise section 6.3, intense in-class test preparation regarding the GVT
and the SHSEET was identified from teachers’ data and generally a lower degree of
GVT preparation in extra-curricular time was reported by students. In order to further
investigate the reported findings, Multiple Correspondence Analysis (MCA) was
conducted to examine the theoretical conceptualisation in Green’s (2007a) work (see

Figure 3.3) and the relationship between test importance, test difficulty, and test
preparation effort. Results are shown in the following section.
6.4 WASHBACK INTENSITY: MULTIPLE CORRESPONDENCE

ANALYSIS
Multiple Correspondence Analysis (MCA) was performed on the 13 variables

included in the student survey to measure students’ perceptions of test importance, test
difficulty, and test preparation effort. Descriptive statistics in terms of frequency have
been reported in previous sections. To recall, test difficulty was measured by four
variables, each to measure perceived difficulty of each of the four tasks, namely MCQ,
Cloze, Sentence Completion, and Gap-filling Cloze. Test preparation effort was
measured by two sets of variables, each set composed of four variables to measure
students’ test preparation effort in each of the four GVT tasks. The first set was
designed to gauge the number of test papers taken for each of the four tasks weekly,
and the second set, the number of hours spent on each of the four tasks weekly. Test
importance, however, was gauged by a single-indicator variable to measure students’
perceptions of the importance of the GVT.
MCA was conducted by using SPSS 25.0 and with a goal of identifying patterns
of washback intensity of the GVT in extra-curricular time. The MCA model summary
is presented in Table 6.5. A two-dimension MCA solution was obtained. The first
dimension, with Cronbach’s α of .914, eigenvalue of 6.384, and inertia of .491,
accounted for 49.109% of the variance of the 13 variables in the study sample of 922;
the second dimension, with Cronbach’s α of .865, eigenvalue of 4.955, and inertia of
.381, accounted for 38.115% of the variance. Thus, the total variance explained is
87.224%. These results met the recommended criteria of Cronbach’s α above .70,
inertia above .2, and the variance explained above 50% (Hair et al., 1998; Johnson &
Wichern, 2007). Therefore, the two-dimension MCA solution explained the data well.
Table 6.5
Model summary of washback intensity

Table 6.6 shows the discrimination measure of each of the 13 variables to the
two dimensions. In detail, the higher the value of the discrimination measure, the more
meaningful it proves to a dimension and the greater ability it discriminates between
the student participants under analysis. Additionally, each indicator’s contribution
towards explaining the variance in each dimension portrays its quality.
Table 6.6
Discrimination of variables for the dimensions12
The discrimination measures of the MCA model are displayed in Figure 6.2. In
both dimensions, the eight variables used to gauge test preparation effort of students
have larger discrimination measures than the variables used to gauge test difficulty and
test importance. This means, in general, along both dimensions, the codes/categories
of the eight variables for test preparation effort demonstrate a wider spread/variance
than those for test difficulty and test importance.
12
The indicator codes, such as 1TDFMCQ, used in this table reflect student responses to the Likert
scale used in the student survey. 1 means the first Likert point in the scale, 2 means the second Likert
point in the scale, and so on. In detail, for example, 1TDFMCQ means “MCQ is absolutely not
challenging”. For details of what each point represents, refer to the survey items in Appendix I or the
labels in Figure 6.3.

Figure 6.2. Indicators display (measures of discrimination)13
Further, MCA features four patterns of washback intensity of the GVT for the
survey participants, as shown in Figure 6.3.
Figure 6.3. Washback intensity patterns of the GVT
Following the naming of construct indicators in Table 6.6, TP means “test papers taken”, Time
13
means “time investment”, and TDF means “test difficulty”. These were also applied to Figure 6.3.

Pattern 1 (the green circle in Figure 6.3)-no washback: when the GVT was
perceived by students to be unimportant (2TIM) and three GVT tasks of MCQ,
Sentence Completion, and Cloze were perceived to be absolutely challenging for them
(5TDF), there was no effort spent in test preparation as students did none test papers
of and spend no time in these tasks per week (1TP, 1Time).
Pattern 2 (the red circle in Figure 6.3)-less intense washback: when the GVT was
perceived by students to be neutrally important (3TIM) and the four GVT tasks were
considered as challenging or absolutely challenging (4TDF for all, 5TDF for Gap-
filling cloze), students tended to spend comparatively less effort in test preparation
(2TP, 2Time). Therefore, each week, most students did one to three test papers of and
spend no more than one hour in these four test tasks.
Pattern 3 (the black circle in Figure 6.3)-more intense washback: when the GVT
was perceived by students to be important (4TIM) and absolutely important (5TIM),
and the four GVT tasks were not very difficult (2TDF) or neutrally difficult (3TDF),
students tended to do more test papers (3/4TP) and spend more time on these four test
tasks (3/4Time). In detail, each week, students did four to nine test papers of and spend
one to three hours in these four GVT tasks.
Pattern 4 (the blue circle in Figure 6.3)-unclear pattern: each week, students did
more than 10 test papers of and spent more than three hours in these four GVT tasks,
but those test preparation practices were not at all related to their perceptions of test
importance and test difficulty.
Before deconstructing the four identified washback intensity patterns, it is

necessary to review the theoretical conceptualisation of washback intensity. To this
end, Green’s (2007a) washback model is revisited in Figure 6.4 (see next page).
To highlight, both in Green (2007a) and in the new washback model

incorporating LOA, washback intensity is a key dimension which contributes to
revealing the entire GVT washback phenomenon. Therefore, the test preparation
intensity that the GVT exerted on self-study learning was further explored in the
student survey. To this end, the study quantified the washback intensity results in order
to provide more specific answers to RQ1b and to investigate whether the washback
intensity in this study could verify the theoretical assumptions of washback intensity
(Green, 2007a).

Figure 6.4. Washback intensity by Green (2007a)
Pattern 1 of no washback was in line with Alderson and Wall (1993) who
claimed that test will have no washback if it has no important consequences. Further,
it also verified Green’s (2007a) argument that test difficulty should be challenging but
also attainable for an intense washback to happen.
Pattern 2 of less intense washback was also supported by the literature. When
students’ perception of test importance increased, their test preparation efforts on the
GVT increased accordingly (Madaus, 1988; Popham, 1987). This finding could imply
that washback intensity can vary among different tests (Shohamy et al., 1996) and it
further verified the qualitative findings since students’ test preparation varied due to
different tasks, Gap-filling cloze in particular.
Pattern 3 of more intense washback demonstrated a relatively higher degree of

GVT preparation. When students regarded the test to be more important (Madaus,
1988; Popham, 1987) but attainable (Green, 2007a), they tended to prepare more for
the test as indicated in Green’s (2007a) model.
Pattern 4 presented a unique case in the current study. Against the theoretical
assumption that test importance was a crucial factor for generating washback intensity
(Alderson & Wall, 1993; Green, 2007a; Hughes, 1993), this pattern demonstrated
students prepared for the test intensely without perceiving the GVT as important or
difficult. To further investigate the phenomenon, participants grouped in Pattern 4
were tracked. Findings suggested that this pattern was reported mainly by students
with good test performance (18 out of 24 students reported scoring more than 126.5
marks in 2018 SHSEET). However, further analysis and more factors should be

explored to better explain this unclear pattern, which falls beyond the theoretical
framework (Green, 2007a) and could offer implications for future research directions.
To summarise, MCA findings supported students’ qualitative data; that is, the
washback intensity of the GVT on extra-curricular learning was to a lower degree. In
general, there were four patterns among which three out of the four verified the
theoretical assumption that test preparation is affected by participants’ perceptions of
both test importance and test difficulty. However, Pattern 4 reveals a unique pattern
that needs further investigation. Nonetheless, besides the previous findings of the three
factors with regard to washback intensity (i.e., sections 6.1, 6.2, 6.3), the further
investigation of washback intensity through MCA indicated that most students
experienced a less intense (Pattern 2) or more intense (Pattern 3) washback of the GVT,
which generally aligned with the accounts of focus group participants.
6.5 WASHBACK MECHANISM: STRUCTURAL EQUATION

MODELLING
Before moving on to the statistical analysis, it is noteworthy to remind that for

the convenience of data reporting and model construction, the SEM model adopted
simpler codes to indicate strategy use of students (see section 5.5). Therefore,
language-use oriented learning strategies were referred as “positive strategy” and test-
use oriented learning strategies were indicated by “negative strategy”.
As proposed in Chapter Four, the washback mechanism in this study aimed to

test the relationship between test perceptions and test performance through affective
factors and test preparation practices. Therefore, in this section, Structural Equation
Modelling (SEM) was applied to investigate the hypothetical relationship between
students’ test perceptions, students’ affective characteristics (i.e., motivation and test
anxiety), students’ test preparation practices, and their test performance (i.e., self-
reported SHSEET scores). Three hypothetical relationships are proposed and
examined in this section:
1) Students’ perceptions of test design characteristics and test importance

directly affected students’ motivation and test anxiety;
2) Students’ motivation and test anxiety directly affected their test preparation
practices;

3) Different modes of test preparation practices had varying influences on
students’ self-reported SHSEET scores.
Before processing the model, two points need to be acknowledged. First,

students’ SHSEET scores were used as a measure to indicate students’ GVT scores. It
should be noted that learning outcomes are composed of multiple dimensions, a
systematic quantification of which fall beyond the scope of the thesis. Second, the
mean value of test preparation effort (including test papers taken and time investment)
was computed. As the original test preparation effort was not designed as a conceptual
construct but as a measure of the “quantity” of effort, the computed mean value was
adopted as a measure of overall test preparation effort. Therefore, a new variable (test
preparation effort, v78) was obtained and used in the washback mechanism model.
Before performing SEM, the statistical assumption of normality was examined

(see Appendix L). With the multivariate kurtosis (510.675) greater than the cut-off of
3 (Yuan et al., 2002), the data significantly deviated from multivariate normal
distribution. Therefore, bootstrapping methods were selected when running SEM.
The hypothetical relationships between students’ test perceptions, affective

factors, test preparation practices, and test performance were established according to
the conceptual model of washback mechanism (see section 3.3) and the hypothetical
structural model (see section 4.5.3). The structural model for washback mechanism of
GVT is presented in Figure 6.5 (see next page). All standardised regression weight
values between the constructs under examination and between the constructs and their
corresponding indicators were above .60, and most of them were above the preferred
value of .70 (Hair et al., 2006). The squared multiple correlations (SMC) were all
above the acceptable cut-off value of .30 (Jöreskog & Sörbom, 1989) and most of them
were greater than the preferred value of .50 (Jöreskog & Sörbom, 1989). Therefore,
the model converged.
The SEM results of the washback mechanism showed a good model fit
(CMIN/DF=3.200, df=699, p=.000, SRMR=.071, RMSEA=.049; 90% CI [.047, .051];
TLI=.910; CFI=.919). Although the model had a significant chi-square value of
2236.996, the ratio between the chi-square value and the degree of freedom
(2236.996/699=3.200) was not high. Standardised root mean square residual (SRMR)
was .071 which was below the cut-off value of .08 (Hu & Bentler, 1999); baseline fit
indices of TLI (.910) and CFI (.919) were above the cut-off value of .90 (Bentler,

1990); and the value of RMSEA (.049) was below the cut-off value of .08 (Ho, 2006;
Schreiber et al., 2006) and the preferred value of .05 (Hair et al., 2006). Therefore,
given the complexity of the structural model and the large sample size of 922, it could
be argued that this model achieved a reasonably good fit and explained the underlying
conceptualisation of washback mechanism of the GVT on students’ learning.
Figure 6.5. Structural model for the relationship within GVT washback mechanism (N=922)
In addition, standardised path coefficients of the structural model of washback

mechanism are reported in Table 6.7. Similar to the previously reported qualitative
findings, these results further proved the complexity of washback phenomenon. The
results show significant and negative or positive relationships between certain
constructs whereas no significant results were identified between some constructs.

Table 6.7
Standardised path coefficients of the structural model of the GVT washback mechanism
The hypothetical relationships between students’ test perceptions and affective

factors are as follows:
1. Negative Perception of GVT characteristics had a significant negative but

weak association with Intrinsic Motivation (r=−.163, p<0.001); and an
insignificant association with both Extrinsic Motivation (r=−.008, p=.847)
and Test Anxiety (r=−.051, p=.231). When test design characteristics had a
negative association with participants’ understandings (r=−.163),
participants’ perceptions of test characteristics indicated a negative
washback on learning (Green, 2007a).
2. Positive Test Perception2 (language use characteristics) had a significant

positive but weak association with Intrinsic Motivation (r=.291, p<.001),
Extrinsic Motivation (r=.169, p<.001), and Test Anxiety (r=.173, p<.001).
Although the associations between those factors were weak, it seemed to

encourage more positive washback than negative washback, which again
pointed back to the positive washback indication (Green, 2007a) of the GVT.
3. Positive Test Perception2 (language use characteristics) had a significant

positive but weak association with Intrinsic Motivation (r=.291, p<.001),
Extrinsic Motivation (r=.169, p<.001), and Test Anxiety (r=.173, p<.001).
4. Test Importance1 (designated test use purpose) had an insignificant

association with Intrinsic Motivation (r=.057, p=.118); a significant positive
but weak association with Extrinsic Motivation (r=.195, p<.001); and a
significant negative but weak association with Test Anxiety (r=−.096,
p=.033).
5. Test Importance2 (perceived test use purpose) had a significant positive and
weak to moderate association with Intrinsic Motivation (r=.308, p<.001);
and a significant positive but weak association with Extrinsic Motivation
(r=.269, p<.001) and Test Anxiety (r=.165, p<.001).
The hypothetical relationships between affective factors and test preparation

practices are as follows:
1. Intrinsic Motivation had an insignificant association with Negative Strategy

(r=−.051, p=.394); a significant positive and weak to moderate association
with Positive Strategy (r=.318, p<.001); and a significant positive but weak
association with Test Preparation Effort (r=.189, p<.001).
2. Extrinsic Motivation had a significant positive but weak association with

Negative Strategy (r=.279, p<.001); and insignificant association with both
Positive Strategy (r=.081, p=.136) and Test Preparation Effort (r=.061,
p=.261).
3. Test Anxiety had a significant negative but weak association with Negative
Strategy (r=−.194, p<.001); a significant positive but weak association with
Positive Strategy (r=.171, p<.001); and an insignificant association with Test
Preparation Effort (r=−.007, p=.839).
According to the above results, intrinsically motivated students did not

necessarily use test-use oriented strategies during test preparation (r=−.051). Rather,

they tended to adopt language-use oriented strategies such as reading extensively
(r=.318) and actively prepare for the GVT (r=.189). Besides, extrinsically motivated
students tended to adopt more test-use oriented strategies such as rote-memorising
grammar rules and vocabulary lists (r=.279) and neglect language-use oriented
strategies, which was similar to qualitative findings. Further, statistics of test anxiety14
could indicate that instead of using more test-use oriented strategies (r=−.194),
students who were less anxious tended to adopt more language-use oriented strategies
during test preparation (r=.171). Therefore, the decrease of test anxiety could bring
about a more positive washback (Chen et al., 2018) on GVT learning.
Regarding the hypothetical relationships between test preparation practices and

students’ test score, it was evident that Negative Strategy had no statistically
significant association with Test Score (r=−.054, p=.159); Positive Strategy had a
significant positive and weak to moderate association with Test Score (r=.311,
p<.001); and Test Preparation Effort had a significant positive but weak association
with Test Score (r=.234, p<.001). Besides, among the three modes of test preparation
practices, Positive Strategy had the most significant positive association with Test
Score, followed by Test Preparation Effort. This finding thus showed no evidence that
the use of test-use oriented strategies linked with students’ overall test performance
(r=−.054). However, it could be argued that students who reported more use of
language-use oriented strategies tended to be those whose SHSEET scores were higher
(r=.311). Studies like Oxford and Nyikos (1989) could provide certain implications to
this finding, as they found university students’ motivation was a powerful indicator for
their use of language learning strategies. In addition, Gardner (1985) pointed out that
the decisive factor for language learning success is motivation (cited in Oxford, 1989).
It thus could potentially explain the relationships between motivation, strategy use
during GVT test preparation, and students’ test performance (self-reported SHSEET
scores) in this study; however, further investigation is needed to thoroughly examine
their relationships. Moreover, students who put more effort and spent more time in
GVT preparation tended to be those with higher SHSEET scores (r=.234), which
contradicted with those findings in other research (Dong, 2020; Xie, 2013).
14
The “test anxiety” scale was designed in a reversed way, such as “My appetite was unchanged”.
Therefore, the bigger the Liker-scale number, the less anxious students were.

To conclude, when students were intrinsically motivated, had positive test
perceptions of language use characteristics in the GVT and important self-perceived
GVT use purposes, they tended to have test preparation practices of adopting more
learning-use oriented strategies and doing more extra-curricular test preparation which
associated with higher SHSEET scores. On the contrary, students who were
extrinsically motivated, no matter what test perceptions they held, they tended to be
those who applied more test-use oriented strategies during GVT preparation which did
not associate with higher SHSEET scores. Finally, students who reported having less
anxious feelings towards the GVT and having positive perceptions as well as the
important self-perceived test use purposes, tended to be those who chose language-use
oriented strategies rather than test-use oriented strategies to learn grammar and
vocabulary. Likewise, the use of language-use oriented strategies associated
significantly and positively with students’ self-reported SHSEET scores. All findings
implied that GVT preparation practices indirectly associated with students’ test
perceptions through their affective factors.
Although test perceptions were constructed with different components in various

studies, the findings in this study were generally in line with Xie (2015a) who argued
that students’ test design perception influenced their test preparation through students’
expectations and test use perceptions and Dong (2020) who found learners’
perceptions of test importance affected their learning practices which further affected
their learning outcomes. Moreover, by considering possible association between
affective effects (i.e., motivation and test anxiety) with test performance (i.e., self-
reported SHSEET scores) (Wolf & Smith, 1995), it completed the washback model of
Green (2007a) and verified the “participants-process-products” washback mechanism
of Hughes (1993) and Bailey (1996). Therefore, the current structural model was more
complex in that it explored the whole washback mechanism through a range of paths
and its findings are summarised in Figure 6.6 (see next page).

Note. The highlighted arrows indicate a negative relationship, which are also marked by a red “−”.
Insignificant paths were removed from the model.
Figure 6.6. The structural relationships within the measurement model of the GVT washback
mechanism
6.6 CHAPTER SUMMARY
This chapter presented both qualitative and quantitative answers to RQ1b. In

general, it was found that the GVT washback intensity in test preparation classes was
high and extra-curricular learning was lower. Nonetheless, the overall washback
intensity of the GVT was complex due to the complex perceptions of test importance
and test difficulty as showed by both qualitative and quantitative data.
In relation to washback intensity, generally similar results were reported by both

teachers and students. First, all participants regarded the GVT to be highly important
due to designated test use purposes and perceived language learning purposes. In
addition, the test difficulty was perceived by participants according to different test
tasks. In-class test preparation pattern was similar and intense across the observed
classes, but students’ extra-curricular test preparation was to a lower degree and varied
according to different perceptions of test importance as well as test difficulty regarding
the four GVT tasks. Moreover, qualitative findings also implied that GVT washback
was longitudinal since it almost went through the entire Grade 9. These qualitative
findings were further tested through MCA which identified four different patterns.
Those patterns generally matched with the theoretical assumption of washback
intensity in relation to test importance, test difficulty, and possible other factors
(Alderson & Hamp-Lyons, 1996; Alderson & Wall, 1993; Green, 2007a; Hughes,
1993). Pattern 1, Pattern 2, and Pattern 3 indicated no, less intense, and more intense

washback of the GVT respectively, which converged with qualitative findings.
However, Pattern 4 falls beyond the theoretical conceptualisation which needs further
investigation. The findings of qualitative data in connection to RQ1b are summarised
in Table 6.8.
Table 6.8
Qualitative results to RQ1b
Revisiting Chapter Five findings, different patterns of GVT washback results

were reported across the observed classes, even though the test preparation seemed to
be similar in teaching. Furthermore, student language proficiency was identified as a
major influential factor which influenced washback value at a micro level. These
findings were further tested through SEM in section 6.5.
To test the hypothetical relationships between students’ perceptions of test

design characteristics and test importance, affective factors, test preparation practices,
and test performance (i.e., self-reported SHSEET scores), SEM was applied. It was
found that except for students’ negative perception of GVT design characteristics,
students’ test perceptions indirectly associated with test preparation practices through
affective factors of intrinsic motivation and test anxiety, which further associated with
students’ test performance. Although some relationships were weak, this structural
model tested and partially proved the theoretical conceptualisations from Hughes
(1993), Bailey (1996), Green (2007a), and Wolf and Smith (1995). Further, the
negative relationship between Positive Perception1 (test format characteristics) and
Test Anxiety, and between Test Importance1 (designated test use purposes) and Test
Anxiety implied that those two factors of test perception might negatively associate

with students’ test performance through Positive Strategy. Nonetheless, the significant
and positive relationships among other factors were found to indicate the positive
associations between test perceptions and test preparation practices (Xie, 2015a) and
their further relationship with test performance or learning outcomes (Dong, 2020).
To conclude, Chapters Five and Six are summarised in Figure 6.7 (see next page)
by modifying the proposed washback model in Chapter Three. The basic ideas within
this washback model were: macro value was assumed to exert an influence on micro
value (Green, 2007a), but the current research participants were unable to provide
abundant and crucial evidence (dashed arrow). Moreover, both perceptions of test
design characteristics (in micro washback value dimension) and test importance (in
washback intensity dimension) were found to associate with participants’ affective
factors (in micro washback value dimension), which in turn linked with their test
preparation practices (both micro washback value and washback intensity
dimensions). All these factors combined then tended to relate to students’ learning
outcomes (i.e., the self-reported test scores). Additionally, test preparation materials
and LOA practices were potential factors for both positive and negative washback
value, which were assumed to be included in this model. Details of LOA practices
during GVT preparation will be presented momentarily in Chapter Seven.

Figure 6.7. The GVT washback model on teaching and learning

Chapter 7: The Incorporation of LOA
Principles: Opportunities and
Challenges in GVT Preparation
Chapter Five and Chapter Six have reported the overall washback value and
washback intensity findings of the Grammar and Vocabulary Test in the Senior High
School Entrance English Test (the GVT). This chapter explores Learning Oriented
Assessment (LOA) opportunities as well as challenges in the GVT preparation. In
particular, the findings address the second research question:

LOA principles in GVT preparation?
To address RQ2, Chapter Seven jointly reports qualitative and quantitative

findings in five sections and then synthesises the findings. In detail, this chapter reports
findings based on themes. Section 7.1 reveals participants’ perceptions of
opportunities for the incorporation of LOA principles in GVT preparation.
Participants’ identifiable learning-oriented strategies and activities both in and outside
class are reported in section 7.2. Their reported practices from the qualitative study
were further investigated in the student survey and results are shown in section 7.3.
The hypothetical relationship between the dynamic of LOA practices in GVT
preparation and students’ self-reported SHSEET scores is presented in section 7.4. The
challenges of the incorporation of LOA principles in GVT preparation are reported in
section 7.5. Finally, section 7.6 provides a summary of both qualitative and
quantitative findings in response to RQ2.
7.1 BELIEFS ABOUT OPPORTUNITIES FOR THE INCORPORATION

OF LOA PRINCIPLES IN GVT PREPARATION
Both teachers and students felt that the GVT could offer opportunities for
promoting learning. However, ascertaining their understanding and knowledge of the
term LOA was crucial. Therefore, before delving into their perceptions of the LOA
opportunities, it is important to establish participants’ understanding of the concept of
LOA and the ways in which it informed them in the context of test preparation. In
effect, only teachers’ understanding of LOA was explored in the interviews,
Chapter 7: The Incorporation of LOA Principles: Opportunities and Challenges in GVT Preparation 203
considering that the term and theory could be hard for students to comprehend. Thus,
teachers’ opinions were primarily presented below.
Among the three teachers, Lan appeared to have a misconception regarding LOA
and expressed doubts regarding her understanding of the concept, as the following
excerpt demonstrates:
Lan: Oh, my understanding is, that is, language knowledge oriented. That is to say,
the language knowledge itself, which might not involve learning like students’
ability. Well, I think it should be, well, knowledge-oriented, well, testing,
knowledge. (Interview)
This knowledge-oriented assumption held by Lan was due to her confusion

regarding the term LOA. This was reflected through her questions “This learning, the
concept is quite general, is it learning outcome? Is it learning content? Is it learning
object? Is it learning subject?”, after reading through the explanation handed over to
her.
Hu’s and Zhang’s understanding of the concept of LOA reflected fundamental

principles. Most importantly, they emphasised that in keeping with LOA principles,
English as a Foreign Language (EFL) tests should help students to bridge their learning
gaps and exert a positive influence on both teaching (Zhang) and learning (Hu). The
following excerpt from Hu’s interview is illustrative.
Hu: Learning Oriented Assessment? First, [the test] can reflect the real learning
situation of students. Second, it can tell students, how they should do. Third,
having a certain positive influence on students’ next stage of study. You should
let students have confidence to continue learning, and they should be sure
about where and how they could make improvement. This should be more
realistic [for students]. (Interview)
After eliciting teachers’ understanding of the concept of LOA, their opinions on

incorporating LOA principles in teaching were also articulated. First of all, although
Lan was unsure about the meaning of LOA, she was aware of the difference between
Assessment of Learning (AoL) and Assessment for Learning (AfL). Most importantly,
Lan’s school was attended by students with a high EFL proficiency and she confirmed
that she had received in-service professional development workshops on AfL from the
school. In contrast, Hu and Zhang had not received professional development in
language assessment and Hu expressed her eagerness to have access to this type of
training. This reflected differences in professional development opportunities offered
204 Chapter 7: The Incorporation of LOA Principles: Opportunities and Challenges in GVT Preparation
across schools. In fact, literature suggests that building teachers’ subject knowledge
could influence student achievement in the long term (Hill et al., 2005) and it is
important to provide teachers with professional development opportunities for learning
about and practising LOA (Carless, 2007; Carless et al., 2006; Zeng et al., 2018).
Regardless of the differing interpretations of the term LOA, the three teachers
felt that their teaching practices were guided by LOA principles. In other words, they
agreed that they prioritised learner needs and improving students’ learning. For
example, Lan used flipped classes to assess and give feedback on students’ learning.
For Hu, being aware of students’ language achievement in light of learning aims was
key to her teaching. In order to enhance her students’ learning, she intentionally
increased the level of difficulty of language knowledge teaching in her classes. In other
words, after students reached a certain level, she moved on to use more difficult
content for teaching. This was clear evidence of formative assessment, since teachers
(e.g., Hu) collected students’ assessment information through observation to improve
learning (Nichols et al., 2009) and suggest actions to close the learning gap (Black &
Wiliam, 1998). Further evidence included a simple profile of students’ learning
records by Hu. For Zhang, on the one hand, LOA practices in teaching were reflected
in her belief of “educating before teaching”; on the other hand, she tried to involve
more stakeholders like the head teacher and parents in decisions on her teaching.
Therefore, it was clear that teachers believed in LOA principles and this was reflected
in their teaching practices, but their teaching approaches differed.
Although LOA principles were utilised by teachers, according to Hu and Zhang,

there was a clear distinction between lower grades and Grade 9 in junior high school
teaching. For example, Zhang admitted that her teaching focus changed during the
three years.
Zhang: Anyway, it’s like this, this way, I think that, it’s still, there were much more
[incorporation of LOA principles] in Grade7, while in Grade 9, we follow less
[of LOA principles]. (Interview)
This identification of the time influence on the incorporation of LOA principles

in actual teaching reflected the washback of the SHSEET. In fact, no matter for which
teacher, the closer the test date, the less likelihood that LOA principles were
emphasised in their teaching. This phenomenon of less learning-oriented intentions
was not unique in the current GVT; it was also found in NMET test preparation (Qi,
2005), past studies on the SHSEET (Yang, 2015), the IELTS, and the TOEFL
(Zafarghandi & Nemati, 2015) where teachers also intentionally ‘taught to the test’ in
order to achieve a better test result.
In line with teachers’ understanding of the LOA concept and their related
utilisation of LOA principles in teaching, both teachers and students believed that the
GVT could enable the incorporation of LOA principles in the test preparation stage.
In their opinions, there were various reasons for the GVT to promote learning. In total,
six kinds of opportunities were expressed by participants. Those opportunities include:
• Alignment with students’ EFL learning stage;
• Developing communication abilities in real life;
• Developing students’ learning skills in general;
• Learning-oriented test design;
• Transferring language knowledge into performance on macroskills;
• The level of challenge.
These six LOA opportunities are reported accordingly in this section.
7.1.1 Alignment with students’ EFL learning stage

First of all, teachers perceived that the GVT could promote learning due to the
belief that the test aligned with the English language learning stage of junior high
school students. Lan acknowledged that the GVT enabled the incorporation of LOA
principles in teaching and learning since it fitted her students’ learning stage well.
According to her, this was due to the fundamental characteristics of grammar and
vocabulary for her students’ language proficiency.
Lan: Well, currently, regarding this, this, our current students’ learning proficiency,
I think it is learning-oriented and helpful, nevertheless. After all, grammar is
what has to be learned [for this stage] … (Interview)
According to Lan, the students’ learning stage should be taken into consideration
as students were at an early stage of English learning. In fact, not all junior high school
students started learning English from Grade 3 as mandated; some students might only
have started learning English from Grade 7 (Ming-SA, Meng-SC, see section 4.4.4).
Therefore, for those English beginners, learning grammar was essential. This finding
was actually in accordance with students’ comment on the fundamental role of
grammar and vocabulary for beginning English learners (see section 6.1.1). Therefore,
the resonance between the fundamental role of language knowledge and the learning
stage of English beginners was thus considered as the first influential factor regarding
the opportunities for the incorporation of LOA principles in GVT preparation.
7.1.2 Developing communication abilities in real life

The second opportunity for incorporating LOA principles in GVT preparation
was utilising the context provided in test items. According to Hu, although this context
might seem irrelevant to real-life situations, it was still possible to use it to develop
students’ communication abilities in real life.
Hu: For example, like what we learn now in the class, although it has certain
differences from the real-life situation, however, it is not to say, they are totally
irrelevant. If you can learn this (i.e., the context provided in GVT tasks) well,
okay, actually it has a certain function at sometimes. Even though it might be
Chinglish, right? But it wouldn’t be totally useless, right? You, because you
learn this, like the rigid textbook knowledge, learned, learn it well, you won’t
be unable to go shopping when you are overseas, right? Or you won’t be put
into custody when clearing customs, or get into trouble, right? (Interview)
From Hu’s comment, it could be seen that although there were limitations of the
test design in terms of the relatively inauthentic contexts of GVT task items, they could
still provide a basis for students’ communication abilities in real life.
Likewise, students in focus groups claimed that what was tested in the GVT
could be practised in their real-life communication. For example, students could make
use of the MCQ content in a real-life situation when communicating with peers.
Ling-SA: Oh, I think that the MCQ task in the GVT is, sometimes, it can be embedded
into life. Take, for example, sometimes we (i.e., students) make jokes with
others, I mean making jokes by communicating with others in English. For
instance, it relates to some basic grammar, what is tested in MCQ, then we can
integrate what are tested in MCQ into life, sometimes [we can] integrate into.
(FG-SA)
Moreover, students (Yao-SB, Meng-SC, Kai-SC, Hua-SC) thought that MCQ

items were related to real-life experiences and language use. Hence, students could
learn and use the grammar and vocabulary content tested in MCQ to promote their
language learning through preparing for communication in the future.
Kai-SC: For example, if a foreign friend comes and asks us about some routine
questions, we will be able to, because we take [the MCQ exercises] regularly,
so [we are] proficient, so we will be able to answer their questions. (FG-SC)
In addition, students understood that GVT tasks could also impart useful
knowledge about language use. In this way, they could sense the learning-oriented
potential of the test, as a student explains below:
Na-SB: It, they [Cloze and Gap-filling cloze] just use another method to tell us (i.e.,
students, test-takers) the truth or facts of life. In fact, last time, it seemed that
I did one passage, which seemed to be about how to speak more politely. At
that time, it gave me one inspiration, that is, how to talk to others in a polite
way. Therefore, I think, in fact, Cloze and, and Gap-filling cloze, in fact, give
me great inspirations, which are relevant to real life. (FG-SB)
From Na-SB’s comment, it was inferred that students taking the information
conveyed through passage-based tasks to deepen their understanding was an indication
of learning-oriented potential of the GVT. Therefore, the potential of applying what
was tested into real-life situations created opportunities for the GVT to promote
learning.
7.1.3 Developing students’ learning skills in general

From the interview data, both Hu and Zhang believed that the GVT developed
students’ general learning skills. According to them, both “strengthening students’
memory” (Hu) and “logic as well as inferencing ability” (Zhang) contributed to the
LOA potential of the test.
Zhang: … maybe what seem to be careless things [such as differentiating contrasting

options in MCQ] are actually giving some [focus] on students’ logic, that is,
to help them make inference… Some aspects of sense or logical sense and
inferencing sense. … Cultivating these aspects, I think it might be somehow
learning-oriented. (Interview)
Through analysing Zhang’s accounts, it was found that this perception was
influenced by the test design characteristics (testing language knowledge) and test
method (using a multiple-choice approach in some tasks). Regarding test design
characteristics, GVT tasks tested both language knowledge and overall ability to use
language (see section 5.2). As for test method, the MCQ type tasks could train
students’ logic and memory, which could be generally applied in language learning.
These were necessary learning skills that students should acquire and make use of in
order to learn a language. Taking the skill of inferencing as an example, it was
considered by many researchers as an important skill to be used in L2 reading
(Anderson, 1991) and vocabulary teaching (Walters, 2004).
7.1.4 Learning-oriented test design

Significantly, Zhang confirmed that all four types of GVT tasks were designed
with a learning-oriented focus (i.e., an intention to test more than language knowledge
and further improve students’ learning). This judgement came from her knowledge of
test designers’ intentions, as Zhang was the Director of Teaching Affairs in her school
and thus received training on designing test items. From her own experience, she was
familiar with and thus discussed test designers’ intentions in the interview.
Zhang: As for …that Cloze task, … Yesterday, Teaching and Research Officer Molly
gave us the training in designing test items, she said in this way “Cloze does
not test grammar”. … It tests the understanding of the passage, logical
inference … It tests higher-order stuff. (Interview)
According to Zhang, the intention of testing higher-order thinking and language

abilities by test designers was communicated to teachers. Therefore, it was considered
as the fourth opportunity for GVT preparation to motivate teaching practices aimed at
improving students’ learning.
7.1.5 Transferring language knowledge into performance on macroskills

Another opportunity pertinent to the LOA feature of the GVT was that what was
learned and tested in those tasks could be foundational for performance in macroskill
tasks, including writing. For example, when questioned on whether they used their
accumulated grammar and vocabulary knowledge, School A students responded “yes”,
and Xia-SA further explained as follows:
Xia-SA: Such as when writing a composition, or writing articles, [we] can then make
use of this knowledge accumulated. (FG-SA)
Furthermore, Fei-SA explained his way of learning and using grammar and
vocabulary knowledge through extensive reading (see section 5.5.2). In fact, among
all focus group participants, this view of the foundational importance of grammar and
vocabulary knowledge was mainly expressed by students from School A and School
B. These students with a comparatively higher language proficiency level recognised
the link between knowledge of the language and the ability to use it. This finding, to
some extent, was assumed to uncover the relationship between LOA opportunities as
well as practices and student language proficiency levels. Further discussion between
LOA challenges and student language proficiency will be presented later (see section
7.4 and section 7.5).
7.1.6 The level of challenge

According to students, test difficulty could also provide an opportunity for
incorporating LOA principles in GVT preparation. Generally, due to the fact that Gap-
filling cloze was regarded as the most difficult (see section 6.2), students perceived
this as the task which could be used to improve their learning (Ling-SA, Chao-SA, Na-
SB). For instance, Na-SB stated:
Na-SB: I think that MCQ has only very limited use for me. In my opinion, I think that
Gap-filling cloze can improve my learning to a greater extent. … Because its
difficulty level is higher. (FG-SB).
This interview account from Na-SB showed that the more difficult the task, the
greater the possibility of improving one’s learning. This opinion was agreed by most
students in focus groups. A further comment from Ling-SA also offers insight into this
finding as the decreased mark allocation of MCQ indicated a shift of test focus from
testing language knowledge to testing overall ability to use language. In her opinion,
this shift brought about more challenges to her learning and thus motivated her to study
hard to improve junior high school English learning, which could further help her
senior high school study.
Ling-SA: In my opinion, in fact, if overall test score weight of other tasks [other than
MCQ] was increased, it means that in regard to the text, the text
comprehension, that is, our language abilities shall be improved, which is very
helpful to our study at senior high schools. Then … if we do not build a solid
and higher basis at a junior high school level, when we go to senior high
schools, in fact, even if you again, you study very hard again, I feel that I need
to spend more interest in improving my own learning. (FG-SA)
Summarising from Ling-SA’s explanation, the increased level of challenge in

test tasks was regarded as necessary for students to prepare for learning improvement
and providing a foundation for their English study at senior high schools. In this regard,
the level of challenge was thus considered to be a potential factor for promoting
students’ learning in the GVT context.
To conclude, participants believed that these six categories of opportunities
made it possible for the GVT to promote learning and thus indicated the potential for
the GVT to be learning-oriented. This was gleaned from the interview data. The
perceptions of LOA opportunities varied among teachers and remained similar among
students. According to teachers, Lan regarded the test as crucial for English beginner
learners and suitable for the junior high school stage of EFL learning. Although the
test items were not reflecting a real-life communication, Hu and students considered
the test to have an LOA potential since it could help students develop certain
communication abilities in real life. Next, other than improving students’ language
learning, the test itself was thought to be able to develop students’ abilities of memory,
logic, and inference (Hu, Zhang). Furthermore, according to Zhang, the GVT could be
learning-oriented due to learning-oriented test design intentions such as testing overall
language use ability rather than simple linguistic knowledge. In addition, as reported
by students, it was possible for the GVT preparation to incorporate LOA principles
due to the possibility of transferring language knowledge into performance on
macroskill tasks and the level of challenge.
Besides those perceptions of opportunities, it was found that participants’

teaching and learning practices could also reflect the incorporation of LOA principles.
The detailed results are reported in section 7.2.
7.2 IDENTIFIABLE LOA STRATEGIES AND ACTIVITIES
Under the aforementioned beliefs, both classroom observation and interview

data showed that teachers and students did take certain learning-oriented strategies and
different learning-oriented activities during GVT preparation. To clarify, although
participants were not familiar with the definition and theoretical bases of the LOA
theory, they did agree that they used certain types of strategies and had specific
activities that could promote learning.
Generally, a variety of LOA strategies and activities were practiced by the

participants during GVT preparation. In particular, from the perspectives of LOA
principles (Carless, 2007) and the LOA cycle (Jones & Saville, 2016), the proposed
key LOA practices (see section 3.4.2) were generally identified in the qualitative data.
To clarify, the learning-oriented practices in the current dataset were constituted of:
• Interactive classroom activities;
• Feedback;
• Learning-oriented strategies.
These practices and their related categories are reported subsequently in this
section.
7.2.1 Interactive classroom activities

According to Jones and Saville (2016), two kinds of activities complement each
other in a learning-oriented classroom context; namely, 1) learner-centred and 2)
content-centred activities. Learner-centred activities enable higher-order thinking
skills and the co-construction of knowledge to occur during teacher-student or student-
student interactions, while content-centred activities feature the transmission of
knowledge, usually by the teacher. These two types of interactions were both identified
in the observed classes.
Learner-centred interaction
As aforementioned, LOA cycle depicts learner- and content-centred activities in
classrooms. Although it is ideal that both interactive activities should appear in
classrooms aiming for LOA purposes, it was rare to see interactive activities featured
with higher-order skills in the current SHSEET preparation courses. As a result, only
limited types of learner-centred interactions were identified. Those activities included
1) group or pair discussion which was prominent in all three teachers’ classes and 2)
positioning students as holders of language knowledge which was predominant in
Zhang’s classes.
Group/pair discussion
Noticeably, all three teachers observed utilised this activity in their classes to
motivate students to solve problems in groups or pairs (think-pair-share). For instance,
in her class, Zhang encouraged students to first discuss their answers in pairs before
she proceeded with further instruction on students’ exercises:
Zhang: You can discuss in pairs, then tell me your answer. You can discuss with your
partner, in pairs. Which one did you choose and, oh, which one will you choose
and why? You can discuss in pairs. (SC-CO3)
The process of carrying out a group discussion was detailed by Hu in the

interview. Taking the MCQ task in the GVT as an example, she explained why she
used group discussion among students.
Hu: After they finish some MCQ tasks in a certain time period, their group itself,
itself summarises one [correct answer]. In the case that no one knows the
correct answer…, for example, I choose A, you choose B, he chooses C, and
she chooses D, we then discuss together. Which answer is indeed the most
correct one, the group itself gathers the final answer. (Interview)
According to the participating teachers, group and pair discussion were used to
provide opportunities for learners to engage with the assessment tasks with peers. The
use of this activity in a test preparation context was thus viewed as fostering a higher-
order communication skill of problem-solving among students. In light of this activity,
students were encouraged to apply knowledge learned to engage cooperatively with
assessment tasks. Therefore, the use of group or pair discussion in test preparation was
regarded as contributing to the learning-oriented potential of the GVT.
Positioning students as holders of language knowledge

The other category of learner-centred interaction in classroom observations was
asking students to demonstrate their answers in test preparation classes which
positioned students as holders of language knowledge. Although both Hu and Zhang
mentioned using “positioning students as holders of language knowledge” 15 in
interviews, in the observations, only Zhang demonstrated this in her test preparation
classes. Moreover, it is interesting to note that Hu was the only teacher who dominated
classroom talk. In contrast, by positioning students as holders of language knowledge,
Zhang asked students to come to the front of the class and share their
knowledge/response to a task and answer questions from the class. She explained the
procedure to students as follows:
Zhang: And I will ask some students to come here. I will check what you have read
just, what you have read. Now, first one, who wants to have a try? Come here.
And show what you have had, have you, what you have read. [pause] Now,
Lily, please come here. [pause] And if, if you have any question, you can ask
her. Okay? (SC-CO4)
This phenomenon of positioning students as holders of language knowledge was

found to be unique to Zhang in the classroom observation data. She used this
interactive activity for the purpose of improving students’ learning and cultivating
15
By positioning students as holders of language knowledge, it means that the teacher appointed a
certain student to act as a teacher, come to the front of the class, and then explain how he finished the
task to the whole class.
their ability in critical thinking. Her reason for adopting this activity in teaching was
thus explained in the interview as follows:
Zhang: It can certainly [improve students’ grammar and vocabulary learning]. As I

said just now, the student did the exercise, but he or she was not necessarily
able to tell [how he or she did it]. This is such a process of persuading oneself
(i.e., speaking out the student’s own thinking) and making clear of one’s mind,
which I think is very important. In fact, the time is short, if the time is
sufficient, I will especially ask that medium, no, the low language proficiency
level students to come to the front and lecture. If those students can explain it
very clear, then the other students should possibly be fine. What else? The
teacher lecturing, students might not like this. But if it’s a student who lectures,
the feeling of novelty, when a student is lecturing, the audience’s perception
and feelings are different. (Interview)
In brief, Zhang transferred the power and responsibility of teaching to students

through this interactive activity and hoped it could further improve their language
learning. According to her, encouraging students to articulate their thinking processes
and problem-solving skills could help all students to learn. Thus, through this
interactive activity, the teacher created a zone of proximal development (ZPD)
(Vygotsky, 1986) and positioned the student as expert to promote learning.
Content-centred interaction
As the literature suggests (Jones & Saville, 2016) and the current study describes,
content-centred classroom activity mainly focuses on knowledge of linguistic forms
and taking test tasks. Similar to the findings of learner-centred classroom interaction,
Hu was the teacher who mentioned content-centred activities in her interview, but this
was rarely observed in her classes. Nonetheless, a variety of content-centred
interactions were identified in Lan’s and Zhang’s test preparation classes, which
included 1) open-ended questions, 2) closed questions, 3) student-initiated questions,
4) reading aloud as a class, and 5) giving bonus points. This section presents these
activities in sequence.
Open-ended questions
From the classroom observation data, it was found that teachers adopted open-
ended questions to interact with students. Lan and Zhang used this type of questioning
frequently in order to engage individual students as well as the whole class in
classroom teaching. In fact, open-ended questioning was characterised with “why”
questions in classes observed. Through asking a “why” question, teachers expected
students to clarify their thoughts and engage in problem-solving processes. For
example,
Lan: No? Okay, forty-five, okay, helpful. [pause] What’s your reason? Why, why
you chose “helpful”? What’s your reason? (SA-CO1)
From the above quote, it was evident that Lan was expecting students to explain
the process behind the response. the reason was that she was not clear of the student’s
decoding process before the student explained the reasoning. In effect, questions aimed
at higher cognitive levels contribute more to learning than simple recall or mechanical
response through closed questions (Hasselgreen et al., 2012). Therefore, using open-
ended questions, which could also be viewed as metacognitive questions (Cazden,
2001), can promote students’ learning. In this way, Lan and Zhang both used open-
ended questions to interact and engage with students in test preparation classes.
Further, teachers also frequently adopted closed questions, which is presented in the
following section.
Closed questions
Although open-ended questions helped teachers probe further into students’
higher-order learning processes, closed questions were frequently used by teachers to
engage students in classes. For example, teachers preferred using the first half of the
sentence as hints for students to complete the other half of the sentence, which guided
students to reach the answers step by step. The following example from Lan illustrates
this:
Lan: Yeah, baozi [steamed pork buns] can be in different sizes. So you can see, the
first part is talking about its position, right? “It’s common, even President
Xi…”?
S: eats baozi. (SA-CO3)
Similarly, teachers mainly used closed questions to elicit correct answers from
students. In addition, Hu frequently used closed questions to confirm basic language
knowledge, for example “Is this word a countable noun or uncountable noun?” (SB-
CO1), which was too easy for students. It was also noticeable that Hu answered her
questions either by herself or together with the whole class and did not always expect
to receive answers from students before moving on to the next task. As a result, Hu’s
class involved the least student interactions among all three teachers.
In fact, teacher-initiated questions are common in EFL teaching as this is
regarded as a valid invitation for students to respond. It is the first turn in a traditional
Initiation-Response-Evaluation (IRE) sequence. The characteristic of an IRE model is
that it is a teacher-led model, which involves the teacher questioning students with the
answers already known to the teacher (Lemke, 1990). Through questioning, teachers
hoped to encourage students to participate in class activities, express their ideas of and
opinions on learning (Liu & Zhao, 2010). In turn, this expectation could reduce the
phenomenon of teacher-talk. From the classroom observation data, it was found that
both Lan and Zhang commonly adopted open-ended questions and closed questions in
their teaching. In contrast, Hu often answered her own (rhetorical) questions and
seldom asked individual students questions in her teaching.
Student-initiated questions
In addition to the traditional teacher-led IRE pattern of interaction, there were
also student-initiated IRE interactions in both Lan’s and Zhang’s classes. Especially
for Zhang, since she involved students in her teaching, her students seemed to have
more chance to initiate questions. For example, when Kelly was demonstrating
teaching in front of the whole class, one student expressed his doubts and expected her
to provide a more detailed answer to the question, as can be seen in the interaction
below:
S: What guide word?
Zhang: What guide word? Some students asked you [Kelly] immediately, what guide
word?
Kelly: Well, I mean conjunction.
Zhang: She said conjunction, she answered you. (SC-CO2)
Zhang repeated the students’ question and Kelly’s answer to keep the classroom
interaction dynamic. As for students, they were seeking further feedback on the task
from peers, which was perceived as effective in contributing to learning if students
could take the initiative in the learning and feedback process (Hasselgreen et al., 2012).
It was undeniable that her learner-centred activity of positioning students as holders of
language knowledge motivated student-initiated questioning. Therefore, through the
interactions between teachers and students, they co-constructed a common body of
knowledge (Hall & Walsh, 2002) to fill in learning gaps jointly.
Reading aloud as a class
To engage students with classroom interactions, Lan also guided students to read
aloud together in test preparation classes. Differing from an interactive learner-centred
activity, choral reading aimed at transmitting language knowledge in learning.
Therefore, in Lan’s classes, she frequently asked students to read important grammar
structures and sentences to deepen their impression of certain language knowledge.
The following excerpt is an illustrative example of this classroom activity.
Lan: Ending. People in Chongqing xxx take a lot of hotpots. Make it

louder. Make it louder. You can use the “not only...but also...”, right?
Okay. Just read it, “Not only...”, go!
Ss & Lan: Not only in Chongqing, but also in the other parts of the world, people
eat hotpot.
Lan: Yes, okay. (SA-CO4)
Through reading together with students and asking students to read aloud, Lan
focused on language knowledge in test preparation classes. In addition, this activity
appeared to be motivating to students as they actively engaged in this classroom
activity. Therefore, this content-centred classroom interaction of reading aloud was
adopted in Lan’s class to promote students’ learning.
Giving bonus points

Besides reading aloud, Lan also engaged students with test preparation learning
through giving bonus points in her classes. As a content-centred classroom activity,
giving bonus points was her strategy to form a competitive culture in her class. In fact,
this was quite similar to the “evaluation” turn in the IRE model of classroom
interaction. Through the adoption of this activity, Lan was expecting students to strive
for the correct answer to the test-driven exercises. She explained the instructions for
the activity to students as follows:
Lan: If you got exactly the same answer as the original text, you can get two points.
And for other answers, if they are right, you can just get one. (SA-CO3)
During her classroom teaching, Lan kept a record of bonus points for students.
However, the real purpose of her use of this activity was unclear in the classroom
observation. Regardless of these ambiguities, the use of this content-centred activity
motivated her classroom interactions with students, and thus helped to promote
students’ language learning.
As pointed out by Jones and Saville (2016), both learner-centred and content-
centred classroom interactions should be emphasised since the complementarity
between them is necessary for developing learning-oriented classroom practices. The
current study further supported this claim as both interactive activities were found to
be key in GVT preparation classes. Although the number of learner-centred classroom
interactions was limited in the qualitative dataset, it was still evident that both learner-
centred and content-centred interactions were present in both Lan’s and Zhang’s
classes, while Hu’s classes appeared to be the least learning-oriented. This indicates
the potential for GVT preparation classes to be learning-oriented, but differences
remain across different teachers. In sum, the emergence of higher-order skills in
classroom teaching and learning as well as the process of knowledge transmission
complemented each other in test preparation classes.
Classroom interaction as measured in the student survey

Informed by the qualitative findings from classroom observations, a set of items
were designed and incorporated in the student survey. Results from the survey are
presented in Table 7.1.
Table 7.1
Indicators of classroom interaction (see instrument reliability and validity in section 4.5.3)
As shown in Table 7.1, the proportion of students who reported often and always
having group discussion in classes was higher than those who reported they never or
seldom did (36.8% versus 31.5%); the same tendency was evident for positioning
students as holders of language knowledge (39.3% versus 29.7%) but reversed for
having interesting learning activities such as performing drama and having English
debates (24.0% versus 45.3%). Thus, mainly two types of classroom interactions,
namely group/pair discussion (v48) and positioning students as holders of language
knowledge (v51) as identified in classroom observations, were probed in a wider
participant population, which provided evidence for positive washback of the GVT on
grammar and vocabulary learning. As for having interesting learning activities (v52),
it corresponded to participants’ interview accounts, that is, those interesting interactive
activities such as English debates happened in Grade 7 and Grade 8, rather than in
Grade 9. As such, it indicated a negative washback of the GVT on grammar and
vocabulary learning.
7.2.2 Feedback
Feedback practices, as a key element in LOA (Carless, 2007; Jones & Saville,
2016), were explored in both classroom observations and interview data. In fact,
Carless (2007) and Jones and Saville (2016) emphasised the importance for teachers
to give feedback on student performance to close learning gaps and thus improve
students’ learning. Therefore, as a key element in the LOA cycle, feedback practice in
the classroom was also explored in the qualitative phase.
To begin with, all three teachers believed in the power of feedback to improve
students’ learning and tried to expand students’ knowledge when giving feedback.
According to them, their feedback practices during the test preparation period were
mainly in an oral form (i.e., face to face), while written feedback was rarely given. The
reason for this was that “written feedback takes up more time” (Hu, Interview).
Moreover, both Hu and Zhang explained that they gave feedback according to different
language proficiency levels of students and they preferred students explaining first
before they gave feedback on their responses in class. Through these methods, teachers
felt that feedback was “helpful and effective” to students’ learning and their teaching
(Zhang, Interview). In addition, teachers explained that feedback should be
longitudinal, even during the test preparation stage. By so doing, feedback worked as
a mediated tool for teaching and learning as teachers and students co-constructed a
ZPD (Vygotsky, 1986) where students modified their learning systems based on the
feedback (Aljaafreh & Lantolf, 1994) and teachers could better detect students’
problems in learning and thus strategically help them to improve in a long term. In this
way, feedback mediated language development (Rassaei, 2019). For instance, Lan
explains her view of feedback below.
Lan: Therefore, maybe at one exam time, the student made a mistake in this item,
but next time he or she could choose a right answer. Okay, then at least, we
can say, if this item aligns with, for example, if it aligns with object clause, if
he makes a mistake, then at least it will mean that the student is having problem
with this language point, so [we teachers] can consolidate this, okay. Then,
hmm, it is more about a longitudinal tracking. If the student is having a
problem with this single MCQ item for a long time, and it’s only related to
those several items, then it means that the student has problems with this
section. (Interview)
Lan regarded that feedback should be progressive and used to monitor students’
learning progress over time. This idea, also adopted by Zhang, was found to work
through keeping a student profile of their learning outcomes. As explained by Zhang,
her records of students’ progress and achievement were used to monitor students’
progress in different language tasks in relation to the test. By referring to those records,
she strategically changed her teaching focus to help students make progress in their
learning.
Students’ comments in focus group data further supported the feedback practices
reported by teachers. For example, regarding to how teachers gave feedback, three
modes of feedback provision were reported from students: oral feedback, written
feedback, and online feedback. Most students mentioned face-to-face oral feedback
and its usefulness as it was detailed, insightful, and timesaving. This echoed teachers’
preference for oral feedback. As for written feedback, some students regarded it as
unimportant since they did not spend time reading written feedback; while others
perceived written feedback as more useful and effective (Hui-SB, Shu-SB, Na-SB) and
could be encouraging when it included smiles or emojis (Na-SB). In addition, online
feedback was proposed by Ming-SA, since he felt that an online channel could make
him feel less embarrassed when seeking feedback from his teacher.
Moreover, although feedback was claimed by teachers to occur in test

preparation teaching, teachers used a different channel of classroom feedback outside
the test preparation period. According to Lan and Hu, feedback was mainly applied
through observing and checking through students’ classroom talks as well as
performance. However, for Zhang, no matter whether outside of or during the test
preparation stage, her way of feedback was mainly produced through different written-
form assessment such as unit quizzes.
Overall, the feedback practices from teachers’ perspective were mainly at a task
level (i.e., focusing on test tasks) and the preparation for the GVT appeared to offer
limited learning-oriented potential through feedback practices (i.e., feeding forward).
Regarding the differences among three teachers, it was observed that Hu provided less
feedback and there were no instances of feeding forward in observations of her classes.
The reason for this phenomenon could be partly explained by Hu’s classroom
interactions which were different from the other two teachers. In Hu’s classes, teacher-
talk predominated, and thus there were rarely any feedback opportunities to promote
students’ learning. This was counter to the recommendation that feedback should be
both timely and forward-looking in order to improve students’ current as well as future
learning (Carless, 2007). In sum, although teachers were not aware of the relationship
of feedback practices to LOA, they believed that providing feedback could promote
students’ learning during GVT preparation. Hence, GVT preparation was able to
provide an opportunity of incorporating LOA principles through feedback practices,
but teachers’ feedback practices varied.
Additionally, both teachers and students confirmed that individual feedback was
given to students since every student’s learning progress and challenges differed.
Therefore, most individual feedback was given to students orally and was considered
to feed forward to some extent. While providing individual feedback orally, teachers
extended language knowledge and taught problem-solving and language learning
skills. For example, Ming-SA commented on his teacher’s individual feedback:
Ming-SA: Because, when you talk to the English teacher face to face, she not only
explains this single exercise to us, but also extends other knowledge. For
example, this type of exercise, it may have other special situations, and she
will give us instructions on those special ones as well. Hence, it is quite helpful
for the future [study]. (FG-SA)
According to Ming-SA, the feedback from his English teacher was not simply
the feedback on the specific task, but also included extended knowledge which was
useful for future studies. In fact, similar comments were made by other students (Xun-
SB, Fang-SC, Meng-SC, Hua-SC), which thus indicated the possibility of feeding
forward in GVT preparation. By feeding forward, it refers to pointing out the next
stage goal and indicating future learning implications (Carless, 2007; Hattie &
Timperley, 2007). As a result, although teachers’ feedback was mainly at the task level
and specifically related to test preparation, most students expressed that their teachers’
feedback can help them improve grammar and vocabulary learning. Moreover, key
literature on feedback (Carless, 2007; Hattie & Timperley, 2007) recommends timely
feedback and maintains the application of feedback is crucial in improving learning
and thus feedback was investigated in the student survey.
Based on the qualitative findings, various characteristics of feedback were

investigated in the student survey, which included characteristics of frequency,
timeliness, the degree of detail, feeding forward, usefulness, and satisfaction. The
corresponding results are thus reported in Table 7.2.
Table 7.2
Indicators of feedback (see instrument reliability and validity in section 4.5.3)
As shown in Table 7.2, the proportion of students who agreed that they had
frequent feedback on grammar and vocabulary learning from teachers was much
higher than those who disagreed (35.0% versus 15.6%); the same tendency was evident
for timely feedback (52.9% versus 11.1%), detailed feedback (52.7% versus 10.7%),
feeding forward (51.2% versus 9.7%), the usefulness of feedback (43.1% versus
11.5%), and satisfaction with teachers’ feedback (53.3% versus 9.0%). This finding
complemented qualitative findings, as the responses to the survey indicated that
teachers’ feedback on grammar and vocabulary during GVT preparation was frequent,
timely, detailed, feeding forward, helpful, and that students were satisfied with the
feedback they received from their teachers. Therefore, the feedback on grammar and
vocabulary did reveal positive washback of the GVT on students’ learning, which
further proved the possibility for the GVT to have a learning-oriented potential in the
test preparation stage.
7.2.3 Learning-oriented strategies
Although the language-use oriented grammar and vocabulary learning strategies

in section 5.5.2 extensively listed the learning strategies that teachers taught and
students applied in test preparation, the other three specific aspects of teaching and
learning which closely linked with the LOA theory (Carless, 2007; Jones & Saville,
2016) are reported in this section. In fact, in order to prepare for the test, participants
were observed to utilise and interviewed to confirm as well as report their use of certain
types of learning-oriented strategies. Therefore, with a purpose of intentionally
improving students’ learning, teachers and students adopted three learning-oriented
strategies which offered opportunities for incorporating LOA principles during GVT
preparation. These three learning-oriented strategies identified in the qualitative data
are thus reported accordingly in this section, which are 1) performing mentor-
apprentice pairs, 2) promoting learner autonomy, and 3) involving students in
assessment.
Performing mentor-apprentice pairs
To improve students’ learning, teachers asked students to help each other. This
learning activity, encouraged mainly by Hu and Zhang, was termed by Zhang as a
“mentor-apprentice pair”. In fact, this mentor-apprentice pair valued both guidance
and participation in learning and cognitive development activities (Rogoff, 1990),
which thus allowed students to co-construct a ZPD for them to cognitively process
their learning tasks at a higher level (Vygotsky, 1986). In turn, this ZPD placed
effective learning in a social and cultural context rather than a simple knowledge
development context (Jones & Saville, 2016; Sjøberg, 2007). Therefore, the unique
phenomenon that a pair of students stood up together when Zhang questioned one of
them in her class was explained in the interview. According to Zhang, her reason for
students to form such a pair was supported by her teaching as well as language learning
beliefs.
Zhang: It is impossible for me to explain 100% of what I want to say in class, then I
have to train the students with a high language proficiency level. For example,
I just said, only four students made a mistake with this test item, then you go
to your mentors. If your mentor can’t help, then you go to your English subject
representative; if the representative can’t help, then you come to me. Well, it’s
like this, this way, and then it can be a bit easier for me. Moreover, I think this
is definitely beneficial for students, why is it beneficial? I (i.e., the student)
can do this exercise, but if I need to help you to understand it, this process, will
be a process of reviewing my own knowledge. If the student can explain it
clearly, for the student [this is beneficial]. Besides, the student will find, well,
I (i.e., the mentor) can get the correct answer to this exercise, however, when
I explain to you (i.e., the apprentice), I can’t accurately tell you my problem-
solving process or I can’t help you to understand it. Okay, then here comes the
problem, so that you need to study. (Interview)
As explained by Zhang, such a “mentor-apprentice pair” consisted of a student

with high language proficiency and a student with lower language proficiency. She
believed that once the mentor could clearly explain the problem-solving process to the
apprentice, then it meant the mentor understood the knowledge clearly. Further,
mentors saved Zhang’s time which could be used for other aspects of teaching.
Therefore, from this example, the importance of peer help in the learning process was
verified. In fact, this interactive activity was also closely related to learner autonomy,
which will be discussed later.
Additionally, students from School C confirmed this learning-oriented strategy

organised by Zhang in test preparation classes. According to students, adopting a
mentor-apprentice pair (Kai-SC) and positioning students to provide instruction were
believed by both Zhang and her students to be learning-oriented. Therefore, this
opportunity of improving students’ learning was created by the teacher and recognised
by students as reflecting a learning-oriented purpose in this high-stakes test preparation
context.
Promoting learner autonomy
According to Lan and Zhang, suggesting ways for students to learn by

themselves was a condition for students to improve their learning. This learning-
oriented strategy was thus regarded as “promoting learner autonomy” in this study.
For example, as English was a conduit for students to learn more about the world, Lan
regarded it necessary for students to be autonomous learners and develop effective
methods of self-study. Hence, she suggested that students learn by themselves in their
spare time.
Lan: I encourage students … to read more, study more in their extra-curricular time.
Then well, when they encounter some new words, if the words are related to
our topics, topics that are specified by the SHSEET, or are closely linked with
their life, I will, well, suggest them to, to look it up in the dictionary by
themselves, or to make one or two sentences by themselves, okay. Otherwise,
[I will suggest them] to look for one or two sentences which are relevant to
real-life in the dictionary. (Interview)
From the above quote, it was clear that Lan encouraged students to be
autonomous learners and learn vocabulary in their self-study time. In order to achieve
this purpose, she suggested that students develop their own learning goals for
vocabulary study. Therefore, Lan told students to make their own decisions about
vocabulary learning in the test preparation period. Similarly, Zhang reminded students
to learn from a good example when she discovered that high-achieving student Mia
was an autonomous learner in her self-study time.
Zhang: Now, Mia, what about you?
Mia: Every night, after going back home, I write what I learned in the class for each
day, and then according to, according to the previous knowledge, I check my
learning.
Zhang: Okay, review, go through, and write the test review knowledge learned every
day. … Like just now I asked David to stand up and tell me what he learned
yesterday, but nothing. Is he having a poor memory? No, what’s the problem?
What’s his problem? Can he understand in the class? I’m sure he can. What’s
his problem? [pause] He did not do what Mia said, review after returning
home. … Now I precisely tell you (i.e., David), what the teacher teaches every
day, you should find some time after going home, what should you do with the
knowledge? Summarising it, okay? Remember, it’s very important for
everyone, okay? Good, okay. (SC-CO3)
Zhang regarded Mia’s self-study on an everyday basis as an important strategy

for students’ learning. In her opinion, this way of autonomous learning was the key for
students to make progress and thus improve their language proficiency. Therefore, like
Lan’s suggestions to students, this was about empowering students with the ability to
be independent learners. According to teachers in this study, this LOA practice of
learner autonomy closely linked to language proficiency levels and it could thus help
students to improve their learning (Corno & Mandinach, 1983; Deng, 2007; Zhang &
Li, 2004). In this fashion, teachers expected to improve students’ learning outside
classroom time, which thus provided an opportunity for GVT preparation to
incorporate LOA principles.
Moreover, students also reported the practice of learning by themselves which

was important to them and more relevant to an extra-curricular context. This practice
was also regarded as promoting learner autonomy. According to students, in adopting
autonomous learning strategies, they tended to choose their own learning materials
(Fei-SA, Chao-SA), summarise their own learning problems (Xia-SA), plan their own
learning progress (Ling-SA), and find learning activities for oneself such as
downloading software for English learning (Na-SB, Hua-SC). However, these
activities were mainly mentioned by School A students and relatively high-achieving
students from School B and School C. Therefore, it is possible that the development
of learner autonomy, as an opportunity for LOA practices in GVT preparation, may be
linked to school profiles (in terms of the learning proficiency of students) and student
language proficiency. This result thus echoed the claim that learner autonomy and
language proficiency are reciprocally and positively correlated (Ablard & Lipschultz,
1998; Deng, 2007; Little, 2007; Risemberg & Zimmerman, 1992; Zhang & Li, 2004).
Promoting learner autonomy had a two-fold characteristic in the current

qualitative data. On the one hand, learner autonomy had similar characteristics with
classroom activities during test preparation, as students claimed that being an
autonomous learner meant doing what they did in the classroom by themselves in their
self-study time. Therefore, instead of teachers giving instructions on language
knowledge, autonomous learners (Shu-SB, Fang-SC) tried to understand and
comprehend language knowledge as well as learning problems by themselves. On the
other hand, learner autonomy was also reported by low-achieving students who took
the initiative to seek help from teachers and peers to further improve their learning.
For example, Meng-SC reported seeking teachers’ and peers’ help when dealing with
his homework.
Further, in contrast to most students from School A and School B, some School
C students reported that learner autonomy was required by teachers (Fang-SC, Jing-
SC). Therefore, this triangulated the teachers’ data of promoting learner autonomy
among students. Significantly, although learner autonomy seemed to be practised
among students with different levels of language proficiency, it was found to be
prominent with high-achieving students from the qualitative data. Drawing
implications from these findings, further relationship between learner autonomy and
student language proficiency will be presented in section 7.5.
To summarise, although the importance of learner autonomy has been

established by researchers (Salamoura & Unsworth, 2015) and triangulated by both
teachers’ and students’ data in this study, it has not been foregrounded in LOA
theoretical frameworks (Carless, 2007; Jones & Saville, 2016) except that Lamb (2010)
claims that learner autonomy is closely linked with LOA. The current findings thus
suggested the factor of promoting learner autonomy be added to the LOA theoretical
framework and LOA practices in the current washback model. This theoretical
adjustment helped to indicate positive washback brought by the GVT and enrich the
micro level of the LOA cycle, which constitutes three key principles in an outside
classroom context: learner autonomy, learning from peers, and involvement in
assessment.
Against this backdrop, the learner autonomy practice discovered in qualitative

findings was investigated in the student survey. Taking reference from Zhang and Li
(2004), the survey included questions on a range of autonomous learning practices
both inside and outside class. As shown in Table 7.3 (see next page), the proportion of
students who reported often and always seeking help from teachers and students was
higher than those who never or seldom did (44.4% versus 22.9). The same tendency
remained for problem-solving by oneself (45.6% versus 21.5%), actively taking part
in activities (34.8% versus 31.6%), and making a good use of self-study time for
learning English grammar and vocabulary (33.0% versus 28.7%) but reversed for
attending extra-curricular activities such as practising conversations with classmates
(27.6% versus 42.3%). The reversed finding was not too surprising as students did
report that a great portion of their self-study time was spent on doing test-driven
exercises. Nonetheless, the general positive finding regarding learner autonomy
generalised the qualitative results from students, which indicated that the GVT did
exert a positive washback on students’ learning in this respect.
Table 7.3
Indicators of learner autonomy (see instrument reliability and validity in section 4.5.3)
Involving students in assessment
The last learning-oriented strategy concerning GVT preparation was involving

students in assessment, which was explicitly explained by Hu. Due to the summative
nature of the SHSEET and some constructed-response items in the GVT, teachers
could get students involved in assessment by familiarising them with the marking
criteria of the test, which could be viewed as one form of “involvement in assessment”
(Carless, 2007). For example, Hu reminded students that the scoring of Sentence
Completion items was also based on their writing skills. Therefore, it was significant
that students should capitalise the first letter of the first word in a sentence.
Hu: Tasks such as this one (i.e., No. 74), if your writing style is wrong, then you
will get a zero mark here. Because it only has what? One mark, if you made a
mistake, the exam marker would not give you a mark of 0.5. (SB-CO1)
According to Hu, reminding students of the marking criteria could thus help
them know the writing principle of “using capital letters in a sentence”. In this way,
informing students of scoring rubrics could improve both students’ vocabulary
learning and their GVT scores. Therefore, this method of familiarising students with
the marking criteria was considered as the evidence of “involvement in assessment”
(Carless, 2007), which could work as the learning goal in a testing context.
Besides familiarising students with marking criteria of the exam, Hu also

encouraged students to do self-assessment. In fact, this self-assessment method had a
similar function of taking mock tests, which also aimed to improve students’ language
learning and test scores. Therefore, when explaining her way of teaching grammar and
vocabulary in the test preparation stage, Hu listed a variety of teaching methods, and
self-assessment among students was one of them. Her corresponding interview
accounts are presented below:
Hu: Yes, you need to constantly change [teaching methods]. … You can also ask
them to look for answers, because especially for the tasks that have
explanations [in test preparation materials], you can check the exercise
answers and then ask them to check [their own problems], isn’t this applicable?
(Interview)
Asking students to self-check exercises and reflect demonstrated Hu’s intention

for students to become involved in self-assessment. As one of the various assessment
activities that she used in test preparation teaching, self-assessment was her
encouragement for students to improve their learning and become more autonomous
learners. In fact, what Hu required students to do with assessment was also evident in
her students’ interview accounts of going through the process of assessment in their
self-study time. Hence, it was concluded that involving students in assessment
provided an opportunity for GVT preparation to incorporate LOA principles.
For students, involvement in assessment mainly referred to self-assessment in

self-study time. From the qualitative data, it was found that although students from
School B and School C were comparatively less likely to use autonomous learning
strategies than those from School A, self-assessment was used by almost every student
regardless of their language proficiency. As one of the key principles in the LOA
theory (Carless, 2007), students’ involvement in assessment was regarded to be
important in a learning-oriented test preparation stage (Jones & Saville, 2016). In the
current study, self-assessment meant to have students go through the whole process of
“do a mock test-check one’s answers-solve one’s learning problems”. In other words,
this process of self-assessment closely linked to learners’ autonomous behaviours.
This finding was hardly surprising as autonomous behaviours were found to be

related with and cultivated by assessment practices of self- and peer-assessment (Dam,
1995; Little, 1996; Oscarson, 1998; Tassinari, 2012). In particular, studies found that
autonomy would be enhanced when learner were encouraged to self-assess (Nunan,
1996) and the better involvement in (self-) assessment indicated a greater extent of
autonomous practices (Bell & Harris, 2013; Everhard, 2015). For example, some
students preferred assessing themselves through authentic test papers and gained
feedback on their results by checking answers or finding explanations online (Kai-SC).
Other students, who were clearer about their learning progression, tended to re-assess
themselves through using similar learning tasks that they had previously made
mistakes on or were not good at (for example, Xia-S4, Yao-SB, Na-SB). In fact,
through self-assessment, students were also aware of their learning progression and
knew their learning problems. Therefore, self-assessment was not simply about doing
test-driven mock tests; students had to detect their problems through this process.
Accordingly, four variables in the student survey investigated the involvement

in assessment practices from qualitative findings. However, although practices of
involvement in assessment such as peer-assessment was not reported by students in
focus groups, the quantitative survey identified students’ different practices of
involvement in assessment. Findings are presented in Table 7.4.
As shown in Table 7.4, the proportion of students who agreed that teachers
encouraged them to do self-assessment was much higher than those who disagreed
(46.3% versus 23.1%); the same tendency was evident for becoming familiar with
grammar and vocabulary scoring rubrics (60.2% versus 13.5%) and summarising from
exams taken (66.8% versus 12.1%). The quantitative findings indicated a positive
washback of the GVT from involvement in assessment practices encouraged by
teachers. Therefore, both qualitative and quantitative findings on students’
involvement in assessment converged and supported the argument that involvement in
assessment offered opportunities for incorporating LOA principles in GVT
preparation.
Table 7.4
Indicators of involvement in assessment (see instrument reliability and validity in section 4.5.3)
In conclusion, LOA opportunities offered by the GVT included participants’

actual but not explicitly reported incorporation of LOA principles in test preparation
teaching and learning. From classroom observations, preparing for the test involved
both learner-centred and content-centred interactive activities in test preparation
classes, but this was not consistent across the three classes. Nonetheless, feedback
practices were similarly applied across the observed classes. Finally, with the
incorporation of learning-oriented strategies such as performing mentor-apprentice
pairs, promoting learner autonomy, and encouraging involvement in assessment, it was
also possible for the GVT to promote students’ language learning in the test
preparation stage. In addition, classroom interaction, feedback, learner autonomy, and
involvement in assessment were also explored in the student survey and generally
positive results were obtained (see Tables 7.1, 7.2, 7.3 and 7.4). Therefore, LOA
practices of classroom interaction, feedback, learner autonomy, and involvement in
assessment depicted the LOA dynamic in GVT preparation. Yet, participants’
practices of these LOA opportunities varied across classes.
Setting out from the qualitative findings of this study and the theoretical
positioning of LOA (Carless, 2007; Jones & Saville, 2016) and learner autonomy
(Lamb, 2010), the following sections discuss two research hypotheses generated from
the qualitative findings.
7.3 LEARNING ORIENTED ASSESSMENT AS A DYNAMIC

MULTIDIMENSIONAL CONSTRUCT IN GVT PREPARATION
Informed by the qualitative findings and the theoretical conceptualisation of

LOA and learner autonomy (Carless, 2007; Jones & Saville, 2016; Lamb, 2010), the
student survey was designed with four constructs of classroom interaction,
involvement in assessment, feedback, and learner autonomy to investigate LOA
practices during GVT preparation. As discussed in previous sections, the identified
test preparation opportunities for incorporating LOA principles in GVT preparation
indicated that LOA in the GVT context may involve multiple factors of perceptions
like students’ EFL learning stage and practices like feedback. However, as the study
aimed to explore the potential LOA practices to provide implications for GVT
preparation, special attention was thus paid to LOA practices in test preparation stage
in this section. Therefore, besides the frequency check as presented in section 7.2, this
section measures those four constructs to explore the dynamic of LOA practices in
GVT preparation. The following research hypothesis was proposed to test qualitative
findings and theoretical assumptions by using a larger sample of SHSEET test-takers:
Ha1: Learning Oriented Assessment in GVT preparation is a multidimensional

construct.
To test this research hypothesis, Confirmatory Factor Analysis (CFA) was

performed on sample two (N=488). Before processing CFA, the normality of the data
was tested (see Appendix L). The results showed that, except for v58, the critical ratio
of skewness, kurtosis, or both was beyond the cut-off value of |2| (Field, 2009), thus
all the indicators deviated substantially from univariate normal distribution. As for
multivariate normality, the measure of multivariate kurtosis (140.854 was greater than
the cut-off value of 3 (Yuan et al., 2002), which demonstrated a substantial deviation
from multivariate normal distribution. As a result, when conducting CFA,
bootstrapping methods were adopted to grapple with the problem of violation of
normal distribution. The purpose here was to testify the multidimensionality of LOA
practices in GVT preparation as suggested by theories and qualitative findings.
Figure 7.1 demonstrates each of the four dimensions of the LOA practices, their
corresponding indicators, and the correlation between the four LOA dimensions. All
the values of standardised regression weight were above .60, and most of them were
above the preferred value of .70 (Hair et al., 2006). Moreover, the squared multiple
correlations (SMC) were all above the acceptable cut-off value of .30 (Jöreskog &
Sörbom, 1989) and most of them were greater than .50 (Jöreskog & Sörbom, 1989).
Figure 7.1. Structural model for the relationship within LOA practices in GVT preparation (N=488)
The CFA results of the construct of LOA practices with 17 variables showed a
good model fit (CMIN/DF=4.162, df=113, p=.000, SRMR=.050, RMSEA=.081; 90%
CI [.073, .088]; TLI=.908; CFI=.924). Although the model had a significant chi-square
value of 470.361, the ratio between the chi-square value and the degree of freedom
(470.361/113=4.162) was not very high. Standardised root mean square residual
(SRMR) was .050 which was below the cut-off value of .08 (Hu & Bentler, 1999);
baseline fit indices of TLI (.908) and CFI (.924) were above the cut-off value of .90
(Bentler, 1990); and the value of RMSEA (.081) was very close to the cut-off value of
.08 (Ho, 2006; Schreiber et al., 2006). Given its complexity, the model had a
reasonably good fit.
As shown in Figure 7.1, the four constructs were significantly and positively
related to each other. Their correlations vary in range between .53 and .67,
demonstrating a relatively strong inter-correlation across the four constructs. In the
literature, researchers (Brown, 2006; Ockey & Choi, 2015) recommended that
correlations above 0.85 indicated low distinction between constructs. As the
correlation results among these four factors were all below the cut-off value of 0.85,
the analysis was proceeded, which showed that Learning Oriented Assessment in GVT
preparation was a multidimensional construct with four constituting components of
classroom interaction, involvement in assessment, feedback, and learner autonomy.
These findings thus further prove the rationale for including learner autonomy, though
not conceptualised in theories (Carless, 2007; Jones & Saville, 2016), in this proposed
model of LOA practices during GVT preparation. Indeed, as claimed by scholars,
learner autonomy can promote LOA (Lamb, 2010) and closely link with assessment
practices such as self-assessment (Dam, 1995; Tassinari, 2012).
The above findings thus supported the argument that LOA cycle is an ecological
model, which combines both classroom evaluation and standardised test evaluation
(Jones & Saville, 2016). However, instead of investigating LOA from a teacher
perspective and at a macro level (i.e., focusing on assessment and educational system
level), this survey took students as participants to offer micro level evidence for the
synergy between high-stakes standardised testing and in-class as well as extra-
curricular learning.
Drawing implications from classroom observations and qualitative interviews,

in addition to these identified LOA practices, students’ language proficiency seemed
to be related to their LOA practices during GVT preparation. This finding was
therefore tested, and the results of quantitative analysis are reported in the following
section.
7.4 LEARNING ORIENTED ASSESSMENT PRACTICES IN GVT
PREPARATION AND STUDENT TEST PERFORMANCE
As found in previous sections, student language proficiency was commonly

found to be influential in incorporating LOA principles in test preparation practices,
particularly learner autonomy. Accordingly, another statistical hypothesis was
proposed.
H01: Student test performance has no statistically significant correlation with

students’ LOA practices in GVT preparation.
Due to the issue of confidentiality, students’ SHSEET scores could not be

obtained directly. Therefore, their self-reported SHSEET scores from the 2018 test
was taken as the indicator of their actual test performance and Spearman’s correlation
was used to test the null hypothesis. A series of Spearman’s correlations were obtained
to examine whether a relationship existed between students’ self-reported SHSEET
scores on the one hand, and classroom interaction, involvement in assessment,
feedback, and learner autonomy on the other hand. Correlation results are listed in
Table 7.5.
Table 7.5
Correlation coefficients between SHSEET score and LOA practices (N=922)
As shown in Table 7.5, there was a significant relationship between a student’s

SHSEET score and the four LOA practices (variables) in GVT preparation. As such,
the null hypothesis H01 was rejected. It should be noted, however, correlation
coefficient results did not indicate any causal relationship between students’
achievement and the four LOA variables.
In addition, Table 7.5 showed that the correlation between the SHSEET score
and learner autonomy was the highest (r=.418, p < .01) of the four. It thus indicated
that a higher SHSEET score was significantly and positively associated with a
student’s learner autonomy practices, and this association has a moderate strength. The
second highest correlation was found between the SHSEET score and the involvement
in assessment practices (r=.389, p < .01). In comparison, there was a weak, though
significant, correlation between the SHSEET score and feedback practices (r=.276, p
< .01), and between the SHSEET score and classroom interaction practices (r=.155, p
< .01).
The correlation results were consistent with the qualitative findings presented in
previous chapters and the theoretical conceptualisation of LOA (Carless, 2007; Jones
& Saville, 2016). First, the moderate significant relationship between students’ self-
reported SHSEET scores and learner autonomy corresponds to the claim that the
development of learner autonomy is mutually interactive with the growth of language
proficiency (Little, 2007). It was also consistent with extant literature (Ablard &
Lipschultz, 1998; Deng, 2007; Risemberg & Zimmerman, 1992; Zhang & Li, 2004)
which found that test score tends to be more closely linked with learner autonomy, but
the relationship between test score and learner autonomy practises was not a simple
causal relationship. Likewise, although there was a weak relationship between self-
reported SHSEET scores and involvement in assessment, the significant relationship
appeared to indicate that assessment practices such as teacher-, self- and peer-
assessment are relevant to students’ language proficiency (Iraji et al., 2016; Liu &
Brantmeier, 2019; Oscarson, 1989).
7.5 CHALLENGES OF THE INCORPORATION OF LOA PRINCIPLES IN

GVT PREPARATION
In this section, the challenges of incorporating LOA principles in GVT

preparation from both teachers’ and students’ perspectives are reported. To explore the
challenges of the incorporation of LOA principles, specific interview questions in
relation to participants actual teaching as well as learning experiences that reflected
key LOA principles of classroom interaction, feedback, and target language use (Jones
& Saville, 2016) were asked. Therefore, the perceived challenges as mentioned by
participants were mainly explored through the lens of those questions.
To start with, it is important to note that those challenges were identified from
participants’ perceptions and thus were explored mainly through interview accounts.
Generally, both teachers and students perceived some challenges of the incorporation
of LOA principles in GVT preparation, as their primary goal of learning was to
increase test scores. In contrast to the perceived opportunities for incorporating LOA
principles in GVT preparation, more challenges than opportunities were expressed by
participants. In general, there are eight aspects of challenges as found in the interview
data:
• Efficient use of class time;
• The consideration of high test stakes;
• Student language proficiency;
• Administrative influence;
• Class size;
• The concern over teaching performance;
• Limited teaching experiences and expertise;
• Test method.
In this section, these eight perceived LOA challenges are reported subsequently.
7.5.1 Efficient use of class time

First, all three teachers agreed that the efficient use of class time was the priority
during test preparation. This impacted on both their use of the target language and class
activities.
Lan: …. Sometimes, for example, when I think that students could not understand,
I use Chinese instead. Because, after all, sometimes you need to be efficient to
prepare for the test. That is to say, you can use maybe two or three sentences
in Chinese to explain something clearly, but if you use English, maybe some
students need to comprehend for quite a while for even one sentence.
Therefore, [using English Language] should accord to different situation.
(Interview)
Zhang also admitted that not using the target language in English classes was
due to the consideration of saving time and making test preparation easy for both
teachers and students. However, Zhang was aware that this timesaving compromise
was not beneficial to students’ language learning. She understood that learning an L2
was more effective when learners’ exposure to the target language was maximised and
L1 use was limited (Collins et al., 1999; Ellis, 2006; Krashen, 2003; Lightbown &
Spada, 2019).
In addition, the decrease in the variety of classroom learning activities in test
preparation classes was noted by Hu. According to her, the test preparation classes
were more boring than other classes she taught.
Hu: Ordinarily, like when we did in the first round of test review, we still read
vocabulary, read sentences or the like, and then you had so many activities and
so on. Anyway, the closer to the test date, the comparatively fewer the
activities, which is more boring. (Interview)
Hu explained that the closer the test date, the fewer class activities. This
phenomenon reflected the negative washback of the test induced by the time
consideration. This aspect of class time and efficiency had long been recognised by
researchers (see, for example, Alderson & Hamp-Lyons, 1996) to negatively influence
teaching and learning activities, such as rarely using English in teaching (Yang, 2015).
Considering this finding, in the current GVT context, time was thus influential in that
it negatively affected the range of grammar and vocabulary activities and target
language use in test preparation classes.
In fact, not only teachers but also students considered the actual factor of class
time and efficiency. For students from School B (Xun-SB, Na-SB) and School C
(Ping-SC, Jing-SC, Kai-SC), GVT preparation could not incorporate LOA principles
due to class time and test preparation considerations. In response to implicit interview
questions regarding the use of LOA principles, students expressed their concerns over
having LOA practices such as using target language and receiving teacher feedback in
classes. From students’ responses, two general views were summarised.
First, School B students regarded using the target language (i.e., English) in
classes was not so important, which proved to be an obstacle for implementing LOA
principles in test preparation classes. For example, regarding English language use in
English classes, Na-SB clarified that not too much target language should be used in
class.
R: Okay, so you still want to use [English] in class, right?
Na-SB: But not too much, because if you do not understand [the language], it will
definitely influence the class efficiency in a negative way. (FG-SB)
In addition, the limited class time impacted differently on School C students who
felt that there was no need for teachers to give too much individual feedback in test
preparation classes. Their belief was that class time was too precious to waste on
individual students. For instance, Ping-SC proposed that her English teacher should
give her more and clearer feedback on her GVT exercises. However, when asked
whether she meant to have more detailed feedback on her mistakes in class, she
immediately clarified her previous answer by saying that this feedback should be
provided outside the class.
Ping-SC: Well, that (i.e., teacher helping individual student to analyse mistakes) can be
done in extra-curricular time, because the class time is already very short, and
then…
Kai-SC: Because it will be a waste of time, if we all go to ask for feedback.
Ping-SC: Also, what if some other students did not make such a mistake? It will be a
waste of time if it’s only you who made this mistake. (FG-SC)
To conclude, the consideration of class time and efficiency during the test
preparation period was perceived by both teachers and students as a primary challenge
to the incorporation of LOA principles, but this was not uncommon in a high-stakes
testing context. In fact, studies also found that in order to make good use of the time
for test preparation (Alderson & Hamp-Lyons, 1996) and improve test preparation
efficiency, students tended to spend their extra-curricular time in enhancing their
learning and enabling a more efficient classroom learning.
7.5.2 The consideration of high test stakes

Additionally, the high-stakes nature of the test was perceived by teachers to
influence test preparation teaching. According to all three teachers, high test stakes
influenced learning-oriented teaching practices negatively. In Lan’s school, the change
of textbooks in Grade 9 teaching was considered to be in response to the SHSEET.
Due to the fact that the SHSEET test design was based on People’s Education Press
(PEP) textbooks, the whole school had to change their previously used textbooks to
help students prepare for the summative exam in Grade 9. As for Hu, her decision to
use class time to talk about “contract signing” issues (see section 4.4.4) and
intentionally decrease some authentic activities was due to the high stakes of the
approaching test.
Hu: Some activities [e.g., making sandwiches, making conversations, playing

drama], you can’t have at Grade 9, because they are inappropriate. … Because
the test, this test stuff, how should I say? As for those kids, it is really
important, it is an important experience in their life. (Interview)
Further, she emphasised the importance of taking the test into consideration
during test preparation teaching. According to Hu, the exam-oriented teaching model
that she used in test preparation classes was important for her students’ future.
Hu: The mode of “lecture-evaluate-do exercises” is boring, but you can make it
more interesting and vivid, that’s the only thing you can do. But it is effective.
… I can’t make jokes with kid, kids’ future. I can’t [be selfish to] make the
English class better, more interesting, solely because I want to make it fun.
Future is more important than anything else, those kids’ future is more
important than anything else. (Interview)
In a similar vein, Zhang expressed the need to strictly adhere to GVT

requirements in her teaching. According to her, even the less challenging MCQ task
was important to her teaching, as the following quote illustrates:
Zhang: Because, this task (i.e., the MCQ task) has a great washback, if you do not test
it, then I will decrease, automatically decrease its weight in my teaching. We
(i.e., teachers) are definitely following the baton of exam, well, to be honest,
quality-education is still far away from us. According to our current situation,
generally we are still having exam-oriented education. Okay, so, especially for
my school, the trace of exam-oriented education is really heavy. This is true,
and I am also following the baton of exam in my own teaching. What the test
tests is what we teach. If vocabulary teaching, or the MCQ task will no longer
be tested, then we will gradually decrease our research on and exercises of this
task. (Interview)
Zhang emphasised the high-stakes nature of the GVT and its impact on teaching.
As a result, any change to the test content would certainly bring about changes to
teaching. The exam worked as a baton to influence their teaching in a negative way,
which was also found in the NMET context (Qi, 2004b).
Most importantly, the reported LOA practices in test preparation classes (see
section 7.2) could also support the consideration of the high stakes of the test. For
example, although different types of classroom interactions happened in classes, it was
noticeable that content-centred activities dominated classroom interactions and Hu
rarely had interactive activities in her classes to promote students’ learning. Therefore,
it was evident that learning-oriented interactions between teachers and students or
students and students were achievable but not always present in all three teachers’
GVT preparation classes. In other words, LOA practices were conducted differently
across the observed classes.
The limited number and types of learner-centred interactions were unsurprising
since all three teachers highlighted this obvious change in Grade 9 classroom activities,
as SHSEET approached. According to teachers, various activities were held before test
preparation (i.e., in Grade 7 and Grade 8). In semi-structured interviews, they all
mentioned English activities such as “performance, film dubbing, debating” (Lan), and
“role playing” (Hu and Zhang). Those learner-centred interactive activities were
mainly targeting at cultivating students’ English learning interest through language use
rather than mere teaching instruction (Zhang, interview). However, interactive
activities such as role-play were no longer included during test preparation, because
teachers regarded it as inappropriate to have such kind of activities in Grade 9 teaching
(Hu, Interview). Hu explained her practice of “lecture-evaluate-do exercises (“讲评练”
in Chinese)” in test preparation classes. According to her, the adoption of such a model
was common among teachers when they started test preparation. Moreover, due to her
limited teaching experiences and language proficiency, she felt that this teaching
model was “easy to use” and “effective” in improving students’ test scores (Interview).
All these practices closely linked with the high stakes of the SHSEET and it is
noteworthy that the decrease of the learner-centred activities as the test approached
undermined positive washback (i.e., learning-oriented opportunities) in relation to the
test.
7.5.3 Administrative influence

Besides time and test stakes factors, all three teachers mentioned the
administrative influence on the extent to which they were able to teach towards an
LOA purpose (i.e., viewing learner-centred and learning-oriented as ultimate goals in
teaching). By administrative influence, the data showed that teachers either followed
the decisions made by the administrative team or made decisions since they have an
administrative role. The former phenomenon was found in Lan’s situation as she was
a relatively new teacher in her school and had to follow the school-wide decisions.
About the latter aspect, the other two teachers had the decision-making power in that
Hu was a head teacher of the class and Zhang was the Director of Teaching Affairs in
her school.
In Lan’s school, the change from overseas textbooks to PEP textbooks in Grade
9 (see section 7.5.2) was a decision made by the school. It was thus not within her
control; in other words, teachers had to follow policies that the school administrative
team decided to implement during test preparation. Lan explained this accordingly.
Lan: Why did we change to PEP textbooks in Grade 9? It is because we have to

participate in the final tests at a district level at last. … This is indeed why,
well, anyway, we have to go through with it. (Interview)
On the other hand, Hu and Zhang were facing a different issue as they had
important administrative roles. For Hu, being a head teacher of the class meant
sacrificing her own class time for student affairs and other subjects. From her
viewpoint, as she should be responsible for students’ future life and study (see section
7.5.2), her role of head teacher made her spend class time on tasks like “contract
signing”. This necessity to spend students’ learning time on activities that were not
relevant to learning or assessment tasks thus went against the LOA principle of
“assessment tasks as learning tasks” (Carless, 2007). Likewise, Zhang who was the
Director of Teaching Affairs had difficulties in providing feedback to students because
she could not devote enough time to teaching. This challenge is expressed below:
Zhang: Well, yes, because it is like this, as for me, normally I am very busy, I have a
lot of administrative work. I am different from other teachers, for example,
other teachers’ offices are upstairs. For example, after class, if they have time,
they ask students to come to the office to give face-to-face feedback or
instruction. I can’t do this. I rarely have time to allocate to students separately.
Tomorrow I have four meetings, I can do nothing about it, after one class, then
I will go to four meetings, and then mid-of-semester meeting, parent meeting,
enrolment meeting, monthly exam meeting, and the leader of the teaching and
research group meeting, there is no way to give [time] to them. (Interview)
From the above quote, it was thus seen that Zhang’s difficulty in providing
feedback and spending time on students’ learning hampered her intention to have more
LOA practices in test preparation classes. This challenge was due to her administrative
role as the Director of Teaching Affairs. Therefore, it indicated that administrative
roles seemed to bring about challenges for teachers to incorporate LOA principles in
their test preparation teaching.
7.5.4 Student language proficiency

A further factor which negatively influenced the potential for incorporating LOA
principles in GVT preparation was the wide range of students’ language proficiency
levels in one class. In general, this challenge was mainly reported by Hu and Zhang.
The possible reasons for this finding were that their students’ language proficiency
levels differed to a great extent and their students were generally with a lower language
proficiency level than those from School A. As such, both Hu and Zhang chose not to
use the target language in classroom teaching.
Hu: Mainly used, previously used English, while later on we, later on we mainly
used Chinese to teach grammar points. If you use English to instruct grammar
points. … Well, our class’s level is certainly not up to that standard. It’s
definitely impossible. Because students’, that, level is not so high, [language
proficiency] level is not that high. (Interview)
In fact, not only target language use in classes, but also learning-oriented
activities in Zhang’s classes were negatively influenced by her students’ comparatively
low language proficiency. Therefore, when talking about the learning-oriented activity
of forming mentor and apprentice pairs in her classes, Zhang commented on her
disappointment in her current students’ level of language proficiency.
Zhang: But, not this grade, I used this method very well in previous years [with other
Grade 9 students who had already graduated]. The students themselves this
year, well, regarding mentors, I can’t find those very excellent ones, perhaps.
(Interview)
According to Zhang’s further comments, even high-achieving students who

worked as mentors in her class were not as proficient in English as she expected. This
factor thus negatively influenced the efficiency of her promotion of mentor-apprentice
pairs with a learning-oriented intention. As a result, it was regarded as the fourth
challenge to the incorporation of LOA principles in GVT preparation.
This challenge was also identified by students. For example, according to Long-
SB, students’ language proficiency levels should be taken into consideration when
using target language in classroom teaching and learning.
Long-SB: I personally do not like using English in the class, either. Because in my
opinion, in the class, well, it’s mainly because my class, [the students’
language proficiency] is greatly divided, we have [students with] high
language proficiency, but there are also many medium [language proficiency
students], and even low [language proficiency students]. Therefore, I think that
if our English teacher only uses English in the class, she will not be able to
take care of those, that is to say, students with low language proficiency and
students whose English pronunciation is not good. (FG-SB)
From the above comment, Long-SB regarded it necessary to consider students
with low language proficiency, and thus suggested to not use the target language in
actual English classes. Instead, he thought it feasible and helpful for students,
especially the low language proficiency students, to use the target language outside
classes. Thus, this finding may help to explain the lack of target language use in Hu’s
classes. Taking Long-SB’s accounts into consideration, it thus recalled both teachers
and students expressed similar concerns over the challenge of the incorporation of
LOA principles through the lens of the target language use and having LOA activities
in test preparation classes.
7.5.5 Class size
Moreover, class size was repeatedly mentioned by Hu during the interview. In

fact, the effort towards teaching test preparation that could improve students’ learning
was hampered by her excessive workload since she had a larger number of students
than the other two teachers. According to Hu, some of the learning-oriented teaching
strategies and practices were not feasible in her context. The less frequent use of
student portfolios in the test preparation stage was a good example to illustrate this:
Hu: Certainly, some forms seem to be useless. For example, portfolio, I didn’t set
up a very detailed portfolio for those kids to track their learning progress. This
might be due to our (i.e., teachers) time and effort. Time, effort, if you want to
have a detailed record for everyone to trace and investigate this such as
portfolio, actually [it is] very, very complicated. Even if you keep a portfolio
for one person is very painful and hard, let alone for dozens of them. I have
more than 100 students, more than 100, this class has more than fifty and the
other class has more than fifty. (Interview)
According to Hu, her failure to keep detailed learning records for students were
thus a challenge for her to implement LOA principles. In fact, a portfolio “is a
purposeful collection of student work that exhibits the students’ efforts, progress and
achievements” (Paulson et al., 1991, p. 59). As an effective tool for both formative and
summative assessment, a portfolio has been regarded as crucial for both teachers and
students to attend to their own learning progress (Lau, 2018). By showing students
their learning portfolios, Zhang expected students to self-direct their learning which
helped their learning-oriented test preparation as similarly shown in other studies
(Delett et al., 2001; Mok, 2013). However, from the above accounts, it was found that
Hu had more than 100 students to teach. Therefore, in contrast to Zhang, who used
student portfolios to monitor students’ grammar and vocabulary learning, Hu felt
challenged to use portfolios in her teaching due to the large number of her students.
Thus, class size was a challenge to the implementation of LOA principles in GVT
preparation.
7.5.6 The concern over teaching performance
An additional challenge was the teachers’ concern over their teaching

performance. Mentioned by Hu in particular was her concern about students not
listening to her class. Due to this concern, Hu explained why she chose not to use the
target language in her classes.
Hu: Those kids are very wayward, especially according to their current teenage
psychology. If you force them to learn English, but they could not understand.
They will then think ‘I spend time in English subject but still couldn’t improve
my test score, so why should I learn?’ … They will ask you the answers for
that. Perhaps for me, I’d rather give up many of my previous teaching
behaviours (i.e., teaching activities), in order to at least make sure that he will
listen to my class. (Interview)
This effort of getting students to listen to her class was indeed a reflection of
Hu’s concern for her teaching profession. As Hu commented in her interview, her
sacrifice of using the target language and interactive classroom activities in test
preparation courses was due to the consideration of students’ needs to get a higher test
score. As the head teacher of the class, she had to consider students’ test scores and
make sure that students listen to her class, which would be viewed as her teaching
performance at the end of Grade 9. This concern, proved by the current research
participant group and other studies (see, for example, Tsagari, 2011), was generally
viewed as the evaluation criteria for teachers’ academic performance and thus aroused
teachers’ anxiety. Therefore, teachers’ concern over teaching performance or career
security may impede their intentions to have LOA activities such as learner-centred
interactions during GVT preparation.
7.5.7 Limited teaching experiences and expertise

Teachers’ limited teaching experiences (Lan, Hu) and language proficiency or
expertise (Zhang) appeared to be a challenge to the implementation of LOA principles
during test preparation. As for Lan and Hu, their identities as new Grade 9 teachers
explained the reason why they felt it difficult to implement LOA principles in GVT
preparation courses.
Lan: However, Grade 9, because I am a new teacher, I haven’t taught Grade 9

before, so, many other teachers who are experienced may teach Grade 9 better
than me. As for me, well, it’s still mainly me who guided more [in the class],
because, recently I am quite confused. Because, in other words, you have to
complete the [teaching] schedule, but if you always guide students, and for one
question you have to guide for a long time, and the student still couldn’t give
you the answer, that will be too much waste of time. Well, I am still falling
short in teaching. (Interview)
From the above comment, it was evident that Lan felt challenging due to her
inexperience of Grade 9 teaching as a new teacher. In fact, Hu had a similar challenge
since she was also a new teacher. Therefore, this explained why Hu believed that the
mode of “instruct-evaluate-do exercises” was the best teaching method she could use
in test preparation.
In addition, compared to Lan and Hu, although Zhang had 18 years’ teaching
experience, she had a lower level of English proficiency. Therefore, she sometimes
used Chinese instead of using the target language of English in classroom teaching, as
she explains below.
Zhang: Regarding the issue of using Chinese, I think there are three reasons. The first
reason is that, my own level is not good enough. For example, if I say in a
“low” way, very easy words I can make it. But if it’s slightly difficult, and I
normally do not use those, well, then I can’t smoothly express my ideas. This
is within a certain time, well, this is my own [problem], that is to say, I easily
give up using English, and use Chinese instead. This is the first, my own factor,
that is, the teacher’s self-proficiency is not up to the standard. (Interview)
Zhang attributed her reluctance to use English in class to her lower level of
proficiency in English. As a result, although she was much more experienced in
teaching than the other two teachers, her limited language expertise hindered the
potential for incorporating learning-oriented activities and principles such as using
target language for a communicative purpose in test preparation courses.
7.5.8 Test method

The last challenge for GVT preparation to be learning-oriented was due to its
test methods as perceived by students in focus groups. Specifically, the use of the MCQ
task was discussed by School A students (Fei-SA, Ling-SA). In their opinion, the MCQ
task in the GVT did not seem to have the potential to be learning-oriented.
Fei-SA: It could not reach that level. … As for the MCQ task in the GVT, it emphasises
more about personal mastery of language knowledge, which rarely
emphasises that level and the aspect of cooperation [like completing and
presenting tasks or projects in groups, which I have mentioned in my
understanding of LOA term]. (FG-SA)
According to Fei-SA, the fact that GVT tasks which adopted a multiple-choice
method did not provide opportunities for classroom activities to incorporate
cooperative and interactive elements which were key in LOA theories (Carless, 2007;
Jones & Saville, 2016). Therefore, the use of MCQ and Sentence Completion in the
GVT, which were decontextualised, discrete-point items (Madsen, 1983), aimed to test
the accuracy of grammar and vocabulary knowledge (Halleck, 1992) and rarely
touched upon higher-level learning skills if the test preparation was to follow the LOA
principle of “assessment tasks as learning tasks” (Carless, 2007), thwarted the potential
for incorporating LOA principles in test preparation learning.
To sum up, teachers and students reported eight LOA challenges of the
incorporation of LOA principles in GVT preparation. Through the lens of LOA
principles and LOA practices such as using target language in test preparation classes
or having learning-oriented activities, the commonly recognised challenges included
efficient use of class time, the consideration of high test stakes, administrative
influence, student language proficiency, class size, the concern over teaching
performance, limited teaching experiences and expertise, and test method. On the one
hand, the qualitative results showed that, compared to Lan, Hu and Zhang had more
challenges of the incorporation of LOA principles in GVT preparation. Most
importantly, those challenges mainly came from external rather than internal factors.
Nonetheless, due to those concerns, teachers had to make compromises in their
teaching and thus led to certain ‘teaching to the test’ phenomena (see section 5.5.1).
On the other hand, although students perceived less challenges, different concerns
were reported across the observed classes. To clarify, School A students (Fei-SA,
Ling-SA) regarded MCQs in the GVT to be unable to allow for LOA practices.
Further, students from School B and School C, considered efficient use of class time
to be the priority during GVT preparation. Additionally, School B students pointed out
that students’ language proficiency should be considered when having LOA practices
in test preparation classes.
7.6 CHAPTER SUMMARY
With the theoretical assumption that LOA is to provide a supporting framework

for learning (Jones & Saville, 2016), this chapter addressed the second research
question of the study to specifically explore the potential for GVT preparation to
promote learning for immediate stakeholders of Grade 9 teachers and students. In this
chapter, qualitative data from classroom observations and interviews regarding
opportunities for and challenges of the incorporation of LOA principles in GVT
preparation, coupled with quantitative data of student survey were reported. Broadly
speaking, teachers and students believed that the GVT offered various opportunities
for incorporating LOA principles during test preparation. Under those beliefs, they
applied identifiable LOA strategies and activities in their test preparation classes.
However, interview data showed that different factors challenged the incorporation of
LOA principles in GVT preparation. As such, the qualitative findings are first
summarised in Table 7.6 (see next page), and the important factors considered and
measured in quantitative analyses are highlighted.
Taking insights from qualitative results, two statistical hypotheses were tested in
the quantitative phase. CFA findings suggested that Learning Oriented Assessment in
the context of GVT preparation was indeed a multidimensional construct which was
constituted by classroom interaction, involvement in assessment, feedback, and learner
autonomy. The four-dimension model fitted the data well with the four constructs
significantly correlated with one another. Further, Spearman’s correlation showed that
there were positive and significant relationships between student test performance (i.e.,
self-reported SHSEET scores) and LOA practices in GVT preparation. Therefore, the
quantitative findings indicated that LOA in GVT preparation was a dynamic
multidimensional construct and the GVT could enable LOA opportunities (i.e.,
positive washback) in test preparation.
Table 7.6
Qualitative findings of RQ2
Chapter 8: Discussion and Conclusion
This chapter presents the discussion and the overall conclusion of the thesis.
Section 8.1 discusses and interprets the important research findings of this study by re-
addressing the research questions of washback value (section 8.1.1), washback
intensity (section 8.1.2), washback mechanism (section 8.1.3), and LOA opportunities
and challenges (section 8.1.4), which are summarised in section 8.1.5. Section 8.2
sums up the contributions and implications of the study from theoretical (section
8.2.1), methodological (section 8.2.2), and practical perspectives (section 8.2.3).
Section 8.3 documents the researcher’s reflections on conducting this exploratory
sequential mixed methods research (MMR) study. Section 8.4 summarises the
limitations of the current research. Section 8.5 delineates the recommendations for
future research. The whole study is then concluded in section 8.6.
8.1 DISCUSSION
This study, conceptualised through Green’s (2007a) washback model, Carless’

(2007) LOA theory, and Jones and Saville’s (2016) LOA cycle, investigated both
teachers’ and students’ perceptions of and preparation for the Grammar and
Vocabulary Test in the Senior High School Entrance English Test (the GVT) (RQ1)
and the opportunities as well as challenges for GVT preparation to be learning-oriented
(RQ2). To explore the holistic washback phenomenon of the GVT, this section
discusses key aspects of findings in respect to the two research questions in this study.
8.1.1 Washback value

With regard to washback value, the English Curriculum Standards for
Compulsory Education (ECSCE) and Test Specifications, as official test reference
documents, were found to be followed by teachers or used by both teachers and
students to a limited extent, and the GVT was found to influence Grade 9 teaching and
learning both positively and negatively in various aspects. Those aspects included
perceptions of test requirements, affective factors of test anxiety and motivation, test
preparation materials, and language-use oriented as well as test-use oriented grammar
and vocabulary learning strategies.
Chapter 8: Discussion and Conclusion 251

The failure of intended washback at the macro washback value
Regarding official test reference documents, participants, especially teachers,
perceived that the test design of the SHSEET and the GVT reflected the learner-
centred philosophy of these documents. However, it is found that teachers and students
engaged in “narrowing of the curriculum” (Madaus, 1988; Shohamy et al., 1996) and
the intended washback from curriculum developers failed in GVT preparation, which
was also found in the NMET context (Qi, 2004a, 2004b, 2005, 2007).
Both teachers and students recognised the central role of the ECSCE and Test
Specifications in GVT preparation. However, their focus was on grammar and
vocabulary lists in both documents, rather than following the teaching as well as
assessment principles in the ECSCE which highlight the principles of learner-centred
English education for compulsory education. This indicated the disconnect with the
stated intentions of the curriculum standards, as the test failed to bring about a positive
change at the macro level of washback value, based on the negative findings from both
Grade 9 teachers and students in this study. It thus echoes the criticism that
standardised tests which are used for gatekeeping purposes can generate negative
results and impede curriculum implementation (Dello-Iacovo, 2009), and the research
finding that intended washback by test constructors can fail when the gatekeeping role
and selection function of the exam conflict with each other (Qi, 2004b, 2005, 2007).
In addition, teachers focused more on test content rather than curriculum content
as also found by Al-Wadi (2020) and thus students’ communicative language use
ability could not be improved due to the “narrowing of the curriculum” caused in-class
drilling of grammar and vocabulary knowledge and tested skills (Saglam & Farhady,
2019). This finding reflected the conflict between the curriculum requirements and
actual test preparation needs in the test preparation stage. The reason might be that it
was difficult for teachers to respond to both the curriculum stipulation and test
requirements at the same time as similarly reported by Al-Wadi (2020), since they
constitute competing imperatives. Nonetheless, as argued by scholars (see, for
example, Messick, 1996; Shohamy, 2001), it is not the case that well-designed tests
can only generate positive washback and poor tests are destined to bring about negative
washback. Therefore, the reflection of learning-oriented principles in the test design
and the challenge for curriculum implementation in GVT preparation did not
necessarily mean that the GVT will only bring about positive washback or negative
252 Chapter 8: Discussion and Conclusion

washback, as complex washback results were identified in this study. In line with the
theoretical conceptualisation of this study (Green, 2007a) and the literature review on
the washback of high-stakes standardised English tests, the following two discussion
sections focus mainly on positive and negative washback findings of the GVT.
Positive washback of the GVT on teaching and learning

Positive washback in relation to the GVT was found in Grade 9 teaching and
learning. In particular, the positive washback of the GVT was indicated by
participants’ positive perceptions of test design characteristics of Cloze and Gap-filling
cloze tasks, non-anxious feelings and intrinsic motivation, and language-use oriented
grammar and vocabulary learning strategies.
Cloze and Gap-filling cloze tested communicative language use

In qualitative findings, different perceptions emerged from participants and were
considered as positive influences from the GVT. Among the various positive
perceptions, the most salient positive perceptions were that the passage-based tasks of
Cloze and Gap-filling cloze provided a rich context for testing grammar and
vocabulary and they assessed language use rather than simple language knowledge.
As such, these two tasks tested communicative language use.
In this study, participants’ perception of the overall ability to use language was
an indication of the potential for assessing some aspects of communicative language.
Although communicative language use was interpreted differently by teachers and
students, similar to NMET participants (Dong, 2020), it was possible that the two
Cloze tasks assessed students’ ability to use language and thus had the potential for
positive washback. In fact, it is widely acknowledged that the communicative features
of a test could bring about positive washback (Erfani, 2012; Hawkey, 2006; Wall &
Alderson, 1993), which is closely linked to the use of authentic and integrated tasks
(Biber & Gray, 2013; Jamieson et al., 2000; Ostovar-Namaghi & Safaee, 2017).
Therefore, the fact that passage-based GVT tasks of Cloze and Gap-filling cloze tested
students’ overall ability to use language, provided a rich context for language use, and
had topics that were relevant to real-life experiences indicated the communicative
language testing characteristics of the GVT design.
Furthermore, similar to the Use of English paper in Cambridge Main Suite

Exams such as the Cambridge English: Advanced (CAE) (Docherty, 2015), Cloze and

Gap-filling cloze tasks in the GVT had the capacity to assess communicative language.
These tasks, taking a lexico-grammatical approach, emphasised productive tasks over
discrete-point items such as MCQ which assessed receptive knowledge (Prodromou,
1995). Therefore, tests such as the Use of English paper requires not only basic
language knowledge but also language in context, which the passaged-based tasks of
Cloze and Gap-filling cloze can fulfil. Again, it indicated that GVT Cloze and Gap-
filling cloze tasks could be able to link form, meaning, and use to assess learners’
language knowledge and ability to use language. This could be possibly taken as the
reason for participants’ positive perception of the test, but further evidence should be
explored through methods like content analysis of GVT tasks themselves. It could also
be possible that participants felt that Cloze and Gap-filing cloze enabled them to use
grammar and vocabulary as a resource to accomplish or enhance the macroskill of
writing and communicative language use (Jamieson et al., 2000; Turner & Upshur,
1995; Wallace, 2014).
High-achieving students were not anxious and were intrinsically motivated

during GVT preparation
Positive washback was identified from participants’ affective feelings as some
students were not anxious about the test-taking and felt intrinsically motivated to learn
English grammar and vocabulary during GVT preparation. Both qualitative and
quantitative data showed that most students did not feel anxious about the GVT, which
contradicted other findings on standardised English as a Foreign Language (EFL) tests
in China, such as the SHSEET in Wuhan (Zeng, 2008). In addition, high-achieving
students who felt intrinsically motivated to learn English grammar and vocabulary
during GVT test preparation evidenced a positive washback tendency. This aligned
with studies that found test preparation encouraged students to learn English (Gan,
2009; Teng & Fu, 2019; Zeng, 2008) and even developed an enthusiasm to learn
English in extra-curricular time (Li, 1990). However, this finding may not be
generalised to all exams or all students, as different test design and focus (e.g.,
communicative language or language knowledge) could be influential factors for
washback value and this finding was mainly identified in high-achieving students.
Nonetheless, although different from other studies where anxiety regarding test-
taking was found, there were several reasons for students’ less anxious feelings
towards the GVT in this study. These reasons included but were not limited to: 1)

students in this study were generally high-achieving; 2) the test difficulty and test
stakes were lower than others like the NMET, the Graduate School Entrance English
Examination (GSEEE), which are all administrated nationally; and 3) the mitigating
influence of Yi Zhen (the mock SHSEET) which indicated the effect of students
already receiving contracts from schools based on their Yi Zhen performance.
Language-use oriented grammar and vocabulary learning strategies were

taught by teachers and adopted by high-achieving students
In this study, various language-use oriented grammar and vocabulary learning
strategies were taught by teachers and employed by learners during GVT preparation.
This echoed the positive washback on teaching methods and learning strategies in
different test contexts of the Internet-based CET-Band 4 (IB CET-4) (Wang et al.,
2014), the NMET (Zhi & Wang, 2019), and the SHSEET in Jiangxi (Yang, 2015). For
example, students who read extensively to accumulate language knowledge in the
present study were similar to other students who prepared for the NMET (Zhi & Wang,
2019). Since this strategy was useful in real-life tasks (Zhi & Wang, 2019) and for
better comprehension of the use of language, it was thus perceived as positive
washback on learning.
Interestingly, although teachers taught their learners quite similar language-use

oriented learning strategies during GVT preparation, the uptake of those strategies was
more pronounced among high-achieving students. This qualitative finding, identified
through mainly focus groups, was taken as a component in the construction of the GVT
washback mechanism that was tested.
Negative washback of the GVT on teaching and learning

Negative washback of the GVT was also identified in Grade 9 teaching and
learning. In particular, the GVT appeared to show negative washback through the
following aspects: participants’ negative perceptions of test design characteristics of
GVT tasks of MCQ and Sentence Completion, reported test anxiety and extrinsic
motivation, exam-oriented test preparation materials, and test-use oriented grammar
and vocabulary learning strategies.
Discrete-point approach in the GVT brought about negative perceptions and

GVT tasks were lack of test authenticity
Negative perceptions of GVT tasks emerged from qualitative data and were
further explored in quantitative data. However, these negative perceptions were found

to be more relevant to MCQ and Sentence Completion tasks. In effect, integrated tasks
are frequently adopted in large-scale standardised English tests such as the TOEFL
iBT, but other tests still choose to separately examine test-takers’ grammatical
knowledge, such as the NMET in China (Pan & Qian, 2017). Although this study
showed the potential for the teaching and learning of grammar and vocabulary to
develop students’ overall ability to use language, it was possible that the separate
testing of grammar and vocabulary tasks might lead to the passive intake of
grammatical instruction (Macmillan et al., 2014; Yang, 2015). Particularly, the use of
discrete-point tasks such as MCQ and Sentence Completion might exacerbate the
situation since the discrete-point approach has long been criticised for failing to
evaluate communicative ability adequately (Halleck, 1992; Prodromou, 1995). In this
way, it caused participants’ negative perceptions of MCQ and Sentence Completion
which were perceived as having insufficient language context and were open to
guessing.
Qualitative data disclosed that teachers and students often complained the lack
of authentic language in GVT tasks. As revealed by teachers and focus group
participants, most students thought that GVT tasks did not reflect real-life language
use, which raised the question of the quality of GVT tasks. This negative perception
was similar to what Zhi and Wang (2019) found in the NMET context when they
perceived the irrelevance of test content to real-life English threatened test
authenticity. Therefore, the similar issue of a lack of authentic language in GVT tasks
was considered as a threat to GVT authenticity in the SHSEET. In turn, this issue
indicated the influence of the negative washback potential of GVT design
characteristics on learning. Most importantly, these negative perception findings
highlighted that teachers as well as students felt the need for more authentic grammar
and vocabulary tasks in the GVT.
Low-achieving students were anxious and extrinsically motivated, and teachers

were anxious during GVT preparation
Negative washback was also identified from participants’ affective feelings as
low-achieving students and teachers in the qualitative stage were generally anxious
about the test-taking and some students felt extrinsically motivated to learn English
grammar and vocabulary. In addition to the qualitative evidence, more negative
response to survey item of v28 (39.9%, see section 5.3.2) reflected that student

participants did feel stressed about their parents’ and teacher’s criticism of them if they
achieved a less than ideal GVT score. Thus, for some students, they did feel anxious
about the GVT, which was similar to other SHSEET contexts reported by Zeng (2008).
As for teachers, they also admitted that they were under test-related pressure. In China,
high-stakes testing is closely related to the evaluation of teaching (Qi, 2005, 2007;
Zhan & Andrews, 2014), and the majority of schools in China treat test scores as the
most direct and objective indicators for evaluating the job performance of teachers (Qi,
2005). Therefore, with an accountability concern (Ballou & Springer, 2015), Hu and
Zhang felt anxious and worried about students’ test scores due to the gatekeeping role
of the SHSEET (Dello-Iacovo, 2009; Qi, 2005) and the use of results to reflect
teachers’ professional performance through students’ test scores (Tsagari, 2011).
Possible reasons for teachers’ anxious feelings might include but not limited to
the accountability issue. Worldwide, students’ performance in standard EFL exams
had been used to reflect teachers’ professional competence (McDonnell, 2004) and the
assessment criteria for teaching and learning performance was judged through a high-
stakes exam system which served to rank students, and evaluated schools as well as
teachers (McDonnell, 2013; Nichols et al., 2006; Tsagari, 2011). Based on the same
consideration, students’ test achievements became the main goal for both teachers and
students during GVT preparation, which led to the negative influence of the test where
teachers and students might avoid communicative activities and materials that were
not perceived as helpful in test score improvement as found in other test contexts (see,
for example, Tsagari, 2011).
Given this consideration, it was not surprising that students reported extrinsic
motivation such as competing with peers as important to their test preparation. This
was similar to other high-stakes EFL studies in China like the GSEEE as researchers
found students attached great value to instrumental motivation which was perceived
to negatively impact the education system and teaching (He, 2010).
The dominant use of exam-oriented test preparation materials

Regardless of the non-exam oriented learning materials reported by high-
achieving students, the majority of materials used by participants to prepare for
grammar and vocabulary learning were exam-oriented, which mainly included test-
based textbooks, school-designated test review coaching books, grammar and
vocabulary knowledge listed in test reference documents, and various test-driven

materials such as mock test papers. Their extensive use of those exam-oriented
materials was for the purpose of gaining a higher test score, which was perceived as a
detrimental washback on both teaching and learning. This negative washback was also
evident in other studies (Damankesh & Babaii, 2015; Saif, 2006; Wall & Alderson,
1993; Zeng, 2008; Zhan & Andrews, 2014) where participants ignored non-test
materials and used commercial test preparation materials for self-study. These are all
perceived as negative washback due to the fact that they represent a “narrowing of the
curriculum” phenomenon (Madaus, 1988; Shohamy et al., 1996) and triggered the
phenomenon of ‘teaching to the test’ and ‘studying for the test’ (Zafarghandi &
Nemati, 2015).
Test-use oriented grammar and vocabulary learning strategies were taught by

teachers and adopted by students
An overwhelming evidence of negative washback of the GVT was found on
teaching and learning as teachers and students reported an extensive use of test-taking
strategies to achieve better performance on GVT tasks. These test-use oriented
learning strategies are also reflected in the washback literature. Among these, test-
taking strategies such as relying on drilling past exam papers and rote-memorising for
the test were often used (Green, 2006b; Tsagari, 2011; Yang, 2015; Zeng, 2008; Zhan
& Andrews, 2014). In addition, teachers warned students not to attempt to use
vocabulary from outside the vocabulary lists in the ECSCE and Test Specifications in
both Gap-filling cloze and other constructed-response tasks in the SHSEET. This test-
wiseness strategy was, in fact, guessing what test designers wanted them to answer in
those tasks to achieve a higher test score in the exam; and this phenomenon was also
evident in other tests like the Beijing Matriculation English Test (Xu & Wu, 2012)
where students were found to guess raters’ intentions and preferences in the writing
task in order to leave a good impression on them. The extensive use of those test-use
oriented learning strategies could be the reason that grammar and vocabulary teaching
was isolated from the teaching of other macroskills. For example, the vocabulary
teaching was intense in the first round and grammar teaching was separated in the
second round of SHSEET review. In this way, it was possible that communicative
language activities were ignored, since grammar and vocabulary knowledge were
taught separately and were thus isolated from the teaching of integrated language skills
as found in other studies that focused on the MCQ task in the SHSEET in Jiangxi
(Yang, 2015).

In conclusion, literature shows the negative washback of high-stakes
standardised tests on teaching methods and learning strategies, which could be
generalised as ‘teach to the test’ (Qi, 2005; Yang, 2015) and ‘learn to the test’
(Zafarghandi & Nemati, 2015). The strategies used during GVT preparation also
evidenced ‘teach to the test’ and ‘learn to the test’ in the GVT context, which proved
negative washback of the GVT on both teaching and learning.
8.1.2 Washback intensity

Traditionally, test importance, also refers to the perceived “stakes” of the test
(Madaus, 1988), and test difficulty (Green, 2007a) have been regarded as two factors
that drive washback intensity (Green, 2013). In the new washback model incorporating
LOA, washback intensity is conceptualised as composed of participants’ perceptions
of test importance, test difficulty, and test preparation effort as indicated in Green’s
(2007a) washback model. From both qualitative and quantitative data, different
patterns of washback intensity in relation to the GVT were discovered, since both
intense in-class test preparation and a lower degree of extra-curricular test preparation
were identified. Quantitative findings from Multiple Correspondence Analysis (MCA)
in general supported the qualitative findings, but salient washback intensity results
regarding Gap-filling cloze stood out and distinctive washback intensity pattern
(Pattern 4 in MCA) was found to go beyond the washback intensity model (Green,
2007a). As washback intensity was closely related to test importance and test
difficulty, the following discussion aligns with these test perceptions.
More intense test preparation in classes and among low-achieving students

as well as survey participants
From classroom observations and teacher interviews, it was found that the in-
class test study of grammar and vocabulary was more intense than students’ extra-
curricular study during GVT preparation. In other words, teachers prepared for the
GVT in a similar pattern and with a similar degree of intensity. In addition, Pattern 4
in MCA results (see p. 189) showed that there were some survey participants who
spent a great deal of effort in GVT preparation but did not perceive the test to be
important or difficult.
As this study was conducted in the last semester of Grade 9, and the test date
was drawing close to teachers and students, it was unsurprising to find intense in-class
washback of which other researchers call “seasonality of washback” (Bailey, 1999;

Cheng, 2005; Cheng et al., 2011). As test importance and test preparation have long
been discussed by researchers (Dong, 2020; Green, 2007a, 2013), and as the GVT was
generally perceived by both teachers and students as highly important, it partly
explained the intense washback in test preparation courses. Therefore, it was assumed
that since the GVT was important to participants due to the SHSEET score use for test-
designated purposes (e.g., graduation) and their perceived test use purposes (e.g.,
proving students’ language proficiency), their teaching and learning behaviours tended
to change accordingly. This finding was also evident in other test contexts such as the
NMET (Dong, 2020) which found students’ perceptions of test importance
significantly influenced their learning practices.
As claimed, to achieve visible washback, tests need to be recognised as

important (Green, 2007a); however, the high importance of the SHSEET which
brought about intense test preparation of grammar and vocabulary in Grade 9 indicated
negative washback. The reason was that it induced a large amount of time spent in
classroom teaching of grammar and vocabulary knowledge, which could replace
regular classroom activities. This is a negative impact that has been documented by
researchers such as Alderson and Hamp-Lyons (1996). Therefore, this phenomenon of
eating into curricular time to prepare for the test was perceived as a major example of
negative washback on teaching and learning.
Additionally, students with low English proficiency tended to spend more effort
in preparing for the exam. As School C students reported in the focus group, teachers
required them to do test-driven exercises as homework, which was thus viewed as the
shifting of GVT preparation to extra-curricular study. This negative washback was
also manifested by the TOEFL preparation in the Sri Lankan context (Alderson &
Hamp-Lyons, 1996).
Finally, Pattern 4 in MCA results revealed that intense test preparation was
conducted by some survey participants but had no connection to their perceptions of
test importance and test difficulty. This finding did not align with the theoretical
assumption that visible washback or intense washback seen in significant effort during
test preparation was generated under the driving force of test importance and test
difficulty (Green, 2007a, 2013; Hughes, 1993). As such, whether Pattern 4
demonstrates an “invisible” washback effect or falls beyond the washback

phenomenon remains unknown. Although unexplained in the current data, this finding
could further indicate the complexity of washback phenomenon.
A relatively lower degree of test preparation in extra-curricular time

As has been highlighted in the washback literature, Alderson and Hamp-Lyons
(1996, p. 296) pointed out that tests can have “different amounts and types of washback
on some teachers and learners than on other teachers and learners”. Findings from the
current study supported this claim, as a large number of students had the least extra-
curricular test preparation in relation to the GVT (see section 6.3.3). Therefore, in
contrast to the qualitative findings of intense in-class test preparation and low-
achieving students’ intense GVT preparation in extra-curricular time, survey
participants did have different levels of test preparation. Moreover, survey
participants’ responses clearly verified the important roles of test importance and test
difficulty in gendering test preparation effort (Green, 2007a, 2013; Hughes, 1993).
First, focus group data showed a fluctuation of washback intensity among

students in relation to different GVT tasks. The lack of challenge in MCQ and Sentence
Completion triggered students’ perceptions that these tasks had relatively low
importance. Compared with Gap-filing cloze which had a high difficulty level and
induced more test preparation effort as shown through the survey, it was found that
high-achieving students who regarded those two tasks as easy felt reluctant to prepare
for MCQ and Sentence Completion.
Additionally, results from the MCA revealed the fluctuation of washback

intensity (Pattern 1-no washback; Pattern 2-less intense washback; Pattern 3-more
intense washback) to different students within the same participant group. These
results fitted the washback intensity model of Green (2007a) well, which verified that
washback intensity was closely linked with participants’ test perceptions of
importance and difficulty. Therefore, it again emphasised that to achieve visible or
intense washback, a test should be viewed as both important and challenging (Green,
2007a, 2013; Hughes, 1993).
In sum, although teachers demonstrated a similar washback intensity pattern in

classroom teaching, this study identified various patterns of washback intensity among
students in extra-curricular time regarding test methods, test difficulty, and test
perceptions. Therefore, it was assumed that those different factors resulted in differing

test preparation behaviours for the GVT, which indicated complex washback intensity
findings as proposed in the Washback Hypothesis (Alderson & Hamp-Lyons, 1996;
Wall & Alderson, 1993).
8.1.3 Washback mechanism

Drawing implications from theories (Green, 2007a; Hughes, 1993; Wolf &
Smith, 1995; Xie & Andrews, 2013) and empirical studies (Chapman & Snyder, 2000;
Dong, 2020; Mizutani, 2009; Xie, 2013, 2015a; Xie & Andrews, 2013), complex
results were found in relation to the constructed SEM model of the GVT washback
mechanism. In line with the scope of the current study, two major findings are
summarised and discussed in this section; that is, the influence of affective factors and
test preparation practices on test performance in the GVT washback mechanism.
First of all, students’ intrinsic motivation and test anxiety explained the influence
from perceptions of language use characteristics (Positive Perception2) and self-
perceived test use purpose such as proving language proficiency (Test Importance2)
to GVT preparation practices of language-use oriented learning strategies use (Positive
strategy) and Test Preparation Effort. This finding can be unpacked in three aspects of
intrinsic motivation, test anxiety, and Green’s (2007a) washback model as follows.
In this overall finding, intrinsic motivation was first perceived as a key factor in
positive washback on students’ learning. As such, students who were intrinsically
motivated could be in a better position to be those who used language-use oriented
learning strategies more frequently, which in turn appeared to be those with higher
SHSEET scores. This finding echoed that of Wolf and Smith (1995) who claimed that
test-takers’ perceptions of test consequence (i.e., test importance) significantly
influenced participants’ motivation which was positively linked with test performance.
However, as their motivation was scaled from high to low, it was not directly
comparable to the current motivation scale, which was constituted by both intrinsic
motivation and extrinsic motivation. In this study, it was understandable that
intrinsically motivated students tended to adopt more language-use oriented learning
strategies for GVT preparation, and the result that they also tended to spend more test
preparation effort might be because that those students did not see any harm in doing
test papers and they may also enjoy doing exercises like they do with other language
learning activities.

A second factor to be considered in this finding was the factor of test anxiety.
More specifically, the decrease in test anxiety will impact positively on students’ test
preparation practices. In effect, the finding indicated that the lower the test anxiety, the
more possibility that students applied language-use oriented strategies during GVT
preparation. This was similar to what Jin and Cheng (2013) found in the CET-4
context, which echoed Wolf and Smith’s (1995) finding that, in contrast to students’
test-taking motivation, a high level of test anxiety had a negative effect on test
performance. Therefore, it was thus appropriate to say that the less anxious students
were, the more chance for positive washback indication of the GVT.
Most importantly, the first major finding of SEM analysis suggested that GVT
design features with regard to testing the overall ability to use language (the construct
of Positive Perception2, see section 5.2.5) supported the theoretical conceptualisation
of the washback value dimension in Green’s (2007a) model, since the GVT as a whole
reflected the testing of overall ability to use language (focal construct) which brought
about positive washback on students’ learning (language-use oriented learning
strategy) through the indicators of intrinsic motivation and test anxiety. Therefore,
although perceived from a student perspective16, the macro level of washback value in
both Green’s (2007a) model and the new washback model incorporating LOA was
found to be partially supported. Therefore, the GVT design characteristics of testing
overall ability to use language indicated positive washback.
The second major finding of this SEM model suggested that students with higher
self-reported SHSEET scores tended to be those who adopted language-use oriented
learning strategies and spent more test preparation effort. These findings were also
reported by other researchers (see, for example, Dong, 2020; Green, 2007b; Xie,
2013). On the one hand, language-use oriented learning strategies used by students has
the strongest association with students’ SHSEET scores as self-reported in the survey
(r=.311). The finding echoed the researcher’s assumption that using more language-
use oriented learning strategies would link more closely with students’ test
performance and corresponded to the qualitative finding that high-achieving students
reported more application of language-use oriented learning strategies during GVT
16
Since essential construct validity evidence was not obtained in the qualitative phase, it was perceived
only from a student perspective by investigating their perceptions of test characteristics in the survey,
which was thus categorised as a micro level factor in the new washback value model of this study.

preparation. On the other hand, Test Preparation Effort was found to significantly
associate with students’ self-reported SHSEET scores (r=.234). It means that, students
with higher self-reported SHSEET scores tended to do more test exercises and spent
more time on GVT tasks in extra-curricular time. The finding was thus in contrast to
previous studies that found no significant relationship between test preparation and
learning outcomes (see, for example, Dong, 2020; Xie, 2013). However, since intense
test preparation effort was perceived by researchers (see, for example, Dong, 2020) to
hamper language ability improvement, it was thus regarded as indicating a negative
washback of the GVT on learning. Additionally, compared with language-use oriented
learning strategy, Test Preparation Effort had a weaker association with students’ self-
reported SHSEET scores. This finding could be explained by revisiting the frequency
results of those two factors (see Table 5.7, Table 6.3, and Table 6.4). As reported by
survey participants, their involvement in taking test papers and spending time in GVT
tasks were much lower than that they did in applying language-use oriented learning
strategies during test preparation.
To conclude, the SEM model of washback mechanism indicated that perceptions

of language use characteristics and students’ self-perceived test use purposes could
positively influence their test preparation practices of both language-use oriented
learning strategies and test preparation effort through intrinsic motivation and test
anxiety. Moreover, both language-use oriented learning strategies and test preparation
effort associated with students’ self-reported SHSEET scores of which the former had
a stronger association. However, it should be cautious to use these findings to claim
the washback value of the GVT, since the greater the test preparation effort, the more
negative washback of the GVT on extra-curricular learning.
8.1.4 LOA opportunities and challenges

LOA, as defined by Carless (2015, p. 964), is the assessment that primarily
focuses on “the potential to develop productive student learning processes”. In the
current study, LOA theories (Carless, 2007; Jones & Saville, 2016) were
conceptualised to explore positive washback of the GVT. As depicted in Chapter
Three, key principles of LOA practices during GVT preparation included classroom
interaction, student involvement in assessment, feedback, and learner autonomy.
These LOA practices were identified in participants’ test preparation, accompanying

their beliefs about LOA opportunities. However, challenges for the GVT preparation
to incorporate LOA principles were also articulated.
Participants carried out identifiable LOA practices

When considering classroom interaction, both learner-centred and content-
centred interactive activities are perceived to be important in learning-oriented
classrooms (Jones & Saville, 2016), which was indicated but still limited in the current
study. Effective interactions in classes were those interactive activities that promoted
productive learning in the co-constructed zone of proximal development (ZPD)
(Vygotsky, 1986) rather than pure knowledge transmission from teacher to student and
valued student engagement in the interactive process. Therefore, interactive activities
in test preparation courses should be stimulated by tests like the IELTS (Erfani, 2012)
which assessed communicative competence. Considering both the characteristics of
the GVT which was constituted of selected-response and constructed-response items,
and the classroom activities given by teachers were mainly doing test-driven exercises
(i.e., drilling), there were limited learning activities which resembled communicative
interactions during GVT preparation.
As for involvement in assessment, familiarising students with marking criteria

and taking practice tests in self-study time were widely used. Students were found to
simulate the assessment process in self-study time as they normally did with actual
exams. Therefore, by both teachers’ encouragement and students’ self-effort, the self-
study time involved self-assessment and feedback from their performance on practice
tests by students themselves. Thus, as found in this study, teachers promoted mainly
self-assessment, but there was little evidence of peer-assessment.
Further, teachers and students were intensively involved in feedback practices

which were mainly delivered in an oral form. However, different from others such as
Carless (2015) who construed feedback as multidirectional, the feedback practices
found in this study were mainly teacher-to-student feedback which was on the whole
unidirectional. Nonetheless, feedback practices were frequent, timely, detailed,
helpful, feeding forward, and considered satisfactory by students.
Most importantly, the consideration of learner autonomy during GVT

preparation was new to both Carless’ (2007) LOA conceptualisation which was
situated within the tertiary education context and the ecological model of the LOA

cycle (Jones & Saville, 2016), because learner autonomy is closely linked to students’
extra-curricular GVT study. In effect, students were expected to be autonomous
learners outside the class and autonomous behaviours were considered to be able to
promote Assessment for Learning (AfL) (Lamb, 2010). Moreover, learner autonomy
was closely connected with involvement in assessment (Dam, 1995; Little, 1996;
Tassinari, 2012) and reflection through self-assessment (Oscarson, 1998).
Additionally, Bailey (1996) proposed that positive or beneficial washback can emerge
from learner autonomy and self-assessment. Therefore, with further findings of inter-
relationships between learner autonomy, classroom interaction, feedback, and
involvement in assessment found in the current study, it is reasonable to claim that
those four factors provided evidence for positive washback of or LOA opportunities
for the high-stakes standardised GVT in practice.
Further correlation results appeared to suggest the potential association between

students’ self-reported SHSEET scores and four LOA practices of classroom
interaction, involvement in assessment, feedback, and learner autonomy. Among these
four factors, learner autonomy really stood out due to its much stronger association
with self-reported SHSEET scores as compared to classroom interaction (learner
autonomy, r=.418; classroom interaction, r=.155). The possible reason for fewer
classroom interactions might be that in-class GVT preparation was mainly dominated
by teachers, whereas learner autonomy was controlled by students themselves. In
practice, classroom interaction had a very small correlation effect with test
performance. Considering the fact that LOA practices in classroom were jointly
completed by both students and teachers (Jones & Saville, 2016), the disparity between
learner autonomy and classroom interaction suggested that classroom interactions
initiated by teachers might not closely correlate with students’ test performance.
Participants expressed both opportunities for and challenges of the

incorporation of LOA principles in GVT preparation
According to the interview accounts, teachers and students believed that there
were opportunities for GVT preparation to incorporate LOA principles, but they also
perceived difficulties in so doing. On the one hand, with the belief of developing
communication abilities in real life, teachers and students felt that GVT preparation
could promote their grammar and vocabulary learning. For the MCQ task, what was
tested could be used for daily communication or communication with foreigners; while

for Cloze and Gap-filling cloze, the topics in these tasks could offer them life
inspiration. This belief thus suggested that participants valued the ability to use
language rather than simply focus on gaining higher scores of the GVT.
On the other hand, when incorporating LOA principles in GVT preparation, both
teachers and students experienced challenges. Teachers felt that the high-stakes nature
of the SHSEET made them reluctant to include interactive activities in classes; instead,
they spent time on drilling. In other words, teachers concerned and were mindful of
improving students’ language learning, but their teaching were constrained by the test.
It thus closely echoed the fact that high-stakes standardised EFL tests in China were
mainly viewed by teachers to be ‘the baton of teaching’ (Qi, 2004b). Given the high-
stakes nature of the SHSEET, it was not surprising to notice that teachers felt torn
between communicative language teaching (CLT) and improving students’ scores on
the test. In turn, this indicated that the high stakes of a test hindered the intention of
improving student language learning (i.e., the implementation of LOA principles in
this study) during test preparation (Qi, 2005). Moreover, participants’ consideration of
the efficient use of class time (Alderson & Hamp-Lyons, 1996; Yang, 2015) and the
differing student language proficiency levels challenged their intentions to follow
LOA principles in GVT preparation, which was common during the high-stakes test
preparation period. Likewise, teachers’ concern over the large class size showed their
difficulty in having LOA practices. Interestingly, in view of the large population of
students in China and generally large classes in schools, it was unsurprising that
teachers who had larger class sizes could have more workload than those who had
smaller classes. Against this backdrop, it was possible that class size influenced LOA
practices in classes, as teachers with fewer students were found to be more likely to
implement AfL in classes (Danielson, 2008).
Reconceptualising the LOA dynamic in the GVT context

Drawing conclusions from the current research findings and the literature, the
researcher would like to conclude RQ2 by re-envisaging LOA in the GVT context in
Figure 8.1. From the research findings, it was clear that test preparation of the GVT in
three observed classes rarely focused on learner-centred interactive classroom
activities, instead, they constituted mainly task-level activities and interactions.
Therefore, although Chapter Three proposed the possibility of drawing a new LOA
cycle for the GVT, it was hard to apply this in the current research context. The reasons

are two-fold: first, at the macro education system level, insufficient evidence for
curriculum use, curriculum interpretation, and SHSEET score interpretation were
derived from participants; second, at the micro classroom level, teachers and students
were mainly doing exercises which were at a task level in test preparation courses. All
these factors made the whole teaching and learning environment hard to engender
systematic learning-oriented activities. Nonetheless, due to teachers’ limited
knowledge of LOA, although there was no systematic as well as ecological LOA cycle
in GVT preparation courses, LOA practices from both Carless (2007) and Jones and
Saville (2016) could still be drawn upon and summarised. Therefore, LOA could be
visible in GVT preparation from four aspects: classroom interaction, student
involvement in assessment (self-assessment in particular), feedback, and learner
autonomy. For the latter three aspects, both in-class and outside class activities were
beneficial to students’ test preparation learning. In addition, learner autonomy was
prominently added and participants’ perceptions of LOA opportunities as well as
challenges should all be considered.
Figure 8.1. LOA dynamic in the GVT context
To conclude, with a unique feature of comprising both in-class and extra-

curricular LOA practices, this LOA model could specifically apply to the summative
testing of the GVT and particularly the SHSEET in China, which is a high-stakes
standardised test and has dual functions of graduation and enrolment for junior high
school graduates. Therefore, in order to bring about the synergy between summative
assessment and its related teaching as well as learning, this study suggests the

following principles to be considered during the test preparation period for summative
assessment:
1. Promoting classroom interactions, both content-centred and learner-centred

interactions;
2. Involving students in assessment, particularly self-assessment both in and

outside class;
3. Providing feedback which can feed forward both in and outside class;
4. Encouraging learner autonomy (i.e., motivating students to study by

themselves and actively engage in class activities) both in and outside class.
As this study identified, implementing those principles could be beneficial for

teachers and students to achieve a learning-oriented purpose in the test preparation
stage.

In this complex washback study, washback on teaching and learning in Grade 9
was explored, influential factors for GVT washback phenomenon were found, and the
potential for incorporating LOA principles in GVT preparation was explored.
Although further exploration of the relationship between perceptions of test
characteristics, affective factors, and LOA practices is necessary, this study primarily
focused on identifying possible LOA practices which could indicate a positive
washback. As this exploration of LOA practices in summative test preparation for
junior high school students in China was the first of its kind, the further relationship
within the model remains yet to be substantiated through future studies. The research
findings of all influential factors in connection to the washback phenomenon of the
GVT are summarised in Figure 8.2 (see next page).

Figure 8.2. Washback model of the GVT
8.2 CONTRIBUTIONS AND IMPLICATIONS
By presenting the washback results and developing a theoretical washback

model to deconstruct the complex washback phenomenon in a junior high school EFL
context, this study proposed a new washback model that applies to the GVT and the
SHSEET. This study has thus made contributions in theoretical, methodological, and
practical dimensions.
8.2.1 Theoretical contributions

Although many studies have investigated the positive and negative, intended and
unintended washback, no systematic positive or intended washback examinations of
teaching and learning have been proposed. The current study, which incorporates LOA
theories, fills this research gap by clarifying potential practices for summative
assessment such as the GVT to be learning-oriented in respect of teaching and learning,
particularly learning practices. Specifically, the study made four major contributions
to knowledge.

Initially, this study enriched the limited body of research of washback on
learning and learners, focusing on a junior high school level. In washback literature,
high-stakes standardised EFL exams and teachers are always at the centre and
washback studies on learners are sparse (Damankesh & Babaii, 2015; Pan, 2014; Xie
& Andrews, 2013). Besides, washback studies on high-stakes standardised EFL tests
in China mainly focus on tertiary level exams (see, for example, Jin, 2000; Pan, 2014;
Ren, 2011; Xiao, 2014) or the NMET (see, for example, Pan & Qian, 2017; Zeng,
2010; Zhi & Wang, 2019). Thus, this study is timely and significant in the washback
literature.
Moreover, this study not only contributes to the richness of washback literature,
but also offers an opportunity to investigate the relationship between the two major
washback dimensions of washback value and washback intensity. In other words, this
study not only presented results of washback value and washback intensity, but also
focused on deconstructing the washback mechanism in the current GVT context. It
conceptualised the potential relationship between test perceptions and test preparation
practices through the influence of affective factors. Although qualitative methods
could effectively identify the actual factors influencing the washback mechanism of
the GVT, they were unable to statistically present and explore the internal relationship
among all variables (Dong, 2020; Xie, 2015a). Thus, the current study is meaningful
in that it performed SEM to investigate the complex relationships between various
washback factors.
Further, the washback mechanism of the GVT in this study is systematic and
thorough compared with similar washback studies. Although a handful of washback
mechanism studies have been conducted, they mainly discuss washback mechanism
separately. For example, studies explored relationships between test preparation and
learning outcomes (Dong, 2020; Xie, 2013), test perceptions and test preparation
(Dong, 2020; Xie, 2015a), and motivation factors and test preparation (Xie, 2015a).
Even though studies have investigated the relationship between test perception and test
preparation or learning outcomes through the influential factors like expectancy-value
(Xie, 2013; Xie & Andrews, 2013), the current study took into account the more
complex elements of affective factors to complete the comprehensive washback
conceptualisation.

Essentially, the new model of washback which incorporates LOA offers
researchers a new perspective to explore and analyse the complex washback
phenomenon as well as LOA practices in a high-stakes standardised test context. The
use of LOA theory in language testing and assessment as well as the wider education
field is gaining prominence. That said, the use of LOA theory in this study offers
contributions to this body of knowledge since learner autonomy was identified as a
key LOA practice in GVT preparation. This finding differs from that of Jones and
Saville (2016) who focused on in-class teaching and learning. This study incorporated
LOA opportunities in an extra-curricular context and added new key components of
classroom interaction and learner autonomy to the LOA model proposed by Carless
(2007). Therefore, in the new washback model which combined Green (2007a),
Carless (2007), and Jones and Saville (2016), LOA practices are considered as key
evidence of positive washback, which indicates a new direction for exploring positive
washback.
8.2.2 Methodological contributions

As this study employed a sequential exploratory MMR design, it contributes to
the MMR washback research in the field of language testing and assessment in the
following aspects.
First, this study systematically combined thematic analyses and advanced

statistical modelling. To the best of the researcher’s knowledge, few washback studies
have focused on using both thematic analyses and SEM to investigate and test the
washback mechanism. In fact, MMR studies such as Qi (2005) have adopted both
qualitative and quantitative methods but mainly used descriptive quantitative analyses
to triangulate the washback findings. SEM has been used in some pure quantitative
studies (rather than in combination with qualitative examination) to investigate
structural relationships within washback mechanism (Dong, 2020; Xie, 2015a; Xie &
Andrews, 2013). The current MMR design first triangulated different modes of
research methods, namely, classroom observations, semi-structured interviews, focus
groups, and a student survey; and then combined thematic analyses and statistical
models such as SEM and CFA to explore complex washback phenomena. It can thus
be claimed that this MMR washback study is comprehensive, thorough, and innovative
in data triangulation and analyses.

The second methodological contribution is the researcher’s trial of quantifying
washback intensity through MCA and LOA through CFA by using the results from a
student survey. To the best of the researcher’s knowledge, no quantitative approach
such as MCA has been undertaken to quantify washback intensity of EFL tests. In fact,
MCA has been more widely used in other research fields such as marketing (Hoffman
& De Leeuw, 1992; Hoffman & Franke, 1986). Therefore, the application of MCA in
quantifying the washback intensity phenomena provided a new methodological view
to examine washback phenomena in relation to washback intensity.
Most significantly, another methodological contribution is that this study has

taken an MMR design to quantify LOA practices in a high-stakes standardised test
preparation period. In fact, LOA so far has been researched mainly in a qualitative
methodology paradigm and with formative assessment (see, for example, Carless,
2015; Hamp-Lyons & Green, 2014; May et al., 2020; Tsagari, 2014). Therefore, this
study, using CFA to quantify LOA practices in a summative test preparation stage,
provides a methodological reference to explore LOA practices in relation to the test
preparation of summative assessment. This contribution was also enabled through the
validated student washback survey. Thus, the design of the survey demonstrated the
successful application of qualitative findings to the quantitative survey design; and the
effort expended in survey design ensured its quality and potential for generalisability.
8.2.3 Implications for practice

This study has investigated the influence of the GVT on teaching and learning
and teachers’ as well as students’ concerns about applying LOA principles in GVT
preparation. To reconcile the tension between assessment, teaching, and learning, the
findings provide practical suggestions and implications for in-service teacher training,
Grade 9 teaching, Grade 9 learning, SHSEET test design, and ECSCE curriculum
development.
Suggestions for in-service teacher training

Teaching and Research System in China, a highly institutionalised, systematic
mechanism cascading from the national through to the local and then to the school
level in China, is responsible for supervising, evaluating, supporting, and studying
teaching practice through professional development training programs for in-service
teachers (Mu et al., 2018). In this study, teachers’ interview accounts, Hu and Zhang

in particular, showed that the Teaching and Research System and schools should
consider the provision of opportunities for teacher professional development if they
desire to implement LOA principles. Although teachers could comprehend the concept
of LOA and they were found to incorporate some LOA principles in teaching, their
lack of teaching experience and assessment literacy was perceived as one difficulty in
implementing LOA principles during test preparation. As such, test preparation
teaching failed to follow the learner-centred teaching and learning-oriented assessment
principles that were featured in the ECSCE.
Against this background, teachers’ academic capacity of teaching and

assessment should be improved through in-service teaching programs and education.
Possible teacher education practices can include: providing workshops on topics that
reflect the synergy between teaching, learning, and assessment, including LOA,
feeding forward, interactive classroom activities, and classroom-based assessment to
build teachers’ teaching and assessment literacy; introducing the systematic as well as
ecological LOA cycle to teachers and highlighting their existing LOA practices such
as having mentor-apprentice pairs to assist the areas of teaching that they can change;
instructing teachers, students, and test designers in a deeper understanding of
communicative competence and communicative language use; instructing teachers in
designing grammar and vocabulary assessment tasks that embody communicative
language features; and instructing teachers to teach grammar tasks by means of
different conscious-raising task types such as focused communication tasks (Nitta &
Gardner, 2005). All these practices aim to improve teachers’ subject knowledge and
professional development, which are argued to benefit students’ learning achievement
in the end (Hill et al., 2005).
Suggestions for Grade 9 teaching

Importantly, it is necessary for teachers to teach students how to reconcile the
tension between assessment and learning. For example, teachers can first develop
assessment literacy and then try to involve students in assessment and build their
feedback potential to accommodate the reality of large classes in China. Teachers
should involve students in assessment by supporting them to gain knowledge of
implementing self- and peer-assessment. Teachers could engage students in providing
peer-feedback rather than simply giving a great deal of teacher-initiated feedback. To

this end, a good example of the mentor-apprentice pair (School C) in this study can
shed light on this practice and should thus be expanded to a wider context.
Additionally, it is crucial for teachers to realise the importance of achieving

positive washback potential through teaching language-use rather than test-use
oriented learning strategies. This claim is made according to students’ survey findings.
As SEM results showed, students with higher SHSEET scores tended to be those who
used more language-use oriented learning strategies in GVT preparation. Therefore,
since teachers’ main goal for test preparation is to improve students’ test scores at the
end of junior high school study, teaching and informing students to use language-use
oriented learning strategies such as reading extensively to accumulate language
knowledge did not conflict with their intention of achieving higher learning outcomes.
As the Chinese saying goes, “teaching benefits teachers as well as students (教学相长
)”. Therefore, it should be noted that the alignment of teaching, learning, and
assessment can be achieved through a positive washback potential by having learning-
oriented teaching practices during test preparation.
Significantly, the study makes available the possibility of recognising teachers’

expertise and spaces for improvement and getting away from deficit views of teachers
and their teaching practices. This study suggests that teachers should be aware of LOA
opportunities that can be applied in both test preparation and non-test preparation
stages. In brief, four key principles of classroom interaction, involving students in
assessment, feedback, and learner autonomy should be considered in teaching
processes. As these findings have been positively verified in this MMR study, teachers
can then try to make use of these practices and impart this knowledge to students, since
learning-oriented teaching should be a joint effort made by both teachers and students.
For example, methods like using the target language of English as the medium of
teaching, creating target language use opportunities in real-life situations (Dong,
2020), having classroom interactions that foreground learners and learning, especially
involving learner-centred classroom activities (Jones & Saville, 2016), and fostering
learner autonomy through means such as self-assessment (Tassinari, 2012) could all
be applied to build teachers’ LOA capacity. In turn, these methods can further improve
students’ learning to achieve the synergy between teaching, learning, and assessment
in a more effective way.

Suggestions for Grade 9 learning
Being autonomous learners could benefit students and their language learning.
In other words, students could be encouraged and motivated to learn by themselves
outside class time. As Zhang highlighted in the class and the correlation results
reported in Chapter Seven, learner autonomy could be perceived as an advantage for
test preparation and learning, which seemed to correlate with higher SHSEET scores
as self-reported by students. In order to be more autonomous learners, students can
make use of various resources in self-study time such as practising MCQ written
dialogues and grammar knowledge in GVT tasks in daily communication or enhancing
self-assessment by simulating the assessment process when practising mock GVT
tests. These resources are all low-cost and relatively easy to access, which could build
students’ knowledge of grammar and vocabulary during test preparation. Moreover,
students will also benefit from activities like collaborative learning with peers, self-
reflecting their own studies like keeping a learning diary, establishing problem-solving
skills, and making effective use of self-study time.
Most importantly, affective factors, particularly intrinsic motivation and test

anxiety, are of concern during test preparation. According to SEM results of washback
mechanism, intrinsically motivated and being less anxious during GVT preparation
had positive and significant association with positive test preparation behaviours of
using language-use oriented strategies. Moreover, this finding speaks to what students
mentioned in focus groups. As a result, reducing anxious feelings and becoming more
intrinsically motivated in language learning are beneficial to students’ learning
outcomes in the test preparation stage. As “English learning is a long-term issue” (Fei-
SA), it is thus important for students to develop an intrinsic motivation for learning
English grammar and vocabulary. Although grammar and vocabulary learning might
be considered as tedious, making good use of interesting activities could help students
to build their intrinsic motivation. Possible activities include reading English stories
and novels, trying to use knowledge of English grammar and vocabulary in various
occasions like speaking contests, and trying to communicate in English with foreigners
online and/or in real life.
In sum, these suggestions, learner autonomy in particular, are practical advice

for students to build their capacities for life-long learners (Tassinari, 2012).

Suggestions for SHSEET test design
It could be argued that the inclusion of MCQ and Sentence Completion tasks for
testing grammar and vocabulary in the SHSEET do not reflect contemporary
understandings of best practice in assessing grammar and vocabulary. In this study,
both teachers and students confirmed that MCQs suited the learning stage of junior
high school students, motivated students’ test-taking experiences, assessed extensive
language knowledge, and most importantly, reflected the tradition of summative EFL
testing in China. However, they also clearly identified their negative perceptions of
MCQs as well as Sentence Completion and expressed their feelings that those two
tasks were not meaningful assessment items. By dint of the high stakes of the SHSEET,
it is thus necessary to seek more evidence regarding the inclusion of these
decontextualised and discrete-point grammar and vocabulary tasks with questionable
assessment qualities.
Further, as teachers and students suggested in interviews, the design of GVT test
items could be promoted from two main aspects. On the one hand, GVT tasks should
reflect real-life language and experiences and contain richer language context; that is,
test design should consider both text authenticity and task authenticity (Morrow,
1991). Thus, current affairs or local culture and authentic texts should be incorporated.
However, it is important to avoid unfamiliar topics that students cannot be expected to
have knowledge of. On the other hand, test methods of GVT tasks should be
reconsidered. According to participants’ positive perceptions of GVT design
characteristics and students’ perceived challenges of test methods in GVT tasks, more
passage-based grammar and vocabulary tasks such as Cloze and Gap-filling cloze
should be adopted to assess students’ language knowledge. Alternatively, assessing
students’ grammar and vocabulary through integrated writing and speaking tasks
rather than in a separate section of the test will be beneficial. To this end, students’
expectations of having more challenging grammar and vocabulary tasks to assess their
language competence could be met.
Moreover, based on the previous improvement of test tasks, the process of the
GVT test design could be advised to teachers and students. Test designers could
instruct schools and teachers on how to design learning-oriented assessment to reach
the purpose of improving learning. Likewise, instead of making the test design process
a secret, instructing classroom teachers and students how to design GVT tasks

themselves could also encourage their involvement in assessment. In this way, it can
offer more opportunities for classroom activities and tasks to be learning-oriented and
thus achieve the goal of “assessment tasks as learning tasks” (Carless, 2007).
Suggestions for ECSCE curriculum development

Regarding teachers’ concern about the difficulty in implementing more
communicative teaching and assessment methods in Grade 9, modifying and clarifying
the ECSCE is of significance. For example, curriculum developers could make use of
the LOA cycle (Jones & Saville, 2016) to provide classroom-based assessment
examples or incorporating key principles like feedback and involvement in assessment
(Carless, 2007) in the ECSCE. To this end, they can provide practical guidance to
teachers.
Furthermore, learning gaps between different education stages need to be

considered. Taking grammar as an example, teachers complained about the
disconnection between major stages like junior high school study, senior high school
study, and university study. According to them, even the same grammar structure
could be explained and used differently in each stage. Moreover, vocabulary lists
should not be strictly stipulated for the SHSEET since students with higher language
proficiency might be capable of learning more difficult words. Therefore, the use of
certain grammar and vocabulary that are beyond the test scope should not be
considered incorrect. To achieve this purpose, English curriculum developers at
different levels should work together to refine the curriculum, especially the language
knowledge stipulation. The assumption is that EFL learning at each stage should not
be considered as static, rather, they should be dynamic and display synergies between
stages.

To summarise section 8.2, this study has significance as it contributes
theoretically, methodologically, and practically to new knowledge. Theoretically, this
study fully presented the washback mechanism in a high-stakes standardised test
context at the junior high school level in China. In addition, it combined LOA theories
to offer empirical and theoretical evidence for positive washback. Methodologically,
this sequential exploratory MMR study not only presented robust data analyses, but
also validated a student washback survey which could collect complex washback

findings, test washback mechanism, and explore potential LOA practices during test
preparation. Finally, based on the research findings, practical implications were
provided for in-service teacher training, Grade 9 teaching, Grade 9 learning, SHSEET
test design, and ECSCE curriculum development.
8.3 REFLECTION
To better guide future MMR research projects, the researcher points out three
reflections that she engaged in after data collection and analysis.
The first reflection is about survey design. As reported in Chapter Four, the face
validity of the student survey was checked via different parties. It is essential to be
careful about the wording used in surveys. Moreover, applying theoretical assumptions
in a specific research context should be handled with care. For example, although peer
assessment was assumed to be a key element in students’ involvement in assessment
(Carless, 2007), it might not be applicable to the SHSEET context or to junior high
school students in China.
The second concern is with survey distribution. As reported in Chapter Four, an

unexpected issue occurred when distributing the online survey. To cope with this
incident, the researcher printed paper versions of the online survey according to
teachers’ suggestions. However, in the future research process, the researcher will
work as closely as possible with the school administration and the target participant
cohorts in order to gain professional advice to ensure a smooth data collection process.
Finally, translation skills should be considered when reporting data. As

qualitative data collection and data analysis were conducted in the participants’ mother
tongue, the researcher confronted translation difficulties when reporting the findings.
For example, considering the originality of data transcription, although some phrases
used by teachers and students (e.g., well-就是; then-然后) were meaningless and
redundant, they were kept in translated data. Another example is about the code-
switching within one sentence which might make readers confused. This was closely
related to the teaching method of translation (e.g., car, c-a-r, 车). In addition, to present
more integrated ideas in translation, the researcher frequently added contextual
information which was indicated by using parenthesis “[]”. Besides, the free shifts
between personal pronouns might confuse readers; therefore, additional information
was added by the researcher as “(i.e.)”. Most importantly, through direct translation,

the differences between plural and singular word forms in both Chinese and English
might be presented as grammatical errors in translation. To reflect, although the
researcher and the research team were capable of providing quality translation, it
would be ideal if professional translation skills were applied.
8.4 LIMITATIONS
Despite the aforementioned significant contributions, implications, and

reflections of this study, several limitations are acknowledged.
First, the inherent subjectivity of participants’ perceptions in this study is

acknowledged. Even though teachers agreed that the ECSCE and Test Specifications
shared similar learning-oriented characteristics, and the test design of the GVT reflects
learning-oriented principles, these are recognised as opinions rather than fact. It is
possible that some learning-oriented principles in the ECSCE have not necessarily
resulted in learning-oriented Test Specifications.
Second, the survey did not investigate teachers’ perceptions of implementing

LOA principles especially the challenges since it went beyond the current research
scope. In particular, the time for transcription, translation, and analysis of the
qualitative data was quite limited since the questionnaire distribution was intended to
be undertaken directly after the test administration. That said, this could by no means
weaken the answers to RQ2 as qualitative findings and quantitative LOA practices
provided rich information.
Third, two concerns were associated with statistical analyses of students’

learning outcomes (i.e., test performance). On the one hand, instead of using students’
test scores on the four grammar and vocabulary tasks that make up the GVT, their
composite SHSEET scores which were self-reported were applied in statistical
analyses. As the study focused on GVT washback, it would be better and more
meaningful to use specific GVT achievements in statistical analyses. However, it was
unrealistic since only overall SHSEET scores were reported to students. On the other
hand, the single use of self-reported SHSEET scores could be viewed with caution as
it could not predict students’ entire learning outcomes in junior high schools. In fact,
LOA also emphasises the value of formative assessment results in understanding
students’ achievements. Therefore, in order to achieve more generalisable results,

other academic performance, that is, students’ performance in formative assessment
should be considered.
The fourth limitation is about the statistical models structured in this study.
Primarily, although the SEM model of GVT washback mechanism was constructed on
the bases of theoretical conceptualisation (Bailey, 1996; Green, 2007a; Hughes, 1993;
Wolf & Smith, 1995), qualitative results, and empirical studies (Dong, 2020; Xie,
2015a), the LOA practices were not included in the complex SEM model of the GVT
washback mechanism. Moreover, discriminate validity which could help to make
distinctions between constructs was not tested in the CFA model of LOA practices,
but the researcher was aware of this.
8.5 FUTURE DIRECTIONS
This study has investigated the washback of the GVT on teaching and learning,
and the relationships between those washback factors. Most importantly, this study
explored the possibility of using LOA theorisation to explore positive washback and
negative washback opportunities and challenges. Based on the research findings, the
researcher proposes the following areas worth investigating in future studies:
Exploring the macro level of GVT washback value

Due to the limited evidence of understanding as well as using of test reference
documents by teachers and students, the researcher suggests two possibilities to be
considered to further explore the macro level of washback value. On the one hand, test
designers and curriculum developers were more suitable than teachers and students for
giving opinions on related curriculum reference ideas, test design intentions in
particular (Qi, 2004b). This can be supported by Zhang’s previous comment of the
Teaching and Research Officer’s (TRO) “well-thumbed” ECSCE since “the TRO
needs to use it [for test design and in-service teachers’ teaching guidance purposes]”
(Interview). Therefore, TROs and test designers can also be included to further
investigate “curriculum reference” and “test design” perceptions. On the other hand,
analysing candidates’ responses to the GVT and the content of GVT tasks will provide
direct evidence for test design characteristics. These two methods will thus provide
valuable evidence for the macro level of washback value conceptualised in both Green
(2007a) and the new washback model incorporating LOA.

Teacher surveys on washback and opportunities for and challenges of the
incorporation of LOA principles during test preparation
Due to the scope of this PhD project, only a student survey was conducted, but
empirically, designing and collecting data of a teacher survey is still necessary to gain
a more comprehensive understanding of the complexity of the GVT washback and
LOA possibilities. Therefore, future studies should be developed with a particular
focus on teachers’ perspectives. Additionally, a teacher survey regarding LOA
opportunities and challenges during high-stakes standardised test preparation can be
of high importance to explore the practical factors that influence a positive washback
intention. To this end, it will provide a wealth of evidence of LOA from a teacher
perspective and further probe the qualitative findings of this study.
Including other junior high school grades in similar washback and LOA
studies
As the current research was conducted in the last semester of Grade 9 when
students were about to sit for the test, it was thus unable to present the influence of
SHSEET on teaching and learning in a long term. As this study highlighted, LOA test
preparation practices such as feedback and washback effect could be longitudinal
(Yang et al., 2013), it is thus worth considering longitudinal washback and LOA
studies to further explore the potential for positive washback.
Action research studies on implementing LOA principles in summative test

preparation
According to the current research findings, it is important to help teachers
become familiar with learning-oriented teaching as well as assessment principles. In
order for teachers to be able to teach in a learning-oriented way during test preparation,
an action research study could provide advice for feeding forward and implementing
the LOA cycle (Jones & Saville, 2016) in various test preparation contexts. This will
be a valuable research method for teachers to implement and then reflect on the impact
of incorporating LOA principles (Zeng et al., 2018).
The mitigating effect of mock SHSEET tests on washback

The effect of multi-time tests such as Yi Zhen in this context is also important to
explore. According to School B students’ focus group accounts, the existence of this
major SHSEET mock test significantly influenced their test anxiety. For example, in
the NMET context, multiple mock tests are proved to influence test washback (Hou,

2018; Zhang, 2019), and teachers as well as students held both positive and negative
attitudes towards this reform (Chen et al., 2018; Zhang et al., 2018). As such, the effect
of those mock SHSEET tests such as Yi Zhen is worth exploring in future studies.
Exploring the reasons behind Pattern 4 in MCA results

As reported in Chapter Six, four patterns of GVT preparation were identified in
the current GVT study, but Pattern 4 remained largely unexplained from the current
dataset. As the washback literature conceptualised and the other three patterns
presented, participants’ test preparation effort was closely related to the perceptions of
test importance and test difficulty. However, Pattern 4 fell beyond the theoretical
conceptualisation and appeared to suggest that other factors should be considered
when examining washback intensity. Therefore, to explain Pattern 4, future studies
such as follow-up interviews with the survey participants (i.e., the survey cases of
Pattern 4) will be helpful.
Using MCA for exploring both in-class and extracurricular test preparation
Even though Pattern 4 in MCA results needs further investigations, the MCA
results in Chapter Six highlighted the feasibility of applying this quantitative method
in the analysis of washback intensity of tests. Therefore, future studies can consider
using MCA to quantify both in-class and extra-curricular test preparation of the
SHSEET or any other high-stakes standardised EFL tests to explore rich washback
intensity patterns.
Investigating the washback of the SHSEET as a whole

As pointed out in the limitations, both teachers and students tended to express
their opinions on and practices of the SHSEET as a whole rather than separating
grammar and vocabulary test preparation. With this concern in mind, the researcher
thus suggests washback research on the entire SHSEET, which will certainly provide
richer and more practical insights into its washback on teaching and learning.
In section 8.5, the researcher proposes potential future research topics on

washback and LOA from different test stakeholder groups. In sum, the application of
this washback scale, the systematic washback mechanism, and the dynamic of LOA
test preparation pertaining to other high-stakes standardised tests need further
exploration.

8.6 OVERALL CONCLUSIONS OF THE STUDY
This is a washback study which explored the influence of the grammar and
vocabulary testing in the SHSEET on EFL teaching and learning in junior high schools
in China. Considering the high-stakes test feature and the under-researched topic of
the SHSEET, particularly the GVT section, the researcher aimed to reveal the actual
effect of the GVT through an LOA perspective. This study, applying an exploratory
sequential MMR design, used qualitative data to inform the quantitative design of a
student survey. Green’s (2007a) washback model, Carless’ (2007) LOA
conceptualisation, and Jones and Saville’s (2016) LOA cycle helped to frame the
theoretical foundation for this study.
Generally, both qualitative and quantitative data pointed to the complexity of

washback in the GVT context. At the macro level of washback value, the learner-
centred EFL teaching and assessment principles as stipulated in the ECSCE failed to
achieve its intended washback. Regardless of teachers’ effort of keeping learner-
centred teaching in mind, generally negative washback was identified as Grade 9
teachers failed or decreased to implement teaching and assessment principles
stipulated in the ECSCE, and participants’ use of official test reference documents
reflected their “narrowing of the curriculum” during test preparation. At the micro
level of washback value, it was found that the GVT exerted both positive and negative
washback on teaching and learning in various aspects. In general, both similarities and
differences of washback value among participating teachers and students were
identified. However, even though ‘teaching to the test’ and ‘learning to the test’ were
found to negatively influence grammar and vocabulary study to a great extent, similar
to many other studies and test contexts, GVT preparation practices such as the various
language-use oriented learning strategies taught and applied during test preparation
provides evidence for positive washback.
Complex washback evidence was also supported by findings of washback

intensity and washback mechanism. As for washback intensity, qualitative data
revealed that intense test preparation of grammar and vocabulary was conducted in
classes; while qualitative and quantitative results showed that extra-curricular test
preparation of the GVT differed according to different GVT tasks and students’
varying perceptions of test difficulty and test importance. In further investigation of
washback mechanism, it was found that students’ perceptions of GVT design

characteristics that reflected language use features and their perceptions of test use
purposes positively influenced their test preparation practices of language-use oriented
learning strategies and test preparation effort through factors of intrinsic motivation
and decreased test anxiety. Moreover, both test preparation practices further displayed
associations with students’ test performance. However, while the former showed a
positive washback, the latter indicated a negative washback of the GVT on students’
grammar and vocabulary learning.
In terms of the LOA opportunities and challenges, various perceptions and

practises were drawn from both qualitative and quantitative data. In exploring the
opportunities for incorporating LOA principles in GVT preparation, potential practices
were found in the qualitative stage and tested in the quantitative survey. However, as
GVT preparation courses were generally teacher-dominated, the study failed to draw
detailed evidence of LOA by referring to the LOA cycle (Jones & Saville, 2016).
Nonetheless, participants had different beliefs about LOA opportunities and they
carried out identifiable LOA strategies of which a unique and dynamic LOA model in
the summative assessment context in Chinese junior high schools was verified through
CFA, which constituted four factors of classroom interaction, involvement in
assessment, learner autonomy, and feedback. However, challenges like efficient use
of class time and student language proficiency were perceived by participants as
hindering their incorporation of LOA principles in GVT preparation.
To conclude, it was evident in the study that the GVT influenced the grammar
and vocabulary teaching and learning in Grade 9 in both positive and negative
directions. However, it was hard to judge whether positive washback of the GVT
outweighed its negative washback or vice versa. The researcher would like to re-iterate
and conclude the study by reminding the readers of the aim of this study and the
immediate call for bringing about positive washback (Bailey, 1996) through testing
and the synergy between instruction, testing, and learning (Jones & Saville, 2016;
Turner & Purpura, 2016). Therefore, as the study aimed to unpack potentials for
positive washback or LOA opportunities, it thus offers valuable insights into
reconciling the tension between assessment, teaching, and learning.

Bibliography
Ablard, K. E., & Lipschultz, R. E. (1998). Self-regulated learning in high-achieving

students: Relations to advanced reasoning, achievement goals, and gender.
Journal of Educational Psychology, 90(1), 94-101.
Adamson, B. (2004). China’s English: A history of English in Chinese education.
Hong Kong University Press.
Al-Wadi, H. M. (2020). Bahrain's secondary EFL teachers’ beliefs of English
language national examination: ‘How it made teaching different?’. International
Journal of Instruction, 13(1).
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of
washback. Language Testing, 13(3), 280-297.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied linguistics, 14(2),
115-129.
Alemi, M., & Miraghaee, A. (2011). A comparative study of testing grammar
knowledge of Iranian students between cloze and multiple-choice tests. Journal
on English Language Teaching, 1(3), 72-79.
Ali, M. M., & Hamid, M. O. (2020). Teaching English to the test: Why does negative
washback exist within secondary education in Bangladesh? Language
Assessment Quarterly, 17(2), 129-146.
Aljaafreh, A., & Lantolf, J. P. (1994). Negative feedback as regulation and second
language learning in the zone of proximal development. The Modern Language
Journal, 78(4), 465-483.
Anderson, N. J. (1991). Individual differences in strategy use in second language
reading and testing. The Modern Language Journal, 75(4), 460-472.
Andrews, S., Fullilove, J., & Wong, Y. (2002). Targeting washback—a case-study.
System, 30(2), 207.
Argüelles Álvarez, I. (2013). Large-scale assessment of language proficiency:
Theoretical and pedagogical reflections on the use of multiple-choice tests.
International Journal of English Studies, 13(2), 21-38.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing
and developing useful language tests (Vol. 1). Oxford University Press.
Bailey, K. M. (1996). Working for washback: A review of the washback concept in
language testing. Language Testing, 13(3), 257-279.
Bailey, K. M. (1999). Washback in language testing (TOEFL Monograph 15).
Educational Testing Service.
Baker, T. L. (1994). Doing social research (2nd ed.). McGraw-Hill.
Bibliography 287
Ballou, D., & Springer, M. G. (2015). Using student test scores to measure teacher
performance: Some problems in the design and implementation of evaluation
systems. Educational Researcher, 44(2), 77-86.
Barksdale-Ladd, M. A., & Thomas, K. F. (2000). What’s at stake in high-stakes
testing. Journal of Teacher Education, 51(5), 384.
Bell, C., & Harris, D. (2013). Evaluating and assessing for learning (Revised ed.).
Routledge.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological
bulletin, 107(2), 238-246.
Bentler, P. M., & Chou, C.-P. (1987). Practical issues in structural modeling.
Sociological Methods & Research, 16(1), 78-117.
Berwick, R., & Ross, S. (1989). Motivation after matriculation: Are Japanese
learners of English still alive after exam hell? JALT Journal, 11(2), 193-210.
Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task
types on the TOEFL iBT® test: A lexico-grammatical analysis (TOEFL iBT
Research Report-19). Educational Testing Service.
Birjandi, P., & Siyyari, M. (2010). Self-assessment and peer-assessment: A
comparative study of their effect on writing performance and rating accuracy.
Iranian Journal of Applied Linguistics, 13(1), 23-45.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in
Education: Principles, Policy & Practice, 5(1), 7-74.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment.
Educational Assessment, Evaluation and Accountability (formerly: Journal of
Personnel Evaluation in Education), 21(1), 5-31.
Booth, D., & Saville, N. (2000). Development of new item-based tests: The gapped
sentences in the revised CPE paper 3. Research Notes, 2, 10-11.
Bousfield, K., & Ragusa, A. T. (2014). A sociological analysis of Australia's
NAPLAN and My School Senate Inquiry submissions: The adultification of
childhood? Critical Studies in Education, 55(2), 170-185.
Boyatzis, R. E. (1998). Transforming qualitative information: Thematic analysis and
code development. Sage Publications.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative
Research in Psychology, 3(2), 77-101.
Braun, V., Clarke, V., Hayfield, N., & Terry, G. (2019). Thematic analysis. In P.
Liamputtong (Ed.), Handbook of Research Methods in Health Social Sciences
(1st ed., pp. 843-860). Springer Singapore.
Brown, J. D. (2000). University entrance examinations: Strategies for creating
positive washback on English language teaching in Japan. Shiken: JALT Testing
& Evaluation SIG Newsletter, 3(2), 2-7.
288 Bibliography
Brown, T. A. (2006). Confirmatory factor analysis for applied research. Guilford
Publications.
Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.).
Guilford Publications.
Buck, G. (1988). Testing listening comprehension in Japanese university entrance
examinations. JALT Journal, 10(1), 15-42.
Burrows, C. (2004). Washback in classroom-based assessment: A study of the
washback effect in the Australian adult migrant English program. In L. Cheng,
Y. Watanabe, & A. Curtis (Eds.), Washback in language testing: Research
contexts and methods (pp. 113-128). Lawrence Erlbaum Associates.
Canale, M. (1983a). From communicative competence to communicative language
pedagogy. In J. C. Richards & R. W. Schmidet (Eds.), Language and
communication (Vol. 1, pp. 2-27). Longman.
Canale, M. (1983b). On some dimensions of language proficiency. In J. W. Oller
(Ed.), Issues in language testing research (pp. 333-342). Newbury House.
Canale, M., & Swain, M. (1980). Theoretical bases of com-municative approaches to
second language teaching and testing. Applied linguistics, 1(1), 1-47.
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical
implications. Innovations in Education and Teaching International, 44(1), 57-
66.
Carless, D. (2015). Exploring learning-oriented assessment processes. Higher
Education, 69(6), 963-976.
Carless, D., Joughin, G., & Mok, M. M. C. (2006). Learning-oriented assessment:
Principles and practice. Assessment & Evaluation in Higher Education, 31(4),
395-398.
Carroll, B. J. (1980). Testing communicative performance: An interim study.
Pergamon Press.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate
Behavioral Research, 1(2), 245-276.
Cazden, C. B. (2001). Classroom discourse: The language of teaching and learning
(2nd ed.). Heinemann.
Celce-Murcia, M. (2007). Towards more context and discourse in grammar
instruction. TESL-EJ, 11(2), 1-6.
Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book: An ESL/EFL
teacher's course (2nd ed.). Heinle/Cengage Learning.
Chapman, D. W., & Snyder, C. W. (2000). Can high stakes national testing improve
instruction: Reexamining conventional wisdom. International Journal of
Educational Development, 20(6), 457-474.
Chen, G. (2007). An oral English exam exercise eystem: Research and design.
Modern Educational Technology, 17(08), 68-71, 78.
Bibliography 289
Chen, Y., Cai, J., & Hu, L. (2018). The washback effect of the new model of foreign
language examination in NCEE. Foreign Language Research, 1, 79-85.
Cheng, L. (1997). How does washback influence teaching? Implications for Hong
Kong. Language and Education, 11(1), 38-54.
Cheng, L. (1998). Impact of a public english examination change on students’
perceptions and attitudes toward their English learning. Studies in Educational
Evaluation, 24(3), 279-301.
Cheng, L. (1999). Changing assessment: Washback on teacher perceptions and
actions. Teaching and Teacher Education, 15(3), 253-271.
Cheng, L. (2005). Changing language teaching through language testing: A
washback study. Cambridge University Press.
Cheng, L. (2008a). The key to success: English language testing in China. Language
Testing, 25(1), 15-37.
Cheng, L. (2008b). Washback, impact and consequences. In E. Shohamy & N. H.
Hornberger (Eds.), Encyclopedia of language education. Volume 7: Language
testing and assessment (2nd ed., pp. 349–364). Springer Science and Business
Media LLC.
Cheng, L., Andrews, S., & Yu, Y. (2011). Impact and consequences of school-based
assessment (SBA): Students’ and parents’ views of SBA in Hong Kong.
Language Testing, 28(2), 221-249.
Cheng, L., & Curtis, A. (2004). Washback or backwash: A review of the impact of
testing on teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.),
Washback in language testing: Research contexts and methods (pp. 3-17).
Lawrence Erlbaum Associates, Inc.
Cheng, L., & Curtis, A. (2010). English language assessment and the Chinese
learner. Routledge.
Chongqing Municipal People’s Government Network. (2018). 305,000 students in
our city have over 60% of today’s high school entrance examination (woshi
305,000 xuesheng jinri zhongkao chao liucheng kaosheng keshang putong
gaozhong). Retrieved January 12 from http://jw.cq.gov.cn/Item/29940.aspx
Chongqing Municipal People’s Government. (2015). The introduction of Chongqing
(chongqingshi jianjie). Retrieved December 12 from
http://www.cq.gov.cn/cqgk/82835.shtml
Chongqing Zhongkao. (2017). Interpreting the SHSEE policy of associated areas
and unassociated areas in Chongqing (2018 nian chongqing zhongkao zhengce
jiedu zhi lianzhaoqu yu fei lianzhaoqu). Retrieved December 21 from
http://cq.zhongkao.com/e/20170814/59914e842191d.shtml
Chou, M.-H. (2019). The impact of the English listening test in the high-stakes
national entrance examination on junior high school students and teachers.
International Journal of Listening, 1-19.
290 Bibliography
Chrzanowska, J. (2002). Interviewing groups and individuals in qualitative market
research. Sage.
Clarke, V., & Braun, V. (2017). Thematic analysis. The Journal of Positive
Psychology, 12(3), 297-298.
Cohen, A. D. (2013). Using test-wiseness strategy research in task development. In
A. J. Kunnan (Ed.), The companion to language assessment (pp. 893–905).
Wiley/Blackwell.
Collins, L., Halter, R. H., Lightbown, P. M., & Spada, N. (1999). Time and the
distribution of time in L2 instruction. 33(4), 655-680.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). L.
Erlbaum Associates.
Corno, L., & Mandinach, E. B. (1983). The role of cognitive engagement in
classroom learning and motivation. Educational psychologist, 18(2), 88-108.
Creswell, J. W. (2011). Controversies in mixed methods research. In N. K. Denzin &
Y. S. Lincoln (Eds.), The Sage handbook of qualitative research (Vol. 4, pp.
269-283).
Creswell, J. W. (2013). Qualitative inquiry and research design: Choosing among
five approaches (Third ed.). SAGE Publications.
Creswell, J. W. (2015). Educational research: Planning, conducting, and evaluating
quantitative and qualitative research (5th ed.). Pearson Education Inc.
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed
methods research (2nd ed.). SAGE Publications.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests.
Psychometrika, 16(3), 297-334.
Cui, D. (2006). The principles and trends of proposition in Senior High School
Entrance English Test under the New English Curriculum. Journal of Shanxi
Normal University (Philosophy and Social Sciences Edition), 35(S1), 374-376.
Dam, L. (1995). Learner autonomy 3: From theory to classroom practice. Authentik.
Damankesh, M., & Babaii, E. (2015). The washback effect of Iranian high school
final examinations on students’ test-taking and test-preparation strategies.
Studies in Educational Evaluation, 45, 62-69.
Danielson, C. (2008). Assessment for learning: For teachers as well as students. In C.
Dwyer (Ed.), The Future of Assessment (Vol. Routledge, pp. 191-213).
Routledge.
Dávid, G. (2007). Investigating the performance of alternative types of grammar
items. Language Testing, 24(1), 65-97.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in
human behavior. Plenum.
Bibliography 291
Deci, E. L., & Ryan, R. M. (2010). Intrinsic motivation. In I. B. Weiner & W. E.
Craighead (Eds.), The Corsini encyclopedia of psychology (pp. 1-2).
Delett, J. S., Barnhardt, S., & Kevorkian, J. A. (2001). A framework for Portfolio
Assessment in the foreign language classroom. 34(6), 559-568.
Dello-Iacovo, B. (2009). Curriculum reform and ‘Quality Education’ in China: An
overview. International Journal of Educational Development, 29(3), 241-249
Deng, D. (2007). An exploration of the relationship between learner autonomy and
English proficiency. Asian EFL Journal, 24(4), 24-34.
Deng, Y. (2018). A study on the washback effects of the Junior Secondary English
Achievement Graduation Test of Zhangjiajie Prefecture [Master’s thesis, Hunan
Normal University]. Changsha, Hunan.
Ding, R. (2014). Validity study on the cloze test of Shanxi Senior High School
English Entrance Exam from 2009 to 2013 [Master’s thesis, Northwest
University]. Xi’an, Shanxi.
Docherty, C. (2015). Revising the use of English component in FCE and CAE.
Research Notes, 62, 15-20.
Doe, C., & Fox, J. (2011). Exploring the Testing Process: Three Test Takers’
Observed and Reported Strategy Use over Time and Testing Contexts.
Canadian Modern Language Review, 67(1), 29-54.
Dong, M. (2020). Structural relationship between learners’ perceptions of a test,
learning practices, and learning outcomes: A study on the washback mechanism
of a high-stakes test. Studies in Educational Evaluation, 64, 100824.
Education History Research Group in Teaching Materials Research Institute. (2008).
The development of English curriculum (syllabus) for Chinese primary and
secondary schools in China in 20th century. Retrieved September 22 from
http://old.pep.com.cn/dy_1/
Ellis, R. (2002). The place of grammar instruction in the second/foreign curriculum.
In E. Hinkel & S. Fotos (Eds.), New perspectives on grammar teaching in
second language classrooms (pp. 17-34). Erlbaum.
Ellis, R. (2006). Current issues in the teaching of grammar: An SLA perspective.
TESOL Quarterly, 40(1), 83-107.
Erfani, S. S. (2012). A comparative washback study of IELTS and TOEFL iBT on
teaching and learning activities in preparation courses in the Iranian context.
English Language Teaching, 5(8), 185-195.
Everhard, C. J. (2015). The assessment-autonomy relationship. In C. J. Everhard &
L. Murphy (Eds.), Assessment and autonomy in language learning (pp. 8-34).
Palgrave Macmillan.
Fan, J., & Ji, P. (2014). Test candidates’ attitudes and their test performance: The
case of the Fudan English Test. University of Sydney Papers in TESOL, 9, 1-35.
292 Bibliography
Fan, Y. (2015). The globalization and localization of English from the perspective of
English as a lingua franca and implications for “China English” and English
language education in China. Contemporary Foreign Languages Studies, 6, 29-
33.
Ferman, I. (2004). The washback of an EFL national oral matriculation test to
teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback
in language testing: Research contexts and methods (pp. 191-210). Lawrence
Erlbaum Associates, Inc.
Field, A. P. (2009). Discovering statistics using SPSS (3rd ed.). SAGE Publications.
Fink, A. (2009). How to conduct surveys: A step-by-step guide (4th ed.). SAGE.
Fotos, S. (1994). Integrating grammar instruction and communicative language use
through grammar consciousness-raising tasks. TESOL Quarterly, 28(2), 323-
351.
Fotos, S., & Ellis, R. (1991). Communicating about grammar: A task-based
approach. TESOL Quarterly, 25(4), 605-628.
Franklin, C., & Ballan, M. (2001). Reliability and validity in qualitative research. In
B. A. Thyer (Ed.), The handbook of social work research methods. SAGE
Publications, Inc.
Gan, Z. (2009). IELTS preparation course and student IELTS performance: A case
study in Hong Kong. RELC Journal, 40(1), 23-41.
Gardner, R. C. (1985). Social psychology and second language learning: The role of
attitudes and motivation. Edward Arnold.
Gardner, R. C., Lalonde, R. N., & Moorcroft, R. (1985). The role of attitudes and
motivation in second language learning: Correlational and experimental
considerations. Language Learning, 35(2), 207-227.
Gardner, R. C., & Lambert, W. E. (1972). Attitudes and motivation in second-
language learning. Newbury House Pubs.
Gates, S. (1995). Exploiting washback from standardized tests. In J. D. Brown & S.
O. Yamashata (Eds.), Language testing in Japan (pp. 107-112). Japan
Association for Language Teaching.
Geng, Y. (2013). A study of the washback of JSEAGT listening on secondary school
EFL teaching. [Master’s thesis, Ludong University]. Yantai, Shandong.
Geranpayeh, A. (2007). Using structural equation modelling to facilitate the revision
of high stakes testing: The case of CAE. Research Notes, 30, 8–12.
Goffman, E. (1971). The presentation of self in everyday life. Penguin.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world.
Annual review of psychology, 60, 549-576.
Green, A. (2006a). Washback to the learner: Learner and teacher perspectives on
IELTS preparation course expectations and outcomes. Assessing Writing, 11(2),
113-134.
Bibliography 293
Green, A. (2006b). Watching for washback: Observing the influence of the
International English Language Testing System academic writing test in the
classroom. Language Assessment Quarterly, 3(4), 333-368.
Green, A. (2007a). IELTS washback in context: Preparation for academic writing in
higher education. Cambridge University Press.
Green, A. (2007b). Washback to learning outcomes: A comparative study of IELTS
preparation and university pre-sessional language courses. Assessment in
Education: Principles, Policy & Practice, 14(1), 75-97.
Green, A. (2013). Washback in language assessment. International Journal of
English Studies, 13(2), 39-51.
Green, A. (2014). The Test of English for Academic Purposes (TEAP) impact study:
Report 1 - preliminary questionnaires to Japanese high school students and
teachers. Eiken Foundation of Japan.
Greenacre, M. J. (1991). Interpreting multiple correspondence analysis. Applied
Stochastic Models and Data Analysis, 7(2), 195-210.
Greenacre, M. J. (2017). Correspondence analysis in practice (3 ed.). Chapman and
Hall/CRC.
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual
framework for mixed-method evaluation designs. Educational Evaluation and
Policy Analysis, 11(3), 255-274.
Gu, X., & Saville, N. (2012). Impact of Cambridge English: Key for Schools and
Preliminary for Schools–parents’ perspectives in China. Research Notes, 50, 48-
56.
Gu, X., Zhang, Z., & Liu, X. (2014). An empirical study of the innovated CET
washback on students’ extra-curricular learning process based on students’
learning diaries. Journal of PLA University of Foreign Languages, 35(5), 32-39,
159.
Gu, Y. (2012). English curriculum and assessment for basic education in China. In J.
Ruan & C. Leung (Eds.), Perspectives on teaching and learning English literacy
in China (Vol. 3, pp. 35-50). Springer Netherlands.
Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An
experiment with data saturation and variability. Field methods, 18(1), 59-82.
Guo, S., Guo, Y., Luke, A., Dooley, K., & Mu, G. M. (2019). Market economy,
social change, and educational inequality: Notes for a critical sociology of
Chinese education. In G. M. Mu, K. Dooley, & A. Luke (Eds.), Bourdieu and
Chinese education: Inequality, competition, and change (pp. 20-44). Routledge.
Gyllstad, H., Vilkaitė, L., & Schmitt, N. (2015). Assessing vocabulary size through
multiple-choice formats: Issues with guessing and sampling rates. ITL-
International Journal of Applied Linguistics, 166(2), 278-306.
Haertel, E. (1992). Performance assessment. In M. C. Alkin (Ed.), Encyclopedia of
educational research (6th ed., pp. 984-989). Macmillan.
294 Bibliography
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data
analysis with readings (3rd ed.). Prentice Hall.
Hair, J. F., Black, B. J., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006).
Multivariate data analysis (6th ed.). Prentice Hall.
Halai, N. (2007). Making use of bilingual interview data: Some experiences from the
field. Qualitative Report, 12(3), 344.
Halcomb, E. J., & Davidson, P. M. (2006). Is verbatim transcription of interview data
always necessary? Applied Nursing Research, 19(1), 38-42.
Hall, J. K., & Walsh, M. (2002). 10. Teacher-student interaction and language
learning. Annual Review of Applied Linguistics, 22, 186-203.
Halleck, G. B. (1992). The oral proficiency interview: Discrete point test or a
measure of communicative language ability? Foreign Language Annals, 25(3),
227-231.
Halliday, M. A. K. (2004). An introduction to functional grammar (3rd ed.). Arnold.
Hamp-Lyons, L. (1997). Washback, impact and validity: Ethical concerns. Language
Testing, 14(3), 295-303.
Hamp-Lyons, L., & Green, T. (2014, October). Applying a concept model of
learning-oriented language assessment to a large-scale speaking test.
Presentation at the Roundtable on Learning-Oriented Assessment in Language
Classrooms and Large-Scale Contexts, Teachers College, Columbia University,
New York.
Harlen, W., & Deakin-Crick, R. (2003). Testing and motivation for learning.
Assessment in Education: Principles, Policy & Practice, 10(2), 169-207.
Harrington, D. (2009). Confirmatory factor analysis. Oxford University Press.
Harrington, M. (2018). Measuring lexical facility: The timed yes/no test. In Lexical
facility: Size, recognition speed and consistency as dimensions of second
language vocabulary knowledge (pp. 95-119). Palgrave Macmillan UK.
Harrison, J. (2015). The English grammar profile. In J. Harrison & F. Barker (Eds.),
English profile in practice (Vol. 5, pp. 28-48). Cambridge University Press.
Hasselgreen, A., Drew, I., & Sørheim, B. (2012). Understanding the language
classroom. Fagbokforlaget, Bergen.
Hatch, J. A. (2002). Doing qualitative research in education settings. State
University of New York Press.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational
Research, 77(1), 81-112.
Hawkey, R. (2006). Impact theory and practice: Studies of the IELTS test and
Progetto Liugue 2000. Cambridge University Press.
Bibliography 295
He, L. (2010). The Graduate School Entrance English Examination. In L. Cheng &
A. Curtis (Eds.), English language assessment and the Chinese learner (pp. 145-
157). Routledge.
He, Q. (2001). English language education in China. In S. J. Baker (Ed.), Language
policy: Lessons from global models ( 1st ed., pp. 225-231). Monterey Institute of
International Studies.
He, Y. (2015). On the usefulness of Test of English for Xiamen Senior High School
Entrance [Master’s thesis, Minnan Normal University]. Zhangzhou, Fujian.
Henning, G. (1991). A study of the effects of contextualization and familiarization on
responses to the TOEFL vocabulary test items. Educational Testing Service.
Hill, H. C., Rowan, B., & Ball, D. L. (2005). Effects of teachers’ mathematical
knowledge for teaching on student achievement. 42(2), 371-406.
Ho, R. (2006). Handbook of univariate and multivariate data analysis and
interpretation with SPSS. Chapman & Hall/CRC.
Hoffman, D. L., & De Leeuw, J. (1992). Interpreting multiple correspondence
analysis as a multidimensional scaling method. Marketing Letters, 3(3), 259-
272.
Hoffman, D. L., & Franke, G. R. (1986). Correspondence analysis: graphical
representation of categorical data in marketing research. Journal of Marketing
Research, 23(3), 213-227.
Holec, H. (1981). Autonomy and foreign language learning. Pergamon.
Hou, Y. (2018). A study on the washback effect of the reform of SHMET Listening
and Speaking Test. Technology Enhanced Foreign Language Education(5), 23-
29.
Hu, G. (2005a). English language education in China: Policies, progress, and
problems. Language Policy, 4(1), 5-24.
Hu, G. (2005b). Professional development of secondary EFL teachers: Lessons from
China. Teachers College Record, 107(4), 654-705.
Hu, L-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance
structure analysis: Conventional criteria versus new alternatives. Structural
Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.
Hu, Y. (2015). A study of washback effects of JSEAGT writing test on junior English
writing teaching and learning. [Master’s thesis, Gannan Normal University].
Ganzhou, Jiangxi.
Huang, J. (2006). Understanding factors that influence Chinese English teachers’
decision to implement communicative activities in teaching. Journal of Asia
TEFL, 3(4), 165-191.
Hughes, A. (1989). Testing for language teachers. Cambridge University Press.
Hughes, A. (1993). Backwash and TOEFL 2000 [Unpublished manuscript].
University of Reading.
296 Bibliography
Hung, S.-T. A. (2012). A washback study on e-portfolio assessment in an English as
a Foreign Language teacher preparation program. Computer Assisted Language
Learning, 25(1), 21-36.
Iraji, H. R., Enayat, M. J., & Momeni, M. (2016). The effects of self-and peer-
assessment on Iranian EFL learners’ argumentative writing performance. Theory
and Practice in Language Studies, 6(4), 716-722.
Jaeyoung, C., Kaysi Eastlick, K., Judy, M., & Daniel, W. L. L. (2012).
Understanding the language, the culture, and the experience: Translation in
cross-cultural research. International Journal of Qualitative Methods, 11(5),
652-665.
Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (2000). TOEFL 2000
framework: A working paper (TOEFL Monograph Series MS-16). Educational
Testing Service.
Jiang, Y. (2003). English as a Chinese language. English Today, 19(2), 3-8.
Jin, Y. (2000). The washback effects of College English Test-Spoken English Test
on teaching. Foreign Language World, 118(2), 56-61.
Jin, Y. (2017). Construct and content in context: Implications for language learning,
teaching and assessment in China. Language Testing in Asia, 7(1), 12.
Jin, Y., & Cheng, L. (2013). The effects of psychological factors on the validity of
high-stakes tests. Modern Foreign Languages, 36(1), 62-69.
Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate correspondence
analysis (6th ed.). Prentice-Hall.
Johnson, R. B., & Christensen, L. B. (2012). Educational research: Quantitative,
qualitative, and mixed approaches (4th ed.). SAGE Publications.
Johnson, R. B., Onwuegbuzie, A. J., & Turner, L. A. (2007). Toward a definition of
mixed methods research. Journal of Mixed Methods Research, 1(2), 112-133.
Jones, N., & Saville, N. (2016). Learning oriented assessment: A systemic approach
(Vol. 45). Cambridge University Press.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A guide to the program and
applications (2nd ed.). Spss Inc.
Joughin, G. (2005, 4-5 November). Learning oriented assessment: A conceptual
framework. Conference proceedings, Effective Learning and Teaching
Conference, Brisbane, Brisbane.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis.
Educational and psychological measurement, 20(1), 141-151.
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36.
Kamberelis, G., & Dimitriadis, G. (2013). Focus groups: From structured interviews
to collective conversations. Routledge.
Bibliography 297
Khaniya, T. (1990). The washback effect of a textbook-based test. Edinburgh
Working Papers in Applied Linguistics, 1, 48-58.
Krashen, S. D. (2003). Explorations in language acquisition and use. Heinemann.
Kuzel, A. J. (1992). Sampling in qualitative inquiry. In B. F. Crabtree & W. L. Miller
(Eds.), Research methods for primary care, Vol. 3. Doing qualitative research
(pp. 31-44). Sage Publications.
Lamb, T. (2010). Assessment of autonomy or assessment for autonomy? Evaluating
learner autonomy for formative purposes. In A. Paran & L. Sercu (Eds.), Testing
the untestable in language education (pp. 98-119). Multilingual Matters.
Larsen-Freeman, D. (2003). Teaching language : From grammar to grammaring.
Thomson/Heinle.
Latham, H. (1886). On the action of examinations considered as a means of
selection. Deighton, Bell.
Lau, K. (2018). To be or not to be: Understanding university academic English
teachers’ perceptions of assessing self-directed learning. Innovations in
Education and Teaching International, 55(2), 201-211.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need
both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Lemke, J. L. (1990). Talking science: Language, learning, and values. Ablex
Publishing Company.
Li, F. (2017). A study on the backwash effect of JSGT reading comprehension tests
on middle school English teaching and learning [Master’s thesis, Chongqing
Normal University]. Chongqing.
Li, H. (2009). Three rounds of test preparation strategy in Senior High School
Entrance English Test. Theory and Practice of Education, 29(11), 60-61.
Li, J. (2018). A study of washback of SHSEE (English, Shanghai) on teaching of
reading comprehension at junior middle school [Master’s thesis, Shanghai
Normal University]. Shanghai.
Li, P., Hu, Y., Xu, Q., & Li, P. (2019). Based on accomplishment to improve
language ability and focus on education to help coordinate all-round
development—An analysis of English test questions in the 2019 Shanxi Middle-
School Entrance Examinations. Theory and Practice of Education, 39(32), 3-6.
Li, X. (1990). How powerful can a language test be? The MET in China. Journal of
Multilingual & Multicultural Development, 11(5), 393-404.
Lightbown, P. M., & Spada, N. (2019). Teaching and learning L2 in the classroom:
It’s about time. Language Teaching, 1-11.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry (Vol. 75). Sage.
Linn, R. L. (1993). Educational Assessment: Expanded Expectations and Challenges.
Educational Evaluation and Policy Analysis, 15(1), 1-16.
298 Bibliography
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based
assessment: Expectations and validation criteria. Educational Researcher, 20(8),
15-21.
Little, D. G. (1996). Strategic competence considered in relation to strategic control
of the language learning process. In H. Holec, D. G. Little, & R. Richterich
(Eds.), Strategies in language learning and use: Studies towards a Common
European Framework of Reference for language learning and teaching (pp. 9-
37). Council of Europe.
Little, D. G. (2007). Language learner autonomy: Some fundamental considerations
revisited. Innovation in Language Learning and Teaching, 1(1), 14-29.
Liu, H., & Brantmeier, C. (2019). “I know English”: Self-assessment of foreign
language reading and writing abilities among young Chinese learners of English.
System, 80, 60-72.
Liu, R. (2012). A survey of the present condition in the spoken English test of
Changsha Senior High School Entrance Examination [Master’s thesis, Hunan
University]. Changsha, Hunan.
Liu, Y., & Zhao, Y. (2010). A study of teacher talk in interactions in English classes.
Chinese Journal of Applied Linguistics, 33(2), 76-86.
Luo, M. (2012). Reforming curriculum in a centralized system: An examination of
the relationships between teacher implementation of student-centered pedagogy
and high stakes teacher evaluation policies in China [Doctoral dissertation,
Columbia University]. New York.
Ma, W. (2018). A study on washback of the listening test items in Senior Secondary
School Entrance Examination of English (Jiangxi) [Master’s thesis, Jiangxi
Normal University]. Nanchang, Jiangxi.
Macmillan, F., Walter, D., & O'Boyle, J. (2014). Investigating grammatical
knowledge at the advanced level. Research Notes(55), 7-12.
Madaus, G. F. (1988). The influence of testing on the curriculum. In L. N. Tanner
(Ed.), Critical issues in curriculum: Eighty-seventh yearbook of the National
Society for the Study of Education (pp. 83-121). University of Chicago Press.
Madsen, H. S. (1983). Techniques in testing. Oxford University Press.
May, L., Nakatsuhara, F., Lam, D., & Galaczi, E. (2020). Developing tools for
learning oriented assessment of interactional competence: Bridging theory and
practice. Language Testing, 37(2), 165-188.
McChesney, R. (1999). Introduction. In N. Chomsky (Ed.), Profit over people:
Neoliberalism and global order (pp. 6-17). Seven Stories Press.
McDonnell, L. M. (2004). Politics, persuasion, and educational testing. Harvard
University Press.
McDonnell, L. M. (2013). Educational accountability and policy feedback.
Educational Policy, 27(2), 170-189.
Bibliography 299
McNamara, T. F. (1996). Measuring second language performance. Addison Wesley
Longman.
Meara, P., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests.
Merriam, S. B. (2016). Qualitative research: A guide to design and implementation
(4th ed.). Jossey Bass Ltd.
Merton, R., Fiske, M., & Kendall, P. (1956). The focused interview. New York: Free
Press.
Merton, R. K., & Kendall, P. L. (1946). The focused interview. American journal of
Sociology, 51(6), 541-557.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 13-103). Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of
performance exercises. Educational Researcher, 36, 13-23.
Messick, S. (1996). Validity and washback in language testing. Language Testing,
13(3), 241-256.
Millman, J., Bishop, C. H., & Ebel, R. (1965). An analysis of test-wiseness.
Educational psychological measurement, 25(3), 707-726.
Minichiello, V., Aroni, R., & Hays, T. (2008). In-depth interviewing: Principles,
techniques, analysis (3rd ed.). Pearson Education Australia.
Ministry of Education. (1999). Opinions on the reform of junior high school
graduates and entrance examinations (guanyu chuzhong biye shengxue kaoshi
de zhidao yijian). Ministry of Education of China. Retrieved December 12, 2017
from
http://www.moe.gov.cn/s78/A06/jcys_left/moe_706/s3321/201001/t20100128_8
1825.html
Ministry of Education. (2001). English Curriculum Standards for Full-time
Compulsory Education and Senior High Schools (Trial Version) (quanrizhi yiwu
jiaoyu putong gaoji zhongxue yingyu kecheng biaozhun (shiyangao)). Beijing
Normal University Press. http://www.tefl-china.net/2003/ca13821.htm
Ministry of Education. (2006). Compulsory Education Law of the People’s Republic
of China (zhonghua renmin gongheguo yiwu jiaoyu fa).
http://en.moe.gov.cn/Resources/Laws_and_Policies/201506/t20150626_191391.
html
Ministry of Education. (2011). English Curriculum Standards for Compulsory
Education (2011) (yiwujiaoyu yingyu kecheng biaozhun, 2011). Beijing Normal
University Press.
Ministry of Education. (2015). What we do. Retrieved December 19, 2017 from
http://en.moe.gov.cn/About_the_Ministry/What_We_Do/201506/t20150626_19
1288.html
300 Bibliography
Ministry of Education. (2019). Number of students of formal education by type and
level (geji gelei xueli jiaoyu xuesheng qingkuang). Retrieved Janaury 12, 2020
from
http://www.moe.gov.cn/s78/A03/moe_560/jytjsj_2018/qg/201908/t20190812_3
94239.html
Mizutani, S. (2009). The mechanism of washback on teaching and learning [Doctoral
dissertation, The University of Auckland]. Auckland.
Moeller, A. K., Creswell, J. W., & Saville, N. (2016). Second language assessment
and mixed methods research. Cambridge University Press.
Mok, M. M. C. (2013). Self-directed learning oriented assessments in the Asia-
Pacific (1st ed.). Springer Netherlands.
Morrow, K. (1991). Evaluating communicative tests. In S. Anivan (Ed.), Current
developments in language testing (pp. 111-118). Regional Language Centre.
Moss, E. (2001). Multiple choice questions: Their value as an assessment tool.
Current Opinion in Anesthesiology, 14(6), 661-666.
Mu, G. M., Liang, W., Lu, L., & Huang, D. (2018). Building pedagogical content
knowledge within professional learning communities: An approach to
counteracting regional education inequality. Teaching and Teacher Education,
73, 24-34.
Nardi, P. M. (2006). Doing survey research: A guide to quantitative methods (2nd
ed.). Pearson/Allyn & Bacon.
Nassaji, H., & Fotos, S. (2004). Current developments in research on the teaching of
grammar. Annual Review of Applied Linguistics, 24, 126-145.
Nastasi, B. K., Hitchcock, J. H., & Brown, L. M. (2010). An inclusive framework for
conceptulizing mixed methods design typologies: Moving toward fully
integrated synergistic research methods. In A. Tashakkori & C. Teddlie (Eds.),
Sage handbook of mixed methods in social & behavioral research (2nd ed., pp.
305-339). SAGE Publications.
Nation, P. (1990). Teaching and learning vocabulary. Newbury House.
Nation, P. (2001). Learning vocabulary in another language. Cambridge University
Press.
Nation, P. (2018). Keeping it practical and keeping it simple. Language Teaching,
51(1), 138-146.
Nation, P., & Beglar, D. (2007). A vocabulary size test. The language teacher, 31(7),
9-13.
National Bureau of Statistics of China. (2018). China statistical yearbook,
Chongqing statistics. China Statistics Press. Retrieved January 12, 2020 from
http://tjj.cq.gov.cn//tjnj/2018/indexch.htm
Bibliography 301
Nichols, P. D., Meyers, J. L., & Burling, K. S. (2009). A framework for evaluating
and planning assessments intended to improve student achievement.
Educational Measurement: Issues and Practice, 28(3), 14-23.
Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and
student achievement: Does accountability pressure increase student learning?
Education policy analysis archives, 14, 1.
Nida, E. A. (1977). The nature of dynamic equivalence in translating. Babel:
International Journal of Translation.
Nitta, R., & Gardner, S. (2005). Consciousness-raising and practice in ELT
coursebooks. ELT Journal, 59(1), 3-13.
Nunan, D. (1996). Towards autonomous learning: Some theoretical, empirical and
practical issues. In R. Pemberton, E. S. L. Li, W. W. F. Or, & H. D. Pierson
(Eds.), Taking Control: Autonomy in Language Learning (pp. 13-26). Hong
Kong University Press.
Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.
Nurweni, A., & Read, J. (1999). The English vocabulary knowledge of Indonesian
university students. English for Specific Purposes, 18(2), 161-175.
Ockey, G. J., & Choi, I. (2015). Structural Equation Modeling reporting practices for
language assessment. Language Assessment Quarterly, 12(3), 305-319.
OECD. (2019). PISA 2018: Insights and interpretations. OECD. Retrieved January
12, 2020 from
https://www.oecd.org/pisa/PISA%202018%20Insights%20and%20Interpretatio
ns%20FINAL%20PDF.pdf
Oller, J. W. (1979). Language tests at school. Longman.
Oscarson, M. (1989). Self-assessment of language proficiency: Rationale and
applications. 6(1), 1-13.
Oscarson, M. (1998, 18-20 September). Learner self-assessment of language skills:
A review of some of the issues. IATEFL Special Interest Group Symposium,
Gdansk, Poland.
Ostovar-Namaghi, S. A., & Safaee, S. E. (2017). Exploring techniques of developing
writing skill in IELTS preparatory courses: A data-driven study. English
Language Teaching, 10(3), 74-81.
Oxford, R. (1989). Use of language learning strategies: A synthesis of studies with
implications for strategy training. System, 17(2), 235-247.
Oxford, R. (1990). Language learning strategies: What every teacher should know.
Heinle and Heinle.
Oxford, R., & Nyikos, M. (1989). Variables affecting choice of language learning
strategies by university students. The Modern Language Journal, 73(3), 291-
300.
302 Bibliography
Özmen, K. S. (2011). Analyzing washback effect of SEPPPO on prospective English
teachers. Journal of Language & Linguistics Studies, 7(2), 24-51.
Pan, M., & Feng, G. (2015). On the assessment requirements of the National criteria
of teaching quality for undergraduate English majors. Foreign Languages in
China, 67(5), 11-16.
Pan, M., & Qian, D. D. (2017). Embedding corpora into the content validation of the
grammar test of the National Matriculation English Test (NMET) in China.
Language Assessment Quarterly, 14(2), 120-139.
Pan, Y.-C. (2014). Learner washback variability in standardized exit tests. TESL-EJ:
Teaching English as a Second or Foreign Language, 18(2).
Pan, Y.-C., & Newfields, T. (2011). Teacher and student washback on test
preparation evidenced from Taiwan's English certification exit requirements.
International Journal of Pedagogies & Learning, 6(3), 260-272.
Pan, Y.-C., & Roever, C. (2016). Consequences of test use: A case study of
employers' voice on the social impact of English certification exit requirements
in Taiwan. Language Testing in Asia, 6(1), 1-21.
Paribakht, T. S., & Wesche, M. (1997). Vocabulary enhancement activities and
reading for meaning in second language vocabulary acquisition. In J. Coady &
T. Huckin (Eds.), Second language vocabulary acquisition (pp. 174-200).
Cambridge University Press.
Patton, M. Q. (2015). Qualitative research & evaluation methods: Integrating theory
and practice (4th ed.). SAGE Publications, Inc.
Paulson, F. L., Paulson, P. R., & Meyer, C. A. (1991). What makes a portfolio a
portfolio. Educational leadership, 48(5).
Pazaver, A., & Wang, H. (2009). Asian students’ perceptions of grammar teaching in
the ESL classroom. The International Journal of Language Society and Culture,
27, 27-35.
Petrescu, M. C., Helms-Park, R., & Dronjic, V. (2017). The impact of frequency and
register on cognate facilitation: Comparing Romanian and Vietnamese speakers
on the Vocabulary Levels Test. English for Specific Purposes, 47, 15-25.
Phillipson, R. (2009). English in globalisation, a lingua franca or a lingua
Frankensteinia? TESOL Quarterly, 43(2), 335-339.
Popham, W. J. (1987). The merits of measurement-driven instruction. The Phi Delta
Kappan, 68(9), 679-682.
Popham, W. J. (1999). Why standardized tests don't measure educational quality.
Educational leadership, 56, 8-16.
Popham, W. J. (2001). Teaching to the test? Educational leadership, 58(6), 16-21.
Powers, W. R. (2005). Transcription techniques for the spoken word. Rowman
Altamira.
Bibliography 303
Prodromou, L. (1995). The backwash effect: From testing to teaching. ELT Journal,
49(1), 13-25.
Purpura, J. E. (2004). Assessing grammar. Cambridge University Press.
Purpura, J. E., & Turner, C. E. (2013). Learning-oriented assessment in classrooms:
A place where SLA, interaction, and language assessment interface
ILTA/AAAL Joint Symposium on “LOA in classrooms”.
Qi, L. (2004a). Has a high-stakes test produced the intended changes. In L. Cheng,
Y. Watanabe, & A. Curtis (Eds.), Washback in language testing: Research
contexts and methods (pp. 171-190). Lawrence Erlbaum.
Qi, L. (2004b). The intended washback effect of the National Matriculation English
Test in China: Intentions and reality. Foreign Language Teaching and Research
Press.
Qi, L. (2005). Stakeholders’ conflicting aims undermine the washback function of a
high-stakes test. Language Testing, 22(2), 142-173.
Qi, L. (2007). Is testing an efficient agent for pedagogical change? Examining the
intended washback of the writing task in a high-stakes English test in China.
Qi, L. (2010). Should proofreading go? Examining the selection function and
washback of the proofreading sub-test in the National Matriculation English
Test. In L. Cheng & A. Curtis (Eds.), English language assessment and the
Chinese learner (pp. 219-233). Routledge.
Rassaei, E. (2019). Tailoring mediation to learners’ ZPD: Effects of dynamic and
non-dynamic corrective feedback on L2 development. The Language Learning
Journal, 47(5), 591-607.
Rea-Dickins, P. (1991). What makes a grammar test communicative. In C. J.
Alderson & B. North (Eds.), Language testing in the 1990s: The communicative
legacy (pp. 112-131). Macmillan Publishers Limited.
Rea-Dickins, P. (1997). The testing of grammar in a second language. In C. Clapham
& D. Corson (Eds.), Encyclopedia of language and education: Language testing
and assessment (Vol. 7, pp. 87-97). Kluwer Academic.
Read, J. (1993). The development of a new measure of L2 vocabulary knowledge.
Read, J. (1995). Refining the word associates format as a measure of depth of
vocabulary knowledge. New Zealand Studies in Applied Linguistics, 1, 1-17.
Read, J. (1997). Assessing vocabulary in a second language. In C. Clapham & D.
Corson (Eds.), Encyclopedia of language and education: Language testing and
assessment (Vol. 7, pp. 99-107). Kluwer Academic.
Read, J. (2000). Assessing vocabulary. Cambridge University Press.
304 Bibliography
Read, J. (2019). Key issues in measuring vocabulary knowledge. In S. A. Webb
(Ed.), The Routledge handbook of vocabulary studies (pp. 545-560). New York,
NY.
Read, J., & Chapelle, C. A. (2001). A framework for second language vocabulary
assessment. Language Testing, 18(1), 1-32.
Regmi, K., Naidoo, J., & Pilkington, P. (2010). Understanding the processes of
translation and transliteration in qualitative research. International Journal of
Qualitative Methods, 9(1), 16-26.
Ren, Y. (2011). A study of the washback effects of the College English Test (band 4)
on teaching and learning English at tertiary level in China. International Journal
of Pedagogies & Learning, 6(3), 243-259.
Reynolds, B. L., Shih, Y.-C., & Wu, W.-H. (2018). Modeling Taiwanese adolescent
learners' English vocabulary acquisition and retention: The washback effect of
the College Entrance Examination Center's reference word list. English for
Specific Purposes, 52, 47-59.
Richards, J. C., & Schmidt, R. (2013). Longman dictionary of language teaching and
applied linguistics (4th ed.). Routledge.
Risemberg, R., & Zimmerman, B. J. (1992). Self-regulated learning in gifted
students. Roeper Review, 15(2), 98-101.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social
context. Oxford university press.
Rose, D. (2008). Vocabulary use in the FCE listening test. Research Notes, 32, 9-16.
Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic
definitions and new directions. Contemporary Educational Psychology, 25(1),
54-67.
Saglam, A. L. G., & Farhady, H. (2019). Can exams change how and what learners
learn? Investigating the washback effect of a university English language
proficiency test in the Turkish context. Advances in Language Literary Studies,
10(1), 177-186.
Saif, S. (2006). Aiming for positive washback: A case study of international teaching
assistants. Language Testing, 23(1), 1-34.
Salamoura, A., & Unsworth, S. (2015). Learning Oriented Assessment: Putting
learning, teaching and assessment together. Modern English Teacher, 24(3), 4-7.
www.modernenglishteacher.com
Salverda, R. (2002). Language diversity and international communication. English
Today, 18(3), 3-11.
Saville, N., & Salamoura, A. (2014, October). Learning Oriented Assessment - A
systemic view from an examination provider. Presentation at the Roundtable on
Learning-Oriented Assessment in Language Classrooms and Large-Scale
Contexts, Teachers College, Columbia University, New York.
Bibliography 305
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge University Press.
Schmitt, N. (2019). Understanding vocabulary acquisition, instruction, and
assessment: A research agenda. Language Teaching, 52(2), 261-274.
Schmitt, N., Nation, P., & Kremmel, B. (2020). Moving the field of vocabulary
assessment forward: The need for more rigorous test development and
validation. Language Teaching, 53(1), 109-120.
Schreiber, J. B., Nora, A., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting
structural equation modeling and confirmatory factor analysis results: A review.
The Journal of educational research, 99(6), 323-338.
Seidlhofer, B. (2005). English as a lingua franca. ELT Journal, 59(4), 339-341.
Sharif, K. S. M., & Siddiek, A. G. (2017). Critical thinking as reflected in the
Sudanese and Jordanian Secondary School Certificate English Language
Examinations. English Language Teaching, 10(5), 37-61.
Shepard, L. A. (2000). The role of assessment in a learning culture. Educational
Researcher, 29(7), 4-14.
Shi, H. (2013). A study on the validity of clozing tests in Senior High School
Entrance English Test in Shanxi province. Journal of Shanxi Normal University
(Natural Sciences Edition) Special Issue for Postgraduate Theses, 27(S1), 141-
143.
Shi, J. (2001). Tracing the historical development of curriculum policies in China's
basic education (woguo jichu jiaoyu kecheng zhengce fazhan bianhua de lishi
guiji). China Education and Research Network. Retrieved December 20, 2017
from http://www.teachercn.com/Kcgg/Lltt/2006-
1/5/20060119160119692_4.html
Shih, C.-M. (2007). A new washback model of students’ learning. Canadian Modern
Language Review, 64(1), 135-161.
Shih, C.-M. (2009). How tests change teaching: A model for reference. English
Teaching, 8(2), 188-206.
Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing
model for assessing foreign language learning. The Modern Language Journal,
76(4), 513-521.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of
language tests. Pearson Education Limited.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited:
Washback effect over time. Language Testing, 13(3), 298-317.
Shohamy, E., Reves, T., & Bejarano, Y. (1986). Introducing a new comprehensive
test of oral proficiency. ELT Journal, 40(3), 212-220.
Simons, H. (2009). Case study research in practice (1st ed.). SAGE.
Sinclair, J. M., & Coulthard, M. (1975). Towards an analysis of discourse: The
English used by teachers and pupils. Oxford University Press.
306 Bibliography
Sjøberg, S. (2007). Constructivism and learning. In E. Baker, B. McGaw, & P. P
(Eds.), International encyclopaedia of education (3rd ed.). Elsevier.
So, Y. (2014). Are teacher perspectives useful? Incorporating EFL teacher feedback
in the development of a large-scale international English test. Language
Assessment Quarterly, 11(3), 283-303.
Song, X. (2013). A study on washback of test formats of English reading
comprehension test of JSGT on teaching and learning in middle schools.
[Master’s thesis, Ludong University]. Yantai, Shandong.
Spolsky, B. (1990). Social aspects of individual assessment. In J. H. A. L. de Jong &
D. K. Stevenson (Eds.), Individualizing the assessment of language abilities (pp.
3-15). Multilingual Matters.
Spolsky, B. (1995). Measured words: The development of objetive language testing.
Oxford University Press.
Spratt, M. (2005). Washback and the classroom: The implications for teaching and
learning of studies of washback from exams. Language Teaching Research,
9(1), 5-29.
Squires, A. (2009). Methodological challenges in cross-language qualitative
research: A research review. International Journal of Nursing Studies, 46(2),
277-287.
Sun, C. (2010). An introduction to major university English tests and English
language teaching in China [Master’s thesis, Brigham Young University].
United States.
Sutrisno, A., Nguyen, N. T., & Tangen, D. (2014). Incorporating translation in
qualitative studies: Two case studies in education. International journal of
qualitative studies in education, 27(10), 1337-1353.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.).
Pearson/Allyn & Bacon.
Tan, C. (2020). Beyond high-stakes exam: A neo-Confucian educational programme
and its contemporary implications. Educational Philosophy and Theory, 52(2),
137-148.
Tashakkori, A., & Creswell, J. W. (2007). Editorial: The new era of mixed methods.
Journal of Mixed Methods Research, 1(1), 3-7.
Tassinari, M. G. (2012). Evaluating learner autonomy: A dynamic model with
descriptors. Studies in Self-Access Learning Journal, 3(1), 24-40.
Teddlie, C., & Tashakkori, A. (2012). Common “core” characteristics of mixed
methods research: A review of critical issues and call for greater gonvergence.
American Behavioral Scientist, 56(6), 774-788.
Teng, H.-C., & Fu, C.-W. (2019). The washback of listening tests for entrance exams
on EFL instruction in Taiwanese junior high schools. Language Education
Assessment, 2(2), 96-109.
Bibliography 307
Tsagari, D. (2009). The complexity of test washback: An empirical study. Peter Lang
GmbH.
Tsagari, D. (2011). Washback of a high-stakes English exam on teachers’
perceptions and practices. Selected papers on theoretical and applied linguistics,
19, 431-445.
Tsagari, D. (2014, October). Unplanned LOA in EFL classrooms: Findings from an
empirical study. Presentation at the Roundtable on Learning-Oriented
Assessment in Language Classrooms and Large-Scale Contexts, Teachers
College, Columbia University, New York.
Turner, C. E., & Purpura, J. E. (2016). Learning-oriented assessment in second and
foreign language classrooms. In D. Tsagari & J. Banerjee (Eds.), Handbook of
second language assessment (pp. 255-273). DeGruyter Mounton.
Turner, C. E., & Upshur, J. A. (1995). Some effects of task type on the relation
between communicative effectiveness and grammatical accuracy in intensive
ESL classes. TESL Canada Journal, 12(2), 18-31.
Uyaniker, P. (2017). Language assessment: Now and then. Eurasian Journal of
Language Education and Research, 1(1), 1-20.
Van Teijlingen, E. R., & Hundley, V. (2001). The importance of pilot studies. Social
Research Update, 35.
Verplanck, W. S. (1992). A brief introduction to the word associate test. The
Analysis of verbal behavior, 10(1), 97-123.
Vygotsky, L. S. (1986). Thought and language. MIT Press.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general
education and from innovation theory. Language Testing, 13(3), 334-354.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact
study. Language Testing, 10(1), 41-69.
Wallace, J. (2014). Grammar in speaking: Raising student awareness and
encouraging autonomous learning. Research Notes(56), 30-36.
Walters, J. (2004). Teaching the use of context to infer meaning: A longitudinal
survey of L1 and L2 vocabulary research. Language Teaching, 37(4), 243-252.
Wang, C., Yan, J., & Liu, B. (2014). An empirical study on washback effects of the
Internet-Based College English Test Band 4 in China. English Language
Teaching, 7(6), 26-53.
Wang, L. (2014). Quality assurance in higher education in China: Control,
accountability and freedom. Policy and Society, 33(3), 253-262.
Wang, L., & Mok, K. H. (2013). The impacts of neo-liberalism on higher education
in China. In A. Turner & H. Yolcu (Eds.), Neo-liberal educational reforms: A
critical analysis (pp. 139-163). Routledge.
Wang, Q. (2007). The national curriculum changes and their effects on English
language teaching in the People’s Republic of China. In J. Cummins & C.
308 Bibliography
Davison (Eds.), International Handbook of English Language Teaching (pp. 87-
105).
Wang, X. (2003). Education in China since 1976. McFarland & Compant, Inc.
Watanabe, Y. (1996a). Does grammar translation come from the entrance
examination? Preliminary findings from classroom-based research. Language
Testing, 13(3), 318-333.
Watanabe, Y. (1996b). Investigating washback in Japanese EFL classrooms:
problems of methodology. Australian Review of Applied Linguistics.
Supplement Series, 13(1), 208-239.
Watanabe, Y. (2004). Teacher factors mediating washback. In L. Cheng, Y.
Watanabe, & A. Curtis (Eds.), Washback in language testing: Research contexts
and methods (pp. 129-146). Lawrence Erlbaum.
Webb, S., Sasao, Y., & Ballance, O. (2017). The updated Vocabulary Levels Test:
Developing and validating two new forms of the VLT. ITL-International
Journal of Applied Linguistics, 168(1), 33-69.
Weber, K. (2003). The relationship of interest to internal and external motivation.
Communication Research Reports, 20(4), 376-383.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach.
Palgrave Macmillan.
Weir, C. J. (2013). An overview of the influences on English language testing in the
United Kingdom 1913–2012. In C. J. Weir, I. Vidaković, & E. D. Galaczi
(Eds.), Measured constructs: A history of Cambridge English language
examinations 1913–2012 (Vol. 37, pp. 1-102). Cambridge University Press.
Wen, Q. (2018). The production-oriented approach to teaching university students
English in China. Language Teaching, 51(4), 526-540.
Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: Motivation,
anxiety, and test performance. Applied Measurement in Education, 8(3), 227-
242.
Wu, Y. (2017). Language education in China: Teaching foreign languages. In R.
Sybesma (Ed.), Encyclopedia of Chinese language and linguistics (Vol. 2, pp.
515-527). Brill.
Xiao, W. (2014). The intensity and direction of CET washback on Chinese college
students’ test-taking strategy use. Theory & Practice in Language Studies, 4(6),
1171-1177.
Xie, Q. (2010). Test design and use, preparation, and performance: A structural
equation modeling study of consequential validity [Unpublished doctoral
dissertation, The University of Hong Kong]. Hong Kong.
Xie, Q. (2013). Does test preparation work? Implications for score validity.
Language Assessment Quarterly, 10(2), 196-218.
Bibliography 309
Xie, Q. (2015a). Do component weighting and testing method affect time
management and approaches to test preparation? A study on the washback
mechanism. System, 50, 56-68.
Xie, Q. (2015b). “I must impress the raters!” An investigation of Chinese test-takers’
strategies to manage rater impressions. Assessing Writing, 25, 22-37.
Xie, Q., & Andrews, S. (2013). Do test design and uses influence test preparation?
Testing a model of washback with Structural Equation Modeling. Language
Testing, 30(1), 49-70.
Xu, Y., & Wu, Z. (2012). Test-taking strategies for a high-stakes writing test: An
exploratory study of 12 Chinese EFL learners. Assessing Writing, 17(3), 174-
190.
Yang, H.-C., & Plakans, L. (2012). Second language writers’ strategy use and
performance on an integrated reading-listening-writing task. TESOL Quarterly,
46(1), 80-103.
Yang, W. (2015). A Study on the washback effect of the grammar part in the Junior
Secondary English Achievement Graduation Test (Jiangxi) [Master’s thesis,
Guangdong University of Foreign Studies]. Guangdong.
Yang, Z., Gu, X., & Liu, X. (2013). A longitudinal study of the CET washback on
college English classroom teaching and learning in China: Revisiting college
English classes of a university. Chinese Journal of Applied Linguistics, 36(3),
304-325.
You, C., & Dörnyei, Z. (2014). Language learning motivation in China: Results of a
large-scale stratified survey. Applied linguistics, 37(4), 495-519.
Yu, L. (2001). Communicative language teaching in China: Progress and resistance.
TESOL Quarterly, 35(1), 194-198.
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2002). A unified approach to
exploratory factor analysis with missing data, nonnormal data, and in the
presence of outliers. Psychometrika, 67(1), 95-121.
Yurdugül, H. (2008). Minimum sample size for Cronbach’s coefficient alpha: A
Monte-Carlo study. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 35(35).
Zafarghandi, A. M., & Nemati, M. J. (2015). A comparative analysis of IELTS and
TOEFL in an Iranian context: A case study of washback of standard tests.
Theory and Practice in Language Studies, 5(1), 154.
Zeng, W., Huang, F., Yu, L., & Chen, S. (2018). Towards a learning-oriented
assessment to improve students’ learning—A critical review of literature.
Educational Assessment, Evaluation and Accountability (formerly: Journal of
Personnel Evaluation in Education), 30(3), 211-250.
Zeng, Y. (2008). A study on the washback effect of the Senior High School Entrance
Exam on junior English teaching and learning [Master’s thesis, Huazhong
Normal University]. Wuhan, Hubei.
310 Bibliography
Zeng, Y. (2010). The computerized oral English test of the National Matriculation
English Test. In L. Cheng & A. Curtis (Eds.), English language assessment and
the Chinese learner (pp. 234-247). Routledge.
Zhan, Y., & Andrews, S. (2014). Washback effects from a high-stakes examination
on out-of-class English learning: Insights from possible self theories.
Zhang, H., Zhang, W., Wu, S., & Guo, Q. (2018). A survey on attitudes toward
holding the National Matriculation English Test twice a year. China
Examinations(1), 20-26.
Zhang, L., & Li, X. (2004). A comparative study on learner autonomy between
Chinese students and west European students. Foreign Language World, 4, 15-
23.
Zhang, R. (2019). Backwash effect of integrating listening and speaking test into
NMET (Shanghai): Taking School J as an example. Foreign Language Testing
and Teaching(04), 47-53.
Zhi, M., & Wang, Y. (2019). Washback of college entrance English exam on student
perceptions of learning in a Chinese rural city. In R. M. Damerow & K. M.
Bailey (Eds.), Chinese-speaking learners of English: Research, theory, practice
(1st ed., pp. 26-37). Routledge.
Zhuang, X. (2008). Practice on assessing grammar and vocabulary: The case of the
TOEFL. Journal of US-China Education Review, 5(7), 46-57.
Ziegler, N., & Kang, L. (2016). Drawing mixed methods procedural diagrams. In A.
K. Moeller, J. W. Creswell, & N. Saville (Eds.), Second language assessment
and mixed methods research (pp. 51-83). Cambridge University Press.
Zou, S., & Xu, Q. (2017). A washback study of the Test for English Majors for
Grade Eight (TEM8) in China—From the perspective of university program
administrators. Language Assessment Quarterly, 14(2), 140-159.
Bibliography 311
312 Bibliography
Appendices
Appendix A
Language Knowledge Requirement at Level 5 in the ECSCE
1. Know the importance of phonetics in language learning.

2. Use generally correct, natural and fluent phonetics and
intonation in daily conversations.
phonetics 3. Understand and express different intentions and attitudes
according to the changes of stress and intonation.
4. Combine and read words according to pronunciation rules and
phonetic syllables.
1. Know that English vocabulary includes words, phrases,
idioms, collocations, etc.
2. Understand and assimilate the fundamental meanings of
words and their meanings in specific contexts.
vocabulary
3. Use vocabulary to describe objects, behaviours and
characteristics, and to explain definitions, etc.
4. Obtain and use a vocabulary size of 1,500~1,600 words and
200~300 idioms or collocations.
1. Understand the content listed in the “List of Grammatical
Level 5
Knowledge” in the appendix and know how to use them in
specific contexts.
2. Know basic structures and common ideographic functions of
language forms.
3. Understand and assimilate ideographic functions of language
grammar
forms in practical use.
4. Understand and use appropriate language forms to describe
people and objects; describe the occurrence and process of
specific events and behaviours; describe time, place and
direction as well as position; compare people, objects and
things, etc.
Understand and use functional ideations of the forms of language
function expressions listed at the current level appropriately in
communication.
Understand and use relevant forms of language expressions to
topic
explain the listed topics at the current level appropriately.
Appendices 313
Appendix B
Test Item Examples of the GVT from the Authentic 2018 SHSEET Paper
(Chongqing, Paper A)
II．单项选择。(每小题 1 分，共 15 分)
从 A、B、C、D 四个选项中选出可以填入空白处的最佳答案，并把答题卡上对应题目
的答案标号涂黑。
21. I had ________ egg and some milk for breakfast this morning.
A. a B. an C. the D. /
22. I always play basketball to relax myself _______ Saturdays.
A. on B. in C. at D. by
23. --- I have a bad cold.--- Sorry to hear that. You’d better go to see a ________ at once.
A. doctor B. cook C. writer D. farmer
24. I’m surprised at the new look of ________ hometown.
A. I B. me C. my D. mine
25. The show was so _________ that I couldn’t stop laughing.
A. sad B. terrible C. funny D. serious
26. __________ visitors came to take photos of Hongyadong during the vacation.
A. Thousand B. Thousand of C. Thousands D. Thousands of
27. It was raining. My father asked me _______ a raincoat.
A. take B. takes C. took D. to take
28. I can’t hear the teacher _______ with so much noise outside.
A. clearly B. slowly C. warmly D. bravely
29. They don’t live here any longer. They _______ to Chengdu last month.
A. move B. moved C. will move D. are moving
30. --- Must I go out to have dinner with you, Mum?--- No, you ________, my dear. You’re
free to make your own decision.
A. shouldn’t B. mustn’t C. needn’t D. can’t
314 Appendices
31. It’s hard for us to say goodbye ______ we have so many happy days to remember.
A. so B. because C. although D. until
32. ________ me a chance and I’ll bring you a surprise.
A. Give B. Giving C. Gives D. To give
33. ________ special class we had today! We learned about kung fu.
A. How B. What C. How a D. What a
34. The 2022 Winter Olympic Games ________ in China. I’m sure it will be a great success.
A. hold B. will hold C. were held D. will be held
35. --- Excuse me! Do you know ______________? --- It’s two kilometers away from here.
A. where is the supermarket B. when does the supermarket open
C. where the supermarket is D. when the supermarket opens
III．完形填空。(每小题 1.5 分，共 15 分)
根据短文内容，从 A、B、C、D 四个选项中选出一个能填入相应空格内的最佳答案，

并把答题卡上对应题目的答案标号涂黑。
Everyone has dreams. Lily dreamed of being a dancer. She took 36 lessons and all her
teachers thought she was an excellent student.
One day she saw a notice. It said that a famous dancing group would be performing in her
town. 37 though, “I must show the leader my dancing skills.” She waited for the group
leader in the dressing room. 38 the leader appeared, she came up and hander him the flowers
she prepared. The thorns (刺) hurt her fingers and blood came out. But she was too 39 to
care about the pain. She expressed her strong wish to be a dancer and begged (乞求) to show
her dance.
“All right, you dance.” The leader agreed. But half way through the dance, he stopped her,
“I’m sorry, in my mind you’re not good enough!” On hearing this, Lily 40 out as fast as her
legs could carry her. It was so hard for her to accept this. She lost heart and 41 her dream.
Several years later, the dancing group came to her town again. She decided to find out 42
the leader had told her she was not good enough.
This was his reply, “I tell this 43 every student.”
“You’ve ruined (毁掉) my life!” she shouted angrily.
Appendices 315
The leader went on, “I remember your present of 44 and how the thorns had hurt your fingers
but you carried on bravely. It was a pity that you didn’t take dancing like that and stopped
trying so 45 . So you are still not good enough for dancing!”
36. A. singing B. reading C. dancing D. dressing
37. A. He B. She C. They D. We
38. A. When B. Since C. Before D. Unless
39. A. weak B. bored C. excited D. tired
40. A. worked B. ran C. found D. looked
41. A. got on B. went on C. picked up D. gave up
42. A. why B. how C. who D. what
43. A. on B. at C. in D. to
44. A. books B. dances C. flowers D. dresses
45. A. happily B. easily C. luckily D. safely
VII．完成句子。(每空 1 分，共 10 分)
根据所给提示，完成句子。每空一词，含缩略词。
70. Jeff played tennis with his classmates yesterday. (改为否定句)
Jeff _____________ _____________ tennis with his classmates yesterday.
71. I go to the movies once a week. (对划线部分提问)
____________ ______________ do you go to the movies?
72. My sister will take care of my cat when I am on holiday. (改为同义句)
My sister will ____________ ____________ my cat when I am on holiday.
73. 当我们有不同意见时，应该相互理解。（完成译句）
When we have different ideas, we should understand __________ ____________.
74 史蒂芬·霍金不仅是一名伟大的科学家，而且是一位著名的作家。（完成译句）
Stephen Hawking was not only a great scientist __________ __________ a famous writer.
VIII．短文填空。(每空 2 分，共 16 分)
根据下面短文内容，在短文的空格处填上一个恰当的词，使短文完整、通顺。
316 Appendices
As we are growing up, we really need advice from adults. Here are three people talking
about their experience.
Jasper, 26, actor
You’re not alone.
Sometimes when you’re a teenager, you feel as if you’re all alone and there’s 75
you can talk to. Do you know twenty to thirty percent of teenagers in the US have a hard 76
going through the period? They feel lonely and sad. I think life is so much easier if you 77
your troubles with others. I regret that I didn’t take the advice when someone gave it to me.
Steve, 27, teacher
Your teachers only want what’s best for you.
When I was in school, I never thought I’d become a teacher. I acted badly in class,
and I feel 78 about that now. I love my job and I know how challenging it is, so I hope kids
can show their teachers more respect (尊敬). I hope kids can 79 that teachers push them to
do their best and not just to give them a hard time.
Anna, 29, doctor
Money doesn’t grow on trees.
When I was a teenager, I never learned 80 to save money. I just spent it! My parents
gave me everything I wanted, but I realize now they spent little 81 themselves. Now I wish
I knew more about planning my money, and I am not the only one! It seems that today’s
teenagers know about money planning even less 82 me years ago. I do wish they could
learn about it earlier.
Appendices 317
Appendix C
Table of Empirical Washback Studies of High-stakes Standardised English Tests in the International Context
Author(s) Topic Target test(s) Site Participants Research methods Major findings
School Leaving
The washback of a
Khaniya (1990) Certificate English Nepal 358 students Test administration negative washback
textbook-based test
examination (SLC)
The differences
between IELTS and Classroom observation (22 IELTS
197 leaners; 20
Green (2006b) English for academic IELTS U.K. preparation classes; 13 EAP negative washback
teachers
purposes (EAP) writing classes); brief teacher interview
classes
The washback of a The Selection
national high-stakes Examination for
negative and harmful
Özmen (2011) examination on Professional Posts in Ankara, Turkey 164 student-teachers Questionnaire; interview
washback
prospective English Public Organization
teachers (SEPPPO)
The relationship
between intended FCE 15 native and non-
washback and teachers’ native FCE teachers
Tsagari (2011) FCE Greece Interview negative washback
perceptions towards the in private language
exam as well as schools
classroom practice
The washback
60 IELTS
Zafarghandi and correlation of two Test administration; interview;
IELTS; TOEFL Iran applicants; 60 negative washback
Nemati (2015) standardised tests in questionnaire
TOEFL applicants
Iran
Six intact classes in
Washback on students’ four high schools in 80 Iranian male
Damankesh and High school final the Northern Guilan generally negative, some
test-taking and test province, the cities
high school students Think aloud methodology
Babaii (2015) English exam in Iran positive effects
preparation strategies of Siyahkal and (freshmen learners)
Shaft, Iran
Appendices 319
572 students; 83 Questionnaire; classroom generally positive
Hawkey (2006) Impact of IELTS IELTS Worldwide teachers; 45 observation; interview; document washback
textbook evaluators analysis (teaching materials)
A test of spoken
19 graduate
The possibility of language ability
The University of advisors; 255 Interview; observation; test
Saif (2006) generating positive designed for positive washback
Victoria, Canada undergraduate administration
washback international teaching
students; 47 ITAs
assistants (ITAs)
3,868 students, 423
The consideration of the
The Test of English for high school
intended washback by
Green (2014) Academic Purposes Japan teachers; 19 Student and teacher questionnaires positive washback
test developers and how
(TEAP) university English
to achieve it
teachers
9 ASL teachers; 16
EFL teachers; 62
The washback of test Student questionnaire; structured different washback
Shohamy et al. ASL, low-stake test; ASL students; 50
changes in two national Israel interview (teachers and inspectors); patterns; both positive and
(1996) EFL, high-stakes test EFL students; 2
tests document analysis negative washback
ASL inspectors; 4
EFL inspectors
The washback of a new
Wall and English examination in The O-Level 7 Sri Lankan Classroom observation both positive and negative
Sri Lanka
Alderson (1993) Sri Lanka on language examination teachers washback
teaching
20 teachers for each
The comparison of test;
Student and teacher questionnaires;
IELTS and TOEFL iBT IELTS; 100 IELTS both positive and negative
Erfani (2012) Iran classroom observation; teacher
washback in test TOEFL iBT students; 120 washback
interview
preparation courses TOEFL iBT
students
Test of Readiness for
Academic English
(TRACE), a local
Saglam and integrated them-based Mixed methods of test both positive and negative
Washback on learning Turkey 147 EFL students
Farhady (2019) high-stakes English administration and focus groups washback
language proficiency
test, used in a university
EAP program
320 Appendices
Mainly in a Teacher interview; student no answer to the
Alderson and The washback of
specialised interview; TOEFL preparation undesirable TOEFL
Hamp-Lyons TOEFL preparation TOEFL Two teachers
institute in the classroom observation; non-TOEFL influence on language
(1996) courses
USA preparation classroom observation teaching
The relationship
between the university
entrance examinations
Watanabe Japanese university A yobiko school Two teachers and Interviews for teachers’ background no positive or negative
and the use of the
(1996a) entrance examinations in central Tokyo four courses information; classroom observation washback
grammar-translation
approach in teaching to
the exam
Note. The studies were listed based on washback result findings, namely, “positive washback”, “negative washback”, and “mixed/complex washback”.
Appendices 321
Appendix D
Table of Empirical Washback Studies of High-stakes Standardised English Tests in China
8 NMET
constructors; 6
The intended and Interview; classroom
Guangdong English inspectors; negative washback: the
Qi (2004b) unintended washback of NMET observation; teacher and
388 secondary intended washback failed
NMET student questionnaires
school teachers; 986
students
8 NMET
The reasons for the constructors; 6
Two provinces in Interview; student and teacher negative washback: the
Qi (2005) failure of intended NMET English inspectors;
China questionnaires intended washback failed
washback of NMET 388 teachers; 986
students
388 Senior III
middle school
Teacher and student
The washback of Guangdong, English teachers; negative washback: the
Qi (2007) NMET questionnaires; interview;
writing task in NMET Sichuan 986 Senior III intended washback failed
classroom observation
students; 8 NMET
test constructors
The effects of test uses
870 sophomores in A set of questionnaires
and test design on test
Xie and Guangdong, a university in (questionnaire of test
preparation (washback CET 4 negative washback
Andrews (2013) China Guangdong perception; questionnaire of
effects on teaching and
province test preparation)
learning)
Whether the MET
Six provinces 229 teachers and
innovation brought any Matriculation English
Li (1990) enlisted in the local English
change in English Test (MET), same as Questionnaire positive washback
MET experiment teaching-and-
teaching in middle the NMET
research officers
schools
The Junior Secondary 21 teachers and 130 Teacher and student
Washback of grammar
Yang (2015) English Achievement Jiangxi students from a questionnaires, classroom positive washback
part
Graduation Test junior high school observation, teacher interview
322 Appendices
(JSEAGT) –SHSEET in
Jiangxi province
Perceptions of test and
Zou and Xu TEM 8: both oral and 724 program
test washback on Mainland China Mailing questionnaire positive washback
(2017) paper tests administrators
teaching and learning
Washback of listening 30 English teachers Teacher and student positive washback on
Comprehensive
Teng and Fu test in the senior high and 298 students questionnaires; teacher and teaching; both positive and
Assessment Program Taiwan
(2019) entrance exam on EFL from three junior student semi-structured negative washback on
(CAP) listening test
teaching and learning high schools interviews learning
The effects of changes 3 cohorts of Test administration; test
Andrews et al. both negative and positive
to high-stakes tests on UE Oral examination Hong Kong Secondary 7 videotapes and transcripts
(2002) washback
test-takers’ performance students analysis
Five secondary Interview (five test item
254 Grade 9
The washback of schools in writers, five teachers);
students; 40 English both positive and negative
Zeng (2008) SHSEET on teaching SHSEET Hongshan classroom observation;
teachers; 4 test item washback
and learning District, Wuhan teacher and student
writers
province questionnaires
150 non-English
major Student questionnaire;
Wang et al. The actual washback Beijing both positive and negative
IB CET-4 undergraduates unstructured interview;
(2014) effects of IB CET4 washback
from Beijing classroom observation
Jiaotong University
Washback of NMET the foreign language Teacher and student
Chen et al. under the policy of two test in National College Zhejiang 79 teachers, 710 questionnaires, teacher and both positive and negative
(2018) tests a year on teaching Entrance Examination province, China Grade 12 students student semi-structured washback
and learning (NCEE), i.e., NMET interviews
Washback on learning A concurrent qualitative-
the National College
Zhi and Wang from low A rural town in 139 senior high dominant mixed-methods both positive and negative
Entrance English Exam
(2019) socioeconomic Central China school students approach; a cross-sectional washback
(NCEEE), i.e., NMET
background students survey design
The washback of the
Hong Kong Certificate
Teacher and student
of Education Hong Kong
48 teachers; 42 questionnaires; classroom difficult to judge the value
Cheng (1997) Examination in English HKCEE Secondary
students observation; unstructured of washback
(HKCEE) regarding schools
interviews
revised examination
syllabus
Appendices 323
Two cohorts of
The influence of
students at
examination change
Secondary 5: 1100
regarding classroom
Hong Kong students taking the limited/superficial
Cheng (1998) activities, practice HKCEE Student questionnaire
secondary schools old HKCEE in washback
opportunities and
1994; 600 students
learning strategies from
taking the new
students’ perspectives
HKCEE in 1995.
Department chair, 2
Two private or 3 teachers, 14 to
Shih (2007) The washback of GEPT
institutions of 15 students and 3 Interview; classroom various but limited degrees
on English learning and GEPT
higher education students’ family observation; document review of washback
Shih (2009) teaching
in Taiwan members from each
department
The potential washback
of preparatory IELTS
course on students’
IELTS test
146 students in 23 Web-based student
performance/the A university in
Gan (2009) IELTS undergraduate questionnaire (main); student no obvious washback
learning/motivational Hong Kong
programmes interview (supplementary)
effects of IELTS test
preparation course
among ESL students in
Hong Kong.
389 Secondary 4
The perceptions of the
Cheng et al. students; 315 Student and parent
impact of School-based HKCEE SBA Hong Kong complex impact
(2011) parents of those questionnaires
assessment (SBA)
students
A provincial
The CET 4 test impact Three non-English Diary approach; semi-
Zhan and comprehensive
on students’ out-of-class The revised CET-4 major structured post-diary diverse washback
Andrews (2014) university in
learning undergraduates interview
Jiangsu province
Note. The studies were listed based on washback result findings, namely, “positive washback”, “negative washback”, and “mixed/complex washback”.
324 Appendices
Appendix E
Classroom Observation Scheme
Date:
School:
Teacher:
Class number:
No. of students:
Time period:
Part A: General teaching and learning practices regarding the GVT
Major tasks/language
points
Time spent
Mention of the SHSEET
Assessment-related Teaching
activities Learning
Use of target language L1
L2
Materials Types
Purposes
Part B: LOA related practice
Assessment tasks
Learning focus Language
Test
Participant organisation T→Ss/Class
S→T/Class
Choral
Feedback forms Oral
Written
Only correction
Correction +
explanation
Result interpretation Simple (only
answer)
Advanced
(referring to
ECSCE/test
specification)
Part C: Field notes (nonverbal behaviours: facial expression, eye contact, etc.)
Part D: Comments and questions
Appendices 325
Appendix F
Semi-structured Interview Protocol
*Please note: These interview questions may be changed once the classroom
observation data are collected and briefly analysed, therefore, some of the questions in
this protocol will be edited to reflect the data found in the previous observation stage.
Section 1 Interview record

Time of interview:
Date:
Place:
Interviewer:
Interviewee:
(Participant demographic information will be obtained through pre-observation
informal interviews.)
Section 2 Interview questions
Thanks for your participation. I am researching the test influence of the grammar and
vocabulary part in the Senior High School Entrance English Test (SHSEET) on
English teaching. I would really appreciate your views on the following questions
about the test influence and the Learning Oriented Assessment regarding the GVT.
Q1. Do you think the inclusion of the separate testing of grammar and vocabulary in
the SHSEET reflects the aim of the communicative language use development in the
English Curriculum Standards for Compulsory Education (ECSCE)?
- Yes. In what aspects?
- No. Why?
Q2. From your understanding, which aspects of grammar and vocabulary are the focus
of the SHSEET? Do you feel that these aspects reflect the intent of the ECSCE?
- Do you have any access to the SHSEET Test Specifications?
- If so, are you familiar with them?
- If so, do you think they have the same guiding ideas of student- and learning-
centred instruction as specified in the ECSCE?
Q3. Do you focus on the language points listed for grammar and vocabulary in the
ECSCE only?
- Yes. How do you deal with new language points during teaching?
- No. Why? What is your focus on the teaching of grammar and vocabulary?
326 Appendices
Q4. Except for the language points listed in the ECSCE, are you also familiar with
other guidelines for teaching and assessment? Such as teaching should help to develop
students’ language use ability and the SHSEET should test students’ language use
ability in integrated macroskill tasks?
- Yes. Do you follow those guidelines during teaching?
- No. Why?
Q5. What are your thoughts on the GVT?

Possible prompts:
- What do you think of the test format?
- What do you think of the test content?
- What do you think of the test importance/stakes?
- Does the test importance influence your teaching?
- If so, in what aspects?
- What do you think of the test difficulty?
- Does the test difficulty influence your teaching?
- Does it test language use ability or simple language knowledge?
- Do the tasks resemble real/authentic English language use context?
Q6. How do you interpret your students’ performance in the GVT?
- Do you follow any interpretation guidelines?
- Yes. What are the guidelines and where do they come from?
- No. Why? Is it necessary to have some interpretation guidelines?
Q7. Do you offer any feedback to students on their performance of the GVT?
- Yes.
- In what ways? Orally? Written? Or other?
- How often? Once? Progressively? Or other?
- In what forms? Marks? Comments? Or other?
- No. Why? Then do you offer any feedback on students’ performance on grammar and
vocabulary outside of the test itself?
Appendices 327
Q8. How do you prepare your students for the GVT?
- How long do you spend on teaching the grammar and vocabulary tasks in general?
- How frequently do you prepare students for the grammar and vocabulary tasks?
- What teaching materials do you use?
- What is your teaching content?
- What teaching methods do you use?
- What language do you use during teaching?
- More English? Why?
- More Chinese? Why?
- Or other? Why?
Q9. What is your understanding of the term “Learning Oriented Assessment (LOA)”?
- Do you have any training of LOA?
- Do you inform students about LOA?
- Is your teaching informed by LOA?
Q10. Do you think the GVT can be learning-oriented?
- Yes. In what ways? Do you arrange any classroom assessment which is learning oriented?
- No. Why? What are the possible obstacles?
Q11. Do you consciously include any LOA activities during teaching?
- Yes. What are they?
- No. Why? Do you think it is necessary and possible to arrange LOA activities such as
peer-evaluation for the GVT during teaching?
Q12. How will students be evaluated upon their graduation from junior high schools?
- Only summative assessment?
- Both summative and formative assessment?
- If so, do you keep any formative records for students? What are they? What are the
purposes?
- Or other?
Q13. Does the inclusion of the GVT impact on your teaching?
328 Appendices
- Yes. What is the impact?
- No. Why?
Q14. Do you expect any change to be made in future to the GVT?
- Yes. What kinds of change?
- No. why?
(The interviewer will have the 2011 ECSCE, 2018 Chongqing English test
specification, two past test papers at hand for reference.)
Appendices 329
Appendix G
Focus Group Interview Protocol
*Please note: These interview questions designed are possible to change once the
classroom observation data are collected and briefly analysed, therefore, some of the
questions in this protocol will be edited to reflect the data found in the previous
observation stage.
Section 1 Interview record

Time of interview:
Date:
Place:
Interviewer:
Interviewees:
(Participant demographic information will be obtained through emails.)
Section 2 Interview questions

Thank you all for the participation. I am researching the test influence of the grammar
and vocabulary part in the Senior High School Entrance English Test (SHSEET) on
English learning. I would really appreciate your views on following questions about
the test influence and the learning-oriented possibilities regarding the GVT.
Q1. What are your thoughts on the GVT?

Possible prompts:
- What do you think of the test format?
- What do you think of the test content?
- What do you think of the test importance/stakes?
- Does the test importance influence your learning?
- What do you think of the test difficulty?
- Does the test difficulty influence your learning?
- Does it test language use ability or simple language knowledge?
- Do the tasks resemble real/authentic English language use context?
Q2. Does your teacher mention the ECSCE or test specification when teaching the
grammar and vocabulary tasks of the SHSEET?
330 Appendices
- Yes.
- What is the purpose?
- What is the focus?
- Language points?
- Other? Please specify.
- No. What else does she/he mainly talk about when teaching the GVT?
Q3. Does your teacher follow the teaching and assessment guidelines when teaching
the grammar and vocabulary part of the SHSEET? Such as teaching should help to
develop language use ability listed in the ECSCE?
- Yes. Do you follow those guidelines during learning?
- No. Why?
Q4. Do you have any feedback on your grammar and vocabulary achievement?
- Yes. What kinds of feedback?
- No. Do you think the feedback will be useful for improving your knowledge and ability
to use grammar and vocabulary?
- Yes. What kinds of feedback would you like to receive?
- No. Why? What else will be helpful?
Q5. How do you prepare for the GVT?
- How much time do you spend in general?
- How frequently do you prepare?
- What learning materials do you use?
- What is your main learning content?
- What learning strategies do you use?
- What language do you use during learning?
- More English? Why?
- More Chinese? Why?
- Or other? Why?
Q6. Do you think the GVT can improve your learning of grammar and vocabualry?
Appendices 331
- Yes. In what ways?
- No. Why? What are the possible obstacles?
Q7. Do you engage in any learning-oriented activities like pair or group work when
learning the GVT?
- Yes. How do you think of it? Are there other similar forms of learning-oriented activities
possible?
- No. What are the usual activities?
Q8. Does the inclusion of the separate testing of the GVT impact on your learning?
- Yes. What are they? Are they positive or negative?
- No. Why?
Q9. Do you expect any change to be taken in future to the GVT?
- Yes. What kinds of change?
- No. Why?
(The interviewer will have 2011 ECSCE, 2018 Chongqing English test specification,
two past test papers at hand for reference.)
332 Appendices
Appendix H
Transcription Symbols Used in This Study (adapted from Powers (2005))
Symbol Meaning
xxx Not clear in the recording.
… Omission in the transcript.
…… Participants deliberately did not finish the sentence.
(i.e., ) The researcher’s explanation of certain words or pronouns for

readers’ better understanding.
[] The researcher adds certain content to complete the sentence
for a better understanding or adds symbolic behaviours that
participants had.
[pause] Pause longer than em, or 2 seconds.
Bold (e.g., very) The interviewees’ raise of the tone which means emphasis of
the word.
“” 1. The content between “ ” was originally spoken by
participants in English.
2. Teachers guessing students’ thinking or imitating students’
words.
c-a-r 3. Participants spelt out a word.
Appendices 333
Appendix I
Student Survey
Section 1
Pilot version of the student survey
Construct Indicators Item description
Perceptions of Regarding the GVT (the Multiple Choice Question, Cloze,
test design Sentence Completion, and Gap-filling Cloze), I think that:
characteristics
Negative v2 As long as I rote-memorise those grammar rules and vocabulary
perception of test lists, I can achieve a high score.
design
v3 The Sentence Completion and Gap-filling tasks are only testing
characteristics
students’ vocabulary spelling.
v4 The MCQ and cloze tasks are only testing students’ ability of
guessing the correct answers.
Positive v1 The language situations designed in the items are fully in line
perception of test with the real-life situations.
design
v5 I must read the whole sentence in order to answer the questions
characteristics
(for MCQ and Sentence Completion).
v6 I must read the whole passage in order to answer the questions
(for Cloze and Gap-filing cloze).
v7 I must understand the sentence context in order to answer the
questions (for MCQ and Sentence Completion).
v8 I must grasp the gist of the passage in order to answer the
questions (for Cloze and Gap-filing cloze).
v9 The four task types are testing various topics which require a
lot of background information and common sense.
v10 The four task types can test my overall ability to use language.
v11 The four task types can test my ability to use grammar and
vocabulary in different language situations.
Motivation My motivation to learn English grammar and vocabulary:
Intrinsic v12 Learning English grammar and vocabulary can greatly
motivation influence my future English study (such as study in higher
education).
v14 Learning English grammar and vocabulary is really interesting.
v15 Learning English grammar and vocabulary can further help me
to read English books or magazines, surf English websites, etc.
v16 Learning English grammar and vocabulary can greatly help me
in language communication (both oral and written
communication).
v19 Learning English grammar and vocabulary can help me make
use of various resources to understand the cultural background
of English-speaking countries.
Extrinsic v13 Learning English grammar and vocabulary is very helpful for
motivation me to get a higher test score in various language tests.
334 Appendices
v17 Learning English grammar and vocabulary is to meet the
requirement of school courses.
v18 Learning English grammar and vocabulary can help me to be
enrolled into my ideal senior high school.
v20 Learning English grammar and vocabulary can help me to
become a successful member of society.
v21 Learning English grammar and vocabulary can greatly help me
to pass any future language evaluations that may be given in my
future career.
Test anxiety In the time leading up to the GVT,
v22 My appetite was unchanged.
v23 My sleep habits were unchanged.
v24 I was confident that I could do much better than most of the
other students.
v25 I never worried that the teacher and my parents would criticise
me if I couldn’t get an ideal score.
v26 I still did NOT change my usual study habits for learning
English grammar and vocabulary.
Test preparation How many test papers did you take each week (excluding
effort the normal class time) after starting your test preparation?
Test papers taken v27 MCQ
v28 Cloze
v29 Sentence Completion
v30 Gap-filling cloze
Time investment In the time leading up to the GVT, each week I spent some
time on the following test tasks(excluding the normal class
time),
v31 MCQ
v32 Cloze
Learning In the time leading up to the GVT, I used the following
strategy strategies to learn English grammar and vocabulary,
Negative strategy v35 Focusing mainly on the language knowledge which is
frequently tested.
v36 Being reliant on supplementary learning materials (such as
grammar book, vocabulary list, and mock tests).
v37 Doing a lot of exercises and mock tests.
v38 Only rote-memorise grammar rules and vocabulary lists.
v39 Using test-wiseness strategies (such as eliminating the similar
options and avoiding using unfamiliar words or grammar rules).
Positive strategy v40 Finding suitable learning materials for myself.
v41 Keeping a record of exemplary words, sentences and
paragraphs that are useful for my future language learning
while reading.
Appendices 335
v42 Summarising and reviewing the language points on which I
often make mistakes.
v43 Reading extensively to build the sense of language
appropriateness (i.e., know what grammar structure or
vocabulary to be used in different sentences or contexts).
v44 Summarising rules for leaning grammar and vocabulary for
myself.
In the time leading up to the test, we did the following
grammar and vocabulary activities in class,
Classroom v45 We often solved problems in groups or pairs.
interaction
v46 We often just followed the English teacher’s instructions.
v47 We practiced conversations regarding various topics (not
including reading the text from the test).
v48 Our English teacher did things like nominating one student to
lead the whole class (the student taught some grammar and
vocabulary knowledge to the whole class).
v49 We performed drama and had English debates.
Involvement in v50 We were encouraged by the teacher to self-assess to identify
assessment strengths and weaknesses in grammar and vocabulary learning.
v51 We were encouraged by the teacher to assess peers’
performance on grammar and vocabulary tests to give
feedback.
v52 We were encouraged by the teacher to know and become
familiar with the grammar and vocabulary scoring rubrics in
different assessments (such as learning the scoring rubrics of
Gap-filling cloze).
v53 We were encouraged by the teacher to try to design grammar
and vocabulary test items to assess our own achievement.
v54 We were encouraged by the teacher to summarise and reflect
on our strengths and weaknesses after taking every grammar
and vocabulary test, including quiz and self-assessment.
Feedback In the time leading up to the test,
v55 The feedback on my grammar and vocabulary learning from my
English teacher was very frequent.
English teacher was very timely.
English teacher was very detailed.
v58 The feedback on my grammar and vocabulary learning from the
English teacher helped me to find my weaknesses and make
learning objectives for the next stage.
v59 As long as I acted upon the feedback on the grammar and
vocabulary learning from the English teacher seriously, I could
make progress in future.
v60 I felt very satisfied with the feedback method (such as “marking
the assignment after class—focused feedback in the next
class—individual feedback after the class”.)
336 Appendices
Learner In the time leading up to the test, I learned grammar and
autonomy vocabulary autonomously in the following ways,
v61 I preferred asking for my teacher’s or classmate’s help when I
had any grammar and vocabulary problems.
v62 I preferred solving the grammar and vocabulary problems by
myself (e.g., looking up in a dictionary, checking with the test
preparation materials about grammar rules or vocabulary
knowledge, reviewing my previous notes on incorrect test
responses).
v63 I usually tried to take every opportunity to take part in grammar
and vocabulary activities in the class, such as pair and group
discussion, to help my learning.
v64 I could make a very effective use of my free time to learn
v65 Except for the assignments required by the English teacher, I
also attended many extra-class activities to practice and learn
English grammar and vocabulary (such as practising
conversations with classmates).
Test importance The following sayings are about the test importance of the
GVT (MCQ, cloze, sentence completion, gap-filling cloze),
please choose the scale that suits you.
v66 I learned English grammar and vocabulary knowledge, because
it was tested in the exam.
v67 What are tested in the grammar and vocabulary should be what
students learn.
v68 The total score of the GVT (56 marks for all the four task types)
influences the entire score for the SHSEET directly.
v69 I think that learning the GVT can greatly influence my future
study and life.
Test difficulty To what extent is the GVT challenging to you?
v70 MCQ
v71 Cloze
Section 2
English version of student survey in the main study
Construct Indicators Item description
Demographic Have you ever completed the pilot version of this survey?
information
Are you a Grade 9 student who graduated in Chongqing this
year? Yes/No
Which test paper did you use when taking the SHSEET?
SHSEET Paper A/SHSEET Paper B
Gender: Male/Female
Your 2018 SHSEET score:
Appendices 337
Please write down the name of the junior high school you
studied. District name; School name; Class number.
Perceptions of Regarding the GVT (Multiple Choice Question, Cloze,
test design Sentence Completion, and Gap-filling Cloze tasks), I think
characteristics that:
Negative v1 It only aims to test students’ ability of rote-memorising
perception of test vocabulary and fixed collocations.
design
v2 The Sentence Completion and Gap-filling cloze tasks are only
characteristics
testing students’ spelling of vocabulary.
v3 The MCQ and Cloze tasks are only testing students’ ability of
guessing correct answers.
v4 The four task types are testing students’ ability of rote-
applying grammar rules into different sentences.
v5 The MCQ and Cloze are only testing students’ ability of
eliminating distracting options.
v6 What is tested the GVT should be the learning focus of
students.
Positive v7 I must read the whole sentence in order to answer the questions
perception of test (for MCQ and Sentence Completion tasks).
design
v8 I must read the whole passage in order to answer the questions
characteristics
(for Cloze and Gap-filing cloze tasks).
v9 I must understand the sentence context in order to answer the
questions (for MCQ and Sentence Completion tasks).
v10 I must grasp the gist of the passage in order to answer the
questions (for Cloze and Gap-filing cloze tasks).
v11 The language situations involved in the questions are absolutely
in line with real-life situations.
v12 The four task types are testing various topics which require a
lot of background information and common sense.
v13 The four task types can test my overall ability to use language.
v14 The four task types can test my ability to use grammar and
vocabulary in different language situations.
Motivation My motivation to learn English grammar and vocabulary:
Intrinsic v15 Learning English grammar and vocabulary can help me lay a
motivation foundation for future language learning.
v16 Learning English grammar and vocabulary can help me read
English books or magazines, surf English websites, etc.
v17 Learning English grammar and vocabulary can help me in
English language communication (both oral and written
communication).
v18 Learning English grammar and vocabulary can help me make
use of various resources to understand the cultural background
of English-speaking countries.
v19 Learning English grammar and vocabulary is really interesting.
Extrinsic v20 Learning English grammar and vocabulary is to meet the
motivation requirement of school courses.
338 Appendices
v21 Learning English grammar and vocabulary can help the
enrolment into my ideal senior high school.
v22 Learning English grammar and vocabulary can help me get a
higher test score in various language tests.
v23 Learning English grammar and vocabulary can help me become
a successful member of society.
v24 Learning English grammar and vocabulary can help me pass
any future language evaluations in my future career.
Test anxiety In the time leading up to the GVT,
v25 My appetite was unchanged.
v26 My sleep habits were unchanged.
v27 I was confident that I could do much better than most of the
other students and thus not afraid of comparing scores with
others.
v28 I never worried that the teacher and my parents would criticise
me if I couldn’t get an ideal score.
v29 I felt still relaxed.
Test preparation How many test papers did you take each week (excluding
effort the normal class time) after starting your test preparation?
Test papers taken v30 MCQ
v31 Cloze
Time investment In the time leading up to the GVT, each week I spent some
time on the following test tasks (excluding the normal class
time),
v34 MCQ
v35 Cloze
Learning In the time leading up to the GVT, I used the following
strategy strategies to learn English grammar and vocabulary,
Negative strategy v38 Focusing mainly on the language knowledge which is
frequently tested.
v39 Being reliant on supplementary learning materials (such as
grammar book, vocabulary list, and mock tests).
v40 Repetitively doing a lot of exercises and mock tests.
v41 Rote-memorising grammar rules and vocabulary lists.
v42 Using test-wiseness strategies (such as guessing test designers’
intentions to choose the right answers).
Positive strategy v43 Finding suitable learning materials for myself.
v44 Keeping a notebook of exemplary words, sentences and
paragraphs that were useful for my future language learning
while reading.
Appendices 339
v45 Summarising and reviewing the language points on which I
often made mistakes.
v46 Reading extensively to build the sense of language
appropriateness (i.e., know what grammar structure or
vocabulary to be used in different sentences or contexts).
v47 Summarising rules for leaning grammar and vocabulary for
myself.
In the time leading up to the GVT, we did the following
grammar and vocabulary activities in class,
Classroom v48 We often solved problems in groups or pairs.
interaction
v49 Our English teacher often used open-ended questioning to give
instruction on grammar and vocabulary (not including asking
for simple answers like “yes/no” and “right/wrong”).
v50 We practiced conversations regarding various topics (not
including reading the text from the test).
v51 Our English teacher did things like nominating one student to
lead the whole class (the student taught/explained some
grammar and vocabulary knowledge to the whole class).
v52 We had interesting learning activities such as performing drama
and having English debates.
Involvement in v53 We were encouraged by the English teacher to self-assess to
assessment identify our strengths and weaknesses in grammar and
vocabulary learning (such as checking our own homework).
v54 We were encouraged by the English teacher to assess peers’
performance on grammar and vocabulary tests to give
feedback.
v55 We were encouraged by the English teacher to know and
become familiar with the grammar and vocabulary scoring
rubrics in different assessments (such as learning the scoring
rubrics of grammar for Gap-filling cloze and writing tasks).
v56 We were encouraged by the English teacher to try to design
grammar and vocabulary test items to assess our own
achievement.
v57 We were encouraged by the English teacher to summarise and
reflect on our strengths and weaknesses after taking every
grammar and vocabulary test, including quiz and self-
assessment.
Feedback In the time leading up to the GVT,
English teacher was very frequent.
English teacher was very timely.
English teacher was very detailed.
v61 The feedback on my grammar and vocabulary learning from the
English teacher helped me to find my weaknesses and make
learning objectives for the next stage.
v62 As long as I acted upon the feedback on the grammar and
vocabulary learning from the English teacher seriously, I could
make progress in future.
340 Appendices
v63 I felt very satisfied with the feedback method (such as “marking
the assignment after class—focused feedback in the next
class—individual feedback after the class”.)
Learner In the time leading up to the GVT, I learned grammar and
autonomy vocabulary autonomously in the following ways,
v64 I actively asked for my English teacher’s or classmate’s help
when I had any grammar and vocabulary problems.
v65 I actively solved the grammar and vocabulary problems by
myself (e.g., looking up in a dictionary, checking with the test
preparation materials about grammar rules or vocabulary
knowledge, reviewing my previous notes on incorrect test
responses).
v66 I actively tried to take every opportunity to take part in grammar
and vocabulary activities in the class, such as pair and group
discussion, to help me to learn better.
v67 I could make a very effective use of my free time to learn
v68 In addition to the assignments required by the English teacher,
I also attended many extra-curricular activities to practice and
learn English grammar and vocabulary (such as practising
conversations with classmates).
Test importance Regarding the test importance of the GVT (MCQ, cloze,
Sentence Completion, Gap-filling cloze), please choose the
importance scale that suits you.
v69 Junior high school graduation
v70 senior high school enrolment
v71 proving English grammar and vocabulary proficiency
v72 developing English language use ability
v73 helping future English learning
Test difficulty To what extent is the GVT challenging to you?
v74 MCQ
v75 Cloze
Section 3
Chinese version of the student survey in the main study
中考英语语法与词汇测试的反拨效应研究
同学你好！我们正在进行一项关于中考英语语法和词汇测试（包括单项选
择、完形填空、完成句子和短文填空四种题型）的匿名问卷调查，目的是为了
了解重庆市 2018 届初中毕业生的初三英语学习情况。请认真阅读以下题目，
并根据自己的实际情况和真实想法作答。答案没有对错之分，对你提供的所有
Appendices 341
Appendix J
Independent Samples T-test Results of the Main Study
Table 1
Independent samples T-test results of paper survey (N=500) and online survey (N=422)
Indicators t df Sig. (2-tailed) r

v4 -2.749 916.723 .006 .09
v6 -3.632 914.747 .000 .12
v7 -1.994 915.329 .046 .07
v9 2.043 919.989 .041 .07
v10 2.583 919.973 .010 .08
v12 5.386 917.681 .000 .18
v13 5.124 910.088 .000 .17
v14 3.930 909.329 .000 .13
v17 2.105 918.060 .036 .07
v18 2.545 919.199 .011 .08
v19 4.655 905.845 .000 .15
v22 2.755 918.784 .006 .09
v23 3.961 919.866 .000 .13
v24 2.006 918.883 .045 .07
v31 -3.083 919.649 .002 .10
v33 -3.106 919.824 .002 .10
v35 -2.560 919.917 .011 .08
v52 2.826 911.798 .005 .09
v55 2.086 919.391 .037 .07
v59 7.268 909.074 .000 .23
v60 5.066 916.472 .000 .17
v61 5.380 918.075 .000 .17
v62 4.056 913.688 .000 .13
v63 5.136 914.602 .000 .17
v64 4.484 915.094 .000 .15
v65 3.671 919.117 .000 .12
v66 8.083 916.695 .000 .26
v68 6.448 910.405 .000 .21
v69 2.771 909.142 .006 .09
v70 2.649 918.163 .008 .09
v71 5.793 919.942 .000 .19
v72 3.945 916.028 .000 .13
v73 4.458 906.224 .000 .15
v77 3.946 919.240 .000 .13
Appendices 347
Table 2
Independent Samples T-test results of SHSEET Paper A (N=805) and Paper B (N=115)17
Indicators t df Sig. (2-tailed) r

v12 -3.698 172.856 .000 .27
v13 -2.827 167.261 .005 .21
v14 -2.127 176.948 .035 .16
v19 -3.072 167.432 .002 .23
v61 -3.212 156.348 .002 .25
v63 -2.582 157.667 .011 .20
17
Two participants did not respond to this item of “test paper type”.
348 Appendices
Appendix K
Descriptive Statistics of Indicators in the Main Study Instrument
Std.
N Mean Skewness Kurtosis
Constructs Deviation
& Indicators Std. Std. Std.
Statistic Statistic Statistic Statistic Statistic
Error Error Error
v1 922 2.534 .0347 1.0550 .427 .081 -.355 .161
v2 922 2.288 .0354 1.0762 .781 .081 -.011 .161
perception
v3 922 2.092 .0341 1.0363 .929 .081 .420 .161

Negative
v4 922 2.317 .0345 1.0490 .658 .081 -.065 .161

v5 922 2.537 .0346 1.0517 .393 .081 -.421 .161
v6 922 2.403 .0376 1.1419 .520 .081 -.539 .161
v7 922 3.192 .0364 1.1052 -.158 .081 -.696 .161
v8 922 3.134 .0367 1.1148 -.058 .081 -.770 .161
Positive perception
v9 922 3.556 .0327 .9927 -.530 .081 -.066 .161

v10 922 3.655 .0322 .9784 -.596 .081 .068 .161
v11 922 3.058 .0302 .9179 -.013 .081 .128 .161
v12 922 3.454 .0319 .9698 -.345 .081 -.188 .161
v13 922 3.595 .0320 .9724 -.614 .081 .198 .161
v14 922 3.652 .0319 .9700 -.601 .081 .210 .161
v15 922 4.157 .0258 .7842 -.987 .081 1.701 .161
v16 922 4.062 .0290 .8802 -1.126 .081 1.710 .161
motivation
Intrinsic
v17 922 4.032 .0292 .8867 -.998 .081 1.250 .161

v18 922 3.922 .0296 .8978 -.793 .081 .650 .161
v19 922 3.338 .0361 1.0955 -.357 .081 -.311 .161
v20 922 2.862 .0355 1.0766 .153 .081 -.547 .161
v21 922 3.726 .0288 .8748 -.587 .081 .450 .161
motivation
Extrinsic
v22 922 3.724 .0287 .8729 -.570 .081 .490 .161

v23 922 3.554 .0310 .9402 -.353 .081 .027 .161
v24 922 3.727 .0292 .8856 -.545 .081 .290 .161
v25 922 3.637 .0355 1.0782 -.552 .081 -.308 .161
v26 922 3.530 .0351 1.0653 -.355 .081 -.563 .161
Anxiety
v27 922 3.208 .0371 1.1258 -.068 .081 -.810 .161

v28 922 2.888 .0367 1.1133 .260 .081 -.643 .161
v29 922 3.198 .0367 1.1145 -.029 .081 -.713 .161
v30 922 2.545 .0352 1.0677 .711 .081 .028 .161
v31 922 2.595 .0353 1.0725 .575 .081 -.196 .161
investment papers
taken
v32 922 2.492 .0359 1.0897 .712 .081 -.061 .161

Test
v33 922 2.720 .0385 1.1692 .464 .081 -.584 .161

v34 922 2.468 .0311 .9440 .853 .081 .767 .161
v35 922 2.663 .0325 .9882 .520 .081 .048 .161
Time
v36 922 2.406 .0315 .9571 .869 .081 .689 .161

v37 922 2.886 .0372 1.1305 .289 .081 -.633 .161
v38 922 3.519 .0350 1.0631 -.528 .081 -.325 .161
v39 922 2.882 .0346 1.0510 .091 .081 -.531 .161
Negative
strategy
v40 922 2.776 .0361 1.0974 .191 .081 -.653 .161

v41 922 2.671 .0383 1.1625 .304 .081 -.721 .161
v42 922 3.154 .0366 1.1122 -.164 .081 -.645 .161
v43 922 3.190 .0367 1.1130 -.243 .081 -.654 .161
v44 922 3.007 .0376 1.1405 .053 .081 -.788 .161
strategy
Interac Positive
v45 922 3.055 .0371 1.1258 .006 .081 -.735 .161

v46 922 2.932 .0370 1.1222 .092 .081 -.727 .161
v47 922 3.003 .0378 1.1486 -.002 .081 -.808 .161
v48 922 3.056 .0369 1.1204 -.087 .081 -.731 .161
v49 922 3.519 .0343 1.0425 -.448 .081 -.318 .161
tion
v50 922 3.332 .0340 1.0311 -.324 .081 -.361 .161
Appendices 349
v51 922 3.146 .0377 1.1454 -.094 .081 -.780 .161
v52 922 2.694 .0381 1.1577 .267 .081 -.693 .161
v53 922 3.328 .0366 1.1105 -.264 .081 -.636 .161
Involvement
assessment
v54 922 2.999 .0386 1.1710 -.095 .081 -.812 .161
v55 922 3.656 .0340 1.0326 -.512 .081 -.272 .161
v56 922 3.012 .0397 1.2048 -.083 .081 -.845 .161
in
v57 922 3.838 .0351 1.0647 -.700 .081 -.200 .161

v58 922 3.245 .0306 .9280 -.139 .081 .207 .161
v59 922 3.503 .0305 .9254 -.486 .081 .312 .161
v60 922 3.527 .0309 .9390 -.433 .081 .213 .161
Feedback
v61 922 3.526 .0306 .9286 -.365 .081 .213 .161

v62 922 3.397 .0307 .9326 -.297 .081 .286 .161
v63 922 3.574 .0310 .9398 -.393 .081 .199 .161
v64 922 3.280 .0374 1.1352 -.303 .081 -.571 .161
v65 922 3.308 .0359 1.0915 -.332 .081 -.471 .161
autonomy
Learner
v66 922 3.031 .0366 1.1102 -.075 .081 -.678 .161

v67 922 3.051 .0350 1.0629 -.059 .081 -.496 .161
v68 922 2.761 .0388 1.1774 .136 .081 -.837 .161
v69 922 4.068 .0318 .9662 -1.084 .081 1.103 .161
importance
v70 922 4.263 .0284 .8627 -1.199 .081 1.473 .161

v71 922 3.803 .0322 .9778 -.665 .081 .300 .161
v72 922 3.937 .0322 .9766 -.875 .081 .666 .161
Test
v73 922 4.061 .0314 .9542 -1.070 .081 1.135 .161

v74 922 2.887 .0376 1.1402 .147 .081 -.529 .161
difficulty
v75 922 3.343 .0339 1.0306 -.314 .081 -.115 .161

v76 922 2.951 .0399 1.2102 .043 .081 -.808 .161
Test
v77 922 4.031 .0317 .9625 -.991 .081 .965 .161
350 Appendices
Appendix L
Summary of Factor Analysis Results for Main Study Instrument
Table 1
Factor analysis results for the measurable constructs in the main study
Construct No. of Total KMO SRMR RMSEA 90% TLI CFI

factors variance CI
extracted explained
Perception of test 3 54.064% .825 .073 .078 .068, .917 .932
design .087
characteristics
Motivation18 2 62.578% .891 .045 .102 .085, .937 .957
.121
Anxiety19 1 50.922% .770 .026 .124 .074, .947 .982
.181
Learning 2 59.904% .804 .048 .056 .033, .977 .986
strategy20 .080
Learning Oriented 4 60.522% .915 .050 .081 .073, .908 .924
Assessment 21 .088
Test importance22 2 78.133% .716 .029 .121 .085, .951 .980
.161
Note.
1. The final deletion included two motivation variables (v19, v20), one anxiety variable (v26),
three learning strategy variables (v38, V42, v43); two interaction variables (v49 and v50);
and two involvement in assessment variables (v54 and v56). Ten in total.
2. In this factor analysis result table, all the statistics were the results after deletion of items in
each scale.
18
v19 and v20 were removed from the scale.
19
v26 was removed from the scale.
20
v38, v42 and v43 were removed from the scale.
21
v49, v50, v54, and v56 were removed from the scale.
22
This is a two-factor solution, one factor includes “v69 v70”, and the other one composes “v71, v72
and v73”.
Appendices 351
Table 2
Assessment of normality for indicators within the washback mechanism model (N=922)
Variable min max skew c r. kurtosis c r.

Test Score 21.000 149.000 -1.382 -17.130 .928 5.752
v78 1.000 5.000 .365 4.531 .185 1.144
v71 1.000 5.000 -.664 -8.230 .292 1.809
v72 1.000 5.000 -.874 -10.829 .656 4.067
v73 1.000 5.000 -1.068 -13.237 1.122 6.956
v69 1.000 5.000 -1.083 -13.420 1.090 6.759
v70 1.000 5.000 -1.197 -14.835 1.459 9.040
v47 1.000 5.000 -.002 -.026 -.810 -5.018
v46 1.000 5.000 .092 1.142 -.729 -4.522
v45 1.000 5.000 .006 .080 -.738 -4.572
v44 1.000 5.000 .053 .658 -.790 -4.897
v39 1.000 5.000 .091 1.124 -.535 -3.315
v40 1.000 5.000 .191 2.363 -.656 -4.068
v41 1.000 5.000 .303 3.758 -.723 -4.483
v25 1.000 5.000 -.551 -6.830 -.312 -1.936
v27 1.000 5.000 -.068 -.841 -.813 -5.036
v28 1.000 5.000 .260 3.218 -.646 -4.004
v29 1.000 5.000 -.029 -.358 -.715 -4.435
v21 1.000 5.000 -.586 -7.264 .441 2.736
v22 1.000 5.000 -.569 -7.060 .481 2.981
v23 1.000 5.000 -.353 -4.374 .021 .129
v24 1.000 5.000 -.544 -6.741 .282 1.749
v15 1.000 5.000 -.986 -12.218 1.686 10.447
v16 1.000 5.000 -1.125 -13.941 1.694 10.499
v17 1.000 5.000 -.997 -12.356 1.237 7.667
v18 1.000 5.000 -.791 -9.809 .640 3.964
v14 1.000 5.000 -.600 -7.436 .202 1.253
v13 1.000 5.000 -.613 -7.604 .191 1.182
v12 1.000 5.000 -.345 -4.275 -.194 -1.202
v11 1.000 5.000 -.013 -.164 .120 .746
v10 1.000 5.000 -.595 -7.375 .061 .380
v9 1.000 5.000 -.529 -6.563 -.072 -.446
v8 1.000 5.000 -.058 -.722 -.772 -4.785
v7 1.000 5.000 -.157 -1.950 -.699 -4.331
v6 1.000 5.000 .519 6.439 -.543 -3.363
v5 1.000 5.000 .393 4.868 -.425 -2.637
v4 1.000 5.000 .657 8.140 -.071 -.442
v3 1.000 5.000 .928 11.503 .411 2.548
v2 1.000 5.000 .779 9.661 -.017 -.106
v1 1.000 5.000 .426 5.282 -.360 -2.231
352 Appendices
Table 3
Assessment of normality for indicators within the construct LOA practices (N=488)
Variable min max skew c.r. kurtosis c.r.

v68 1.000 5.000 .141 1.267 -.813 -3.667
v67 1.000 5.000 -.058 -.519 -.496 -2.235
v66 1.000 5.000 -.107 -.962 -.732 -3.302
v65 1.000 5.000 -.286 -2.578 -.575 -2.594
v64 1.000 5.000 -.358 -3.228 -.520 -2.347
v63 1.000 5.000 -.428 -3.862 .417 1.880
v62 1.000 5.000 -.268 -2.414 .263 1.186
v61 1.000 5.000 -.312 -2.811 .194 .876
v60 1.000 5.000 -.383 -3.450 .137 .618
v59 1.000 5.000 -.449 -4.046 .269 1.215
v58 1.000 5.000 -.152 -1.375 .193 .868
v57 1.000 5.000 -.679 -6.126 -.139 -.628
v55 1.000 5.000 -.516 -4.651 -.256 -1.154
v53 1.000 5.000 -.328 -2.962 -.586 -2.642
v52 1.000 5.000 .244 2.197 -.679 -3.062
v51 1.000 5.000 -.161 -1.456 -.793 -3.578
v48 1.000 5.000 -.142 -1.281 -.744 -3.354
Appendices 353
Appendix M
Qualitative Results to RQ1a
Washback value-Macro level

Understanding • all three teachers agreed on the learning-oriented characteristic of ECSCE and
and use of the Test Specifications
ECSCE and Test • although every teacher was familiar with teaching and assessment principles,
Specifications only Lan implemented those in test preparation stage
• all three teachers agreed that test design of the GVT reflected learning
objectives in the ECSCE
• although Lan was positive in curriculum principles implementation, the other
two teachers expressed difficulties in Grade 9
• although all three teachers used Test Specifications as their teaching reference,
only Zhang especially emphasised its importance to her and her students
• students learned grammar and vocabulary by referring to what was required by
the ECSCE or Test Specifications during test preparation
• students interpreted the test score weight and test purpose of certain tasks in
SHSEET through referring to the ECSCE or Test Specifications requirement
Washback value-Micro level
Perceptions of • authenticity: Some participants felt GVT tasks of MCQ and Sentence
GVT design Completion lacked authentic language, while others thought that the topics and
characteristics language involved in GVT tasks were relevant to real life;
• provision of context: Some participants felt that there was insufficient context
in test items of MCQ and Sentence Completion, while others considered the test
items in Cloze and Gap-filling cloze as providing a rich context;
• test method: Some participants regarded that GVT items of MCQs were
guessable and MCQ as well as Sentence Completion tested rote-memorisation,
while others acknowledged that those tasks tested a wide range of language
knowledge. Moreover, the test method resulted in unchanged test content in GVT
task;
• assessing language use: Some participants criticised that the overall ability to
use language was not tested in GVT tasks of MCQ and Sentence Completion, but
the others perceived that Cloze and Gap-filling cloze more effectively tested
language use.
Affective factors • test anxiety: different modes of anxiety, but Hu and Zhang were the most
anxious teachers with regard to the SHSEET; students from School C were most
anxious, Yi Zhen influenced students’ test anxiety
• intrinsic motivation: high-achieving students and students from School A and
Na-SB were mainly intrinsically motivated in English grammar and vocabulary
learning
• extrinsic motivation: both students from School A and School B were
extrinsically motivated in English grammar and vocabulary learning
Test preparation • test-based textbooks
materials • test review coaching books
• grammar and vocabulary lists in Test Specifications and the ECSCE
• self-selected test preparation materials such as mock test papers
• none-exam oriented materials as used by some School A students
Teaching Test-use oriented grammar and Language-use oriented grammar and
methods/learning vocabulary learning strategies vocabulary learning strategies
strategies using test-wiseness strategies; taking notes; transfer; elaboration;
selective attention; rote- repetition; spelling; summarisation;
memorisation; drilling; anticipating translation; using word association; reading
challenges that students may extensively to accumulate language
encounter knowledge; identifying and solving
language learning problems
354 Appendices

Ruijin Yang Thesis

Uploaded by

Copyright:

Available Formats

You might also like

Ruijin Yang Thesis

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Ruijin Yang Thesis

Uploaded by

Copyright:

Available Formats

GRAMMAR AND VOCABULARY TESTING

IN THE SENIOR HIGH SCHOOL ENTRANCE

Submitted in fulfilment of the requirements for the degree of

School of Teacher Education and Leadership

English Curriculum Standards for Compulsory Education (ECSCE), English as

High-stakes standardised English tests may impact significantly on

In relation to the LOA opportunities and challenges, various perceptions and

Figure 1.1. The structure of curriculum objectives in the English Curriculum

Table 1.1 Four types of academic junior high schools ................................................ 4

AfL Assessment for Learning

CEFR Common European Framework of Reference for languages

CET College English Test

CFA Confirmatory Factor Analysis

CFI Comparative Fit Index

CLT communicative language teaching

CNKI China National Knowledge Infrastructure

COET Computerised Oral English Test

EAP English for Academic Purposes

ECSCE English Curriculum Standards for Compulsory Education

EFA Exploratory Factor Analysis

EFL English as a Foreign Language

ESL English as a Second Language

GEPT General English Proficiency Test

GP cloze Gap-filling cloze

GSEEE Graduate School Entrance English Examination

GVT the Grammar and Vocabulary Test in the SHSEET

IB CET-4 Internet-based College English Test (Band 4)

IELTS International English Language Testing System

IFI Incremental Fit Index

LOA Learning Oriented Assessment

LOLA Learning Oriented Language Assessment

MCA Multiple Correspondence Analysis

MCQ Multiple Choice Question

MMR mixed methods research

MOE Ministry of Education

NFI Normed Fit Index

NME National Matriculation Examination

NMET National Matriculation English Test

NPEE National Postgraduate Entrance Examination

PEP People’s Education Press

RFI Relative Fit Index

RMSEA Root Mean Square Error of Approximation

SEM Structural Equation Modelling

SHSEE Senior High School Entrance Examination

SHSEET Senior High School Entrance English Test

SMC Square multiple correlations

TEM (-8) Test for English Majors (Band 8)

TLI Tucker-Lewis Index

TLU target language use

TOEFL iBT TOEFL internet-based test

VKS Vocabulary Knowledge Scale

VLT Vocabulary Levels Test

WAT Word Associates Test

ZDP zone of proximal development

Signature: OUT Verified Signature

Date: 14- /o'd / ")-0),o

Under the sponsorship of China Scholarship Council and Queensland University

First and foremost, I would like to express my special appreciation to my

Thirdly, my sincere thanks are given to my friends and officemates who