Ruijin Yang Thesis
Ruijin Yang
BA (English for Medical Purposes)
MA (Foreign Linguistics and Applied Linguistics)
Grammar and Vocabulary Testing in the Senior High School Entrance English Test in China: A Washback Study
from A Learning Oriented Assessment Perspective
Abstract
The exploratory sequential mixed methods research (MMR) design in this study
comprised a qualitative phase of 15 classroom observations, three semi-structured
teacher interviews, and three student focus groups, and a quantitative phase of a
student survey (N = 922). Both qualitative and quantitative data were collected from
Grade 9 teachers and students and were combined to address the research questions.
The findings of this study revealed the complexity of washback phenomena in the
context of the Grammar and Vocabulary Test (GVT) in the Senior High School
Entrance English Test (SHSEET).
Regarding washback value, both positive and negative washback were identified.
At the macro level of washback value, negative washback outweighed positive
washback, since teachers tended not to implement the principles of the English
Curriculum Standards for Compulsory Education (ECSCE) in Grade 9, and a
narrowing of the curriculum was evident in GVT preparation. At the micro level, it was
found that the GVT exerted both positive and negative washback on teaching and
learning in various aspects, including participants’ perceptions of test design
characteristics, affective factors, test preparation materials, and grammar and
vocabulary learning strategies. As for washback intensity, qualitative findings revealed
both in-class and extra-curricular GVT preparation practices. Quantitative results
gained from Multiple Correspondence Analysis (MCA) identified four different
patterns of washback intensity of GVT preparation. In further investigation of the
washback mechanism through Structural Equation Modelling (SEM), results showed
that students’ perceptions of test design characteristics and test importance indirectly
influenced their test preparation through the affective factors of test anxiety and
intrinsic motivation. In turn, students’ test preparation appeared to be associated with
their self-reported SHSEET scores.
This study is significant from multiple perspectives. At the theoretical level, the
study suggests that Learning Oriented Assessment (LOA) theories can be applied in
washback studies, especially in exploring positive washback in high-stakes
standardised EFL test preparation contexts. Methodologically, this MMR study is, to
the best of the researcher’s knowledge, the first to combine thematic analysis with
advanced statistical modelling (MCA, SEM, and CFA) and a robust survey design in
a single washback study. As a contribution to practice, suggestions and implications
for promoting positive washback and LOA practices during test preparation are
provided to inform in-service teacher training, Grade 9 teaching and learning, and the
GVT test design. The study contributes to research on reconciling the tension between
assessment, teaching, and learning in a summative assessment context.
Table of Contents
Keywords .................................................................................................................................. i
Abstract .................................................................................................................................... ii
Table of Contents .................................................................................................................... iv
List of Figures ....................................................................................................................... viii
List of Tables ............................................................................................................................ x
List of Abbreviations .............................................................................................................. xii
Statement of Original Authorship ......................................................................................... xiv
Acknowledgements ................................................................................................................ xv
Chapter 1: Introduction ...................................................................................... 1
1.1 Background .................................................................................................................... 1
1.2 Context ........................................................................................................................... 2
1.2.1 The Education system in China ........................................................................... 3
1.2.2 English as a Foreign Language Education in China and the English
Curriculum Standards for Compulsory Education ............................................... 6
1.2.3 The role of standardised English exams in China .............................................. 11
1.2.4 Overview of the SHSEET .................................................................................. 12
1.2.5 Grammar and Vocabulary Testing in the SHSEET ........................................... 14
1.3 Aims of the Study ........................................................................................................ 16
1.3.1 Research objective ............................................................................................. 16
1.3.2 Research questions............................................................................................. 16
1.4 Significance of the Research ........................................................................................ 17
1.5 Thesis Outline .............................................................................................................. 17
Chapter 2: Literature Review ........................................................................... 19
2.1 Standardised English Language Tests .......................................................................... 19
2.1.1 High-stakes standardised English language tests in the international
context................................................................................................................ 20
2.1.2 High-stakes standardised English language tests in China ................................ 21
2.1.3 Empirical studies of the SHSEET ...................................................................... 22
2.1.4 Section summary................................................................................................ 24
2.2 English Grammar and Vocabulary Testing .................................................................. 24
2.2.1 English grammar testing .................................................................................... 25
2.2.2 English vocabulary testing ................................................................................. 28
2.2.3 The testing of English grammar and vocabulary ............................................... 31
2.2.4 Section summary................................................................................................ 33
2.3 Washback ..................................................................................................................... 33
2.3.1 Washback concepts and dimensions .................................................................. 34
2.3.2 Working towards positive washback ................................................................. 37
2.3.3 A new approach to positive washback: Learning Oriented Assessment............ 38
2.3.4 Washback and stakeholders ............................................................................... 39
2.3.5 Washback of high-stakes standardised English tests ......................................... 41
2.3.6 Section summary................................................................................................ 51
2.4 Summary and Implications ...........................................................................................51
Chapter 3: Theoretical Framework ................................................................. 53
3.1 Key Developments in Washback Theorisation .............................................................53
3.1.1 Washback Hypothesis ........................................................................................53
3.1.2 A curriculum innovation model..........................................................................55
3.1.3 Washback models of learning and teaching .......................................................55
3.2 A Washback Model Incorporating Intensity and Direction ..........................................57
3.3 Washback Mechanism ..................................................................................................61
3.4 Key Learning Oriented Assessment Frameworks ........................................................63
3.4.1 The LOA framework of Carless .........................................................................64
3.4.2 The LOA cycle developed by Cambridge English Language Assessment ........66
3.4.3 The LOA cycle in the SHSEET context .............................................................69
3.5 A New Washback Model Incorporating LOA ..............................................................71
3.6 Chapter Summary .........................................................................................................72
Chapter 4: Methodology.................................................................................... 75
4.1 The Methodological Review of Washback Studies ......................................................75
4.2 Mixed Methods Research .............................................................................................77
4.3 An Exploratory Sequential Mixed Methods Research Design .....................................79
4.4 Qualitative Phase ..........................................................................................................82
4.4.1 Site selection.......................................................................................................82
4.4.2 Participants .........................................................................................................84
4.4.3 Classroom observations ......................................................................................85
4.4.4 Interviews ...........................................................................................................87
4.4.5 Transcription and translation ..............................................................................92
4.4.6 Thematic analysis ...............................................................................................93
4.4.7 Validity and reliability........................................................................................95
4.5 Quantitative Phase ........................................................................................................98
4.5.1 Instrument design and development ...................................................................99
4.5.2 Pilot study .........................................................................................................101
4.5.3 Main study ........................................................................................................104
4.6 Ethical Considerations ................................................................................................118
4.7 Chapter Summary .......................................................................................................119
Chapter 5: Test Preparation: Washback Value ............................................ 121
5.1 Understanding and Use of Official Test Reference Documents .................................122
5.1.1 Understanding the role of official test reference documents ............................123
5.1.2 Implementing the principles in the ECSCE and using the Test
Specifications as test preparation reference......................................................125
5.2 Perceptions of Test Design Characteristics ................................................................129
5.2.1 Authenticity ......................................................................................................131
5.2.2 Provision of context..........................................................................................133
5.2.3 Test method ......................................................................................................134
5.2.4 Assessing language use ....................................................................................138
5.2.5 Perceptions of GVT design characteristics as measured in the student
survey ...............................................................................................................141
5.3 Affective Factors ........................................................................................................144
5.3.1 Test anxiety ......................................................................................................145
5.3.2 Intrinsic motivation .......................................................................................... 148
5.3.3 Extrinsic motivation......................................................................................... 149
5.4 Test Preparation Materials ......................................................................................... 152
5.4.1 Exam-oriented test preparation materials ........................................................ 152
5.4.2 Non-exam oriented learning materials ............................................................. 157
5.5 Grammar and Vocabulary Learning Strategies .......................................................... 158
5.5.1 Test-use oriented grammar and vocabulary learning strategies ....................... 160
5.5.2 Language-use oriented grammar and vocabulary learning strategies .............. 167
5.6 Chapter Summary ...................................................................................................... 170
Chapter 6: Test Preparation: Washback Intensity ....................................... 173
6.1 Perception of Test Importance ................................................................................... 174
6.1.1 The GVT is perceived as highly important ...................................................... 175
6.1.2 The GVT is perceived as relatively unimportant ............................................. 177
6.1.3 Perceptions of test importance as measured in the student survey .................. 178
6.2 Perception of Test Difficulty...................................................................................... 179
6.3 Test Preparation Effort ............................................................................................... 182
6.3.1 In-class test preparation effort ......................................................................... 182
6.3.2 Extra-curricular test preparation effort ............................................................ 184
6.3.3 Test preparation effort as tested in the student survey ..................................... 185
6.4 Washback Intensity: Multiple Correspondence Analysis .......................................... 187
6.5 Washback Mechanism: Structural Equation Modelling............................................. 192
6.6 Chapter Summary ...................................................................................................... 199
Chapter 7: The Incorporation of LOA Principles: Opportunities and
Challenges in GVT Preparation............................................................................ 203
7.1 Beliefs about Opportunities for the Incorporation of LOA Principles in GVT
Preparation ........................................................................................................................... 203
7.1.1 Alignment with students’ EFL learning stage ................................................. 206
7.1.2 Developing communication abilities in real life .............................................. 207
7.1.3 Developing students’ learning skills in general ............................................... 208
7.1.4 Learning-oriented test design........................................................................... 209
7.1.5 Transferring language knowledge into performance on macroskills ............... 209
7.1.6 The level of challenge ...................................................................................... 210
7.2 Identifiable LOA Strategies and Activities ................................................................ 211
7.2.1 Interactive classroom activities ........................................................................ 212
7.2.2 Feedback .......................................................................................................... 219
7.2.3 Learning-oriented strategies ............................................................................ 223
7.3 Learning Oriented Assessment as a Dynamic Multidimensional Construct in GVT
Preparation ........................................................................................................................... 232
7.4 Learning Oriented Assessment Practices in GVT Preparation and Student Test
Performance ......................................................................................................................... 235
7.5 Challenges of the Incorporation of LOA Principles in GVT Preparation .................. 236
7.5.1 Efficient use of class time ................................................................................ 237
7.5.2 The consideration of high test stakes ............................................................... 239
7.5.3 Administrative influence.................................................................................. 241
7.5.4 Student language proficiency........................................................................... 242
7.5.5 Class size ......................................................................................................... 244
7.5.6 The concern over teaching performance .......................................................... 245
7.5.7 Limited teaching experiences and expertise .....................................................245
7.5.8 Test method ......................................................................................................246
7.6 Chapter Summary .......................................................................................................248
Chapter 8: Discussion and Conclusion .......................................................... 251
8.1 Discussion ...................................................................................................................251
8.1.1 Washback value ................................................................................................251
8.1.2 Washback intensity...........................................................................................259
8.1.3 Washback mechanism ......................................................................................262
8.1.4 LOA opportunities and challenges ...................................................................264
8.1.5 Section summary ..............................................................................................269
8.2 Contributions and Implications...................................................................................270
8.2.1 Theoretical contributions ..................................................................................270
8.2.2 Methodological contributions ...........................................................................272
8.2.3 Implications for practice ...................................................................................273
8.2.4 Section summary ..............................................................................................278
8.3 Reflection....................................................................................................................279
8.4 Limitations ..................................................................................................................280
8.5 Future Directions ........................................................................................................281
8.6 Overall Conclusions of the Study ...............................................................................284
Bibliography ........................................................................................................... 287
Appendices .............................................................................................................. 313
Appendix A Language Knowledge Requirement at Level 5 in the ECSCE .........................313
Appendix B Test Item Examples of the GVT from the Authentic 2018 SHSEET Paper
(Chongqing, Paper A) ...........................................................................................................314
Appendix C Table of Empirical Washback Studies of High-stakes Standardised English
Tests in the International Context .........................................................................................319
Appendix D Table of Empirical Washback Studies of High-stakes Standardised English
Tests in China .......................................................................................................................322
Appendix E Classroom Observation Scheme .......................................................................325
Appendix F Semi-structured Interview Protocol ..................................................................326
Appendix G Focus Group Interview Protocol ......................................................................330
Appendix H Transcription Symbols Used in This Study (adapted from Powers (2005)) .....333
Appendix I Student Survey ...................................................................................................334
Appendix J Independent Samples T-test Results of the Main Study ....................................347
Appendix K Descriptive Statistics of Indicators in the Main Study Instrument ...................349
Appendix L Summary of Factor Analysis Results for Main Study Instrument ....................351
Appendix M Qualitative Results to RQ1a ............................................................................354
Appendix N Ethics Form for the Online Student Survey .....................................................355
List of Figures
Figure 6.5. Structural model for the relationship within GVT washback
mechanism (N=922) .................................................................................. 194
Figure 6.6. The structural relationships within the measurement model of the
GVT washback mechanism ....................................................................... 199
Figure 6.7. The GVT washback model on teaching and learning ........................... 202
Figure 7.1. Structural model for the relationship within LOA practices in GVT
preparation (N=488) .................................................................................. 233
Figure 8.1. LOA dynamic in the GVT context ........................................................ 268
Figure 8.2. Washback model of the GVT ................................................................ 270
List of Tables
Table 5.3 Indicators of test anxiety in the GVT context (see instrument
reliability and validity in section 4.5.3) ..................................................... 147
Table 5.4 Indicators of intrinsic motivation (see instrument reliability and
validity in section 4.5.3) ............................................................................. 149
Table 5.5 Indicators of extrinsic motivation (see instrument reliability and
validity in section 4.5.3) ............................................................................. 151
Table 5.6 Indicators of test-use oriented learning strategies (see instrument
reliability and validity in section 4.5.3) ..................................................... 166
Table 5.7 Indicators of language-use oriented learning strategies (see
instrument reliability and validity in section 4.5.3) ................................... 169
Table 6.1 Indicators of test importance (see instrument reliability and validity
in section 4.5.3) .......................................................................................... 178
Table 6.2 GVT task types and perceptions of test difficulty (see instrument
reliability and validity in section 4.5.3) ..................................................... 181
Table 6.3 Number of test papers taken for GVT tasks (see instrument reliability
and validity in section 4.5.3) ...................................................................... 185
Table 6.4 Time spent on preparing for GVT tasks (see instrument reliability
and validity in section 4.5.3) ...................................................................... 186
Table 6.5 Model summary of washback intensity .................................................... 187
Table 6.6 Discrimination of variables for the dimensions ....................................... 188
Table 6.7 Standardised path coefficients of the structural model of the GVT
washback mechanism ................................................................................. 195
Table 6.8 Qualitative results to RQ1b ..................................................................... 200
Table 7.1 Indicators of classroom interaction (see instrument reliability and
validity in section 4.5.3) ............................................................................. 218
Table 7.2 Indicators of feedback (see instrument reliability and validity in
section 4.5.3) .............................................................................................. 222
Table 7.3 Indicators of learner autonomy (see instrument reliability and
validity in section 4.5.3) ............................................................................. 228
Table 7.4 Indicators of involvement in assessment (see instrument reliability
and validity in section 4.5.3) ...................................................................... 231
Table 7.5 Correlation coefficients between SHSEET score and LOA practices
(N=922) ..................................................................................................... 235
Table 7.6 Qualitative findings of RQ2 ..................................................................... 249
List of Abbreviations
Ha Alternative Hypothesis
H0 Null Hypothesis
KMO Kaiser-Meyer-Olkin
L1 first language
L2 second language
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Acknowledgements
Secondly, I would also like to take this great opportunity to thank my participants
and those who helped me with data collection. I have learned much more about
junior high school teaching and learning from your kind suggestions. Thanks to my
three participant teachers; without you, this study could not have proceeded smoothly. Special
thanks should go to Zhang (pseudonym) who always answered my questions during
thesis writing and provided me with important information. You are like a “big sister”
of mine and I feel grateful for that. Thank you to the English inspectors who helped
me contact schools and my survey participants who contributed their time to my study.
Thanks for your great support!
Next, I would like to thank the Faculty of Education at QUT, especially for
offering me the great opportunity to go to the University of Calgary with excellent
professors, Professor Suzanne Carrington and Professor Karen Dooley; and my
colleagues and friends, Bridget, Ayomi, and Jonathan. It was a lovely and
unforgettable trip and I will always remember these happy moments with you in
Canada. Thank you, my faculty, for your culture, numerous workshops, and the
academic environment in which I have learned a lot and been able to improve myself.
families. Besides, I would like to thank my in-laws, thanks very much for your kind
and generous support and care for Yiyang. Without you, I could hardly feel so
determined to continue my PhD study. To my husband Yuefei Geng (耿跃飞), thanks
for our love and support for each other over the past fourteen years, and for respecting
my choices and supporting my decisions. My PhD was completed with both your and
Yiyang’s moral support. Moreover, thanks to my own parents and siblings, who
supported and encouraged me all the time. Special thanks go to my father, who
motivated and inspired me to pursue a PhD degree when I started my undergraduate
study. Finally, I also owe many thanks to my grandmother, who gave me great
care and the warmest love in my childhood. You supported me mentally when I felt I
was going to collapse. Thanks to all my family; this achievement belongs to all of you!
This PhD study has given me a lot. As my thesis title says, I have gone through a
learning-oriented PhD study project and I would like to keep learning and being
learning-oriented in my future life and profession. Thank you, Ruijin, thanks for being
learning-oriented for all those years and thanks for your hard work to make this PhD
dream come true!
Chapter 1: Introduction
This study investigates the positive and negative washback of the grammar and
vocabulary testing in the Senior High School Entrance English Test (SHSEET) from
a Learning Oriented Assessment (LOA) perspective in junior high schools in China.
To begin with, washback refers to the influence of a test on teaching (McNamara, 1996), on learning (Shohamy et al., 1996), or on both teaching and learning (Bailey, 1996). Generally,
when the Grammar and Vocabulary Test in the SHSEET (referred to as the GVT from
this point onwards) brings about positive influence such as motivating students to
spend more time on language learning, positive washback will occur; when the test
brings about negative influence such as spending excessive time on test-related
exercises, negative washback will occur. In the GVT context, when the emphasis of
teaching and learning English as a Foreign Language (EFL) in junior high schools is
on language use and promoting communicative language teaching, it is perceived as
an indication of positive washback and thus the fulfilment of the intended washback
of the GVT. However, if the classroom teaching and learning are dominated by a focus
on rote-learning language knowledge and test-driven practices which are regarded as
undermining the intentions of curriculum developers, then the exam is understood as
failing to direct the EFL teaching and learning in a positive way.
This opening chapter sets the scene for the study. Section 1.1 addresses the
research background. Section 1.2 presents the relevant contextual information. Section
1.3 delineates the research aims and questions, which is followed by the significance
of the study in section 1.4. The chapter concludes with an overview of the structure of
the thesis in section 1.5.
1.1 BACKGROUND
This study is situated in the high-stakes context of the Senior High School
Entrance Examination (SHSEE), success in which qualifies test-takers for entry to
senior high schools. English is one of the seven subjects tested in the SHSEE, and the
English test is called the Senior High School Entrance English Test (SHSEET).
Despite its selective nature, the SHSEET is intended to bring about changes to the
existing exam-oriented education system and facilitate the aims of learner-centred
English instruction and student development in junior high schools (Ministry of
Education, 2011). In practice, however, whether the SHSEET meets the Ministry’s
expectations remains largely unknown.
The CLT-oriented curriculum has seen the focus of test tasks shifting from a
simple display of language knowledge to the assessment of the ability to use language.
Against this backdrop, the present study aims to examine, from a Learning Oriented Assessment (LOA) perspective, the washback of the GVT, whose multiple-choice tasks have attracted particular attention and criticism (Xu, as cited in Pan & Qian, 2017). In the present study, washback is explored through
LOA theory which stresses the synergy between instruction, testing, and learning
(Turner & Purpura, 2016) and intends to promote positive washback from an
examination provider’s viewpoint (Saville & Salamoura, 2014). To this end, teachers’
and students’ perceptions and the underlying factors influencing their perceptions are
explored to provide empirical evidence for test developers and education authorities
to reflect on the test design and EFL teaching and learning at junior high schools. The
present study therefore demonstrates the washback of the GVT and uses LOA theory as a lens in a high-stakes standardised English test context to explore the potential for positive washback.
1.2 CONTEXT
(principally English this time) from primary school onwards in 1912 (Education
History Research Group in Teaching Materials Research Institute, 2008). Over the past
decades, different theories and approaches such as CLT (Yu, 2001) and the production-oriented approach (POA) (Wen, 2018) have been imported or developed to inform
EFL teaching in China. This indicates that EFL teaching in China has endeavoured to
move towards a communicative orientation and to promote learning. Alongside the fast development of English education in China, English tests have remained important, and such high-stakes tests exacerbate pressures of standardisation, measurement, and accountability (Barksdale-Ladd & Thomas, 2000).
from Grade 10 to Grade 12, (3) compulsory education level, which is composed of
junior high school and primary school education, and (4) pre-school level
(kindergarten). According to the Compulsory Education Law in China (Ministry of
Education, 2006), compulsory education consists of a total of nine years of schooling,
commonly including a six-year primary school education and a three-year junior high
school education. The present study investigates the washback of the GVT on teachers
and students who are preparing for the entry to academic senior high schools; hence,
adult junior high schools go beyond the scope of the study. In China, four types of
academic schools offer junior high school education (see Table 1.1).
Table 1.1
Four types of academic junior high schools
Schools shown in Table 1.1 can also be divided into key schools and non-key schools. Compared to non-key schools, key schools receive more government funding and have more high-achieving students as well as more highly educated teachers. However, according to the Compulsory Education Law of the People's Republic of China (Ministry of Education, 2006), the distinction between key and non-key schools is discouraged, as the government is trying to narrow the achievement gap between schools. Despite this policy-level requirement, the terms "key schools" and "non-key schools" are still used both in research and in practice.
As China has a long education history, testing in China can be traced back to the
Han Dynasty (202 BC - 220 AD) when exams were used to select civil servants (Tan,
2020). Since then, exams have played a major role in selecting qualified candidates to
enter the next level of education. Internationally, Chinese students have excellent
academic performance in high-stakes tests such as the Programme for International
Student Assessment (PISA). According to 2018 statistics, the Chinese mainland regions of Beijing, Shanghai, Jiangsu, and Zhejiang ranked first in PISA results; however, the
well-being of students was lower than average (OECD, 2019). Domestically, students
are also engaged with countless exams in their school life. Currently, four major
standardised test batteries are administered across different education levels in China,
which are summarised in Table 1.2. The highly selective nature of the education system means that the higher the education level, the smaller the number of students; only a limited number of students can finally reach the highest position in the system by achieving success in various competitive examinations (Qi,
2010).
Table 1.2
Major standardised entrance tests in China
Degree | Test (paper test) | Time | Subjects tested in the main test
PhD | The PhD Student Entrance Examination (PSEE) | Twice a year, in March and October/November | Admission units are in charge of the test design and administration; common subjects: Politics, Foreign Language, and Professional Subjects
Master | The National Postgraduate Entrance Examination (NPEE) | The weekend before 23rd Dec. in the Chinese lunar calendar (two days), annually | Four subjects: Politics, Foreign Language, Mathematics/Professional Subject 1, Professional Subject 2
Undergraduate | The National Matriculation Examination (NME) | 7th-8th June, annually | Four subjects: Chinese, Mathematics, Foreign Language (mainly English^a), Social Sciences (Politics, History and Geography)/Natural Sciences (Physics, Chemistry and Biology)
Senior high school | The Senior High School Entrance Examination (SHSEE) | June, annually | Flexible due to regional differences: mainly Chinese, Mathematics, Foreign Language (mainly English), Politics, History, Geography, Biology, Physics, and Chemistry
Note. a Other Foreign Languages include Russian, French, Japanese, German, and Spanish.
The present study is situated in the SHSEE context. At the time of graduating
from junior high schools, students need to take two types of standardised tests: one for
obtaining the graduation certificate or school leaving purpose (huikao) and the other
being the Senior High School Entrance Examination (SHSEE, zhongkao). In fact, the
school-leaving certificate test and the SHSEE are undergoing major changes, and in some places the two tests have long been combined into one (Ministry
of Education, 1999). Nonetheless, in some regions, the two tests are still organised
separately, and the SHSEE is administered by local education authorities at the
provincial- or county-level under the guidance of the MOE.
In effect, not every junior high school graduate can continue to three years of
formal study in academic senior high schools. According to the statistics from the
MOE, although the overall promotion rate of junior high school graduates has steadily
increased and now remains stable, many of them have to enter the job market or be
enrolled in career-oriented vocational schools. Taking Chongqing as an example, the total number of SHSEE test-takers was 305,000 in 2018; however, the planned enrolment for regular senior high schools was 195,000 students, and the rest might attend vocational senior high schools (Chongqing Municipal People's
Government Network, 2018). In contrast to the National Matriculation Examination
(NME) which allows higher education institutes to recruit students nationally, junior
high school graduates usually attend a senior high school close to their residence,
known as the catchment area. The quality of schools in the same catchment area often
differs, sometimes dramatically. Students with better SHSEE outcomes are more
competitive for admission to better senior high schools. In this way, the SHSEE
becomes the yardstick for senior high school entrance. As acknowledged earlier,
English is one of the major subjects tested in the SHSEE, and the SHSEET is therefore
high-stakes in nature. By researching Grade 9 teachers’ and students’ (14 or 15 years
old) SHSEET test preparation experiences and test perceptions, the study is expected
to provide valuable insight into and implications for the study of grammar and
vocabulary during SHSEET preparation and compulsory education years.
both overall education and EFL education in China were restored and have undergone
modernisation (from 1977 to 1993) and globalisation (from 1993 onwards) (Adamson,
2004).
These education reforms relating to EFL are influenced by both political and
economic factors. To keep up with educational as well as policy changes and respond
to globalisation, the official commencement of foreign language (mainly English) as a
mandatory subject in compulsory education shifted from junior middle school level
(Grade 7, 12 years old) to primary school level (Grade 3, 9 years old) in 2001 (Wang,
2007). Currently, the importance of foreign language (mainly English) is clear at
various education levels (see Table 1.2). Most importantly, in contemporary China,
English ability is not only accentuated in the academic context but also greatly valued
by people working in government, education, and research sectors who need to seek
promotion opportunities in their professions (He, 2001).
In line with the changing status of EFL education in China, the English
curriculum/syllabus used in primary and secondary school education has experienced
more than ten changes and revisions during the past century (Education History
Research Group in Teaching Materials Research Institute, 2008). Together with other
18 curriculum standards for school subjects, the most recent version of the English
curriculum for the compulsory education is the English Curriculum Standards for
Compulsory Education (ECSCE), which was issued by the MOE in 2011 and
implemented from September 2012. The ECSCE, which the SHSEET has to abide by,
is a national curriculum and the major reference for test designers, classroom teachers,
and students to conduct English test designing, teaching, and learning activities during
compulsory education years (from Grade 3 to Grade 9). However, it is often perceived
that the lack of effective communication among curriculum developers, classroom
teachers, and test designers hampers the successful relationship between assessment
and instruction. This phenomenon is referred to as a “curriculum sandwich” (Gu, 2012,
p. 48). As a result, each group holds a different understanding of the curriculum and
thus implements the curriculum separately according to their own knowledge. The
“sandwich” separation model contributes to the discrepancy between those groups. In
this regard, understanding the major characteristics and requirements of the ECSCE is
a priority for this washback study to clarify the guiding principles for test design.
The ECSCE clearly defines five levels of requirements to achieve each objective.
These levels run through the nine-year compulsory education, with Level 2 set as the
basic requirement for Grade 6 graduates in primary schools, and Levels 3 to 5 set as the requirements for the three grades of junior high school respectively. Level 5 is
therefore regarded as the guiding interpretation for students’ performance in the
SHSEET. In addition, the ECSCE lists the requirements at each level, similar to the
“can-do” statements in the Common European Framework of Reference for Languages (CEFR). These requirements give test designers, teachers, and students a clear
reference to assess learners’ English abilities.
Figure 1.1. The structure of curriculum objectives in the English Curriculum Standards for
Compulsory Education (ECSCE)
With respect to assessment, the ECSCE emphasises principles such as:
• Making full use of the leading role of assessment to achieve positive results
for different stakeholder groups;
• Helping monitor and improve the process of teaching and learning by using
formative assessment;
• Prioritising student motivation in primary schools;
• Selecting real and authentic language texts and designing test tasks
according to the real language use context.
The emphasis on the overall ability to use language coincides with international CLT trends as well as the concepts of Assessment for Learning (AfL) and LOA, indicating the learner-centred orientation of the ECSCE. It is thus necessary to investigate the real classroom
context and examine the implementation of those test design standards through key
stakeholders’ perspectives. Therein lies the focus of the current study: the actual
washback of the GVT and the potential for the GVT preparation to incorporate LOA
principles from teachers’ and students’ perspectives. The reason for focusing on the
GVT is discussed in section 1.2.5.
examining SHSEET washback. As the Test Specifications take the ECSCE as their reference, they follow the curriculum in focusing more on students and their learning. Additionally, the SHSEET grammar and vocabulary test scope in the Test Specifications has been drawn up to meet the language knowledge objective defined by ECSCE Level 5
(see Appendix A). In sum, the intention of the curriculum developers is to bring about
a positive influence on EFL teaching and learning by shifting the teaching and
assessment emphases from formal linguistic knowledge to language use and practice.
Therefore, it is crucial to obtain empirical data to understand the extent to which this
intention has been realised and further explore the learning-oriented possibility of the
GVT.
Nonetheless, these English tests, due to their gatekeeping roles in the education
system, have been criticised for bringing about negative washback through impeding
curriculum implementation and reforms (Dello-Iacovo, 2009). As mentioned above,
tests like the SHSEET are an important symbol of the exam-oriented education in
China. These tests are at loggerheads with the nationwide implementation of
significant curriculum reforms that aim to shift test focus to the ability to use language
rather than rote memorisation. As Spolsky (1995) pointed out, test designers expect to
use exams for directing classroom activities; however, exams narrow down the
education process, limiting the focus of teaching and learning to what is to be tested.
Therefore, to achieve desirable test results, test preparation practices such as rote
memorisation and mechanical drilling often dominate the teaching process and finally
result in a failure in fulfilling the positive intentions of the curriculum standards (Qi,
2005) such as the ECSCE. Herein lies the significance of the present study, which
investigates the washback of the GVT on teaching and learning in order to determine
the opportunities and challenges concerning English instruction and assessment. The
next section provides a focused introduction to the SHSEET that is pertinent to various
stakeholders including policymakers, curriculum developers, test designers, classroom
teachers, and students.
The structure of the SHSEET in Chongqing – the data collection site for the
current PhD project – is presented in Table 1.3. The total score for the SHSEET in
Chongqing is 150 marks, with 98 marks allocated to Paper I and 52 marks allocated to
Paper II. The test duration is two hours.
Table 1.3
Composition of 2018 SHSEET (Chongqing) test paper
Components | Test content | Test method | No. of items | Marks | Weighting (%)
Paper I
I. Listening | Listening | Multiple choice question (MCQ) | 20 | 30 | 20
II. Multiple Choice Questions (MCQ) | Grammar and vocabulary | MCQ | 15^a | 15 | 10
III. Cloze | Grammar and vocabulary | MCQ | 10 | 15 | 10
IV. Reading Comprehension | Reading | MCQ | 15 | 30 | 20
V. Oral Test | Speaking | MCQ | 5 | 5 | 3
Paper II
VI. Task-based reading | Reading | Open-ended questions | 4 | 9 | 6
VII. Sentence Completion | Grammar and vocabulary | Gap filling | 5 | 10 | 7
VIII. Gap-filling cloze | Grammar and vocabulary | Open-ended gap filling | 8 | 16 | 11
IX. Writing Task | Writing | Guided writing | 1 | 20 | 13
Total | | | 83 | 150 | 100
Note. ^a The total mark for this task has changed since 2017; it was previously 20 marks in total, and in 2017 it was 18 marks.
Although test items in the SHSEET test papers in each province or county vary,
the tests are all guided by the ECSCE Level 5 requirements. They all have similar test
components as depicted in Table 1.3. Paper I mainly contains fixed multiple-choice
tasks and Paper II contains constructed-response tasks. The four macro language skills,
namely listening, speaking, reading, and writing, are tested in the SHSEET, and the
test tasks are assumed to evaluate learners’ overall ability to use language according
to the ECSCE.
This washback study takes Chongqing as the research site for two major reasons. First, the researcher obtained her Master's degree at Chongqing University, which ensures her familiarity with the city and access to schools. Second, as one of the four municipalities in China, Chongqing has a massive education population and rapid economic development. The largest and most populous municipality, Chongqing is located in the southwest of China. It has 26 districts, eight counties, and four autonomous counties (Chongqing Municipal People's Government, 2015), with a total population of 33.89 million in 2017, of whom about 19.70 million lived in urban areas (National Bureau of Statistics of China, 2018). In 2017, 98.69% of the SHSEE test-takers moved on to the next education level, with 63.9% attending academic senior high schools (Chongqing Municipal People's Government Network, 2018).
The test administration time in Chongqing is from 12th to 14th June every year,
and English is tested on the morning of 14th June. Two sets of SHSEET test papers
(Paper A and Paper B) are designed every year in Chongqing. The nine main districts and seven other districts or counties that took Paper A in 2017 constitute the so-called "joint area", while the other districts and counties, which use Paper B, are outside the joint area (Chongqing Zhongkao, 2017). Since the study mainly collected qualitative
data from the joint area, Paper A was first analysed before data collection started.
Nonetheless, Paper B was also consulted as these two test papers remain similar in
most aspects.
First, whether the GVT militates against the CLT objectives and learning-oriented aims of the ECSCE and whether it directs EFL teaching and learning towards a greater focus on grammatical accuracy in classrooms remain unclear. As required by
the ECSCE, summative assessment (the SHSEET in this case) should focus on testing
students’ integrated language use and avoid the discrete testing of language
knowledge. However, the SHSEET still reflects the importance of the separate testing
of grammar and vocabulary through four different tasks of MCQ, Cloze, Sentence
Completion, and Gap-filling cloze which account for 38% of the total SHSEET score
(see Table 1.3). In this thesis, these four different GVT tasks are referred to by the names used in the authentic SHSEET papers. In the GVT, the MCQ task uses selected-response items and the Sentence Completion task uses constructed-response items; both tasks are sentence-based. Furthermore, the Cloze uses selected-response items, as choices are provided, while the Gap-filling cloze uses constructed-response items that require written answers; both tasks are passage-based. Examples of each of these tasks are given in Appendix B.
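As a quick check, the 38% figure follows directly from the weighting column of Table 1.3; summing the raw marks gives a marginally lower share because the per-task weightings are rounded:

```latex
\underbrace{10\%}_{\text{MCQ}} + \underbrace{10\%}_{\text{Cloze}}
+ \underbrace{7\%}_{\text{Sentence Completion}}
+ \underbrace{11\%}_{\text{Gap-filling cloze}} = 38\%,
\qquad
\frac{15 + 15 + 10 + 16}{150} = \frac{56}{150} \approx 37.3\%
```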
1.3 AIMS OF THE STUDY
In order to address these two research questions, this study applied an
exploratory sequential mixed methods research (MMR) design. The qualitative data
were collected through classroom observations, semi-structured interviews and focus
groups, while the quantitative data were collected through the administration of a
student survey to Grade 9 students in Chongqing.
This study, which focuses on the washback of the GVT, first provides empirical
evidence of the test washback. Contextualised in China, specifically Chongqing, the
study sheds light on EFL grammar and vocabulary teaching, learning, and testing at
the compulsory education level. It can address the scarcity of research on the washback
of junior high school English tests in China, and therefore, potentially benefit Grade 9
teachers and students in terms of identifying challenges and opportunities in the
instruction and assessment of English grammar and vocabulary and providing
suggestions for more learning-oriented pedagogy and assessment.
By incorporating LOA theories (Carless, 2007; Jones & Saville, 2016) into washback research, this study has the potential to make a theoretical contribution to promoting positive washback in EFL teaching and learning. Further, as
the separate testing of grammar and vocabulary is also common to other high-stakes
standardised English tests in China (e.g., the NMET, the TEM), this study is of interest
and value to EFL teachers and test designers in similar contexts at different education
levels. Likewise, the high-stakes nature of the SHSEET also offers implications for a wider international context. Most significantly, the present study has the potential to
help policymakers, curriculum developers, and test designers to realise the potential of
or difficulties in curriculum implementation and test design.
The thesis comprises eight chapters. Each chapter is organised based on the
research questions presented in section 1.3.2. This first chapter has introduced the
research background and contextual information on EFL exams and the SHSEET,
which sets the scene for the current study. It has also explained the aim, research
questions, and significance of the study. Chapter Two reviews three bodies of the
literature: standardised English testing, grammar and vocabulary testing, and
washback studies. Chapter Three introduces washback models of teaching and
learning, and specifically depicts the LOA cycle which is combined with the washback
model as a new theoretical framework for this study. Chapter Four delineates the
exploratory sequential MMR design. Research findings are reported in Chapters Five,
Six, and Seven. In particular, the qualitative and quantitative results are integrated to
address the two proposed research questions in each chapter. Finally, Chapter Eight
discusses the overall research findings in connection with the research questions and
concludes the study with contributions, implications, reflections, limitations, and
future directions.
Chapter 2: Literature Review
This chapter presents research related to the current washback study regarding
the Grammar and Vocabulary Test in the SHSEET (the GVT). To start with, the two
research questions introduced in section 1.3.2, which concern test washback (both washback value and washback intensity) and the opportunities for, as well as the challenges of, incorporating Learning Oriented Assessment (LOA) principles in GVT preparation, were informed by this literature review.
Studies relevant to the two research questions are presented in three sections.
First, considering the nature of the SHSEET, standardised English language tests are briefly reviewed in section 2.1. Second, the testing of grammar and vocabulary, two aspects of language which are assessed separately in the SHSEET, is addressed in
section 2.2. As the major foundation of this study, washback is given substantial
attention in section 2.3 from both theoretical and empirical perspectives. The chapter
is concluded in section 2.4 with the identification of potential research gaps.
The negative influence of high-stakes English language tests in China is reflected in test-takers' reported extrinsic motivation, which results in intense test-driven
practices in test preparation. For example, participants’ extrinsic motivation to succeed
in the GSEEE negatively impacts the education system and teaching, leading to teaching to the test in coaching schools. An impact study of the GSEEE test
preparation program looked into test-takers’ intentions, reasons for attending test
preparation programs, and their expectations of the coaching programs (He, 2010).
Findings indicated that most test participants aimed for better job opportunities and/or
to secure their current jobs by enrolling in a Master’s degree. They attended the
coaching programs with the expectation of a rapid increase in test scores rather than
improving their language proficiency. These goals led to an exclusive focus on the test
content and format in coaching centres.
Negative test influence also occurs when different test functions (i.e., the
selection function and the educational change function) conflict with each other. For
example, a washback study of the editing task in the NMET found that the task failed
to achieve its positive intentions (Qi, 2010). Although all stakeholders agreed that the
editing task discriminated among different levels of language proficiency, which
contributes to the selection function of the NMET, test designers and teaching experts
expressed their negative perceptions of the editing task. Test preparation activities
were evident in classes, where the teaching of test-taking strategies and grammar
points and time spent on mock tests were contrary to the expected impact of enabling
students to identify and correct errors in writing. As a result, the test designers' positive intentions were not realised.
Positive test influence happens when the test design and interpretation of
performance on the test meet design intentions. For instance, the construct validity
(i.e., test designers’ intended aim of fulfilling communicative purpose) of the
Computerised Oral English Test (COET) of the NMET in Guangdong was the focus
of research by Zeng (2010). Despite the absence of interactional features, the COET
did measure key features of students’ oral competence since it reflected the test
construct (factors of pronunciation and intonation, translation, and comprehension and
oral production). Likewise, although some grammar items did not match authentic first language (L1) use, Pan and Qian (2017) found that the content validity of the grammar subtest in the NMET (Shanghai) met the intended
design purpose.
Further, as the two building blocks of any language, grammar and vocabulary
are viewed as fundamental for language acquisition and communication. However,
they, especially grammar, have traditionally been viewed as a static body of knowledge reflecting an unchanging set of rules. In contrast, a
1 When "MCQs" is used, it means test items with an MCQ format; MCQ in this study refers specifically to the first GVT task in the SHSEET.
Using corpora to design and develop grammar tests can be an efficient method
to maintain content validity (Argüelles Álvarez, 2013; Macmillan et al., 2014; Pan &
Qian, 2017). Through a corpus-based approach, the content of the grammar section in
the NMET delivered in Shanghai was validated, and the benefits of incorporating corpora into the test design and development phases were confirmed (Pan
& Qian, 2017). Findings indicated that test items generally covered the grammatical
domains listed in the test specifications, but certain drawbacks remained. First, due to practical item-writing constraints, it is difficult to test articles (a/an/the), since insufficient context is provided in the test items. Second, not all the listed grammatical
features had been tested in the test, especially some crucial grammatical features (e.g.,
cleft sentence, appositive clause). Therefore, there was a lack of content
representativeness. Further, to avoid testing low-frequency grammatical structures in high-stakes tests, reference to corpora during test design could be useful.
Vocabulary tests which use MCQs are mostly preferred by both L1 and L2
speakers (Read, 2000, 2019). The invention and widespread use of vocabulary tests,
scaling and checklists such as the VLT (Nation, 1990; Schmitt, 2000) exemplify this preference.
In addition, scholars have also endeavoured to incorporate test purpose into test
design to assess vocabulary. Read and Chapelle (2001, p. 10) provide a framework for
realising this incorporation. As Figure 2.1 shows, three factors mediate between the intended test purpose and the test design: construct definition, performance summary and reporting, and test presentation. These mediating factors shape the desired impacts on stakeholders. Moreover, the
consideration of test purpose in test design and the validation of vocabulary tests
through the listed process in Figure 2.1 will allow opportunities for positive
consequences.
2 Although the name of the test was changed, this study adopts the name used by the researchers whose work is cited (i.e., the former name of the FCE). This also applies to the other Cambridge tests appearing later in the thesis (i.e., CAE, CPE).
Figure 2.1. A framework for vocabulary testing (Read & Chapelle, 2001, p. 10)
The design and revision of the Use of English paper in tests of the Cambridge
Main Suite Exams (MSE) can be a good reference for designing and testing grammar
and vocabulary. The Use of English component was first introduced into the
Cambridge MSE in the 1950s and the testing focus has progressed according to
teaching and testing changes over time (Weir, 2013). Generally, to align with
communicative teaching and assessment, tests in the Cambridge MSE take a lexico-
grammatical approach to accentuate the relationship between grammar and
vocabulary, which requires not only basic language knowledge but also the ability to
use language in context. Originally, gapped-sentence tasks in the CPE Use of English
paper proved able to demonstrate test-takers' full linguistic repertoire (Docherty,
2015). Unlike multiple-choice items, whose distractors are difficult to design, gapped-
sentence tasks enable the examination of candidates' productive knowledge.
2.3 WASHBACK
Based on the previous review and considering present research objectives, this
section discusses the theoretical foundations of washback, the movement towards
positive washback and LOA, washback stakeholders, and significant washback
studies.
It is true that tests can exert influence on some participants, but not all (Linn,
1993), and a test may have positive, negative, or even neutral washback
(Alderson & Wall, 1993). To investigate the washback of one particular test, the
washback of test outcomes and uses should be demonstrated and understood, not
simply asserted (Linn et al., 1991; Wall & Alderson, 1993). Therefore, researchers
should not take it for granted that a good test can certainly bring about positive
washback, or that a poor test will result in negative washback (Messick, 1996).
In the present study, positive and negative washback as well as intended
and unintended washback, which all indicate the direction or value of washback, are examined.
A review of the relevant literature shows that unequal attention has been paid to
different key stakeholders. As students constitute the most direct and ultimate
stakeholder of any assessment, scholars (see, for example, Hamp-Lyons, 1997)
advocate that more studies should foreground their perceptions of washback, tests, and
test results, as there is still a paucity of learner washback studies (Damankesh &
Babaii, 2015; Pan, 2014; Xie & Andrews, 2013). Therefore, students and their learning
practices have recently become the focus of research (see, for example, Andrews et
al., 2002; Cheng et al., 2011; Reynolds et al., 2018; Saglam & Farhady, 2019; Shih,
2007; Xie & Andrews, 2013; Zhan & Andrews, 2014). Noticeably, in current washback
studies students are mainly researched together with teachers (Green,
2006a, 2006b; Pan & Newfields, 2011) or with other stakeholders such as parents (Cheng
et al., 2011) or test constructors (Qi, 2004a, 2005, 2007). For example, in the Hong
Kong secondary school context, Cheng et al. (2011) conducted impact studies among
students and their parents during the test innovation period. An investigation of their
perceptions of the impact of school-based assessment (SBA) identified that students'
perceptions of test learning activities were related to their awareness of their language proficiency.
In sum, since the testing process is closely linked with both teaching and learning,
it is necessary to investigate both teachers and students, the two essential and most
basic stakeholder groups, who are more directly affected by the test than any other
stakeholders. To this end, it is important to attain a comprehensive understanding of
the classroom practices leading up to the test.
Negative washback is also identified regarding teaching and learning content and
teaching and learning methodology of the SHSEET (Yang, 2015; Zeng, 2008), the
CET-4 (Zhan & Andrews, 2014), and the NMET (Qi, 2005). A SHSEET washback
study by Zeng (2008) found that for teachers, tests disrupted the teaching routine and
limited their choices of teaching content, for example by leading them to neglect speaking.
Summary
To conclude from the above empirical washback literature in both the
international and Chinese contexts, the complexity of the washback phenomenon is seen
in the different patterns of washback value and washback intensity across
stakeholders, tests, and time. A major similarity is that both positive and negative
washback have been identified across contexts.
In sum, the general research gaps lead to the incorporation of LOA theory into
the current washback study. Therefore, this washback study of the GVT will utilise the
LOA theory to explore the positive and negative as well as intended and unintended
washback on teaching and learning from teachers’ and students’ perspectives in the
junior high school context in China. The application of LOA theory is further
explained in Chapter Three.
During the past decades, in order to clarify the mechanism of washback, various
models and frameworks have been developed (see, for example, Alderson & Wall,
1993; Bailey, 1996; Burrows, 2004; Green, 2007a; Hughes, 1993; Shih, 2009). In this
section, three washback frameworks targeting learning and teaching are presented.
Ultimately, the model developed by Green (2007a) was adopted due to its alignment
with the first objective of the current study, which is to explore teachers' and students'
perceptions of washback.
Note. The numbering of hypotheses in this table is not consecutive due to the grouping under different
categories.
However, not all these hypotheses are reflected in reality. For example, in their
Sri Lankan washback study, Wall and Alderson (1993) found that the introduction of
O-Level examinations did not result in changes in teachers’ pedagogy. Therefore,
Hypothesis 4 is not verified. In fact, the “Washback Hypothesis” by Alderson and Wall
(1993) is not exhaustive and it has been criticised as being overly simplistic and
general, since it mainly focuses on the linear relationship between tests and test-related
teaching as well as learning aspects (Shih, 2007). Considering the complexity of
washback, the “Washback Hypothesis”, while incomplete, attempts to clarify the
washback concept and has provided a foundation for both empirical studies and more
comprehensive washback models which followed.
Learning Teaching
Figure 3.2. Washback models of learning and teaching (Shih, 2007, p. 151; 2009, p. 199)
From Figure 3.2, it is evident that these two models also consider the time factor
(the axis and (t) symbol) to point out that washback varies over time (Shohamy et al.,
1996; Shohamy et al., 1986). Similar to Burrows (2004), these two models also map
the flow of influences which brings about washback in dotted lines. Although the
factors and relationships seem exhaustive and comprehensive, the models fail to map
teaching and learning into one comprehensive picture, since teaching and learning are
depicted in separate models.
To summarise, these three washback theories and models try to depict the
complexity of the washback phenomenon, either from the whole teaching and learning
system (Alderson & Wall, 1993) or from specific teaching (Burrows, 2004; Shih,
2009) and learning (Shih, 2007) aspects. Insightful and informative as they are, they
all fail to touch upon the specific issue of positive or negative washback value and
lack a clear identification of washback intensity. This was addressed in a more
comprehensive washback model by Green (2007a), outlined in the next section.
Figure 3.3. Model of washback, incorporating intensity and direction (Green, 2007a, p. 24)
• value success on the test above developing skills for the target language use
domain;
• work in a context where these perceptions are shared (or dictated) by other
participants.
Taking RQ 1 of the present study (What is the washback of the Grammar and
Vocabulary Test in the SHSEET (the GVT)?) as an example, the ‘overlap’ between
test design characteristics (the GVT) and the focal construct (the development of
communicative language use) in the current model can guide the exploration of
positive, negative, intended, and unintended washback. The focal construct, as
explained by Green (2007a), also refers to the characteristics of the TLU domain (i.e.,
how the target language is used outside of the test itself). To decide the direction or
value of the test washback, the 'overlap' can first be analysed through an examination
of the content of the tasks (i.e., comparing the test tasks with the TLU domain in actual
teaching). However, the 'overlap' should also be considered in light of participant
characteristics and values; that is, how participants interpret the test demands and how
they understand the test content and the relationship between test tasks and the TLU
domain. This underscores the necessity of gathering information from participants to
understand their opinions about the 'focal construct'. The model also specifies
that the washback variability of participant characteristics and values plays a major role
in the realisation of washback. In this way, differing from previous studies which
mainly collected data from test designers (see, for example, Qi, 2004a, 2004b),
teachers and students should be consulted to enable the researcher to gather
information about the positive and negative as well as intended and unintended
washback. Therefore, in this study, their perceptions were elicited through interviews
with teachers and interviews as well as surveys with students to find out the washback
value of the GVT.
Figure 3.4. The washback model of the GVT (adapted from Green (2007a))
As shown in Figure 3.4, the focal construct of the SHSEET is the development
of communicative language use as the ECSCE has indicated (Ministry of Education,
2011). However, since the study object is the GVT, the test features (use of MCQs,
decontextualised testing of grammar and vocabulary knowledge, etc.) have to be taken
into consideration, so a further content analysis of the test tasks was conducted before
entering the field (i.e., conducting classroom observations and interviews). Moreover,
to encompass a comprehensive view of washback direction, participant characteristics
and values were also taken into account.
Against this background and the limitation of Green’s model, the relationships
among different washback components are explored before moving on to LOA
theories. Therefore, the washback mechanism is reviewed, and a key summary is
provided as follows.
Based on insights from both Green’s model (2007a) and the literature summary
(see section 2.3.5), the researcher posits that test perceptions ultimately influence
students' learning outcomes through the mediation of participants' affective
factors, such as motivation, and their test preparation practices. This assumption further draws
on theoretical conceptualisations from Hughes (1993), Green (2007a), Xie
and Andrews (2013), and Wolf and Smith (1995). According to Hughes (1993),
participants’ test perceptions influence learning outcomes through test preparation
processes. Green (2007a) contends that participants’ “understanding of test demand”
influences their learning outcomes more than learning content does. Furthermore, Xie and
Andrews (2013) suggest that test-takers' perceptions of test use and test design
are positively correlated and influence test preparation through participants' motivation.
Drawing implications from all aforementioned theories and models, this study
conceptualises the washback mechanism of the GVT. In detail, test perceptions include
participants’ perceptions of test design characteristics and test importance, and
participants’ affective factors comprise motivation and test anxiety. Further, test
preparation practices mainly refer to participants’ learning strategy use and test
preparation effort. Finally, test performance (i.e., students’ learning outcomes) is
represented by final SHSEET test scores reported by students. To conclude, test
perceptions are assumed to influence participants’ characteristics, which in turn affect
their test preparation practices. As a result, perceptions regarding the test may
influence students’ final learning outcomes. The detailed components of this washback
model will be explored in the following chapters, and the conceptualisation of the
washback mechanism will then be tested through statistical modelling (see section 4.5.3.6).
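The hypothesised mechanism above amounts to a mediation chain that SEM estimates as a set of path coefficients. As a minimal illustrative sketch only (the coefficient values below are hypothetical placeholders, not findings of this study), the indirect effect of test perceptions on test performance is the product of the direct paths along the chain:

```python
# Illustrative sketch of the hypothesised washback chain:
# test perceptions -> affective factors -> test preparation -> test performance.
# All coefficient values are HYPOTHETICAL placeholders for illustration.

paths = {
    ("test_perceptions", "affective_factors"): 0.45,  # hypothetical
    ("affective_factors", "test_preparation"): 0.38,  # hypothetical
    ("test_preparation", "test_performance"): 0.30,   # hypothetical
}

def indirect_effect(chain, paths):
    """Product of the direct path coefficients along a mediation chain."""
    effect = 1.0
    for cause, outcome in zip(chain, chain[1:]):
        effect *= paths[(cause, outcome)]
    return effect

chain = ["test_perceptions", "affective_factors",
         "test_preparation", "test_performance"]
print(round(indirect_effect(chain, paths), 4))  # 0.0513
```

In an actual SEM analysis, these coefficients would be estimated from the survey data rather than assumed, and the model would also include direct paths and measurement error; the sketch shows only the logic of the hypothesised indirect effect.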
The inclusion of an LOA model in the present washback study is both possible
and timely on theoretical and empirical bases (see section 2.3.3). As Chapter Two and
previous sections have indicated, the value or direction of washback is widely
discussed in the field of language testing and assessment. However, how to bring about
positive washback to encourage students’ learning has, until recently, not been the
focus of research. Theoretically, the LOA frameworks, reflecting the centrality of
students and learning, and aiming to promote learning, align with the assessment
guidelines and SHSEET design principles (see section 1.2.2) according to the ECSCE
(Ministry of Education, 2011). Empirically, frameworks of LOA are ideal since they
offer practical guidance for the preparation activities leading up to the test. To this end,
this section mainly focuses on key LOA frameworks, which can help answer RQ 2
(What are the opportunities for and challenges of the incorporation of Learning
Oriented Assessment (LOA) principles in GVT preparation?).
From Figure 3.5, the link between LOA and positive washback is evident. In
Green’s (2007a) washback model which incorporates intensity and direction (Figure
3.3), the overlap between the focal construct and test characteristics can lead teaching
and learning in the direction of positive washback. For Carless (2007), LOA occurs
when the certification purpose and the learning purpose of assessment overlap with
each other. As for the SHSEET, guided by the ECSCE’s key tenet of learner-
centredness, once the teaching and learning purpose overlaps with the test objective,
LOA opportunities will be achieved. In this sense, LOA coincides with the positive
washback concept in that the relationship between the test objective and teaching and
learning purposes is put in the same position as the ‘overlap’ between the focal
construct and test characteristics (Green, 2007a) to encourage students’ learning
practices.
Further, LOA aims to strengthen learning processes and can be achieved through
both formative and summative assessments (Carless, 2007). The realisation of LOA
can be achieved by following three principles (Carless, 2007, pp. 59-60). First, tasks
for assessment should promote students’ learning in a productive and comprehensive
way. Second, students should be involved in the assessment system through different
ways such as participating in the development of the assessment criteria and quality
and taking part in self- or peer-assessment. Third, teachers should provide timely and
forward-looking feedback to help students’ current as well as future learning.
Although it clarifies the possibility that LOA can promote positive washback,
the LOA framework proposed by Carless (2007) is set at the tertiary institutional level.
Figure 3.6. Evidence for learning – the LOA cycle (Jones & Saville, 2016)
At the micro level, learning practices take place in the classroom context.
Generally, tasks can be learning-oriented once the LOA cycle is implemented in a
specific way where teachers and students are all fully involved and play their respective roles.
During the LOA cycle inside the classroom, the records generated from teachers’
observation of students’ language activities, both learner-centred and content-centred
activities, are integrated into an informal record which will be jointly interpreted with
the achievement record from the external exam. More importantly, although the LOA
cycle clearly positions learning and learners at the centre, the teaching and teachers’
roles are not overlooked. The teacher's role is best depicted in the classroom
LOA cycle, since teachers design tasks to inspire and encourage language activities for
students, which are again recorded informally after observation. Further, the records
are used by teachers to make decisions and provide feedback to revise learning
objectives as well as to examine prior knowledge. In this respect, similar to Carless (2007),
the LOA cycle places the three major principles of learning tasks, classroom
interactions, and feedback at the centre of the micro classroom context. Moreover, the
micro level can also align with and compensate for the washback variability of
participant characteristics and values in the washback model of Green (2007a).
Therefore, participants’ knowledge or understanding of, resources to meet, and
acceptance of test demands are affected by the major principles of learning tasks,
classroom interactions, and feedback.
The LOA cycle is an ecological and systematic model for AfL (Jones & Saville,
2016). However, in addition to the aforementioned key principles in LOA
(involvement in assessment, classroom interaction, and feedback), learner autonomy
also needs to be considered. In particular, although the importance of learner autonomy
has been recognised by researchers (Salamoura & Unsworth, 2015), it has not been
foregrounded in LOA theoretical frameworks (Carless, 2007; Jones & Saville, 2016).
Nonetheless, it has been claimed that autonomous behaviours can promote assessment
for learning (Lamb, 2010) and are closely related to self-assessment and peer-
assessment (Dam, 1995; Tassinari, 2012). Most importantly, the deeper the
involvement in (self-)assessment, the greater the extent of autonomous practice (Bell &
Harris, 2013; Everhard, 2015). Indeed, self-assessment and peer-assessment are
important to cultivating learner autonomy (Little, 1996). These claims provide
theoretical and empirical evidence for including learner autonomy in LOA frameworks
and support its relationship with involvement in assessment and feedback.
The macro LOA cycle is closely linked with the reference for interpreting
achievement records or learning outcomes from both the summative external exam and
the formative structured records by teachers. The micro LOA cycle is fulfilled by
teachers and students who go through language activities and provide informal records
for further interpretation. In addition, this LOA framework can be applied to different
CEFR levels of English proficiency, from basic users at the Breakthrough or Beginner
(A1) level to proficient users at the Mastery (C2) level.
The strengths of this LOA cycle are two-fold. On the one hand, it seeks the
potential of aligning the large-scale summative assessment with a classroom-based
formative assessment by exploring the common ground between them. The external
exam of the SHSEET and internal assessment of classroom activities jointly provide
records of both formal and informal assessment-related activities. On the other hand,
the core and major stakeholder groups are considered at both macro and micro levels.
At the macro level, the higher-level objectives and lower-level assessment content are
decided by curriculum developers and test constructors. At the micro level, LOA
activities are carried out by key stakeholder groups of teachers and students.
Although its numerous strengths and possible alignment with the washback model
are clear, the LOA cycle is not without flaws, and researchers should consider the
following aspects. First, the reference frame of CEFR for learning outcome
interpretation and the LOA cycle are envisaged in Cambridge English exams in the
European context, while the SHSEET, as a yardstick for junior high school graduates,
is situated in a Chinese high-stakes, large-scale standardised testing context. Second, as
Jones and Saville (2016) elaborated, the implementation of the LOA system entails
considerable effort. For example, the development of curriculum and materials at the
macro level, teacher training at the micro level to enhance teachers' ability to understand
and identify students' strengths and weaknesses, and the approach linking
classroom assessment with external assessment should all be considered. Despite the
perceived shortcomings, as a positive learning promotion system, the LOA cycle can
be applied in the current SHSEET study.
In Figure 3.7, at the macro level, the frame of reference for checking students’
learning outcomes is the ECSCE, which sets the proficiency level of the SHSEET at
Level 5. The macro-level learning objectives from the ECSCE (the development of
communicative language use) and the micro-level content from the test (test
characteristics) are reflected in the SHSEET Test Specifications. In other words, the
SHSEET Test Specifications draw on the ECSCE but
reflect the actual test content. In the current study, the GVT acts as the external exam
of which the learning outcomes will be interpreted in accordance with Level 5 (see
Appendix A) set in the ECSCE in the macro context. As for the classroom learning,
SHSEET-related activities are assumed to be learning-oriented as the ECSCE
prescribes (see section 1.2.2). Therefore, the whole SHSEET LOA cycle is envisaged
to be moving towards a positive direction, which can be explored through classroom
observations.
The SHSEET, which has a major selection function, first needs to bring about
educational changes to meet the requirement of the “quality education” policy
proposed by the Chinese government from the 1990s onwards (Dello-Iacovo, 2009).
Through “quality education”, in contrast to “exam-oriented education”, students’
all-round development is emphasised.
The potential of using the LOA cycle (Jones & Saville, 2016) in the chosen
washback model (Green, 2007a) has been explained in the previous sections.
Moreover, the key concepts in Carless (2007) are also considered in this new washback
model, which are indicated in LOA practices. Figure 3.8 presents the relationship
between and the alignment of the two models in one diagram.
Figure 3.8. A new washback model incorporating LOA (Green, 2007a; Carless, 2007; Jones &
Saville, 2016)
As denoted in Figure 3.8, the macro level and micro level of the LOA cycle are
embedded into the washback dimension of value (or direction). The key elements of
the macro value comprise both the curriculum reference of the ECSCE (focal
construct) and the test content reflected in the SHSEET Test Specifications (test characteristics).
The advantage of this proposed model is two-fold. On the one hand, the model
is comprehensive in that it provides all possible factors for the GVT washback on
teaching and learning. On the other hand, the model provides relevant LOA practices
to guide the data collection and analysis. In this way, it enables the researcher to
explore the LOA dynamic in the actual summative test preparation stages. However,
one limitation of the proposed model is that the inter-relationships among different
factors, for example, among the washback variability and washback intensity factors,
are not depicted. Nonetheless, the
proposed model contributes to the extensive knowledge of washback in the GVT
context and thus provides theoretical as well as empirical directions for the proposed
study. To overcome the limitation, the researcher has made an attempt to investigate
the inter-relationships between those factors by exploring their structural relationships.
Thus, the washback mechanism of the GVT will be presented in Chapter Six. Most
importantly, a revised washback model incorporating LOA will be finally depicted
after the data analysis.
To sum up, based on the empirical findings from Chapter Two and the theoretical
discussion in this chapter, the present washback study of the GVT used the washback
model with two dimensions of the value and intensity (Green, 2007a) and the LOA
cycle with both micro and macro levels (Jones & Saville, 2016).
Chapter Two and Chapter Three have demonstrated the abundance of washback
studies and theories. However, it is important to point out that as a well-established
concept, washback is not only theoretically rich but also methodologically fruitful. An
overview of the literature establishes that a wide range of methods has been employed
in empirical washback studies, among which interviews, questionnaires, test
administration, and classroom observations are the most common. The structured
questionnaire is a widely used quantitative method (Hawkey, 2006; Xie & Andrews,
2013). Furthermore, questionnaires are also used in combination with qualitative
Chapter 4: Methodology 75
methods such as interviews (Gu & Saville, 2012; Qi, 2007) and classroom observations
(Hawkey, 2006; Saif, 2006) to employ MMR approaches in washback studies
(Burrows, 2004; Green, 2007b). Other methods such as textbook or material analysis
(Hawkey, 2006; Tsagari, 2009), administration of tests (Andrews et al., 2002; Green,
2007b), and student learning diaries (Gu et al., 2014; Zhan & Andrews, 2014) are also
used.
2002; Erfani, 2012; Fan & Ji, 2014; Özmen, 2011; Qi, 2005, 2007; Yang et al., 2013).
For instance, an exploratory sequential MMR design was adopted by Qi (2005) to
explore what factors influenced the intended washback of the NMET. The first phase
of the study collected qualitative data from interviews with test inspectors, test
constructors, teachers, and students. Qualitative data were then analysed to inform the
design of teachers’ and students’ questionnaires. Quantitative data, in turn, were used
to generalise findings from qualitative data.
2011; Moeller et al., 2016) and particularly in language testing and assessment studies
(Moeller et al., 2016).
Of all the different models of MMR, the present study adopted an exploratory
sequential MMR design to first collect qualitative data to explore factors for washback
values, washback intensity, and LOA opportunities and challenges. Findings from
qualitative data analysis informed the quantitative phase. The quantitative data
collection and analysis in turn provided further insight into the qualitative findings. As
a result, the research problem and questions were more comprehensively understood,
and both quantitative and qualitative methods were considered indispensable in the
present study. Table 4.1 depicts the current MMR design in detail.
Table 4.1
Qualitative and quantitative methods in the present study
The research questions were addressed by combining both the qualitative and quantitative findings. The classroom observations
recorded and revealed the teaching and learning practices regarding the GVT
washback and LOA possibilities. The interviews further elicited student and teacher
perceptions of the GVT washback and LOA possibilities, especially exploring the
factors that influence their perceptions. The student survey was designed to address
the same questions regarding learners’ perceptions of the GVT washback and LOA
practices. In addition, the student survey sought to explore the qualitative findings in
a larger sample. The exploratory sequential MMR design and procedure are detailed
in the next section.
This exploratory sequential MMR study unfolded in four stages (see Figure 4.1).
In brief, Stage 1 informed Stage 2, Stage 2 led to Stage 3, and Stages 1, 2, and 3
fed into Stage 4, where the final interpretation was integrated to answer the research
questions. The four stages were therefore accorded equal importance throughout
the research.
Figure 4.1. The four stages of the exploratory sequential MMR design: Stage 1 (qualitative data
collection and analysis), Stage 2 (instrument design), Stage 3 (quantitative data collection and
analysis), and Stage 4 (interpretation).
In Stage 2, the survey instrument was designed based on the theoretical knowledge discussed in the literature review (see Figure 2.2), and the preliminary results
of the qualitative phase (Stage 1). In Stage 3, quantitative data were collected and
analysed to examine, generalise, or problematise findings from the initial qualitative
phase (Stage 1). In Stage 4, both the qualitative dataset and the quantitative dataset
were integrated to answer the research questions. Details of Stages 2, 3, and 4 are
provided in Section 4.5.
The entire research procedure is depicted in Figure 4.2 (see next page). The
credibility and accuracy of the findings were maximised through data triangulation
throughout the whole research process. The research procedure is detailed in the
following sections.
4.4 QUALITATIVE PHASE
Of all the major types of qualitative data collection, observations and interviews
are recommended for exploratory ends (Johnson & Christensen, 2012; Merriam,
2016). In this study, the qualitative phase involved the collection of data from three
sources: classroom observations of three Grade 9 English teachers’ classes (15
sessions in total, consisting of five 40-minute lessons with each class), semi-structured
individual interviews with the three class teachers, and three focus groups with 18
students (six students in each group). Qualitative data sources are documented in Table
4.2.
Table 4.2
Summary of participant involvement in qualitative phase
Participants      Classroom observations    Semi-structured interviews    Focus group interviews
Teachers          √                         √
Students          √                                                       √
Table 4.3
Information on the participating schools
4.4.2 Participants
For classroom observations, one Grade 9 class from School A (48 students), one
Grade 9 class from School B (52 students), and one Grade 9 class from School C (36
students) were chosen. The three English teachers of these classes were then
interviewed individually after classroom observations, and 18 students (six from each
class) were interviewed in three focus groups. Grade 9 was chosen because the students
in this grade were about to sit the SHSEET. Specifically, the second semester of
Grade 9 was chosen because it was close to the examination date (each year, the
SHSEET is administered on the morning of 14th June, from 9am to 11am, in Chongqing),
a time relating to the seasonality of washback (Bailey, 1999; Cheng, 2005), which
previous washback studies have demonstrated to be critical: the closer the time of
the examination, the more intense the observed test washback (Bailey, 1999;
Cheng, 2005; Cheng et al., 2011). It was beneficial to focus on Grade 9 in this study
to identify the extent to which the GVT might influence the teaching and learning practices, and
the opportunities that the GVT allowed for learning-oriented practices in test
preparation classes.
classroom teacher and students, which was also a prerequisite for running smooth
interviews.
As for semi-structured interviews, three teachers were included, each from one
of the three observed classrooms. Elsewhere, researchers suggest varied numbers of
interviewees, from five to 25 (Creswell, 2013), or from 12 to 20 (Kuzel, 1992), with
saturation commonly occurring after the first 12 interviews (Guest et al., 2006; Lincoln
& Guba, 1985). However, saturation was not the purpose of the interviews in the
current study. The purpose was to “pair” classroom observation and teacher interview.
Before entering the classrooms, the researcher obtained consent from schools,
teachers, and students. Traditionally, qualitative observation is conducted in
natural settings with an exploratory purpose, and four types of observer roles are
classified: the complete participant, the participant-as-observer, the observer-as-
participant, and the complete observer (Johnson & Christensen, 2012). In order to
minimise unexpected effects or “frontstage behaviour” (Goffman, 1971), the
researcher acted as a “complete observer” (Johnson & Christensen, 2012) who took on
the role of “outsider” and nonparticipant observer (Creswell, 2015; Patton, 2015). In
this sense, the researcher entered the classrooms through the back door and sat at the back
of the class.
In total, 15 teaching sessions (five sessions for each teacher) were observed from
April to May in 2018 during the second semester of the final junior high school year.
The observation timetable was based on the teachers’ willingness and availability.
Although each school had a different number of English lessons per week, each lesson
lasted 40 minutes, which enabled the collection of a comparable amount of data across
schools. In addition, the observed classroom sessions were audio-recorded with a
digital recorder, and extensive field notes were taken during and after observations
since it was necessary that field notes should be taken down, corrected, and edited in
time for memorising important details for later analysis (Johnson & Christensen,
2012).
ended; therefore, any idea that was omitted in design could be easily added after each
observation session. Generally, the scheme considered five major aspects. The first
part recorded demographic information. Part A and Part B were designed by referring
to the Communicative Orientation of Language Teaching (COLT) scheme (Sinclair &
Coulthard, 1975), a widely used observation instrument in washback studies (Green, 2006b), and
the theoretical framework of the LOA cycle adopted (see Figure 3.7). Therefore, both
parts were used to document test preparation practices. Part C mainly dealt with
information omitted at the design stage; anything interesting and worthy of attention
was added to the scheme there. In addition, nonverbal behaviours were also recorded
in this part. After each classroom observation, comments on the observation and
questions to be asked in the follow-up interviews were recorded in Part D.
4.4.4 Interviews
The interview, a widely used data collection method, is often preferred by
researchers conducting qualitative studies; interviewees are asked to answer
general, open-ended questions, and the interviewer records the answers
(Creswell, 2015; Johnson & Christensen, 2012; Patton, 2015). As indicated in Table
4.1 and Figure 4.2, the post-observational phase contained interviews to ascertain and
explore participants’ perceptions of the GVT washback and opportunities for and
challenges of the incorporation of LOA principles in test preparation.
researcher tried to remain impartial (Johnson & Christensen, 2012). Third, the researcher
built trust with interviewees by 1) orally introducing herself and this PhD project; 2)
explaining the significance of this project and the value of their participation in the
integrity of this washback study; and 3) assuring interviewees that their information
would be kept confidential. Fourth, all the interviews were audio-recorded to secure
the data integrity for future retrieval in the data analysis phase. Fifth, separate
interview protocols for individual and focus group interviews were used. Sixth, since
all the participants were Chinese speakers and Grade 9 students might be too shy to speak
in English, Chinese was used during the interviews to enable better
communication and avoid misunderstanding. Seventh, to eliminate
misunderstanding and difficulty in answering interview questions, specific terms such
as “washback” were avoided. Instead, “test influence on teaching and learning” was
adopted.
It is important to note that the term “Learning Oriented Assessment” was used
in teachers’ semi-structured interviews, but not in students’ focus groups. The rationale
behind this decision was participants’ assessment literacy and age. The decision,
made after observing actual classroom sessions and interview discussions, reflected
students’ difficulty in comprehending such a technical term, together with the
consideration of data collection efficiency. Nonetheless, to better explore teachers’
perspectives, an LOA information sheet (e.g., a brief definition of LOA) was provided
to teachers after they had given their own understanding of the term.
Semi-structured interviews
Semi-structured one-on-one interviews were conducted with the three observed
English teachers. The one-on-one interviews in this study were semi-structured mainly
due to question design; that is, the researcher asked open-ended questions on broad
topics with flexible wording and ordering (Minichiello et al., 2008). Semi-structured
interviews in the present study offered guidelines for the interview questions,
permitted flexibility to interviewees, and allowed possibilities for the interviewer to
probe new ideas (Merriam, 2016; Simons, 2009). After collecting and briefly analysing
classroom observation data, the individual semi-structured interviews were conducted
face-to-face in April and May 2018.
An interview protocol was used (Appendix F): except for some demographic
information such as date, place, and interviewer, the protocol for the interviews
contained open-ended questions developed around previous relevant studies, the LOA
framework, and the research questions. The interview questions were designed to elicit
teachers’ perceptions and their teaching experiences in terms of test washback and
LOA. Since those teachers were all experienced and quite familiar with the researcher
after going through the classroom observation stage, they were expected to articulate
and share their opinions comfortably.
Focus groups
As part of the qualitative phase of this exploratory sequential MMR study, focus
groups were scheduled immediately after obtaining teachers’ permission and confirming
students’ availability. Focus groups were conducted to explore similar issues from the
perspective of students’ learning experiences and perceptions of the GVT as well as
LOA. As indicated in section 4.4.2, three groups of six students were purposively
selected to take part in the interviews. The information on student participants from
three schools is shown in Table 4.4, Table 4.5, and Table 4.6.
As shown in Table 4.4, Fei-SA and Chao-SA were students with a high language
proficiency from Lan’s class, Ming-SA and Ling-SA were ranked next, while Wei-SA
and Xia-SA had scored comparatively lower. Therefore, participants in School A were
regarded as students with very high (Fei-SA, Chao-SA) and intermediate language
proficiency levels. The reason for this categorisation was to keep student language
proficiency levels across schools in view.
Table 4.4
Information on the participating students from School A

Student   Pseudonym   Gender   Age (years)   English learning from   Mock SHSEET test score
SA-S1*    Ming-SA     Male     15            Grade 7                 120
SA-S2     Wei-SA      Male     15            Not given               110
SA-S3     Fei-SA      Male     15            Not given               140
SA-S4     Xia-SA      Female   15            Grade 3                 110
SA-S5     Ling-SA     Female   15            Grade 3                 120
SA-S6     Chao-SA     Male     15            Not given               140
* SA-S1 indicates that this is Student 1 from School A.
signed the contract3 [for a senior high school enrolment] may feel fine to have their
one-hour study time taken”. She further considered it necessary for students with low
language proficiency levels to have more time to study, as participating in the focus
group might affect their mood. Therefore, School B students had an intermediate level
of language proficiency (see Table 4.5).
Table 4.5
Information on the participating students from School B
Table 4.6
Information on the participating students from School C
3 Contract signing is a common phenomenon in the SHSEE in Chongqing. According to teachers, there
is one large-scale mock SHSEE test in April each year, and after the release of the test scores, senior
high schools pre-enrol students who meet their enrolment standards. After signing this senior high
school enrolment contract, students are expected to attend the school if their actual SHSEE score
ultimately meets its requirement.
Theoretically, the idea of a ‘focus group’ originated in sociology (Merton et
al., 1956; Merton & Kendall, 1946); in a focus group, the moderator leads a
discussion to collect data on the same topics or questions from a group of individuals
(Kamberelis & Dimitriadis, 2013). Therefore, in the current study, the moderator (i.e.,
the researcher) led the group discussion to elicit responses from all individuals
without intruding when interviewees were talking. Once relevant ideas were expressed
by the interviewees, the researcher quickly jotted them down and decided whether to
ask follow-up questions about those ideas.
On the one hand, it is claimed that a focus group should include either
homogeneous participants, to ease discussion, or heterogeneous participants,
depending on the research purpose (Johnson & Christensen, 2012). Although resources
were limited, the researcher tried to include heterogeneous interviewees in the focus
groups; therefore, students with English proficiency levels from low to high were
included. Moreover, interaction and cooperation could help participants exchange and
build upon ideas smoothly in groups.
On the other hand, it is quite challenging for the interviewer to lead and keep all
the participants focused on the interview topic (Johnson & Christensen, 2012), neither
distracted by others’ opinions nor drifting into irrelevant ideas. To address this
concern, an interview protocol was used. The protocol design for the focus group
contained two sections (see Appendix G): the interview record (demographic
information) and interview questions.
Although focus groups are useful to collect shared understanding from several
individuals at a time and yield the best information through active interactions, it is
possible that students generate a ‘group think’ (Simons, 2009) and some students feel
hesitant to express themselves (Creswell, 2015). These challenges were addressed
through the seven steps mentioned at the beginning of section 4.4.4. Furthermore, in
order to recognise individual voices in audio-recordings, the researcher identified each
student by numbering them as SA-S1, SA-S2, SB-S1, SC-S1, etc. before starting the
interview; students then responded to the interview questions in turn.
However, pseudonyms were used in data transcription and analyses in the main study.
4.4.5 Transcription and translation
Before data analysis, the audio-taped excerpts from classroom observations and
interviews were transcribed. Due to funding limitations, the researcher transcribed all
the recordings herself, which was considered an effective measure since she knew the
materials well after conducting all the observations and interviews in person; this
firsthand experience of the research process facilitated transcription (Halcomb &
Davidson, 2006). Moreover, field notes, observation
schemes, and interview protocols directly supported the transcripts and helped to better
understand the recordings. Therefore, during transcription, the researcher combined
the voice recordings with the field notes (i.e., Part C in the observation scheme) and
kept a transcription log when she made changes to transcripts. In this way, the
comprehensiveness of the qualitative data was ensured. Moreover, a transcription
symbol list was adapted from Powers (2005) to guide the transcription (see Appendix
H).
As stated in the previous sections, all the participants were Mandarin speakers
and the study was conducted in Mandarin. Therefore, language translation was
a necessary and crucial process during instrument design, data analysis, and the
final report. At the research design stage, for the predesigned interview protocols, the
researcher offered both the English and Chinese versions for the supervisory team to
check the reliability of the translation. The supervisory team includes a Chinese-English
bilingual scholar who has studied and researched in English speaking countries for
many years. In addition, the researcher herself is familiar with bilingual translation as
an English major and has studied as well as used English for more than fifteen years.
Before entering the field, all protocols were discussed with the supervisory team.
After finalising qualitative instruments, the researcher then started to collect data
in Chinese. The use of the shared first language enabled better communication and
understanding between the researcher and participants, especially the teenage students.
At the data analysis stage, the original analysis was in the source language, Chinese
(Squires, 2009), but the researcher communicated with the supervisory team in
English to get advice on the analysis and on writing up the project in English.
2010), this was unrealistic because of the limited funding and resources for the current
project. Nonetheless, dynamic equivalence, which seeks the most natural way for
target-language users to comprehend information reproduced from the source
language, was used as the translation principle (Nida, 1977; Sutrisno
et al., 2014). Dynamic equivalence was essential in that the collected data reflected
Chinese culture, while reporting in English demanded translation from Chinese to English,
which brought about cultural change (Halai, 2007). Although it was demanding, the
researcher’s capability in Chinese-English translation, the bilingual capacity of the
supervisory team, plus cross-checking of the reported scripts by several English
speakers enabled the integrity of the translation to be established.
Two types of thematic analysis have been identified. One is inductive (thematic)
analysis (Braun & Clarke, 2006; Patton, 2015) or data-driven thematic analysis
(Boyatzis, 1998). As the name indicates, this analysis mainly uses qualitative data to
generate new ideas and to identify themes and/or theories. The other type of thematic
analysis is deductive (Braun & Clarke, 2006; Patton, 2015), known as theory-driven
or prior-research-driven (Boyatzis, 1998). In contrast to inductive thematic analysis,
the deductive thematic analysis is usually driven by the theoretical framework (a new
washback model incorporating LOA, Figure 3.8) or analytic interest, and it determines
how qualitative data collected support the theory or framework being used. The
flexibility of thematic analysis allowed the researcher to identify key features of the
abundant qualitative data and unpredictable insights (Braun & Clarke, 2006). On the
one hand, using the deductive thematic analysis, which built on the theoretical
framework of the new washback model (see Figure 3.8), could elicit explicitly specific
and significant data aspects regarding the washback of the GVT. On the other hand, it
was also important to be open to new themes emerging from the data, which could add
new knowledge to the study and the model. Therefore, both deductive and inductive
thematic analyses were adopted in the data analysis. The process of applying thematic
analysis is depicted in Figure 4.3.
Figure 4.3. The process of thematic analysis (Braun & Clarke, 2006)
As indicated in Figure 4.3, six stages are necessary. In the current study, the
researcher generated two coding schemes based on the proposed washback model (see
Figure 3.8). The teacher coding scheme was constructed based on classroom
observations and teacher interviews, and the student coding scheme was constructed
based on focus group data.
Ming-SA: In my opinion, its test items are mainly testing our grammar
knowledge,/
which are totally irrelevant to our daily life./
In fact, I think that MCQ in the GVT has so many problems./
Besides, some of the items are, one and two options of one test item,
according to the meaning, are correct, but the grammar knowledge…/
(FG-SA)
Following the above procedure, the qualitative dataset was segmented, and the
teacher and student coding schemes were constructed, after which co-coding was
conducted. Admittedly, the use of co-coding was a point of concern, as Braun
et al. (2019) argue that co-coding works at odds with fully qualitative paradigms and
aligns more with a (post)positivist paradigm. While acknowledging their caveat, the
philosophical position of this MMR study is to think through both qualitative and
quantitative paradigms; in this vein, co-coding enables the sharing of methodological
values between the two. As Boyatzis (1998, p. vii), a seminal researcher on coding
reliability approach, explains, coding reliability “is a translator of those speaking the
language of qualitative analysis and those speaking the language of quantitative
analysis”. As such, the co-coding was completed before writing up the thesis; the
details of co-coding are reported later in section 4.4.7.
In this exploratory sequential MMR study, qualitative analysis was crucial for
the quantitative instrument design. However, the time for transcription, translation,
and analysis was relatively limited (from end of May to end of July 2018) since the
survey was planned to be distributed shortly after students took SHSEET on 14th June
2018. Therefore, it was impractical to complete the full qualitative analysis in such a
short time. Nonetheless, the researcher managed to preliminarily transcribe and analyse
the data to facilitate the survey design. This process was feasible and dependable since
the survey design was also guided by findings from the literature (see Figure 2.2) and the proposed
washback model (see Figure 3.8) in relation to research questions. Therefore, the key
ideas from the coding of the qualitative data were adopted in instrument design for
both the pilot survey (August 2018) and finalised survey (September and October
2018).
Validity concerns how truthful and accurate a study is. To ensure the
accuracy of the study, “qualitative inquirers often employ validation procedures”
(Creswell, 2015, p. 261). Various scholars have proposed methods to enhance both
internal and external validity in qualitative research (Creswell, 2015; Franklin &
Ballan, 2001; Merriam, 2016). Table 4.7 lists ways to enhance validity in the
qualitative phase through synthesising suggestions from Franklin and Ballan (2001)
and Creswell (2015).
Table 4.7
Methods for increasing validity in the present study
Reliability refers to “whether the results are consistent to the data collected”
(Merriam, 2016, p. 251), which means consistency or dependability (Lincoln & Guba,
1985). In order to maintain the internal and external reliability, different methods are
suggested (Franklin & Ballan, 2001; Merriam, 2016). Table 4.8 lists ways suggested
by Franklin and Ballan (2001) to enhance research reliability.
Table 4.8
Methods for increasing reliability in the present study
The reliability of this qualitative stage was enhanced by establishing both intra-
coder reliability and inter-coder reliability. For intra-coder reliability, the researcher
herself went back to check the coded segments several times and revised the coding
scheme and coded segments accordingly. For inter-coder reliability, the researcher
invited two independent third parties to co-code 30% of the data, as recommended by
Gass and Mackey (2000). The student focus group data were co-coded by a junior high
school English teacher who was not involved in the data collection. Thus, one out of
the three focus group transcripts was co-coded. For the teacher data, four classroom
observations4 were co-coded and for semi-structured teacher interviews, one out of the
three transcripts was co-coded. The co-coding results are documented in Table 4.9.
Table 4.9
Co-coding results for qualitative transcripts
Table 4.9 shows that the overall co-coding agreement for student data is 89.55%,
and the average co-coding agreement for teacher data (both interviews and classroom
observations) is 87.93%, both above the suggested agreement rate of 80%
(Braun et al., 2019). These levels are acceptable; most importantly, the researcher
discussed with the co-coders the segments that had been coded differently, first asking
why they had coded certain segments that way and then reaching agreement after going
through all the relevant segments. Co-coding was thus an important means of
establishing the reliability of the qualitative coding analysis. Although it was
time-consuming, since the co-coders had to discuss with the researcher while detailed
reasons were explored and codes amended, it was worthwhile.
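As an aside, the kind of percentage agreement reported in Table 4.9 can be illustrated with a minimal sketch; the codes and segments below are hypothetical, not drawn from the study's data.

```python
# Illustrative sketch (hypothetical codes, not the study's tooling): simple
# percentage agreement between two coders over the same transcript segments.
def percent_agreement(coder_a, coder_b):
    """Share (in %) of segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Coders must code the same segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Hypothetical codes for ten transcript segments
researcher = ["WV+", "WV-", "LOA", "WV-", "LOA", "WI", "WV+", "LOA", "WI", "WV-"]
co_coder   = ["WV+", "WV-", "LOA", "WV+", "LOA", "WI", "WV+", "LOA", "WI", "WV-"]

agreement = percent_agreement(researcher, co_coder)
print(f"{agreement:.2f}%")  # 9 of 10 segments agree -> 90.00%
```

Disagreeing segments (here, segment 4) would then be discussed until the coders reach agreement, as described above.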
Taking the issues of validity and reliability seriously in the qualitative data
collection and analysis, a series of themes was identified. In detail, these
themes related to washback and LOA practices, participants’ perceptions of the
GVT, participants’ perceptions of incorporating LOA principles in test preparation,
and possible reasons for their perceptions regarding the LOA incorporation. Therefore,
the qualitative phase laid the foundation for the subsequent quantitative phase,
especially the instrument design.
4 Although 15 classroom sessions were observed, three of them (two from Hu and one from
Zhang) focused on either reading comprehension or listening comprehension; thus, only classroom
sessions related to grammar and vocabulary were coded for the main study analysis. Therefore, these
three observed sessions were not counted or analysed in the qualitative phase.
4.5 QUANTITATIVE PHASE
5 WeChat is a very popular social media app in China.
to statistical software (e.g., SPSS) for analysis, which saved time on data entry and
controlled data-entry errors.
However, although the original plan was to collect online questionnaires only, it
turned out that some participant schools were reluctant to forward the online flyer due
to their school management considerations.6 Therefore, the final data collection
included both online and paper questionnaires, and the paper questionnaires were
mailed back to the researcher. Nonetheless, no difference existed between the two versions,
as the online flyer was also printed out for students when they were asked to fill in the
paper survey. Further, as the ethical clearance for the quantitative phase did not
anticipate this change, a later report of this incident was submitted online to the Office
of Research Ethics and Integrity at QUT, and the incident is further discussed in section
8.3.
Before instrument design started, the language of the survey was carefully
considered. Firstly, the participants’ mother tongue was Chinese, and secondly,
considering the age factor of participants, Chinese was chosen as the research language.
Therefore, the survey was designed in Chinese but translated into English for the
purposes of supervision, ethics application, and thesis reporting. After these
fundamental considerations, the process of instrument design and development was
undertaken from the following four aspects.
6 In some schools, mobile phones were restricted, and some teachers still preferred a traditional
paper survey.
First, preliminary analyses of the qualitative data informed the survey design.
As the researcher was also the observer and interviewer during the qualitative stage,
she was very familiar with the data. As a result, key items of demographic information,
perceptions of test design characteristics, test anxiety, classroom interaction,
involvement in assessment, and feedback were designed based on the observation and
interview data. Moreover, single-item measures, such as test difficulty for each of the
four task types, were straightforward to design.
Second, during the design of the student survey, existing instruments in the literature
were referred to for certain constructs: for example, knowledge of test design
characteristics (Xie, 2010), motivation (Qi, 2004b; Xie, 2010), time investment (Qi,
2004b; Xie, 2010), learning strategy (Qi, 2004b; Xie, 2010), learner autonomy (Zhang
& Li, 2004), and test importance (Jin & Cheng, 2013).
Third, the new washback model (Figure 3.8) and the LOA cycle in the GVT
context (Figure 3.7) were also the reference for instrument design.
Fourth, face validity was ensured prior to the pilot study. As the researcher had found
in the focus groups, academic or complicated words could not be understood by students.
Once the survey was drafted by the researcher, the supervisory team worked together
to check whether the survey items were related to the research questions. As the survey
participants were junior high school graduates, who were teenagers, the intelligibility
of the wording was a priority. Therefore, before the pilot, the
face validity of the survey was ensured by different stakeholder cohorts.
The first cohort was ten Grade 9 English teachers from Chongqing and
Guangdong, Gansu, and Yunnan provinces. They found the survey clear
and easy to understand, and their further suggestions regarding the survey format and
wording were adopted. The second cohort included two Chinese
language teachers and one undergraduate who majored in Chinese language. As the
student survey would be mainly implemented in Chinese, these stakeholders provided
professional language editing advice. The third cohort was composed of eight Grade 9
graduates and three Grade 9 to-be students from Guangdong and Henan provinces.
Interestingly, they commented that items like “classroom interaction” and
“involvement in assessment” were not clear enough; these items were edited accordingly.
The last cohort was the research team and the researcher’s colleagues. The researcher had
been working closely with the supervisory team during the survey design and revision.
The survey comprised two main parts. The first part was about students’
demographic and background information (i.e., gender, SHSEET test score, school
district, school name, class number). The second part was the washback questionnaire
relating to the GVT. Specifically, the washback questionnaire contained two sections:
washback value and washback intensity. It is noteworthy that the macro level of
washback value was not included since students could hardly provide any information
regarding curriculum and/or Test Specifications. In total, seven demographic
information items and 73 washback items were included in the pilot study; and six
demographic items and 77 washback items were included in the main study. A five-
point Likert scale (e.g., 1=Never, 2=Seldom, 3=Sometimes, 4=Often, 5=Always) was
adopted. The detailed items within the pilot version and the final version of the survey
are listed in Appendix I. The process of piloting is discussed in the following section.
The researcher first distributed the pilot survey online, mainly to students in the two
observed schools and students in another, unobserved school in Chongqing. The survey
link created through Sojump was sent to Grade 9 graduates in those schools. In total,
192 students responded to the pilot survey. Among the 192 respondents, two
were non-Chongqing students, and another four questionnaires were invalid because
students gave the same answer to every item in the scale. Therefore, 186 valid
questionnaires were collected in the pilot stage. This sample size was considered as
Table 4.11
Item-total statistics for construct reliability of negative perceptions of test design characteristics

Indicator   Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
            Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
v2          4.6290          2.732               .382                .164               .703
v3          4.6505          2.077               .596                .363               .427
v4          4.8710          2.156               .496                .299               .569
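The SPSS statistics of this kind can be illustrated with a minimal sketch; the simulated items below are hypothetical stand-ins for v2 to v4, not the study's data.

```python
import numpy as np

# Illustrative sketch with simulated responses (not the study's data): how
# SPSS-style "corrected item-total correlation" and "Cronbach's alpha if item
# deleted" can be computed. An item correlating below .33 with the rest of the
# scale shares less than 10% of its variance with it (.33 squared is about
# .109), per Ho (2006).
rng = np.random.default_rng(42)
n = 200
latent = rng.normal(size=n)
items = np.column_stack(
    [latent + rng.normal(scale=s, size=n) for s in (1.6, 0.8, 1.0)]
)

def cronbach_alpha(x):
    """Cronbach's alpha for an n_respondents x n_items array."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

for i in range(items.shape[1]):
    rest = np.delete(items, i, axis=1)          # scale without this item
    r = np.corrcoef(items[:, i], rest.sum(axis=1))[0, 1]
    alpha_del = cronbach_alpha(rest)            # alpha if item deleted
    print(f"item {i}: corrected item-total r = {r:.3f}, "
          f"alpha if deleted = {alpha_del:.3f}")
```

The corrected item-total correlation is simply the correlation between an item and the sum of the remaining items, which is why squaring it gives the item's shared variance with the rest of the scale.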
Another indicator of item reliability was the corrected item-total correlation. If this
value falls below the cut-off of .33, the corresponding item accounts for less than 10%
of the variance of the scale and is therefore not reliable enough (Ho, 2006). The
corrected item-total correlation of .382 for v2 indicated that v2 explained more than
10% of the scale variance of the “negative perception” construct (Ho, 2006).
Therefore, rather than deleting v2, a
decision was made to revise the item. Based on the suggestions of the supervisory team
and survey respondents, this item was changed to “v1 The GVT only aims to test
Survey distribution
To approach potential student participants, the researcher first contacted the
schools observed and a TRO who was working in the Chongqing educational department
to distribute the online questionnaire mainly to former Grade 9 (Grade 10 at the time
of data collection) students in the nine districts and seven counties in Chongqing (the
joint area, see section 1.2.3). The time for carrying out the main survey was at the
beginning of September 2018 when students had known their SHSEET scores and
restarted school. This was an ideal time to collect quantitative data as students were no
longer coping with the stress of test preparation, but their test experiences were still
fresh. It was thus feasible to ask them to reflect on their perceptions and activities
regarding SHSEET test preparation.
As indicated at the beginning of section 4.5, quantitative data were collected both
online and from a paper version. Further, the collected data were from both the
expected districts/counties and regions not included in the joint area. Since respondents
were from different educational jurisdictions, when they responded to the survey
The total number of valid surveys was 922. For the paper version, data from 538
students were collected; of these, seven were from non-Chongqing graduates, two
students wrote down only their demographic information, and another 29 questionnaires
gave the same answer to the whole survey (mainly choosing 3 on the five-point Likert scale).
After the removal of these invalid responses, 500 valid surveys were used as the data
source for analysis. As for the online survey, 541 students accessed the online survey
through Sojump. However, 116 students either participated in the pilot version or did
not graduate from a Chongqing school. Therefore, they were not eligible for the survey
research. Besides, another three students gave the same answer to all the survey items,
so ultimately there were 422 valid responses to the online survey. In order to make
sure that the survey responses remained statistically “identical” across the two student
cohorts (i.e., paper survey participants and online survey participants), the results of
data screening and comparison are reported in the following section.
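The screening of "straight-lining" respondents described above can be sketched as follows; the data frame and item names are hypothetical.

```python
import pandas as pd

# Illustrative sketch (hypothetical items, not the study's dataset): flagging
# respondents who gave the same answer to every Likert item, as was done when
# discarding invalid paper and online questionnaires.
df = pd.DataFrame({
    "item1": [3, 4, 3, 1],
    "item2": [3, 2, 5, 1],
    "item3": [3, 5, 4, 1],
})
straight = df.nunique(axis=1) == 1   # True where every answer is identical
valid = df[~straight]
print(f"removed {straight.sum()} straight-liners, kept {len(valid)}")
```

Here respondents 1 and 4 (all 3s, all 1s) would be removed before analysis.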
The missing values in the paper-based survey data were replaced with the “series mean”.
The reason was that the missing-value percentage was minimal (0.562%, far less
than 5%) (Graham, 2009). Further, the paper survey and online survey were
compared through applying Independent Samples T-test and results are attached in
Appendix J. In general, although paper participants (N=500) and online participants
(N=422) responded to most items in a significantly different way (p<.05), the effect size
of the difference was rather small (r<.30). As such, it could be argued that despite the
statistical difference between the two cohorts, their difference, in practice, was not
meaningful (Field, 2009). Therefore, responses from both paper and online survey
participants were combined as a whole dataset. The dataset was randomly split into
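The imputation and cohort comparison described above can be sketched as follows, using simulated scores rather than the study's dataset; the effect-size r is computed from t in the manner described by Field (2009).

```python
import numpy as np
from scipy import stats

# Illustrative sketch with simulated scores (not the study's data):
# series-mean imputation for sparse missing values, then an independent-samples
# t-test comparing hypothetical paper and online cohorts. With large samples, a
# statistically significant difference can still be trivially small (r < .30).
rng = np.random.default_rng(7)
paper = rng.normal(3.1, 1.0, 500)    # hypothetical mean item scores
online = rng.normal(3.0, 1.0, 422)
paper[::100] = np.nan                # a handful of missing values
paper = np.where(np.isnan(paper), np.nanmean(paper), paper)  # series mean

t, p = stats.ttest_ind(paper, online)
dof = len(paper) + len(online) - 2
r = np.sqrt(t**2 / (t**2 + dof))     # effect size r from t
print(f"t = {t:.2f}, p = {p:.3f}, r = {r:.3f}")
```

Because r shrinks as the degrees of freedom grow, large cohorts like these can yield small p values alongside negligible effect sizes, which is the pattern reported in Appendix J.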
First, the test paper types were analysed. It was found that 805 students (87.31%)
used Paper A and 115 students (12.47%) used Paper B. Two students (0.22%) did not
respond to this item. Although Paper A and Paper B were two different SHSEET test
papers, they had the same content and characteristics.
[Figure: test paper type frequencies (Paper A = 805, Paper B = 115, N/G = 2)]
[Figure: gender frequencies (Female = 518, Male = 401, N/G = 3)]
Further, except for six students who did not report their district (see Figure 4.6),
both the online survey and paper survey were mainly completed by schools in Nanan
District (N=307, 33.30%), Changshou District (N=166, 18.00%), Jiangbei District
(N=145, 15.73%), Jiulongpo District (N=69, 7.48%), Jiangjin District (N=42, 4.56%),
and Shapingba District (N=33, 3.58%). As a municipality, Chongqing has 26 districts,
8 counties, and 4 autonomous counties. Hence, the participants in the main study came
from a wide range of districts and counties in Chongqing, though not in equal
numbers.
[Figure 4.6 chart: frequencies by district code, led by 307, 166, 145, 69, 42, and 33; the remaining districts and counties each contributed fewer than 25 participants]
Note. 8=Jiulongpo District; 9=Nanan District; 5=Dadukou District; 12=Banan District; 38=Pengshui
Miao and Tujia Autonomous County; 6=Jiangbei District; 14=Jiangjin District; 7=Shapingba District;
4=Yuzhong District; 11=Yubei District; 18=Qijiang District; 23=Rongchang District; 21=Tongliang
District; 13=Changshou District.
Figure 4.6. Distribution of school district
[Figure: histogram of participants' SHSEET test scores, ranging from approximately 21 to 148]
Figure 4.7. Distribution of participants’ SHSEET test scores
Before running EFA, CFA, and the main statistical analyses, descriptive analysis was conducted, and the results are shown in Appendix K. The standard deviations of the items ranged from 0.78 to 1.21 on a five-point Likert scale. Therefore, the results suggested adequate variance in participant responses, except for the items identified as problematic by the item analysis reported earlier.
In order to verify the construct validity of the instrument, both EFA and CFA results are demonstrated for each construct of the main study. To repeat, the variables of test preparation effort and test difficulty were not analysed through factor analysis since they are single-item measures (see section 4.5.1). For the EFA results, the correlation matrix for the indicators, the Kaiser-Meyer-Olkin (KMO) measure, communalities, total variance, the scree plot, and the assessment of normality of each construct were explored to check the validity of the constructs in the instrument. For the CFA results, the main model fit indices of each construct were examined. Most importantly, before starting factor analysis, the researcher decided to use the maximum-likelihood method, since the current sampling was expected to allow generalisation of the results to a larger population (Field, 2009).
For the sake of conciseness, this study mainly takes the motivation construct as an example of conducting EFA, demonstrating a theory-driven method (extraction of a fixed number of two factors from the motivation construct according to motivation theory). Therefore, instead of extracting factors with eigenvalues greater than 1, the researcher set a fixed number of two factors to be extracted, according to the theoretical dimensions of intrinsic and extrinsic motivation. Further, CFA results are also discussed to provide an example of the validation of the motivation construct.
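As an illustration of fixing the number of extracted factors on theoretical grounds rather than by the eigenvalue-greater-than-1 rule, the sketch below extracts exactly two factors from a toy correlation matrix built to mimic an intrinsic/extrinsic item structure. It is a simplified principal-factor computation in plain NumPy, not the maximum-likelihood EFA used in this study, and the matrix values are invented.

```python
import numpy as np

def extract_fixed_factors(R, n_factors):
    """Principal-factor loadings for a fixed number of factors from a
    correlation matrix R (shown as a dependency-free sketch; the thesis
    used maximum-likelihood extraction)."""
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]  # keep the largest n_factors
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Toy correlation matrix: two clusters of items mimicking intrinsic
# (items 1-3) vs extrinsic (items 4-6) motivation. Values are invented.
R = np.array([
    [1.0, .7, .6, .2, .2, .2],
    [.7, 1.0, .7, .2, .2, .2],
    [.6, .7, 1.0, .2, .2, .2],
    [.2, .2, .2, 1.0, .6, .6],
    [.2, .2, .2, .6, 1.0, .7],
    [.2, .2, .2, .6, .7, 1.0],
])

# Two factors by theoretical design, not by the eigenvalue > 1 criterion.
L = extract_fixed_factors(R, n_factors=2)
print(np.round(L, 2))
```

The resulting loading matrix has one column per posited factor; rotation and the maximum-likelihood fit statistics reported in the thesis are beyond this sketch.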
Table 4.14
Correlation matrix for the indicators within the motivation construct
Indicators v15 v16 v17 v18 v19 v20 v21 v22 v23 v24
v15 1.000 .651 .608 .618 .388 -.187 .314 .356 .367 .380
v16 .651 1.000 .767 .724 .408 -.150 .311 .407 .476 .477
v17 .608 .767 1.000 .733 .443 -.119 .287 .414 .478 .467
Correlation
v18 .618 .724 .733 1.000 .516 -.084 .364 .482 .517 .547
v19 .388 .408 .443 .516 1.000 -.165 .176 .320 .498 .343
v20 -.187 -.150 -.119 -.084 -.165 1.000 .260 .219 .083 .152
v21 .314 .311 .287 .364 .176 .260 1.000 .566 .453 .489
v22 .356 .407 .414 .482 .320 .219 .566 1.000 .547 .589
v23 .367 .476 .478 .517 .498 .083 .453 .547 1.000 .654
v24 .380 .477 .467 .547 .343 .152 .489 .589 .654 1.000
v15 .000 .000 .000 .000 .000 .000 .000 .000 .000
v16 .000 .000 .000 .000 .001 .000 .000 .000 .000
v17 .000 .000 .000 .000 .007 .000 .000 .000 .000
Sig. (1-tailed)
v18 .000 .000 .000 .000 .040 .000 .000 .000 .000
v19 .000 .000 .000 .000 .000 .000 .000 .000 .000
v20 .000 .001 .007 .040 .000 .000 .000 .042 .001
v21 .000 .000 .000 .000 .000 .000 .000 .000 .000
v22 .000 .000 .000 .000 .000 .000 .000 .000 .000
v23 .000 .000 .000 .000 .000 .042 .000 .000 .000
v24 .000 .000 .000 .000 .000 .001 .000 .000 .000
As shown in Table 4.15, the KMO value of the motivation construct was .886, well above the cut-off value of .50 (Kaiser, 1974). Moreover, Bartlett’s test of sphericity confirmed the adequacy of the magnitude of the correlations, yielding a statistically significant Chi-square value of 2220.687 (p<.001). These two results therefore indicated that EFA should generate distinct and reliable factors.
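For readers wishing to reproduce such checks outside SPSS, both the KMO measure and Bartlett’s test can be computed directly from a correlation matrix. The sketch below applies the standard formulas (Kaiser, 1974, for KMO; the usual chi-square approximation for Bartlett’s test) to a small invented matrix, not to the study’s data.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: tests whether the p x p correlation
    matrix R, estimated from n cases, differs from an identity matrix."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (Kaiser, 1974)."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)   # anti-image (partial) correlations
    np.fill_diagonal(partial, 0.0)
    r2 = R ** 2                        # squared zero-order correlations
    np.fill_diagonal(r2, 0.0)
    return r2.sum() / (r2.sum() + (partial ** 2).sum())

# Illustrative 4-variable correlation matrix (values invented).
R = np.array([
    [1.0, .6, .5, .5],
    [.6, 1.0, .6, .5],
    [.5, .6, 1.0, .6],
    [.5, .5, .6, 1.0],
])
stat, pval = bartlett_sphericity(R, n=300)
print(f"KMO = {kmo(R):.3f}, Bartlett chi2 = {stat:.1f}, p = {pval:.2e}")
```

A KMO above .50 and a significant Bartlett result, as in Table 4.15, are the two conditions that license proceeding with factor analysis.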
Table 4.15
KMO and Bartlett’s test for the indicators within the motivation construct
Table 4.18
Total variance of the motivation construct explained by its indicators
Table 4.19
Assessment of normality for the indicators within the construct motivation
To sum up, the validity and reliability of the designed washback scale have been tested and verified. Each construct was modified and validated through EFA and CFA. Indicators deleted from the constructs include v19 and v20 in the motivation scale; v38, v42, and v43 in the learning strategy scale; v49 and v50 in the interaction scale; and v54 and v56 in the involvement in assessment scale. The decisions for deleting these variables remained consistent across all constructs. The detailed EFA and CFA results of each construct after deleting these variables are attached in Appendix L. Further quantitative analyses in this study were conducted without the deleted indicators. The statistical hypotheses were tested and thus helped to address the corresponding research questions in Chapter Five, Chapter Six, and Chapter Seven.
Data analysis
In Chapter Five, Chapter Six, and Chapter Seven, descriptive statistics
(percentage distribution), Multiple Correspondence Analysis (MCA), SEM, and CFA
were applied.
Modelling LOA practices through CFA took both theory (Carless, 2007; Jones & Saville, 2016; Lamb, 2010) and the qualitative findings into consideration. Specifically, the proposed key LOA practices consist of four constructs: classroom interaction, involvement in assessment, feedback, and learner autonomy (see section 3.4). All four constructs were explored in the student survey. In addition, correlations among the constructed factors were posited, because the constructs appeared correlated in the qualitative data and because the LOA cycle is claimed to be systematic and ecological (Jones & Saville, 2016). All these assumptions were tested through CFA, with an examination of whether LOA practices constitute a four-dimensional model. Details of this CFA model are presented in Chapter Seven.
In the qualitative phase, informed consent was gained from the schools, teachers,
and students before conducting the classroom observations and interviews. The
participants were informed of the research objectives, project information, and audio-
recording requirements. They were assured of confidentiality during the research
processes and final report. Interviews, in particular, were completed in a quiet,
spacious, and private office located in schools. In addition, each participant was
assured of the freedom to withdraw from the research (Johnson & Christensen, 2012;
Minichiello et al., 2008) if there was anything they felt uncomfortable about, although
no one chose to withdraw. The audio recordings were stored on the QUT network
which could only be accessed by the researcher herself. The written protocols were
locked in the researcher’s office at QUT.
In the quantitative phase, both the online survey and the paper survey were
distributed anonymously. For the online survey, a brief introduction of the research
and survey (i.e., the survey flyer attached in Appendix N) was presented to participants
once they opened the online survey link. For the paper version, the participant
information sheet was provided along with the survey. Thus, participants were assured
of the anonymous and voluntary nature of the quantitative phase. Moreover, participants were free to withdraw: clicking the “leave the survey” button allowed withdrawal from the online survey, while not returning the paper survey, or returning it unanswered, allowed withdrawal from the paper version. The collected data were also stored on the QUT network with the recordings, and in the researcher’s office locker at QUT, to ensure security and confidentiality. Submission of the online survey or the paper survey was regarded as participants’ voluntary participation in this research.
At the data analysis and reporting stages, all the participants, specifically the
interview and classroom observation participants were de-identified by using
Chapter Four illustrates the research processes of the current study. To sum up,
the current study employed an exploratory sequential MMR design to investigate the
GVT washback and LOA practices, perceptions of the GVT washback and LOA
opportunities as well as challenges, and influential factors for participants’
perceptions. The research procedure and research design are summarised in Table
4.20, and the findings of data analysis are presented subsequently in Chapters Five,
Six, and Seven.
Table 4.20
The overall research procedure for the present study
Chapter Five presents the data analysis and findings with regard to the washback
value (i.e., negative or positive) from the perspectives and practices of Grade 9
teachers and students as they prepared for the Grammar and Vocabulary Test in the
Senior High School Entrance English Test (the GVT). Through the analysis of data
obtained by classroom observations of teaching and learning practices, interviews with
teachers and students, and a student online survey, this chapter addresses the first sub-
research question of RQ1:
Figure 5.1. Focus of the new washback model in Chapter Five (Carless, 2007; Green, 2007a; Jones &
Saville, 2016)
The official test reference documents in this study refer to the English Curriculum Standards for Compulsory Education (ECSCE) and the Test Specifications for the SHSEET (Test Specifications). How participants understand and implement ECSCE principles, and how they use the Test Specifications as a test preparation reference, is viewed as a key factor at the starting point of test washback, which researchers regard as the intended as well as unintended purposes formed at the test design stage (Linn, 1993; Qi, 2007). Although teachers and students are not test designers, as implementers of curriculum and test design ideas (i.e., teachers) and receivers of tests and test preparation (i.e., students), their opinions help in understanding the intended washback.
The ECSCE designates the teaching and testing scope for compulsory education
(see Chapter One, section 1.2.2). Moreover, the Test Specifications reflects the ECSCE
language learning objectives, since both documents emphasise learner-centred
teaching and learning. As conceptualised in Chapter Three, both the ECSCE and Test
Specifications were positioned at the macro level of the current new washback model
which incorporates LOA. Therefore, in this section, participants’ understanding of
these two official test reference documents, teachers’ implementation of the ECSCE
principles, and participants’ use of Test Specifications are reported.
Before moving on to the qualitative findings, the three teachers’ information is summarised in Table 5.1.
Table 5.1
Information on the participating teachers
According to Hu, teachers were aware of the synergy between these two
documents. That is, Test Specifications was assumed to have the same learner-centred
characteristic as the ECSCE. This finding, to some extent, aligned with the assumption
of an LOA cycle that the higher-level objective from the curriculum should be applied
in creating an LOA syllabus (Jones & Saville, 2016). Therefore, it provides some
evidence of incorporating LOA principles at the macro exam level in the GVT context.
Additionally, all three teachers agreed that the GVT design reflected ECSCE
learning objectives (i.e., the overall ability to use language and the five learning
objectives of affective attitudes, cultural understanding, language knowledge,
language skills, and learning strategies). For example, while Lan recognised the testing
of language knowledge through specific items, she felt that the test paper as a whole
reflected learning objectives in the ECSCE.
Lan: I mean, actually, this overall learning objective is not realised through any
single test item. … For example, MCQ in the GVT, you can’t link it alone with
those five sub-objectives and discuss how much importance it has to these
objectives. Instead, this thing, together with other tasks and at different
teaching stages, what it can accomplish with others. (Interview)
In a similar vein, students agreed with their teachers’ comments on the crucial role of the test reference documents. In fact, to explore the macro washback value from the students’ perspective, mainly their use of Test Specifications was reported. In focus groups, students (e.g., Kai-SC) indicated that they did recognise the crucial role of Test Specifications in their test preparation. However, their understanding of the document was restricted, as their focal point was the explicit knowledge designated by the document as the GVT scope. Therefore, at a macro level, the evidence of students’
Moreover, Zhang’s accounts proved that the GVT design followed what Green
(2007a) would call the “focal construct” of the curriculum. This finding thus
conformed to the LOA cycle assumption that the test design characteristics should also
contribute to an LOA syllabus (Test Specifications in this context), which followed
the key ideas in the curriculum at a macro level (Jones & Saville, 2016). Therefore,
the test design was believed to reflect curriculum stipulated learning objectives. For
example, Zhang commented that, when designing the Cloze, test designers’ intentions
were to “test higher-order language skills, thinking, language use abilities, and
knowledge” and to “weaken the testing of grammatical knowledge” (Interview). Thus,
it was confirmed that the GVT was designed to have an overlap with the focal construct
of the curriculum, providing a potential for positive influence (Green, 2007a) and
reflecting test designers’ positive washback intentions (Sharif & Siddiek, 2017).
This part of the section reported on the participants’ understanding of the two
official test reference documents. Summarising from the qualitative findings, all three
teachers agreed on the learner-centred and thus learning-oriented characteristics of the
ECSCE and Test Specifications. Further, teachers as well as students agreed that these
two documents had an important role in test preparation. Most importantly, all three
teachers agreed that the test design of the GVT reflected the ECSCE learning
objectives and Zhang particularly commented on the alignment between focal
construct and test design intentions. Therefore, it was assumed that at the macro level,
the GVT could generate positive washback due to curriculum developers’ and test
designers’ intentions to align the test with key curriculum principles.
5.1.2 Implementing the principles in the ECSCE and using the Test
Specifications as test preparation reference
From section 5.1.1, although teachers stated that they were aware of what the curriculum specified for teaching and assessment, their implementation of ECSCE principles was restricted in the test preparation context. While Lan’s teaching was still guided by those principles during test preparation, the other two teachers admitted that their teaching had originally followed these principles in the lower grades, but their adherence decreased as the test drew closer, especially in Grade 9. Their comments on this, however, were related to the SHSEET test
Hu: But topics such as this, for example, topics like suggestions for study or exam
anxiety, doesn’t this belong to one of the 24 topics designated by the SHSEET?
Topics like learning approaches, etc. (SB-CO2)
Note. SB-CO2 means “School B, Classroom Observation 2”.
Kai-SC: It is the major essence of our junior high school English learning; it synthesises
those most important as well as common phrases and words and records them
in this one document. (FG-SC)
Among the three teachers, Zhang placed the most emphasis on the value of Test Specifications in her teaching and for her students, whose proficiency was lower than that of students in other schools. For example, she commented on the crucial role of this document for her students.
Zhang: For them (i.e., students in another high-level school), it is voluntary [to buy
Test Specifications]. But for my students, I required them to buy. … Those
students with high language proficiency levels, Test Specifications for them is
not really such a useful guidance. It [Test Specifications] is a very basic and
general public thing. It specifies vocabulary that should be mastered by
students. … Okay, then we have to use that as our reference and guidance.
(Interview)
Fei-SA: The … decrease of the test score and weight for this section is to avoid testing
pure or simple grammar and language knowledge, and to promote students’
writing abilities and the ability of writing down one’s ideas. (FG-SA)
In sum, this part of the section reported on the participants’ use of the two official test reference documents. The overall findings indicated that although every teacher was familiar with the teaching and assessment principles in the ECSCE, only Lan implemented them in test preparation; further, although Lan was positive about implementing curriculum principles in teaching, the other two teachers reported difficulties in Grade 9; and although all three teachers used Test Specifications as their teaching reference, only Zhang especially emphasised its importance to her students. However, even though participants recognised the important role of the test reference documents, their use of them was restricted: teachers rarely implemented the teaching and assessment principles of the ECSCE during test preparation, and teachers as well as students mainly took Test Specifications as a test preparation reference, focusing on the score changes and test scope designated in the document. Results from this section indicated a potential for both negative (Hu, Zhang, and students) and positive washback (Lan) at the macro level, since teachers implemented ECSCE principles differently and students used Test Specifications differently.
As reported in section 1.2.5, four tasks are included in the GVT and examples
are presented in Appendix B. This section presents participants’ perceptions of
characteristics of the GVT, which is one major factor in the washback value dimension
of the new washback model incorporating LOA. According to Green’s (2007a)
washback variability, participant characteristics and values are assumed to be
influenced by the potential for both negative and positive washback generated from
the macro level (i.e., the overlap between focal construct and test design
characteristics). To explore the micro washback value, teachers’ and students’
perceptions of test design characteristics were first explored.
It was noticeable that teachers and students generally commented on the lack of
communicative language skills tested in the GVT. In their opinion, these skills were
only tested through specific communicative language use tasks like the “Oral Test”
task in the SHSEET paper or written dialogues in MCQ items (see item 35 in Appendix
B for example). Due to the decreasing number of those items, teachers as well as
students felt that the GVT was no longer assessing students’ communicative language
ability. Lan clearly explained this in the interview:
Lan: Communicative language? Well, for communicative language, things are like
this, previously there were probably three items, it seems to be one or two
items in MCQ to test communicative language, which are special
communicative language testing items. However, since they (i.e., test
designers) thought that communicative language has already been tested in
Listening Comprehension, and also Oral Test task. As a result, this part was
decreased from MCQ in the GVT. (Interview)
From the above quote, it was interesting to note that teachers felt communicative
language was now tested less in GVT tasks. This perception might be due to their
understanding of the concept of communicative language; that is, communicative
language was perceived by participants as involving speaking skills only, relevant to
To begin with, differing ideas among participants were identified from their
interview accounts. Generally, there were four major features of GVT tasks that
participants commented on.
• Test method: Some participants regarded the MCQ-based GVT items as guessable, and felt that MCQ as well as Sentence Completion tested rote-memorisation, while others acknowledged that those tasks tested a wide range of language knowledge. Moreover, the test method of MCQs was seen as reflecting unchanged test content in GVT tasks;
• Assessing language use: Some participants criticised that the overall ability
to use language was not tested in GVT tasks of MCQ and Sentence
Among all these addressed features of the GVT tasks, participants had quite
conflicting opinions. Each feature is thus reported with corresponding interview
accounts in this section.
5.2.1 Authenticity
Both teachers and students expressed their concerns over the authenticity of language in GVT tasks. On the one hand, some participants agreed that GVT tasks, mainly MCQ and Sentence Completion, lacked authentic language; on the other hand, other participants perceived that GVT tasks, especially Cloze and Gap-filling cloze, were relevant to real-life experiences and language use.
To begin with, participants had differing views on the language used in GVT
tasks with regard to its relevance to real-life use. All three teachers and most students
commented that the type of language required in MCQ was irrelevant to either real-
life use or future English study. Lan explains this view below:
Lan: Well, that is to say, after students learned English… regarding the context
involved in test items [of the GVT], I think it does not conform, well, to
students’ use in real life, especially for their language use overseas. (Interview)
In Lan’s opinion, the language context involved in both teaching and testing was
“the same”, however, it was “impractical” in authentic communication (Interview).
Later on, Lan claimed that “what you can say in English in schools is what others do
not use in real life” and she was astonished when she had to teach old-fashioned
phrases such as “How do you do?” in junior high school (Interview). Similarly, Hu
shared the view that language knowledge learned and tested in junior high school was
irrelevant to that used outside the class, such as in American TV series (Interview) and
Zhang felt that “there is still some disparity” between real English use and the test
content of the GVT (Interview).
Xia-SA: However, I still think, besides, well, some options … are hard to be
differentiated. For example, modal verbs, well, I feel that every option could
be possible. That is to say, to put them into daily oral communication, every
option could be correct. Therefore, it is really hard to differentiate which one
is the best. (FG-SA)
Therefore, both teachers and students regarded GVT test content, MCQ in
particular, to be irrelevant to real-life use or future study contexts, which indicated a
lack of authenticity of language. This negative perception was similar to Zhi and
Wang’s (2019) findings in the NMET context where the perceived irrelevance of test
content to real-life English threatened test authenticity. To this end, it indicated an
influence of negative washback potential of test design characteristics on students
(Green, 2007a).
In contrast to the teachers and School A students, who felt that GVT tasks, MCQ in particular, did not really reflect real-life experiences and language use, students from School B and School C commented that the test content of GVT tasks was relevant to real life. They commented that both Cloze and Gap-filling cloze were “connected with real life” (Shu-SB) and used real-life topics (Na-SB). The following example from Fang-SC illustrates this perception.
Fang-SC: That is, … it is not only what we learned in the classroom, and also outside the
classroom, some extra-curricular knowledge. For example, Gap-filling cloze,
it relates to our real life [topics], such as artificial intelligence (AI), something
like sharing bicycle, that is topic, that is, a lot of topics [are relevant to real
life]. (FG-S1)
On the one hand, Hu felt that the lack of context provided in MCQ and Sentence
Completion tasks was a concern. In her opinion, MCQ items did have some
background information, but this was only given in one or two sentences (Interview).
This insufficient context in test items created difficulties in explaining answers to
students.
Hu: Some of the tasks [in the GVT], if they lack some language background or
context, in fact, even if you choose the right answers, it is hard to persuade
[test-takers about why this is the right answer]. (Interview)
Likewise, Chao-SA and Ping-SC shared a similar opinion on MCQ and Sentence
Completion. Chao-SA commented that the practice of including only one sentence in
these tasks made it hard for students to “infer from its context” (FG-SA). In a similar
vein, Ping-SC recognised that there was no meaningful context provided in Sentence
Completion (FG-SC). As a result, this finding echoed those of studies which claimed
that decontextualised and discrete-point items cannot fulfil the purpose of
communicative language testing of grammar and vocabulary (Alderson & Hamp-
Lyons, 1996; Harrington, 2018).
On the other hand, Lan and Hu agreed that certain tasks in the GVT provided a
rich language context. Nonetheless, this perception was closely related to Cloze and
Gap-filling cloze. In particular, Lan thought that Cloze and Gap-filling cloze provided
a richer context since they were passage-based tasks (Interview). Further, compared
with Sentence Completion, Hu agreed that richer context was involved in MCQ.
Hu: Yes, it (i.e., MCQ) is richer than Sentence Completion. The reason is that it
involves things like more language situation of context and the like. Some new
things like ideas or current affairs or issues can be designed in those MCQ
items, which offers more helpful information. (Interview)
In fact, this idea of “providing a rich context for language use” indicated a positive washback of GVT design characteristics on participants. Despite Hu’s latter
comparison, it was generally believed by participants that passage-based grammar and
vocabulary tasks were able to involve context at the discourse level and thus better
assess students’ ability to use language. This finding further highlighted the need for
contextualised test items in communicative grammar testing (Rea-Dickins, 1991).
First, as reported by Hu and Zhang, GVT tasks using MCQs (i.e., MCQ and Cloze) were guessable. According to them, this could explain the decreasing use of MCQs in test design.
Zhang: Hmm, perhaps why it (i.e., MCQ) is decreasing gradually? Because it has a
great opportunistic nature. That is to say, I may not understand anything of this
item, I guess, I could have a twenty-five percent chance of choosing the right
answer. (Interview)
Zhang’s belief about guessing on MCQs accorded with her teaching of this type
of item. For example, she encouraged students to randomly select an answer when they
did not know the right one and even complimented students when they chose the right
answer through guessing. The following example from Zhang’s class is illustrative:
Zhang: This, okay, the second pitfall appears here. “Fine”, what does “fine” mean?
Okay, George.
George: I guessed.
Although Zhang praised George, she was encouraging the use of test-wiseness
strategies. Therefore, this negative perception of “guessability” in MCQ items
undermined the reliability and validity of the test and thus was an indication of
negative washback of the GVT.
Moreover, this finding was in line with students’ impression that the sentence-
based tasks of MCQ and Sentence Completion tested a fixed body of content and thus
measured their memorisation of language knowledge (Fei-SA, Ling-SA, Long-SB,
Xun-SB, Na-SB, Ping-SC). For example, Long-SB noted that MCQ tested fixed
collocations and vocabulary which he learnt by rote-memorisation. Na-SB commented
that she did not need to read the whole sentence of MCQ items but could provide the
correct answer by recalling her knowledge of fixed collocations, which was agreed by
Xun-SB. This negative perception focused on the discrete-point MCQ items and thus
reflected the negative influence of test design characteristics on learning. It further
indicated negative washback that test design could have on students as indicated by
Green (2007a).
In fact, this conflicting opinion regarding the task method of MCQs echoed the
debate on the advantages and disadvantages of multiple-choice items in testing
grammar and vocabulary knowledge. As a result, this perception from participants
indicated both the negative and positive washback potential of test design
characteristics on participants as conceptualised in the framework (Green, 2007a).
Additionally, the test methods in the GVT revealed that the test content of certain GVT tasks had not changed since the original tests. Lan, Hu, and Yao-SC commented that what was tested in those tasks was fixed, mainly in relation to the MCQ, Cloze, and Sentence Completion tasks. For example, Lan regarded MCQ as testing certain language knowledge points, which were explicit and unchanged year after year.
Lan: Because every MCQ item tests a certain aspect of language or vocabulary
knowledge. Taking vocabulary as an example, such as articles, comparative
degree, it tests a very specific language knowledge point. (Interview)
Additionally, all students shared the same perception. For example, Chao-SA
felt frustrated with MCQ as he knew what the item was trying to measure once he read
the stem.
Chao-SA: Because, sometimes, doing MCQ is really annoying, it, because it has obvious
tricks. That is to say, … I can guess what it indeed wants to test. Because
sometimes when I see, for example, “future tense of subject and present tense
Further, due to the enduring use of certain test tasks, Hu constantly reminded
students of common and frequent language knowledge points tested in GVT tasks in
her class.
Hu: MCQ in the GVT, there must be one item which tests? What kind of
knowledge of object clause?
Moreover, other tasks such as Cloze also measured fixed content like
differentiating adjectives or adverbs (SB-CO4), and Sentence Completion was viewed
as the most fixed. According to Hu, Sentence Completion was full of “tricks” such as
the testing of interrogatives; while for Ping-SC, Sentence Completion always tested
the same type of items such as “negative sentence” and “transformation of
synonymous sentences”. Likewise, students commented that the topics in MCQ did
not change from year to year, which could be boring for test-takers. The following
explanation from Fei-SA illustrates this:
Fei-SA: In my opinion, it (i.e., MCQ) can, that is to say, towards that task, its topics
can be not only, well, like what we were tested in every exam, like topics of “I
borrow one pencil from you”, “I borrow one pencil-box from you”, “I write
one essay”, etc. [laughed] I feel these are very… I feel, well, these topics are
used too frequently and become very boring. (FG-SA)
According to the teachers, the unchanged content of the GVT was evidence of the simple testing of basic language knowledge in MCQ and Sentence Completion (Lan, Hu), with some items focusing “too much on grammar knowledge itself” (Hu, Interview). Further, this testing of basic language knowledge was perceived by Zhang as reflecting a lack of overall English ability tested in these tasks (Interview). Therefore, once students knew how to change the “be-verb” pattern in the sentence, the grammar task could be completed (Hu, Interview).
In general, both teachers and students regarded the focus of GVT tasks as unchanged and as simply testing students’ basic language knowledge of grammar. In their opinion, MCQ, Cloze, and Sentence Completion tested fixed content of language knowledge, and the test content in MCQ and Sentence Completion was too basic and could be mastered by students through rote-memorising knowledge such as grammar rules.
Hui-SB: Well, in my opinion, MCQ in the GVT does not test, well, does not test us
students’ ability to use language, because it contains all basic, relatively
elementary things. … What it (i.e., MCQ) does is to help us to lay the
foundation, to further set up our future language learning. Therefore, I think it
does not test [our overall ability to use language]. (FG-SB)
As perceived by Hui-SB, MCQ did not test his ability to use language. This perception related to the fact that MCQ and Sentence Completion contained easy items which tested language knowledge rather than the overall ability to use language.
In contrast, teachers did agree that the GVT measured students’ overall ability
to use language. The term “the overall ability to use language” has connotations of
Zhang: Okay, from my viewpoint, how does that task have a communicative
characteristic? Well, in my opinion, it is still a higher requirement in students’
logic and understanding abilities, how do they use the knowledge. Also, it is
the same to Cloze, … Gap-filling cloze tests students’ ability to use language.
(Interview)
The above explanation from Zhang indicated that passage-based tasks such as
Gap-filling cloze did test students’ language use ability since it required students to
use language knowledge to complete the task. This finding aligns with the use of
gapped-sentence tasks in the Use of English section of the Cambridge English:
Proficiency (CPE) test, which was designed to reflect the synergy between
communicative teaching and assessment and was claimed to measure candidates’
productive language knowledge and linguistic competence (Booth & Saville, 2000;
Docherty, 2015). In addition, both Hu and Zhang agreed that the Gap-filling cloze also
tested other language abilities such as making inferences, which was perceived as a
part of overall language use ability for students. This positive test perception from
teachers seemed to conflict with their general negative perception that the GVT did
not test communicative language ability.
Likewise, all students agreed that Cloze and Gap-filling cloze, especially the latter, were comparatively more effective than MCQ and Sentence Completion at testing their overall ability to use language. For example, Fang-SC commented that
Gap-filling cloze tested more abilities to use language and a wider scope of language
knowledge (FG-SC). By “more abilities”, she meant that Gap-filling cloze was not
only testing language knowledge but also involved skills such as problem-solving (FG-
SC). This perception thus contrasted with students’ negative perception of MCQ and
Sentence Completion, which were perceived as not testing students’ ability to use
language. However, according to students, although the number of MCQ items with written dialogues had decreased, the retention of such items in the MCQ was evidence that it tested their overall ability to use language. To recall, the written dialogue tasks, as reported by Zhang, had been reduced in the MCQ, but one still remained (see Appendix B).
Further, Meng-SC considered that applying correct word meaning in MCQ was also
an indication of testing the overall ability to use language (FG-SC). This comment thus
Further, Zhang felt that traditional MCQ items in the GVT also tested students’ application of foundational language knowledge, and Hu stated that MCQ and Sentence Completion items were also able to test students’ reading ability. Zhang’s response is
presented below:
Zhang: I think it (i.e., MCQ) is testing both students’ mastery and application of
language knowledge. … In my opinion, first, you [students] should master
these, these basic vocabulary and grammar. And then, you then try to apply
these knowledge and skills to solve the problem and apply them in the created
situations by test designers. Right, I think it is generally about this, about the
testing of language use ability, I think so. (Interview)
Similarly, all students across the three schools agreed that the GVT assessed language skills beyond language knowledge itself. Students agreed that reading comprehension and understanding passage meaning were tested in Cloze and Gap-filling cloze (Ming-SA, Xun-SB, Ping-SC, Jing-SC, Meng-SC). In addition,
translation ability was tested in Sentence Completion as students needed to write down
answers (Ming-SA); and writing as well as spelling ability was tested in Sentence
Completion and Gap-filling cloze tasks (Fei-SA, Jing-SC, Hua-SC). Students also felt
that Cloze and Gap-filling cloze tasks tested logic and language intuition (Yao-SB,
Ping-SC).
From the above quote, it was clear that Hu thought the GVT was changing from
assessing language knowledge to improving students’ ability to use language. This
positive shift in test design was also perceived by Lan and Zhang, who categorised the
current GVT as moving closer to the inclusion of more authentic context in test items.
These perspectives on improved test quality indicated the positive washback of the
GVT on participants.
As shown in Table 5.2, the proportion of students who disagreed that the GVT only measured their rote-memorising ability was remarkably higher than that of those who agreed (52.8% versus 17.7%); the same tendency was evident for students’ disagreement versus agreement that Sentence Completion and Gap-filling cloze only tested their spelling (67.3% versus 15.4%) and that MCQ format tasks only tested guessing
Table 5.2
Student Perceptions of GVT design characteristics (see instrument reliability and validity in section 4.5.3)
Note. The overall percentage of each variable was 100 ±0.1 because of rounding error.
Teachers’ test anxiety about the SHSEET was explicitly shown in their
classroom interactions. A common phenomenon was that teachers expressed their
anxiety after calculating students’ test scores or reviewing test answers in the class.
For example, when she was about to announce test answers in the class, Lan reminded
students to calm down and asked them to not scream when their answers were right or
sigh when their answers were incorrect (SA-CO1). Likewise, Lan was observed trying to persuade students to compromise rather than argue for an alternative answer to a GVT exercise (SA-CO2). However, this worry about students’
argumentative behaviour was unique to Lan’s class since her students were generally
high-achieving. Zhang’s test anxiety was mainly centred on her students’ test scores.
Due to the generally low language proficiency of her students, she frequently and
explicitly pointed out her students’ problems and explained her concern in her classes.
For example, when Zhang found that students made mistakes on even the easiest MCQ,
she asked students to stop doing the exercises and commented that it was “meaningless”
to continue (SC-CO1).
Compared to teachers, who generally felt anxious about the test, students from the three schools reported experiencing anxiety to differing extents and for a variety of reasons, displaying different patterns of test anxiety.
In the School A focus group, only Fei-SA and Ling-SA reported feeling anxious
about either the GVT or the SHSEET as a whole, which was different from other
students who reported that they did not experience any change in emotions. For
instance, Ming-SA explained that he actively tried to relax himself so that he would not feel anxious.
However, for Ming-SA, although he claimed that he was not anxious about the
test, he was quite emotional when talking about the test items. In particular, he
mentioned that if he made a small mistake in MCQ, he felt that “I will collapse” (FG-
SA), which indeed revealed his anxiety. This further indicated that even high-
achieving students from School A felt anxious when approaching the test date. For
example, Fei-SA worried about making mistakes in easy tasks like MCQ in the GVT.
Like the majority of School A students, School B students expressed that they
were not anxious at all. Generally, most students claimed there was “no pressure”
regarding the GVT and the SHSEET as a whole. For example, Long-SB mentioned
the strategy of “treating common tests as high-stakes tests and treating high-stakes
tests as common ones”. However, students’ anxious feelings were influenced by Yi Zhen, a pre-SHSEET test that enabled students to be pre-enrolled into their expected senior high schools based on their Yi Zhen test scores. This test thus seemed to mitigate their test anxiety towards the GVT and the SHSEET, and it was therefore perceived as positive, which was congruent with the positive washback of the NMET exam reform on test anxiety, where offering the test twice a year was found to lower students’ test anxiety (Chen et al., 2018). However, the use and impact of such mock tests before actual test preparation remained unexplored in this SHSEET context, as it was beyond the scope of the current research.
To summarise from the qualitative data, teachers expressed their test anxiety
towards the SHSEET in general and they felt anxious about students’ GVT
performance during test preparation. This anxiety was also felt by Fei-SA, Ling-SA,
and School C students. It thus indicated a negative washback of the GVT on teaching
and learning. Nonetheless, students’ opinions on test anxiety varied from school to school. School A students were generally not anxious since they felt the GVT was not challenging for them. Although School C students were the most anxious, which echoed their teacher’s views, the existence of Yi Zhen was found to mitigate test anxiety among School B students. However, it is impossible to generalise from these findings as the qualitative sample in this study was small. Therefore, even though all three teachers experienced anxiety, the test brought anxiety to some students but not to others (Alderson & Wall, 1993). This finding indicated a
complex washback result regarding students’ test anxiety.
Table 5.3
Indicators of test anxiety in the GVT context (see instrument reliability and validity in section 4.5.3)
Chao-SA: Well, I also think that for English, the most important thing is, currently at the
junior high school level, to cultivate that, students’ interest in English, because
interest is the motivation. Such as, usually, if I do exercises, I will feel that
doing, reading English while doing exercises is really interesting, and thus I
will keep on doing the exercises. …… I felt like copying those words, I think
they are very, very, very “interesting”. Anyway, I just feel like that I am not
going to be tired. (FG-SA)
Since the aforementioned students (Lin-SA, Chao-SA, Na-SB) who had an intrinsic motivation to learn English were mainly high-achieving students, this could support a relationship between motivation and learners’ language achievement, as claimed by researchers such as Gardner et al. (1985). To further
Table 5.4
Indicators of intrinsic motivation (see instrument reliability and validity in section 4.5.3)
As shown in Table 5.4, the proportion of students who agreed that they learned
English grammar and vocabulary for the purpose of promoting future learning was
much higher than those who disagreed (83.3% versus 2.3%); the same tendency
remained for reading books and surfing the Internet (80.7% versus 5.2%), helping
English language communication (78.4% versus 5.3%), and using resources to
understand foreign cultures (73.4% versus 6.5%). The results showed that the survey
respondents were overall intrinsically motivated to learn English grammar and
vocabulary during GVT preparation.
Second, students from all three schools agreed that through taking GVT
exercises, they could monitor their grammar and vocabulary learning progress as well
as learning outcomes, as a student explained below:
Xia-SA: Because, normally we learn a lot of grammar knowledge, and then, doing this
task (i.e., the MCQ task in the GVT) is like assessing the normal [grammar
and vocabulary learning]. So, it checks whether you truly learn this knowledge
well or not, otherwise, what you do not know, you can still, that is, to fill in
the gap gradually. This is somehow helpful to the difficult tasks that follow
afterwards. (FG-SA)
According to Xia-SA, the purpose of taking GVT tasks was to monitor her learning progress, detect learning gaps, and facilitate the completion of more challenging tasks in the test. This intention of using the GVT to monitor learning progress and diagnose learning gaps indicated that students were extrinsically motivated to learn English grammar and vocabulary in the GVT preparation stage. This finding coincides with research (Popham, 2001; Qi, 2007) which found that an instrumental purpose of tests brought about negative washback.
From qualitative findings, students commonly had the short-term goal of achieving higher test scores (Buck, 1988), which was considered an extrinsic motivation in this study. Therefore, the extrinsic motivation in regard to the
Table 5.5
Indicators of extrinsic motivation (see instrument reliability and validity in section 4.5.3)
At the micro level of washback value in the current proposed washback model,
the “teaching and learning” factor includes various teaching- as well as learning-
related aspects, of which test preparation materials are one indicator. Test preparation materials were categorised as participants’ resources to meet test demands (Green, 2007a); in this study, they were thus viewed as teaching and learning materials for grammar and vocabulary. Classroom observations and interviews showed that participants bought test preparation materials (either designated by the school or purchased individually) to undertake test review practices and better prepare for the exam. It was
found that the materials used both in and outside class for English language teaching
and learning during test preparation were mainly exam-oriented. In this study, the term
“exam-oriented” conveys a similar meaning as “test-use oriented” which will be
discussed with learning strategies in section 5.5. Therefore, exam-oriented materials
and test-use oriented strategies refer to materials and strategies that were adopted by
participants to exclusively prepare for the test. Qualitative data showed that different
kinds of exam-oriented test preparation materials were adopted by teachers and
students; however, it is noticeable that students with high language proficiency levels
(mainly those high-achieving students from School A) also utilised non-exam oriented
learning materials during test preparation. Findings are reported as follows.
According to Lan’s account, the school’s decision to change from more authentic and challenging textbooks to test-related textbooks in Grade 9 may have been intended to support students in better preparing for the exam as the test date approached. This was
similar to findings from other test preparation contexts, for example, where passing
the test became the major goal in the time leading up to the CET-4 (Zhan & Andrews,
2014). Further, although not explicitly mentioned by Lan, her students in the focus
group mentioned their practice of “going through the vocabulary and grammar
content” in all the test-based textbooks for test review in the first semester of Grade 9.
However, this change to less authentic textbooks and the focus on grammar and vocabulary knowledge in the textbooks indicated negative and intense washback on teaching and learning. This finding aligned with those from other studies in which teachers’ textbook use during test preparation focused only on test-related content (Saif, 2006).
11 The test review coaching book is a typical material designed for test review and test preparation. In spite of the different names of various coaching books, they have almost the same structure; that is, language knowledge summarised in a systematic way, past test items on certain language knowledge points, and mock exercises for students to strengthen the knowledge learned.
According to Hu, their school chose Ba Shu Talents SHSEET Final Review (巴
蜀英才中考总复习方案) as the main test preparation guidance book, but the decision
was made by the Head of Curriculum (Interview). Therefore, it was obvious that
teachers had little power or authority in decision-making regarding test preparation
materials, even for the classes that they were responsible for. This phenomenon was
further illustrated by Zhang, who had the role of Director of Teaching Affairs in her
school, when she expressed her regret regarding her choice of major teaching guidance
materials for test preparation: “this time, I did not choose very well.” (Interview)
According to her, the quality of the material was not satisfactory.
Jing-SC: Now, as the textbook teaching has all finished, [we learn grammar and
vocabulary] according to, currently, what is in Test Specifications. Well, there
are clues, for example, clues like what will be tested in MCQ in the GVT. (FG-
SC)
Lan: We also subscribed 21st Century English study newspapers. The reason is that
the passages in that newspapers, the topics, are before, comparatively before,
well, what should I say, avant-garde, very new [laughed]. Yes, they are linked
to the topics with current affairs. (Interview)
Zhang: Well, we now, I mean, the school, spent 6,000 RMB and built an online self-
study website. Although it is a computerised exam marking system, well, we
can still check individual students’ learning progress through any individual
test item. (Interview)
Long-SB: I mean searching online, or, well, sometimes, my tutor gives me when I go to
private tutoring classes. Or when I stay at home, my parents search online since
they have nothing else to do, and then ask me to finish [the test-driven
exercises/tasks]. Because sometimes I am idling around, so they find test
preparation materials for me to complete. (FG-SB)
Section 5.4 has mainly reported the test preparation materials that participants
used. The common materials used by both teachers and students and across schools
were: test-based textbooks, school-designated test review books, language knowledge
such as vocabulary lists in official test reference documents of the ECSCE and Test
Specifications, and self-selected test preparation materials (mainly mock test papers
and exercises). Nonetheless, students with high language proficiency levels sometimes
tended to choose non-exam oriented learning materials to learn English grammar and
vocabulary. However, few students used such learning materials, with the majority of
participants using exam-oriented test preparation materials for learning grammar and
vocabulary. Therefore, those exam-oriented test preparation materials, aiming to help
students better prepare for the exam, were evident signs of a negative washback on
teaching and learning (Alderson & Wall, 1993; Zhan & Andrews, 2014). To clarify, participants’ use of test preparation materials was not included in the student survey and therefore not generalised; the interviewed participants had provided ample evidence, so there was no need to collect additional data on this topic.
Hu: Well, the normal skill is the trilogy of lecture-evaluate-do exercises. Of course,
when you actually apply it, you certainly need to use some tricks. Otherwise,
if you do lecture, do exercises, and evaluate every day, the student will die,
and you yourself can die. … That’s it. Anyway, when you actually apply [the
trilogy], use more, more, more, richer tricks. But the key is for sure, the key is
certainly lecture-evaluate-do exercises. (Interview)
According to Hu, the key principle for test preparation teaching was to follow the trilogy of “lecture-evaluate-do exercises”, and she did not think there was any other possible or effective test preparation pedagogy. Likewise, when doing test exercises,
Lan mentioned her normal test preparation teaching of “do exercises-check answers-
give feedback or instructions on students’ problems only” (Interview). These patterns
were found in all three teachers’ teaching practices. Hence, this indicated the
predominance of teacher-dominated test preparation activities, which was also evident
in other test preparation contexts such as the NMET (Qi, 2010) and IELTS academic
writing (Green, 2006b).
It is important to note that teachers also mentioned the changes they made in Grade 9 to the teaching of grammar and vocabulary learning strategies. The major difference
was that teachers used communicative language teaching such as providing an
authentic context for students to learn a word or a certain grammar structure in Grade
7 and Grade 8. However, in the test preparation context in Grade 9, they mainly used
exercises to explicitly teach grammar and vocabulary. According to Zhang, this shift
in teaching method from creating a meaningful situational context to choosing a
correct answer in exercises was due to the tasks used in the GVT. The rationale for
this change, explained by Zhang, was that students were at different stages of learning
language knowledge. In other words, in lower grades, certain grammar structures and
vocabulary were new knowledge to them; whereas in Grade 9, the focus was to
consolidate their knowledge and review it to prepare for the final exam.
Therefore, the test preparation teaching model was the same and the grammar
and vocabulary learning strategies remained similar across the observed classes. The
objective of this study was not to compare teaching methods and learning strategies;
instead, it adopts the term “learning strategy” to jointly report the teacher and student
data. To this end, “grammar and vocabulary learning strategies” referred to both
teachers’ teaching and students’ use of grammar and vocabulary strategies. As the
study aims to determine the washback value of the GVT, the qualitative data were
categorised into two types: test-use oriented learning strategies and learning-use
oriented learning strategies, which followed Zhi and Wang (2019). Further, the
categorisation and definition of these two strategies also modified those of Doe and
Fox (2011). In detail, test-use oriented strategies refer to those “used for a specific
testing activity”, which “are test-dependent but language independent”; and language-
use oriented strategies are strategies that “are activated to support engagement in or
with language itself” (Doe & Fox, 2011, p. 31). In applying test-use oriented strategies, participants’ main purpose was to improve their test scores.
Against this backdrop, both teachers and students used and reported their use of
the following test-use oriented grammar and vocabulary learning strategies.
In addition, teachers also reported their use of a fifth grammar and vocabulary
learning strategy to prepare students for the test. This strategy is:
First of all, all three teachers were found to anticipate the possible test topics or
items in their teaching. In order to persuade her students who frequently challenged
the answers provided by test designers, Lan emphasised the importance of guessing
the answers that test designers expected (SA-CO2). This idea of guessing test
designers’ intentions was also apparent in the other two teachers’ data. In fact, during
test preparation, it was common for the teachers to anticipate possible test items or
topics and to remind students that they could guess what would be tested in the actual
exam to prepare their learning accordingly. For example, after summarising passive voice items in the past three years’ test papers, Zhang asked her students about the likelihood of a related item being tested in the 2018 paper.
Zhang: So this you can guess by yourself, you analyse those test items, and
then you can accurately grasp, understanding? (SC-CO3)
Lan: Various. [pause] The word “various” is also not an SHSEET word, so students,
you not, in Gap-filling cloze, for goodness’s sake, do not use non-SHSEET
words, like that word “better”, last time you used that word, originally I did
not want to give you a score on using that word, but I considered that the
meaning was still right, so don’t use this kind [of non-SHSEET words]. (SA-
CO3)
Lan’s comment indicated that students should avoid using ‘non-SHSEET words’
(those that did not appear in the SHSEET vocabulary list) in the exam. These reference
lists, taken from the ECSCE and Test Specifications, clearly listed the vocabulary
scope for the SHSEET.
Further, teachers taught specific test-taking principles and strategies that explicitly supported test preparation. For example, test-taking principles such as
“do not be too happy to recall the correct word form (“得意而不忘形” in Chinese)”
and “no sentence without a verb (“无动不成句” in Chinese)” were widely used by
teachers. According to them, the former principle meant to pay close attention to
correct forms of words when responding to tasks such as Gap-filling cloze. The use of
this principle is explained below:
Zhang: when we some-[times] do Gap-filling cloze, we teachers will talk about skills.
The last skill is definitely “do not be too happy to recall the correct word form”,
I know this answer, I use “house”, but I might need to use the plural form of
“house”. This word I need “do”, but maybe it should be a past tense, or maybe
perfect tense, or maybe even present progressive tense. This word is thus “do
Selective attention
Under teachers’ teaching guidance, students selectively prepared for the GVT
by focusing on filling in their test preparation gaps such as reviewing common
mistakes or strategically changing their learning foci. Results showed that teachers taught this learning strategy and that mainly School B and School C students adopted it during test preparation. According to them, this learning strategy could help them improve test scores and learn specific language knowledge. Ping-SC’s quote below
exemplifies this:
Ping-SC: But when the test is approaching, definitely grasp the weaknesses, quite weak
sections, which are quite easy to improve test scores. Then that is, now, we are
not focusing on everything, but focusing on those sections that are [our] own
deficiencies. (FG-SC)
Rote-memorisation
Rote-memorising grammar rules and vocabulary was found to occasionally
appear in Hu and Zhang’s classroom teaching. For example, when she was trying to
explain “the responsive principle (“呼应性原则” in Chinese)” of subject-predicate
consistency, Zhang emphasised to students the importance of rote-memorising this
principle.
In addition to the in-class emphasis, participants also reported the use of rote-
memorising in interviews. According to Hu, at the beginning of test preparation, she
required students to rote-memorise vocabulary lists from the ECSCE and Test
Specifications, however, as they were getting closer to the test date, she felt this
method was inappropriate since the time was limited. Instead, she asked students to
rote-memorise vocabulary through reading tasks in test papers or drills in classrooms.
Rote-memorisation is not uncommon in the literature, with studies finding that the use of rote memorisation, even in integrated tasks, could exert a negative influence on language learning (Green, 2006b; Linn et al., 1991; Tsagari, 2011). Therefore, this
teaching method of rote-memorising or memorising repeated test points revealed
negative washback on School B’s and School C’s grammar and vocabulary study.
Zhang: All right, then, certainly I sometimes anticipate during the process of doing the
exercise. But in fact, sometimes my anticipation can be wrong. I guessed that
students, this, could be their difficulty, so that many students would make a
mistake here. However, it did not turn out to be the case. Sometimes I feel this,
of course, because this is because that my comprehension of the test item is
different from students’ understanding. (Interview)
Table 5.6
Indicators of test-use oriented learning strategies (see instrument reliability and validity in section
4.5.3)
As revealed in Table 5.6, the proportion of students who reported being reliant
on supplementary learning materials was lower than those who reported they did not
(27.3% versus 36.2%); the same tendency was evident for repetitively doing test-
driven exercises (25.9% versus 42.4%) and rote-memorisation (24.3% versus 47.6%).
Thus, survey participants were found not to predominantly adopt these three test-use oriented strategies during GVT preparation.
Nonetheless, students further mentioned two other learning strategies that they
used in extra-curricular time to learn English grammar and vocabulary. These
strategies are:
Ming-SA: Talking about MCQ in the GVT, actually it’s the same, that is, I think for me,
I can’t form any habit, rather, I prefer reading English passages more, let, to
experience the language use in the passage. This, reading this kind of whole
paragraph and sentences, I can understand more easily, so I prefer
accumulating those exemplary words and sentences. (FG-SA)
Table 5.7
Indicators of language-use oriented learning strategies (see instrument reliability and validity in
section 4.5.3)
In summary, this chapter contained five sections on washback value; both qualitative and quantitative results were reported to answer RQ1a and indicated the complexity of washback value.
At the micro level of washback value, sections 5.2, 5.3, 5.4, and 5.5 revealed both positive and negative washback of the GVT regarding GVT design characteristics, affective factors, test preparation materials, and grammar and vocabulary learning strategies. In section 5.2, teachers and students reported various conflicting perceptions, indicating that their positive perceptions generally centred on Cloze and Gap-filling cloze while their negative perceptions mainly focused on MCQ and Sentence Completion.
Chapter Six presents the data analysis and findings with regard to the washback
intensity from the perspectives and practices of Grade 9 teachers and students as they
prepared for the Grammar and Vocabulary Test in the Senior High School Entrance
English Test (the GVT). Through the analysis of data obtained by observing the
teaching and learning practices in classrooms, eliciting participants’ perceptions
through interviews with teachers and students, and administering a student survey, this
chapter addresses the second sub-question of RQ1:
Figure 6.1. Focus of the new washback model in Chapter Six (Carless, 2007; Green, 2007a; Jones &
Saville, 2016)
Washback intensity refers to the degree of washback associated with a test or the
extent to which participants will adjust to the test demands (Cheng, 2005; Green,
2007a). It further indicates to what extent stakeholders’ perceptions of test importance
and test difficulty influence the intensity of washback to them (Green, 2007a). On this
theoretical assumption, three main factors are considered in the dimension of washback intensity.
To answer RQ1b, this chapter is composed of six sections. Section 6.1 reports
both qualitative interviews and quantitative survey results of the test importance of the
GVT as perceived by participants. Similarly, section 6.2 documents qualitative and
quantitative results of participants’ perceptions of test difficulty of the four GVT tasks.
Based on those perceptions, participants spent corresponding effort on test preparation,
which is presented in section 6.3. To present the overall picture of washback intensity
of the GVT, section 6.4 uses Multiple Correspondence Analysis (MCA) and reports washback intensity patterns from the quantitative survey data. Section 6.5 then investigates the relationship among students’ test perceptions, affective factors, test
preparation practices, and test performance through Structural Equation Modelling
(SEM). Finally, section 6.6 summarises the results of washback intensity and the GVT
washback model.
Test importance literally means how important the test is perceived to be by its
relevant stakeholders. It is closely linked with test stakes which are regarded as
strongly indicating washback intensity (Madaus, 1988) and test use purposes that can
be both intended and unintended (Jin & Cheng, 2013). Therefore, stakeholders’
perceptions of test importance are also influenced by their awareness of test stakes and
the purposes of test use (i.e., how test results are interpreted and used by stakeholders).
In the current study, the SHSEET, as a high-stakes standardised English test, was
assumed to have a strong washback intensity. As part of the SHSEET, the GVT was
assumed to have a similar influence.
To start with, participants’ general impressions of the test importance (i.e., test
use purpose) were explored. When questioned about the test importance of the GVT,
Lan believed that teachers should find a way of keeping a balance in grammar teaching.
Due to her understanding of the nature of second language acquisition, she viewed
grammar learning to be essential; however, she disagreed with the predominant
grammar-focused approach to teaching EFL in China (Pan & Qian, 2017).
Firstly, the GVT was important due to the foundational role of grammar and
vocabulary in language learning. This was particularly relevant to junior high school
students, many of whom were categorised as beginners in terms of their English
proficiency level. To this end, Lan commented that the GVT was suitable for these
students. Similarly, from Hu’s perspective, GVT tasks, especially the easy tasks like
MCQ and Sentence Completion, played an essential role in junior high school
students’ learning as such basic language knowledge was “taking students’ arms and
being helpful in their future learning and life, such as overseas study” (Interview). In
a similar vein, students viewed the GVT as a whole to be highly important due to the
fact that the grammar and vocabulary knowledge assessed in the test was the
foundation of language use. For example, Shu-SB commented that the broad testing
of language knowledge in the GVT could reflect students’ language proficiency to
some extent. From their perspectives, “easy tasks assessing the foundation of language
learning” (Fei-SA, Hui-SB, Yao-SB, Fang-SC, Ping-SC, Kai-SC, Hua-SC) were
crucial for “English beginners” (Ling-SA, Chao-SA). This indicated the same
perception as their teachers Lan and Hu; that is, students thought easy tasks were
suitable for their learning stage.
In addition, the GVT was important mainly due to the use of SHSEET results
for graduation and senior high school enrolment (i.e., designated test use purposes).
Additionally, the GVT was considered important to retain because it followed the tradition
of EFL testing in China. In both Lan’s and Zhang’s opinions, GVT tasks (MCQ in
particular) were traditionally kept for both provincial and local tests. Due to this
consideration, GVT tasks were normally included in EFL tests, particularly in the
SHSEET, which is designed and administered at the provincial level.
Teachers like Hu pointed out that as the marks for the MCQ task of the GVT had
decreased, she no longer considered it to be as important as it used to be. In fact, the
marks allocated for the MCQ section were reduced to a total of 15 marks in the 2018
SHSEET test paper (originally 20 marks in 2016, so the total mark for GVT tasks in
2018 also decreased accordingly). Hu felt that the reduction in marks allocated to this
section indicated a lower test importance. As such, the mark allocation related to test
design decisions influenced teachers’ perceptions of test importance of the GVT,
which further impacted their decisions in teaching. Likewise, some students regarded these tasks as unimportant since they were too easy (Ming-SA, Long-SB, Na-SB, Jing-SC),
especially for high-achieving students (Fei-SA, Ling-SA). For example, every student
scored similarly in MCQ, which contributed less to discrimination in their overall test
scores (Meng-SC).
In addition, students regarded the inclusion of easy tasks such as MCQ which
focused on decontextualised points of grammar and vocabulary knowledge as
unnecessary. In their opinion, grammar and vocabulary knowledge could also be
assessed through and combined with other tasks such as Gap-filling cloze (Ming-SA),
writing (Long-SB), and reading (Jing-SC). This indicated that participants regarded the separate testing of grammar and vocabulary as unimportant.
Table 6.1
Indicators of test importance (see instrument reliability and validity in section 4.5.3)
From Table 6.1, it was found that most survey participants agreed that the GVT
was highly important to them. With regard to the instrumental test use purpose of the
GVT to junior high school graduation, the proportion of students who regarded it as
highly important to them accounted for 76.5%, which was much higher than the proportion who regarded it as unimportant.
In addition to the factor of test importance, test difficulty is also viewed as the
driving force for washback intensity (Green, 2007a, 2013) which could lead to strong
or weak washback. Summarising from the qualitative data, it was found that both
teachers and students ranked the difficulty of the GVT by virtue of task types. In
general, MCQ and Sentence Completion were considered easy tasks, Cloze had a higher level of difficulty, and Gap-filling cloze was perceived as the most difficult by
all participants.
First, all three teachers and most students (for example, Xia-SA, Chao-SA)
perceived MCQ and Sentence Completion as the easiest tasks since they tested the
most basic grammar knowledge. In class, teachers tended to remind students of the easy characteristics of GVT tasks, such as "the word order of declarative sentences was very easy" (Hu, SB-CO2).
Further, Gap-filling cloze, which had more than one correct answer (Ming-SA),
was widely recognised by all teachers and students to be the most challenging task. As
such, they agreed that the most difficult task of Gap-filling cloze could test students’
overall ability to use language and demonstrate their language proficiency. This perception
resonated with a study finding that gapped-sentence tasks were able to demonstrate
test-takers’ full linguistic repertoire (Docherty, 2015) and were thus difficult for
Lan: Gap-filling cloze is definitely a task that can differentiate students’ language
proficiency levels. Gap-filling cloze is a task that not only [in the GVT], but
also in the whole SHSEET test paper, it widens the gap [between low-
achieving and high-achieving students]. This task is a task that tests students’
abilities. (Interview)
In contrast, easy GVT tasks were perceived as not useful for discrimination but
enabled the majority of students to gain higher scores than they would otherwise have
received.
Hu: Because [MCQ] per se, according to the English inspector’s meaning, I feel,
anyway, this is basically a benefit for general, general students, which is not
used for selection. (Interview)
In general, teachers and students agreed that the overall test difficulty of the GVT
was not high (Ming-SA, Fei-SA) but was increasing each year. For instance, according
to all three teachers and students with an intermediate level of language proficiency
(Wei-SA, Xia-SA, Long-SB, Shu-SB), Cloze was more difficult than MCQ but easier
than Gap-filling cloze. Furthermore, despite the inclusion of relatively easy items such
as MCQ, all three teachers agreed that the overall test difficulty of the SHSEET and
the GVT was increasing each year. This could be seen in the decreasing mark
allocation given to MCQ in recent years. This perception regarding the change in test
difficulty reflected test designers’ intention of using the test for a selective purpose to
send high achieving students to prestigious senior high schools.
Table 6.2
GVT task types and perceptions of test difficulty (see instrument reliability and validity in section
4.5.3)
From Table 6.2, the proportion of students who reported MCQ as absolutely
unchallenging and unchallenging was higher than those who perceived MCQ as
challenging (34.6% versus 25.3%); the same tendency appeared for Sentence
Completion (34.6% versus 31.2%) but reversed for Cloze (16.2% versus 42.8%) and
particularly Gap-filling cloze (5.0% versus 74.0%). These results aligned with
qualitative findings since all participants recognised that Gap-filling cloze was the
most challenging task.
To sum up, the test difficulty of the GVT was not fixed; rather, it varied with the test methods as well as the test scope. As perceived by participants, MCQ was the easiest task, Sentence Completion was also relatively easy, Cloze was more difficult, and Gap-filling cloze was the most difficult.
Under the above perceptions of test importance and test difficulty, teachers and
students spent corresponding effort on test preparation. By including test preparation
data in the new washback model which incorporates LOA, the effort each participant
put towards test preparation was explored. To clarify, classroom observations and
teacher interviews disclosed the in-class test preparation effort, while student focus
groups revealed the extra-curricular test preparation effort. Further, students’ test
preparation effort was largely investigated in the student survey. Findings are reported
in this section.
Initially, the change of class schedule due to the SHSEET indicated the influence of the test on teaching and learning. From teacher interviews, Lan mentioned
that School A students had eight sessions of English classes in Grade 7 and Grade 8.
In addition to the seven classes mentioned in Chapter Four (see Table 4.3), they had one class session taught by a foreign language teacher which
focused on improving students’ speaking skills. However, in Grade 9, due to course
design and test review considerations, this foreign teacher's class was no longer offered to students. This phenomenon thus reflected a certain degree of washback intensity, as schools also put more effort into test preparation by deliberately decreasing
certain activities. In addition to the reduction of specific classes in School A, all
classroom observation data also showed intense washback as the test preparation of
the GVT and the SHSEET took up all in-class teaching time of the three teachers.
Further, the teachers at all three schools reported using the same test preparation
model. According to teachers, the test preparation went across at least the entire second
semester of Grade 9, and roughly included three stages. The first round of test
preparation was mainly about reviewing textbooks which covered the SHSEET test scope.
In contrast, Fei-SA carried out intense test preparation for the GVT, including
MCQ, in the first semester of Grade 9. As he argued, it was hard to make significant
progress in English within a short time, because language learning requires a long-
term commitment. Therefore, he suggested students who were not good at English
might need to spend more effort on test preparation. Likewise, Kai-SC reported that
he did intense GVT preparation even on MCQ when the test was approaching, because
he aimed to gain more marks in easy tasks. Moreover, when approaching the test date,
School C students tended to have a more intense test preparation since they were asked
by their teacher to do MCQ tasks every day. These findings of washback intensity were further examined through the student survey data reported below.
Table 6.3
Number of test papers taken for GVT tasks (see instrument reliability and validity in section 4.5.3)
Table 6.4
Time spent on preparing for GVT tasks (see instrument reliability and validity in section 4.5.3)
From Table 6.4, it was found that the proportion of students who reported
spending less than one hour per week on MCQ exceeded those who spent more than
two or three hours on the task (59% versus 11.4%); the same tendency was evident for
Cloze (47.1% versus 16.6%), Sentence Completion (63.3% versus 12.0%), and Gap-
filling cloze (40.0% versus 27.2%).
To summarise section 6.3, intense in-class test preparation regarding the GVT
and the SHSEET was identified from teachers’ data and generally a lower degree of
GVT preparation in extra-curricular time was reported by students. In order to further
investigate the reported findings, Multiple Correspondence Analysis (MCA) was
conducted to examine the theoretical conceptualisation in Green’s (2007a) work (see
MCA was conducted by using SPSS 25.0 and with a goal of identifying patterns
of washback intensity of the GVT in extra-curricular time. The MCA model summary
is presented in Table 6.5. A two-dimension MCA solution was obtained. The first
dimension, with Cronbach’s α of .914, eigenvalue of 6.384, and inertia of .491,
accounted for 49.109% of the variance of the 13 variables in the study sample of 922;
the second dimension, with Cronbach’s α of .865, eigenvalue of 4.955, and inertia of
.381, accounted for 38.115% of the variance. Thus, the total variance explained was 87.224%. These results met the recommended criteria of Cronbach’s α above .70,
inertia above .2, and the variance explained above 50% (Hair et al., 1998; Johnson &
Wichern, 2007). Therefore, the two-dimension MCA solution explained the data well.
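The relationships among the reported MCA summary figures can be sketched as follows. This is a minimal illustration, not the author's SPSS procedure: it assumes only the standard identity that a dimension's inertia is its eigenvalue divided by the number of active variables, and checks the chapter's decision criteria against the reported values.

```python
# Sketch: how the figures in the MCA model summary (Table 6.5) relate.
# Inertia per dimension = eigenvalue / number of active variables;
# percentage of variance explained = inertia * 100.

N_VARIABLES = 13                  # survey variables entered into the MCA
eigenvalues = [6.384, 4.955]      # dimensions 1 and 2, as reported

for i, ev in enumerate(eigenvalues, start=1):
    inertia = ev / N_VARIABLES
    print(f"Dimension {i}: inertia = {inertia:.3f}, "
          f"variance explained = {inertia * 100:.3f}%")

total = sum(ev / N_VARIABLES for ev in eigenvalues) * 100
print(f"Total variance explained = {total:.3f}%")

# Decision criteria applied in the chapter (Hair et al., 1998;
# Johnson & Wichern, 2007): inertia above .2, total variance above 50%.
assert all(ev / N_VARIABLES > 0.2 for ev in eigenvalues)
assert total > 50
```

Running this reproduces the reported 49.109%, 38.115%, and 87.224% to within rounding, confirming the internal consistency of Table 6.5.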
Table 6.5
Model summary of washback intensity
Table 6.6
Discrimination of variables for the dimensions12
The discrimination measures of the MCA model are displayed in Figure 6.2. In
both dimensions, the eight variables used to gauge test preparation effort of students
have larger discrimination measures than the variables used to gauge test difficulty and
test importance. This means, in general, along both dimensions, the codes/categories
of the eight variables for test preparation effort demonstrate a wider spread/variance
than those for test difficulty and test importance.
12 The indicator codes, such as 1TDFMCQ, used in this table reflect student responses to the Likert
scale used in the student survey. 1 means the first Likert point in the scale, 2 means the second Likert
point in the scale, and so on. In detail, for example, 1TDFMCQ means “MCQ is absolutely not
challenging”. For details of what each point represents, refer to the survey items in Appendix I or the
labels in Figure 6.3.
Further, MCA features four patterns of washback intensity of the GVT for the
survey participants, as shown in Figure 6.3.
13 Following the naming of construct indicators in Table 6.6, TP means “test papers taken”, Time means “time investment”, and TDF means “test difficulty”. These were also applied to Figure 6.3.
Pattern 2 (the red circle in Figure 6.3)-less intense washback: when the GVT was
perceived by students to be neutrally important (3TIM) and the four GVT tasks were
considered as challenging or absolutely challenging (4TDF for all, 5TDF for Gap-
filling cloze), students tended to spend comparatively less effort on test preparation (2TP, 2Time). Therefore, each week, most students did one to three test papers of, and spent no more than one hour on, these four test tasks.
Pattern 3 (the black circle in Figure 6.3)-more intense washback: when the GVT
was perceived by students to be important (4TIM) and absolutely important (5TIM),
and the four GVT tasks were not very difficult (2TDF) or neutrally difficult (3TDF),
students tended to do more test papers (3/4TP) and spend more time on these four test
tasks (3/4Time). In detail, each week, students did four to nine test papers of, and spent one to three hours on, these four GVT tasks.
Pattern 4 (the blue circle in Figure 6.3)-unclear pattern: each week, students did more than 10 test papers of, and spent more than three hours on, these four GVT tasks,
but those test preparation practices were not at all related to their perceptions of test
importance and test difficulty.
Pattern 1 of no washback was in line with Alderson and Wall (1993) who
claimed that a test will have no washback if it has no important consequences. Further,
it also verified Green’s (2007a) argument that test difficulty should be challenging but
also attainable for an intense washback to happen.
Pattern 2 of less intense washback was also supported by the literature. When
students’ perception of test importance increased, their test preparation efforts on the
GVT increased accordingly (Madaus, 1988; Popham, 1987). This finding could imply
that washback intensity can vary among different tests (Shohamy et al., 1996) and it
further verified the qualitative findings since students’ test preparation varied due to
different tasks, Gap-filling cloze in particular.
Pattern 4 presented a unique case in the current study. Against the theoretical
assumption that test importance was a crucial factor for generating washback intensity
(Alderson & Wall, 1993; Green, 2007a; Hughes, 1993), this pattern demonstrated
students prepared for the test intensely without perceiving the GVT as important or
difficult. To further investigate the phenomenon, participants grouped in Pattern 4
were tracked. Findings suggested that this pattern was reported mainly by students
with good test performance (18 out of 24 students reported scoring more than 126.5
marks in 2018 SHSEET). However, further analysis and more factors should be considered to fully explain this pattern.
To summarise, MCA findings supported students’ qualitative data; that is, the washback intensity of the GVT on extra-curricular learning was relatively low. In
general, there were four patterns among which three out of the four verified the
theoretical assumption that test preparation is affected by participants’ perceptions of
both test importance and test difficulty. However, Pattern 4 revealed a unique pattern that needed further investigation. Nonetheless, besides the previous findings of the three
factors with regard to washback intensity (i.e., sections 6.1, 6.2, 6.3), the further
investigation of washback intensity through MCA indicated that most students
experienced a less intense (Pattern 2) or more intense (Pattern 3) washback of the GVT,
which generally aligned with the accounts of focus group participants.
2) Students’ motivation and test anxiety directly affected their test preparation
practices;
The SEM results of the washback mechanism showed a good model fit
(CMIN/DF=3.200, df=699, p=.000, SRMR=.071, RMSEA=.049; 90% CI [.047, .051];
TLI=.910; CFI=.919). Although the model had a significant chi-square value of
2236.996, the ratio between the chi-square value and the degree of freedom
(2236.996/699=3.200) was not high. Standardised root mean square residual (SRMR)
was .071 which was below the cut-off value of .08 (Hu & Bentler, 1999); baseline fit
indices of TLI (.910) and CFI (.919) were above the cut-off value of .90 (Bentler,
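The fit evaluation described above can be sketched as a simple set of cut-off checks. This is an illustration using the reported values, not output from the author's SEM software; the CMIN/DF threshold of 5 in the sketch is a common convention in the SEM literature, not a figure stated in the text.

```python
# Sketch: applying the chapter's goodness-of-fit criteria (SRMR and
# RMSEA cut-offs per Hu & Bentler, 1999; TLI/CFI >= .90) to the
# reported SEM fit indices.

fit = {"chi_square": 2236.996, "df": 699, "SRMR": 0.071,
       "RMSEA": 0.049, "TLI": 0.910, "CFI": 0.919}

cmin_df = fit["chi_square"] / fit["df"]   # the "not high" ratio, 3.200

checks = {
    "CMIN/DF < 5 (common convention)": cmin_df < 5,
    "SRMR < .08": fit["SRMR"] < 0.08,
    "RMSEA < .06": fit["RMSEA"] < 0.06,
    "TLI >= .90": fit["TLI"] >= 0.90,
    "CFI >= .90": fit["CFI"] >= 0.90,
}

print(f"CMIN/DF = {cmin_df:.3f}")
for name, ok in checks.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

All five checks pass for the reported values, which is what supports the conclusion of good model fit despite the significant chi-square.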
Figure 6.5. Structural model for the relationship within GVT washback mechanism (N=922)
5. Test Importance 2 (perceived test use purpose) had a significant positive and
weak to moderate association with Intrinsic Motivation (r=.308, p<.001);
and a significant positive but weak association with Extrinsic Motivation
(r=.269, p<.001) and Test Anxiety (r=.165, p<.001).
3. Test Anxiety had a significant negative but weak association with Negative
Strategy (r=−.194, p<.001); a significant positive but weak association with
Positive Strategy (r=.171, p<.001); and an insignificant association with Test
Preparation Effort (r=−.007, p=.839).
14 The “test anxiety” scale was designed in a reversed way, such as “My appetite was unchanged”. Therefore, the bigger the Likert-scale number, the less anxious students were.
Figure 6.6. The structural relationships within the measurement model of the GVT washback
mechanism
Table 6.8
Qualitative results to RQ1b
To conclude, Chapters Five and Six are summarised in Figure 6.7 (see next page)
by modifying the proposed washback model in Chapter Three. The basic ideas within
this washback model were: macro value was assumed to exert an influence on micro
value (Green, 2007a), but the current research participants were unable to provide
abundant and crucial evidence (dashed arrow). Moreover, both perceptions of test
design characteristics (in micro washback value dimension) and test importance (in
washback intensity dimension) were found to associate with participants’ affective
factors (in micro washback value dimension), which in turn linked with their test
preparation practices (both micro washback value and washback intensity
dimensions). All these factors combined then tended to relate to students’ learning
outcomes (i.e., the self-reported test scores). Additionally, test preparation materials
and LOA practices were potential factors for both positive and negative washback
value and were therefore assumed to be included in this model. Details of LOA practices
during GVT preparation will be presented in Chapter Seven.
Chapter Five and Chapter Six have reported the overall washback value and
washback intensity findings of the Grammar and Vocabulary Test in the Senior High
School Entrance English Test (the GVT). This chapter explores Learning Oriented
Assessment (LOA) opportunities as well as challenges in the GVT preparation. In
particular, the findings address the second research question:
Both teachers and students felt that the GVT could offer opportunities for
promoting learning. However, ascertaining their understanding and knowledge of the
term LOA was crucial. Therefore, before delving into their perceptions of the LOA
opportunities, it is important to establish participants’ understanding of the concept of
LOA and the ways in which it informed them in the context of test preparation. In
effect, only teachers’ understanding of LOA was explored in the interviews,
Chapter 7: The Incorporation of LOA Principles: Opportunities and Challenges in GVT Preparation 203
considering that the term and theory could be hard for students to comprehend. Thus,
teachers’ opinions are primarily presented below.
Among the three teachers, Lan appeared to have a misconception regarding LOA
and expressed doubts regarding her understanding of the concept, as the following
excerpt demonstrates:
Lan: Oh, my understanding is, that is, language knowledge oriented. That is to say,
the language knowledge itself, which might not involve learning like students’
ability. Well, I think it should be, well, knowledge-oriented, well, testing,
knowledge. (Interview)
Hu: Learning Oriented Assessment? First, [the test] can reflect the real learning
situation of students. Second, it can tell students, how they should do. Third,
having a certain positive influence on students’ next stage of study. You should
let students have confidence to continue learning, and they should be sure
about where and how they could make improvement. This should be more
realistic [for students]. (Interview)
across schools. In fact, literature suggests that building teachers’ subject knowledge
could influence student achievement in the long term (Hill et al., 2005) and it is
important to provide teachers with professional development opportunities for learning
about and practising LOA (Carless, 2007; Carless et al., 2006; Zeng et al., 2018).
Regardless of the differing interpretations of the term LOA, the three teachers
felt that their teaching practices were guided by LOA principles. In other words, they
agreed that they prioritised learner needs and improving students’ learning. For
example, Lan used flipped classes to assess and give feedback on students’ learning.
For Hu, being aware of students’ language achievement in light of learning aims was
key to her teaching. In order to enhance her students’ learning, she intentionally
increased the level of difficulty of language knowledge teaching in her classes. In other
words, after students reached a certain level, she moved on to use more difficult
content for teaching. This was clear evidence of formative assessment, since teachers
(e.g., Hu) collected students’ assessment information through observation to improve
learning (Nichols et al., 2009) and suggest actions to close the learning gap (Black &
Wiliam, 1998). Further evidence included a simple profile of students’ learning
records by Hu. For Zhang, on the one hand, LOA practices in teaching were reflected
in her belief of “educating before teaching”; on the other hand, she tried to involve
more stakeholders like the head teacher and parents in decisions on her teaching.
Therefore, it was clear that teachers believed in LOA principles and this was reflected
in their teaching practices, but their teaching approaches differed.
Zhang: Anyway, it’s like this, this way, I think that, it’s still, there were much more
[incorporation of LOA principles] in Grade 7, while in Grade 9, we follow less
[of LOA principles]. (Interview)
2005), past studies on the SHSEET (Yang, 2015), the IELTS, and the TOEFL
(Zafarghandi & Nemati, 2015) where teachers also intentionally ‘taught to the test’ in
order to achieve a better test result.
In line with teachers’ understanding of the LOA concept and their related
utilisation of LOA principles in teaching, both teachers and students believed that the
GVT could enable the incorporation of LOA principles in the test preparation stage.
In their opinions, there were various reasons for the GVT to promote learning. In total,
six kinds of opportunities were expressed by participants. Those opportunities include:
Lan: Well, currently, regarding this, this, our current students’ learning proficiency,
I think it is learning-oriented and helpful, nevertheless. After all, grammar is
what has to be learned [for this stage] … (Interview)
According to Lan, the students’ learning stage should be taken into consideration
as students were at an early stage of English learning. In fact, not all junior high school
students started learning English from Grade 3 as mandated; some students might only
have started learning English from Grade 7 (Ming-SA, Meng-SC, see section 4.4.4).
Therefore, for those English beginners, learning grammar was essential. This finding
was actually in accordance with students’ comment on the fundamental role of
grammar and vocabulary for beginning English learners (see section 6.1.1). The resonance between the fundamental role of language knowledge and the learning stage of English beginners was thus considered the first influential factor regarding the opportunities for the incorporation of LOA principles in GVT preparation.
Hu: For example, like what we learn now in the class, although it has certain
differences from the real-life situation, however, it is not to say, they are totally
irrelevant. If you can learn this (i.e., the context provided in GVT tasks) well,
okay, actually it has a certain function at sometimes. Even though it might be
Chinglish, right? But it wouldn’t be totally useless, right? You, because you
learn this, like the rigid textbook knowledge, learned, learn it well, you won’t
be unable to go shopping when you are overseas, right? Or you won’t be put
into custody when clearing customs, or get into trouble, right? (Interview)
From Hu’s comment, it could be seen that although there were limitations of the
test design in terms of the relatively inauthentic contexts of GVT task items, they could
still provide a basis for students’ communication abilities in real life.
Likewise, students in focus groups claimed that what was tested in the GVT
could be practised in their real-life communication. For example, students could make
use of the MCQ content in a real-life situation when communicating with peers.
Ling-SA: Oh, I think that the MCQ task in the GVT is, sometimes, it can be embedded
into life. Take, for example, sometimes we (i.e., students) make jokes with
others, I mean making jokes by communicating with others in English. For
instance, it relates to some basic grammar, what is tested in MCQ, then we can
integrate what are tested in MCQ into life, sometimes [we can] integrate into.
(FG-SA)
Kai-SC: For example, if a foreign friend comes and asks us about some routine
questions, we will be able to, because we take [the MCQ exercises] regularly,
so [we are] proficient, so we will be able to answer their questions. (FG-SC)
In addition, students understood that GVT tasks could also impart useful
knowledge about language use. In this way, they could sense the learning-oriented
potential of the test, as a student explains below:
Na-SB: It, they [Cloze and Gap-filling cloze] just use another method to tell us (i.e.,
students, test-takers) the truth or facts of life. In fact, last time, it seemed that
I did one passage, which seemed to be about how to speak more politely. At
that time, it gave me one inspiration, that is, how to talk to others in a polite
way. Therefore, I think, in fact, Cloze and, and Gap-filling cloze, in fact, give
me great inspirations, which are relevant to real life. (FG-SB)
From Na-SB’s comment, it was inferred that students’ use of the information conveyed through passage-based tasks to deepen their understanding indicated the learning-oriented potential of the GVT. Therefore, the potential of applying what
was tested into real-life situations created opportunities for the GVT to promote
learning.
Through analysing Zhang’s accounts, it was found that this perception was
influenced by the test design characteristics (testing language knowledge) and test
method (using a multiple-choice approach in some tasks). Regarding test design
characteristics, GVT tasks tested both language knowledge and overall ability to use
language (see section 5.2). As for test method, the MCQ type tasks could train
students’ logic and memory, which could be generally applied in language learning.
These were necessary learning skills that students should acquire and make use of in
order to learn a language. Taking the skill of inferencing as an example, it was
considered by many researchers as an important skill to be used in L2 reading
(Anderson, 1991) and vocabulary teaching (Walters, 2004).
Zhang: As for …that Cloze task, … Yesterday, Teaching and Research Officer Molly
gave us the training in designing test items, she said in this way “Cloze does
not test grammar”. … It tests the understanding of the passage, logical
inference … It tests higher-order stuff. (Interview)
Xia-SA: Such as when writing a composition, or writing articles, [we] can then make
use of this knowledge accumulated. (FG-SA)
Furthermore, Fei-SA explained his way of learning and using grammar and
vocabulary knowledge through extensive reading (see section 5.5.2). In fact, among
all focus group participants, this view of the foundational importance of grammar and
vocabulary knowledge was mainly expressed by students from School A and School
B. These students with a comparatively higher language proficiency level recognised
the link between knowledge of the language and the ability to use it. This finding, to
some extent, was assumed to uncover the relationship between LOA opportunities as
well as practices and student language proficiency levels. Further discussion of the relationship between LOA challenges and student language proficiency will be presented later (see section
7.4 and section 7.5).
Na-SB: I think that MCQ has only very limited use for me. In my opinion, I think that
Gap-filling cloze can improve my learning to a greater extent. … Because its
difficulty level is higher. (FG-SB)
This interview account from Na-SB suggested that, in her view, the more difficult
the task, the greater its potential to improve one's learning. Most students in the
focus groups agreed with this opinion. A further comment from Ling-SA also offers
insight into this finding, as the decreased mark allocation for MCQ indicated a shift
of test focus from
testing language knowledge to testing overall ability to use language. In her opinion,
this shift brought about more challenges to her learning and thus motivated her to study
hard to improve junior high school English learning, which could further help her
senior high school study.
Ling-SA: In my opinion, in fact, if overall test score weight of other tasks [other than
MCQ] was increased, it means that in regard to the text, the text
comprehension, that is, our language abilities shall be improved, which is very
helpful to our study at senior high schools. Then … if we do not build a solid
and higher basis at a junior high school level, when we go to senior high
schools, in fact, even if you again, you study very hard again, I feel that I need
to spend more interest in improving my own learning. (FG-SA)
To conclude, participants believed that these six categories of opportunities
made it possible for the GVT to promote learning and thus indicated the potential for
the GVT to be learning-oriented. This was gleaned from the interview data.
Perceptions of LOA opportunities varied among teachers but were similar among
students. Among the teachers, Lan regarded the test as crucial for beginner learners
of English and suitable for the junior high school stage of EFL learning. Although
the test items did not reflect real-life communication, Hu and the students considered
the test to have LOA potential, since it could help students develop certain real-life
communication abilities. Beyond improving students' language learning, the test
itself was thought to develop students' abilities of memory, logic, and inference (Hu,
Zhang). Furthermore, according to Zhang, the GVT could be learning-oriented
because of learning-oriented test design intentions, such as testing overall language
use ability rather than discrete linguistic knowledge. In addition, as reported
by students, it was possible for the GVT preparation to incorporate LOA principles
due to the possibility of transferring language knowledge into performance on
macroskill tasks and the level of challenge.
• Feedback;
• Learning-oriented strategies.
These practices and their related categories are reported subsequently in this
section.
Learner-centred interaction
As noted above, the LOA cycle depicts learner- and content-centred activities in
classrooms. Although ideally both types of interactive activity would appear in
classrooms pursuing LOA purposes, interactive activities featuring higher-order
skills were rare in the observed SHSEET preparation courses. As a result, only
limited types of learner-centred interaction were identified. These activities included
1) group or pair discussion which was prominent in all three teachers’ classes and 2)
positioning students as holders of language knowledge which was predominant in
Zhang’s classes.
Group/pair discussion
Noticeably, all three teachers observed utilised this activity in their classes to
motivate students to solve problems in groups or pairs (think-pair-share). For instance,
in her class, Zhang encouraged students to first discuss their answers in pairs before
she proceeded with further instruction on students’ exercises:
Zhang: You can discuss in pairs, then tell me your answer. You can discuss with your
partner, in pairs. Which one did you choose and, oh, which one will you choose
and why? You can discuss in pairs. (SC-CO3)
Hu: After they finish some MCQ tasks in a certain time period, their group itself,
itself summarises one [correct answer]. In the case that no one knows the
correct answer…, for example, I choose A, you choose B, he chooses C, and
she chooses D, we then discuss together. Which answer is indeed the most
correct one, the group itself gathers the final answer. (Interview)
According to the participating teachers, group and pair discussion was used to
provide opportunities for learners to engage with the assessment tasks alongside
peers. The use of this activity in a test preparation context was thus viewed as
fostering the higher-order communication skill of problem-solving among students.
Through this activity, students were encouraged to apply the knowledge they had
learned to engage cooperatively with assessment tasks. Therefore, the use of group
or pair discussion in test preparation was
regarded as contributing to the learning-oriented potential of the GVT.
Zhang: And I will ask some students to come here. I will check what you have read
just, what you have read. Now, first one, who wants to have a try? Come here.
And show what you have had, have you, what you have read. [pause] Now,
Lily, please come here. [pause] And if, if you have any question, you can ask
her. Okay? (SC-CO4)
15 Positioning students as holders of language knowledge means that the teacher appointed a
certain student to act as the teacher, come to the front of the class, and then explain to the whole
class how he or she had completed the task.
their ability in critical thinking, and her reason for adopting this activity in teaching
was explained in her interview.
Content-centred interaction
As the literature suggests (Jones & Saville, 2016) and the current study describes,
content-centred classroom activity mainly focuses on knowledge of linguistic forms
and taking test tasks. As with learner-centred classroom interaction, Hu mentioned
content-centred activities in her interview, but these were rarely observed in her
classes. Nonetheless, a variety of content-centred
interactions were identified in Lan’s and Zhang’s test preparation classes, which
included 1) open-ended questions, 2) closed questions, 3) student-initiated questions,
4) reading aloud as a class, and 5) giving bonus points. This section presents these
activities in sequence.
Open-ended questions
From the classroom observation data, it was found that teachers adopted open-
ended questions to interact with students. Lan and Zhang used this type of questioning
frequently in order to engage individual students as well as the whole class in
classroom teaching. In the classes observed, open-ended questioning was
characterised by "why" questions. By asking a "why" question, teachers expected
students to clarify their thoughts and engage in problem-solving processes. For
example,
Lan: No? Okay, forty-five, okay, helpful. [pause] What’s your reason? Why, why
you chose “helpful”? What’s your reason? (SA-CO1)
From the above quote, it was evident that Lan expected students to explain the
process behind their responses, since she could not see a student's decoding process
until the student explained the reasoning. In effect, questions aimed
at higher cognitive levels contribute more to learning than simple recall or mechanical
response through closed questions (Hasselgreen et al., 2012). Therefore, using open-
ended questions, which could also be viewed as metacognitive questions (Cazden,
2001), can promote students’ learning. In this way, Lan and Zhang both used open-
ended questions to interact and engage with students in test preparation classes.
Further, teachers also frequently adopted closed questions, as presented in the
following section.
Closed questions
Although open-ended questions helped teachers probe further into students’
higher-order learning processes, closed questions were frequently used by teachers to
engage students in class. For example, teachers often gave the first half of a sentence
as a hint for students to complete the other half, guiding them to the answer step by
step. The following example from Lan illustrates this:
Lan: Yeah, baozi [steamed pork buns] can be in different sizes. So you can see, the
first part is talking about its position, right? “It’s common, even President
Xi…”?
Similarly, teachers mainly used closed questions to elicit correct answers from
students. In addition, Hu frequently used closed questions to confirm basic language
knowledge, for example “Is this word a countable noun or uncountable noun?” (SB-
CO1), which was too easy for the students. It was also noticeable that Hu answered
her questions either by herself or together with the whole class and did not always
wait for answers from students before moving on to the next task. As a result, Hu's
classes involved the fewest student interactions among the three teachers.
In fact, teacher-initiated questions are common in EFL teaching as this is
regarded as a valid invitation for students to respond. It is the first turn in a traditional
Initiation-Response-Evaluation (IRE) sequence. The characteristic of an IRE model is
that it is a teacher-led model, which involves the teacher questioning students with the
answers already known to the teacher (Lemke, 1990). Through questioning, teachers
hoped to encourage students to participate in class activities and express their ideas
and opinions about learning (Liu & Zhao, 2010). In turn, this expectation could reduce the
phenomenon of teacher-talk. From the classroom observation data, it was found that
both Lan and Zhang commonly adopted open-ended questions and closed questions in
their teaching. In contrast, Hu often answered her own (rhetorical) questions and
seldom asked individual students questions in her teaching.
Student-initiated questions
In addition to the traditional teacher-led IRE pattern of interaction, there were
also student-initiated IRE interactions in both Lan's and Zhang's classes. For Zhang
especially, because she involved students in her teaching, her students seemed to
have more opportunities to initiate questions. For example, when Kelly was demonstrating
teaching in front of the whole class, one student expressed his doubts and expected her
to provide a more detailed answer to the question, as can be seen in the interaction
below:
Zhang: What guide word? Some students asked you [Kelly] immediately, what guide
word?
Zhang repeated the student's question and Kelly's answer to keep the classroom
interaction dynamic. As for the students, they were seeking further feedback on the task
from peers, which was perceived as effective in contributing to learning if students
could take the initiative in the learning and feedback process (Hasselgreen et al., 2012).
It was undeniable that her learner-centred activity of positioning students as holders of
language knowledge motivated student-initiated questioning. Therefore, through the
interactions between teachers and students, they co-constructed a common body of
knowledge (Hall & Walsh, 2002) to fill in learning gaps jointly.
Reading aloud as a class
To engage students with classroom interactions, Lan also guided students to read
aloud together in test preparation classes. Unlike an interactive learner-centred
activity, choral reading aimed at transmitting language knowledge. Accordingly, in
her classes, Lan frequently asked students to read important grammar structures and
sentences aloud to reinforce that language knowledge.
The following excerpt is an illustrative example of this classroom activity.
Ss & Lan: Not only in Chongqing, but also in the other parts of the world, people
eat hotpot.
Through reading together with students and asking students to read aloud, Lan
focused on language knowledge in test preparation classes. In addition, this activity
appeared to be motivating to students as they actively engaged in this classroom
activity. Therefore, this content-centred classroom interaction of reading aloud was
adopted in Lan’s class to promote students’ learning.
Giving bonus points
Lan: If you got exactly the same answer as the original text, you can get two points.
And for other answers, if they are right, you can just get one. (SA-CO3)
During her classroom teaching, Lan kept a record of bonus points for students.
However, the real purpose of this activity was not clear from the classroom
observation. Regardless of this ambiguity, the use of this content-centred activity
enlivened her classroom interactions with students and thus helped to promote
students’ language learning.
As pointed out by Jones and Saville (2016), both learner-centred and content-
centred classroom interactions should be emphasised since the complementarity
between them is necessary for developing learning-oriented classroom practices. The
current study further supported this claim as both interactive activities were found to
be key in GVT preparation classes. Although the number of learner-centred classroom
interactions was limited in the qualitative dataset, it was still evident that both learner-
centred and content-centred interactions were present in both Lan’s and Zhang’s
classes, while Hu’s classes appeared to be the least learning-oriented. This indicates
the potential for GVT preparation classes to be learning-oriented, but differences
remain across different teachers. In sum, the emergence of higher-order skills in
classroom teaching and learning as well as the process of knowledge transmission
complemented each other in test preparation classes.
Table 7.1
Indicators of classroom interaction (see instrument reliability and validity in section 4.5.3)
As shown in Table 7.1, the proportion of students who reported often and always
having group discussion in classes was higher than those who reported they never or
seldom did (36.8% versus 31.5%); the same tendency was evident for positioning
students as holders of language knowledge (39.3% versus 29.7%) but reversed for
having interesting learning activities such as performing drama and having English
debates (24.0% versus 45.3%). Thus, mainly two types of classroom interactions,
namely group/pair discussion (v48) and positioning students as holders of language
knowledge (v51) as identified in classroom observations, were probed in a wider
participant population, which provided evidence for positive washback of the GVT on
grammar and vocabulary learning. As for having interesting learning activities (v52),
the result corresponded to participants' interview accounts: interesting interactive
activities such as English debates took place in Grades 7 and 8 rather than in Grade
9. As such, this indicated negative washback of the GVT on grammar and
vocabulary learning.
7.2.2 Feedback
Feedback practices, a key element in LOA (Carless, 2007; Jones & Saville, 2016),
were explored in both the classroom observation and interview data. Carless (2007)
and Jones and Saville (2016) emphasised the importance of teachers giving feedback
on student performance to close learning gaps and thereby improve students'
learning; feedback practice in the classroom was therefore explored in the
qualitative phase.
To begin with, all three teachers believed in the power of feedback to improve
students’ learning and tried to expand students’ knowledge when giving feedback.
According to them, their feedback practices during the test preparation period were
mainly in an oral form (i.e., face to face), while written feedback was rarely given. The
reason for this was that “written feedback takes up more time” (Hu, Interview).
Moreover, both Hu and Zhang explained that they tailored feedback to students'
different language proficiency levels, and they preferred students to explain their
answers first before receiving feedback in class. Through these methods, teachers
felt that feedback was “helpful and effective” to students’ learning and their teaching
(Zhang, Interview). In addition, teachers explained that feedback should be
longitudinal, even during the test preparation stage. By so doing, feedback worked as
a mediated tool for teaching and learning as teachers and students co-constructed a
ZPD (Vygotsky, 1986) where students modified their learning systems based on the
feedback (Aljaafreh & Lantolf, 1994) and teachers could better detect students’
problems in learning and thus strategically help them improve in the long term. In this
way, feedback mediated language development (Rassaei, 2019). For instance, Lan
explained her view of feedback as follows:
Lan: Therefore, maybe at one exam time, the student made a mistake in this item,
but next time he or she could choose a right answer. Okay, then at least, we
can say, if this item aligns with, for example, if it aligns with object clause, if
he makes a mistake, then at least it will mean that the student is having problem
with this language point, so [we teachers] can consolidate this, okay. Then,
hmm, it is more about a longitudinal tracking. If the student is having a
problem with this single MCQ item for a long time, and it’s only related to
those several items, then it means that the student has problems with this
section. (Interview)
Lan considered that feedback should be progressive and used to monitor students'
learning progress over time. This idea, also adopted by Zhang, was found to work
through keeping a student profile of their learning outcomes. As explained by Zhang,
her records of students’ progress and achievement were used to monitor students’
progress in different language tasks in relation to the test. By referring to those records,
she strategically changed her teaching focus to help students make progress in their
learning.
Students’ comments in focus group data further supported the feedback practices
reported by teachers. For example, regarding to how teachers gave feedback, three
modes of feedback provision were reported from students: oral feedback, written
feedback, and online feedback. Most students mentioned face-to-face oral feedback
and its usefulness as it was detailed, insightful, and timesaving. This echoed teachers’
preference for oral feedback. As for written feedback, some students regarded it as
unimportant since they did not spend time reading it, while others perceived it as
more useful and effective (Hui-SB, Shu-SB, Na-SB) and found that it could be
encouraging when it included smiley faces or emojis (Na-SB). In addition, online
feedback was proposed by Ming-SA, since he felt that an online channel could make
him feel less embarrassed when seeking feedback from his teacher.
performance. However, for Zhang, whether outside of or during the test preparation
stage, feedback was mainly delivered through various forms of written assessment
such as unit quizzes.
Overall, the feedback practices from the teachers' perspective were mainly at a task
level (i.e., focusing on test tasks), and the preparation for the GVT appeared to offer
limited learning-oriented potential through feedback practices (i.e., feeding forward).
Regarding the differences among the three teachers, it was observed that Hu
provided less feedback, with no instances of feeding forward in observations of her
classes. This could be partly explained by Hu's classroom interactions, which
differed from those of the other two teachers. In Hu's classes, teacher-talk
predominated, and thus there were rarely any feedback opportunities to promote
students’ learning. This was counter to the recommendation that feedback should be
both timely and forward-looking in order to improve students’ current as well as future
learning (Carless, 2007). In sum, although teachers were not aware of the relationship
of feedback practices to LOA, they believed that providing feedback could promote
students’ learning during GVT preparation. Hence, GVT preparation was able to
provide an opportunity of incorporating LOA principles through feedback practices,
but teachers’ feedback practices varied.
Additionally, both teachers and students confirmed that individual feedback was
given to students since every student’s learning progress and challenges differed.
Therefore, most individual feedback was given to students orally and was considered
to feed forward to some extent. While providing individual feedback orally, teachers
extended language knowledge and taught problem-solving and language learning
skills. For example, Ming-SA commented on his teacher’s individual feedback:
Ming-SA: Because, when you talk to the English teacher face to face, she not only
explains this single exercise to us, but also extends other knowledge. For
example, this type of exercise, it may have other special situations, and she
will give us instructions on those special ones as well. Hence, it is quite helpful
for the future [study]. (FG-SA)
According to Ming-SA, the feedback from his English teacher was not simply
the feedback on the specific task, but also included extended knowledge which was
useful for future studies. In fact, similar comments were made by other students (Xun-
SB, Fang-SC, Meng-SC, Hua-SC), which indicated the possibility of feeding
forward in GVT preparation. Feeding forward refers to pointing out the goal of the
next stage and indicating implications for future learning (Carless, 2007; Hattie &
Timperley, 2007). As a result, although teachers' feedback was mainly at the task level
and specifically related to test preparation, most students expressed that their teachers’
feedback could help them improve their grammar and vocabulary learning.
Moreover, key literature on feedback (Carless, 2007; Hattie & Timperley, 2007)
recommends timely feedback and maintains that the application of feedback is
crucial in improving learning; feedback was therefore also investigated in the
student survey.
Table 7.2
Indicators of feedback (see instrument reliability and validity in section 4.5.3)
As shown in Table 7.2, the proportion of students who agreed that they had
frequent feedback on grammar and vocabulary learning from teachers was much
higher than those who disagreed (35.0% versus 15.6%); the same tendency was evident
for timely feedback (52.9% versus 11.1%), detailed feedback (52.7% versus 10.7%),
feeding forward (51.2% versus 9.7%), the usefulness of feedback (43.1% versus
11.5%), and satisfaction with teachers’ feedback (53.3% versus 9.0%). This finding
complemented qualitative findings, as the responses to the survey indicated that
teachers’ feedback on grammar and vocabulary during GVT preparation was frequent,
timely, detailed, feeding forward, helpful, and that students were satisfied with the
feedback they received from their teachers. Therefore, the feedback on grammar and
vocabulary did reveal positive washback of the GVT on students' learning, which
further supported the possibility that the GVT has learning-oriented potential in the
test preparation stage.
To improve students’ learning, teachers asked students to help each other. This
learning activity, encouraged mainly by Hu and Zhang, was termed by Zhang as a
“mentor-apprentice pair”. In fact, this mentor-apprentice pair valued both guidance
and participation in learning and cognitive development activities (Rogoff, 1990),
which thus allowed students to co-construct a ZPD in which they could process their
learning tasks at a higher cognitive level (Vygotsky, 1986). In turn, this ZPD situated
effective learning in a social and cultural context rather than a context of simple
knowledge development (Jones & Saville, 2016; Sjøberg, 2007). Therefore, the unique
phenomenon that a pair of students stood up together when Zhang questioned one of
them in her class was explained in the interview. According to Zhang, her reason for
having students form such pairs was grounded in her beliefs about teaching and
language learning.
Zhang: It is impossible for me to explain 100% of what I want to say in class, then I
have to train the students with a high language proficiency level. For example,
I just said, only four students made a mistake with this test item, then you go
to your mentors. If your mentor can’t help, then you go to your English subject
representative; if the representative can’t help, then you come to me. Well, it’s
like this, this way, and then it can be a bit easier for me. Moreover, I think this
is definitely beneficial for students, why is it beneficial? I (i.e., the student)
can do this exercise, but if I need to help you to understand it, this process, will
be a process of reviewing my own knowledge. If the student can explain it
clearly, for the student [this is beneficial]. Besides, the student will find, well,
I (i.e., the mentor) can get the correct answer to this exercise, however, when
I explain to you (i.e., the apprentice), I can’t accurately tell you my problem-
solving process or I can’t help you to understand it. Okay, then here comes the
problem, so that you need to study. (Interview)
This learning-oriented strategy was thus regarded as "promoting learner autonomy" in this study.
For example, as English was a conduit for students to learn more about the world, Lan
regarded it necessary for students to be autonomous learners and develop effective
methods of self-study. Hence, she suggested that students learn by themselves in their
spare time.
Lan: I encourage students … to read more, study more in their extra-curricular time.
Then well, when they encounter some new words, if the words are related to
our topics, topics that are specified by the SHSEET, or are closely linked with
their life, I will, well, suggest them to, to look it up in the dictionary by
themselves, or to make one or two sentences by themselves, okay. Otherwise,
[I will suggest them] to look for one or two sentences which are relevant to
real-life in the dictionary. (Interview)
From the above quote, it was clear that Lan encouraged students to be
autonomous learners and learn vocabulary in their self-study time. In order to achieve
this purpose, she suggested that students develop their own learning goals for
vocabulary study. Therefore, Lan told students to make their own decisions about
vocabulary learning in the test preparation period. Similarly, Zhang reminded students
to learn from a good example when she discovered that high-achieving student Mia
was an autonomous learner in her self-study time.
Mia: Every night, after going back home, I write what I learned in the class for each
day, and then according to, according to the previous knowledge, I check my
learning.
Zhang: Okay, review, go through, and write the test review knowledge learned every
day. … Like just now I asked David to stand up and tell me what he learned
yesterday, but nothing. Is he having a poor memory? No, what’s the problem?
What’s his problem? Can he understand in the class? I’m sure he can. What’s
his problem? [pause] He did not do what Mia said, review after returning
home. … Now I precisely tell you (i.e., David), what the teacher teaches every
day, you should find some time after going home, what should you do with the
knowledge? Summarising it, okay? Remember, it’s very important for
everyone, okay? Good, okay. (SC-CO3)
As with Lan's suggestions to students, this was about empowering students with the
ability to be independent learners. According to the teachers in this study, this LOA
practice of learner autonomy was closely linked to language proficiency levels and
could thus help students to improve their learning (Corno & Mandinach, 1983; Deng, 2007; Zhang &
Li, 2004). In this fashion, teachers expected to improve students’ learning outside
classroom time, which thus provided an opportunity for GVT preparation to
incorporate LOA principles.
Further, in contrast to most students from School A and School B, some School
C students reported that learner autonomy was required by teachers (Fang-SC, Jing-
SC). This triangulated the teachers' accounts of promoting learner autonomy among
students. Significantly, although learner autonomy seemed to be practised by
students at different levels of language proficiency, the qualitative data showed it to
be most prominent among high-achieving students. Drawing on these findings, the
relationship between learner autonomy and student language proficiency will be
examined further in section 7.5.
The quantitative findings generalised the qualitative results from students, which
indicated that the GVT did exert positive washback on students' learning in this respect.
Table 7.3
Indicators of learner autonomy (see instrument reliability and validity in section 4.5.3)
Hu: Tasks such as this one (i.e., No. 74), if your writing style is wrong, then you
will get a zero mark here. Because it only has what? One mark, if you made a
mistake, the exam marker would not give you a mark of 0.5. (SB-CO1)
According to Hu, reminding students of the marking criteria could help them
grasp the writing principle of "using capital letters in a sentence". In this way,
informing students of the scoring rubrics could improve both students' vocabulary
learning and their GVT scores. This method of familiarising students with the
marking criteria was therefore considered evidence of "involvement in assessment"
(Carless, 2007), which could serve as a learning goal in a testing context.
Hu: Yes, you need to constantly change [teaching methods]. … You can also ask
them to look for answers, because especially for the tasks that have
explanations [in test preparation materials], you can check the exercise
answers and then ask them to check [their own problems], isn’t this applicable?
(Interview)
regardless of their language proficiency. As one of the key principles in the LOA
theory (Carless, 2007), students’ involvement in assessment was regarded to be
important in a learning-oriented test preparation stage (Jones & Saville, 2016). In the
current study, self-assessment meant having students go through the whole process
of "do a mock test-check one's answers-solve one's learning problems". In other
words, this process of self-assessment was closely linked to learners' autonomous behaviours.
As shown in Table 7.4, the proportion of students who agreed that teachers
encouraged them to do self-assessment was much higher than those who disagreed
(46.3% versus 23.1%); the same tendency was evident for becoming familiar with
grammar and vocabulary scoring rubrics (60.2% versus 13.5%) and summarising from
exams taken (66.8% versus 12.1%). The quantitative findings indicated a positive
washback of the GVT from involvement in assessment practices encouraged by
teachers. Therefore, both qualitative and quantitative findings on students’
involvement in assessment converged and supported the argument that involvement in
assessment offered opportunities for incorporating LOA principles in GVT
preparation.
Table 7.4
Indicators of involvement in assessment (see instrument reliability and validity in section 4.5.3)
Setting out from the qualitative findings of this study and the theoretical
positioning of LOA (Carless, 2007; Jones & Saville, 2016) and learner autonomy
(Lamb, 2010), the following sections discuss two research hypotheses generated from
the qualitative findings.
Figure 7.1 demonstrates each of the four dimensions of the LOA practices, their corresponding indicators, and the correlations among the four LOA dimensions. All standardised regression weights were above .60, and most were above the preferred value of .70 (Hair et al., 2006). Moreover, the squared multiple correlations (SMC) were all above the acceptable cut-off value of .30, and most were greater than .50 (Jöreskog & Sörbom, 1989).
Figure 7.1. Structural model for the relationship within LOA practices in GVT preparation (N=488)
The CFA results for the construct of LOA practices with 17 variables showed a good model fit (CMIN/DF=4.162, df=113, p<.001, SRMR=.050, RMSEA=.081; 90% CI [.073, .088]; TLI=.908; CFI=.924). Although the model had a significant chi-square value of 470.361, the ratio between the chi-square value and the degrees of freedom (470.361/113=4.162) was not very high. Standardised root mean square residual
(SRMR) was .050 which was below the cut-off value of .08 (Hu & Bentler, 1999);
baseline fit indices of TLI (.908) and CFI (.924) were above the cut-off value of .90
(Bentler, 1990); and the value of RMSEA (.081) was very close to the cut-off value of
.08 (Ho, 2006; Schreiber et al., 2006). Given its complexity, the model had a
reasonably good fit.
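The fit evaluation described above amounts to comparing each reported statistic against its cited cut-off. As an illustration only (the thesis analyses were run in SEM software, not this script), the following Python sketch reproduces the CMIN/DF arithmetic and the threshold logic; the function name and flag names are invented for this example.

```python
# Hypothetical sketch: checking the reported CFA fit statistics against the
# cut-off values cited in the text. The numeric values come from the thesis;
# the threshold logic below is an illustration, not the software actually used.

def evaluate_fit(chi_square, df, srmr, rmsea, tli, cfi):
    """Return the chi-square/df ratio and pass/fail flags for common criteria."""
    cmin_df = chi_square / df
    return {
        "cmin_df": cmin_df,
        "cmin_df_ok": cmin_df < 5.0,                 # ratio "not very high"
        "srmr_ok": srmr < 0.08,                      # Hu & Bentler (1999)
        "rmsea_ok": rmsea <= 0.08 or abs(rmsea - 0.08) < 0.005,  # close to cut-off
        "tli_ok": tli > 0.90,                        # Bentler (1990)
        "cfi_ok": cfi > 0.90,
    }

result = evaluate_fit(chi_square=470.361, df=113,
                      srmr=0.050, rmsea=0.081, tli=0.908, cfi=0.924)
print(round(result["cmin_df"], 3))  # 4.162
```

Run with the reported values, every flag is satisfied (RMSEA only marginally), matching the chapter’s conclusion of a reasonably good fit for a model of this complexity.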
As shown in Figure 7.1, the four constructs were significantly and positively related to each other. Their correlations ranged between .53 and .67, demonstrating relatively strong inter-correlations across the four constructs. In the literature, researchers (Brown, 2006; Ockey & Choi, 2015) have suggested that correlations above .85 indicate poor distinction between constructs. As the correlations among these four factors were all below the cut-off value of .85, the analysis proceeded, showing that Learning Oriented Assessment in GVT preparation was a multidimensional construct with four constituent components: classroom interaction, involvement in assessment, feedback, and learner autonomy. These findings thus further support the rationale for including learner autonomy, though not conceptualised in existing theories (Carless, 2007; Jones & Saville, 2016), in this proposed model of LOA practices during GVT preparation. Indeed, as claimed by scholars,
learner autonomy can promote LOA (Lamb, 2010) and closely link with assessment
practices such as self-assessment (Dam, 1995; Tassinari, 2012).
The above findings thus supported the argument that the LOA cycle is an ecological
model, which combines both classroom evaluation and standardised test evaluation
(Jones & Saville, 2016). However, instead of investigating LOA from a teacher
perspective and at a macro level (i.e., focusing on assessment and educational system
level), this survey took students as participants to offer micro level evidence for the
synergy between high-stakes standardised testing and in-class as well as extra-
curricular learning.
7.4 LEARNING ORIENTED ASSESSMENT PRACTICES IN GVT
PREPARATION AND STUDENT TEST PERFORMANCE
Table 7.5
Correlation coefficients between SHSEET score and LOA practices (N=922)
In addition, Table 7.5 showed that the correlation between the SHSEET score
and learner autonomy was the highest (r=.418, p < .01) of the four. It thus indicated
that a higher SHSEET score was significantly and positively associated with a
student’s learner autonomy practices, and this association was of moderate strength. The
second highest correlation was found between the SHSEET score and the involvement
in assessment practices (r=.389, p < .01). In comparison, there was a weak, though
significant, correlation between the SHSEET score and feedback practices (r=.276, p
< .01), and between the SHSEET score and classroom interaction practices (r=.155, p
< .01).
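Spearman’s rank-order coefficient, as reported in Table 7.5, is the Pearson correlation computed on rank-transformed variables. The Python sketch below is illustrative only: the `ranks` and `spearman` helpers are invented names, and the toy data are not the survey data (N=922).

```python
# Hypothetical sketch of Spearman's rank-order correlation: rank each
# variable (averaging tied ranks), then take the Pearson correlation
# of the ranks. Toy data only; not the thesis survey responses.

def ranks(values):
    """Assign 1-based average ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank-transformed variables."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotone toy data yields rho = 1.0
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

A coefficient near .40, as for learner autonomy above, falls in the conventional moderate band; values below .30, as for classroom interaction, are conventionally read as weak.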
The correlation results were consistent with the qualitative findings presented in
previous chapters and the theoretical conceptualisation of LOA (Carless, 2007; Jones
& Saville, 2016). First, the moderate significant relationship between students’ self-
reported SHSEET scores and learner autonomy corresponds to the claim that the
development of learner autonomy is mutually interactive with the growth of language
proficiency (Little, 2007). It was also consistent with extant literature (Ablard & Lipschultz, 1998; Deng, 2007; Risemberg & Zimmerman, 1992; Zhang & Li, 2004), which found that test scores tend to be closely linked with learner autonomy, although the relationship between test scores and learner autonomy practices is not a simple causal one. Likewise, although the relationship between self-reported SHSEET scores and involvement in assessment was weaker, its significance appeared to indicate that assessment practices such as teacher-, self- and peer-assessment are relevant to students’ language proficiency (Iraji et al., 2016; Liu & Brantmeier, 2019; Oscarson, 1989).
To start with, it is important to note that those challenges were identified from
participants’ perceptions and thus were explored mainly through interview accounts.
Generally, both teachers and students perceived some challenges of the incorporation
of LOA principles in GVT preparation, as their primary goal of learning was to
increase test scores. In contrast to the perceived opportunities for incorporating LOA
principles in GVT preparation, more challenges than opportunities were expressed by
participants. In general, eight aspects of challenges were found in the interview data:
• Efficient use of class time;
• The consideration of high test stakes;
• Administrative influence;
• Student language proficiency;
• Class size;
• The concern over teaching performance;
• Limited teaching experience and expertise;
• Test method.
In this section, these eight perceived LOA challenges are reported in turn.
Lan: …. Sometimes, for example, when I think that students could not understand,
I use Chinese instead. Because, after all, sometimes you need to be efficient to
prepare for the test. That is to say, you can use maybe two or three sentences
in Chinese to explain something clearly, but if you use English, maybe some
students need to comprehend for quite a while for even one sentence.
Therefore, [using English Language] should accord to different situation.
(Interview)
Zhang also admitted that not using the target language in English classes was
due to the consideration of saving time and making test preparation easy for both
teachers and students. However, Zhang was aware that this timesaving compromise
was not beneficial to students’ language learning. She understood that learning an L2
was more effective when learners’ exposure to the target language was maximised and
L1 use was limited (Collins et al., 1999; Ellis, 2006; Krashen, 2003; Lightbown &
Spada, 2019).
In addition, Hu noted the decrease in the variety of classroom learning activities in test preparation classes. According to her, the test preparation classes were more boring than the other classes she taught.
Hu: Ordinarily, like when we did in the first round of test review, we still read
vocabulary, read sentences or the like, and then you had so many activities and
so on. Anyway, the closer to the test date, the comparatively fewer the
activities, which is more boring. (Interview)
Hu explained that the closer the test date, the fewer the class activities. This phenomenon reflected negative washback of the test induced by time considerations. The aspect of class time and efficiency has long been recognised by researchers (see, for example, Alderson & Hamp-Lyons, 1996) as negatively influencing teaching and learning activities, such as rarely using English in teaching (Yang, 2015). Considering this finding, in the current GVT context, time was thus influential in that it negatively affected the range of grammar and vocabulary activities and target language use in test preparation classes.
In fact, not only teachers but also students considered the practical factor of class time and efficiency. For students from School B (Xun-SB, Na-SB) and School C
(Ping-SC, Jing-SC, Kai-SC), GVT preparation could not incorporate LOA principles
due to class time and test preparation considerations. In response to implicit interview
questions regarding the use of LOA principles, students expressed their concerns over
having LOA practices such as using target language and receiving teacher feedback in
classes. From students’ responses, two general views were summarised.
First, School B students regarded using the target language (i.e., English) in classes as not especially important, which proved an obstacle to implementing LOA principles in test preparation classes. For example, regarding English language use in English classes, Na-SB clarified that not too much target language should be used in class.
Na-SB: But not too much, because if you do not understand [the language], it will
definitely influence the class efficiency in a negative way. (FG-SB)
In addition, the limited class time affected School C students differently: they felt that there was no need for teachers to give too much individual feedback in test preparation classes. Their belief was that class time was too precious to waste on
individual students. For instance, Ping-SC proposed that her English teacher should
give her more and clearer feedback on her GVT exercises. However, when asked
whether she meant to have more detailed feedback on her mistakes in class, she
immediately clarified her previous answer by saying that this feedback should be
provided outside the class.
Ping-SC: Well, that (i.e., teacher helping individual student to analyse mistakes) can be
done in extra-curricular time, because the class time is already very short, and
then…
Ping-SC: Also, what if some other students did not make such a mistake? It will be a
waste of time if it’s only you who made this mistake. (FG-SC)
To conclude, the consideration of class time and efficiency during the test
preparation period was perceived by both teachers and students as a primary challenge
to the incorporation of LOA principles, but this was not uncommon in a high-stakes
testing context. In fact, studies also found that, in order to make good use of the time for test preparation (Alderson & Hamp-Lyons, 1996) and improve test preparation efficiency, students tended to spend their extra-curricular time enhancing their learning and enabling more efficient classroom learning.
Further, she emphasised the importance of taking the test into consideration
during test preparation teaching. According to Hu, the exam-oriented teaching model
that she used in test preparation classes was important for her students’ future.
Hu: The mode of “lecture-evaluate-do exercises” is boring, but you can make it
more interesting and vivid, that’s the only thing you can do. But it is effective.
… I can’t make jokes with kid, kids’ future. I can’t [be selfish to] make the
English class better, more interesting, solely because I want to make it fun.
Future is more important than anything else, those kids’ future is more
important than anything else. (Interview)
Zhang: Because, this task (i.e., the MCQ task) has a great washback, if you do not test
it, then I will decrease, automatically decrease its weight in my teaching. We
(i.e., teachers) are definitely following the baton of exam, well, to be honest,
quality-education is still far away from us. According to our current situation,
generally we are still having exam-oriented education. Okay, so, especially for
my school, the trace of exam-oriented education is really heavy. This is true,
and I am also following the baton of exam in my own teaching. What the test
tests is what we teach. If vocabulary teaching, or the MCQ task will no longer
be tested, then we will gradually decrease our research on and exercises of this
task. (Interview)
Zhang emphasised the high-stakes nature of the GVT and its impact on teaching.
As a result, any change to the test content would certainly bring about changes to
teaching. The exam worked as a baton to influence their teaching in a negative way,
which was also found in the NMET context (Qi, 2004b).
Most importantly, the reported LOA practices in test preparation classes (see section 7.2) also reflected teachers’ consideration of the high stakes of the test. For example, although different types of classroom interaction occurred in classes, it was noticeable that content-centred activities dominated classroom interactions and Hu rarely had interactive activities in her classes to promote students’ learning. Therefore, it was evident that learning-oriented interactions between teacher and students or among students were achievable but not always present in all three teachers’ GVT preparation classes. In other words, LOA practices were conducted differently across the observed classes.
The limited number and types of learner-centred interactions were unsurprising
since all three teachers highlighted this obvious change in Grade 9 classroom activities,
as SHSEET approached. According to teachers, various activities were held before test
preparation (i.e., in Grade 7 and Grade 8). In semi-structured interviews, they all
mentioned English activities such as “performance, film dubbing, debating” (Lan), and
“role playing” (Hu and Zhang). Those learner-centred interactive activities were
mainly targeted cultivating students’ English learning interest through language use rather than mere teaching instruction (Zhang, interview). However, interactive activities such as role-play were no longer included during test preparation, because teachers regarded such activities as inappropriate in Grade 9 teaching (Hu, Interview). Hu explained her practice of “lecture-evaluate-do exercises (“讲评练”
in Chinese)” in test preparation classes. According to her, the adoption of such a model
was common among teachers when they started test preparation. Moreover, due to her
limited teaching experiences and language proficiency, she felt that this teaching
model was “easy to use” and “effective” in improving students’ test scores (Interview).
All these practices were closely linked with the high stakes of the SHSEET and it is
noteworthy that the decrease of the learner-centred activities as the test approached
undermined positive washback (i.e., learning-oriented opportunities) in relation to the
test.
In Lan’s school, the change from overseas textbooks to PEP textbooks in Grade
9 (see section 7.5.2) was a decision made by the school. It was thus not within her
control; in other words, teachers had to follow policies that the school administrative
team decided to implement during test preparation. Lan explained this accordingly.
On the other hand, Hu and Zhang were facing a different issue as they had
important administrative roles. For Hu, being a head teacher of the class meant
sacrificing her own class time for student affairs and other subjects. From her
viewpoint, as she should be responsible for students’ future life and study (see section
7.5.2), her role of head teacher made her spend class time on tasks like “contract
signing”. This necessity to spend students’ learning time on activities that were not
relevant to learning or assessment tasks thus went against the LOA principle of
“assessment tasks as learning tasks” (Carless, 2007). Likewise, Zhang who was the
Director of Teaching Affairs had difficulties in providing feedback to students because
she could not devote enough time to teaching. This challenge is expressed below:
Zhang: Well, yes, because it is like this, as for me, normally I am very busy, I have a
lot of administrative work. I am different from other teachers, for example,
other teachers’ offices are upstairs. For example, after class, if they have time,
they ask students to come to the office to give face-to-face feedback or
instruction. I can’t do this. I rarely have time to allocate to students separately.
Tomorrow I have four meetings, I can do nothing about it, after one class, then
I will go to four meetings, and then mid-of-semester meeting, parent meeting,
enrolment meeting, monthly exam meeting, and the leader of the teaching and
research group meeting, there is no way to give [time] to them. (Interview)
From the above quote, it was thus seen that Zhang’s difficulty in providing
feedback and spending time on students’ learning hampered her intention to have more
LOA practices in test preparation classes. This challenge was due to her administrative
role as the Director of Teaching Affairs. It thus indicated that administrative roles seemed to pose challenges for teachers seeking to incorporate LOA principles in their test preparation teaching.
The possible reasons for this finding were that their students’ language proficiency levels differed to a great extent and that their students generally had lower language proficiency than those from School A. As such, both Hu and Zhang chose not to use the target language in classroom teaching.
Hu: Mainly used, previously used English, while later on we, later on we mainly
used Chinese to teach grammar points. If you use English to instruct grammar
points. … Well, our class’s level is certainly not up to that standard. It’s
definitely impossible. Because students’, that, level is not so high, [language
proficiency] level is not that high. (Interview)
In fact, not only target language use but also learning-oriented activities in Zhang’s classes were negatively influenced by her students’ comparatively low language proficiency. When talking about the learning-oriented activity of forming mentor and apprentice pairs in her classes, Zhang commented on her disappointment with her current students’ level of language proficiency.
Zhang: But, not this grade, I used this method very well in previous years [with other
Grade 9 students who had already graduated]. The students themselves this
year, well, regarding mentors, I can’t find those very excellent ones, perhaps.
(Interview)
This challenge was also identified by students. For example, according to Long-
SB, students’ language proficiency levels should be taken into consideration when
using target language in classroom teaching and learning.
Long-SB: I personally do not like using English in the class, either. Because in my
opinion, in the class, well, it’s mainly because my class, [the students’
language proficiency] is greatly divided, we have [students with] high
language proficiency, but there are also many medium [language proficiency
students], and even low [language proficiency students]. Therefore, I think that
if our English teacher only uses English in the class, she will not be able to
take care of those, that is to say, students with low language proficiency and
students whose English pronunciation is not good. (FG-SB)
From the above comment, Long-SB regarded it necessary to consider students with low language proficiency, and thus suggested not using the target language in actual English classes. Instead, he thought it feasible and helpful for students, especially those with low language proficiency, to use the target language outside classes. This finding may help to explain the lack of target language use in Hu’s classes. Taking Long-SB’s account into consideration, it was thus clear that both teachers and students expressed similar concerns over the challenge of incorporating LOA principles, seen through the lens of target language use and LOA activities in test preparation classes.
Hu: Certainly, some forms seem to be useless. For example, portfolio, I didn’t set
up a very detailed portfolio for those kids to track their learning progress. This
might be due to our (i.e., teachers) time and effort. Time, effort, if you want to
have a detailed record for everyone to trace and investigate this such as
portfolio, actually [it is] very, very complicated. Even if you keep a portfolio
for one person is very painful and hard, let alone for dozens of them. I have
more than 100 students, more than 100, this class has more than fifty and the
other class has more than fifty. (Interview)
According to Hu, her failure to keep detailed learning records for students was thus a challenge for her in implementing LOA principles. In fact, a portfolio “is a
purposeful collection of student work that exhibits the students’ efforts, progress and
achievements” (Paulson et al., 1991, p. 59). As an effective tool for both formative and
summative assessment, a portfolio has been regarded as crucial for both teachers and
students to attend to their own learning progress (Lau, 2018). By showing students
their learning portfolios, Zhang expected students to self-direct their learning which
helped their learning-oriented test preparation as similarly shown in other studies
(Delett et al., 2001; Mok, 2013). However, from the above accounts, it was found that
Hu had more than 100 students to teach. Therefore, in contrast to Zhang, who used
student portfolios to monitor students’ grammar and vocabulary learning, Hu felt
challenged to use portfolios in her teaching due to the large number of her students.
Thus, class size was a challenge to the implementation of LOA principles in GVT
preparation.
Hu: Those kids are very wayward, especially according to their current teenage
psychology. If you force them to learn English, but they could not understand.
They will then think ‘I spend time in English subject but still couldn’t improve
my test score, so why should I learn?’ … They will ask you the answers for
that. Perhaps for me, I’d rather give up many of my previous teaching
behaviours (i.e., teaching activities), in order to at least make sure that he will
listen to my class. (Interview)
This effort to get students to listen in her class was indeed a reflection of Hu’s concern for her teaching profession. As Hu commented in her interview, her sacrifice of target language use and interactive classroom activities in test preparation courses was due to the consideration of students’ needs to achieve higher test scores. As the head teacher of the class, she had to consider students’ test scores and make sure that students listened in her class, since these outcomes would be viewed as her teaching performance at the end of Grade 9. Student test scores, as confirmed by the current participant group and other studies (see, for example, Tsagari, 2011), were generally viewed as criteria for evaluating teachers’ professional performance and thus aroused teachers’ anxiety. Therefore, teachers’ concern over teaching performance or career security may impede their intentions to have LOA activities such as learner-centred interactions during GVT preparation.
Teachers also explained the reasons why they felt it difficult to implement LOA principles in GVT preparation courses. From the above comment, it was evident that Lan felt challenged due to her inexperience in Grade 9 teaching as a new teacher. In fact, Hu had a similar challenge since she was also a new teacher. This explained why Hu believed that the mode of “lecture-evaluate-do exercises” was the best teaching method she could use in test preparation.
In addition, compared to Lan and Hu, although Zhang had 18 years’ teaching experience, she had a lower level of English proficiency. Therefore, she sometimes used Chinese instead of the target language of English in classroom teaching, as she explained below.
Zhang: Regarding the issue of using Chinese, I think there are three reasons. The first
reason is that, my own level is not good enough. For example, if I say in a
“low” way, very easy words I can make it. But if it’s slightly difficult, and I
normally do not use those, well, then I can’t smoothly express my ideas. This
is within a certain time, well, this is my own [problem], that is to say, I easily
give up using English, and use Chinese instead. This is the first, my own factor,
that is, the teacher’s self-proficiency is not up to the standard. (Interview)
Zhang attributed her reluctance to use English in class to her lower level of proficiency in English. As a result, although she was much more experienced in teaching than the other two teachers, her limited language expertise hindered the potential for incorporating learning-oriented activities and principles, such as using the target language for communicative purposes, in test preparation courses.
The challenge of the test method in relation to the MCQ task was discussed by School A students (Fei-SA, Ling-SA). In their opinion, the MCQ task in the GVT did not seem to have the potential to be learning-oriented.
Fei-SA: It could not reach that level. … As for the MCQ task in the GVT, it emphasises
more about personal mastery of language knowledge, which rarely
emphasises that level and the aspect of cooperation [like completing and
presenting tasks or projects in groups, which I have mentioned in my
understanding of LOA term]. (FG-SA)
According to Fei-SA, GVT tasks that adopted a multiple-choice method did not provide opportunities for classroom activities to incorporate the cooperative and interactive elements that are key in LOA theories (Carless, 2007; Jones & Saville, 2016). The MCQ and Sentence Completion tasks in the GVT were decontextualised, discrete-point items (Madsen, 1983) aimed at testing the accuracy of grammar and vocabulary knowledge (Halleck, 1992), and they rarely touched upon higher-level learning skills. If test preparation was to follow the LOA principle of “assessment tasks as learning tasks” (Carless, 2007), this test method thus thwarted the potential for incorporating LOA principles in test preparation learning.
To sum up, teachers and students reported eight challenges to the incorporation of LOA principles in GVT preparation. Through the lens of LOA principles and LOA practices, such as using the target language in test preparation classes or having learning-oriented activities, the commonly recognised challenges included efficient use of class time, the consideration of high test stakes, administrative influence, student language proficiency, class size, the concern over teaching performance, limited teaching experience and expertise, and test method. On the one hand, the qualitative results showed that, compared to Lan, Hu and Zhang faced more challenges in incorporating LOA principles in GVT preparation. Most importantly, those challenges mainly came from external rather than internal factors. Nonetheless, due to those concerns, teachers had to make compromises in their teaching, which led to certain ‘teaching to the test’ phenomena (see section 5.5.1). On the other hand, although students perceived fewer challenges, different concerns were reported across the observed classes. To clarify, School A students (Fei-SA, Ling-SA) regarded MCQs in the GVT as unable to allow for LOA practices. Further, students from School B and School C considered efficient use of class time to be the priority during GVT preparation. Additionally, School B students pointed out
that students’ language proficiency should be considered when having LOA practices
in test preparation classes.
Taking insights from the qualitative results, two statistical hypotheses were tested in the quantitative phase. CFA findings suggested that Learning Oriented Assessment in the context of GVT preparation was indeed a multidimensional construct constituted by classroom interaction, involvement in assessment, feedback, and learner autonomy. The four-dimension model fitted the data well, with the four constructs significantly correlated with one another. Further, Spearman’s correlation showed positive and significant relationships between student test performance (i.e., self-reported SHSEET scores) and LOA practices in GVT preparation. Therefore, the quantitative findings indicated that LOA in GVT preparation was a dynamic multidimensional construct and that the GVT could enable LOA opportunities (i.e., positive washback) in test preparation.
Table 7.6
Qualitative findings of RQ2
Chapter 8: Discussion and Conclusion
This chapter presents the discussion and the overall conclusion of the thesis.
Section 8.1 discusses and interprets the important research findings of this study by re-
addressing the research questions of washback value (section 8.1.1), washback
intensity (section 8.1.2), washback mechanism (section 8.1.3), and LOA opportunities
and challenges (section 8.1.4), which are summarised in section 8.1.5. Section 8.2
sums up the contributions and implications of the study from theoretical (section
8.2.1), methodological (section 8.2.2), and practical perspectives (section 8.2.3).
Section 8.3 documents the researcher’s reflections on conducting this exploratory
sequential mixed methods research (MMR) study. Section 8.4 summarises the
limitations of the current research. Section 8.5 delineates the recommendations for
future research. The whole study is then concluded in section 8.6.
8.1 DISCUSSION
Both teachers and students recognised the central role of the ECSCE and Test
Specifications in GVT preparation. However, their focus was on the grammar and
vocabulary lists in both documents, rather than on following the teaching and
assessment principles in the ECSCE, which highlight learner-centred English
education for compulsory education. This indicated a disconnect from the stated
intentions of the curriculum standards, as the test failed to bring about positive
change at the macro level of washback value, based on the negative findings from both
Grade 9 teachers and students in this study. It thus echoes the criticism that
standardised tests used for gatekeeping purposes can generate negative results and
impede curriculum implementation (Dello-Iacovo, 2009), and the research finding that
washback intended by test constructors can fail when the gatekeeping role and
selection function of the exam conflict with each other (Qi, 2004b, 2005, 2007).
In addition, teachers focused more on test content than on curriculum content, as
also found by Al-Wadi (2020), and thus students’ communicative language use ability
could not be improved, owing to the “narrowing of the curriculum” caused by in-class
drilling of grammar and vocabulary knowledge and tested skills (Saglam & Farhady,
2019). This finding reflected the conflict between curriculum requirements and actual
test preparation needs during the test preparation stage. The reason might be that it
was difficult for teachers to respond to both curriculum stipulations and test
requirements at the same time, as similarly reported by Al-Wadi (2020), since the two
constitute competing imperatives. Nonetheless, as argued by scholars (see, for
example, Messick, 1996; Shohamy, 2001), it is not the case that well-designed tests
can only generate positive washback while poor tests are destined to bring about
negative washback. Therefore, the reflection of learning-oriented principles in the test
design and the challenge for curriculum implementation in GVT preparation did not
necessarily mean that the GVT would bring about only positive washback or only
negative washback, respectively.
In this study, participants’ perception that the GVT tested the overall ability to use
language indicated its potential for assessing some aspects of communicative language
ability.
Although communicative language use was interpreted differently by teachers and
students, similar to NMET participants (Dong, 2020), it was possible that the two
Cloze tasks assessed students’ ability to use language and thus had the potential for
positive washback. In fact, it is widely acknowledged that the communicative features
of a test could bring about positive washback (Erfani, 2012; Hawkey, 2006; Wall &
Alderson, 1993), which is closely linked to the use of authentic and integrated tasks
(Biber & Gray, 2013; Jamieson et al., 2000; Ostovar-Namaghi & Safaee, 2017).
Therefore, the fact that passage-based GVT tasks of Cloze and Gap-filling cloze tested
students’ overall ability to use language, provided a rich context for language use, and
had topics that were relevant to real-life experiences indicated the communicative
language testing characteristics of the GVT design.
Nonetheless, unlike other studies in which test-taking anxiety was found, students
in this study reported feeling less anxious towards the GVT, for several reasons.
These reasons included, but were not limited to: 1)
Qualitative data disclosed that teachers and students often complained about the
lack of authentic language in GVT tasks. As revealed by teachers and focus group
participants, most students thought that GVT tasks did not reflect real-life language
use, which raised questions about the quality of GVT tasks. This negative perception
was similar to what Zhi and Wang (2019) found in the NMET context, where
participants perceived that the irrelevance of test content to real-life English
threatened test authenticity. Therefore, the similar lack of authentic language in GVT
tasks was considered a threat to GVT authenticity in the SHSEET. In turn, this issue
indicated the influence of the negative washback potential of GVT design
characteristics on learning. Most importantly, these negative perception findings
highlighted that teachers as well as students felt the need for more authentic grammar
and vocabulary tasks in the GVT.
Possible reasons for teachers’ anxious feelings might include, but were not limited
to, the accountability issue. Worldwide, students’ performance in standardised EFL
exams has been used to reflect teachers’ professional competence (McDonnell, 2004),
and teaching and learning performance has been judged through high-stakes exam
systems that serve to rank students and to evaluate schools as well as teachers
(McDonnell, 2013; Nichols et al., 2006; Tsagari, 2011). Under the same pressure,
students’ test achievement became the main goal for both teachers and students
during GVT preparation, which led to the negative influence of the test whereby
teachers and students might avoid communicative activities and materials that were
not perceived as helpful for test score improvement, as found in other test contexts
(see, for example, Tsagari, 2011).
Given this consideration, it was not surprising that students reported extrinsic
motivation, such as competing with peers, as important to their test preparation. This
was similar to other high-stakes EFL studies in China, such as the GSEEE, where
researchers found that students attached great value to instrumental motivation,
which was perceived to negatively impact the education system and teaching (He, 2010).
As this study was conducted in the last semester of Grade 9, when the test date
was drawing close for teachers and students, it was unsurprising to find intense in-class
washback of the kind other researchers call the “seasonality of washback” (Bailey, 1999;
Additionally, students with low English proficiency tended to spend more effort
preparing for the exam. As School C students reported in the focus group, teachers
required them to do test-driven exercises as homework, which was viewed as shifting
GVT preparation into extra-curricular study. This negative washback was also
manifest in TOEFL preparation (Alderson & Hamp-Lyons, 1996).
Finally, Pattern 4 in the MCA results revealed that some survey participants
conducted intense test preparation that had no connection to their perceptions of
test importance and test difficulty. This finding did not align with the theoretical
assumption that visible or intense washback, seen in significant effort during test
preparation, is generated under the driving force of test importance and test
difficulty (Green, 2007a, 2013; Hughes, 1993). As such, whether Pattern 4
demonstrates an “invisible” washback effect or falls beyond the washback
mechanism altogether warrants further investigation.
First of all, students’ intrinsic motivation and test anxiety mediated the influence
of perceptions of language use characteristics (Positive Perception2) and self-
perceived test use purposes, such as proving language proficiency (Test Importance2),
on the GVT preparation practices of language-use oriented learning strategy use
(Positive Strategy) and Test Preparation Effort. This finding can be unpacked in three
respects: intrinsic motivation, test anxiety, and Green’s (2007a) washback model, as follows.
In this overall finding, intrinsic motivation was first perceived as a key factor in
positive washback on students’ learning. Students who were intrinsically motivated
tended to be those who used language-use oriented learning strategies more
frequently, who in turn tended to be those with higher SHSEET scores. This finding
echoed that of Wolf and Smith (1995), who claimed that test-takers’ perceptions of
test consequence (i.e., test importance) significantly influenced participants’
motivation, which was positively linked with test performance. However, as their
motivation was scaled from high to low, it was not directly comparable to the current
motivation scale, which comprised both intrinsic motivation and extrinsic motivation.
In this study, it was understandable that intrinsically motivated students tended to
adopt more language-use oriented learning strategies for GVT preparation; the result
that they also tended to expend more test preparation effort might be because those
students did not see any harm in doing test papers and may even have enjoyed doing
exercises, as they did other language learning activities.
Most importantly, the first major finding of the SEM analysis suggested that GVT
design features regarding the testing of the overall ability to use language (the
construct of Positive Perception2, see section 5.2.5) supported the theoretical
conceptualisation of the washback value dimension in Green’s (2007a) model, since
the GVT as a whole reflected the testing of the overall ability to use language (focal
construct), which brought about positive washback on students’ learning
(language-use oriented learning strategies) through the indicators of intrinsic
motivation and test anxiety. Therefore, although perceived from a student
perspective16, the macro level of washback value in both Green’s (2007a) model and
the new washback model incorporating LOA was partially supported. The GVT design
characteristic of testing the overall ability to use language thus indicated positive washback.
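The mediated path described here (test perception → intrinsic motivation → language-use oriented strategy use) can be sketched with simulated data. This is a simplified two-regression illustration of an indirect effect, not the study’s actual SEM; all coefficients and variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 922  # survey sample size

# Simulate one standardised mediation path
perception = rng.normal(size=n)                        # Positive Perception2
motivation = 0.5 * perception + rng.normal(0, 0.8, n)  # mediator (intrinsic motivation)
strategy = 0.6 * motivation + rng.normal(0, 0.7, n)    # outcome (strategy use)

def ols_slope(x, y):
    """Slope from an OLS regression of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

a = ols_slope(perception, motivation)  # path a: perception -> mediator
b = ols_slope(motivation, strategy)    # path b: mediator -> outcome
indirect = a * b                       # indirect (mediated) effect
print(f"a = {a:.3f}, b = {b:.3f}, indirect = {indirect:.3f}")
```

A full SEM estimates all paths simultaneously with latent variables; this product-of-coefficients sketch only conveys the logic of an indirect effect.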
The second major finding of this SEM model suggested that students with higher
self-reported SHSEET scores tended to be those who adopted language-use oriented
learning strategies and spent more test preparation effort. These findings were also
reported by other researchers (see, for example, Dong, 2020; Green, 2007b; Xie,
2013). On the one hand, students’ use of language-use oriented learning strategies had
the strongest association with their self-reported SHSEET scores (r = .311). This
finding echoed the researcher’s assumption that using more language-use oriented
learning strategies would be more closely linked with students’ test performance, and
corresponded to the qualitative finding that high-achieving students reported more
application of language-use oriented learning strategies during GVT preparation.
16 Since essential construct validity evidence was not obtained in the qualitative phase, washback value
was perceived only from a student perspective, by investigating students’ perceptions of test
characteristics in the survey, and was thus categorised as a micro level factor in the new washback
value model of this study.
On the other hand, when incorporating LOA principles in GVT preparation, both
teachers and students experienced challenges. Teachers felt that the high-stakes nature
of the SHSEET made them reluctant to include interactive activities in class; instead,
they spent time on drilling. In other words, teachers were concerned about and mindful
of improving students’ language learning, but their teaching was constrained by the
test. This closely echoed the fact that high-stakes standardised EFL tests in China are
mainly viewed by teachers as ‘the baton of teaching’ (Qi, 2004b). Given the high-
stakes nature of the SHSEET, it was not surprising that teachers felt torn between
communicative language teaching (CLT) and improving students’ scores on the test.
In turn, this indicated that the high stakes of a test hindered the intention of improving
student language learning (i.e., the implementation of LOA principles in this study)
during test preparation (Qi, 2005). Moreover, participants’ consideration of the
efficient use of class time (Alderson & Hamp-Lyons, 1996; Yang, 2015) and the
differing student language proficiency levels challenged their intentions to follow
LOA principles in GVT preparation, a challenge common during high-stakes test
preparation periods. Likewise, teachers’ concern over large class sizes showed the
difficulty of implementing LOA practices. In view of the large student population in
China and the generally large classes in its schools, it was unsurprising that teachers
with larger classes had heavier workloads than those with smaller classes. Against this
backdrop, it was possible that class size influenced LOA practices in class, as teachers
with fewer students have been found more likely to implement AfL in class
(Danielson, 2008).
3. Providing feedback which can feed forward both in and outside class;
Moreover, this study not only contributes to the richness of the washback literature
but also offers an opportunity to investigate the relationship between the two major
washback dimensions of washback value and washback intensity. In other words, this
study not only presented results on washback value and washback intensity but also
focused on deconstructing the washback mechanism in the current GVT context. It
conceptualised the potential relationship between test perceptions and test preparation
practices through the influence of affective factors. Although qualitative methods
could effectively identify the actual factors influencing the washback mechanism of
the GVT, they were unable to statistically represent and explore the internal
relationships among all variables (Dong, 2020; Xie, 2015a). Thus, the current study is
meaningful in that it performed SEM to investigate the complex relationships between
various washback factors.
Further, the washback mechanism of the GVT was investigated more
systematically and thoroughly in this study than in similar washback studies.
Although a handful of washback mechanism studies have been conducted, they have
mainly examined components of the washback mechanism separately. For example,
studies have explored the relationships between test preparation and learning
outcomes (Dong, 2020; Xie, 2013), test perceptions and test preparation (Dong,
2020; Xie, 2015a), and motivation factors and test preparation (Xie, 2015a). Even
though studies have investigated the relationship between test perceptions and test
preparation or learning outcomes through influential factors such as expectancy-value
(Xie, 2013; Xie & Andrews, 2013), the current study took into account the more
complex elements of affective factors to complete a comprehensive washback
conceptualisation.
Further, as teachers and students suggested in interviews, the design of GVT test
items could be improved in two main respects. On the one hand, GVT tasks should
reflect real-life language and experiences and contain richer language context; that is,
test design should consider both text authenticity and task authenticity (Morrow,
1991). Thus, current affairs or local culture and authentic texts should be incorporated.
However, it is important to avoid unfamiliar topics that students cannot be expected to
have knowledge of. On the other hand, the test methods of GVT tasks should be
reconsidered. In line with participants’ positive perceptions of GVT design
characteristics and students’ perceived challenges with the test methods in GVT tasks,
more passage-based grammar and vocabulary tasks, such as Cloze and Gap-filling
cloze, should be adopted to assess students’ language knowledge. Alternatively,
assessing students’ grammar and vocabulary through integrated writing and speaking
tasks, rather than in a separate section of the test, would be beneficial. To this end,
students’ expectations of having more challenging grammar and vocabulary tasks to
assess their language competence could be met.
Moreover, beyond the improvement of test tasks, the GVT test design process
itself could be shared with teachers and students. Test designers could instruct schools
and teachers in how to design learning-oriented assessments so as to improve
learning. Likewise, instead of making the test design process
a secret, instructing classroom teachers and students in how to design GVT tasks
would also be beneficial.
8.3 REFLECTION
To better guide future MMR research projects, the researcher offers three
reflections arising from data collection and analysis.
The first reflection concerns survey design. As reported in Chapter Four, the face
validity of the student survey was checked by multiple parties. It is essential to be
careful about the wording used in surveys. Moreover, applying theoretical assumptions
in a specific research context should be handled with care. For example, although peer
assessment was assumed to be a key element in students’ involvement in assessment
(Carless, 2007), it might not be applicable to the SHSEET context or to junior high
school students in China.
8.4 LIMITATIONS
The fourth limitation concerns the statistical models constructed in this study.
Primarily, although the SEM model of the GVT washback mechanism was constructed
on the basis of theoretical conceptualisation (Bailey, 1996; Green, 2007a; Hughes,
1993; Wolf & Smith, 1995), qualitative results, and empirical studies (Dong, 2020;
Xie, 2015a), LOA practices were not included in the complex SEM model of the GVT
washback mechanism. Moreover, discriminant validity, which could help to make
distinctions between constructs, was not tested in the CFA model of LOA practices,
a limitation the researcher acknowledges.
This study has investigated the washback of the GVT on teaching and learning,
and the relationships between those washback factors. Most importantly, this study
explored the possibility of using LOA theorisation to explore positive washback and
negative washback opportunities and challenges. Based on the research findings, the
researcher proposes the following areas worth investigating in future studies:
Including other junior high school grades in similar washback and LOA
studies
As the current research was conducted in the last semester of Grade 9, when
students were about to sit the test, it was unable to capture the influence of the
SHSEET on teaching and learning over the long term. As this study highlighted, LOA
test preparation practices such as feedback, as well as washback effects, can be
longitudinal (Yang et al., 2013); it is thus worth considering longitudinal washback
and LOA studies to further explore the potential for positive washback.
Using MCA for exploring both in-class and extracurricular test preparation
Even though Pattern 4 in the MCA results needs further investigation, the MCA
results in Chapter Six highlighted the feasibility of applying this quantitative method
to the analysis of the washback intensity of tests. Therefore, future studies can
consider using MCA to quantify both in-class and extra-curricular test preparation
for the SHSEET, or for any other high-stakes standardised EFL test, to explore rich
washback intensity patterns.
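For researchers wishing to apply this recommendation, MCA reduces to a singular value decomposition of the one-hot indicator matrix of categorical survey responses. The sketch below, using only NumPy and pandas, shows the core computation on hypothetical test preparation items (the item names and categories are invented for illustration):

```python
import numpy as np
import pandas as pd

def mca_coordinates(df, n_components=2):
    """Multiple Correspondence Analysis via SVD of the indicator matrix."""
    Z = pd.get_dummies(df).to_numpy(dtype=float)        # one-hot indicator matrix
    P = Z / Z.sum()                                     # correspondence matrix
    r = P.sum(axis=1)                                   # row masses (respondents)
    c = P.sum(axis=0)                                   # column masses (categories)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]         # principal row coordinates
    return row_coords[:, :n_components], sv**2          # coordinates and inertias

# Hypothetical items: frequency of in-class and extra-curricular preparation
df = pd.DataFrame({
    "in_class_drill": ["often", "often", "rarely", "rarely", "often", "rarely"],
    "extra_homework": ["yes", "yes", "no", "no", "yes", "no"],
})
coords, inertias = mca_coordinates(df)
print(coords.shape)  # one point per respondent on two dimensions
```

Respondents who cluster together in the resulting low-dimensional space share similar test preparation profiles, which is how washback intensity patterns such as those in Chapter Six can be identified.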
This washback study explored the influence of the grammar and vocabulary
testing in the SHSEET on EFL teaching and learning in junior high schools in China.
Considering the high-stakes nature of the test and the under-researched status of the
SHSEET, particularly its GVT section, the researcher aimed to reveal the actual
effects of the GVT from an LOA perspective. This study, applying an exploratory
sequential MMR design, used qualitative data to inform the quantitative design of a
student survey. Green’s (2007a) washback model, Carless’ (2007) LOA
conceptualisation, and Jones and Saville’s (2016) LOA cycle framed the theoretical
foundation of this study.
To conclude, it was evident in this study that the GVT influenced grammar and
vocabulary teaching and learning in Grade 9 in both positive and negative directions.
However, it was hard to judge whether the positive washback of the GVT outweighed
its negative washback or vice versa. The researcher would like to reiterate and
conclude the study by reminding readers of its aim and of the pressing call for
bringing about positive washback (Bailey, 1996) through testing and through the
synergy between instruction, testing, and learning (Jones & Saville, 2016; Turner &
Purpura, 2016). As the study aimed to unpack the potential for positive washback, or
LOA opportunities, it thus offers valuable insights into reconciling the tension
between assessment, teaching, and learning.
Bibliography
Ballou, D., & Springer, M. G. (2015). Using student test scores to measure teacher
performance: Some problems in the design and implementation of evaluation
systems. Educational Researcher, 44(2), 77-86.
Barksdale-Ladd, M. A., & Thomas, K. F. (2000). What’s at stake in high-stakes
testing. Journal of Teacher Education, 51(5), 384.
Bell, C., & Harris, D. (2013). Evaluating and assessing for learning (Revised ed.).
Routledge.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological
Bulletin, 107(2), 238-246.
Bentler, P. M., & Chou, C.-P. (1987). Practical issues in structural modeling.
Sociological Methods & Research, 16(1), 78-117.
Berwick, R., & Ross, S. (1989). Motivation after matriculation: Are Japanese
learners of English still alive after exam hell? JALT Journal, 11(2), 193-210.
Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task
types on the TOEFL iBT® test: A lexico-grammatical analysis (TOEFL iBT
Research Report-19). Educational Testing Service.
Birjandi, P., & Siyyari, M. (2010). Self-assessment and peer-assessment: A
comparative study of their effect on writing performance and rating accuracy.
Iranian Journal of Applied Linguistics, 13(1), 23-45.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in
Education: Principles, Policy & Practice, 5(1), 7-74.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment.
Educational Assessment, Evaluation and Accountability (formerly: Journal of
Personnel Evaluation in Education), 21(1), 5-31.
Booth, D., & Saville, N. (2000). Development of new item-based tests: The gapped
sentences in the revised CPE paper 3. Research Notes, 2, 10-11.
Bousfield, K., & Ragusa, A. T. (2014). A sociological analysis of Australia's
NAPLAN and My School Senate Inquiry submissions: The adultification of
childhood? Critical Studies in Education, 55(2), 170-185.
Boyatzis, R. E. (1998). Transforming qualitative information: Thematic analysis and
code development. Sage Publications.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative
Research in Psychology, 3(2), 77-101.
Braun, V., Clarke, V., Hayfield, N., & Terry, G. (2019). Thematic analysis. In P.
Liamputtong (Ed.), Handbook of Research Methods in Health Social Sciences
(1st ed., pp. 843-860). Springer Singapore.
Brown, J. D. (2000). University entrance examinations: Strategies for creating
positive washback on English language teaching in Japan. Shiken: JALT Testing
& Evaluation SIG Newsletter, 3(2), 2-7.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. Guilford
Publications.
Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.).
Guilford Publications.
Buck, G. (1988). Testing listening comprehension in Japanese university entrance
examinations. JALT Journal, 10(1), 15-42.
Burrows, C. (2004). Washback in classroom-based assessment: A study of the
washback effect in the Australian adult migrant English program. In L. Cheng,
Y. Watanabe, & A. Curtis (Eds.), Washback in language testing: Research
contexts and methods (pp. 113-128). Lawrence Erlbaum Associates.
Canale, M. (1983a). From communicative competence to communicative language
pedagogy. In J. C. Richards & R. W. Schmidt (Eds.), Language and
communication (Vol. 1, pp. 2-27). Longman.
Canale, M. (1983b). On some dimensions of language proficiency. In J. W. Oller
(Ed.), Issues in language testing research (pp. 333-342). Newbury House.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to
second language teaching and testing. Applied Linguistics, 1(1), 1-47.
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical
implications. Innovations in Education and Teaching International, 44(1), 57-
66.
Carless, D. (2015). Exploring learning-oriented assessment processes. Higher
Education, 69(6), 963-976.
Carless, D., Joughin, G., & Mok, M. M. C. (2006). Learning-oriented assessment:
Principles and practice. Assessment & Evaluation in Higher Education, 31(4),
395-398.
Carroll, B. J. (1980). Testing communicative performance: An interim study.
Pergamon Press.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate
Behavioral Research, 1(2), 245-276.
Cazden, C. B. (2001). Classroom discourse: The language of teaching and learning
(2nd ed.). Heinemann.
Celce-Murcia, M. (2007). Towards more context and discourse in grammar
instruction. TESL-EJ, 11(2), 1-6.
Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book: An ESL/EFL
teacher's course (2nd ed.). Heinle/Cengage Learning.
Chapman, D. W., & Snyder, C. W. (2000). Can high stakes national testing improve
instruction: Reexamining conventional wisdom. International Journal of
Educational Development, 20(6), 457-474.
Chen, G. (2007). An oral English exam exercise system: Research and design.
Modern Educational Technology, 17(08), 68-71, 78.
Chen, Y., Cai, J., & Hu, L. (2018). The washback effect of the new model of foreign
language examination in NCEE. Foreign Language Research, 1, 79-85.
Cheng, L. (1997). How does washback influence teaching? Implications for Hong
Kong. Language and Education, 11(1), 38-54.
Cheng, L. (1998). Impact of a public English examination change on students’
perceptions and attitudes toward their English learning. Studies in Educational
Evaluation, 24(3), 279-301.
Cheng, L. (1999). Changing assessment: Washback on teacher perceptions and
actions. Teaching and Teacher Education, 15(3), 253-271.
Cheng, L. (2005). Changing language teaching through language testing: A
washback study. Cambridge University Press.
Cheng, L. (2008a). The key to success: English language testing in China. Language
Testing, 25(1), 15-37.
Cheng, L. (2008b). Washback, impact and consequences. In E. Shohamy & N. H.
Hornberger (Eds.), Encyclopedia of language education. Volume 7: Language
testing and assessment (2nd ed., pp. 349–364). Springer Science and Business
Media LLC.
Cheng, L., Andrews, S., & Yu, Y. (2011). Impact and consequences of school-based
assessment (SBA): Students’ and parents’ views of SBA in Hong Kong.
Language Testing, 28(2), 221-249.
Cheng, L., & Curtis, A. (2004). Washback or backwash: A review of the impact of
testing on teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.),
Washback in language testing: Research contexts and methods (pp. 3-17).
Lawrence Erlbaum Associates, Inc.
Cheng, L., & Curtis, A. (2010). English language assessment and the Chinese
learner. Routledge.
Chongqing Municipal People’s Government Network. (2018). Over 60% of the
305,000 students who sat today’s senior high school entrance examination in our
city will enter general senior high schools (woshi 305,000 xuesheng jinri zhongkao
chao liucheng kaosheng keshang putong gaozhong). Retrieved January 12 from
http://jw.cq.gov.cn/Item/29940.aspx
Chongqing Municipal People’s Government. (2015). The introduction of Chongqing
(chongqingshi jianjie). Retrieved December 12 from
http://www.cq.gov.cn/cqgk/82835.shtml
Chongqing Zhongkao. (2017). Interpreting the SHSEE policy of associated areas
and unassociated areas in Chongqing (2018 nian chongqing zhongkao zhengce
jiedu zhi lianzhaoqu yu fei lianzhaoqu). Retrieved December 21 from
http://cq.zhongkao.com/e/20170814/59914e842191d.shtml
Chou, M.-H. (2019). The impact of the English listening test in the high-stakes
national entrance examination on junior high school students and teachers.
International Journal of Listening, 1-19.
Chrzanowska, J. (2002). Interviewing groups and individuals in qualitative market
research. Sage.
Clarke, V., & Braun, V. (2017). Thematic analysis. The Journal of Positive
Psychology, 12(3), 297-298.
Cohen, A. D. (2013). Using test-wiseness strategy research in task development. In
A. J. Kunnan (Ed.), The companion to language assessment (pp. 893–905).
Wiley/Blackwell.
Collins, L., Halter, R. H., Lightbown, P. M., & Spada, N. (1999). Time and the
distribution of time in L2 instruction. 33(4), 655-680.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). L.
Erlbaum Associates.
Corno, L., & Mandinach, E. B. (1983). The role of cognitive engagement in
classroom learning and motivation. Educational psychologist, 18(2), 88-108.
Creswell, J. W. (2011). Controversies in mixed methods research. In N. K. Denzin &
Y. S. Lincoln (Eds.), The Sage handbook of qualitative research (Vol. 4, pp.
269-283).
Creswell, J. W. (2013). Qualitative inquiry and research design: Choosing among
five approaches (Third ed.). SAGE Publications.
Creswell, J. W. (2015). Educational research: Planning, conducting, and evaluating
quantitative and qualitative research (5th ed.). Pearson Education Inc.
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed
methods research (2nd ed.). SAGE Publications.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests.
Psychometrika, 16(3), 297-334.
Cui, D. (2006). The principles and trends of proposition in Senior High School
Entrance English Test under the New English Curriculum. Journal of Shanxi
Normal University (Philosophy and Social Sciences Edition), 35(S1), 374-376.
Dam, L. (1995). Learner autonomy 3: From theory to classroom practice. Authentik.
Damankesh, M., & Babaii, E. (2015). The washback effect of Iranian high school
final examinations on students’ test-taking and test-preparation strategies.
Studies in Educational Evaluation, 45, 62-69.
Danielson, C. (2008). Assessment for learning: For teachers as well as students. In C.
Dwyer (Ed.), The future of assessment (pp. 191-213). Routledge.
Dávid, G. (2007). Investigating the performance of alternative types of grammar
items. Language Testing, 24(1), 65-97.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in
human behavior. Plenum.
Deci, E. L., & Ryan, R. M. (2010). Intrinsic motivation. In I. B. Weiner & W. E.
Craighead (Eds.), The Corsini encyclopedia of psychology (pp. 1-2).
Delett, J. S., Barnhardt, S., & Kevorkian, J. A. (2001). A framework for Portfolio
Assessment in the foreign language classroom. 34(6), 559-568.
Dello-Iacovo, B. (2009). Curriculum reform and ‘Quality Education’ in China: An
overview. International Journal of Educational Development, 29(3), 241-249.
Deng, D. (2007). An exploration of the relationship between learner autonomy and
English proficiency. Asian EFL Journal, 24(4), 24-34.
Deng, Y. (2018). A study on the washback effects of the Junior Secondary English
Achievement Graduation Test of Zhangjiajie Prefecture [Master’s thesis, Hunan
Normal University]. Changsha, Hunan.
Ding, R. (2014). Validity study on the cloze test of Shanxi Senior High School
English Entrance Exam from 2009 to 2013 [Master’s thesis, Northwest
University]. Xi’an, Shanxi.
Docherty, C. (2015). Revising the use of English component in FCE and CAE.
Research Notes, 62, 15-20.
Doe, C., & Fox, J. (2011). Exploring the testing process: Three test takers’
observed and reported strategy use over time and testing contexts.
Canadian Modern Language Review, 67(1), 29-54.
Dong, M. (2020). Structural relationship between learners’ perceptions of a test,
learning practices, and learning outcomes: A study on the washback mechanism
of a high-stakes test. Studies in Educational Evaluation, 64, 100824.
Education History Research Group in Teaching Materials Research Institute. (2008).
The development of English curriculum (syllabus) for Chinese primary and
secondary schools in China in 20th century. Retrieved September 22 from
http://old.pep.com.cn/dy_1/
Ellis, R. (2002). The place of grammar instruction in the second/foreign curriculum.
In E. Hinkel & S. Fotos (Eds.), New perspectives on grammar teaching in
second language classrooms (pp. 17-34). Erlbaum.
Ellis, R. (2006). Current issues in the teaching of grammar: An SLA perspective.
TESOL Quarterly, 40(1), 83-107.
Erfani, S. S. (2012). A comparative washback study of IELTS and TOEFL iBT on
teaching and learning activities in preparation courses in the Iranian context.
English Language Teaching, 5(8), 185-195.
Everhard, C. J. (2015). The assessment-autonomy relationship. In C. J. Everhard &
L. Murphy (Eds.), Assessment and autonomy in language learning (pp. 8-34).
Palgrave Macmillan.
Fan, J., & Ji, P. (2014). Test candidates’ attitudes and their test performance: The
case of the Fudan English Test. University of Sydney Papers in TESOL, 9, 1-35.
Fan, Y. (2015). The globalization and localization of English from the perspective of
English as a lingua franca and implications for “China English” and English
language education in China. Contemporary Foreign Languages Studies, 6, 29-
33.
Ferman, I. (2004). The washback of an EFL national oral matriculation test to
teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback
in language testing: Research contexts and methods (pp. 191-210). Lawrence
Erlbaum Associates, Inc.
Field, A. P. (2009). Discovering statistics using SPSS (3rd ed.). SAGE Publications.
Fink, A. (2009). How to conduct surveys: A step-by-step guide (4th ed.). SAGE.
Fotos, S. (1994). Integrating grammar instruction and communicative language use
through grammar consciousness-raising tasks. TESOL Quarterly, 28(2), 323-
351.
Fotos, S., & Ellis, R. (1991). Communicating about grammar: A task-based
approach. TESOL Quarterly, 25(4), 605-628.
Franklin, C., & Ballan, M. (2001). Reliability and validity in qualitative research. In
B. A. Thyer (Ed.), The handbook of social work research methods. SAGE
Publications, Inc.
Gan, Z. (2009). IELTS preparation course and student IELTS performance: A case
study in Hong Kong. RELC Journal, 40(1), 23-41.
Gardner, R. C. (1985). Social psychology and second language learning: The role of
attitudes and motivation. Edward Arnold.
Gardner, R. C., Lalonde, R. N., & Moorcroft, R. (1985). The role of attitudes and
motivation in second language learning: Correlational and experimental
considerations. Language Learning, 35(2), 207-227.
Gardner, R. C., & Lambert, W. E. (1972). Attitudes and motivation in
second-language learning. Newbury House.
Gates, S. (1995). Exploiting washback from standardized tests. In J. D. Brown & S.
O. Yamashata (Eds.), Language testing in Japan (pp. 107-112). Japan
Association for Language Teaching.
Geng, Y. (2013). A study of the washback of JSEAGT listening on secondary school
EFL teaching [Master’s thesis, Ludong University]. Yantai, Shandong.
Geranpayeh, A. (2007). Using structural equation modelling to facilitate the revision
of high stakes testing: The case of CAE. Research Notes, 30, 8–12.
Goffman, E. (1971). The presentation of self in everyday life. Penguin.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world.
Annual Review of Psychology, 60, 549-576.
Green, A. (2006a). Washback to the learner: Learner and teacher perspectives on
IELTS preparation course expectations and outcomes. Assessing Writing, 11(2),
113-134.
Green, A. (2006b). Watching for washback: Observing the influence of the
International English Language Testing System academic writing test in the
classroom. Language Assessment Quarterly, 3(4), 333-368.
Green, A. (2007a). IELTS washback in context: Preparation for academic writing in
higher education. Cambridge University Press.
Green, A. (2007b). Washback to learning outcomes: A comparative study of IELTS
preparation and university pre-sessional language courses. Assessment in
Education: Principles, Policy & Practice, 14(1), 75-97.
Green, A. (2013). Washback in language assessment. International Journal of
English Studies, 13(2), 39-51.
Green, A. (2014). The Test of English for Academic Purposes (TEAP) impact study:
Report 1 - preliminary questionnaires to Japanese high school students and
teachers. Eiken Foundation of Japan.
Greenacre, M. J. (1991). Interpreting multiple correspondence analysis. Applied
Stochastic Models and Data Analysis, 7(2), 195-210.
Greenacre, M. J. (2017). Correspondence analysis in practice (3rd ed.). Chapman and
Hall/CRC.
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual
framework for mixed-method evaluation designs. Educational Evaluation and
Policy Analysis, 11(3), 255-274.
Gu, X., & Saville, N. (2012). Impact of Cambridge English: Key for Schools and
Preliminary for Schools–parents’ perspectives in China. Research Notes, 50, 48-
56.
Gu, X., Zhang, Z., & Liu, X. (2014). An empirical study of the innovated CET
washback on students’ extra-curricular learning process based on students’
learning diaries. Journal of PLA University of Foreign Languages, 35(5), 32-39,
159.
Gu, Y. (2012). English curriculum and assessment for basic education in China. In J.
Ruan & C. Leung (Eds.), Perspectives on teaching and learning English literacy
in China (Vol. 3, pp. 35-50). Springer Netherlands.
Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An
experiment with data saturation and variability. Field Methods, 18(1), 59-82.
Guo, S., Guo, Y., Luke, A., Dooley, K., & Mu, G. M. (2019). Market economy,
social change, and educational inequality: Notes for a critical sociology of
Chinese education. In G. M. Mu, K. Dooley, & A. Luke (Eds.), Bourdieu and
Chinese education: Inequality, competition, and change (pp. 20-44). Routledge.
Gyllstad, H., Vilkaitė, L., & Schmitt, N. (2015). Assessing vocabulary size through
multiple-choice formats: Issues with guessing and sampling rates. ITL-
International Journal of Applied Linguistics, 166(2), 278-306.
Haertel, E. (1992). Performance assessment. In M. C. Alkin (Ed.), Encyclopedia of
educational research (6th ed., pp. 984-989). Macmillan.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data
analysis with readings (3rd ed.). Prentice Hall.
Hair, J. F., Black, B. J., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006).
Multivariate data analysis (6th ed.). Prentice Hall.
Halai, N. (2007). Making use of bilingual interview data: Some experiences from the
field. Qualitative Report, 12(3), 344.
Halcomb, E. J., & Davidson, P. M. (2006). Is verbatim transcription of interview data
always necessary? Applied Nursing Research, 19(1), 38-42.
Hall, J. K., & Walsh, M. (2002). Teacher-student interaction and language
learning. Annual Review of Applied Linguistics, 22, 186-203.
Halleck, G. B. (1992). The oral proficiency interview: Discrete point test or a
measure of communicative language ability? Foreign Language Annals, 25(3),
227-231.
Halliday, M. A. K. (2004). An introduction to functional grammar (3rd ed.). Arnold.
Hamp-Lyons, L. (1997). Washback, impact and validity: Ethical concerns. Language
Testing, 14(3), 295-303.
Hamp-Lyons, L., & Green, T. (2014, October). Applying a concept model of
learning-oriented language assessment to a large-scale speaking test.
Presentation at the Roundtable on Learning-Oriented Assessment in Language
Classrooms and Large-Scale Contexts, Teachers College, Columbia University,
New York.
Harlen, W., & Deakin-Crick, R. (2003). Testing and motivation for learning.
Assessment in Education: Principles, Policy & Practice, 10(2), 169-207.
Harrington, D. (2009). Confirmatory factor analysis. Oxford University Press.
Harrington, M. (2018). Measuring lexical facility: The timed yes/no test. In Lexical
facility: Size, recognition speed and consistency as dimensions of second
language vocabulary knowledge (pp. 95-119). Palgrave Macmillan UK.
Harrison, J. (2015). The English grammar profile. In J. Harrison & F. Barker (Eds.),
English profile in practice (Vol. 5, pp. 28-48). Cambridge University Press.
Hasselgreen, A., Drew, I., & Sørheim, B. (2012). Understanding the language
classroom. Fagbokforlaget.
Hatch, J. A. (2002). Doing qualitative research in education settings. State
University of New York Press.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational
Research, 77(1), 81-112.
Hawkey, R. (2006). Impact theory and practice: Studies of the IELTS test and
Progetto Liugue 2000. Cambridge University Press.
He, L. (2010). The Graduate School Entrance English Examination. In L. Cheng &
A. Curtis (Eds.), English language assessment and the Chinese learner (pp. 145-
157). Routledge.
He, Q. (2001). English language education in China. In S. J. Baker (Ed.), Language
policy: Lessons from global models (1st ed., pp. 225-231). Monterey Institute of
International Studies.
He, Y. (2015). On the usefulness of Test of English for Xiamen Senior High School
Entrance [Master’s thesis, Minnan Normal University]. Zhangzhou, Fujian.
Henning, G. (1991). A study of the effects of contextualization and familiarization on
responses to the TOEFL vocabulary test items. Educational Testing Service.
Hill, H. C., Rowan, B., & Ball, D. L. (2005). Effects of teachers’ mathematical
knowledge for teaching on student achievement. American Educational Research
Journal, 42(2), 371-406.
Ho, R. (2006). Handbook of univariate and multivariate data analysis and
interpretation with SPSS. Chapman & Hall/CRC.
Hoffman, D. L., & De Leeuw, J. (1992). Interpreting multiple correspondence
analysis as a multidimensional scaling method. Marketing Letters, 3(3), 259-
272.
Hoffman, D. L., & Franke, G. R. (1986). Correspondence analysis: Graphical
representation of categorical data in marketing research. Journal of Marketing
Research, 23(3), 213-227.
Holec, H. (1981). Autonomy and foreign language learning. Pergamon.
Hou, Y. (2018). A study on the washback effect of the reform of SHMET Listening
and Speaking Test. Technology Enhanced Foreign Language Education, (5), 23-29.
Hu, G. (2005a). English language education in China: Policies, progress, and
problems. Language Policy, 4(1), 5-24.
Hu, G. (2005b). Professional development of secondary EFL teachers: Lessons from
China. Teachers College Record, 107(4), 654-705.
Hu, L.-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance
structure analysis: Conventional criteria versus new alternatives. Structural
Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.
Hu, Y. (2015). A study of washback effects of JSEAGT writing test on junior English
writing teaching and learning [Master’s thesis, Gannan Normal University].
Ganzhou, Jiangxi.
Huang, J. (2006). Understanding factors that influence Chinese English teachers’
decision to implement communicative activities in teaching. Journal of Asia
TEFL, 3(4), 165-191.
Hughes, A. (1989). Testing for language teachers. Cambridge University Press.
Hughes, A. (1993). Backwash and TOEFL 2000 [Unpublished manuscript].
University of Reading.
Hung, S.-T. A. (2012). A washback study on e-portfolio assessment in an English as
a Foreign Language teacher preparation program. Computer Assisted Language
Learning, 25(1), 21-36.
Iraji, H. R., Enayat, M. J., & Momeni, M. (2016). The effects of self- and peer-
assessment on Iranian EFL learners’ argumentative writing performance. Theory
and Practice in Language Studies, 6(4), 716-722.
Choi, J., Kushner, K. E., Mill, J., & Lai, D. W. L. (2012).
Understanding the language, the culture, and the experience: Translation in
cross-cultural research. International Journal of Qualitative Methods, 11(5),
652-665.
Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (2000). TOEFL 2000
framework: A working paper (TOEFL Monograph Series MS-16). Educational
Testing Service.
Jiang, Y. (2003). English as a Chinese language. English Today, 19(2), 3-8.
Jin, Y. (2000). The washback effects of College English Test-Spoken English Test
on teaching. Foreign Language World, 118(2), 56-61.
Jin, Y. (2017). Construct and content in context: Implications for language learning,
teaching and assessment in China. Language Testing in Asia, 7(1), 12.
Jin, Y., & Cheng, L. (2013). The effects of psychological factors on the validity of
high-stakes tests. Modern Foreign Languages, 36(1), 62-69.
Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical
analysis (6th ed.). Prentice-Hall.
Johnson, R. B., & Christensen, L. B. (2012). Educational research: Quantitative,
qualitative, and mixed approaches (4th ed.). SAGE Publications.
Johnson, R. B., Onwuegbuzie, A. J., & Turner, L. A. (2007). Toward a definition of
mixed methods research. Journal of Mixed Methods Research, 1(2), 112-133.
Jones, N., & Saville, N. (2016). Learning oriented assessment: A systemic approach
(Vol. 45). Cambridge University Press.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A guide to the program and
applications (2nd ed.). SPSS Inc.
Joughin, G. (2005, 4-5 November). Learning oriented assessment: A conceptual
framework. Conference proceedings, Effective Learning and Teaching
Conference, Brisbane.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis.
Educational and Psychological Measurement, 20(1), 141-151.
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36.
Kamberelis, G., & Dimitriadis, G. (2013). Focus groups: From structured interviews
to collective conversations. Routledge.
Khaniya, T. (1990). The washback effect of a textbook-based test. Edinburgh
Working Papers in Applied Linguistics, 1, 48-58.
Krashen, S. D. (2003). Explorations in language acquisition and use. Heinemann.
Kuzel, A. J. (1992). Sampling in qualitative inquiry. In B. F. Crabtree & W. L. Miller
(Eds.), Research methods for primary care, Vol. 3. Doing qualitative research
(pp. 31-44). Sage Publications.
Lamb, T. (2010). Assessment of autonomy or assessment for autonomy? Evaluating
learner autonomy for formative purposes. In A. Paran & L. Sercu (Eds.), Testing
the untestable in language education (pp. 98-119). Multilingual Matters.
Larsen-Freeman, D. (2003). Teaching language: From grammar to grammaring.
Thomson/Heinle.
Latham, H. (1886). On the action of examinations considered as a means of
selection. Deighton, Bell.
Lau, K. (2018). To be or not to be: Understanding university academic English
teachers’ perceptions of assessing self-directed learning. Innovations in
Education and Teaching International, 55(2), 201-211.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need
both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Lemke, J. L. (1990). Talking science: Language, learning, and values. Ablex
Publishing Company.
Li, F. (2017). A study on the backwash effect of JSGT reading comprehension tests
on middle school English teaching and learning [Master’s thesis, Chongqing
Normal University]. Chongqing.
Li, H. (2009). Three rounds of test preparation strategy in Senior High School
Entrance English Test. Theory and Practice of Education, 29(11), 60-61.
Li, J. (2018). A study of washback of SHSEE (English, Shanghai) on teaching of
reading comprehension at junior middle school [Master’s thesis, Shanghai
Normal University]. Shanghai.
Li, P., Hu, Y., Xu, Q., & Li, P. (2019). Based on accomplishment to improve
language ability and focus on education to help coordinate all-round
development—An analysis of English test questions in the 2019 Shanxi Middle-
School Entrance Examinations. Theory and Practice of Education, 39(32), 3-6.
Li, X. (1990). How powerful can a language test be? The MET in China. Journal of
Multilingual & Multicultural Development, 11(5), 393-404.
Lightbown, P. M., & Spada, N. (2019). Teaching and learning L2 in the classroom:
It’s about time. Language Teaching, 1-11.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry (Vol. 75). Sage.
Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges.
Educational Evaluation and Policy Analysis, 15(1), 1-16.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based
assessment: Expectations and validation criteria. Educational Researcher, 20(8),
15-21.
Little, D. G. (1996). Strategic competence considered in relation to strategic control
of the language learning process. In H. Holec, D. G. Little, & R. Richterich
(Eds.), Strategies in language learning and use: Studies towards a Common
European Framework of Reference for language learning and teaching (pp. 9-
37). Council of Europe.
Little, D. G. (2007). Language learner autonomy: Some fundamental considerations
revisited. Innovation in Language Learning and Teaching, 1(1), 14-29.
Liu, H., & Brantmeier, C. (2019). “I know English”: Self-assessment of foreign
language reading and writing abilities among young Chinese learners of English.
System, 80, 60-72.
Liu, R. (2012). A survey of the present condition in the spoken English test of
Changsha Senior High School Entrance Examination [Master’s thesis, Hunan
University]. Changsha, Hunan.
Liu, Y., & Zhao, Y. (2010). A study of teacher talk in interactions in English classes.
Chinese Journal of Applied Linguistics, 33(2), 76-86.
Luo, M. (2012). Reforming curriculum in a centralized system: An examination of
the relationships between teacher implementation of student-centered pedagogy
and high stakes teacher evaluation policies in China [Doctoral dissertation,
Columbia University]. New York.
Ma, W. (2018). A study on washback of the listening test items in Senior Secondary
School Entrance Examination of English (Jiangxi) [Master’s thesis, Jiangxi
Normal University]. Nanchang, Jiangxi.
Macmillan, F., Walter, D., & O'Boyle, J. (2014). Investigating grammatical
knowledge at the advanced level. Research Notes, 55, 7-12.
Madaus, G. F. (1988). The influence of testing on the curriculum. In L. N. Tanner
(Ed.), Critical issues in curriculum: Eighty-seventh yearbook of the National
Society for the Study of Education (pp. 83-121). University of Chicago Press.
Madsen, H. S. (1983). Techniques in testing. Oxford University Press.
May, L., Nakatsuhara, F., Lam, D., & Galaczi, E. (2020). Developing tools for
learning oriented assessment of interactional competence: Bridging theory and
practice. Language Testing, 37(2), 165-188.
McChesney, R. (1999). Introduction. In N. Chomsky (Ed.), Profit over people:
Neoliberalism and global order (pp. 6-17). Seven Stories Press.
McDonnell, L. M. (2004). Politics, persuasion, and educational testing. Harvard
University Press.
McDonnell, L. M. (2013). Educational accountability and policy feedback.
Educational Policy, 27(2), 170-189.
McNamara, T. F. (1996). Measuring second language performance. Addison Wesley
Longman.
Meara, P., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests.
Language Testing, 4(2), 142-154.
Merriam, S. B. (2016). Qualitative research: A guide to design and implementation
(4th ed.). Jossey Bass Ltd.
Merton, R., Fiske, M., & Kendall, P. (1956). The focused interview. Free Press.
Merton, R. K., & Kendall, P. L. (1946). The focused interview. American Journal of
Sociology, 51(6), 541-557.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 13-103). Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of
performance exercises. Educational Researcher, 36, 13-23.
Messick, S. (1996). Validity and washback in language testing. Language Testing,
13(3), 241-256.
Millman, J., Bishop, C. H., & Ebel, R. (1965). An analysis of test-wiseness.
Educational and Psychological Measurement, 25(3), 707-726.
Minichiello, V., Aroni, R., & Hays, T. (2008). In-depth interviewing: Principles,
techniques, analysis (3rd ed.). Pearson Education Australia.
Ministry of Education. (1999). Opinions on the reform of junior high school
graduates and entrance examinations (guanyu chuzhong biye shengxue kaoshi
de zhidao yijian). Ministry of Education of China. Retrieved December 12, 2017
from
http://www.moe.gov.cn/s78/A06/jcys_left/moe_706/s3321/201001/t20100128_81825.html
Ministry of Education. (2001). English Curriculum Standards for Full-time
Compulsory Education and Senior High Schools (Trial Version) (quanrizhi yiwu
jiaoyu putong gaoji zhongxue yingyu kecheng biaozhun (shiyangao)). Beijing
Normal University Press. http://www.tefl-china.net/2003/ca13821.htm
Ministry of Education. (2006). Compulsory Education Law of the People’s Republic
of China (zhonghua renmin gongheguo yiwu jiaoyu fa).
http://en.moe.gov.cn/Resources/Laws_and_Policies/201506/t20150626_191391.
html
Ministry of Education. (2011). English Curriculum Standards for Compulsory
Education (2011) (yiwujiaoyu yingyu kecheng biaozhun, 2011). Beijing Normal
University Press.
Ministry of Education. (2015). What we do. Retrieved December 19, 2017 from
http://en.moe.gov.cn/About_the_Ministry/What_We_Do/201506/t20150626_191288.html
Ministry of Education. (2019). Number of students of formal education by type and
level (geji gelei xueli jiaoyu xuesheng qingkuang). Retrieved January 12, 2020
from
http://www.moe.gov.cn/s78/A03/moe_560/jytjsj_2018/qg/201908/t20190812_394239.html
Mizutani, S. (2009). The mechanism of washback on teaching and learning [Doctoral
dissertation, The University of Auckland]. Auckland.
Moeller, A. K., Creswell, J. W., & Saville, N. (2016). Second language assessment
and mixed methods research. Cambridge University Press.
Mok, M. M. C. (2013). Self-directed learning oriented assessments in the Asia-
Pacific (1st ed.). Springer Netherlands.
Morrow, K. (1991). Evaluating communicative tests. In S. Anivan (Ed.), Current
developments in language testing (pp. 111-118). Regional Language Centre.
Moss, E. (2001). Multiple choice questions: Their value as an assessment tool.
Current Opinion in Anesthesiology, 14(6), 661-666.
Mu, G. M., Liang, W., Lu, L., & Huang, D. (2018). Building pedagogical content
knowledge within professional learning communities: An approach to
counteracting regional education inequality. Teaching and Teacher Education,
73, 24-34.
Nardi, P. M. (2006). Doing survey research: A guide to quantitative methods (2nd
ed.). Pearson/Allyn & Bacon.
Nassaji, H., & Fotos, S. (2004). Current developments in research on the teaching of
grammar. Annual Review of Applied Linguistics, 24, 126-145.
Nastasi, B. K., Hitchcock, J. H., & Brown, L. M. (2010). An inclusive framework for
conceptualizing mixed methods design typologies: Moving toward fully
integrated synergistic research methods. In A. Tashakkori & C. Teddlie (Eds.),
Sage handbook of mixed methods in social & behavioral research (2nd ed., pp.
305-339). SAGE Publications.
Nation, P. (1990). Teaching and learning vocabulary. Newbury House.
Nation, P. (2001). Learning vocabulary in another language. Cambridge University
Press.
Nation, P. (2018). Keeping it practical and keeping it simple. Language Teaching,
51(1), 138-146.
Nation, P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7),
9-13.
National Bureau of Statistics of China. (2018). China statistical yearbook,
Chongqing statistics. China Statistics Press. Retrieved January 12, 2020 from
http://tjj.cq.gov.cn//tjnj/2018/indexch.htm
Nichols, P. D., Meyers, J. L., & Burling, K. S. (2009). A framework for evaluating
and planning assessments intended to improve student achievement.
Educational Measurement: Issues and Practice, 28(3), 14-23.
Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and
student achievement: Does accountability pressure increase student learning?
Education Policy Analysis Archives, 14, 1.
Nida, E. A. (1977). The nature of dynamic equivalence in translating. Babel:
International Journal of Translation.
Nitta, R., & Gardner, S. (2005). Consciousness-raising and practice in ELT
coursebooks. ELT Journal, 59(1), 3-13.
Nunan, D. (1996). Towards autonomous learning: Some theoretical, empirical and
practical issues. In R. Pemberton, E. S. L. Li, W. W. F. Or, & H. D. Pierson
(Eds.), Taking Control: Autonomy in Language Learning (pp. 13-26). Hong
Kong University Press.
Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.
Nurweni, A., & Read, J. (1999). The English vocabulary knowledge of Indonesian
university students. English for Specific Purposes, 18(2), 161-175.
Ockey, G. J., & Choi, I. (2015). Structural equation modeling reporting practices for
language assessment. Language Assessment Quarterly, 12(3), 305-319.
OECD. (2019). PISA 2018: Insights and interpretations. OECD. Retrieved January
12, 2020 from
https://www.oecd.org/pisa/PISA%202018%20Insights%20and%20Interpretatio
ns%20FINAL%20PDF.pdf
Oller, J. W. (1979). Language tests at school. Longman.
Oscarson, M. (1989). Self-assessment of language proficiency: Rationale and
applications. Language Testing, 6(1), 1-13.
Oscarson, M. (1998, 18-20 September). Learner self-assessment of language skills:
A review of some of the issues. IATEFL Special Interest Group Symposium,
Gdansk, Poland.
Ostovar-Namaghi, S. A., & Safaee, S. E. (2017). Exploring techniques of developing
writing skill in IELTS preparatory courses: A data-driven study. English
Language Teaching, 10(3), 74-81.
Oxford, R. (1989). Use of language learning strategies: A synthesis of studies with
implications for strategy training. System, 17(2), 235-247.
Oxford, R. (1990). Language learning strategies: What every teacher should know.
Heinle and Heinle.
Oxford, R., & Nyikos, M. (1989). Variables affecting choice of language learning
strategies by university students. The Modern Language Journal, 73(3), 291-
300.
Özmen, K. S. (2011). Analyzing washback effect of SEPPPO on prospective English
teachers. Journal of Language & Linguistics Studies, 7(2), 24-51.
Pan, M., & Feng, G. (2015). On the assessment requirements of the National criteria
of teaching quality for undergraduate English majors. Foreign Languages in
China, 67(5), 11-16.
Pan, M., & Qian, D. D. (2017). Embedding corpora into the content validation of the
grammar test of the National Matriculation English Test (NMET) in China.
Language Assessment Quarterly, 14(2), 120-139.
Pan, Y.-C. (2014). Learner washback variability in standardized exit tests. TESL-EJ:
Teaching English as a Second or Foreign Language, 18(2).
Pan, Y.-C., & Newfields, T. (2011). Teacher and student washback on test
preparation evidenced from Taiwan's English certification exit requirements.
International Journal of Pedagogies & Learning, 6(3), 260-272.
Pan, Y.-C., & Roever, C. (2016). Consequences of test use: A case study of
employers' voice on the social impact of English certification exit requirements
in Taiwan. Language Testing in Asia, 6(1), 1-21.
Paribakht, T. S., & Wesche, M. (1997). Vocabulary enhancement activities and
reading for meaning in second language vocabulary acquisition. In J. Coady &
T. Huckin (Eds.), Second language vocabulary acquisition (pp. 174-200).
Cambridge University Press.
Patton, M. Q. (2015). Qualitative research & evaluation methods: Integrating theory
and practice (4th ed.). SAGE Publications, Inc.
Paulson, F. L., Paulson, P. R., & Meyer, C. A. (1991). What makes a portfolio a
portfolio. Educational Leadership, 48(5).
Pazaver, A., & Wang, H. (2009). Asian students’ perceptions of grammar teaching in
the ESL classroom. The International Journal of Language Society and Culture,
27, 27-35.
Petrescu, M. C., Helms-Park, R., & Dronjic, V. (2017). The impact of frequency and
register on cognate facilitation: Comparing Romanian and Vietnamese speakers
on the Vocabulary Levels Test. English for Specific Purposes, 47, 15-25.
Phillipson, R. (2009). English in globalisation, a lingua franca or a lingua
Frankensteinia? TESOL Quarterly, 43(2), 335-339.
Popham, W. J. (1987). The merits of measurement-driven instruction. The Phi Delta
Kappan, 68(9), 679-682.
Popham, W. J. (1999). Why standardized tests don't measure educational quality.
Educational Leadership, 56, 8-16.
Popham, W. J. (2001). Teaching to the test? Educational Leadership, 58(6), 16-21.
Powers, W. R. (2005). Transcription techniques for the spoken word. Rowman
Altamira.
Prodromou, L. (1995). The backwash effect: From testing to teaching. ELT Journal,
49(1), 13-25.
Purpura, J. E. (2004). Assessing grammar. Cambridge University Press.
Purpura, J. E., & Turner, C. E. (2013). Learning-oriented assessment in classrooms:
A place where SLA, interaction, and language assessment interface. ILTA/AAAL
Joint Symposium on “LOA in classrooms”.
Qi, L. (2004a). Has a high-stakes test produced the intended changes? In L. Cheng,
Y. Watanabe, & A. Curtis (Eds.), Washback in language testing: Research
contexts and methods (pp. 171-190). Lawrence Erlbaum.
Qi, L. (2004b). The intended washback effect of the National Matriculation English
Test in China: Intentions and reality. Foreign Language Teaching and Research
Press.
Qi, L. (2005). Stakeholders’ conflicting aims undermine the washback function of a
high-stakes test. Language Testing, 22(2), 142-173.
Qi, L. (2007). Is testing an efficient agent for pedagogical change? Examining the
intended washback of the writing task in a high-stakes English test in China.
Assessment in Education: Principles, Policy & Practice, 14(1), 51-74.
Qi, L. (2010). Should proofreading go? Examining the selection function and
washback of the proofreading sub-test in the National Matriculation English
Test. In L. Cheng & A. Curtis (Eds.), English language assessment and the
Chinese learner (pp. 219-233). Routledge.
Rassaei, E. (2019). Tailoring mediation to learners’ ZPD: Effects of dynamic and
non-dynamic corrective feedback on L2 development. The Language Learning
Journal, 47(5), 591-607.
Rea-Dickins, P. (1991). What makes a grammar test communicative. In C. J.
Alderson & B. North (Eds.), Language testing in the 1990s: The communicative
legacy (pp. 112-131). Macmillan Publishers Limited.
Rea-Dickins, P. (1997). The testing of grammar in a second language. In C. Clapham
& D. Corson (Eds.), Encyclopedia of language and education: Language testing
and assessment (Vol. 7, pp. 87-97). Kluwer Academic.
Read, J. (1993). The development of a new measure of L2 vocabulary knowledge.
Language Testing, 10(3), 355-371.
Read, J. (1995). Refining the word associates format as a measure of depth of
vocabulary knowledge. New Zealand Studies in Applied Linguistics, 1, 1-17.
Read, J. (1997). Assessing vocabulary in a second language. In C. Clapham & D.
Corson (Eds.), Encyclopedia of language and education: Language testing and
assessment (Vol. 7, pp. 99-107). Kluwer Academic.
Read, J. (2000). Assessing vocabulary. Cambridge University Press.
Read, J. (2019). Key issues in measuring vocabulary knowledge. In S. A. Webb
(Ed.), The Routledge handbook of vocabulary studies (pp. 545-560). Routledge.
Read, J., & Chapelle, C. A. (2001). A framework for second language vocabulary
assessment. Language Testing, 18(1), 1-32.
Regmi, K., Naidoo, J., & Pilkington, P. (2010). Understanding the processes of
translation and transliteration in qualitative research. International Journal of
Qualitative Methods, 9(1), 16-26.
Ren, Y. (2011). A study of the washback effects of the College English Test (band 4)
on teaching and learning English at tertiary level in China. International Journal
of Pedagogies & Learning, 6(3), 243-259.
Reynolds, B. L., Shih, Y.-C., & Wu, W.-H. (2018). Modeling Taiwanese adolescent
learners' English vocabulary acquisition and retention: The washback effect of
the College Entrance Examination Center's reference word list. English for
Specific Purposes, 52, 47-59.
Richards, J. C., & Schmidt, R. (2013). Longman dictionary of language teaching and
applied linguistics (4th ed.). Routledge.
Risemberg, R., & Zimmerman, B. J. (1992). Self-regulated learning in gifted
students. Roeper Review, 15(2), 98-101.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social
context. Oxford University Press.
Rose, D. (2008). Vocabulary use in the FCE listening test. Research Notes, 32, 9-16.
Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic
definitions and new directions. Contemporary Educational Psychology, 25(1),
54-67.
Saglam, A. L. G., & Farhady, H. (2019). Can exams change how and what learners
learn? Investigating the washback effect of a university English language
proficiency test in the Turkish context. Advances in Language and Literary Studies,
10(1), 177-186.
Saif, S. (2006). Aiming for positive washback: A case study of international teaching
assistants. Language Testing, 23(1), 1-34.
Salamoura, A., & Unsworth, S. (2015). Learning Oriented Assessment: Putting
learning, teaching and assessment together. Modern English Teacher, 24(3), 4-7.
www.modernenglishteacher.com
Salverda, R. (2002). Language diversity and international communication. English
Today, 18(3), 3-11.
Saville, N., & Salamoura, A. (2014, October). Learning Oriented Assessment - A
systemic view from an examination provider. Presentation at the Roundtable on
Learning-Oriented Assessment in Language Classrooms and Large-Scale
Contexts, Teachers College, Columbia University, New York.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge University Press.
Schmitt, N. (2019). Understanding vocabulary acquisition, instruction, and
assessment: A research agenda. Language Teaching, 52(2), 261-274.
Schmitt, N., Nation, P., & Kremmel, B. (2020). Moving the field of vocabulary
assessment forward: The need for more rigorous test development and
validation. Language Teaching, 53(1), 109-120.
Schreiber, J. B., Nora, A., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting
structural equation modeling and confirmatory factor analysis results: A review.
The Journal of Educational Research, 99(6), 323-338.
Seidlhofer, B. (2005). English as a lingua franca. ELT Journal, 59(4), 339-341.
Sharif, K. S. M., & Siddiek, A. G. (2017). Critical thinking as reflected in the
Sudanese and Jordanian Secondary School Certificate English Language
Examinations. English Language Teaching, 10(5), 37-61.
Shepard, L. A. (2000). The role of assessment in a learning culture. Educational
Researcher, 29(7), 4-14.
Shi, H. (2013). A study on the validity of clozing tests in Senior High School
Entrance English Test in Shanxi province. Journal of Shanxi Normal University
(Natural Sciences Edition) Special Issue for Postgraduate Theses, 27(S1), 141-
143.
Shi, J. (2001). Tracing the historical development of curriculum policies in China's
basic education (woguo jichu jiaoyu kecheng zhengce fazhan bianhua de lishi
guiji). China Education and Research Network. Retrieved December 20, 2017
from http://www.teachercn.com/Kcgg/Lltt/2006-1/5/20060119160119692_4.html
Shih, C.-M. (2007). A new washback model of students’ learning. Canadian Modern
Language Review, 64(1), 135-161.
Shih, C.-M. (2009). How tests change teaching: A model for reference. English
Teaching, 8(2), 188-206.
Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing
model for assessing foreign language learning. The Modern Language Journal,
76(4), 513-521.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of
language tests. Pearson Education Limited.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited:
Washback effect over time. Language Testing, 13(3), 298-317.
Shohamy, E., Reves, T., & Bejarano, Y. (1986). Introducing a new comprehensive
test of oral proficiency. ELT Journal, 40(3), 212-220.
Simons, H. (2009). Case study research in practice (1st ed.). SAGE.
Sinclair, J. M., & Coulthard, M. (1975). Towards an analysis of discourse: The
English used by teachers and pupils. Oxford University Press.
Sjøberg, S. (2007). Constructivism and learning. In E. Baker, B. McGaw, & P.
Peterson (Eds.), International encyclopaedia of education (3rd ed.). Elsevier.
So, Y. (2014). Are teacher perspectives useful? Incorporating EFL teacher feedback
in the development of a large-scale international English test. Language
Assessment Quarterly, 11(3), 283-303.
Song, X. (2013). A study on washback of test formats of English reading
comprehension test of JSGT on teaching and learning in middle schools.
[Master’s thesis, Ludong University]. Yantai, Shandong.
Spolsky, B. (1990). Social aspects of individual assessment. In J. H. A. L. de Jong &
D. K. Stevenson (Eds.), Individualizing the assessment of language abilities (pp.
3-15). Multilingual Matters.
Spolsky, B. (1995). Measured words: The development of objective language testing.
Oxford University Press.
Spratt, M. (2005). Washback and the classroom: The implications for teaching and
learning of studies of washback from exams. Language Teaching Research,
9(1), 5-29.
Squires, A. (2009). Methodological challenges in cross-language qualitative
research: A research review. International Journal of Nursing Studies, 46(2),
277-287.
Sun, C. (2010). An introduction to major university English tests and English
language teaching in China [Master’s thesis, Brigham Young University].
United States.
Sutrisno, A., Nguyen, N. T., & Tangen, D. (2014). Incorporating translation in
qualitative studies: Two case studies in education. International Journal of
Qualitative Studies in Education, 27(10), 1337-1353.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.).
Pearson/Allyn & Bacon.
Tan, C. (2020). Beyond high-stakes exam: A neo-Confucian educational programme
and its contemporary implications. Educational Philosophy and Theory, 52(2),
137-148.
Tashakkori, A., & Creswell, J. W. (2007). Editorial: The new era of mixed methods.
Journal of Mixed Methods Research, 1(1), 3-7.
Tassinari, M. G. (2012). Evaluating learner autonomy: A dynamic model with
descriptors. Studies in Self-Access Learning Journal, 3(1), 24-40.
Teddlie, C., & Tashakkori, A. (2012). Common “core” characteristics of mixed
methods research: A review of critical issues and call for greater convergence.
American Behavioral Scientist, 56(6), 774-788.
Teng, H.-C., & Fu, C.-W. (2019). The washback of listening tests for entrance exams
on EFL instruction in Taiwanese junior high schools. Language Education
Assessment, 2(2), 96-109.
Tsagari, D. (2009). The complexity of test washback: An empirical study. Peter Lang
GmbH.
Tsagari, D. (2011). Washback of a high-stakes English exam on teachers’
perceptions and practices. Selected Papers on Theoretical and Applied
Linguistics, 19, 431-445.
Tsagari, D. (2014, October). Unplanned LOA in EFL classrooms: Findings from an
empirical study. Presentation at the Roundtable on Learning-Oriented
Assessment in Language Classrooms and Large-Scale Contexts, Teachers
College, Columbia University, New York.
Turner, C. E., & Purpura, J. E. (2016). Learning-oriented assessment in second and
foreign language classrooms. In D. Tsagari & J. Banerjee (Eds.), Handbook of
second language assessment (pp. 255-273). De Gruyter Mouton.
Turner, C. E., & Upshur, J. A. (1995). Some effects of task type on the relation
between communicative effectiveness and grammatical accuracy in intensive
ESL classes. TESL Canada Journal, 12(2), 18-31.
Uyaniker, P. (2017). Language assessment: Now and then. Eurasian Journal of
Language Education and Research, 1(1), 1-20.
Van Teijlingen, E. R., & Hundley, V. (2001). The importance of pilot studies. Social
Research Update, 35.
Verplanck, W. S. (1992). A brief introduction to the word associate test. The
Analysis of Verbal Behavior, 10(1), 97-123.
Vygotsky, L. S. (1986). Thought and language. MIT Press.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general
education and from innovation theory. Language Testing, 13(3), 334-354.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact
study. Language Testing, 10(1), 41-69.
Wallace, J. (2014). Grammar in speaking: Raising student awareness and
encouraging autonomous learning. Research Notes, 56, 30-36.
Walters, J. (2004). Teaching the use of context to infer meaning: A longitudinal
survey of L1 and L2 vocabulary research. Language Teaching, 37(4), 243-252.
Wang, C., Yan, J., & Liu, B. (2014). An empirical study on washback effects of the
Internet-Based College English Test Band 4 in China. English Language
Teaching, 7(6), 26-53.
Wang, L. (2014). Quality assurance in higher education in China: Control,
accountability and freedom. Policy and Society, 33(3), 253-262.
Wang, L., & Mok, K. H. (2013). The impacts of neo-liberalism on higher education
in China. In A. Turner & H. Yolcu (Eds.), Neo-liberal educational reforms: A
critical analysis (pp. 139-163). Routledge.
Wang, Q. (2007). The national curriculum changes and their effects on English
language teaching in the People’s Republic of China. In J. Cummins & C.
Davison (Eds.), International handbook of English language teaching (pp. 87-
105). Springer.
Wang, X. (2003). Education in China since 1976. McFarland & Company, Inc.
Watanabe, Y. (1996a). Does grammar translation come from the entrance
examination? Preliminary findings from classroom-based research. Language
Testing, 13(3), 318-333.
Watanabe, Y. (1996b). Investigating washback in Japanese EFL classrooms:
Problems of methodology. Australian Review of Applied Linguistics.
Supplement Series, 13(1), 208-239.
Watanabe, Y. (2004). Teacher factors mediating washback. In L. Cheng, Y.
Watanabe, & A. Curtis (Eds.), Washback in language testing: Research contexts
and methods (pp. 129-146). Lawrence Erlbaum.
Webb, S., Sasao, Y., & Ballance, O. (2017). The updated Vocabulary Levels Test:
Developing and validating two new forms of the VLT. ITL-International
Journal of Applied Linguistics, 168(1), 33-69.
Weber, K. (2003). The relationship of interest to internal and external motivation.
Communication Research Reports, 20(4), 376-383.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach.
Palgrave Macmillan.
Weir, C. J. (2013). An overview of the influences on English language testing in the
United Kingdom 1913–2012. In C. J. Weir, I. Vidaković, & E. D. Galaczi
(Eds.), Measured constructs: A history of Cambridge English language
examinations 1913–2012 (Vol. 37, pp. 1-102). Cambridge University Press.
Wen, Q. (2018). The production-oriented approach to teaching university students
English in China. Language Teaching, 51(4), 526-540.
Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: Motivation,
anxiety, and test performance. Applied Measurement in Education, 8(3), 227-
242.
Wu, Y. (2017). Language education in China: Teaching foreign languages. In R.
Sybesma (Ed.), Encyclopedia of Chinese language and linguistics (Vol. 2, pp.
515-527). Brill.
Xiao, W. (2014). The intensity and direction of CET washback on Chinese college
students’ test-taking strategy use. Theory & Practice in Language Studies, 4(6),
1171-1177.
Xie, Q. (2010). Test design and use, preparation, and performance: A structural
equation modeling study of consequential validity [Unpublished doctoral
dissertation, The University of Hong Kong]. Hong Kong.
Xie, Q. (2013). Does test preparation work? Implications for score validity.
Language Assessment Quarterly, 10(2), 196-218.
Xie, Q. (2015a). Do component weighting and testing method affect time
management and approaches to test preparation? A study on the washback
mechanism. System, 50, 56-68.
Xie, Q. (2015b). “I must impress the raters!” An investigation of Chinese test-takers’
strategies to manage rater impressions. Assessing Writing, 25, 22-37.
Xie, Q., & Andrews, S. (2013). Do test design and uses influence test preparation?
Testing a model of washback with Structural Equation Modeling. Language
Testing, 30(1), 49-70.
Xu, Y., & Wu, Z. (2012). Test-taking strategies for a high-stakes writing test: An
exploratory study of 12 Chinese EFL learners. Assessing Writing, 17(3), 174-
190.
Yang, H.-C., & Plakans, L. (2012). Second language writers’ strategy use and
performance on an integrated reading-listening-writing task. TESOL Quarterly,
46(1), 80-103.
Yang, W. (2015). A Study on the washback effect of the grammar part in the Junior
Secondary English Achievement Graduation Test (Jiangxi) [Master’s thesis,
Guangdong University of Foreign Studies]. Guangdong.
Yang, Z., Gu, X., & Liu, X. (2013). A longitudinal study of the CET washback on
college English classroom teaching and learning in China: Revisiting college
English classes of a university. Chinese Journal of Applied Linguistics, 36(3),
304-325.
You, C., & Dörnyei, Z. (2014). Language learning motivation in China: Results of a
large-scale stratified survey. Applied Linguistics, 37(4), 495-519.
Yu, L. (2001). Communicative language teaching in China: Progress and resistance.
TESOL Quarterly, 35(1), 194-198.
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2002). A unified approach to
exploratory factor analysis with missing data, nonnormal data, and in the
presence of outliers. Psychometrika, 67(1), 95-121.
Yurdugül, H. (2008). Minimum sample size for Cronbach’s coefficient alpha: A
Monte-Carlo study. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 35(35).
Zafarghandi, A. M., & Nemati, M. J. (2015). A comparative analysis of IELTS and
TOEFL in an Iranian context: A case study of washback of standard tests.
Theory and Practice in Language Studies, 5(1), 154.
Zeng, W., Huang, F., Yu, L., & Chen, S. (2018). Towards a learning-oriented
assessment to improve students’ learning—A critical review of literature.
Educational Assessment, Evaluation and Accountability, 30(3), 211-250.
Zeng, Y. (2008). A study on the washback effect of the Senior High School Entrance
Exam on junior English teaching and learning [Master’s thesis, Huazhong
Normal University]. Wuhan, Hubei.
Zeng, Y. (2010). The computerized oral English test of the National Matriculation
English Test. In L. Cheng & A. Curtis (Eds.), English language assessment and
the Chinese learner (pp. 234-247). Routledge.
Zhan, Y., & Andrews, S. (2014). Washback effects from a high-stakes examination
on out-of-class English learning: Insights from possible self theories.
Assessment in Education: Principles, Policy & Practice, 21(1), 71-89.
Zhang, H., Zhang, W., Wu, S., & Guo, Q. (2018). A survey on attitudes toward
holding the National Matriculation English Test twice a year. China
Examinations, 1, 20-26.
Zhang, L., & Li, X. (2004). A comparative study on learner autonomy between
Chinese students and west European students. Foreign Language World, 4, 15-
23.
Zhang, R. (2019). Backwash effect of integrating listening and speaking test into
NMET (Shanghai): Taking School J as an example. Foreign Language Testing
and Teaching, 4, 47-53.
Zhi, M., & Wang, Y. (2019). Washback of college entrance English exam on student
perceptions of learning in a Chinese rural city. In R. M. Damerow & K. M.
Bailey (Eds.), Chinese-speaking learners of English: Research, theory, practice
(1st ed., pp. 26-37). Routledge.
Zhuang, X. (2008). Practice on assessing grammar and vocabulary: The case of the
TOEFL. Journal of US-China Education Review, 5(7), 46-57.
Ziegler, N., & Kang, L. (2016). Drawing mixed methods procedural diagrams. In A.
K. Moeller, J. W. Creswell, & N. Saville (Eds.), Second language assessment
and mixed methods research (pp. 51-83). Cambridge University Press.
Zou, S., & Xu, Q. (2017). A washback study of the Test for English Majors for
Grade Eight (TEM8) in China—From the perspective of university program
administrators. Language Assessment Quarterly, 14(2), 140-159.
Appendices
Appendix A
Appendices 313
Appendix B
Test Item Examples of the GVT from the Authentic 2018 SHSEET Paper
(Chongqing, Paper A)
II. 单项选择。(每小题 1 分,共 15 分) [II. Multiple choice. (1 point each; 15 points in total)]
从 A、B、C、D 四个选项中选出可以填入空白处的最佳答案,并把答题卡上对应题目的答案标号涂黑。[Choose the best answer from options A, B, C, and D to fill in the blank, and blacken the corresponding answer label on the answer sheet.]
21. I had ________ egg and some milk for breakfast this morning.
A. a B. an C. the D. /
A. on B. in C. at D. by
23. --- I have a bad cold.--- Sorry to hear that. You’d better go to see a ________ at once.
A. I B. me C. my D. mine
26. __________ visitors came to take photos of Hongyadong during the vacation.
28. I can’t hear the teacher _______ with so much noise outside.
29. They don’t live here any longer. They _______ to Chengdu last month.
30. --- Must I go out to have dinner with you, Mum?--- No, you ________, my dear. You’re
free to make your own decision.
31. It’s hard for us to say goodbye ______ we have so many happy days to remember.
33. ________ special class we had today! We learned about kung fu.
34. The 2022 Winter Olympic Games ________ in China. I’m sure it will be a great success.
35. --- Excuse me! Do you know ______________? --- It’s two kilometers away from here.
Everyone has dreams. Lily dreamed of being a dancer. She took 36 lessons and all her
teachers thought she was an excellent student.
One day she saw a notice. It said that a famous dancing group would be performing in her
town. 37 though, “I must show the leader my dancing skills.” She waited for the group
leader in the dressing room. 38 the leader appeared, she came up and handed him the flowers
she prepared. The thorns (刺) hurt her fingers and blood came out. But she was too 39 to
care about the pain. She expressed her strong wish to be a dancer and begged (乞求) to show
her dance.
“All right, you dance.” The leader agreed. But half way through the dance, he stopped her,
“I’m sorry, in my mind you’re not good enough!” On hearing this, Lily 40 out as fast as her
legs could carry her. It was so hard for her to accept this. She lost heart and 41 her dream.
Several years later, the dancing group came to her town again. She decided to find out 42
the leader had told her she was not good enough.
The leader went on, “I remember your present of 44 and how the thorns had hurt your fingers
but you carried on bravely. It was a pity that you didn’t take dancing like that and stopped
trying so 45 . So you are still not good enough for dancing!”
43. A. on B. at C. in D. to
VII. 完成句子。(每空 1 分,共 10 分) [VII. Sentence completion. (1 point per blank; 10 points in total)]
根据所给提示,完成句子。每空一词,含缩略词。[Complete the sentences according to the given prompts. One word per blank; contracted forms count as one word.]
73. 当我们有不同意见时,应该相互理解。(完成译句) [73. When we have different opinions, we should understand each other. (Complete the translation)]
74. 史蒂芬·霍金不仅是一名伟大的科学家,而且是一位著名的作家。(完成译句) [74. Stephen Hawking was not only a great scientist but also a famous writer. (Complete the translation)]
Stephen Hawking was not only a great scientist __________ __________ a famous writer.
VIII. 短文填空。(每空 2 分,共 16 分) [VIII. Passage completion. (2 points per blank; 16 points in total)]
根据下面短文内容,在短文的空格处填上一个恰当的词,使短文完整、通顺。[According to the content of the passage below, fill in each blank with an appropriate word so that the passage is complete and coherent.]
As we are growing up, we really need advice from adults. Here are three people talking
about their experience.
Sometimes when you’re a teenager, you feel as if you’re all alone and there’s 75
you can talk to. Do you know twenty to thirty percent of teenagers in the US have a hard 76
going through the period? They feel lonely and sad. I think life is so much easier if you 77
your troubles with others. I regret that I didn’t take the advice when someone gave it to me.
When I was in school, I never thought I’d become a teacher. I acted badly in class,
and I feel 78 about that now. I love my job and I know how challenging it is, so I hope kids
can show their teachers more respect (尊敬). I hope kids can 79 that teachers push them to
do their best and not just to give them a hard time.
When I was a teenager, I never learned 80 to save money. I just spent it! My parents
gave me everything I wanted, but I realize now they spent little 81 themselves. Now I wish
I knew more about planning my money, and I am not the only one! It seems that today’s
teenagers know about money planning even less 82 me years ago. I do wish they could
learn about it earlier.
Appendix C
Table of Empirical Washback Studies of High-stakes Standardised English Tests in the International Context
Author(s) | Topic | Target test(s) | Site | Participants | Research methods | Major findings
Khaniya (1990) | The washback of a textbook-based test | School Leaving Certificate English examination (SLC) | Nepal | 358 students | Test administration | negative washback
Green (2006b) | The differences between IELTS and English for academic purposes (EAP) classes | IELTS | U.K. | 197 learners; 20 teachers | Classroom observation (22 IELTS preparation classes; 13 EAP writing classes); brief teacher interview | negative washback
Özmen (2011) | The washback of a national high-stakes examination on prospective English teachers | The Selection Examination for Professional Posts in Public Organization (SEPPPO) | Ankara, Turkey | 164 student-teachers | Questionnaire; interview | negative and harmful washback
Tsagari (2011) | The relationship between intended washback and teachers’ perceptions towards the exam as well as classroom practice | FCE | Greece | 15 native and non-native FCE teachers in private language schools | Interview | negative washback
Zafarghandi and Nemati (2015) | The washback correlation of two standardised tests in Iran | IELTS; TOEFL | Iran | 60 IELTS applicants; 60 TOEFL applicants | Test administration; interview; questionnaire | negative washback
Damankesh and Babaii (2015) | Washback on students’ test-taking and test preparation strategies | High school final English exam in Iran | Six intact classes in four high schools in the cities of Siyahkal and Shaft, the Northern Guilan province, Iran | 80 Iranian male high school students (freshmen learners) | Think aloud methodology | generally negative, some positive effects
Hawkey (2006) | Impact of IELTS | IELTS | Worldwide | 572 students; 83 teachers; 45 textbook evaluators | Questionnaire; classroom observation; interview; document analysis (teaching materials) | generally positive washback
Saif (2006) | The possibility of generating positive washback | A test of spoken language ability designed for international teaching assistants (ITAs) | The University of Victoria, Canada | 19 graduate advisors; 255 undergraduate students; 47 ITAs | Interview; observation; test administration | positive washback
Green (2014) | The consideration of the intended washback by test developers and how to achieve it | The Test of English for Academic Purposes (TEAP) | Japan | 3,868 students; 423 high school teachers; 19 university English teachers | Student and teacher questionnaires | positive washback
Shohamy et al. (1996) | The washback of test changes in two national tests | ASL, low-stakes test; EFL, high-stakes test | Israel | 9 ASL teachers; 16 EFL teachers; 62 ASL students; 50 EFL students; 2 ASL inspectors; 4 EFL inspectors | Student questionnaire; structured interview (teachers and inspectors); document analysis | different washback patterns; both positive and negative washback
Wall and Alderson (1993) | The washback of a new English examination in Sri Lanka on language teaching | The O-Level examination | Sri Lanka | 7 Sri Lankan teachers | Classroom observation | both positive and negative washback
Erfani (2012) | The comparison of IELTS and TOEFL iBT washback in test preparation courses | IELTS; TOEFL iBT | Iran | 20 teachers for each test; 100 IELTS students; 120 TOEFL iBT students | Student and teacher questionnaires; classroom observation; teacher interview | both positive and negative washback
Saglam and Farhady (2019) | Washback on learning | Test of Readiness for Academic English (TRACE), a local integrated theme-based high-stakes English language proficiency test used in a university EAP program | Turkey | 147 EFL students | Mixed methods of test administration and focus groups | both positive and negative washback
Alderson and Hamp-Lyons (1996) | The washback of TOEFL preparation courses | TOEFL | Mainly in a specialised institute in the USA | Two teachers | Teacher interview; student interview; TOEFL preparation classroom observation; non-TOEFL preparation classroom observation | no answer to the undesirable TOEFL influence on language teaching
Watanabe (1996a) | The relationship between the university entrance examinations and the use of the grammar-translation approach in teaching to the exam | Japanese university entrance examinations | A yobiko school in central Tokyo | Two teachers and four courses | Interviews for teachers’ background information; classroom observation | no positive or negative washback
Note. The studies were listed based on washback result findings, namely, “positive washback”, “negative washback”, and “mixed/complex washback”.
Appendix D
Author(s) | Topic | Target test(s) | Site | Participants | Research methods | Major findings
Qi (2004b) | The intended and unintended washback of NMET | NMET | Guangdong | 8 NMET constructors; 6 English inspectors; 388 secondary school teachers; 986 students | Interview; classroom observation; teacher and student questionnaires | negative washback: the intended washback failed
Qi (2005) | The reasons for the failure of intended washback of NMET | NMET | Two provinces in China | 8 NMET constructors; 6 English inspectors; 388 teachers; 986 students | Interview; student and teacher questionnaires | negative washback: the intended washback failed
Qi (2007) | The washback of the writing task in NMET | NMET | Guangdong, Sichuan | 388 Senior III middle school English teachers; 986 Senior III students; 8 NMET test constructors | Teacher and student questionnaires; interview; classroom observation | negative washback: the intended washback failed
Xie and Andrews (2013) | The effects of test uses and test design on test preparation (washback effects on teaching and learning) | CET 4 | Guangdong, China | 870 sophomores in a university in Guangdong province | A set of questionnaires (questionnaire of test perception; questionnaire of test preparation) | negative washback
Li (1990) | Whether the MET innovation brought any change in English teaching in middle schools | Matriculation English Test (MET), same as the NMET | Six provinces enlisted in the MET experiment | 229 teachers and local English teaching-and-research officers | Questionnaire | positive washback
Yang (2015) | Washback of the grammar part | The Junior Secondary English Achievement Graduation Test (JSEAGT), i.e., the SHSEET in Jiangxi province | Jiangxi | 21 teachers and 130 students from a junior high school | Teacher and student questionnaires; classroom observation; teacher interview | positive washback
Zou and Xu (2017) | Perceptions of test and test washback on teaching and learning | TEM 8: both oral and paper tests | Mainland China | 724 program administrators | Mailing questionnaire | positive washback
Teng and Fu (2019) | Washback of the listening test in the senior high entrance exam on EFL teaching and learning | Comprehensive Assessment Program (CAP) listening test | Taiwan | 30 English teachers and 298 students from three junior high schools | Teacher and student questionnaires; teacher and student semi-structured interviews | positive washback on teaching; both positive and negative washback on learning
Andrews et al. (2002) | The effects of changes to high-stakes tests on test-takers’ performance | UE Oral examination | Hong Kong | 3 cohorts of Secondary 7 students | Test administration; analysis of test videotapes and transcripts | both negative and positive washback
Zeng (2008) | The washback of SHSEET on teaching and learning | SHSEET | Five secondary schools in Hongshan District, Wuhan | 254 Grade 9 students; 40 English teachers; 4 test item writers | Interview (five test item writers, five teachers); classroom observation; teacher and student questionnaires | both positive and negative washback
Wang et al. (2014) | The actual washback effects of IB CET4 | IB CET-4 | Beijing | 150 non-English major undergraduates from Beijing Jiaotong University | Student questionnaire; unstructured interview; classroom observation | both positive and negative washback
Chen et al. (2018) | Washback of NMET under the policy of two tests a year on teaching and learning | The foreign language test in the National College Entrance Examination (NCEE), i.e., NMET | Zhejiang province, China | 79 teachers; 710 Grade 12 students | Teacher and student questionnaires; teacher and student semi-structured interviews | both positive and negative washback
Zhi and Wang (2019) | Washback on learning from students of low socioeconomic backgrounds | The National College Entrance English Exam (NCEEE), i.e., NMET | A rural town in Central China | 139 senior high school students | A concurrent qualitative-dominant mixed-methods approach; a cross-sectional survey design | both positive and negative washback
Cheng (1997) | The washback of the Hong Kong Certificate of Education Examination in English (HKCEE) regarding the revised examination syllabus | HKCEE | Hong Kong secondary schools | 48 teachers; 42 students | Teacher and student questionnaires; classroom observation; unstructured interviews | difficult to judge the value of washback
Cheng (1998) | The influence of examination change regarding classroom activities, practice opportunities and learning strategies from students’ perspectives | HKCEE | Hong Kong secondary schools | Two cohorts of Secondary 5 students: 1,100 students taking the old HKCEE in 1994; 600 students taking the new HKCEE in 1995 | Student questionnaire | limited/superficial washback
Shih (2007); Shih (2009) | The washback of GEPT on English learning and teaching | GEPT | Two private institutions of higher education in Taiwan | The department chair, 2 or 3 teachers, 14 to 15 students, and 3 students’ family members from each department | Interview; classroom observation; document review | various but limited degrees of washback
Gan (2009) | The potential washback of a preparatory IELTS course on students’ IELTS test performance; the learning/motivational effects of the IELTS test preparation course among ESL students in Hong Kong | IELTS | A university in Hong Kong | 146 students in 23 undergraduate programmes | Web-based student questionnaire (main); student interview (supplementary) | no obvious washback
Cheng et al. (2011) | The perceptions of the impact of School-based Assessment (SBA) | HKCEE SBA | Hong Kong | 389 Secondary 4 students; 315 parents of those students | Student and parent questionnaires | complex impact
Zhan and Andrews (2014) | The CET 4 test impact on students’ out-of-class learning | The revised CET-4 | A provincial comprehensive university in Jiangsu province | Three non-English major undergraduates | Diary approach; semi-structured post-diary interview | diverse washback
Note. The studies were listed based on washback result findings, namely, “positive washback”, “negative washback”, and “mixed/complex washback”.
Appendix E
Date:
School:
Teacher:
Class number:
No. of students:
Time period:
Part A: General teaching and learning practices regarding the GVT
- Major tasks/language points
- Time spent
- Mention of the SHSEET
- Assessment-related activities: Teaching / Learning
- Use of target language: L1 / L2
- Materials: Types / Purposes

Part B: LOA-related practice
- Assessment tasks
- Learning focus: Language / Test
- Participant organisation: T→Ss/Class / S→T/Class / Choral
- Feedback forms: Oral / Written / Only correction / Correction + explanation
- Result interpretation: Simple (only answer) / Advanced (referring to ECSCE/test specification)

Part C: Field notes (nonverbal behaviours: facial expression, eye contact, etc.)
Appendix F
*Please note: These interview questions may change once the classroom
observation data have been collected and briefly analysed; some of the questions in
this protocol will therefore be edited to reflect the data found in the observation stage.
Thanks for your participation. I am researching the test influence of the grammar and
vocabulary part in the Senior High School Entrance English Test (SHSEET) on
English teaching. I would really appreciate your views on the following questions
about the test influence and the Learning Oriented Assessment regarding the GVT.
Q1. Do you think the inclusion of the separate testing of grammar and vocabulary in
the SHSEET reflects the aim of the communicative language use development in the
English Curriculum Standards for Compulsory Education (ECSCE)?
- No. Why?
Q2. From your understanding, which aspects of grammar and vocabulary are the focus
of the SHSEET? Do you feel that these aspects reflect the intent of the ECSCE?
- If so, do you think they reflect the same guiding ideas of student- and learning-centred instruction as specified in the ECSCE?
Q3. Do you focus on the language points listed for grammar and vocabulary in the
ECSCE only?
- Yes. How do you deal with new language points during teaching?
- No. Why? What is your focus on the teaching of grammar and vocabulary?
Q4. Apart from the language points listed in the ECSCE, are you also familiar with
other guidelines for teaching and assessment, such as that teaching should help to
develop students' language use ability and that the SHSEET should test students'
language use ability in integrated macroskill tasks?
- No. Why?
- Yes. What are the guidelines and where do they come from?
Q7. Do you offer any feedback to students on their performance of the GVT?
- Yes.
- No. Why? Then do you offer any feedback on students’ performance on grammar and
vocabulary outside of the test itself?
Q8. How do you prepare your students for the GVT?
- How long do you spend on teaching the grammar and vocabulary tasks in general?
- How frequently do you prepare students for the grammar and vocabulary tasks?
- Or other? Why?
Q9. What is your understanding of the term “Learning Oriented Assessment (LOA)”?
- Yes. In what ways? Do you arrange any classroom assessment that is learning-oriented?
- No. Why? Do you think it is necessary and possible to arrange LOA activities such as
peer-evaluation for the GVT during teaching?
Q12. How will students be evaluated upon their graduation from junior high schools?
- If so, do you keep any formative records for students? What are they? What are the
purposes?
- Or other?
- Yes. What is the impact?
- No. Why?
- No. Why?
(The interviewer will have the 2011 ECSCE, the 2018 Chongqing English test
specification, and two past test papers at hand for reference.)
Appendix G
*Please note: These interview questions may change once the classroom observation
data are collected and briefly analysed; therefore, some of the questions in this
protocol will be edited to reflect the data found in the previous observation stage.
Q2. Does your teacher mention the ECSCE or test specification when teaching the
grammar and vocabulary tasks of the SHSEET?
- Yes.
- Language points?
- No. What else does she/he mainly talk about when teaching the GVT?
Q3. Does your teacher follow the teaching and assessment guidelines when teaching
the grammar and vocabulary part of the SHSEET? Such as teaching should help to
develop language use ability listed in the ECSCE?
- No. Why?
Q4. Do you have any feedback on your grammar and vocabulary achievement?
- No. Do you think the feedback will be useful for improving your knowledge and ability
to use grammar and vocabulary?
- Or other? Why?
Q6. Do you think the GVT can improve your learning of grammar and vocabulary?
- Yes. In what ways?
Q7. Do you engage in any learning-oriented activities like pair or group work when
learning the GVT?
- Yes. What do you think of it? Are other similar forms of learning-oriented activities possible?
Q8. Does the inclusion of the separate testing of the GVT impact your learning?
- No. Why?
- No. Why?
(The interviewer will have the 2011 ECSCE, the 2018 Chongqing English test
specification, and two past test papers at hand for reference.)
Appendix H
Symbol Meaning
xxx Not clear in the recording.
… Omission in the transcript.
…… Participants deliberately did not finish the sentence.
Appendix I
Student Survey
Section 1
Pilot version of the student survey
Construct / Indicators / Item description
Perceptions of test design characteristics
Regarding the GVT (the Multiple Choice Question, Cloze, Sentence Completion, and Gap-filling Cloze), I think that:
Negative perception of test design characteristics
v2 As long as I rote-memorise those grammar rules and vocabulary lists, I can achieve a high score.
v3 The Sentence Completion and Gap-filling tasks are only testing students' vocabulary spelling.
v4 The MCQ and Cloze tasks are only testing students' ability of guessing the correct answers.
Positive perception of test design characteristics
v1 The language situations designed in the items are fully in line with real-life situations.
v5 I must read the whole sentence in order to answer the questions (for MCQ and Sentence Completion).
v6 I must read the whole passage in order to answer the questions (for Cloze and Gap-filling cloze).
v7 I must understand the sentence context in order to answer the questions (for MCQ and Sentence Completion).
v8 I must grasp the gist of the passage in order to answer the questions (for Cloze and Gap-filling cloze).
v9 The four task types are testing various topics which require a lot of background information and common sense.
v10 The four task types can test my overall ability to use language.
v11 The four task types can test my ability to use grammar and vocabulary in different language situations.
Motivation
My motivation to learn English grammar and vocabulary:
Intrinsic motivation
v12 Learning English grammar and vocabulary can greatly influence my future English study (such as study in higher education).
v14 Learning English grammar and vocabulary is really interesting.
v15 Learning English grammar and vocabulary can further help me to read English books or magazines, surf English websites, etc.
v16 Learning English grammar and vocabulary can greatly help me in language communication (both oral and written communication).
v19 Learning English grammar and vocabulary can help me make use of various resources to understand the cultural background of English-speaking countries.
Extrinsic motivation
v13 Learning English grammar and vocabulary is very helpful for me to get a higher test score in various language tests.
v17 Learning English grammar and vocabulary is to meet the requirement of school courses.
v18 Learning English grammar and vocabulary can help me to be enrolled into my ideal senior high school.
v20 Learning English grammar and vocabulary can help me to become a successful member of society.
v21 Learning English grammar and vocabulary can greatly help me to pass any future language evaluations that may be given in my future career.
Test anxiety
In the time leading up to the GVT,
v22 My appetite was unchanged.
v23 My sleep habits were unchanged.
v24 I was confident that I could do much better than most of the
other students.
v25 I never worried that the teacher and my parents would criticise
me if I couldn’t get an ideal score.
v26 I still did NOT change my usual study habits for learning
English grammar and vocabulary.
Test preparation effort
How many test papers did you take each week (excluding the normal class time) after starting your test preparation?
Test papers taken
v27 MCQ
v28 Cloze
v29 Sentence Completion
v30 Gap-filling cloze
Time investment
In the time leading up to the GVT, each week I spent some time on the following test tasks (excluding the normal class time),
v31 MCQ
v32 Cloze
v33 Sentence Completion
v34 Gap-filling cloze
Learning strategy
In the time leading up to the GVT, I used the following strategies to learn English grammar and vocabulary,
Negative strategy
v35 Focusing mainly on the language knowledge which is frequently tested.
v36 Being reliant on supplementary learning materials (such as grammar books, vocabulary lists, and mock tests).
v37 Doing a lot of exercises and mock tests.
v38 Only rote-memorising grammar rules and vocabulary lists.
v39 Using test-wiseness strategies (such as eliminating similar options and avoiding unfamiliar words or grammar rules).
Positive strategy
v40 Finding suitable learning materials for myself.
v41 Keeping a record of exemplary words, sentences and paragraphs that are useful for my future language learning while reading.
v42 Summarising and reviewing the language points on which I often make mistakes.
v43 Reading extensively to build a sense of language appropriateness (i.e., knowing what grammar structure or vocabulary to use in different sentences or contexts).
v44 Summarising rules for learning grammar and vocabulary for myself.
In the time leading up to the test, we did the following grammar and vocabulary activities in class,
Classroom interaction
v45 We often solved problems in groups or pairs.
v46 We often just followed the English teacher's instructions.
v47 We practised conversations regarding various topics (not including reading the text from the test).
v48 Our English teacher did things like nominating one student to lead the whole class (the student taught some grammar and vocabulary knowledge to the whole class).
v49 We performed drama and had English debates.
Involvement in assessment
v50 We were encouraged by the teacher to self-assess to identify strengths and weaknesses in grammar and vocabulary learning.
v51 We were encouraged by the teacher to assess peers' performance on grammar and vocabulary tests to give feedback.
v52 We were encouraged by the teacher to know and become familiar with the grammar and vocabulary scoring rubrics in different assessments (such as learning the scoring rubrics of Gap-filling cloze).
v53 We were encouraged by the teacher to try to design grammar and vocabulary test items to assess our own achievement.
v54 We were encouraged by the teacher to summarise and reflect on our strengths and weaknesses after taking every grammar and vocabulary test, including quizzes and self-assessment.
Feedback
In the time leading up to the test,
v55 The feedback on my grammar and vocabulary learning from my
English teacher was very frequent.
v56 The feedback on my grammar and vocabulary learning from my
English teacher was very timely.
v57 The feedback on my grammar and vocabulary learning from my
English teacher was very detailed.
v58 The feedback on my grammar and vocabulary learning from the
English teacher helped me to find my weaknesses and make
learning objectives for the next stage.
v59 As long as I acted upon the feedback on the grammar and
vocabulary learning from the English teacher seriously, I could
make progress in future.
v60 I felt very satisfied with the feedback method (such as "marking the assignment after class—focused feedback in the next class—individual feedback after the class").
Learner autonomy
In the time leading up to the test, I learned grammar and vocabulary autonomously in the following ways,
v61 I preferred asking for my teacher’s or classmate’s help when I
had any grammar and vocabulary problems.
v62 I preferred solving the grammar and vocabulary problems by
myself (e.g., looking up in a dictionary, checking with the test
preparation materials about grammar rules or vocabulary
knowledge, reviewing my previous notes on incorrect test
responses).
v63 I usually tried to take every opportunity to take part in grammar
and vocabulary activities in the class, such as pair and group
discussion, to help my learning.
v64 I could make very effective use of my free time to learn English grammar and vocabulary.
v65 In addition to the assignments required by the English teacher, I also attended many extra-class activities to practise and learn English grammar and vocabulary (such as practising conversations with classmates).
Test importance
The following statements concern the test importance of the GVT (MCQ, Cloze, Sentence Completion, Gap-filling cloze); please choose the scale that suits you.
v66 I learned English grammar and vocabulary knowledge, because
it was tested in the exam.
v67 What is tested in the grammar and vocabulary tasks should be what students learn.
v68 The total score of the GVT (56 marks for all four task types) directly influences the entire score for the SHSEET.
v69 I think that learning the GVT can greatly influence my future
study and life.
Test difficulty
To what extent is the GVT challenging to you?
v70 MCQ
v71 Cloze
v72 Sentence Completion
v73 Gap-filling cloze
Section 2
English version of student survey in the main study
Construct Indicators Item description
Demographic information
Have you ever completed the pilot version of this survey?
Are you a Grade 9 student who graduated in Chongqing this
year? Yes/No
Which test paper did you use when taking the SHSEET?
SHSEET Paper A/SHSEET Paper B
Gender: Male/Female
Your 2018 SHSEET score:
Please write down the name of the junior high school where you studied: district name; school name; class number.
Perceptions of test design characteristics
Regarding the GVT (Multiple Choice Question, Cloze, Sentence Completion, and Gap-filling Cloze tasks), I think that:
Negative perception of test design characteristics
v1 It only aims to test students' ability of rote-memorising vocabulary and fixed collocations.
v2 The Sentence Completion and Gap-filling cloze tasks are only testing students' spelling of vocabulary.
v3 The MCQ and Cloze tasks are only testing students' ability of guessing correct answers.
v4 The four task types are testing students' ability of rote-applying grammar rules to different sentences.
v5 The MCQ and Cloze are only testing students' ability of eliminating distracting options.
v6 What is tested in the GVT should be the learning focus of students.
Positive perception of test design characteristics
v7 I must read the whole sentence in order to answer the questions (for MCQ and Sentence Completion tasks).
v8 I must read the whole passage in order to answer the questions (for Cloze and Gap-filling cloze tasks).
v9 I must understand the sentence context in order to answer the questions (for MCQ and Sentence Completion tasks).
v10 I must grasp the gist of the passage in order to answer the questions (for Cloze and Gap-filling cloze tasks).
v11 The language situations involved in the questions are absolutely in line with real-life situations.
v12 The four task types are testing various topics which require a lot of background information and common sense.
v13 The four task types can test my overall ability to use language.
v14 The four task types can test my ability to use grammar and vocabulary in different language situations.
Motivation
My motivation to learn English grammar and vocabulary:
Intrinsic motivation
v15 Learning English grammar and vocabulary can help me lay a foundation for future language learning.
v16 Learning English grammar and vocabulary can help me read English books or magazines, surf English websites, etc.
v17 Learning English grammar and vocabulary can help me in English language communication (both oral and written communication).
v18 Learning English grammar and vocabulary can help me make use of various resources to understand the cultural background of English-speaking countries.
v19 Learning English grammar and vocabulary is really interesting.
Extrinsic motivation
v20 Learning English grammar and vocabulary is to meet the requirement of school courses.
v21 Learning English grammar and vocabulary can help me gain enrolment into my ideal senior high school.
v22 Learning English grammar and vocabulary can help me get a higher test score in various language tests.
v23 Learning English grammar and vocabulary can help me become a successful member of society.
v24 Learning English grammar and vocabulary can help me pass any future language evaluations in my future career.
Test anxiety
In the time leading up to the GVT,
v25 My appetite was unchanged.
v26 My sleep habits were unchanged.
v27 I was confident that I could do much better than most of the
other students and thus not afraid of comparing scores with
others.
v28 I never worried that the teacher and my parents would criticise
me if I couldn’t get an ideal score.
v29 I still felt relaxed.
Test preparation effort
How many test papers did you take each week (excluding the normal class time) after starting your test preparation?
Test papers taken
v30 MCQ
v31 Cloze
v32 Sentence Completion
v33 Gap-filling cloze
Time investment
In the time leading up to the GVT, each week I spent some time on the following test tasks (excluding the normal class time),
v34 MCQ
v35 Cloze
v36 Sentence Completion
v37 Gap-filling cloze
Learning strategy
In the time leading up to the GVT, I used the following strategies to learn English grammar and vocabulary,
Negative strategy
v38 Focusing mainly on the language knowledge which is frequently tested.
v39 Being reliant on supplementary learning materials (such as grammar books, vocabulary lists, and mock tests).
v40 Repetitively doing a lot of exercises and mock tests.
v41 Rote-memorising grammar rules and vocabulary lists.
v42 Using test-wiseness strategies (such as guessing test designers' intentions to choose the right answers).
Positive strategy
v43 Finding suitable learning materials for myself.
v44 Keeping a notebook of exemplary words, sentences and paragraphs that were useful for my future language learning while reading.
v45 Summarising and reviewing the language points on which I often made mistakes.
v46 Reading extensively to build a sense of language appropriateness (i.e., knowing what grammar structure or vocabulary to use in different sentences or contexts).
v47 Summarising rules for learning grammar and vocabulary for myself.
In the time leading up to the GVT, we did the following
grammar and vocabulary activities in class,
Classroom interaction
v48 We often solved problems in groups or pairs.
v49 Our English teacher often used open-ended questioning to give
instruction on grammar and vocabulary (not including asking
for simple answers like “yes/no” and “right/wrong”).
v50 We practised conversations regarding various topics (not including reading the text from the test).
v51 Our English teacher did things like nominating one student to
lead the whole class (the student taught/explained some
grammar and vocabulary knowledge to the whole class).
v52 We had interesting learning activities such as performing drama
and having English debates.
Involvement in assessment
v53 We were encouraged by the English teacher to self-assess to identify our strengths and weaknesses in grammar and vocabulary learning (such as checking our own homework).
v54 We were encouraged by the English teacher to assess peers’
performance on grammar and vocabulary tests to give
feedback.
v55 We were encouraged by the English teacher to know and
become familiar with the grammar and vocabulary scoring
rubrics in different assessments (such as learning the scoring
rubrics of grammar for Gap-filling cloze and writing tasks).
v56 We were encouraged by the English teacher to try to design
grammar and vocabulary test items to assess our own
achievement.
v57 We were encouraged by the English teacher to summarise and
reflect on our strengths and weaknesses after taking every
grammar and vocabulary test, including quiz and self-
assessment.
Feedback
In the time leading up to the GVT,
v58 The feedback on my grammar and vocabulary learning from my
English teacher was very frequent.
v59 The feedback on my grammar and vocabulary learning from my
English teacher was very timely.
v60 The feedback on my grammar and vocabulary learning from my
English teacher was very detailed.
v61 The feedback on my grammar and vocabulary learning from the
English teacher helped me to find my weaknesses and make
learning objectives for the next stage.
v62 As long as I acted upon the feedback on the grammar and
vocabulary learning from the English teacher seriously, I could
make progress in future.
v63 I felt very satisfied with the feedback method (such as "marking the assignment after class—focused feedback in the next class—individual feedback after the class").
Learner autonomy
In the time leading up to the GVT, I learned grammar and vocabulary autonomously in the following ways,
v64 I actively asked for my English teacher’s or classmate’s help
when I had any grammar and vocabulary problems.
v65 I actively solved the grammar and vocabulary problems by
myself (e.g., looking up in a dictionary, checking with the test
preparation materials about grammar rules or vocabulary
knowledge, reviewing my previous notes on incorrect test
responses).
v66 I actively tried to take every opportunity to take part in grammar
and vocabulary activities in the class, such as pair and group
discussion, to help me to learn better.
v67 I could make very effective use of my free time to learn English grammar and vocabulary.
v68 In addition to the assignments required by the English teacher, I also attended many extra-curricular activities to practise and learn English grammar and vocabulary (such as practising conversations with classmates).
Test importance
Regarding the test importance of the GVT (MCQ, Cloze, Sentence Completion, Gap-filling cloze), please choose the importance scale that suits you.
v69 Junior high school graduation
v70 Senior high school enrolment
v71 Proving English grammar and vocabulary proficiency
v72 Developing English language use ability
v73 Helping future English learning
Test difficulty
To what extent is the GVT challenging to you?
v74 MCQ
v75 Cloze
v76 Sentence Completion
v77 Gap-filling cloze
Section 3
A Washback Study of the Grammar and Vocabulary Testing in the Senior High School Entrance English Test
Dear student, we are conducting an anonymous survey on the grammar and vocabulary testing in the Senior High School Entrance English Test (covering the four task types of Multiple Choice Question, Cloze, Sentence Completion, and Gap-filling cloze). Its purpose is to understand the Grade 9 English learning situation of the 2018 cohort of junior high school graduates in Chongqing. Please read the following items carefully and answer according to your actual situation and true thoughts. There are no right or wrong answers, and everything you provide …
Appendix J
Table 1
Independent samples t-test results of the paper survey (N=500) and the online survey (N=422)
Table 2
Independent samples t-test results of SHSEET Paper A (N=805) and Paper B (N=115)
Note. Two participants did not respond to the "test paper type" item.
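Tables 1 and 2 report independent samples t-tests comparing survey modes and test-paper versions. As a hedged illustration only (the thesis does not specify the software or variance assumption; the pooled-variance form and the name `independent_t` are assumptions for this sketch), the statistic can be computed as:

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Pooled-variance (Student's) independent-samples t statistic.

    Returns the t value and the degrees of freedom (nx + ny - 2).
    """
    nx, ny = len(x), len(y)
    # Pooled estimate of the common variance across the two groups
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

A p-value would then come from the t distribution with the returned degrees of freedom; a Welch correction (not shown) would be preferable if group variances differ, as they may between the very unequal groups of Paper A (N=805) and Paper B (N=115).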
Appendix K
Construct & Indicator | N | Mean | Mean Std. Error | Std. Deviation | Skewness | Skewness Std. Error | Kurtosis | Kurtosis Std. Error
Negative perception
v1 | 922 | 2.534 | .0347 | 1.0550 | .427 | .081 | -.355 | .161
v2 | 922 | 2.288 | .0354 | 1.0762 | .781 | .081 | -.011 | .161
[intervening rows lost in extraction]
Involvement in assessment
v51 | 922 | 3.146 | .0377 | 1.1454 | -.094 | .081 | -.780 | .161
v52 | 922 | 2.694 | .0381 | 1.1577 | .267 | .081 | -.693 | .161
v53 | 922 | 3.328 | .0366 | 1.1105 | -.264 | .081 | -.636 | .161
v54 | 922 | 2.999 | .0386 | 1.1710 | -.095 | .081 | -.812 | .161
v55 | 922 | 3.656 | .0340 | 1.0326 | -.512 | .081 | -.272 | .161
v56 | 922 | 3.012 | .0397 | 1.2048 | -.083 | .081 | -.845 | .161
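The constant standard errors of skewness (.081) and kurtosis (.161) across all indicators follow from the fact that both depend only on the sample size (N = 922). A minimal sketch, assuming the standard formulas used by common statistical packages (the function names are illustrative):

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample (excess) kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness coefficient."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in x) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# For N = 922 these reproduce the standard errors in the table:
print(round(se_skewness(922), 3))  # 0.081
print(round(se_kurtosis(922), 3))  # 0.161
```

A statistic-to-standard-error ratio well beyond about ±2 flags a distribution departing from normality, which is how such a table is typically read before SEM.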
Appendix L
Table 1
Factor analysis results for the measurable constructs in the main study
Note 18. v19 and v20 were removed from the scale.
Note 19. v26 was removed from the scale.
Note 20. v38, v42 and v43 were removed from the scale.
Note 21. v49, v50, v54, and v56 were removed from the scale.
Note 22. This is a two-factor solution: one factor includes v69 and v70, and the other comprises v71, v72 and v73.
Table 2
Assessment of normality for indicators within the washback mechanism model (N=922)
Table 3
Assessment of normality for indicators within the construct LOA practices (N=488)
Appendix M