Studies in Educational Evaluation 73 (2022) 101140

Contents lists available at ScienceDirect

Studies in Educational Evaluation


journal homepage: www.elsevier.com/locate/stueduc

Evaluating peer feedback as a reliable and valid complementary aid to teacher feedback in EFL writing classrooms: A feedback giver perspective
Wenyan Wu a, Jinyan Huang b, *, 1, Chunwei Han c, Jin Zhang d
a Beijing Institute of Graphic Communication, China
b Jiangsu University, China
c Henan University of Engineering, China
d Xi’an International University, China

ARTICLE INFO

Keywords:
EFL writing classrooms
Peer feedback
Teacher feedback
Holistic scoring
Written comments
Generalizability theory

ABSTRACT

This study evaluated peer feedback as a reliable and valid complementary aid to teacher feedback in Chinese college English writing classrooms. Thirty college students as peer feedback givers and four experienced college English teachers as teacher feedback givers participated in the study. The participants scored 30 English essays holistically and provided written comments on the content, format, language, and organizational aspects of each essay. The results indicated that the reliability of up to three peer feedback givers’ holistic scoring is equivalent to that of one teacher feedback giver’s scoring; further, peers performed almost as well as teachers in making comments on the content and organizational aspects of English essays. Implications for Chinese college students and their English teachers are discussed.

* Correspondence to: The Evidence-based Research Center for Educational Assessment, Jiangsu University, Zhenjiang, Jiangsu Province 212013, China.
E-mail address: huangniagara@hotmail.com (J. Huang).
1 Joint first author.

https://doi.org/10.1016/j.stueduc.2022.101140
Received 10 May 2021; Received in revised form 30 January 2022; Accepted 27 February 2022
Available online 6 March 2022
0191-491X/© 2022 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Research Background

During the past couple of decades, the positive impact of feedback on students’ learning has been well documented (Alqassab, Strijbos, & Ufer, 2017, 2018; Carless, 2015; Carless & Boud, 2018; van Ginkel, Gulikers, Biemans & Mulder, 2017; Hyland & Hyland, 2006; Ruegg, 2015; Yao, Guo, Li & McCampbell, 2020; Xu & Carless, 2017). Feedback involves the learners’ appreciation of the information from a variety of sources and then using it to polish their work (Carless, 2015; Carless & Boud, 2018). Teachers and peers become the major sources of information, and the feedback they provide facilitates students’ effective learning through multiple revisions (Carless & Boud, 2018; Yao et al., 2020). With the mass expansion of higher education coupled with limited resources, the feedback processes are difficult to implement effectively (Evans, 2013). One solution to the problem is to shift from teacher feedback to peer feedback (Boud & Molloy, 2013; Price, Handley, & Millar, 2011) to increase the utilization of feedback. As a result, an increasing number of researchers in higher education focused on applying peer feedback to the instruction of diverse subject areas including science, math, English, and English as a second/foreign language (ESL/EFL); and they often examined the effectiveness of peer feedback in contrast to teacher feedback (Alqassab et al., 2017; Alqassab et al., 2018; van Ginkel et al., 2017; Hyland & Hyland, 2006; Ruegg, 2015; Yu & Lee, 2016a, 2016b, 2016c; Yao et al., 2020).

ESL/EFL researchers stated that teacher feedback plays a dominant role in students’ English writing process (Hyland & Hyland, 2006; Niu & Zhang, 2018). Although students consider teacher feedback to be more useful and reliable for revising drafts, they are also willing to have peer feedback together with teacher feedback to increase the diversity of feedback about their writing (Lei, 2017; Yao et al., 2020). Peer feedback provides a good opportunity for learning and useful information for improvement in the EFL writing classroom for both peer feedback givers and receivers (Berg, Admiraal, & Pilot, 2006; Chang, 2015). By analyzing the peers’ writing, the feedback givers have an opportunity to understand their own process of EFL writing development and get new perspectives about ESL/EFL writing (Yu, 2019; Zimmerman & Kitsantas, 2002). Similarly, the feedback receivers could in turn integrate the peer feedback to improve the capability to detect, diagnose, and remedy their writing problems (Yu & Hu, 2017; Yu & Lee, 2016a).

Furthermore, peer feedback is found to be a helpful complementary aid to teacher feedback in ESL/EFL writing classrooms (Gibbs & Simpson, 2004; Hu & Zhang, 2014; Wang, 2014; Yao et al., 2020). For one
thing, peer feedback boasts some advantages teacher feedback does not have. Unlike teacher feedback, on which students passively depend to improve the quality of the writing, peer feedback activities provide students with an audience (Caulk, 1994), involve students actively in the feedback process, and give them more control and autonomy (Mendoca & Johnson, 1994), which makes writing a more authentic task (Caulk, 1994). In addition, students always believe that their similar learning experiences allow them to better understand each other’s difficulties, which reduces their fear and pressure as writers and reviewers and helps them to develop their self-confidence (Curtis, 2001; Cotterall & Cohen, 2003). For another, peer feedback can solve practical problems in real classroom settings. Given that teacher feedback can sometimes be delayed in large-size classrooms, peer feedback could be its complement or even replacement so that students receive timely feedback for writing improvement (Min, 2006; Zhao, 2010).

Actually, many college English classrooms are large in Chinese higher education; as a result, most English teachers are supposed to teach two or three classes of over 50 students in each class in one semester (Hu & Zhang, 2014). Under this circumstance, it is challenging for classroom teachers to give all the students timely feedback about their English writing. Without such timely feedback, students will be less motivated to develop and improve their writing skills (Gibbs & Simpson, 2004; Hu & Zhang, 2014). Therefore, it is important to evaluate peer feedback to see if it is an effective complementary aid to teacher feedback in college English writing classrooms in China. This study could provide some useful insights into EFL writing classrooms for other countries such as Japan and Iran, which also emphasize EFL learning and where many researchers have examined peer feedback in the writing classrooms (Azarnoosh, 2013; Allen & Mills, 2014; Rahimi, 2013; Ruegg, 2015; Séror, 2011; Tsui & Ng, 2000).

2. Research problem

This study aimed to examine whether Chinese college students are able to provide reliable and valid feedback to their peers’ essay writing as their English teachers often do, what challenges students face in giving feedback to their peers’ essays, and what benefits students perceive they can get from this practice. Given the fact that rubric-based scoring, an important part of writing assessment, is informative about the quality of a piece of writing (Jonsson & Svingby, 2007; Patchan, Schunn, & Clark, 2018), the term “feedback” in this study is operationally defined as both the holistic scores and comments provided by classroom teachers and peers that would lead to the improvement of Chinese college students’ English writing. Holistic scoring is the quantitative evaluation and comments are the qualitative evaluation of a student’s writing (Patchan et al., 2018). The two compose a complete assessment of the student’s writing. A close investigation into both offers adequate evidence about whether peer feedback could be an effective complementary aid to teacher feedback in Chinese college English writing classrooms.

3. Literature review

3.1. Efficacy and benefits of peer feedback

Educational researchers have examined the efficacy and benefits of peer feedback in the domains of mathematics, science, English, and ESL/EFL (e.g., Alqassab et al., 2017; Chang, 2015; Hansen & Liu, 2005; Huisman, Saab, van Driel & van den Broek, 2018; Rahimi, 2013; Wang, 2014; Yu, 2019). For example, Alqassab et al. (2017) examined peer feedback in the domain of mathematics and reported that the feedback given by students with high and medium domain knowledge stimulated their peers to act and reflect on their learning, whereas those low in domain knowledge provided more peer feedback on accuracy of the solution or the content knowledge used to solve the task.

Researchers in the domain of ESL/EFL found that students were able to generate peer feedback on both global and local issues regarding peers’ essays in accordance with a rubric as long as systematic training or teacher modeling was offered (Chang, 2015; Rahimi, 2013; Wang, 2014). Feedback givers’ comments can help feedback receivers to perfect their writing from the first drafts to their final drafts (Hansen & Liu, 2005; Huisman et al., 2018). However, the feedback givers’ improvements in their writing were manifested in all aspects of content, structure, and style (Huisman et al., 2018).

Furthermore, several studies indicated that providing feedback benefited students more than receiving feedback (Cho & MacArthur, 2011; Hislop & Stracke, 2017; Lundstrom & Baker, 2009; Yu, 2019). For example, after analyzing writing samples at the beginning and end of one academic semester, Lundstrom and Baker (2009) found that ESL university students who offered feedback improved their writing more than those who just received feedback. Further, Cho and MacArthur (2011) reported that the students who reviewed, rated, and commented on three sample papers performed better in a subsequent writing task than those who just read the same sample papers without rating and commenting on the papers, indicating the importance of providing feedback. Several years later, Hislop and Stracke (2017) claimed that reading their peers’ essays gave them an opportunity to get new perspectives, raised their awareness of their own writing shortcomings, and developed their writing skills. Similarly, Yu (2019) found that postgraduates, as feedback givers, raised their awareness of the thesis genre, improved their writing skills, and became more skillful learners and more reflective academic writers.

To optimize the efficacy and benefits of peer feedback, factors affecting its effectiveness were explored by many researchers. The following is a brief review of these factors.

3.2. Factors affecting the effectiveness of peer feedback

Many factors are found to affect the effectiveness of peer feedback (e.g., Allen & Mills, 2014; Hu, 2005; Liu & Hansen, 2002; Min, 2006; Tsui & Ng, 2000; Wu, 2019; Zhao, 2010). First, training can have a positive impact on students’ peer feedback performance. Peer feedback is usually a new practice to most students. They are often not confident about their ability to offer feedback on the work of a peer and wish to have more support (Alqassab et al., 2017). In the training process, clarifying the goals and benefits of peer feedback, designing peer feedback guidance sheets, and modeling different types of feedback and making specific suggestions were found to be effective in helping students’ peer feedback performance (Chang, 2015; Hu, 2005; Liu & Hansen, 2002). Notably, peer feedback guidance sheets are expected to involve items used to help students succeed in providing feedback on all aspects of a peer’s work. Besides, error-prone grammar and vocabulary points should be clearly listed in the guidance sheets and elaborately explained through examples since domain knowledge scaffolds can help students provide better peer feedback (Alqassab et al., 2017). Without appropriate training, students might not be able to offer useful feedback or to distinguish useful from useless peer feedback and revise their writing accordingly (Min, 2006).

Second, the cultures in which students learn and grow up affect the effectiveness of peer feedback. Students from teacher-centered cultures tend to believe that their peers are not competent enough to comment on their essays and will doubt their suggestions (Tsui & Ng, 2000; Zhao, 2010). Students from collectivist cultures are more likely to avoid giving critical comments on their peers’ essays to maintain harmonious personal relationships (Nelson & Carson, 1998; Liu & Hansen, 2002). According to Hu and Lam (2010), however, adult Chinese students were able to give and incorporate many helpful suggestions about each other’s writing; furthermore, most of them were willing to have peer suggestions as a valuable type of feedback; and finally, most of them claimed that peer feedback did not conflict with Chinese cultural beliefs and practices. Therefore, it can be an effective pedagogical tool in Chinese writing classrooms.


Third, students with different domain knowledge levels provide peer feedback on different aspects (Alqassab et al., 2017). For instance, students with high and medium domain knowledge on geometric construction provided more peer feedback on related tasks at the self-regulation level (i.e., feedback stimulating peers to act and reflect on their learning), whereas those low in domain knowledge provided more peer feedback at the task level (i.e., feedback on the correctness/incorrectness of the solution or the content knowledge used to solve the task) (Alqassab et al., 2017). When it comes to peer feedback in language studies, ESL/EFL students’ English proficiency levels affect the effectiveness of peer feedback. Research shows that high English proficiency (HEP) students offer more suggestions and are more actively engaged in peer feedback activities than low English proficiency (LEP) students, especially when the writer is poor in English proficiency (Allen & Katayama, 2016; Allen & Mills, 2014; Lundstrom & Baker, 2009). One possible reason is that HEP students tend to play the role of “expert” in the activities while the LEP peers are more likely to take on the role of “novice” (Allen & Katayama, 2016). Besides, LEP students are less capable of giving feedback on their HEP peer’s writing (Allen & Mills, 2014). Wu (2019), nevertheless, discovered that LEP students performed better in peer feedback because they were given enough time and allowed to use the first language (L1). One explanation for this discrepancy may lie in the different implementation of peer feedback adopted in these studies.

Although Allen and Mills (2014) found no significant difference in the type of feedback made by the two proficiency groups, Allen and Katayama (2016) and Wu (2019) revealed that HEP reviewers tended to give more language-related feedback rather than content feedback while LEP reviewers did the opposite. This finding was confirmed by Lundstrom and Baker (2009), who reported that LEP reviewers made more progress in global aspects like content and organization than in local aspects like grammar and vocabulary after one-semester practice of feedback-giving to their peers’ essays.

Fourth, students’ perceptions can impact their engagement in the peer feedback process, which in turn influences the effectiveness of peer feedback activities (Yao et al., 2020). For example, peer feedback provided by 53 preservice mathematics teachers demonstrated that feedback givers’ perceptions of their feedback message positively predicted peer feedback accuracy (Alqassab et al., 2018). Amores (1997) indicated that students who perceived themselves as lower in proficiency than their peers tended to accept their peers’ suggestions uncritically; moreover, students who perceived themselves as higher in proficiency were more likely to give more feedback on their peers’ writing. Hu and Lam (2010) further stated that students who perceived that their English proficiency would affect their peer feedback behaviors offered fewer suggestions for their peers’ essays and incorporated fewer suggestions in their revisions than those who did not perceive so. Yu and Lee (2016b) revealed that LEP students who regarded peer feedback as a useful learning practice felt more motivated to engage themselves in peer feedback activities, and they could also contribute to their group members in the peer feedback process. According to Yu and Lee (2016b), HEP students who viewed peer feedback as an activity only bringing rewards to feedback receivers would be less motivated to engage in the activity seriously than those who viewed peer feedback as a learning activity primarily benefiting feedback givers.

In fact, the effectiveness of peer feedback was not affected by a single factor (Wu, 2019; Yu & Hu, 2017). Yu and Hu (2017) examined two Chinese EFL university students’ peer feedback practice and revealed an abundance of factors affecting the practice, including such individual factors as beliefs in good writing, motives and goals, previous learning and feedback experience, as well as such contextual factors as teacher feedback practice, feedback training, group dynamics (e.g., positive or negative comments, interpersonal relationships), and the learning and feedback culture. This could account for the different findings in the above studies.

Researchers often examine the effectiveness of peer feedback in contrast to teacher feedback (e.g., Lee, 2015; Lei, 2017; Séror, 2011; Yang, Badger, & Yu, 2006). The following section briefly reviews the literature on the effectiveness of peer in contrast to teacher feedback.

3.3. Effectiveness of peer versus teacher feedback

The effectiveness of peer versus teacher feedback could be different as perceived by students (Carson & Nelson, 1994; Chen, 2010; Jacobs, Curtis, Braine & Huang, 1998; Nelson & Carson, 1998; Lee, 2015; Lei, 2017; Yang et al., 2006; Zhang, 1995). For example, Zhang (1995) compared students’ views on peer and teacher feedback and reported that over 90% of his EFL students preferred teacher feedback to peer feedback. His finding was supported by other researchers. For example, Carson and Nelson (1994) and Nelson and Carson (1998) indicated all the participants in their studies favored teacher feedback since they assumed that teachers were more proficient in the English language and thus could offer more helpful feedback than peers. Yang et al. (2006) further confirmed that Chinese EFL students regarded teacher feedback as more important than peer feedback although they acknowledged the importance of peer feedback. Recent studies (Séror, 2011; Lee, 2015; Lei, 2017) demonstrated that students regarded teacher feedback as more useful and reliable than peer feedback for revising their writing, but they were also willing to have peer feedback to increase the amount and diversity of feedback on their writing.

However, Jacobs et al. (1998) found that their EFL students from Hong Kong and Taiwan preferred to have peer feedback on their writing. Similarly, Chen (2010) reported that his Taiwanese master’s students felt quite positive about providing and receiving peer feedback in their English writing. Whether students were forced to make a choice between peer and teacher feedback might explain the different results of the studies above (Jacobs et al., 1998). Peer and teacher feedback should not be mutually exclusive (Jacobs et al., 1998), and students would embrace both types of feedback when they were not asked to make a choice between the two (Tsui & Ng, 2000).

Furthermore, the feedback provided by peers and teachers focuses on different aspects and thus produces diverse effects on the improvement of the work or on their ability. A quasi-experimental study (van Ginkel et al., 2017) examined the effectiveness of peer, teacher, and self-feedback on 144 first-year undergraduate students’ progression in oral presentation skills, knowledge of the presentation, and attitudes towards the presentation. Results demonstrated that teacher feedback encouraged students’ presentation skills, while all the three feedback sources significantly developed students’ knowledge and attitudes towards presentation.

When the revisions made by students in their writing redrafts were investigated after they received peer or teacher feedback, it was found that students incorporated more teacher feedback in their redrafts (Connor & Asenavage, 1994; Paulus, 1999; Tsui & Ng, 2000; Yang et al., 2006; Zhao, 2010). According to Zhao (2010), students integrated more teacher feedback not because they fully understood teacher feedback but because they considered teacher feedback to be more trustworthy than peer feedback. Incorporating feedback without fully understanding it would not facilitate students’ development of writing ability. Instead, the application of L1 enabled students to easily understand peer feedback, and integrating such peer feedback in redrafts could promote students’ writing development. This study highlighted the value of peer feedback (Zhao, 2010). Moreover, students made more meaning-level revisions (i.e., idea-related) after receiving peer feedback and more surface-level revisions (i.e., grammar-related) after receiving teacher feedback (Connor & Asenavage, 1994; Yang et al., 2006). However, they did not clarify whether different types of revisions resulted from different types of feedback.

Recently, Ruegg (2015) conducted a one-year study in her EFL writing classes to examine the relative effects of peer and teacher feedback on students’ writing ability. She divided her students into peer and teacher feedback groups. Pre- and post-writing tests showed that
there was no significant difference between gains for organization, vocabulary, content, or the total essay scores; the teacher feedback group, however, gained significantly more in grammar scores than the peer feedback group. Feedback analysis demonstrated that teacher feedback was significantly more related to meaning-level issues and content. The analysis results of writing tests and feedback seemed contradictory but made sense according to other researchers (e.g., Biber, Nekrasova, & Horn, 2011), who stated that feedback on both content and form was likely to enhance grammatical accuracy, while feedback focusing on language form alone did not.

More recently, Dressler, Chu, Crossman, and Hilman (2019) examined the quantity and quality of uptake of surface-level and meaning-level feedback provided by peers and a teacher on academic writing assignments in an online graduate-level research course. Analyses of the students’ drafts and feedback they received indicated that compared with the teacher, students were more likely to give surface-level feedback while teachers gave more meaning-level feedback. That was mainly because peers viewed the writing first and offered much surface-level feedback, and thus the teacher did not necessarily repeat comments on that aspect. Besides, students incorporated a slightly higher percentage of teacher feedback than peer feedback. Further, students dealt with surface-level feedback a little more often than meaning-level feedback. The reason behind it was that it was easy to incorporate surface-level feedback while for meaning-level feedback, referring to the literature, rewriting a section, or reorganizing the paper (Baker, 2016) might be involved, which was troublesome and challenging. Based on the findings, it was concluded that training should be enhanced so that students were more capable of providing and incorporating meaning-level feedback.

Most recently, Tian and Zhou (2020) investigated five Chinese students’ engagement with automated, peer and teacher feedback in an online EFL writing course and explored the driving force behind their decision-making in integrating these sources of feedback into their redrafts. The analyses of their essay drafts indicated that the automated writing evaluation (AWE) offered the most feedback in total but was least incorporated by the five students; however, the results for the teacher feedback were just the opposite of the automated feedback; and the results for the peer feedback were between them. More specifically, the AWE gave surface-level but not meaning-level feedback; the peers gave twice as much surface-level as meaning-level feedback; however, the teachers gave twice as much meaning-level as surface-level feedback. Further, students incorporated more automated feedback on grammar and mechanics, more peer feedback on the lexical meaning and grammar, but more teacher feedback on the sentence and paragraph levels in their redrafts.

3.4. Research gaps

In view of the studies above, this study is significant in the following ways. First, given the contradictory results of the effectiveness of peer versus teacher feedback and limited research on the effectiveness of peer in contrast to teacher feedback in the EFL writing classrooms, more research is needed to find out on what aspects of an essay EFL students are able to provide feedback to their peers’ writing as their English teachers often do. Unlike previous studies (e.g., Paulus, 1999; Ruegg, 2015; Zhao, 2010), this study would directly compare the effectiveness of peer in contrast to teacher feedback on the same essay simultaneously. Second, given that the existing literature mainly focused on the qualitative comments offered by peers, more studies are needed to investigate the variability and reliability evidence of quantitative peer feedback in terms of holistic scoring since rubric-based scoring is informative about the quality of a piece of writing (Patchan et al., 2018) and thus able to reflect effectiveness of peer versus teacher feedback. Finally, given the large writing classes, uneven development of college students’ English writing skills as well as writing teachers’ hesitation to incorporate peer feedback into their teaching process in China (Yu, 2019; Yu & Lee, 2016b, 2016c), more research is needed to explore ways of involving students in peer feedback tasks effectively in the EFL writing classrooms.

3.5. The present study

This study aimed to examine whether Chinese college students are able to provide reliable and valid feedback to their peers’ essay writing as their English teachers often do. The following five research questions guided the study: 1) What is the variability of the holistic scores assigned to English essays by peer in contrast to teacher feedback givers? 2) What is the reliability of the holistic scores assigned to English essays by peer in contrast to teacher feedback givers? 3) How valid are the written comments on EFL essays made by peer in contrast to teacher feedback givers? 4) What are peer feedback givers’ major challenges in providing feedback on peers’ English essays? And 5) What are peer feedback givers’ perceived benefits of peer feedback in writing English essays? The purpose of the last two questions was to further provide evidence for and explain the results of the peer feedback givers’ written comments.

4. Methodology

4.1. Methodological considerations

Due to the nature of peer feedback on EFL writing, both quantitative and qualitative approaches were adopted in the design. The variability and reliability of holistic scores assigned by peer in contrast to teacher feedback givers were investigated quantitatively by using generalizability (G-) theory (Cronbach, Gleser, Nanda & Rajaratnam, 1972) as a guiding framework. This is because G-theory has become a powerful quantitative approach for detecting the score variability and reliability (Brennan, 2001; Zhao & Huang, 2020; Liu & Huang, 2020; Shavelson & Webb, 1991); further, it provides a comprehensive conceptual and methodological framework for analyzing more than one measurement facet simultaneously in investigations of score variability and reliability (Huang, 2008, 2012; Huang & Foote, 2010; Brennan, 2001).

In addition, the qualitative approach was used to analyze peer and teacher feedback givers’ written comments on the four aspects of content, format, language, and organization of the selected EFL essays as well as peer feedback givers’ responses to the open-ended questions about their major challenges in providing feedback on peers’ English essays and perceived benefits of peer feedback in writing English essays. The purpose of including these qualitative investigations was to provide in-depth information about the validity of peer feedback in the context of writing assessment of the EFL classrooms.

4.2. Research context and participants

The study was conducted at a university in northern China. At the university, non-English majors were placed into three levels (i.e., advanced, intermediate, and elementary) based on their performance in the English language placement test before they started to take the two-year compulsory college English course, with two sessions of 100-minute classes each week for 16 weeks each semester. It is important to note that there was variation in students’ English proficiency within each level.

The non-English major students who participated in this study were in their second semester of the second year and were preparing for the College English Test Band-4 (CET-4), one of the national high-stakes standardized English proficiency tests for non-English major students administered by the Department of Higher Education of the Ministry of Education in China (National College English Testing Committee, 2016). Given the nature of the CET-4, both norm-referenced and criterion-referenced frameworks are evident (Li & He, 2015). One of the teaching objectives in that semester was to help students meet the writing requirement of the CET-4, i.e., being able to write a 120-word
essay or practical writing based on a given outline, picture, situation or title, with ideas accurately expressed, sentences well connected, and no serious grammatical errors. During this specific semester, these students were required to write one CET-4 essay every two or three weeks on an automatic writing assessment platform in China.

Before the deadline, the students could revise their writing as many times as they wanted based on the comments given by the automatic assessment platform (comments are usually on some local issues such as spelling, grammar, and vocabulary). After the deadline, the teacher selected a few students’ writing samples to instruct students on how to organize the structure of an essay, what to include in the introduction, body, and conclusion parts, how to connect paragraphs and sentences with transitional words/sentences, connectives or signal words, how to make the ideas clearer and more convincing, and how to improve grammar and vocabulary accuracy. Students were advised to revise their essays after the teacher’s instruction. Due to the large number of students and their writing samples, it was challenging for the teacher to provide each student with specific comments on each writing sample. Thus, the teacher intended to encourage students to get involved in the feedback providing process to explore the effectiveness of peer feedback provided for their peers’ essays.

For the convenience of the research, the first researcher, a college English teacher at this university, invited all her three intermediate level college English classes (118 students altogether) to write one short English essay (see Appendix A) in a normal class session within the 30 min required by the CET-4. The writing task was the authentic writing item used in the previous CET-4 administration.

In order to better focus on the research objective (i.e., evaluating the reliability and validity of peer feedback as an aid to teacher feedback in the EFL writing classrooms), this study did not involve all 118 students and their essays. Instead, only 30 representative writing samples and 30 peer feedback givers were selected for this study.

4.3. Research procedures

4.3.1. The selection of writing samples
After the 118 essays written by all students were collected, the first researcher, together with other researchers who are experienced in EFL writing assessment, reviewed all the 118 essays and then selected a total of 30 essays for this study. Since the first researcher was also the teacher of the participants in the study and she knew them well, she completed the selection task together with other researchers to avoid the potential bias of being both a teacher and a researcher and also to make sure the essays selected represented a variety of different qualities. Although these 30 essays were similar in length, they were evaluated and carefully selected in terms of the structure and discourse clarity, argument sufficiency, as well as the lexical and grammatical accuracy.

4.3.2. The selection of feedback givers
Students’ writing proficiency was considered in the selection of peer feedback givers. From the three participating intermediate level EFL classes, 30 students with slightly different English writing proficiency levels (as measured by their writing scores in previous writing tasks and final examinations) were invited to serve as peer feedback givers of this study. None of them had experience providing feedback on their peers’ writing. Among them, ten majored in printing and packaging engineering, ten in information engineering, and the remaining ten in business management. Twenty-one of them (70%) were female, and nine (30%) were male. Each peer feedback giver received a financial reward for this feedback activity due to the massive amount of work required. It is worth noting that peer feedback givers did not know whose essays they were giving feedback on since all the writers’ names were replaced by number codes.

Furthermore, the first researcher invited four of her colleagues to serve as teacher feedback givers. All four teachers were female. Three of them had been teaching college English reading and writing in the institution for over 15 years and one for over ten years. All of them had served as official CET-4 writing raters.

4.3.3. The training of feedback givers
The training of the peer and teacher feedback givers was conducted online separately, following the same procedures. The feedback giver training was essential to obtain reliable results. Therefore, both peer and teacher feedback givers were thoroughly trained so that they could apply the same standards to their feedback on the EFL essays (Han & Huang, 2017). The following is a detailed description of the peer feedback givers’ training.

Before they assessed the writing samples independently, all feedback givers had received a three-hour formal training session administered by the first researcher. The purpose of the study was first described to the feedback givers. Further, the CET-4 scoring criteria (see Appendix B) were reviewed and explained by the trainer using CET-4 writing samples of different qualities. Following that, the feedback givers were asked to assess four writing samples independently and then to discuss their feedback outcomes until they had finally reached an agreement. As soon as the training session was completed, all feedback givers were invited to assess the selected 30 essays independently, first assigning a holistic score to each essay, and then providing approximately 100-word written comments in Chinese on each essay.

To avoid low reliability caused by exhaustion, both peer and teacher feedback givers were advised to assess three essays after studying the rating criteria each day. The 30 essays were distributed to peer and teacher feedback givers in three batches of 10, to prevent them from rushing to assess the essays within a limited time before the deadline at the expense of reliability.

4.3.4. Data collection
As shown in Fig. 1, all 30 peer feedback givers and four teacher feedback givers were invited to quantitatively and qualitatively assess the 30 essays independently first, and then peer feedback givers were invited to answer follow-up open-ended questions about their perceptions of peer feedback in the EFL writing classrooms (see Appendix C).

Fig. 1. Data collection process.

The data collection involved two steps. The first was for peer and teacher feedback givers to assign each essay a holistic score on a 15-point scale and to provide approximately 100-word Chinese written comments on the essay. The quantitative feedback resulted in 30 independent samples of EFL essays or EFL papers (student papers, p), scored by 30 peer feedback givers (peer feedback giver, r1) and four teacher feedback givers (teacher feedback giver, r2). The holistic scoring strictly followed the authentic CET-4 scoring criteria (see Appendix B). The written comments were made on the content, format, language, and organization aspects of each essay. The second step was for peer feedback givers to answer the open-ended questions, which were used to examine their challenges in providing feedback on peers’ English essays as well as their


perceived benefits of peer feedback in English writing classrooms.

4.3.5. Data analysis
The G-theory computer program GENOVA (Crick & Brennan, 1983) was used to analyze the quantitative data. Within the G-theory framework, data were analyzed in the following three stages: a) a person²-by-peer feedback giver (p × r1) random effects G-study for peer feedback givers; b) a person-by-teacher feedback giver (p × r2) random effects G-study; c) two person-by-feedback giver random effects D-studies for peer feedback givers (P × R1) and teacher feedback givers (P × R2), respectively. The variance components obtained from these G-studies were used to examine the variability of the holistic scores assigned to the 30 EFL essays. The D-study results were used to examine the reliability of the writing scores assigned to these 30 EFL essays by peer in contrast to teacher feedback givers.

It is important to mention that many other facets could also contribute to the score variation, e.g., essay quality, peer feedback giver’s English writing proficiency level, gender, major, and their interactions. Since the focus of this study’s G-theory analyses was to compare score variability and reliability between peer and teacher feedback givers as raters, the rater facet was the facet of primary interest.

Further, qualitative written comments were first coded and sorted, then organized, and finally grouped and categorized according to the recurring themes (Creswell, 2014). Specifically, the written comments were coded by the first, third, and fourth researchers independently as valid or invalid. Valid comments were expected to improve the quality of the writing once incorporated. With the rating criteria of the CET-4 writing item for reference, a four-level coding scheme was used to code the comments on content, format, language, and organization. Each of the four levels is further divided into several detailed facets. In the case of problematic coding, the three coders negotiated with each other until an agreement was reached.

After the coding process, the data were put into Excel to calculate the total number of recorded comments and valid comments on each facet assigned to the 30 essays by each peer and teacher feedback giver. Then descriptive analyses were conducted to examine the quantitative differences in the written comments on each level between peer and teacher feedback givers. Further, the quantity of valid written comments on each level given by peer and teacher feedback givers, respectively, was compared with the number of written comments agreed by the four researchers. The researchers’ agreed comments were considered the gold standard because the researchers have a similar background to the four teacher participants; more importantly, they had worked together to make comments on each essay and reached an agreement in terms of its strengths and weaknesses. It is assumed that the researchers’ group work would have been more valid than the four teacher participants’ individual work. Further, sample essays with written comments were analyzed for similarities and differences between peer and teacher feedback givers.

Finally, peer feedback givers’ responses to follow-up open-ended questions were analyzed in a similar manner. The data were entered into an Excel spreadsheet by the four researchers who have rich experience in qualitative data analysis including coding. After that, they independently color-coded the responses under each open-ended question and sorted them into different categories, and then grouped and categorized the conceptually similar responses collaboratively according to the recurring themes (Creswell, 2014).

² Since each person (i.e., p, the object of measurement) in this study completed only one writing task (i.e., essay), it was impossible to consider task as a facet of analysis in these G-studies.

5. Results

5.1. RQ1: what is the variability of the holistic scores assigned to English essays by peer in contrast to teacher feedback givers?

To answer the first research question, the following two G-studies were conducted: a) a person-by-peer feedback giver (p × r1) random effects G-study for peer feedback givers; and b) a person-by-teacher feedback giver (p × r2) random effects G-study for teacher feedback givers. Each G-study yielded the following three variance components: person (p), feedback giver (r), and person-by-feedback giver (pr). The results are presented in Table 1.

As presented in Table 1, the results for peer feedback givers show that the residual yielded the largest variance component (44.56% of the total variance). The residual contains the variability due to the interaction between peer feedback givers, person, and other unexplained systematic and unsystematic sources of error. The person (p), the object of measurement, yielded the second largest variance component (43.23% of the total variance), suggesting that the 30 college students were quite different in their writing abilities as measured by the writing task. The feedback giver (r1) yielded the third largest variance component (12.21% of the total variance), which demonstrates that the 30 peer feedback givers differed greatly from one another in terms of leniency of scoring the 30 EFL essays.

Also as shown in Table 1, for teacher feedback givers the person (p), the object of measurement, yielded the largest variance component (64.85% of the total variance), indicating that the 30 college students were extremely different in their writing abilities. The residual yielded the second largest variance component (28.43% of the total variance). Again, the residual contains the variability due to the interaction between person, feedback givers, and other unexplained systematic and unsystematic sources of error. Teacher feedback giver (r2) yielded the smallest variance component (6.72% of the total variance), suggesting that the four teacher feedback givers were only slightly different in terms of their leniency of scoring these 30 essays.

5.2. RQ2: what is the reliability of the holistic scores assigned to English essays by peer in contrast to teacher feedback givers?

To answer the second research question, the following two D-studies were conducted: a) a person-by-peer feedback giver (P × R1) random effects D-study for peer feedback givers; and b) a person-by-teacher feedback giver (P × R2) random effects D-study for teacher feedback givers. Given both the norm-referenced and criterion-referenced nature of the CET-4, both G- and Phi-coefficients were provided. The results of both D-studies are presented in Table 2.

As shown in Table 2, the G- and Phi-coefficients obtained for the 30 essays for the current 30 peer feedback givers were .97 and .96, respectively. In a real classroom assessment context, where it is possible to have two to four peer feedback givers involved, the G- and Phi-coefficients can then range from .60 to .80, suggesting acceptable reliability coefficients.

Table 1
Variance components for random effects p × r1 and p × r2 G-studies.

Feedback giver group      Source of variability   df    σ²       %
Peer feedback givers      pᵃ                      29    2.1001   43.23
                          r1                      29    0.5928   12.21
                          pr1                     841   2.1648   44.56
                          Total                   899   4.8577   100
Teacher feedback givers   pᵃ                      29    3.5718   64.85
                          r2                      3     0.3703   6.72
                          pr2                     87    1.5658   28.43
                          Total                   119   5.5079   100.00

ᵃ Since each person (i.e., p, the object of measurement) in this study completed only one writing task (i.e., essay), it was impossible to consider task as a facet of analysis in these G-studies.
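
As an illustration of how variance components such as those in Table 1 can be estimated for a fully crossed p × r design, the following Python sketch applies the textbook ANOVA expected-mean-square estimators for a two-way random effects model with one observation per cell. It is not the GENOVA implementation used in this study; the function names and the small score matrix are hypothetical, and only the Table 1 variance components are taken from the study. The percentages computed from those components agree with Table 1 to within rounding.

```python
def g_study_p_by_r(scores):
    """Estimate variance components for a fully crossed p x r design.

    scores[i][j] is the holistic score given to essay (person) i by
    feedback giver (rater) j. Returns (var_p, var_r, var_pr), where
    var_pr is the residual (interaction confounded with error).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(col) / n_p for col in zip(*scores)]

    # Sums of squares for persons, raters, and the residual.
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_total - ss_p - ss_r

    # Mean squares; degrees of freedom are n_p - 1, n_r - 1, and their product.
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_pr = ms_pr
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    return var_p, var_r, var_pr


def percent_of_total(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components)
    return [100 * c / total for c in components]


# Hypothetical toy data: 3 essays scored by 2 raters (not study data).
toy_scores = [[5, 7], [7, 9], [9, 11]]
print(g_study_p_by_r(toy_scores))

# Table 1 components for peer feedback givers: p, r1, pr1.
peer_pct = percent_of_total([2.1001, 0.5928, 2.1648])
print([round(x, 2) for x in peer_pct])
```

With 30 essays and 30 peer feedback givers, the degrees of freedom used in this calculation (29, 29, and 841) match those listed in Table 1.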


Table 2
Summary of G- and Phi-coefficients for peer and teacher feedback givers.

                          Peer feedback givers                                  Teacher feedback givers
Number of peer            G-coefficients   Phi-coefficients   Number of teacher   G-coefficients   Phi-coefficients
feedback givers                                               feedback givers
1                         .50              .43                1                   .70              .65
2                         .66              .60                2                   .82              .79
3                         .74              .70                3                   .87              .85
4                         .80              .75                4                   .90              .88
5                         .83              .79                5                   .91              .90
6                         .85              .82                6                   .93              .92
7                         .87              .84                7                   .94              .93
8                         .89              .86                8                   .95              .94
9                         .90              .87                9                   .95              .94
10                        .91              .88                10                  .96              .95
11                        .91              .89                11                  .96              .95
13                        .93              .91                12                  .96              .96
15                        .94              .92                13                  .97              .96
17                        .94              .93                14                  .97              .96
19                        .95              .93                15                  .97              .97
21                        .95              .94                16                  .97              .97
23                        .96              .95                17                  .97              .97
25                        .96              .95                18                  .98              .97
27                        .96              .95                19                  .98              .97
30                        .97              .96                20                  .98              .97
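
The coefficients in Table 2 can be approximately reproduced from the variance components in Table 1 using the standard D-study formulas for a p × R design, in which the relative error variance is σ²(pr)/n and the absolute error variance is [σ²(r) + σ²(pr)]/n for n feedback givers. The Python sketch below is a minimal illustration under that assumption rather than the GENOVA procedure itself; the function names are hypothetical, and results may differ from Table 2 in the second decimal place because of rounding.

```python
# Variance components (p, r, pr) taken from Table 1.
PEER = (2.1001, 0.5928, 2.1648)
TEACHER = (3.5718, 0.3703, 1.5658)


def d_study(var_p, var_r, var_pr, n_raters):
    """Return (G-coefficient, Phi-coefficient) for n_raters feedback givers."""
    rel_error = var_pr / n_raters            # relative error variance
    abs_error = (var_r + var_pr) / n_raters  # absolute error variance
    g = var_p / (var_p + rel_error)          # norm-referenced reliability
    phi = var_p / (var_p + abs_error)        # criterion-referenced dependability
    return g, phi


def min_raters_to_match(components, target_g, target_phi, max_raters=30):
    """Smallest number of raters whose G and Phi both reach the targets."""
    for n in range(1, max_raters + 1):
        g, phi = d_study(*components, n)
        if g >= target_g and phi >= target_phi:
            return n
    return None


for n in (1, 2, 3, 4):
    g, phi = d_study(*PEER, n)
    print(n, round(g, 2), round(phi, 2))

# How many peer feedback givers are needed to match one teacher?
one_teacher = d_study(*TEACHER, 1)
print(min_raters_to_match(PEER, *one_teacher))
```

Under these assumptions, three peer feedback givers is the smallest group whose G- and Phi-coefficients both reach those of a single teacher feedback giver.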

Also as shown in Table 2, the G- and Phi-coefficients obtained for the 30 essays for the current four teacher feedback givers were .90 and .88, respectively. In a real classroom assessment context, where it is possible to have only one teacher feedback giver, the G- and Phi-coefficients are .70 and .65, respectively, suggesting barely acceptable reliability coefficients in the context of classroom assessment.

5.3. RQ3: how valid are the written comments on EFL essays made by peer in contrast to teacher feedback givers?

To answer the third question, the comments by both groups were analyzed both quantitatively and qualitatively. The quantitative results are presented first, followed by the qualitative results. Table 3 presents the descriptive statistics (i.e., means and standard deviations) of the total number of recorded comments and valid comments on the 30 EFL essays made by peer and teacher feedback givers. It further includes the percentage of valid comments as well as the percentage of expected valid comments made by the two feedback giver groups.

As shown in Table 3, with all the comments of the 30 essays counted altogether, teacher feedback givers made more comments than peer feedback givers on content (teacher mean = 63, teacher standard deviation = 19.24; peer mean = 52.2, peer standard deviation = 14.77), language (teacher mean = 192.25, teacher standard deviation = 94.5; peer mean = 99.4, peer standard deviation = 41.87), and organization (teacher mean = 134.25, teacher standard deviation = 17.25; peer mean = 117.57, peer standard deviation = 12.57). However, peer feedback givers made more comments than teacher feedback givers on format (peer mean = 14.93, peer standard deviation = 10.59; teacher mean = 13.75, teacher standard deviation = 12.97). Overall, the descriptive statistical results indicate that comments made on content, format, and organization did not vary substantially between teacher and peer feedback givers, whereas teacher feedback givers made substantially more comments on language than peer feedback givers.

As also shown in Table 3, the comments made on content, format, language, and organization by teacher feedback givers were 100% valid, but the comments made on the same four areas by peer feedback givers were 92.54%, 86.2%, 76.56%, and 93.17% valid, respectively. The percentage of valid comments over the total number of expected comments agreed by the four researchers of this study demonstrated that

Table 3
Quantitative results of the written comments.

Total number of      Feedback giver group      N    Total number of     Total number of      Percentage of      Percentage of expected
expected commentsᵃ                                  commentsᵇ           valid commentsᶜ      valid commentsᵈ    valid commentsᵉ
                                                    Mean      SD        Mean      SD         (%)                (%)
Content (183)        Teacher feedback givers   4    63.00     19.24     63.00     19.24      100                34.43
Format (94)          Peer feedback givers      30   14.93     10.59     12.87     9.83       86.20              13.69
                     Teacher feedback givers   4    13.75     12.97     13.75     12.97      100                14.63
Language (802)       Peer feedback givers      30   99.40     41.87     76.10     35.92      76.56              9.49
                     Teacher feedback givers   4    192.25    94.50     192.25    94.50      100                23.97
Organization (269)   Peer feedback givers      30   117.57    12.57     109.54    13.57      93.17              40.72
                     Teacher feedback givers   4    134.25    17.25     134.25    17.25      100                49.91

ᵃ Total number of expected comments refers to the total number of comments in the four areas of content, format, language, and organization which should be identified in the 30 EFL essays as agreed by the four researchers of this study.
ᵇ Total number of comments refers to the total number of comments on the 30 EFL essays made by peer and teacher feedback givers.
ᶜ Total number of valid comments refers to the total number of comments made by peer and teacher feedback givers on the 30 EFL essays which were valid.
ᵈ Percentage of valid comments refers to the percentage of valid comments among the total number of comments made by peer and teacher feedback givers.
ᵉ Percentage of expected valid comments refers to the percentage of valid comments among the total number of expected comments agreed by the four researchers of this study.
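
The two percentage columns in Table 3 follow from the mean counts and the expected totals, as the table notes describe. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical, and the inputs are the Table 3 values for peer feedback givers.

```python
def comment_validity(total_mean, valid_mean, expected_total):
    """Return (percentage of valid comments, percentage of expected valid comments).

    total_mean:     mean number of comments made per feedback giver
    valid_mean:     mean number of those comments judged valid
    expected_total: number of expected comments agreed by the researchers
    """
    pct_valid = 100 * valid_mean / total_mean
    pct_expected_valid = 100 * valid_mean / expected_total
    return pct_valid, pct_expected_valid


# Peer feedback givers, values from Table 3: (total mean, valid mean, expected total).
peer_rows = {
    "Format": (14.93, 12.87, 94),
    "Language": (99.40, 76.10, 802),
    "Organization": (117.57, 109.54, 269),
}

for aspect, row in peer_rows.items():
    pct_valid, pct_expected = comment_validity(*row)
    print(aspect, round(pct_valid, 2), round(pct_expected, 2))
```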


both teacher and peer feedback covered only part of the problems of the essays. Offering 100-word feedback in a separate paragraph instead of directly in the essays might limit the feedback to comments on more important problems. Still, there were some differences between teacher and peer feedback givers in the areas of content (teacher percentage = 34.43%; peer percentage = 26.4%), format (teacher percentage = 14.63%; peer percentage = 13.69%), language (teacher percentage = 23.97%; peer percentage = 9.49%), and organization (teacher percentage = 49.91%; peer percentage = 40.72%).

Qualitative results of the written comments on content, format, language, and organization are presented in the following section. The comments on sample essays by both feedback giver groups were compared with the comments agreed by the four researchers to demonstrate the similarities and differences in the comments made by the peer and teacher feedback givers on these four aspects of the essay. The sample essays (or parts of the sample essays) were selected randomly for the presentation of the typical problems in the aspects of content, format, language, or organization.

First, comments on content involved whether the essay was pertinent to the topic, whether adequate arguments were provided to support the thesis statement or the topic sentence in the corresponding paragraph, and whether the ideas were clearly expressed. For example, comments on essay Sample #8 (see below) agreed by the four researchers included that the argument in the fourth paragraph failed to support its topic sentence, and then a suggestion was offered to either change the topic sentence or modify the details to ensure the consistency of the paragraph. The results indicated that only one teacher and one peer feedback giver pointed it out. Surprisingly, almost half of the peer feedback givers praised the essay by saying “adequate/well-founded arguments”.

(Essay Sample #8)

Furthermore, the underlined sentences and phrases are not clearly expressed in this essay, which were identified and suggested for modification by the four researchers. The results showed that all four teacher feedback givers provided valid feedback. They either offered general comments such as “too many invalid expressions lead to unclear ideas,” “what do you mean by…? What does … sentence mean?” to get the writer’s attention to the problem, or offered suggestions for modification, or directly modified them. Only four peer feedback givers, however, identified some of the problems and then commented “the idea in paragraph…is not clear/is hard to understand,” “the ideas in the essay are not clearly expressed,” and “some sentences in the essay are not easy to understand.” Apparently, peer feedback givers were more implicit in pointing out idea-expression problems, and they did not offer suggestions for modifications. Even worse, many peer feedback givers praised this essay for conveying clear ideas.

Second, comments on format included the spelling, capitalization, and punctuation errors identified in the essay. For example, comments on essay Sample #29 (see below) agreed by the four researchers included three underlined punctuation errors, one spelling error, and two capitalization errors in the essay as well as the corrections of these errors. The results showed that two teacher feedback givers identified and corrected all punctuation errors, but no teacher feedback giver commented on the capitalization and spelling errors. In contrast, about ten peer feedback givers identified and corrected the spelling and capitalization errors, but only four peer feedback givers identified and corrected one of the punctuation errors.

(Essay Sample #29)

Third, comments on language mainly included both vocabulary and grammatical errors. For example, comments on essay Sample #18 (see below) agreed by the four researchers included nine identified vocabulary errors (circled and numbered from 1 to 9) and seven grammatical errors (underlined and numbered from A to G) as well as their suggestions for modification. The results showed that the first teacher feedback giver just pointed out vocabulary errors #1 and #2 as well as the grammar error #B without offering suggestions for modification; the second teacher feedback giver pointed out and corrected three vocabulary (i.e., #1, #5, and #9) and two grammatical (i.e., #B and #C) errors; the third teacher pointed out and corrected all the errors underlined except vocabulary error #4; and the fourth teacher identified and corrected all the errors underlined but grammatical error #F.

(Essay Sample #18)

In contrast, for an EFL essay with so many language errors, three peer feedback givers praised it for diverse sentence patterns and vocabularies in their written comments. Further, a few peer feedback givers made no


comments on language at all. Most peer feedback givers identified only one to three vocabulary or grammatical errors highlighted in the essay, with vocabulary errors #2 and #5 as well as grammatical errors #A, #D, #E, and #G most frequently identified and corrected.

Finally, comments on organization included sentence coherence, paragraph organization, and passage structure. The underlined sentences and circled connective phrases in Samples # 24 and # 27 (see below) were judged incoherent in the comments by the four researchers. Then very explicit suggestions for modifications were provided. For instance, in Sample #24 “so” was suggested to be replaced by “because”, turning the sentence into a subordinate clause to the former sentence. For the underlined sentences in Sample #27, it was suggested to move the sentence “I am more like a child now that I afraid everything new” after the second sentence of this paragraph and apply connective words like “so…that” or “and” to make it clearer and more concise. Also, it was suggested that one of the circled connection phrases be modified to make them parallel.

Most teacher and peer raters praised sample # 24 for its fluent, coherent and logical sentences, ignoring the problem underlined above. Only one teacher and one peer rater pointed out the problem in sample # 24, yet without giving any modification suggestions. As for sample # 27, three teacher and 11 peer raters pointed out the unparallel connection phrases and offered ways to modify them; two teacher and four peer raters made general comments like “influent sentences of the essay” and “lack of connection words between sentences”; only one teacher and two peer raters identified the specific problem in the underlined sentences and suggested changing the sentence order and adding connection words to make it more coherent. Awkwardly, nine peer raters praised the essay by saying “the sentences in the essay were fluent and logical”.

(An Excerpt from Essay Sample # 24)

(An Excerpt from Essay Sample # 27)

The four researchers agreed that the essay structure of sample #9 (see below) was good for its consistency, with the conclusion corresponding to the thesis statement; however, the essay was not well-organized. Specifically, the introduction included some arguments that should have been in the body of the essay, and the conclusion was too


short to sum up the key ideas. The four researchers also offered several specific suggestions for the reorganization of the essay. The results indicated that almost all teacher and peer feedback givers pointed out the problems in passage structure and paragraph organization, but only one teacher and 13 peer feedback givers provided one or two suggestions for the improvement of the essay organization.

(Essay Sample #9)

5.4. RQs 4 & 5: what are peer feedback givers’ major challenges in providing feedback on peers’ English essays? And what are their perceived benefits of peer feedback in writing English essays?

To answer the fourth and fifth questions, all 30 peer feedback givers responded to six open-ended questions. Their responses to the first few open-ended questions were used to further corroborate the qualitative analysis results, i.e., they are not proficient in providing feedback on language aspects; their responses to the remaining open-ended questions were used to further explain their perceptions about the benefits of peer feedback as well as their attitudes towards peer feedback, which would increase the validity of the written comments provided by the peer feedback givers and further suggest that peer written comments could be practiced in the real EFL classroom setting. The findings are as follows.

Among the 30 peer feedback givers, 28 of them (93.3%) found it easy to offer feedback on the content and organization aspects of an essay. They could easily confirm whether an essay was pertinent to the topic and the ideas were clear by reading the thesis statement and conclusion as well as the major points in the body part. In addition, they could easily find out whether an essay was well-organized or not by checking the signal words of each paragraph. These results were generally consistent with qualitative findings of the written comments. Finally, 15 peer feedback givers (50%) mentioned that they could easily make comments on idea-expression by examining whether the sentences were fluent or not. However, qualitative analysis of the written comments demonstrated that only a very few peer feedback givers were able to properly identify problems on this aspect. Apparently, these peer feedback givers overestimated their ability to give feedback on this aspect.

However, 27 (90%) and 12 (40%) peer feedback givers felt they had difficulty in providing feedback on grammar and vocabulary, respectively. They reported challenges in identifying the grammatical and vocabulary (e.g., some Chinglish expressions) errors in an essay; furthermore, even when they could identify these errors, they were not sure how to modify them. These findings were also consistent with qualitative findings of the written comments.

More importantly, 29 peer feedback givers (96.7%) reported that they had tried their best to give feedback on their peers’ essays in order to overcome these challenges. Whenever they encountered difficulties in offering feedback, they looked up dictionaries and English grammar books, or searched the Internet, or even asked friends, classmates, and teachers for assistance. They believed that they had made the greatest effort to give their peers more useful feedback.

When asked whether their feedback could help peers improve the quality of their English essays, 27 peer feedback givers (90%) made a positive response and three (10%) made a negative response. The three peer feedback givers who made a negative response admitted that they were poor in English writing, which had prevented them from giving quality feedback on their peers’ essays. One peer feedback giver who made a positive response commented that “my feedback could only help peers to improve poor-quality essays but not the good-quality essays.” Another peer feedback giver who made a positive response added that “even if my feedback is invalid, it can be a warning to the peer writer, who will avoid making a similar error in the future.”

All peer feedback givers believed that they had benefitted from offering feedback on their peers’ essays. Their perceived benefits included a) the errors their peers made could be a reminder to them in future English essay writing practices; b) seeking assistance from all kinds of channels could help further enhance their English proficiency; c) their awareness of the importance of essay organization was strengthened through peer feedback activities; and d) their ability to write essays with grammatical language and sufficient content could also be improved.

Taking these benefits into consideration, 26 peer feedback givers (86.7%) expressed their willingness to have peer feedback as a routine activity in their future English writing classrooms. Some of them further suggested that the peer feedback activity be practiced in small groups to provide effective feedback to those peers with relatively poor English writing skills. Further, other peer feedback givers recommended that peer feedback should be treated as a complement to teacher feedback in the college English writing classrooms. However, four peer feedback givers (13.3%) objected to incorporating it in their future English writing classes because they thought that the peer feedback was time-consuming although it did yield benefits for feedback givers and receivers as well.

6. Discussion

6.1. The variability of the holistic scores

The first research question attempted to examine the variability of the holistic scores assigned by peer in contrast to teacher feedback givers. The score variation did exist between the two feedback giver groups. The desired variance associated with the object of measurement


for the teacher feedback givers was considerably larger than that for the peer feedback givers. Furthermore, the undesired feedback giver and residual variance components for the peer feedback givers were almost twice as large as those for the teacher feedback givers. These results suggested that there was considerably more score variation among peer feedback givers than among teacher feedback givers. A possible reason is that teacher feedback givers are definitely more experienced in scoring EFL essays than the peer feedback givers; further, it was the first time for all peer feedback givers to score EFL essays following the CET-4 writing. Further study is needed to find out whether more practice will be effective in narrowing the score variation between peer and teacher feedback givers.

6.2. The reliability of the holistic scores

The second question attempted to examine the reliability of the holistic scores assigned to the 30 EFL essays by peer in contrast to teacher feedback givers. The results suggested that the G- and Phi-coefficients obtained for the 30 essays for one teacher feedback giver in the real classroom assessment context were .70 and .65, respectively. Such reliability coefficients could be easily achieved if the number of peer feedback givers is increased to three. These results indicate that small-group quantitative peer feedback in Chinese college English writing classrooms could be as reliable as teacher’s feedback. In other words, four students are expected to be involved in one group to perform peer feedback activities in a real classroom setting, with each essay given feedback by three peers, which is practicable. This was inconsistent with what other researchers suggested (e.g., Mittan, 1989; Paulus, 1999). They claimed that pairs were more likely to discuss their writing in a more intensive way. This deserves further investigation given that the conclusion of the current study was based on a quantitative analysis, and no discussion of the written essays was involved in this study. Despite a summative purpose, being able to offer a reliable rubric-based holistic scoring does demonstrate that students have a good understanding of the criteria, which would help them in giving quality written comments to their peers’ essays and benefit them greatly in writing their own essays.

6.3. The validity of the written comments by peer and teacher feedback givers

The third research question aimed to examine the validity of the written comments made by peer in contrast to teacher feedback givers. The quantitative results suggested that all the written comments made by teacher feedback givers were valid and appropriate, whereas some of the written comments made by peer feedback givers were invalid and inappropriate. With this taken into consideration, the total number of correct comments made by teacher feedback givers on all four aspects of content, format, language, and organization exceeded the total number of valid comments made by peer feedback givers, especially on the language aspect of EFL essays. This did make sense because teacher feedback givers are more proficient in English; besides, considering that giving feedback to 30 peers’ essays was a “time-consuming” task to some peer feedback givers, both the quantity and quality of the peer feedback could have been negatively influenced.

Surprisingly, this finding was different from the result of the study by Tian and Zhou (2020). The latter found that peers offered more feedback than the teacher. The difference might be caused by the different research designs. In the present study, teachers and peers gave feedback on the essays simultaneously. In the study by Tian and Zhou (2020), nevertheless, the teacher provided feedback after the students had revised their essays based on their peers’ feedback, and thus the teacher might not find many problems with these essays. Besides, peer feedback givers in the current study were expected to provide written comments on 30 essays over a period of ten days while the participants in the study by Tian and Zhou (2020) were assigned into peer feedback pairs and each was supposed to offer feedback to one piece of essay after each writing task. Heavy workload might inhibit peers from offering sufficient feedback on the essays. Further, peers’ proficiency might also play a role. The participants in the study by Tian and Zhou (2020) were from a privileged university and had passed CET-4, so they were more proficient than the participants in the present study. High English proficiency (HEP) students did offer more feedback than low English proficiency (LEP) students (Allen & Katayama, 2016; Allen & Mills, 2014; Lundstrom & Baker, 2009).

Furthermore, peer feedback givers generally performed almost as well as teacher feedback givers in making comments on most of the content and organization aspects of EFL essays, with more than 92% being valid, suggesting that most of the comments on content and organization made by peer feedback givers were useful and effective. Peer feedback givers’ comments on format and language, however, were comparatively lower in quality, with 86.20% and 76.57% being valid, respectively. These results were similar to the study by Caulk (1994), who found that 89% of the comments made by the students were valid. What should be noted is that although teachers offered more valid written comments than peers, how much feedback receivers could learn from each source of comments deserves further investigation. Zhao (2010) found that students accepted teachers’ feedback without fully understanding while the application of L1 and similar learning experience allowed students to easily understand peer feedback. In view of this, feedback receivers benefited more from peer feedback, which would be more likely to promote students’ writing development.

Despite the lower percentage of valid feedback on format and language, students performed relatively well in offering feedback to their peers’ essays. One probable reason is that students were trained to provide feedback on their peers’ essays by checking a feedback guidance sheet, which clearly listed and exemplified the aspects of the essays to be commented on. This type of training or scaffold can help students perform peer feedback better (Alqassab et al., 2017; Min, 2006). Another possible reason is that peers were paid to assess the essays (despite a small pay), which might have motivated them to perform the task more conscientiously. Extrinsic stimuli like compliments, extra scores, and financial rewards do play a role in motivating students to perform well in their schoolwork, particularly when it is the first time for them to do that work and no intrinsic motivation is developed (Cameron and Pierce, 1994; Lepper, Keavney, & Drake, 1996).

The findings above were also partly consistent with earlier research which showed that teacher feedback givers gave more feedback on grammar and vocabulary than peer feedback givers while peer feedback givers tended to provide overall comments or make comments on global issues like organization (Chen, 2010); but it conflicted with the finding by Ruegg (2015), Dressler et al. (2019), and Tian and Zhou (2020), who stated that teacher feedback givers provided more feedback on meaning-level issues.

Such a difference could be attributed to students’ different language proficiencies. The students in the study by Dressler et al. (2019) were graduates, and the students in the study by Tian and Zhou (2020) were from a privileged university and had passed CET-4 before the study. Thus, they might have a higher English proficiency, and there were few grammar and vocabulary errors worthy of the teacher’s attention. In Ruegg’s (2015) study, the students were English major sophomores. Having spent more time learning English, they were more proficient in English and less likely to make grammar and vocabulary errors in their writing, which resulted in less feedback on this aspect. Another possible reason was whether the teacher shared the same L1 with the students. In the current study, the teachers shared the same L1 with the students while it was not the case in Ruegg’s (2015) study. The same L1 could reduce difficulties in meaning-level comprehensibility (Ruegg, 2015); thus, the teachers made no more feedback on this aspect than peers in this study.

Another noticeable point that might have an impact on the kind of feedback given is the format required to provide feedback. In the current
study, the feedback was given in written paragraphs rather than track changes in the essays directly. If feedback givers could offer feedback directly in the essays, they might be more inclined to correct each mistake. However, they were asked to provide 100-word written comments on a separate sheet in the study, which confined their feedback to more important aspects.

According to the qualitative analysis of the written comments, peer feedback givers had more trouble identifying problems in sentence coherence, idea-expression, vocabulary, grammar, and punctuation. Also, some peer feedback givers sometimes misidentified errors in these aspects. Compared with the teachers, students were less qualified to give valid comments on language aspects (Ruegg, 2015). These problems boiled down to peer feedback givers’ insufficient mastery of English grammatical rules and vocabulary knowledge. Due to the L1 influence, they were not capable of distinguishing words of the same root, like “graduation versus graduate” and “suit versus suitable” in the samples above; they did not pay attention to diverse verb-noun collocations; they expressed their ideas in Chinese word order; they frequently used commas. In view of this, teachers could be advised to provide feedback on sentence coherence, idea-expression, and grammar (or give students more help in giving feedback on these aspects), while peers provide feedback on organization and format, which is in line with Ruegg (2015).

6.4. The challenges and perceived benefits reported by peer feedback givers

The fourth and fifth research questions asked about peer feedback givers’ major challenges in providing feedback on peers’ English essays and their perceived benefits of peer feedback in writing English essays, respectively. They had major challenges in first identifying the grammatical and vocabulary errors in an essay and then providing the writer with appropriate suggestions for modification, which was consistent with previous studies (Chen, 2010; Connor & Asenavage, 1994; Yang et al., 2006) and was also supported by the qualitative analyses of peers’ written comments. Furthermore, almost all of the peer feedback givers claimed that they sought assistance from the Internet, dictionary, or grammar books to make sure their feedback on their peers’ essays was as valid as possible. As a result, they perceived that their feedback could help their peers improve essays to some extent regardless of their imperfection in providing revision suggestions on the language problems. More importantly, they agreed that offering feedback on their peers’ essays could also help them enhance their reader awareness, involving them in self-reflection on and regulation of their own writing, and thus improve their ability to write English essays with grammatical language, convincing content, and good organization. This was also supported by other studies (Berg et al., 2006; Chang, 2015; Wang, 2014; Yao et al., 2020; Yu, 2019; Yu & Hu, 2017; Zimmerman & Kitsantas, 2002).

Based on these benefits, peer feedback givers stated that they were willing to have peer feedback as a routine activity of their English writing class as long as the number of essays assessed was no more than three or four after each writing task. Such a finding echoed the result of the quantitative analyses of holistic scores, which suggested that the scores given to the same essay by three peer feedback givers and the teacher would be equally reliable. Previous studies also demonstrated that students were willing to have peer feedback to increase the amount and diversity of feedback on their writing (Lee, 2015; Lei, 2017; Séror, 2011).

7. Limitations

This study was limited in the following six ways. These limitations may limit the generalization of the findings to the larger context of feedback activities in the college English writing classrooms across the country. First, the student participants only represented intermediate proficiency level peer feedback givers. As much research suggested (Allen & Mills, 2014; Allen & Katayama, 2016; Lundstrom & Baker, 2009; Wu, 2019; Yu & Hu, 2017; Yu & Lee, 2016b), peer feedback effectiveness is influenced by peers’ proficiency. The results would be different if students of advanced proficiency level were involved.

Second, the participants were from a single college and taught by the same English teacher. The results might be different if participants from different colleges or taught by different teachers were engaged in the study. Research indicated that the cultures in which EFL students learn did play a role in their peer feedback practice (Tsui & Ng, 2000).

Third, feedback receivers’ views towards the effectiveness of peer and teacher feedback and their incorporation of the feedback into their revisions were not considered in the study. These are important indicators of the effectiveness of peer feedback in contrast to teacher feedback, especially considering the fact that incorporating feedback does not necessarily amount to learning and benefiting from that feedback (Zhao, 2010). Also, it is necessary to compare the types of feedback given with the types of revisions made (Ferris, 1997). One type of feedback does not necessarily lead to the same type of revisions or the same type of improvement (Biber et al., 2011; Paulus, 1999; Ruegg, 2015).

Fourth, the 30 peer raters of this study did receive three-hour formal training before assessing these essays. However, it seemed not as effective as expected since it was conducted online, and a few peer raters missed some important information due to the unstable network and some other personal reasons. As suggested in the literature, training is commonly recognized as an indispensable factor exerting positive impact on students’ peer feedback effectiveness (Alqassab et al., 2017; Hu, 2005; Liu & Hansen, 2002; Min, 2006). Moreover, both teacher and peer raters were trained to give 100-word comments. This hindered them from giving as many comments as they could. Another important issue was that since this study was conducted in the last four weeks of the semester, both teacher and peer raters were so busy that they might be distracted by exams and papers, with the validity and reliability of the feedback more or less sacrificed. According to Wu (2019), time availability is one of the influential factors of peer feedback effectiveness. Given that this was the first time for peer feedback givers to do the job, more training should have been given to them to improve the feedback quality (Min, 2006).

Fifth, this study focused on the peer feedback reliability and validity as compared with teacher feedback. This might create the misunderstanding that peer feedback could replace teacher feedback, although the two do not stand in an either-or relationship. Such comparison was to provide evidence for evaluating peer feedback as a reliable and valid complementary aid to teacher feedback in EFL writing classrooms, which was exactly the purpose of this study. Therefore, the results of this study should be interpreted with caution.

Finally, the study was explorative and conducted in a research setting instead of a real classroom setting. In the current study, peer feedback givers offered feedback individually while in a real classroom setting, students are put into a group to assess each other’s essays, and they have opportunities to discuss and offer feedback to the essays. Therefore, the results of this study could not be totally generalized to a real classroom setting.

8. Conclusions

In light of these limitations, the following conclusions were reached. First, peer feedback givers did a relatively satisfactory job in assessing the 30 EFL essays, and peer feedback could be applied in EFL writing class as an effective complementary aid to teacher feedback. The reliability of the holistic scores by up to three peer feedback givers could reach that of the holistic scores by one teacher, which was the real classroom assessment context. Besides, most peer feedback givers performed almost as well as teacher feedback givers in most content and organization aspects despite their difficulties in feedback provision of
grammar and vocabulary. Most importantly, the analyses of open-ended questions demonstrated the students’ willingness to have peer feedback as a routine activity of their writing class as long as the number of essays assessed was no more than three or four after each writing task. Admittedly, the effectiveness of peer feedback depends on students’ positive perceptions of peer feedback to some extent since they facilitate students’ engagement in the activity (Yao et al., 2020).

Second, there were large residual variance components in the G-studies, suggesting that other hidden facets were not considered in the analysis (Brennan, 2001). These possible hidden facets might include peer feedback givers’ gender, English proficiency, time spent on each essay, and the quality of each essay (Brennan, 2001; Huang, 2012; Shavelson & Webb, 1991). More studies are needed in the future to integrate these facets into the design to minimize the residual variance components.

9. Implications for practice

The results of this study have important implications for Chinese college English teachers. They can make full use of peer feedback in the writing classrooms. The explorative study provided evidence that the reliability of three peers’ feedback was the same as that of a teacher. Consequently, teachers can assign four peers to each assessment group in a real classroom setting because they should have the ability to check peers’ writing and then provide effective feedback, particularly at the levels of content and organization. At the same time, teachers could concentrate on certain global problems such as inadequate arguments and illogically coherent sentences as well as other local problems such as unclear expressions, ungrammatical sentence structures, and vocabulary errors. Above all, teachers should make sure students can fully understand their feedback since only a full understanding of the feedback can promote students’ writing development (Zhao, 2010).

The results of this study have implications for Chinese college students as well. They should be actively engaged in peer feedback activities in their English writing classrooms. It is suggested that peer feedback givers learn to provide appropriate and effective feedback in all aspects of an essay at both the global and local levels, which could be improved through frequent training (Min, 2006). Likewise, peer feedback receivers should trust their peers’ feedback and incorporate it in the revision of English essays, considering the findings of this study and previous studies (e.g., Yu & Lee, 2016b; Wu, 2019), which showed that even LEP students could make valuable comments on their peers’ writing with the help of L1, reference tools, enough time availability, and separation from their peers.

10. Recommendations for future research

Future research is needed to investigate the effectiveness of four-person group peer feedback as a reliable and valid complementary aid to teacher feedback in a real classroom setting. Feedback receivers’ views towards the effectiveness of peer and teacher feedback and their incorporation of the feedback into their revisions also need to be studied in the future since these are important indicators of the effectiveness of peer feedback in contrast to teacher feedback. Besides, such facets as peer feedback givers’ gender, English proficiency, time spent on assessing each essay, and the quality of each essay should be integrated into the G-theory study design to minimize the residual variance components. Above all, training, rather than a once-for-all practice, should be conducted throughout the entire process to help students practice the peer feedback activities.

Disclaimers

None.

Source(s) of support

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Appendix A. The writing task

You are allowed 30 min to write an essay. Suppose you have two options upon graduation: one is to find a job somewhere and the other to go to a
graduate school. You are to make a choice between the two. Write an essay to explain the reasons for your choice. You should write at least 120 words
but no more than 180 words.

Appendix B. The CET-4 writing scoring criteria

The maximum score of a CET-4 essay is 15, with the following six score ranges specified: 13–15, 10–12, 7–9, 4–6, 1–3, and 0. The table below
presents the scoring criteria of each score range (Zhao & Huang, 2020, p. 8).

Score range   Scoring criteria

13–15         The essay is/has a) pertinent to the topic, b) clear idea-expression, c) coherent, and d) basically no language errors except a few minor errors.
10–12         The essay is/has a) pertinent to the topic, b) clear idea-expression, c) fairly coherent, and d) several language errors.
7–9           The essay is/has a) basically pertinent to the topic, b) not clear idea-expression in some sentences, c) barely coherent, and d) many language errors with several major errors.
4–6           The essay is/has a) basically pertinent to the topic, b) unclear idea-expression, c) not coherent, and d) many major language errors.
1–3           The essay is/has a) not pertinent to the topic, b) disorganized idea-expression, c) fragmented sentences, and d) many incorrect sentences with major errors.
0             The essay is/has left blank, or only several isolated words, or not relevant to the topic.
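
Holistic scores assigned on this 0–15 scale are the input to the generalizability analysis discussed in Section 6.2. As a hedged illustration only, the sketch below assumes a fully crossed essays × raters design with one score per cell, estimates the three variance components from the usual ANOVA mean squares, and then projects G- and Phi-coefficients for different numbers of raters (a D-study). The function names and all numbers in it are hypothetical and are not the variance components reported in this study; the study itself used GENOVA (Crick & Brennan, 1983), which this sketch does not reproduce.

```python
from statistics import mean


def variance_components(scores):
    """Estimate variance components for a crossed essays x raters design.

    scores[p][r] is the holistic score rater r gave to essay p.
    Returns (var_p, var_r, var_pr_e): essay (object of measurement),
    rater, and residual components. Negative estimates are set to zero.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    p_means = [mean(row) for row in scores]
    r_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    # Sums of squares for essays, raters, and the residual
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the components
    var_pr_e = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    return var_p, var_r, var_pr_e


def d_study(var_p, var_r, var_pr_e, n_raters):
    """Project (G, Phi) when each essay is scored by n_raters raters."""
    if var_p == 0:
        return 0.0, 0.0
    rel_error = var_pr_e / n_raters            # error for relative decisions
    abs_error = (var_r + var_pr_e) / n_raters  # error for absolute decisions
    return var_p / (var_p + rel_error), var_p / (var_p + abs_error)


def raters_needed(var_p, var_r, var_pr_e, target_phi, max_raters=50):
    """Smallest number of raters whose projected Phi reaches target_phi."""
    for n in range(1, max_raters + 1):
        _, phi = d_study(var_p, var_r, var_pr_e, n)
        if phi >= target_phi:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical peer-rater components, chosen only for illustration
    peer = (2.0, 0.8, 2.4)
    for n in (1, 2, 3, 4):
        g, phi = d_study(*peer, n)
        print(f"{n} peer rater(s): G = {g:.2f}, Phi = {phi:.2f}")
    print("raters needed for Phi >= .65:", raters_needed(*peer, 0.65))
```

The design choice worth noting is that both error terms are divided by the number of raters, which is why averaging over several less consistent peer raters can match the dependability of a single more consistent teacher rater.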

Appendix C. Follow-up open-ended questions about peer raters’ reflections on the assessment activities

1. On what aspect could you give feedback easily? Why was it easy?
2. On what aspect did you have difficulty in offering feedback and why?

3. Did you seek any assistance from the internet, dictionary or other people when giving feedback on your peers’ essays? If yes, did you think it
enabled you to give more effective feedback on your peers’ essays?
4. Do you think your feedback could help your peers improve the quality of their essays? Why or why not?
5. Did you benefit from offering feedback for your peers’ essays? If yes, what were they? If not, why?
6. Are you willing to have peer feedback as a routine activity of your writing class? Why or why not?

References

Allen, D., & Katayama, A. (2016). Relative second language proficiency and the giving and receiving of written peer feedback. System, 56, 96–106.
Allen, D., & Mills, A. (2014). The impact of second language proficiency in dyadic peer feedback. Language Teaching Research, 20(4), 498–513.
Alqassab, M., Strijbos, J. W., & Ufer, S. (2017). Training peer-feedback skills on geometric construction tasks: Role of domain knowledge and peer-feedback levels. European Journal of Psychology of Education, 33(1), 11–30. https://doi.org/10.1007/s10212-017-0342-0
Alqassab, M., Strijbos, J. W., & Ufer, S. (2018). Preservice mathematics teachers’ beliefs about peer feedback, perceptions of their peer feedback message, and emotions as predictors of peer feedback accuracy and comprehension of the learning task. Assessment & Evaluation in Higher Education, 44(1), 139–154. https://doi.org/10.1080/02602938.2018.1485012
Amores, M. J. (1997). A new perspective on peer-editing. Foreign Language Annals, 30(4), 513–522. https://doi.org/10.1111/j.1944-9720.1997.tb00858.x
Huang, J. (2008). How accurate are ESL students’ holistic writing scores on large-scale assessments? A generalizability theory approach. Assessing Writing, 13, 201–218.
Huang, J. (2012). Using generalizability theory to examine the accuracy and validity of large-scale ESL writing assessment. Assessing Writing, 17, 123–139.
Huang, J., & Foote, C. J. (2010). Grading between lines: What really impacts professors’ holistic evaluation of ESL graduate student writing? Language Assessment Quarterly, 7, 219–223.
Azarnoosh, M. (2013). Peer assessment in an EFL context: Attitudes and friendship bias. Language Testing in Asia, 3, 1–10.
Baker, K. M. (2016). Peer review as a strategy for improving students’ writing process. Active Learning in Higher Education, 17(3), 170–192.
Berg, I. V. D., Admiraal, W., & Pilot, A. (2006). Peer assessment in university teaching: Evaluating seven course designs. Assessment and Evaluation in Higher Education, 31(1), 19–36.
Biber, D., Nekrasova, T., & Horn, B. (2011). The effectiveness of feedback for L1-English and L2-writing development: A meta-analysis. TOEFL iBT Research Report, 14.
Boud, D., & Molloy, E. (2013). Rethinking models of feedback for learning: The challenge of design. Assessment & Evaluation in Higher Education, 38(6), 698–712.
Brennan, R. L. (2001). Statistics for social science and public policy: Generalizability theory. New York: Springer-Verlag.
Cameron, J., & Pierce, W. D. (1994). Reinforcement reward and intrinsic motivation: A meta-analysis. Review of Educational Psychology, 35, 459–477.
Carless, D. (2015). Excellence in university assessment: Learning from award-winning practice. London: Routledge.
Carless, D., & Boud, D. (2018). The development of student feedback literacy: Enabling uptake of feedback. Assessment & Evaluation in Higher Education, 43(8), 1315–1325.
Carson, J. G., & Nelson, G. L. (1994). Writing groups: Cross-cultural issues. Journal of Second Language Writing, 3(1), 17–30.
Caulk, N. (1994). Comparing teacher and student responses to written work. TESOL Quarterly, 28(1), 181–188.
Chang, C. Y. (2015). Teacher modeling on EFL reviewers’ audience-aware feedback and affectivity in L2 peer review. Assessing Writing, 25, 2–21.
Chen, C. (2010). Graduate students’ self-reported perspectives regarding peer feedback and feedback from writing consultants. Asia Pacific Education Review, 11, 151–158.
Cho, K., & MacArthur, C. (2011). Learning by reviewing. Journal of Educational Psychology, 103(1), 73–84.
Zhao, C., & Huang, J. (2020). The impact of the scoring system of a large-scale standardized EFL writing assessment on its score variability and reliability: Implications for assessment policy makers. Studies in Educational Evaluation, 67, Article 100911.
Liu, Y., & Huang, J. (2020). The quality assurance of a national English writing assessment: Policy implications for quality improvement. Studies in Educational Evaluation, 67, Article 100941.
Han, T., & Huang, J. (2017). Examining the impact of scoring methods on the institutional EFL writing assessment: A Turkish perspective. PASAA, 53, 1–36.
Connor, U., & Asenavage, K. (1994). Peer response groups in ESL writing classes: How much impact on revision. Journal of Second Language Writing, 3(3), 256–276.
Cotterall, S., & Cohen, R. (2003). Scaffolding for second language writers: Producing an academic essay. ELT Journal, 57(2), 158–166.
Creswell, J. W. (2014). Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). Thousand Oaks, CA: SAGE Publications.
Crick, J. E., & Brennan, R. L. (1983). GENOVA: A general purpose analysis of variance system. Version 2.1. Iowa City, IA: American College Testing Program.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Curtis, A. (2001). Hong Kong student teachers’ responses to peer group process writing. Asian Journal of English Language Teaching, 11, 129–143.
Dressler, R., Chu, M. W., Crossman, K., & Hilman, B. (2019). Quantity and quality of uptake: Examining surface and meaning-level feedback provided by peers and an instructor in a graduate research course. Assessing Writing, 39, 14–24.
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of Educational Research, 83(1), 70–120.
Ferris, D. R. (1997). The influence of teacher commentary on student revision. TESOL Quarterly, 31, 315–339.
Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students’ learning. Learning and Teaching in Higher Education, 1, 18–19.
Hansen, J. G., & Liu, J. (2005). Guiding principles for effective peer response. ELT Journal, 59(1), 31–38.
Hislop, J., & Stracke, E. (2017). ESL students in peer review: An action research study in a university English for Academic Purposes course. University of Sydney Papers in TESOL, 12, 9–44.
Hu, C., & Zhang, Y. (2014). A study of college English writing feedback system based on M-learning. Modern Educational Technology, 7, 71–78. https://doi.org/10.3969/j.issn.1009-8097.2014.07.010
Hu, G. (2005). Using peer review with Chinese ESL student writers. Language Teaching Research, 9(3), 321–342.
Hu, G., & Lam, S. T. E. (2010). Issues of cultural appropriateness and pedagogical efficacy: Exploring peer review in a second language writing class. Instructional Science, 38, 371–394.
Huisman, B., Saab, N., van Driel, J., & van den Broek, P. (2018). Peer feedback on academic writing: Undergraduate students’ peer feedback role, peer feedback perceptions and essay performance. Assessment & Evaluation in Higher Education, 43, 955–968. https://doi.org/10.1080/02602938.2018.1424318
Hyland, K., & Hyland, F. (2006). Feedback on second language students’ writing. Language Teaching, 39, 83–101. https://doi.org/10.1017/S0261444806003399
Jacobs, G. M., Curtis, A., Braine, G., & Huang, S. Y. (1998). Feedback on student writing: Taking the middle path. Journal of Second Language Writing, 7(3), 307–317.
Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2, 130–144.
Lee, M.-K. (2015). Peer feedback in second language writing: Investigating junior secondary students’ perspectives on inter-feedback and intra-feedback. System, 55(Suppl. C), S1–S10.
Lei, Z. (2017). Salience of student written feedback by peer-revision in EFL writing class. English Language Teaching, 10(2), 151–157.
Lepper, M. R., Keavney, M. M., & Drake, M. (1996). Intrinsic motivation and extrinsic rewards: A commentary on Cameron and Pierce’s meta-analysis. Review of Educational Research, 66, 5–22.
Li, H., & He, L. (2015). A comparison of EFL raters’ essay-rating processes across two types of rating scales. Language Assessment Quarterly, 12(2), 178–212.
Liu, J., & Hansen, J. G. (2002). Peer response in second language writing classrooms. Ann Arbor, MI: University of Michigan Press.
Lundstrom, K., & Baker, W. (2009). To give is better than to receive: The benefits of peer review to the reviewer’s own writing. Journal of Second Language Writing, 18(1), 30–43.
Mendoca, C., & Johnson, K. (1994). Peer review negotiations: Revision activities in ESL writing instruction. TESOL Quarterly, 28(4), 745–768.
Min, H. T. (2006). The effects of training peer review on EFL students’ revision types and writing quality. Journal of Second Language Writing, 15, 118–141.
Mittan, R. (1989). The peer review process: harnessing students’ communicative power. In D. Johnson, & D. Roen (Eds.), Richness in writing: Empowering ESL students (pp. 207–219). New York: Longman.
National College English Testing Committee. (2016). Syllabus for college English test – Band 4. Shanghai, China: Shanghai Language Education Press.
Nelson, G. L., & Carson, J. G. (1998). ESL students’ perceptions of effectiveness in peer response groups. Journal of Second Language Writing, 7(2), 113–131. https://doi.org/10.1016/S1060-3743(98)90010-8
Niu, R., & Zhang, R. (2018). A case study of focus, strategy and efficacy of an L2 writing teacher’s written feedback. Journal of PLA University of Foreign Languages, 41(3), 91–99.
Patchan, M. M., Schunn, C. D., & Clark, R. J. (2018). Accountability in peer assessment: Examining the effects of reviewing grades on peer ratings and peer feedback. Studies in Higher Education, 43(12), 2263–2278.
Paulus, T. M. (1999). The effect of peer and teacher feedback on student writing. Journal of Second Language Writing, 8(3), 265–289.
Price, M., Handley, K., & Millar, J. (2011). Feedback: Focusing attention on engagement. Studies in Higher Education, 36(8), 879–896.
Rahimi, M. (2013). Is training student reviewers worth its while? A study of how training influences the quality of students’ feedback and writing. Language Teaching Research, 17, 67–89. https://doi.org/10.1177/1362168812459151

Ruegg, R. (2015). The relative effects of peer and teacher feedback on improvement in EFL students’ writing ability. Linguistics and Education, 29, 73–82.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Séror, J. (2011). Alternative sources of feedback and second language writing development in university content courses. Canadian Journal of Applied Linguistics, 14(1), 118–143.
Tian, L., & Zhou, Y. (2020). Learner engagement with automated feedback, peer feedback and teacher feedback in an online EFL writing context. System, 91, 1–14.
Tsui, A. B. M., & Ng, M. (2000). Do secondary L2 writers benefit from peer comments? Journal of Second Language Writing, 9, 147–170.
van Ginkel, S., Gulikers, J., Biemans, H., & Mulder, M. (2017). The impact of the feedback source on developing oral presentation competence. Studies in Higher Education, 42(9), 1671–1685. https://doi.org/10.1080/03075079.2015.1117064
Wang, W. (2014). Students’ perceptions of rubric-referenced peer feedback on EFL writing: A longitudinal inquiry. Assessing Writing, 19, 80–96.
Wu, Z. (2019). Lower English proficiency means poorer feedback performance? A mixed-methods study. Assessing Writing, 41, 14–24.
Xu, Y., & Carless, D. (2017). ‘Only true friends could be cruelly honest’: Cognitive scaffolding and social-affective support in teacher feedback literacy. Assessment & Evaluation in Higher Education, 42(7), 1082–1094.
Yang, M., Badger, R., & Yu, Z. (2006). A comparative study of peer and teacher feedback in a Chinese EFL writing class. Journal of Second Language Writing, 15, 179–200.
Yao, Y., Guo, N. S., Li, C., & McCampbell, D. (2020). How university EFL writers’ beliefs in writing ability impact their perceptions of peer assessment: Perspectives from implicit theories of intelligence. Assessment & Evaluation in Higher Education, 1–17. https://doi.org/10.1080/02602938.2020.1750559
Yu, S. (2019). Learning from giving peer feedback on postgraduate theses: Voices from Master’s students in the Macau EFL context. Assessing Writing, 40, 42–52.
Yu, S., & Hu, G. (2017). Understanding university students’ peer feedback practices in EFL writing: Insights from a case study. Assessing Writing, 33, 25–35.
Yu, S., & Lee, I. (2016a). Peer feedback in second language writing (2005–2014). Language Teaching, 49(4), 461–493.
Yu, S., & Lee, I. (2016b). Understanding the role of learners with low English language proficiency in peer feedback of second language writing. TESOL Quarterly, 50(2), 483–494.
Yu, S., & Lee, I. (2016c). Exploring Chinese students’ strategy use in a cooperative peer feedback writing group. System, 58, 1–11.
Zhang, S. (1995). Reexamining the affective advantage of peer feedback in the ESL writing class. Journal of Second Language Writing, 4(3), 209–222.
Zhao, H. (2010). Investigating learners’ use and understanding of peer and teacher feedback on writing: A comparative study in a Chinese English writing classroom. Assessing Writing, 15(1), 3–17.
Zimmerman, B. J., & Kitsantas, A. (2002). Acquiring writing revision and self-regulatory skill through observation and emulation. Journal of Educational Psychology, 94, 660–668.
