ASSESSMENT & EVALUATION IN HIGHER EDUCATION
2020, VOL. 45, NO. 2, 193–211
https://doi.org/10.1080/02602938.2019.1620679

Does peer assessment promote student learning? A meta-analysis
Hongli Li (a), Yao Xiong (b), Charles Vincent Hunter (c), Xiuyan Guo (d) and Rurik Tywoniw (a)

(a) Georgia State University, Atlanta, GA, USA; (b) Imbellus, Los Angeles, CA, USA; (c) AdvancED/Measured Progress, Alpharetta, GA, USA; (d) Emory & Henry College, Emory, VA, USA

ABSTRACT
In recent years, there has been an increasing use of peer assessment in classrooms and other learning settings. Despite the prevailing view that peer assessment has a positive effect on learning across empirical studies, the results reported are mixed. In this meta-analysis, we synthesised findings based on 134 effect sizes from 58 studies. Compared to students who do not participate in peer assessment, those who participate in peer assessment show a .291 standard deviation unit increase in their performance. Further, we performed a meta-regression analysis to examine the factors that are likely to influence the peer assessment effect. The most critical factor is rater training. When students receive rater training, the effect size of peer assessment is substantially larger than when students do not receive such training. Computer-mediated peer assessment is also associated with greater learning gains than paper-based peer assessment. A few other variables (such as rating format, rating criteria and frequency of peer assessment) also show noticeable, although not statistically significant, effects. The results of the meta-analysis can be considered by researchers and teachers as a basis for determining how to make effective use of peer assessment as a learning tool.

KEYWORDS
Peer assessment; effect; student learning; meta-analysis

Introduction
Peer assessment encompasses processes whereby students evaluate or are evaluated by their
peers. Extensive research has been published on the reliability and validity of peer assessment
results (e.g. Cho, Schunn, and Wilson 2006; Chang et al. 2011; Li et al. 2016). However, less atten-
tion has been paid to the learning outcomes of peer assessment. In some studies, positive effects
were found (e.g. Perera, Mohamadou, and Kaur 2010), whereas in others, no effects were
reported (e.g. Sadler and Good 2006). There are a few research syntheses on the effects of peer
assessment, but most of these focus on a specific setting, which makes them less generalisable
to other settings. For example, in some studies, only English as a second language (ESL) learners’ writing
outcomes are synthesised (e.g. Biber, Nekrasova, and Horn 2011); in others, the sole focus is on
technology-mediated peer assessment (Chen 2016); and in others, only higher education settings
(e.g. Topping 1998; van Gennip, Segers, and Tillema 2009; van Zundert, Sluijsmans, and van
Merrienboer 2010) or K-12 (e.g. Hodgson 2010; Sanchez et al. 2017) are considered. Further,

CONTACT Hongli Li hli24@gsu.edu


© 2019 Informa UK Limited, trading as Taylor & Francis Group

there are a few comprehensive literature reviews, but these do not synthesise the effect sizes of
peer assessment on learning in quantitative terms (e.g. Sebba et al. 2008; van Gennip, Segers,
and Tillema 2009; Chen 2016; Topping 2017). So far, there is no comprehensive meta-analysis
focused on the effect of peer assessment on learning in a general sense across multiple educa-
tional settings.
For this study, we performed a meta-analysis to examine whether and how peer assessment
promotes student learning. Two research questions are addressed: (1) What is the general effect
of peer assessment on learning? (2) What factors are likely to influence such an effect? The findings
inform practitioners about best practices for implementing peer assessment activities and about
which factors to consider to promote student learning in this context.

Literature review
Peer assessment is defined as ‘an arrangement in which individuals consider the amount, level,
value, worth, quality or success of the products or outcomes of learning of peers of similar sta-
tus’ (Topping 1998, p. 250). In addition to increasing teachers’ efficiency in grading, peer assess-
ment is advocated as an effective pedagogical strategy for facilitating students’ learning (Dochy
and McDowell 1997). Researchers trace peer assessment to Vygotsky’s (1978) social development
theory, which emphasises the important role of social interaction in learning. In Vygotsky’s social
development view, children’s development occurs on the social level through interacting with
peers, teachers and/or parents in a community. A rich social environment would foster student
interaction and hence promote student learning and development. Further, it is claimed that the
peer assessment process naturally constructs a favourable instructional environment for peers to
work within the zone of proximal development (ZPD) (Villamil and Guerrero 1996).
It has been advocated that peer assessment can benefit learners in both cognitive and non-
cognitive aspects. For example, peer assessment fosters students’ cognitive development
(Topping and Ehly 2001), evaluative and critical ability (Sluijsmans et al. 2002), metacognitive
awareness (Kim and Ryu 2013) and social-affective development (van Gennip, Segers, and
Tillema 2009). More importantly, students can become more autonomous learners if they are
actively involved in peer assessment (Cheng and Warren 1999; Bloxham and West 2004). Despite
the theoretical support, empirical evidence on the effects of peer assessment on student learning
is mixed.
A possible reason for the mixed results is that peer assessment is conducted differently under
different contexts. Gielen, Dochy and Onghena (2011) used 20 variables to describe the different
features of peer assessment. They classified these 20 variables into five categories: decisions con-
cerning the use of peer assessment, the link between peer assessment and other elements in
the learning environment, interactions between peers, the composition of assessment groups
and the management of the assessment procedure. Based on previous work (Li et al. 2016), in
the present meta-analysis, we grouped a list of variables into three categories – peer assessment
setting, peer assessment procedure and assessors and assessees – to describe the many aspects
of the peer assessment process (Table 1).
The first category of variables (i.e. grade level, subject area and task rated) relates to the peer
assessment setting. For grade level where peer assessment takes place, it is believed that com-
pared with K-12 students, students in higher education can produce more accurate peer ratings
because of their stronger reflection skills (Falchikov and Boud 1989). However, it is not clear
which group of students gains more in terms of learning. Further, peer assessment has been
implemented in different subject areas. We are interested in examining whether peer assessment is
equally effective across subject areas (such as social sciences and arts, sciences and engineering
and medical and clinical). We are also interested in exploring whether the effects of peer
assessment differ depending on the type of task on which peers are rated, such as essay writing,
a project, an examination or other performance.

Table 1. Variables, frequencies and effect size comparison.

Category | Variable | Level | No. of effect sizes | Mean effect size | SE | 95% CI | R2
Methodology | Research design | Quasi-experimental | 98 | .320 | .058 | [.206, .434] | 0%
Methodology | Research design | Experimental | 36 | .216 | .094 | [.033, .400] |
Methodology | Control condition | Peer assessment vs. no assessment | 64 | .330 | .071 | [.190, .469] | 0%
Methodology | Control condition | Peer assessment vs. teacher assessment | 50 | .260 | .082 | [.100, .421] |
Methodology | Control condition | Peer assessment vs. self-assessment | 20 | .239 | .132 | [-.020, .499] |
Peer assessment setting | Grade level | Higher education | 102 | .331 | .056 | [.221, .441] | 0%
Peer assessment setting | Grade level | K-12 | 31 | .150 | .105 | [-.056, .356] |
Peer assessment setting | Subject area | Social science and arts | 88 | .284 | .062 | [.163, .406] | 0%
Peer assessment setting | Subject area | Science and engineering | 34 | .345 | .098 | [.154, .537] |
Peer assessment setting | Subject area | Medical and clinical | 12 | .197 | .156 | [-.108, .503] |
Peer assessment setting | Task rated | Essay writing | 66 | .302 | .072 | [.162, .443] | 0%
Peer assessment setting | Task rated | Project, exam, or other | 68 | .281 | .069 | [.147, .415] |
Peer assessment procedure | Assessment mode | Paper-based | 100 | .237 | .056 | [.127, .348] | 2.97%
Peer assessment procedure | Assessment mode | Computer-mediated | 34 | .452 | .097 | [.261, .643] |
Peer assessment procedure | Rating format | Only scores | 23 | .374 | .116 | [.147, .602] | .27%
Peer assessment procedure | Rating format | Only comments | 51 | .176 | .082 | [.015, .337] |
Peer assessment procedure | Rating format | Both scores and comments | 60 | .349 | .073 | [.206, .491] |
Peer assessment procedure | Feedback mode for comments | Written feedback | 71 | .256 | .065 | [.129, .382] | 0%
Peer assessment procedure | Feedback mode for comments | Oral feedback | 21 | .205 | .120 | [-.030, .440] |
Peer assessment procedure | Feedback mode for comments | Both | 19 | .415 | .126 | [.168, .663] |
Peer assessment procedure | Rater training | Peer raters did not receive training | 37 | .017 | .090 | [-.160, .194] | 9.18%
Peer assessment procedure | Rater training | Peer raters received training | 97 | .396 | .056 | [.286, .505] |
Peer assessment procedure | Rating criteria | Without explicit rating criteria | 16 | .136 | .148 | [-.154, .425] | .17%
Peer assessment procedure | Rating criteria | With explicit rating criteria | 118 | .311 | .052 | [.208, .413] |
Peer assessment procedure | Frequency | One peer assessment session | 53 | .206 | .078 | [.053, .359] | 1.33%
Peer assessment procedure | Frequency | More than one peer assessment session | 81 | .347 | .063 | [.223, .470] |
Peer assessment procedure | Requirement | Peer assessment is compulsory | 105 | .293 | .056 | [.183, .404] | 0%
Peer assessment procedure | Requirement | Peer assessment is voluntary | 29 | .284 | .105 | [.079, .489] |
Assessors and assessees | Reciprocity | Assessor only | 20 | .161 | .124 | [-.082, .403] | 0%
Assessors and assessees | Reciprocity | Assessee only | 6 | .197 | .233 | [-.260, .654] |
Assessors and assessees | Reciprocity | Both assessor and assessee | 108 | .323 | .055 | [.214, .431] |
Assessors and assessees | Number of assessors per assignment | One assessor per assignment | 68 | .321 | .070 | [.185, .457] | 0%
Assessors and assessees | Number of assessors per assignment | More than one assessor per assignment | 66 | .261 | .070 | [.122, .399] |
Assessors and assessees | Matching of assessors and assessees | Raters and assessees are matched at random | 58 | .337 | .075 | [.189, .485] | 0%
Assessors and assessees | Matching of assessors and assessees | Raters and assessees are not matched at random | 76 | .256 | .066 | [.128, .385] |
Assessors and assessees | Anonymity | Anonymous rating | 45 | .383 | .086 | [.215, .550] | 0%
Assessors and assessees | Anonymity | Non-anonymous rating | 89 | .246 | .060 | [.128, .364] |

Note: p values for the between-category effect size comparisons for each variable were all higher than .05, with the exception of rater training (p = .0004).
The second category of variables relates to the peer assessment procedure, including assessment
mode, rating format, feedback mode for comments, rater training, rating criteria, frequency of peer
assessment and requirement. In terms of the assessment mode, computer-mediated peer assessment
has been found to be more efficient and convenient than paper-based peer assessment (e.g. Wen
and Tsai 2008). However, it is still unknown whether, given all the affordances that technology
makes possible, computer-mediated peer assessment results in greater learning gains. In addition, the
rating format may also be relevant. Previous research shows that the inclusion of free text comments
in peer assessment improves the accuracy of peer ratings in general (Li et al. 2016; Patchan, Schunn,
and Clark 2018). We, therefore, hypothesise that in comparison to providing only scores, providing
scores accompanied by comments could lead to greater gains in learning. Finally, we anticipate that
the feedback mode for comments (i.e. written feedback, oral feedback or both) may also influence
the effect of peer assessment on student learning.
As shown in Table 1, the next two variables on the list are rater training and rating criteria.
When peer raters receive training and when explicit rating criteria are used, the peer assessment
is more organised in nature (Li et al. 2016). Thus, explicit rating criteria and training provided to
peer raters are expected to promote greater gains in learning. The next variable is frequency of
peer assessment. We are interested in examining whether more peer assessment sessions lead to
greater learning gains. Finally, whether peer assessment is compulsory or voluntary matters.
Research shows that if peer assessment is compulsory, students might feel more accountable,
which makes them responsible for the assessment they provide to their peers (Patchan, Schunn,
and Clark 2018). On the other hand, when peer assessment is voluntary, students are
more autonomous and motivated and thus may benefit more from the peer assessment process.
The third category of variables relates to the assessors and assessees, including reciprocity,
group work, number of assessors per assignment, matching of assessors and assessees and anonym-
ity. Among these variables, the first one pertains to whether the peer assessment is reciprocal (the
students provide and receive assessment) or non-reciprocal (the students either provide or receive
assessment). We hypothesise that learning gains are greater when students both provide and receive
assessment (Cho and Schunn 2007). This occurs in instances where students reflect on their own
learning by providing feedback to others and improve their own work by incorporating feedback
from others (Patchan, Schunn, and Correnti 2016). The second variable is about whether the number
of assessors per assignment is one or more than one. When there is more than one assessor per
assignment, assessees would be able to reflect on their work possibly from different perspectives and
thus may benefit more than when feedback is provided by only one assessor. The third variable refers
to whether or not the assessors and assessees are matched at random, and the fourth variable refers
to whether or not ratings are given and received anonymously. It may be that random matching and
anonymity create a safe environment in which raters can evaluate their peers honestly and critically
(Kane and Lawler 1978). This safe environment may permit students to achieve more learning gains.
In summary, given the many aspects of peer assessment processes, in this meta-analysis, we
examine whether the variables described influence the effect of peer assessment on student
learning. The results of the meta-analysis will provide information regarding which factors are
most likely to promote learning in the context of peer assessment practice.

Method
Selecting studies
The criteria used to select studies for inclusion in our meta-analysis are as follows. First, eligible
studies must have an experimental or quasi-experimental design with a control group and an
experimental group. We did not consider any single-group studies. Second, only studies with at

least one measured cognitive outcome were considered. The outcomes could be examination
scores or performance-based assessment scores targeting cognitive skills. Third, only studies with
sufficient information to calculate effect sizes were considered. Finally, we set a wide timeframe
from 1950 to 2017, because our preliminary search did not find any relevant studies before
1950. To be as inclusive as possible, both published and unpublished studies were eligible so
that grey literature could be included.
Using various key words such as peer assessment, peer evaluation, peer rating, peer grading,
peer scoring, peer marking and peer feedback, we searched several well-known online databases
(i.e. ERIC, PsycINFO, JSTOR and ProQuest) as well as Google Scholar. By examining the title and
the abstract, we found 350 relevant studies. Through several rounds of careful screening, we
determined that only 58 met all our inclusion criteria. The most common reason for excluding a
study was that it did not focus on the learning effect of peer assessment. Also, among studies
examining the learning effect of peer assessment, only a small proportion adopted an experi-
mental or quasi-experimental design. Further, among the studies that used an experimental or
quasi-experimental design, some did not provide sufficient information to calculate effect sizes.
After completing the search, we scrutinised reference lists of relevant materials but did not find
any extra studies via this approach.

Coding procedure
We extracted information to calculate effect size and also coded a list of variables for each study
to explain the variation of the effect sizes. As shown in Table 1, in addition to the variables
related to peer assessment setting, the peer assessment procedure and the assessors and asses-
sees, we coded variables related to the research methodology. The first methodology variable
pertains to study design, i.e. whether the study has an experimental or quasi-experimental
design. The second methodology variable pertains to the control condition of peer assessment,
i.e. no assessment, teacher assessment and self-assessment. On this basis, there are three kinds
of comparisons: peer assessment versus no assessment, peer assessment versus teacher assess-
ment and peer assessment versus self-assessment.
In most cases, the experimental condition is peer assessment only. Occasionally, the experimen-
tal condition includes peer assessment plus an additional treatment. We deem this acceptable as
long as the corresponding control condition also includes the same additional treatment, so that
the pure treatment is peer assessment only. For example, in Birjandi and Siyyari (2010), the experi-
mental condition is peer assessment plus revision instruction, and the control condition is revision
instruction only. This comparison would be peer assessment versus no assessment.
We included multiple effect sizes from one study as long as they were mutually exclusive (Biber,
Nekrasova, and Horn 2011; Graham, Hebert, and Harris 2015). When there were multiple control con-
ditions or multiple experimental conditions, we coded multiple effect sizes corresponding to the con-
ditions. For example, one experimental condition (peer assessment) and three control conditions (no
assessment, teacher assessment and self-assessment) were included in Sadeghi and Khonbi (2015).
We thus calculated the three effect sizes: peer assessment versus no assessment, peer assessment ver-
sus teacher assessment and peer assessment versus self-assessment. Sometimes, for a study with an
analytic rubric, in which separate scores were assigned to specific aspects of student performance, we
calculated a separate effect size for each aspect. However, when a holistic measure, such as a single
score for overall performance, was available, we used one overall effect size.
To ensure coding reliability, each article was coded by at least two coders first independently and
then collaboratively. When a discrepancy in coding arose, the two coders would discuss the case until
they reached an agreement. However, when a disagreement could not be resolved by the two coders,
the lead author was consulted. Finally, the lead author scrutinised the coding for each article to ensure
coding accuracy.

Effect size calculation and data analysis procedure


The sample sizes, mean outcome scores and standard deviations for both the experimental group
and the control group of the pre-test (if any) and the post-test were extracted from each article to cal-
culate effect size statistics, i.e. the standardised mean difference (Cohen 1988). The formula used to
calculate the effect size (i.e. Cohen’s d) is as follows:

$$d = \frac{\bar{Y}_{E,post} - \bar{Y}_{C,post}}{S_{pooled}} \qquad (1)$$

where $\bar{Y}_{E,post}$ is the post-test mean outcome for the experimental group, $\bar{Y}_{C,post}$ is the post-test mean outcome for the control group, and $S_{pooled}$ is the pooled standard deviation based on the post-test results.

When a pre-test is available, the numerator of Equation (1) becomes

$$\left(\bar{Y}_{E,post} - \bar{Y}_{E,pre}\right) - \left(\bar{Y}_{C,post} - \bar{Y}_{C,pre}\right) \qquad (2)$$

where $\bar{Y}_{E,pre}$ is the pre-test mean outcome for the experimental group and $\bar{Y}_{C,pre}$ is the pre-test mean outcome for the control group.

The variance of the effect size can be calculated using the following equation:

$$V_d = \frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)} \qquad (3)$$

where $n_E$ is the experimental group sample size and $n_C$ is the control group sample size in the post-tests.

Cohen's d has a slight upward bias, especially in small samples. Hedges (1981) proposed removing this bias by using the correction factor $J$:

$$J = 1 - \frac{3}{4(n_E + n_C) - 9} \qquad (4)$$

$$\text{Hedges' } g = d \times J \qquad (5)$$

$$\mathrm{Var}(\text{Hedges' } g) = V_d \times J^2 \qquad (6)$$
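To make Equations (1)–(6) concrete, the following is a minimal R sketch (not the authors' code; the function name and the example summary statistics are hypothetical, and the pooled standard deviation is computed in the usual sample-size-weighted way):

```r
# Hedges' g and its variance for one study, from post-test summary statistics
hedges_g <- function(m_e, sd_e, n_e, m_c, sd_c, n_c) {
  s_pooled <- sqrt(((n_e - 1) * sd_e^2 + (n_c - 1) * sd_c^2) / (n_e + n_c - 2))
  d   <- (m_e - m_c) / s_pooled                               # Equation (1)
  v_d <- (n_e + n_c) / (n_e * n_c) + d^2 / (2 * (n_e + n_c))  # Equation (3)
  j   <- 1 - 3 / (4 * (n_e + n_c) - 9)                        # Equation (4)
  c(g = d * j, var_g = v_d * j^2)                             # Equations (5) and (6)
}

# Hypothetical example: experimental group (n = 35) vs. control group (n = 33)
hedges_g(m_e = 78.2, sd_e = 10.1, n_e = 35, m_c = 74.5, sd_c = 11.3, n_c = 33)
```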

Using the above procedure, we calculated Hedges’ g and the variance of Hedges’ g and ana-
lysed the data using the R package metafor (Viechtbauer 2010). We did not detect any outliers
or influential cases. To begin, we combined the overall effect size of peer assessment using a
random-effects model because we assumed that the true effect size could vary from study to
study (Borenstein et al. 2009). Then, we conducted a meta-regression to examine how different
study characteristics explain the effects of peer assessment on learning.
In the meta-regression analysis, the effect size was the dependent variable, and the variables
listed in Table 1 were the independent variables. In the first step, we examined each variable by
running a meta-regression with one independent variable at a time. This is an alternative
approach to a subgroup analysis. For each variable, the R package reports the following informa-
tion: (a) the mean and 95% confidence interval of the effect size for each category, and p value
to indicate whether each effect size is significantly different from zero; (b) the p value to indicate
whether the effect size of each category (e.g., computer-mediated peer assessment) is signifi-
cantly different from the effect size of another category (e.g., paper-based peer assessment); and
(c) the R2 value to indicate the percentage of effect size variation explained by the variable. In
the second step, we ran a meta-regression with multiple independent variables simultaneously.
Here, we included only variables that had a noticeable impact on explaining the effect size vari-
ation in the first step.
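As an illustration of these two steps, a metafor sketch might look like the following (the data frame dat and column names such as yi, vi and rater_training are illustrative, not the authors' code):

```r
library(metafor)

# dat: one row per effect size, with yi = Hedges' g, vi = its variance,
# and factor-coded study characteristics as additional columns
overall <- rma(yi, vi, data = dat, method = "REML")  # random-effects model
summary(overall)                                     # mean effect size, CI, Q, I^2

# Step 1: one moderator at a time (an alternative to a subgroup analysis)
rma(yi, vi, mods = ~ rater_training, data = dat, method = "REML")

# Step 2: several moderators entered simultaneously
rma(yi, vi,
    mods = ~ design + control_condition + assessment_mode + rating_format +
             rater_training + rating_criteria + frequency,
    data = dat, method = "REML")
```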

Results
Overall effect size of peer assessment
In total, 134 effect sizes were extracted from 58 studies. In Appendix A, we present the descrip-
tive characteristics of each study included in the analysis. In Appendix B, we list the full referen-
ces of all the studies included in the meta-analysis. As shown in the scatter plot of the effect
sizes in Figure 1, there were more positive effect sizes than negative effect sizes.
Based on the restricted maximum likelihood estimation, the estimated mean effect size was
.291, with a 95% confidence interval of .194 to .388. In other words, compared to students who
did not participate in peer assessment, those who did participate showed a .291 standard devi-
ation unit increase in their general performance. This effect size was statistically different from
zero at the .001 level. The Q statistic was 628.095 (df = 133, p<.001). This significant Q statistic
indicates that the variation in effect sizes was much greater than what could be explained by
sampling error. In addition, the I2 of 82.13% shows that 82.13% of the effect size variation was
due to true differences in the effects. This result again confirms that it is appropriate to use the
random-effects model for estimation.

Meta-regression with one variable at a time


In Table 1, we present the information for each variable and the categories within each variable.
The R2 was zero for all the methodology variables, the peer assessment setting variables and the
variables related to the assessors and assessees. The R2 value was larger than zero for some vari-
ables related to the peer assessment procedure (i.e. assessment mode, rating format, rater train-
ing, rating criteria and frequency of peer assessment). A comparison between the effect sizes of

Figure 1. Scatter plot of effect sizes.



the categories within each variable showed that the only statistically significant difference was
the comparison between receiving rater training and not receiving rater training (p = .002).

Methodology variables
The mean effect size of the studies with a quasi-experimental design (g = .320, p<.001) appeared
to be larger than that of the studies with a true experimental design (g = .216, p<.05). However,
this difference was not statistically significant. In terms of different control conditions, the mean
effect size of peer assessment versus no assessment was .330 (p<.001), and the mean effect size
of peer assessment versus teacher assessment was .260 (p<.01). These results indicate that peer
assessment was not only more effective than no assessment but also more effective than teacher
assessment. The mean effect size of peer assessment versus self-assessment was .239, not signifi-
cantly different from zero. Despite the observed differences, neither research design nor control
condition explained any of the effect size variation as evidenced by the R2 value of zero.

Peer assessment setting


The effect size appeared to be larger for courses in higher education settings (g = .331, p<.001)
than for courses in K-12 settings (g = .150, p>.05). In terms of subject area, the effect size for
social science and arts was .284 (p<.001), and the effect size for science and engineering was
.345 (p<.01). However, the effect size for medical and clinical (g = .197, p>.05) was not signifi-
cantly different from zero. In regard to the task rated by peers, the effect size for essay writing
was similar to the effect size for project, examination or other. Nevertheless, despite the
observed difference, none of these three variables (grade level, subject area and task rated)
explained any effect size variation, as evidenced by the R2 value of zero.

Peer assessment procedure


The effect size appeared to be larger when peer assessment was computer-mediated (g = .452,
p<.001) than when peer assessment was paper-based (g = .237, p<.001). Although the two effect
sizes did not differ significantly from each other, assessment mode explained 2.97% of the effect
size variation alone. In regard to rating format, the effect size appeared to be smaller when only
comments were provided (g = .176, p<.05) compared to when only scores were provided (g = .374,
p<.01), or when both scores and comments were provided (g = .349, p<.001). Nevertheless, the
three effect sizes were not significantly different from each other, and rating format explained only
.27% of the effect size variation. Furthermore, when both oral and written feedback were provided,
the effect size (g = .415, p<.01) appeared to be larger than when only written feedback was provided
(g = .256, p<.001) or when only oral feedback was provided (g = .205, p>.05). Still, the feedback
mode for comments did not explain any of the effect size variation.
The only statistically significant factor among the peer assessment procedure variables was
whether peer raters receive training. The effect size when raters received training was .396
(p<.001), significantly higher than the effect size when peer raters did not receive training
(g = .017, p>.05). This variable alone explained as much as 9.18% of the effect size variation. In
addition, when explicit rating criteria were provided, the effect size (g = .311, p<.001) appeared
to be larger than the effect size when no explicit rating criteria were provided (g = .136, p>.05).
However, only .17% of the effect size variation was explained by this variable.
When there was more than one peer assessment session, the effect size (g = .347, p<.001)
appeared to be larger than that for one peer assessment session (g = .206, p<.01). This variable
explained 1.33% of the effect size variation. However, whether the peer assessment was compul-
sory or voluntary did not show an impact on effect size, and the corresponding R2 value
was zero.

Assessors and assessees


Despite some observed differences, none of the variables in this category was statistically signifi-
cant, and the R2 value was zero for each variable. When a student was both an assessor and an
assessee, the effect size (g¼.323, p<.001) appeared to be larger than when the student was an
assessor only (g¼.161, p>.05) or an assessee only (g¼.197, p>.05). In addition, the effect sizes
were similar regardless of whether the number of assessors per assignment was one or more
than one. Further, when assessors and assessees were matched at random, the effect size
(g¼.337, p<.001) appeared to be larger than when assessors and assessees were not matched at
random (g¼.256, p<.001). When the peer assessment was anonymous, the effect size (g¼.383,
p<.001) appeared to be larger than when peer assessment was non-anonymous
(g¼.246, p<.001).

Meta-regression with multiple variables


In this step, given the large number of variables, we included only the variables with an R2 value
larger than 0 (i.e. assessment mode, rating format, rater training, rating criteria and frequency of
peer assessment). To offer a meaningful interpretation of the results, we always included the
methodology variables (experimental design and control conditions) in the meta-regression. As
shown in Table 2, categorical variables were dummy coded. These variables were used as predic-
tors in the meta-regression. For example, rating format has three categories (i.e. scores only,
comments only and both scores and comments). We used ‘scores only’ as the reference category and
included two predictors in the meta-regression, i.e. ‘comments only’ and ‘both scores
and comments’.
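For instance, the reference category of such a factor can be set explicitly before fitting the model (a sketch; the column name rating_format and its level labels are hypothetical):

```r
# Make 'scores only' the reference category, so the two dummy-coded predictors
# become 'comments only' and 'both scores and comments'
dat$rating_format <- relevel(factor(dat$rating_format), ref = "scores_only")
```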
In the full model, rater training and assessment mode were statistically significant. With all
else controlled for in the model, when raters received training, the effect size was larger by a
.338 standard deviation unit than the effect size when raters did not receive training. Similarly,
when peer assessment was computer-mediated, the effect size was larger by a .244 standard
deviation unit than the effect size when peer assessment was paper-based.

Table 2. Results of meta-regression with multiple variables.

Variable coded | Predictor | Full model: coefficient | SE | p value | Final model: coefficient | SE | p value
 | Intercept | .024 | .231 | .918 | .070 | .130 | .593
Research design | Experimental design (reference: quasi-experimental design) | .125 | .121 | .305 | .112 | .119 | .347
Control condition | Peer assessment vs. teacher assessment (reference: peer assessment vs. no assessment) | .070 | .117 | .552 | .101 | .111 | .363
Control condition | Peer assessment vs. self-assessment (reference: peer assessment vs. no assessment) | .018 | .149 | .906 | .034 | .147 | .819
Assessment mode | Computer-mediated (reference: paper-based) | .244 | .121 | .044 | .229 | .110 | .038
Rating format | Comments only (reference: scores only) | .165 | .156 | .292 | | |
Rating format | Both scores and comments (reference: scores only) | .101 | .156 | .292 | | |
Rater training | Peer raters received training (reference: peer raters did not receive training) | .338 | .114 | .003 | .347 | .110 | .002
Rating criteria | With explicit criteria (reference: without explicit criteria) | .030 | .167 | .859 | | |
Frequency | More than one peer assessment session (reference: one peer assessment session) | .172 | .119 | .148 | .207 | .107 | .054

Note: predictors with empty final-model cells were dropped from the final model.
To achieve greater estimation power, we dropped the non-significant predictors one at a
time, starting with those with the largest p values. In the final model, rater training and assess-
ment mode remained statistically significant and their corresponding coefficients did not change
much. The frequency of peer assessment showed marginal significance with a p value of .054.
When there was more than one peer assessment session, the effect size was larger by a .207
standard deviation unit than the effect size for only one peer assessment session. The method-
ology variables were not statistically significant in any of the cases but were retained in the
model to facilitate interpretation.

Discussion
Effects of peer assessment on student learning
This meta-analysis shows that the average effect size of peer assessment on student learning
is positive and nontrivial. Compared to students who did not receive peer assessment, stu-
dents who did receive peer assessment showed a .291 standard deviation unit improvement
in their general performance. This result aligns with the finding from Sanchez et al. (2017). In
their meta-analysis of 3rd to 12th grade students’ classroom grading, based on 11 effect sizes
from 7 studies, they found that students who engaged in peer-grading performed better on
subsequent tests than students who did not, with an effect size difference of .29. Although
the inclusion criteria for our meta-analysis are broader than those used by Sanchez et al., the
same conclusion is reached in regard to the positive effect of peer assessment on stu-
dent learning.
Furthermore, we also examined the effect sizes for the three control conditions: no assess-
ment, teacher assessment and self-assessment. We found that peer assessment not only gener-
ated a larger effect than no assessment but also showed a more positive effect than teacher
assessment. Despite the challenges associated with conducting peer assessments (Topping
2003), involvement in peer assessment does enhance student learning (Dochy, Segers, and
Sluijsmans 1999; Sebba et al. 2008; Li, Liu, and Steckelberg 2010). As summarised by Topping
(1998), peer assessment yields many benefits to students, such as constructive reflection,
increased time on task, more attention on work and a greater sense of accountability and
responsibility. Compared to peer assessment, teacher assessment is less effective in facilitating
students’ development toward becoming self-reliant learners (Harrison, O’Hara, and McNamara
2015). In this regard, it is not surprising that in comparison with teacher assessment, peer assess-
ment could bring greater learning gains, as evidenced in the present meta-analysis. However, in
this meta-analysis, we included only studies involving teacher assessment as a control condition.
In future research, it will be important to perform a comprehensive analysis on the effect of
teacher assessment per se.
In addition, we found that the effect of peer assessment is not significantly different
from the effect of self-assessment. However, the number of effect sizes involving self-assess-
ment was relatively small in the present meta-analysis, and, therefore, our result pertaining
to self-assessment is not conclusive. Sanchez et al. (2017) found the average effect size of
self-assessment to be .34 and the average effect size of peer assessment to be .29. However,
although the effect size of self-assessment was slightly larger than that of peer assessment,
Sanchez et al. did not report whether the two effect sizes were significantly different from
each other. In future research, when more relevant studies are available, the issue of com-
paring the effects of peer assessment with those of self-assessment should be exam-
ined further.

Factors influencing the effect of peer assessment


Among the many factors examined, rater training emerged as the strongest factor in explaining
the variation of the peer assessment effect. Rater training explained as much as 9.18% of the
effect size variation alone, and it remained statistically significant whether alone or in combin-
ation with other variables. Research shows that rating quality improves when peer assessment is
supported by training, checklists, exemplification, teacher assistance and monitoring (Pond, Ui-
Haq, and Wade 1995; Berg 1999; Miller 2003; Li et al. 2016). More reliable feedback is more
likely to generate positive effects (Topping 2017). Our finding in regard to rater training
has clear implications for how peer assessment should be implemented, i.e. students must be
provided with training in terms of how to conduct peer assessment. Some researchers (e.g. Sebba
et al. 2008; Sanchez et al. 2017) call for teachers to receive training on conducting peer assess-
ment in the classroom as well. It would also be helpful for teacher development programs to
include classroom peer assessment training as a regular component.
Another salient factor we found is the mode of peer assessment, which alone explains almost
3% of the effect size variation. In general, computer-mediated peer assessment shows a larger
effect than paper-based peer assessment. Computer technology brings many advantages to peer
assessment, such as efficiency, flexibility and easy access (Chen 2016). In addition, it provides an
efficient way to ensure random assignment and anonymity, both of which are difficult to imple-
ment otherwise (Cho and Schunn 2007). It also reduces the influence of problems common in a
traditional face-to-face peer assessment setting, such as off-task discussions and unequal participation
(Chen 2016). Our finding indicates that computer-mediated peer assessment may be preferable
to paper-based formats.
In addition to rater training and assessment mode, a few other variables, although not statis-
tically significant, showed some impact to different degrees. We understand that the lack of stat-
istical significance could be due to the relatively small number of effect sizes (Schmidt and
Hunter 2015). It is, therefore, important to examine these observed differences. For example,
peer assessment appears to have a larger effect in higher education settings than in K-12 set-
tings. This result is in line with our hypothesis because peer raters in higher education give
more accurate ratings and are more cognitively competent in producing peer feedback
(Falchikov and Boud 1989; Li et al. 2016).
Many variables related to peer assessment procedures showed interesting patterns, too. First,
in regard to rating format, there is concern that providing only scores may not be the best way
to improve student learning (Liu and Carless 2006), because scores are considered summative
and comments are considered formative. We found that the effect size for scores only and the
effect size for both scores and comments were similar, whereas the effect size for comments
only was smaller. A possible reason is that when peers provide only comments, it is likely that
the peer assessment procedure is relatively unstructured such that it may lack explicit criteria.
Second, when both written and oral feedback were provided, the effect size appeared to be
larger than when only one of these was provided. This finding is aligned with the literature in
which elaborated feedback, which includes discussion and negotiation, leads to more learning
(e.g. Wooley et al. 2008; Topping 2017). Third, peer assessment appeared to be more effective
when raters were provided with explicit rating criteria. This result agrees with the literature in
which peer rating is shown to be more accurate when explicit rating criteria are provided (Li
et al. 2016) and higher quality peer assessment is likely to generate more learning gains (Sebba
et al. 2008; Topping 2017). Finally, when there was more than one session of peer assessment,
the effect size was larger than when there was only one such session. This is expected as more
involvement in peer assessment promotes more student learning.
None of the variables related to the assessors and assessees were statistically significant. Still,
the directions of the observed differences mostly agree with our hypothesis and with the litera-
ture. For example, the effect size of peer assessment appeared to be larger when a student was

both an assessor and assessee instead of being an assessor only or assessee only (Cho and
Schunn 2007); when assessors and assessees were matched at random; or when peer ratings
were anonymous (Kane and Lawler 1978). However, there was no noticeable difference related
to whether the number of assessors per assignment was one or more than one.

Conclusion, implications, limitations and future research


To summarise, there is considerable theoretical support for using peer assessment to promote
student learning. Despite both the great potential and widespread use of peer assessment,
empirical evidence in regard to its effect on learning and the factors that might influence such
effect is insufficient and inconsistent. In this meta-analysis, we found that peer assessment in
general has a nontrivial positive effect on students’ learning performance. This confirms previous
literature on the benefits of peer assessment for student learning (Sebba et al. 2008; Sanchez
et al. 2017; Topping 2017). Furthermore, the effect size of peer assessment is significantly larger
when raters receive training and when assessment is computer-mediated rather than paper-
based. Although not statistically significant, a few other variables (such as rating format, rating
criteria and frequency of peer assessment) also show some noticeable impact in explaining the
variation of the peer assessment effect. Our findings can be used by researchers as a basis for
further investigation and by teachers as a foundation for determining how best to use peer
assessment as a learning tool.
However, the process of peer assessment is complicated such that we cannot reliably code
every aspect of peer assessment, which is a primary limitation of this meta-analysis. For example,
initially, we were interested in coding whether peer assessment is used for summative or forma-
tive purposes (Topping 2017). However, information needed for coding that factor is not
included in most studies and is difficult to infer. Also, student characteristics, such as native
speaker/English-language learner and disability/no disability, are of great interest. However, usu-
ally the results are reported for all the participants instead of subgroups; therefore, it was not
possible to quantitatively examine these student characteristics. Further, our outcome measure is
very general as we included a variety of tasks in our analysis. It would be an interesting research
topic to distinguish between higher and lower order learning outcomes and to investigate how
peer assessment as a treatment approach can impact those different learning outcomes. In sum-
mary, we could include only a limited number of more reliably coded variables to quantitatively
describe the peer assessment process, such that we may have missed some important qualitative
differences. In future peer assessment studies, we call for researchers to provide more details
about peer assessment procedures so that richer information can be made available to the field.
In this meta-analysis, we included multiple effect sizes from one study. In particular, we calcu-
lated multiple effect sizes when there were multiple control or experimental conditions. This has
been a common phenomenon in meta-analysis studies (Borenstein et al. 2009). However, the
information for the multiple effect sizes is dependent, and ignoring such dependency may bias
the estimation results (Scammacca, Roberts, and Stuebing 2014). Prompted by a reviewer's caution, we
conducted a sensitivity analysis using the robust variance estimation method, which adjusts the
standard errors to account for the effect size dependency (Hedges, Tipton, and Johnson 2010; Fisher
and Tipton 2014). The parameter estimates (such as the mean effect sizes and regression
coefficients) remained the same, but some of the standard errors were slightly larger than the previous standard
errors. Still, our main conclusions remain the same in regard to whether an effect size or a pre-
dictor was statistically significant or not. In future research, one can also use a multi-level analysis
(Van Den Noortgate and Onghena 2003) or multivariate approach (Kalaian and Raudenbush
1996) to deal with the dependency issue.
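A sensitivity analysis of this kind might be specified with the robumeta package cited above as follows (a sketch only; the moderators shown and the column names yi, vi and study_id are illustrative):

```r
library(robumeta)

# Robust variance estimation with correlated-effects weights,
# clustering the effect sizes within their source studies
rve <- robu(formula = yi ~ rater_training + assessment_mode + frequency,
            data = dat, studynum = study_id, var.eff.size = vi,
            modelweights = "CORR", small = TRUE)
print(rve)
```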
Finally, because only a small proportion of peer assessment studies focus on its effect on
learning outcomes, our sample is limited. This is probably one of the reasons why some variables

are theoretically meaningful but statistically non-significant in explaining the variation of the
peer assessment effect. For example, we found that peer assessment effect size appeared to be
larger when both written and oral feedback were provided; when there were explicit rating crite-
ria; when there was more than one session of peer assessment, etc. However, none of these vari-
ables was statistically significant. It is, therefore, necessary to replicate and/or extend the current
meta-analysis when new studies are available.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was funded by Spencer Foundation, Grant number: #201700105.

Notes on contributors
Hongli Li, Ph.D., is an Associate Professor in the Department of Educational Policy Studies at Georgia State
University. Her research focuses on applied measurement and quantitative methods in education.
Yao Xiong received a Ph.D. in Educational Psychology from the Pennsylvania State University. She currently works
as a computational psychometrician at Imbellus Inc. in Los Angeles, USA.

Charles Vincent Hunter received a Ph.D. in Educational Policy Studies (concentration in Research, Measurement,
Statistics) from Georgia State University. He currently works as a research associate for AdvancED|Measured
Progress in Georgia, USA.
Xiuyan Guo received a Ph.D. in Educational Psychology from the Pennsylvania State University. She currently works
as an institutional research associate at Emory and Henry College, Virginia, USA.

Rurik Tywoniw is a Ph.D. student in Applied Linguistics at Georgia State University. His research interests include
second language assessment, second language literacy, and computational linguistics.

ORCID
Hongli Li http://orcid.org/0000-0002-1039-7270
Charles Vincent Hunter http://orcid.org/0000-0003-3537-0485
Rurik Tywoniw http://orcid.org/0000-0003-2885-5871

References
Berg, E. C. 1999. “The Effects of Trained Peer Response on ESL Students’ Revision Types and Writing Quality.”
Journal of Second Language Writing 8(3):215–241. doi:10.1016/S1060-3743(99)80115-5.
Biber, D., T. Nekrasova, and B. Horn. 2011. The Effectiveness of Feedback for L1-English and L2 Writing
Development: A Meta-Analysis. TOEFL iBT Research Report No. TOEFLiBT-14. Princeton, NJ: Educational Testing
Service. doi:10.1002/j.2333-8504.2011.tb02241.x.
Birjandi, P., and M. Siyyari. 2010. “Self-Assessment and Peer-Assessment: A Comparative Study of Their Effect on
Writing Performance and Rating Accuracy.” Iranian Journal of Applied Linguistics 13(1):23–45.
Bloxham, S., and A. West. 2004. “Understanding the Rules of the Game: Making Peer Assessment as a Medium for
Developing Students’ Conceptions of Assessment.” Assessment & Evaluation in Higher Education 29(6):721–733.
doi:10.1080/0260293042000227254.
Borenstein, M., L. Hedges, J. Higgins, and H. Rothstein. 2009. Introduction to Meta-Analysis. Chichester, West Sussex:
John Wiley & Sons.

Chang, C.-C., K.-H. Tseng, P.-N. Chou, and Y.-H. Chen. 2011. “Reliability and Validity of Web-Based Portfolio Peer
Assessment: A Case Study for a Senior High School’s Students Taking Computer Course.” Computers & Education
57(1):1306–1316. doi:10.1016/j.compedu.2011.01.014.
Chen, T. 2016. “Technology-Supported Peer Feedback in ESL/EFL Writing Classes: A Research Synthesis.” Computer
Assisted Language Learning 29(2):365–397. doi:10.1080/09588221.2014.960942.
Cheng, W., and M. Warren. 1999. “Peer and Teacher Assessment of the Oral and Written Tasks of a Group Project.”
Assessment & Evaluation in Higher Education 23(3):301–314. doi:10.1080/0260293990240304.
Cho, K., and C. D. Schunn. 2007. “Scaffolded Writing and Rewriting in the Discipline: A Web-Based Reciprocal Peer
Review System.” Computers and Education 48(3):409–426. doi:10.1016/j.compedu.2005.02.004.
Cho, K., C. D. Schunn, and R. W. Wilson. 2006. “Validity and Reliability of Scaffolded Peer Assessment of Writing
from Instructor and Student Perspectives.” Journal of Educational Psychology 98(4):891–901. doi:10.1037/0022-
0663.98.4.891.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum.
Dochy, F., and L. McDowell. 1997. “Assessment as a Tool for Learning.” Studies in Educational Evaluation 23(4):
279–298. doi:10.1016/S0191-491X(97)86211-6.
Dochy, F., M. Segers, and D. Sluijsmans. 1999. “The Use of Self-, Peer and co-Assessment in Higher Education: A
Review.” Studies in Higher Education 24(3):331–350. doi:10.1080/03075079912331379935.
Falchikov, N., and D. Boud. 1989. “Student Self-Assessment in Higher Education: A Meta-Analysis.” Review of
Educational Research 59(4):395–430. doi:10.2307/1170205.
Fisher, Z., and E. Tipton. 2014. Robumeta: Robust Variance Meta-Regression. Available at: http://cran.rproject.org/
web/packages/robumeta/index.html
Gielen, S., F. Dochy, and P. Onghena. 2011. “An Inventory of Peer Assessment Diversity.” Assessment & Evaluation in
Higher Education 36(2):137–155. doi:10.1080/02602930903221444.
Graham, S., M. Hebert, and K. R. Harris. 2015. “Formative Assessment and Writing: A Meta-Analysis.” The Elementary
School Journal 115(4):523–547. doi:10.1086/681947.
Harrison, K., J. O’Hara, and G. McNamara. 2015. “Re-Thinking Assessment: Self- and Peer-Assessment as Drivers of
Self-Direction in Learning.” Eurasian Journal of Educational Research 60:75–88. doi:10.14689/ejer.2015.60.5.
Hedges, L. V. 1981. “Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators.” Journal of
Educational Statistics 6(2):107–128. doi:10.3102/10769986006002107.
Hedges, L. V., E. Tipton, and M. C. Johnson. 2010. “Robust Variance Estimation in Meta-Regression with Dependent
Effect Size Estimates.” Research Synthesis Methods 1(1):39–65. doi:10.1002/jrsm.5.
Hodgson, C. 2010. “Assessment for Learning in Science: What Works?” Primary Science 115:14–16.
Kalaian, H. A., and S. W. Raudenbush. 1996. “A Multivariate Mixed Linear Model for Meta-Analysis.” Psychological
Methods 1(3):227–235. doi:10.1037//1082-989X.1.3.227.
Kane, J. S., and E. E. Lawler. 1978. “Methods of Peer Assessment.” Psychological Bulletin 85(3):555–586. doi:10.1037//
0033-2909.85.3.555.
Kim, M., and J. Ryu. 2013. “The Development and Implementation of a Web-Based Formative Peer Assessment
System for Enhancing Students’ Metacognitive Awareness and Performance in Ill-Structured Tasks.” Educational
Technology Research and Development 61(4):549–561. doi:10.1007/s11423-012-9266-1.
Li, H., Y. Xiong, X. Zang, M. Kornhaber, Y. Lyu, K. Chung, and H. K. Suen. 2016. “Peer Assessment in a Digital Age: A
Meta-Analysis Comparing Peer and Teacher Ratings.” Assessment & Evaluation in Higher Education 41(2):245–264.
doi:10.1080/02602938.2014.999746.
Li, L., X. Liu, and A. L. Steckelberg. 2010. “Assessor or Assessee: How Student Learning Improves by Giving and
Receiving Peer Feedback.” British Journal of Educational Technology 41(3):525–536. doi:10.1111/j.1467-
8535.2009.00968.x.
Liu, N.-F., and D. Carless. 2006. “Peer Feedback: The Learning Element of Peer Assessment.” Teaching in Higher
Education 11(3):279–290. doi:10.1080/13562510600680582.
Miller, P. J. 2003. “The Effect of Scoring Criteria Specificity on Peer and Self-Assessment.” Assessment & Evaluation in
Higher Education 28(4):383–394. doi:10.1080/0260293032000066218.
Patchan, M. M., C. D. Schunn, and R. J. Clark. 2018. “Accountability in Peer Assessment: Examining the Effects of
Reviewing Grades on Peer Ratings and Peer Feedback.” Studies in Higher Education 43(12):2263–2278. doi:
10.1080/03075079.2017.1320374.
Patchan, M. M., C. D. Schunn, and R. Correnti. 2016. “The Nature of Feedback: How Feedback Features Affected
Students’ Implementation Rate and Quality of Revisions.” Journal of Educational Psychology 108(8):1098–1120.
doi:10.1037/edu0000103.
Perera, J., G. Mohamadou, and S. Kaur. 2010. “The Use of Objective Structured Self-Assessment and Peer-Feedback
(OSSP) for Learning Communication Skills: Evaluation Using a Controlled Trial.” Advances in Health Sciences
Education 15(2):185–193. doi:10.1007/s10459-009-9191-1.
Pond, K., R. Ui-Haq, and W. Wade. 1995. “Peer Review: A Precursor to Peer Assessment.” Innovations in Education &
Training International 32(4):314–323. doi:10.1080/1355800950320403.

Sadeghi, K., and A. Z. Khonbi. 2015. “Iranian University Students’ Experiences of and Attitudes Towards Alternatives
in Assessment.” Assessment & Evaluation in Higher Education 40(5):641–665. doi:10.1080/02602938.2014.941324.
Sadler, P. M., and E. Good. 2006. “The Impact of Self- and Peer-Grading on Student Learning.” Educational
Assessment 11(1):1–31. doi:10.1207/s15326977ea1101_1.
Sanchez, C. E., K. M. Atkinson, A. C. Koenka, H. Moshontz, and H. Cooper. 2017. “Self-Grading and Peer-Grading for
Formative and Summative Assessments in 3rd through 12th Grade Classrooms: A Meta-Analysis.” Journal of
Educational Psychology 109(8):1049–1066. doi:10.1037/edu0000190.
Scammacca, N., G. Roberts, and K. K. Stuebing. 2014. “Meta-Analysis with Complex Research Designs: Dealing with
Dependence from Multiple Measures and Multiple Group Comparisons.” Review of Educational Research 84(3):
328–364. doi:10.3102/0034654313500826.
Sebba, J., R. D. Crick, G. Yu, H. Lawson, W. Harlen, and K. Durant. 2008. “Systematic Review of Research Evidence of
the Impact on Students in Secondary Schools of Self and Peer Assessment. Technical Report.” In Research
Evidence in Education Library. London: EPPI-Centre, Social Science Research Unit, Institute of Education,
University of London.
Schmidt, F. L., and J. E. Hunter. 2015. Methods of Meta-Analysis: Correcting Errors and Bias in Research Findings. 3rd
ed. Thousand Oaks, CA: Sage.
Sluijsmans, D. M. A., S. Brand-Gruwel, J. J. G. van Merriënboer, and T. J. Bastiaens. 2002. “The Training of Peer
Assessment Skills to Promote the Development of Reflection Skills in Teacher Education.” Studies in Educational
Evaluation 29(1):23–42. doi:10.1016/S0191-491X(03)90003-4.
Topping, K. J. 1998. “Peer Assessment between Students in Colleges and Universities.” Review of Educational
Research 68(3):249–276. doi:10.3102/00346543068003249.
Topping, K. J. 2003. Self and Peer Assessment in School and University: Reliability, Validity and Utility. In Optimizing
New Modes of Assessment: In Search of Qualities and Standards, edited by M. S. R. Segers, F. J. R. C. Dochy, & E. C.
Cascallar. Dordrecht: Kluwer Academic.
Topping, K. J. 2017. “Peer Assessment: Learning by Judging and Discussing the Work of Other Learners.”
Interdisciplinary Education and Psychology 1(1):7.
Topping, K. J., and S. W. Ehly. 2001. “Peer Assisted Learning: A Framework for Consultation.” Journal of Educational
and Psychological Consultation 12(2):113–132. doi:10.1207/S1532768XJEPC1202_03.
Van Den Noortgate, W., and P. Onghena. 2003. “Multilevel Meta-Analysis: A Comparison with Traditional Meta-
Analytical Procedures.” Educational and Psychological Measurement 63(5):765–790. doi:10.1177/
0013164403251027.
van Gennip, N. A. E., M. S. R. Segers, and H. H. Tillema. 2009. “Peer Assessment for Learning from a Social
Perspective: The Influence of Interpersonal Variables and Structural Features.” Educational Research Review 4(1):
41–54.
van Zundert, M., D. Sluijsmans, and J. van Merrienboer. 2010. “Effective Peer Assessment Processes: Research
Findings and Future Directions.” Learning and Instruction 20(4):270–279. doi:10.1016/j.learninstruc.2009.08.004.
Viechtbauer, W. 2010. “Conducting Meta-Analyses in R with the Metafor Package.” Journal of Statistical Software
36(3):1–48.
Villamil, O. S., and M. C. M. Guerrero. 1996. “Peer Revision in the L2 Classroom: Social-Cognitive Activities,
Mediating Strategies, and Aspects of Social Behavior.” Journal of Second Language Writing 5(1):51–75. doi:
10.1016/S1060-3743(96)90015-6.
Vygotsky, L. 1978. Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard
University Press.
Wen, L. M., and C.-C. Tsai. 2008. “Online Peer Assessment in an In-Service Science and Mathematics Teacher
Education Course.” Teaching in Higher Education 13(1):55–67. doi:10.1080/13562510701794050.
Wooley, R. S., C. A. Was, C. D. Schunn, and D. W. Dalton. 2008. The Effects of Feedback Elaboration on the Giver of Feedback. Paper presented at the 30th Annual Meeting of the Cognitive Science Society, Washington, DC.

Appendix A
Descriptive characteristics of studies included in the meta-analysis

References | No. of effect sizes | Research design | Grade level | Subject area | Task rated
AbuSeileek and Abualsha’r (2014) | 1 | E | Undergraduate | Writing for English as a foreign language learners | Essay writing
Afrasiabi and Khojasteh (2015) | 1 | Q | Undergraduate | English as a foreign language | Essay writing
Ahangari and Babapour (2015) | 2 | Q | 15–19 years old English language learners | English as a foreign language | Essay writing
Bhullar et al. (2014) | 1 | Q | Undergraduate | Psychology | Research paper writing
Birjandi and Hadidi Tamjid (2012) | 1 | Q | Undergraduate | Teach English as a foreign language | Essay writing
Birjandi and Siyyari (2010) | 2 | Q | Undergraduate | Advanced writing | Essay writing
Califano (1987) | 2 | Q | 5th and 6th grade | English | Writing
Chaney and Ingraham (2009) | 12 | Q | Undergraduate | Introductory financial accounting | Financial statement analysis case
Chang et al. (2012) | 3 | Q | Undergraduate | Experimental physics | Two-stage LED simulation
Choi (2013) | 4 | Q | Undergraduate | English writing | Essay writing
Diab (2010) | 4 | Q | Undergraduate | English III | Editing language errors
Diab (2011) | 1 | Q | Undergraduate | English III | Essay writing
Dominick (1998) | 3 | E | Undergraduate | Psychology | Decision making task
Eldredge et al. (2013) | 1 | Q | First-year medical school | Genetics and neoplasia | PubMed searching skills
Farrell (1977) | 1 | Q | High school | Junior English | Essay writing
Ford (1973) | 2 | E | Undergraduate | English composition | Essay writing
Ghanbari et al. (2015) | 1 | E | Post-secondary | English as a foreign language | Essay writing
Gielen et al. (2010) | 3 | Q | 7th grade | Dutch language | Essay writing
Grami (2010) | 1 | Q | Undergraduate | English as a second language | Essay writing
Guasch et al. (2013) | 4 | E | Undergraduate | Psychology | Essay writing
Horn (2009) | 2 | Q | 3rd grade | Writing | Essay writing
Huynh (2008) | 1 | Q | Undergraduate | English writing | Essay writing
Hwang et al. (2014) | 1 | Q | 6th grade | Natural Science | Developing video games
Karegianes et al. (1980) | 1 | Q | 10th grade | Writing | Essay writing
Kim (2005) | 3 | Q | Undergraduate | Introduction to Educational Technology | Concept map (technology-related design assignment)
Kinsler (1990) | 5 | Q | Undergraduate | Developmental writing | Essay writing
Krause et al. (2017) | 1 | E | Undergraduate | Clinical treatment | Dental patient interviews
Lap and Yen (2013) | 1 | Q | Unknown | English writing | Essay writing
Li (2006) | 1 | Q | Undergraduate | English as a second language | Essay writing
Li and Gao (2016) | 3 | Q | Undergraduate | Technology application | Lesson plan
Li and Steckelberg (2004) | 1 | E | Undergraduate | Instructional technology | Web-based project
Maas et al. (2015) | 1 | E | Post-graduate | Physical therapist postgraduate training | Physical therapy
Mo (2005) | 1 | Q | Undergraduate | English as a foreign language | Essay writing
Murillo-Zamorano and Montanero (2018) | 2 | Q | Undergraduate | Economic growth; accounting | Oral presentation
Olson (1990) | 2 | Q | 6th grade | Writing | Compositions
Ozogul et al. (2008) | 2 | Q | Undergraduate | Instructional design | Lesson plan
Ozogul and Sullivan (2009) | 4 | Q | Undergraduate | Computers in education | Lesson plan
Peters et al. (2018) | 2 | E | Post-secondary | Cutting mechanics | Working plan
Pfeifer (1981) | 1 | Q | Undergraduate | Freshman Composition and Rhetoric | Essay writing
Philippakos and MacArthur (2016) | 4 | E | 4th and 5th grade | Writing | Essay writing
Pierson (1967) | 1 | E | 9th grade | English | Essay writing
Prater and Bermudez (1993) | 1 | E | 4th grade | Language arts | Essay writing
Rijlaarsdam and Schoonen (1988) | 4 | Q | 9th grade | Written composition | Essay writing
Sadeghi and Khonbi (2015) | 3 | Q | Undergraduate | Teaching methodology | Exam
Sadler and Good (2006) | 2 | Q | 7th grade | Science | Exam
Schönrock-Adema et al. (2007) | 6 | E | Undergraduate | Medical study | Medical professional behaviour
Sippel and Jackson (2015) | 4 | Q | Undergraduate | German language | Ungrammaticality judgment task
Soleimani and Jamzivar (2014) | 1 | E | Post-secondary | English as a second language | Essay writing
Sun et al. (2015) | 2 | E | Undergraduate | Statistics | Homework
Trautmann (2009) | 3 | E | Undergraduate | Science | Science project
Tricio et al. (2016) | 1 | Q | Undergraduate | Dental study | Observation of procedural skills
Tsai and Chuang (2013) | 2 | Q | Undergraduate | Intermediate composition | Essay writing
van den Boom et al. (2007) | 1 | E | Undergraduate | Introduction to work psychology | Exam
van Dulmen et al. (2014) | 1 | E | Post-graduate | Physical therapy | Clinical practice
van Ginkel et al. (2015) | 4 | Q | Undergraduate | Forest and Nature conservation; Nutrition and Health | Oral presentation
Wang et al. (2017) | 2 | Q | Junior high school | Computer | Programming
Wise (1992) | 2 | Q | 8th grade | Reading/language arts | Essay writing
Yaghoubi and Mobin (2015) | 5 | Q | Undergraduate | English as a foreign language | Essay writing
Note. E = experimental design; Q = quasi-experimental design.
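As an illustration of how study-level effect sizes like those catalogued above are combined, the following minimal sketch fits a DerSimonian–Laird random-effects model in Python. It is not the authors' analysis code (the reference list points to the metafor package in R; Viechtbauer 2010), and the effect sizes and variances below are hypothetical placeholders rather than values drawn from the studies in this appendix.

import numpy as np

# Hypothetical standardised mean differences (e.g. Hedges' g) and their
# sampling variances; placeholders only, not data from the studies above.
yi = np.array([0.35, 0.10, 0.42, 0.05])
vi = np.array([0.020, 0.015, 0.030, 0.010])

# Fixed-effect (inverse-variance) estimate, needed for the heterogeneity statistic Q
w = 1.0 / vi
fixed = np.sum(w * yi) / np.sum(w)
Q = np.sum(w * (yi - fixed) ** 2)

# DerSimonian-Laird estimate of the between-study variance tau^2
k = len(yi)
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / C)

# Random-effects pooled estimate and its standard error
w_re = 1.0 / (vi + tau2)
pooled = np.sum(w_re * yi) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))

print(f"Pooled effect = {pooled:.3f} (SE = {se:.3f}), tau^2 = {tau2:.3f}")

In practice, dependence among multiple effect sizes drawn from the same study would also need to be handled, for example with the multilevel approaches discussed by Van Den Noortgate and Onghena (2003) and Scammacca, Roberts, and Stuebing (2014), both cited above.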

Appendix B
Studies included in the meta-analysis

AbuSeileek, A., and A. Abualsha’r. 2014. “Using Peer Computer-Mediated Corrective Feedback to Support EFL
Learners’ Writing.” Language Learning & Technology 18(1):76–95.
Afrasiabi, M., and L. Khojasteh. 2015. “The Effect of Peer-Feedback on EFL Medical Students’ Writing Performance.”
Khazar Journal of Humanities and Social Sciences 18(4):5–15.
Ahangari, S., and M. Babapour. 2015. “The Effect of Self-Correction and Peer-Correction on EFL Learners’ Writing
Accuracy Improvement across Proficiency.” Modern Journal of Language Teaching Methods 5(2):465–473.
Bhullar, N., K. C. Rose, J. M. Utell, and K. N. Healey. 2014. “The Impact of Peer Review on Writing in a Psychology
Course: Lessons Learned.” Journal on Excellence in College Teaching 25(2):91–106.
Birjandi, P., and N. Hadidi Tamjid. 2012. “The Role of Self-, Peer and Teacher Assessment in Promoting Iranian EFL
Learners’ Writing Performance.” Assessment & Evaluation in Higher Education 37(5):513–533. doi:10.1080/
02602938.2010.549204.
Birjandi, P., and M. Siyyari. 2010. “Self-Assessment and Peer-Assessment: A Comparative Study of Their Effect on
Writing Performance and Rating Accuracy.” Iranian Journal of Applied Linguistics 13(1):23–45.
Califano, L. Z. 1987. Teacher and peer editing: Their effects on students’ writing as measured by t-unit length, holistic scoring, and the attitudes of fifth and sixth grade students (Unpublished doctoral dissertation). Northern
Arizona University, Flagstaff, AZ.
Chaney, B. A., and L. R. Ingraham. 2009. “Using Peer Grading and Proofreading to Ratchet Student Expectations in
Preparing Accounting Cases.” American Journal of Business Education 2(3):39–48.
Chang, S.-H., T.-C. Wu, Y.-K. Kuo, and L.-C. You. 2012. “Project-Based Learning with an Online Peer Assessment
System in a Photonics Instruction for Enhancing LED Design Skills.” The Turkish Online Journal of Educational
Technology 11(4):236–246.
Choi, J. 2013. “Does Peer Feedback Affect L2 Writers’ L2 Learning, Composition Skills, Metacognitive Knowledge,
and L2 Writing Anxiety?” English Teaching 68(3):187–213.
Diab, N. M. 2010. “Effects of Peer- Versus Self-Editing on Students’ Revision.” System 38:85–95.
Diab, N. M. 2011. “Assessing the Relationship between Different Types of Student Feedback and the Quality of
Revised Writing.” Assessing Writing 16(4):274–292. doi:10.1016/j.asw.2011.08.001.
Dominick, P. G. 1998. The effects of a peer feedback instrument on team member behavior (Unpublished doctoral
dissertation). Stevens Institute of Technology, Hoboken, NJ.
Eldredge, J. D., D. G. Bear, S. J. Wayne, and P. P. Perea. 2013. “Student Peer Assessment in Evidence-Based Medicine
(EBM) Searching Skills Training: An Experiment.” Journal of the Medical Library Association 101(4):244–251. doi:
10.3163/1536-5050.101.4.003.
Farrell, K. J. 1977. A comparison of three instructional approaches for teaching written composition to high school
juniors: Teacher lecture, peer evaluation, and group tutoring (Unpublished doctoral dissertation). Boston
University, Boston, MA.
Ford, B. W. 1973. The effects of peer editing/grading on the grammar-usage and theme-composition ability of college freshmen (Unpublished doctoral dissertation). The University of Oklahoma, Norman, OK.
Ghanbari, N., A. Karampourchangi, and M. R. Shamsaddini. 2015. “An Exploration of the Effect of Time Pressure and
Peer Feedback on the Iranian EFL Students’ Writing Performance.” Theory and Practice in Language Studies 5(11):
2251–2261. doi:10.17507/tpls.0511.08.
Gielen, S., L. Tops, F. Dochy, P. Onghena, and S. Smeets. 2010. “A Comparative Study of Peer and Teacher Feedback
and of Various Peer Feedback Forms in a Secondary School Writing Curriculum.” British Educational Research
Journal 36(1):143–162. doi:10.1080/01411920902894070.
Grami, G. M. A. 2010. The effects of integrating peer feedback into university-level ESL writing curriculum: A comparative study in a Saudi context (Unpublished doctoral dissertation). Newcastle University, Newcastle.
Guasch, T., A. Espasa, I. M. Alvarez, and P. A. Kirschner. 2013. “Effects of Feedback on Collaborative Writing in an
Online Learning Environment.” Distance Education 34(3):324–338. doi:10.1080/01587919.2013.835772.
Horn, G. C. 2009. Rubrics and revision: What are the effects of 3rd graders using rubrics to self-assess or peer-assess
drafts of writing? (Unpublished doctoral dissertation). Boise State University, Boise, ID.
Huynh, M. H. 2008. The impact of online peer feedback on EFL learners’ motivation in writing and writing performance: A case study at Can Tho University (Unpublished master’s thesis). Can Tho University, Vietnam.
Hwang, G.-J., C.-M. Hung, and N.-S. Chen. 2014. “Improving Learning Achievements, Motivations and Problem-
Solving Skills through a Peer Assessment-Based Game Development Approach.” Educational Technology Research
and Development 62(2):129–145. doi:10.1007/s11423-013-9320-7.
Karegianes, M. J., E. T. Pascarella, and S. W. Pflaum. 1980. “The Effects of Peer Editing on the Writing Proficiency of
Low-Achieving Tenth Grade Students.” The Journal of Educational Research 73(4):203–207. doi:10.1080/
00220671.1980.10885236.
Kim, M. 2005. The effects of the assessor and assessee’s roles on preservice teachers’ metacognitive awareness, performance, and attitude in a technology-related design task (Unpublished doctoral dissertation). Florida State
University, Tallahassee, FL.
Kinsler, K. 1990. “Structured Peer Collaboration: Teaching Essay Revision to College Students Needing Writing
Remediation.” Cognition and Instruction 7(4):303–321. doi:10.1207/s1532690xci0704_2.
Krause, F., G. Schmalz, R. Haak, and K. Rockenbauch. 2017. “The Impact of Expert- and Peer Feedback on
Communication Skills of Undergraduate Dental Students – A Single-Blinded, Randomized, Controlled Clinical
Trial.” Patient Education and Counseling 100(12):2275–2282. doi:10.1016/j.pec.2017.06.025.
Lap, T. Q., and C. H. Yen. 2013. “Vietnamese Learners’ Ability to Write English Argumentative Paragraphs: The Role
of Peer Feedback Giving.” Journal on English Language Teaching 3(4):12–20. doi:10.26634/jelt.3.4.2517.
Li, C. 2006. “The Impact of Teacher Involved Peer Feedback in the ESL Writing Class.” Sino-US English Teaching 3(5):
28–35.
Li, L., and F. Gao. 2016. “The Effect of Peer Assessment on Project Performance of Students at Different Learning
Levels.” Assessment & Evaluation in Higher Education 41(6):885–900. doi:10.1080/02602938.2015.1048185.
Li, L., and A. Steckelberg. 2004. Using Peer Feedback to Enhance Student Meaningful Learning. Paper presented at
the Association for Educational Communications and Technology, Chicago, IL.
Maas, M. J. M., P. J. van der Wees, C. Braam, J. Koetsenruijter, Y. F. Heerkens, C. P. M. van der Vleuten, and M. W. G.
Nijhuis-van der Sanden. 2015. “An Innovative Peer Assessment Approach to Enhance Guideline Adherence in
Physical Therapy: Single-Masked, Cluster-Randomized Controlled Trial.” Physical Therapy 95(4):600–612. doi:
10.2522/ptj.20130469.
Mo, J. 2005. “An Exploratory Study of Conducting Peer Review among Chinese College Students.” CELEA Journal
28(6):43–48.
Murillo-Zamorano, L. R., and M. Montanero. 2018. “Oral Presentation in Higher Education: A Comparison of the
Impact of Peer and Teacher Feedback.” Assessment & Evaluation in Higher Education 43(1):138–150. doi:10.1080/
02602938.2017.1303032.
Olson, V. B. 1990. “The Revising Processes of Sixth-Grade Writers with and without Peer Feedback.” The Journal of
Educational Research 84(1):22–29.
Ozogul, G., Z. Olina, and H. Sullivan. 2008. “Teacher, Self and Peer Evaluation of Lesson Plans Written by Preservice
Teachers.” Educational Technology Research and Development 56(2):181–201. doi:10.1007/s11423-006-9012-7.
Ozogul, G., and H. Sullivan. 2009. “Student Performance and Attitudes under Formative Evaluation by Teacher, Self
and Peer Evaluators.” Educational Technology Research and Development 57(3):393–410. doi:10.1007/s11423-007-
9052-7.
Peters, O., H. Körndle, and S. Narciss. 2018. “Effects of a Formative Assessment Script on How Vocational Students
Generate Formative Feedback to a Peer’s or Their Own Performance.” European Journal of Psychology of
Education 33(1):117–143. doi:10.1007/s10212-017-0344-y.
Pfeifer, J. K. 1981. The effects of peer evaluation and personality on writing anxiety and writing performance in college freshmen (Unpublished doctoral dissertation). Texas Tech University, Lubbock, TX.
Philippakos, Z. A., and C. A. MacArthur. 2016. “The Effects of Giving Feedback on the Persuasive Writing of Fourth-
and Fifth-Grade Students.” Reading Research Quarterly 51(4):419–433. doi:10.1002/rrq.149.
Pierson, H. 1967. Peer and teacher correction: A comparison of the effects of two methods of teaching composition
in grade nine English classes (Unpublished doctoral dissertation). New York University, New York, NY.
Prater, D., and A. Bermudez. 1993. “Using Peer Response Groups with Limited English Proficient Writers.” Bilingual
Research Journal 17(1–2):99–116. doi:10.1080/15235882.1993.10162650.
Rijlaarsdam, G., and R. Schoonen. 1988. Effects of a teaching program based on peer evaluation on written composition and some variables related to writing apprehension. SCO Cahier Nr. 47 (No. 90-6813-203–2).
Sadeghi, K., and Z. A. Khonbi. 2015. “Iranian University Students’ Experiences of and Attitudes towards Alternatives
in Assessment.” Assessment & Evaluation in Higher Education 40(5):641–665. doi:10.1080/02602938.2014.941324.
Sadler, P. M., and E. Good. 2006. “The Impact of Self- and Peer-Grading on Student Learning.” Educational
Assessment 11(1):1–31. doi:10.1207/s15326977ea1101_1.
Schönrock-Adema, J., M. Heijne-Penninga, M. A. J. Van Duijn, J. Geertsma, and J. Cohen-Schotanus. 2007.
“Assessment of Professional Behavior in Undergraduate Medical Education: Peer Assessment Enhances
Performance.” Medical Education 41(9):836–842. doi:10.1111/j.1365-2923.2007.02817.x.
Sippel, L., and C. N. Jackson. 2015. “Teacher vs. Peer Oral Corrective Feedback in the German Language Classroom.”
Foreign Language Annals 48(4):688–705. doi:10.1111/flan.12164.
Soleimani, H., and A. S. Jamzivar. 2014. “The Impact of Written Peer Corrective Feedback on Pre-Intermediate
Iranian EFL Learners’ Writing Performance.” International Journal of Language Learning and Applied Linguistics
World 5(4):1–10.
Sun, D. L., N. Harris, G. Walther, and M. Baiocchi. 2015. “Peer Assessment Enhances Student Learning: The Results of
a Matched Randomized Crossover Experiment in a College Statistics Class.” PLoS ONE 10(12): e0143177. doi:
10.1371/journal.pone.0143177.
Trautmann, N. M. 2009. “Interactive Learning through Web-Mediated Peer Review of Student Science Reports.”
Educational Technology Research and Development 57(5):685–704. doi:10.1007/s11423-007-9077-y.
Tricio, J. A., M. J. Woolford, and M. P. Escudier. 2016. “Fostering Dental Students’ Academic Achievements and
Reflection Skills through Clinical Peer Assessment and Feedback.” Journal of Dental Education 80(8):914–923.
Tsai, Y.-C., and M.-T. Chuang. 2013. “Fostering Revision of Argumentative Writing through Structured Peer
Assessment.” Perceptual and Motor Skills 116(1):210–221. doi:10.2466/10.23.PMS.116.1.210-221.
van den Boom, G., F. Paas, and J. J. G. van Merriënboer. 2007. “Effects of Elicited Reflections Combined with Tutor
or Peer Feedback on Self-Regulated Learning and Learning Outcomes.” Learning and Instruction 17(5):532–548.
doi:10.1016/j.learninstruc.2007.09.003.
van Dulmen, S. A., M. Maas, J. B. Staal, G. Rutten, H. Kiers, M. Nijhuis-van der Sanden, and P. van der Wees. 2014.
“Effectiveness of Peer Assessment for Implementing a Dutch Physical Therapy Low Back Pain Guideline: Cluster
Randomized Controlled Trial.” Physical Therapy 94(10):1396–1409. doi:10.2522/ptj.20130286.
van Ginkel, S., J. T. M. Gulikers, H. J. A. Biemans, and M. Mulder. 2017. “The Impact of the Feedback Source on
Developing Oral Presentation Competence.” Studies in Higher Education 42(9):1671–1685. doi:10.1080/
03075079.2015.1117064.
Wang, X.-M., G.-J. Hwang, Z.-Y. Liang, and H.-Y. Wang. 2017. “Enhancing Students’ Computer Programming
Performances, Critical Thinking Awareness and Attitudes toward Programming: An Online Peer-Assessment
Attempt.” Educational Technology & Society 20(4):58–68.
Wise, W. G. 1992. The effects of revision instruction on eighth graders’ persuasive writing (Doctoral dissertation).
Retrieved from Digital Repository at the University of Maryland. (ILLiad No. 1193624)
Yaghoubi, A., and M. Mobin. 2015. “Portfolio Assessment, Peer Assessment and Writing Skill Improvement.” Theory
and Practice in Language Studies 5(12):2504–2511. doi:10.17507/tpls.0512.10.
