Response
Across multiple indicators of reproducibility, the Open Science Collaboration (1) (OSC2015) observed that the original result was replicated in ~40 of 100 studies sampled from three journals. Gilbert et al. (2) conclude that the reproducibility rate is, in fact, as high as could be expected, given the study methodology. We agree with them that both methodological differences between original and replication studies and statistical power affect reproducibility, but their very optimistic assessment is based on statistical misconceptions and selective interpretation of correlational data.

Gilbert et al. focused on a variation of one of OSC2015's five measures of reproducibility: how often the confidence interval (CI) of the original study contains the effect size estimate of the replication study. They misstated that the expected replication rate assuming only sampling error is 95%, which is true only if both studies estimate the same population effect size and the replication has infinite sample size (3, 4). OSC2015 replications did not have infinite sample size. In fact, the expected replication rate was 78.5% using OSC2015's CI measure (see OSC2015's supplementary information, pp. 56 and 76; https://osf.io/k9rnd). By this measure, the actual replication rate was only 47.4%, suggesting the influence of factors other than sampling error alone.

Within another large replication study, "Many Labs" (5) (ML2014), Gilbert et al. found that 65.5% of ML2014 studies would be within the CIs of other ML2014 studies of the same phenomenon and concluded that this reflects the maximum reproducibility rate for OSC2015. Their analysis using ML2014 is misleading and does not apply to estimating reproducibility with OSC2015's data for a number of reasons.

First, Gilbert et al.'s estimates are based on pairwise comparisons between all of the replications within ML2014. As such, for roughly half of their failures to replicate, "replications" had larger effect sizes than "original studies," whereas just 5% of OSC2015 replications had replication CIs exceeding the original study effect sizes.

Second, Gilbert et al. apply the by-site variability in ML2014 to OSC2015's findings, thereby arriving at higher estimates of reproducibility. However, ML2014's primary finding was that by-site variability was highest for the largest (replicable) effects and lowest for the smallest (nonreplicable) effects. If ML2014's primary finding is generalizable, then Gilbert et al.'s analysis may leverage by-site variability in ML2014's larger effects to exaggerate the effect of by-site variability on OSC2015's nonreproduced smaller effects, thus overestimating reproducibility.

Third, Gilbert et al. use ML2014's 85% replication rate (after aggregating across all 6344 participants) to argue that reproducibility is high when extremely high power is used. This interpretation is based on ML2014's small, ad hoc sample of classic and new findings, as opposed to OSC2015's effort to examine a more representative […]

Affiliations: 1Russell Sage College, Troy, NY, USA. 2University of Würzburg, Würzburg, Germany. 3University of Waterloo, Waterloo, Ontario, Canada. 4Virginia Commonwealth University, Richmond, VA, USA. 5University of Michigan, Ann Arbor, MI 48104, USA. 6Mathematica Policy Research, Washington, DC, USA. 7Ashland University, Ashland, OH, USA. 8Michigan State University, East Lansing, MI, USA. 9Southern Oregon University, Ashland, OR, USA. 10University of Göttingen, Institute for Psychology, Göttingen, Germany. 11University of Southern California, Los Angeles, CA, USA. 12Australian National University, Canberra, Australia. 13Technische Universität Braunschweig, Braunschweig, Germany. 14Parmenides Stiftung, Munich, Germany. 15Queen's University, Kingston, Ontario, Canada. 16Stanford University, Stanford, CA, USA. 17Keele University, Keele, Staffordshire, UK. 18Boston College, Chestnut Hill, MA, USA. 19Radboud University Nijmegen, Nijmegen, Netherlands. 20University of Koblenz-Landau, Landau, Germany. 21Erasmus Medical Center, Rotterdam, Netherlands. 22University of Amsterdam, Amsterdam, Netherlands. 23Harvard University, Cambridge, MA, USA. 24Occidental College, Los Angeles, CA, USA. 25Willamette University, Salem, OR, USA. 26Arcadia University, Glenside, PA, USA. 27University of Potsdam, Potsdam, Germany. 28University of Bristol, Bristol, UK. 29Vrije Universiteit Amsterdam, Amsterdam, Netherlands. 30Karolinska Institutet, Stockholm University, Stockholm, Sweden. 31Center for Open Science, Charlottesville, VA, USA. 32University of Virginia, Charlottesville, VA, USA. 33Harvard Medical School, Boston, MA, USA. 34Loyola University, Baltimore, MD, USA. 35University of California, Riverside, CA, USA. 36Wesleyan University, Middletown, CT, USA. 37University of Konstanz, Konstanz, Germany. 38Yale University, New Haven, CT, USA. 39Coventry University, Coventry, UK. 40Tilburg University, Tilburg, Netherlands. 41Utrecht University, Utrecht, Netherlands. 42University of Leuven, Leuven, Belgium. 43University of Padova, Padova, Italy. 44University of Vienna, Vienna, Austria. 45Adams State University, Alamosa, CO, USA.

*Authors are listed alphabetically. †Corresponding author. E-mail: nosek@virginia.edu
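The sampling-error argument about the CI measure can be made concrete with a short Monte Carlo sketch. When an original study and a replication of equal size estimate exactly the same population effect, the original's 95% CI is expected to contain the replication's point estimate only about 83% of the time (3, 4); the 95% figure holds only in the limit of an infinitely precise replication. All parameters below are hypothetical illustrations, not values taken from OSC2015:

```python
import random

random.seed(42)

# Illustrative sketch (hypothetical parameters, not OSC2015's actual data):
# an original study and an equally sized replication both estimate the SAME
# true effect, differing only by sampling error. How often does the
# original study's 95% CI contain the replication's point estimate?
n = 50             # per-study sample size (hypothetical)
true_effect = 0.4  # true mean effect (hypothetical)
sigma = 1.0        # outcome standard deviation (hypothetical)
se = sigma / n ** 0.5  # standard error of each study's mean estimate

trials = 200_000
captured = 0
for _ in range(trials):
    original = random.gauss(true_effect, se)
    replication = random.gauss(true_effect, se)
    # Original study's 95% CI is original +/- 1.96 * se
    if abs(replication - original) <= 1.96 * se:
        captured += 1

rate = captured / trials
print(f"CI capture rate: {rate:.3f}")  # about 0.83, well below 0.95
```

Increasing the replication's sample size moves the expected capture rate back toward 95%; the 78.5% expectation cited above reflects the actual sample sizes and designs of the OSC2015 replications rather than this simplified equal-n case.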
[…] by Gilbert et al., and a fourth (the racial bias study from America replicated in Italy) was replicated successfully. Gilbert et al. also supposed that nonendorsement of protocols by the original authors was evidence of critical methodological differences. Then they showed that replications that were endorsed by the original authors were more likely to be replicated than those not endorsed (nonendorsed studies included 18 original authors not responding and 11 voicing concerns). In fact, OSC2015 tested whether rated similarity of the replication and original study was correlated with replication success and observed weak relationships across reproducibility indicators (e.g., r = 0.015 with the P < 0.05 criterion; supplementary information, p. 67; https://osf.io/k9rnd). Further, there is an alternative explanation for the correlation between endorsement and replication success; authors who were less confident of their study's robustness may have been less […]

[…] evidence in OSC2015 countering their interpretation, such as evidence that surprising or more underpowered research designs (e.g., interaction tests) were less likely to be replicated. In sum, Gilbert et al. made a causal interpretation of OSC2015's reproducibility with selective interpretation of correlational data. A constructive step forward would be revising the previously nonendorsed protocols to see if they can achieve endorsement and then conducting replications with the updated protocols to see if reproducibility rates improve.

More generally, there is no such thing as exact replication (8–10). All replications differ in innumerable ways from original studies. They are conducted in different facilities, in different weather, with different experimenters, with different computers and displays, in different languages, at different points in history, and so on. What counts as a replication involves theoretical assessments […]

[…] testing to determine why. When results do not differ, it offers some evidence that the finding is generalizable. OSC2015 provides initial, not definitive, evidence—just like the original studies it replicated.

REFERENCES AND NOTES
1. Open Science Collaboration, Science 349, aac4716 (2015).
2. D. T. Gilbert, G. King, S. Pettigrew, T. D. Wilson, Science 351, 1037 (2016).
3. G. Cumming, R. Maillardet, Psychol. Methods 11, 217–227 (2006).
4. G. Cumming, J. Williams, F. Fidler, Underst. Stat. 3, 299–311 (2004).
5. R. A. Klein et al., Soc. Psychol. 45, 142–152 (2014).
6. C. R. Ebersole et al., J. Exp. Soc. Psychol. 65 (2016); https://osf.io/csygd.
7. A. Dreber et al., Proc. Natl. Acad. Sci. U.S.A. 112, 15343–15347 (2015).
8. B. A. Nosek, D. Lakens, Soc. Psychol. 45, 137–141 (2014).
9. Open Science Collaboration, Perspect. Psychol. Sci. 7, 657–660 (2012).
10. S. Schmidt, Rev. Gen. Psychol. 13, 90–100 (2009).

ACKNOWLEDGMENTS
[…]
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science is a registered trademark of AAAS.
Copyright © 2016, American Association for the Advancement of Science