Replication and theory development in the social and psychological sciences
Tomos G. ap Sion (University of St Andrews and Wrexham University)
and Brian D. Earp (University of Oxford)

This is the authors’ copy of a forthcoming book chapter. Please cite as:

ap Sion, T. G., & Earp, B. D. (in press). Replication and theory development in the social
and psychological sciences. In D. Trafimow (ed.), Research Handbook on the
Replication Crisis. Cheltenham, UK: Edward Elgar.

1. Introduction
It is a hallmark of social and psychological science that its findings should be reproducible across relevant contexts, with replicability playing a key role in establishing the reliability of detected effects. However, following what some have called a "watershed" moment, a "crisis of confidence" has befallen the discipline, with commentators zeroing in on the Reproducibility Project's low replication estimate of 36-47% (Open Science Collaboration, 2015; see also Pashler & Wagenmakers, 2012; Wiggins & Christopherson, 2019). The theory crisis is another, perhaps less well-known, challenge, with researchers arguing that many of the theories within social and psychological science are generally of poor quality (Eronen & Bringmann, 2021; Fiedler, 2017; Oberauer & Lewandowsky, 2019). As a result, theories are often not clearly accepted or refuted, tending to come and go with little of the cumulative progress often associated with scientific knowledge. This chapter will look at how the replication and theory crises facing social and psychological science could interact by examining the relationship between replication and theory development. Following an overview of both crises, the first half of the chapter will look at the ways in which replications could support theory development, covering the identification of robust phenomena, the identification of effects' boundary conditions, and the evaluation of theories' predictions. The second half of the chapter will consider a number of more recent arguments suggesting that well-specified theory is required for replications to be informative (Irvine, 2021; Klein, 2014; Muthukrishna & Henrich, 2019).

2. Two crises facing social and psychological science
2.1 The replication crisis
The replication crisis in the social and psychological sciences has gained much attention over the last decade, raising significant concerns as well as driving reflection and reform. It is a hallmark of psychological science that its findings should be reproducible: in particular, that findings demonstrating true phenomena should be replicable across relevant contexts, suggesting the reality (and reliability) of detected effects.

However, the Open Science Collaboration's Reproducibility Project: Psychology estimated that only between 36% and 47% of its reviewed findings could be reproduced, raising concerns regarding the soundness of other findings in psychology (Open Science Collaboration, 2015; see also Earp, 2016; Hengartner, 2018; Simmons & Simonsohn, 2017; Watts et al., 2018). Alongside a number of other controversies, such as the publication of Bem's (2011) infamous article Feeling the Future, which purportedly demonstrated the existence of paranormal precognition, as well as high-profile accounts of research fraud (Callaway, 2011), such events have led to what some have termed a "watershed" moment for psychological science (Wiggins & Christopherson, 2019), resulting in a deep sense of insecurity and doubt that some have described as a "crisis of confidence" (Pashler & Wagenmakers, 2012; see also Baker, 2016).

A number of contributing factors have been identified. Principal among these are questionable research practices (QRPs) that can increase the probability of detecting what appears to be a statistically significant effect when no such effect exists. Such practices, some commonly used, demonstrate that the decisions researchers make, such as when to stop collecting data, which observations to exclude, and which relationships to test, can all increase the probability of a Type 1 error (falsely identifying "significant" findings) (Ioannidis, 2005; John et al., 2012; Simmons et al., 2011).
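To make the point concrete, the following minimal simulation (our own sketch, not drawn from the cited papers; the sample sizes, batch size, and stopping rule are illustrative assumptions) shows how one such practice, "optional stopping," can inflate the false-positive rate well above the nominal 5% even though no true effect exists.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

def false_positive_rate(n_sims=2000, n_start=20, n_max=100, step=10, alpha=0.05):
    """Simulate 'optional stopping': test after every batch of participants
    and stop as soon as p < alpha, even though the null hypothesis is true."""
    hits = 0
    for _ in range(n_sims):
        # Both groups are drawn from the SAME distribution: any "effect" is noise.
        a = list(rng.normal(0.0, 1.0, n_start))
        b = list(rng.normal(0.0, 1.0, n_start))
        while True:
            if stats.ttest_ind(a, b).pvalue < alpha:
                hits += 1  # a chance fluctuation is declared "significant"
                break
            if len(a) >= n_max:
                break      # give up and (correctly) retain the null
            a.extend(rng.normal(0.0, 1.0, step))
            b.extend(rng.normal(0.0, 1.0, step))
    return hits / n_sims

# With a fixed sample size the rate would hover near 0.05; with optional
# stopping it is typically inflated well above that under these settings.
print(f"Observed false-positive rate: {false_positive_rate():.3f}")
```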

Additionally, a number of social and systemic issues also seem to contribute to the replication crisis, including publication practices that discourage the reporting of failed replications or (other) null findings (Francis, 2012; Pashler & Harris, 2012), as well as incentive schemes in academia that reward researchers for overselling their work (Giner-Sorolla, 2012). Together, such conditions drive researchers towards using QRPs to make their findings more appealing, while also restricting psychological science's ability to self-correct due to journals' general prioritization of publishing positive over negative findings (Callard, 2022; Ioannidis, 2012; Koole & Lakens, 2012). This results in an incomplete or even systematically skewed literature base where most published findings appear successful yet may often be false positives based on questionable methods (Greenwald, 1979; Earp, 2017).

Worryingly, many of the concerns about the replicability of psychological science, including contributing practices, were raised decades ago. These include well-known criticisms associated with the near hegemony of Null Hypothesis Significance Testing (Bakan, 1966), the "file drawer problem" of researchers failing to write up or submit null findings (instead, as the name suggests, simply "filing them away") due to journals' general reluctance to publish them (Rosenthal, 1979), as well as the lack of transparency and the paucity of replication studies in psychological research (Dunnette, 1966; see also Lakens, 2023). Additionally, Dunnette (1966) was aware of a number of QRPs, such as problematic decisions around the exclusion of data, decades before Simmons and colleagues' (2011) influential paper making similar points. These are just a few examples, with numerous further criticisms levied against the discipline in decades past (e.g., Lykken, 1991).

Forward-thinking researchers and some journals have responded productively to this crisis, for instance, by promoting greater transparency in how research is reported, by proposing changes to scientific methodology to reduce the probability of false-positive findings, by working to re-align the reward structures in the discipline to remove barriers to publishing replications, and, of course, by conducting or publishing more replications (Benjamin et al., 2018; Nosek et al., 2018; Kidwell et al., 2016; Klein et al., 2014; Trafimow et al., 2018; cf. Hutmacher & Franz, 2024).

2.2 The theory crisis


This being said, for those who closely follow debates concerning the integrity of psychological science, there is another crisis that has not received such widespread attention as the replication crisis. Namely, more and more researchers are raising concerns that the quality of theories in psychological science is generally poor, and that more attention should be placed on developing better theories (Eronen & Bringmann, 2021; Fiedler, 2017; Klein, 2014; Oberauer & Lewandowsky, 2019). This concern, too, is not new (Dunnette, 1966; Elms, 1975; Kruglanski, 1975; Lykken, 1991; Meehl, 1967, 1978, 1990). Rather, it has long been noted that, due to their generally poor quality, theories in psychological science are rarely clearly supported or refuted (for instance, based on "strict" or "severe" tests of the specific predictions they make; see Mayo, 2018). Instead, they tend to come and go with little of the cumulative progress often associated with scientific knowledge (e.g., Bird, 2016).

To illustrate this phenomenon, Meehl (1978) gave two examples: level of aspiration and risky shift. These two theories (whose details needn't concern us here) were popular in the 1930s and 1960s, respectively, featuring prominently in reputable journals and hailed as powerful theoretical constructs. However, over roughly a decade or two they each were gradually forgotten, rather than falsified, eventually appearing only in cursory remarks if at all. Dunnette (1966) also identified level of aspiration as a theory experiencing intense but short-lived interest, along with others he described as "fads." As Rosenthal summarized such observations: "we seem to start anew with each succeeding volume of the psychological journals" (Rosenthal, 1993, p. 519, as cited in Lambdin, 2012).

A few more recent examples have been suggested. For instance, Eronen and Bringmann (2021) note that there are at least 83 theories in the field of behaviour change alone, with the likelihood of any substantial number of these theories being convincingly accepted or refuted seeming low. The authors also offer the theory of ego depletion as another possible example, which, despite its historical prominence, has recently fallen into doubt with no conclusive evidence either for or against it (Friese et al., 2019). Even the highly popular dual process theory of cognition, positing Type 1 and Type 2 processes (roughly, "fast" and "slow" thinking), is now alleged to lack substantive empirical support, with prominent researchers calling for it to be abandoned (Melnikoff & Bargh, 2018, 2023).

As with the replication crisis, the fact that many of these criticisms of theory quality were raised more than half a century ago is worrying. Unlike the replication crisis, however, no concrete improvements of similar scale have been implemented within social and psychological science (Lakens, 2023). Nevertheless, a number of factors contributing to the theory crisis have been suggested.

For instance, some researchers have argued that the theories of psychological science tend to be imprecise or formulated so vaguely that it is difficult to falsify them (Dunnette, 1966; Meehl, 1990; Oberauer & Lewandowsky, 2019). While some would say the Popperian ideal of strict falsifiability is unachievable, not only in practice but even in principle (including in the so-called "hard" sciences such as physics; Tsang & Kwan, 1999), something like this ideal should plausibly still be kept in view as a standard to approach or approximate. In other words, even if absolute or definitive falsification may not be possible, it could still be possible, desirable, and epistemically sound to take a more moderate or pragmatic approach to falsifiability that relies on a meaningful accumulation of informative (even if not conclusive) evidence for or against a theory over time (Earp & Trafimow, 2015).

Accordingly, building theories with enough specificity that high-quality empirical tests of their predictions will indeed be informative seems a laudable, even necessary, goal of any empirical science. And yet, little progress seems to have been made on this front (Lakens, 2023), with theories in psychology tending to be only weakly connected to the empirical findings by virtue of which they might be supported or challenged (Oberauer & Lewandowsky, 2019). Moreover, there is limited focus in the discipline on building better theories that could meaningfully address such issues (Kruglanski & Higgins, 2004; see, however, Muthukrishna & Henrich, 2019).

Other potential contributing factors to the "theory crisis" have been identified, including over-reliance on Null Hypothesis Significance Testing (NHST), psychologists continuing to rely on vague or weak theoretical constructs, a lack of focus on construct validity in psychological science, and a paucity of robust empirical phenomena to constrain theory choice, to name a few (Dunnette, 1966; Eronen & Bringmann, 2021; Lakens, 2023; Meehl, 1978; Oberauer & Lewandowsky, 2019).

Importantly, some perspectives in the theoretical and philosophical literature suggest that the replication crisis and the theory crisis cannot, or should not, be insulated from each other. The first half of this chapter will focus on the perceived role that replications could play in supporting the development of high-quality theories, while the second half will look at criticisms suggesting that high-quality theory building is a logical prerequisite to conducting informative empirical tests, including replications.

3. Introducing replication
3.1 The perceived importance of replication
Replications are often described as studies that investigate whether the findings of another study are reproducible by deploying, to a greater or lesser extent, a similar methodology to the original study (e.g., Schmidt, 2009). Particularly in the context of the so-called replication crisis, reproducibility reflects the idea that originally reported findings and phenomena in the sciences should be detectable (and so, intersubjectively verifiable) by other scientists, including anyone with the knowledge and capacity to recreate the essential experimental conditions of the original study.

The importance of replication is often emphasised both by empirical researchers and philosophers of science (e.g., Asendorpf et al., 2013; Brandt et al., 2014; Schmidt, 2009; Sikorski & Andreoletti, 2023; Zwaan et al., 2018), with some maintaining that there is almost universal consensus regarding its value (for critical discussion, see Haig, 2022). For instance, replication has been described as the "coin of the scientific realm" (Loscalzo, 2012, p. 1211), "the Supreme Court of the scientific system" (Collins, 1992, p. 19), a "touchstone for objective knowledge" (Schmidt, 2009, p. 92), and as the "gold standard" of science (Jasny et al., 2011, p. 1225). Replication may be so central, according to some, that it could serve as a "demarcation criterion between science and nonscience" (Braude, 1979, p. 2).

The perceived importance of replicability is closely linked to the belief that nature behaves in a consistent and law-like manner (Schmidt, 2009). When events in nature occur and recur according to rules and regularities, there is an expectation that events of the same type should be repeatedly observable if the same essential conditions are present. Only through this consistency can observations be repeatedly tested, allowing scientists to become confident that their observations represent more than mere flukes or isolated coincidences (Popper, 1959).

In this sense, replications are seen as important for determining the credibility and stability of scientific findings (Radder, 1996). When a given finding has been independently replicated by a number of different scientists, researchers can be more confident that the finding represents a valid and stable empirical pattern in which trust and confidence can appropriately be placed. In keeping with this aim, replications are often seen positively as a means to verify the accuracy of facts in a field of research, to more accurately estimate effect sizes, and, depending on the specifics of the replication strategy, to determine whether an effect holds across populations, contexts, and operationalizations (i.e., generalizability) (Brandt et al., 2014; Hüffmeier et al., 2016; LeBel et al., 2018; Schmidt, 2009).

However, the more a given finding fails to replicate, the less confidence researchers should place in the finding representing a valid (or at least stable) empirical phenomenon, assuming that these are high-quality replication attempts by independent scientists (Earp & Trafimow, 2015). Relatedly, replications are often viewed as protecting against a number of factors that could lead to problematic findings, including chance results (i.e., false positives), the influence of experimenter effects and tacit knowledge (e.g., expectancy effects), experimental artifacts and issues of internal validity, as well as fraud (Hüffmeier et al., 2016; Schmidt, 2009; see however Stroebe & Strack, 2014). As Machery puts it, replications are meant to allow scientists to "correct the empirical record" (Machery, 2022, p. 546). Furthermore, commentators have also argued that replications can test the assumed underlying processes of a theory (Brandt et al., 2014), can identify robust and reliable data for theorizing (LeBel et al., 2018), and can provide the gold standard for evaluating a theoretical prediction (Hüffmeier et al., 2016).

3.2 The direct/conceptual distinction


A common distinction drawn in the replication literature is between direct and conceptual replications. To begin with, a number of researchers call studies that aim to be as methodologically similar to the original study as possible a "direct" replication. In such studies, the scientist aims to, in a sense, "recreate" the original study by duplicating the procedures, sample characteristics, stimuli, measurement approaches, operationalizations, and so on, of the original study (Brandt et al., 2014; Fabrigar & Wegener, 2016; Haig, 2022; Schmidt, 2009).

Another type of replication that is often recognised is called a "conceptual" replication. Such studies aim to investigate whether a finding from a previous study is detectable using different methods (Haig, 2022; Schmidt, 2009), often with particular emphasis placed on using different operationalizations of the (independent and dependent) variables under investigation (Fabrigar & Wegener, 2016; Hüffmeier et al., 2016; LeBel et al., 2018).

One key point to make here is that there is no such thing as a replication that is exactly the same as the original study in all respects (Rosenthal, 1991). For instance, a replication must be different from the original study at least in the sense that it has been conducted at a later time point than the original study. Replications are also often conducted in different cultural or linguistic contexts, or at minimum, in different labs (for discussion, see Earp, 2020). In this sense, direct replications can only aspire to keep certain differences to a minimum (Feest, 2019).

Furthermore, difference is part of what makes replication useful (Schmidt, 2009). As a complete duplicate of the original study would in fact be the original study, it is trivially true that it would "replicate." In this sense, some differences are useful and even necessary as they allow scientists to tell whether a finding holds across some type or degree of variation. Accordingly, all replications, to a greater or lesser extent, aim to establish whether a finding can generalize across variations in context, whether physical, sociocultural, or temporal, or in research design or materials (Nosek & Errington, 2020). If a finding can be detected across labs, samples, operationalizations, and contexts, then it can be considered robust and invariant across at least the tested changes.

Additionally, while the distinction between direct and conceptual replication is commonly drawn, this distinction is not meant to be interpreted as representing two discrete and unrelated types (see, for example, the more complex or multi-dimensional ways of classifying replications proposed by LeBel and colleagues, 2018, and Hüffmeier and colleagues, 2016). Instead, direct replications and conceptual replications should be interpreted as occupying two regions, or segments, along a theoretical continuum of variation through which different replication studies can be interpreted (Asendorpf et al., 2013).

Taking the typology of LeBel and colleagues (2018) as an example, there are three different "types" of replications that can be categorised as a direct replication (exact, very close, and close replication), as well as two types that can be categorised as a conceptual replication (far and very far replications). As one travels along the continuum of replication, differences are steadily and, ideally, knowingly introduced, such as differences in procedural details, physical setting, and stimuli. Although perhaps somewhat arbitrary, the threshold between direct and conceptual replication can be seen as having been crossed when the replication differs from the original study in terms of its participant population (e.g., age cohort or clinical status) or how the relevant target constructs are operationalized.

While some may question the specific details of such taxonomies (e.g., which deviations should fall within each "sub-type"), seeing methodological variation as a continuum has its advantages. Firstly, it respects the intuitive idea that replications differ from the original study in terms of degrees. For instance, a replication that only varies one design facet is more methodologically similar to the original study than one that varies several facets. Secondly, as LeBel and colleagues (2018) note, individuating and assessing replications using a unified framework can minimize researcher bias by providing a common stepping-off point. Lastly, replications can be conducted in a particular order, beginning with exact replications and moving towards the more conceptual end of the spectrum (Hüffmeier et al., 2016; LeBel et al., 2018). This idea represents an approach whereby empirical patterns are first tested for limited generalizability, before introducing greater levels of methodological or contextual variation to determine the extent to which the pattern holds.

3.3 Interpreting successful replications


Based on the above discussion, one can begin to see an important role for different types of replications. For instance, for Hüffmeier and colleagues (2016), methodologically similar direct ("exact") replications are conducted by the original study's author and could often feature in multi-study papers wherein the original finding is reported along with a replication. The main function served by such replications, according to Hüffmeier and colleagues, is to protect against false positives (i.e., chance findings).

When an exact replication of this kind is successful, the demonstrated generalizability of the finding is limited to variations in sample characteristics (i.e., the same population but different participants), as well as to any other minor variations that may be inadvertently or unavoidably introduced. This form of "double-checking" exercise could be a useful way to make sure that the original study, at least for the researcher who conducted it, did detect some kind of reproducible empirical pattern (whatever its meaning or significance may be).

When further differences are added, such as when a similar kind of replication is conducted but by different researchers and in different labs, successful replications suggest that the original empirical finding can be further generalized across these variations. If conducted by independent researchers, other problematic factors that could be responsible for the finding can also be reduced, such as fraud, the use of QRPs (e.g., only reporting statistically significant findings), and idiosyncratic experimenter effects (e.g., unknown or non-generalizable influences stemming from the original researcher or lab) (Hüffmeier et al., 2016; Schmidt, 2009).

Generalizability in this sense is ultimately tested through conceptual replications using different operationalizations of the target constructs, as well as different populations (Hüffmeier et al., 2016; LeBel et al., 2018). By intentionally introducing such variation, researchers who perform high-quality replications can help to confirm whether the relationship under investigation is contingent on, for instance, a specific sample or research design. Such replications provide further support for the claim that the empirical pattern represents a valid phenomenon, as shown through multiple different tests or measurements.

3.4 Interpreting unsuccessful replications


In an ideal world, an unsuccessful replication attempt would provide strong evidence against the finding of the original study. However, interpreting replication results can be challenging when they are unsuccessful. Since even direct replications invariably introduce some differences compared to the original, an unsuccessful replication could either indicate that the original finding does not represent a robust empirical relationship, or that one or more differences introduced by the replication is responsible for the relationship not being detected. In this sense, the more research-design or other facet changes are introduced, the more ambiguous an unsuccessful replication attempt will be (Schmidt, 2009). This can make it challenging to identify precisely why a replication did not succeed (Stroebe & Strack, 2014).

This is especially problematic in the social and psychological sciences, as there are often numerous possible unmeasured moderating variables and inadvertently introduced methodological changes ("unknown unknowns," to use Rumsfeld's phrase) that could impact the finding under investigation. This inherent ambiguity or uncertainty has resulted in failed replication attempts sometimes being quite controversial (e.g., in the debate on social priming), with researchers disagreeing over whether even small or seemingly irrelevant deviations from the original study design should, or should not, be expected to eliminate the effect (e.g., as broadly discussed in Klein, 2014). While it is right to be skeptical of genuinely ad hoc efforts to "explain away" unsuccessful replication attempts, especially ones undertaken competently and in good faith (which may itself be in dispute), it is also true that the more differences there are between an original study and its (failed) replication, the harder it may be to interpret the results of the latter.

To reduce the ambiguity of unsuccessful replications, researchers tend to suggest two approaches. Firstly, if unsuccessful replications are meant to be informative, then scientists should minimize methodological differences between the replication and the original study. This is because the more methodological differences there are, the more alternative explanations there are for why the finding was not detected. To address this concern, Brandt and colleagues' (2014) "replication recipe" suggests a number of ways to minimize design differences between a replication and an original study, including by clearly defining the effects and methods being replicated, following the original methods as exactly as possible, having sufficiently high statistical power, and by maximising the replication's methodological transparency.
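Relatedly, the statistical-power element of the recipe can be made concrete with a short calculation. The sketch below (our own illustration; the effect size d = 0.40 is an assumed value, not one taken from Brandt and colleagues) estimates the per-group sample size a replication would need in order to detect the originally reported effect with high probability.

```python
import math
from statsmodels.stats.power import TTestIndPower

original_d = 0.40  # hypothetical effect size reported by the original study
power_analysis = TTestIndPower()

for target_power in (0.80, 0.95):
    # Solve for the per-group sample size at the given power level.
    n_per_group = power_analysis.solve_power(
        effect_size=original_d, power=target_power, alpha=0.05
    )
    print(f"Power {target_power:.0%}: about {math.ceil(n_per_group)} participants per group")
```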

Secondly, scientists should, wherever possible, provide evidence that they are manipulating and measuring the same constructs as the original study (e.g., manipulation checks). Doing so supports the conclusion that "psychometric invariance" has been maintained between the original study and the replication, which is the assumption that the operationalizations deployed by both share the same psychometric properties (Fabrigar & Wegener, 2016; cf. Trafimow, 2023). If the operationalizations deployed by the replication study do not manipulate or measure the same constructs as the original study, for instance, because a given operationalization only manipulates a particular construct accurately in a specific population or language (Ramscar et al., 2015), then the replication may not be investigating the same constructs in the same way as the original study. If this is the case, then one would not expect the replication to detect the finding that the original study did.

Such approaches aim to increase the informativeness of replications through increased transparency and through reducing the scope of possible alternative explanations for why a finding was not detectable by the replication study. Due to the focus on minimizing differences between the replication and original study, "direct" replications have been presented as a means of protecting replication informativeness when it comes to failed replications (Fabrigar & Wegener, 2016; Hüffmeier et al., 2016; LeBel et al., 2018). As direct replications aim to minimize differences in research design, they are less open to reinterpretation. In contrast, replications that introduce more design differences, such as so-called "conceptual" replications, tend to be seen as more problematic (Schmidt, 2009), as large methodological differences lead to greater ambiguity when such replications are unsuccessful, as previously explained.

However, this does not mean that direct replications are a silver bullet against the original finding, nor does it mean that conceptual replications are useless when they are unsuccessful. As Earp and Trafimow (2015) note, if one direct replication attempt fails, then perhaps the authors of the replication study got unlucky (e.g., generated a false-negative result), or perhaps they inadvertently introduced a theoretically or practically critical design difference. However, if several independent direct replications fail, and there is no reason to believe that these were incompetently performed, then one should rationally decrease one's confidence in the original finding. In this sense, replications are not conclusive: they cannot decisively indicate that a finding is false or non-replicable, but they can be informative, especially in the aggregate.

Furthermore, when there are a number of successful direct replications in place, conceptual replication could indicate that there are undiscovered boundary conditions that influence whether a finding empirically manifests (LeBel et al., 2018). For instance, Nosek and Errington (2020) cite an interesting example in which unsuccessful replications in an experiment using frogs were ultimately accounted for by different studies unknowingly using different "types" of frogs (see section 4.2). Additionally, if a finding cannot be detected by different operationalizations that should, in principle, work, then conceptual replications could indicate that the original finding was the result of narrow and problematic operationalizations (Hüffmeier et al., 2016).

4. Linking replication to theory building


What does all of this have to do with the alleged theory crisis? Either implicitly or explicitly, researchers often argue that replications can play a key role in supporting theory building, often emphasising the fact-checking role replications could play in supporting theory testing and development (Brandt et al., 2014; LeBel et al., 2018; see also Sikorski & Andreoletti, 2023). Researchers also suggest that findings from replications can provide further insights that support theory building. Schmidt exemplifies this position by arguing that "the function of conceptual replication is not only to confirm facts but also to assist in developing models and theories of the world" (Schmidt, 2009, p. 95). Nosek and Errington similarly argue that the "purpose of replication is to advance theory by confronting existing understanding with new evidence" (Nosek & Errington, 2020, p. 3).

Based on such perspectives, some argue that numerous different types of replications are required for a theory to be rigorously developed and applied (Hüffmeier et al., 2016). The section below is written in this light, looking at three ways in which some commentators have argued that replications can support theory building and evaluation.

4.1 Identifying robust phenomena

To begin with, this section will look at how replications can support the claim that a psychological or social phenomenon is a valid and stable empirical effect of interest for theorizing. The reasoning behind this is straightforward. If an effect is not easily replicable, then one may question whether there is a valid empirical pattern in the first place to theorize about, or whether there is sufficient confidence in the facts at hand to build a theory accounting for the reported findings and observations. As Haig (2022, p. 236) notes, "replication relates directly to phenomena detection … phenomena detection relates directly to theory construction."

If such a strong foundation of facts is not established, then theories may, in essence, be offering explanations of false or non-existent entities, which could represent a waste of valuable research effort. This may be the case in some fields. As Wiggins and Christopherson (2019) note, replication studies conducted in response to the replication crisis have called a number of prominent psychological constructs, such as social priming (O'Donnell et al., 2018) and ego depletion (Hagger et al., 2016), into question. This suggests that well-replicated observations may pave the way towards gaining a clearer idea of what the reliable phenomena are in the psychological and social sciences, which may ultimately help determine what effects are worth theorizing about.

This argument is taken a step further by Eronen and Bringmann (2021). In their treatment of the theory crisis, they argue that one reason why good psychological theories are difficult to formulate is that there is insufficient knowledge of robust phenomena to constrain theory choice. In psychological science, the relationship between phenomena and theories tends to be construed as a one-way relationship, with theories explaining phenomena. However, as the authors note, the relationship between theories and phenomena is in reality bi-directional. Phenomena also constrain theory choice, as a given theory must at least be consistent with all known phenomena (that is, all relevant phenomena there is good reason to believe are genuine or real).

Given that known phenomena should constrain theories, how may replications support the initial identification and verification of phenomena? In answering this question, one can follow Haig (2022) and Machery (2022) by distinguishing between data, phenomena, and theory (Bogen & Woodward, 1988). Generally, phenomena represent observed regularities that theories aim to explain and predict. In the social and psychological sciences, such regularities are often termed effects. For example, the observation that humans tend to allocate their attention towards the object of another's gaze is often referred to as the gaze-cuing effect (Driver et al., 1999; Frischen et al., 2007; Friesen & Kingstone, 1998; Hietanen, 1999; Langton & Bruce, 1999).

Data, on the other hand, are values (e.g., numbers) derived from measures in a specific (often experimental) empirical context. These values are unique: they correspond to the specific empirical context within or through which they are detected. For instance, while a gaze-cuing effect may be detectable through the standard gaze-cuing paradigm, variability in sample characteristics, measurement error, and the presence and influence of other factors ensure that no two data sets are precisely the same. While average tendencies and effect sizes can be predictable within a margin of error, the specific data points that these calculations draw on are not. While data are seen as unique products of a specific empirical context, the empirical regularities and correlations that are detectable within a given data set could suggest the presence of a phenomenon. In this sense, phenomena are derived from data, and theories aim to explain and predict phenomena.

Making this distinction is important when considering the relative contribution of different types of replication studies. In particular, direct replications tend to be viewed as providing evidence that similar patterns are repeatedly detectable in data, while conceptual replications tend to provide evidence that such patterns are not the product of a particular operationalization or specific empirical context. While direct replications speak to the robustness and limited generalizability of a particular data pattern, conceptual replications are one way to make the leap from data patterns to phenomenon.

Beginning with direct replications: as discussed in previous sections, these can provide evidence that the findings of an original study are robust and not the product of, for instance, chance findings, fraud, tacit experimenter knowledge, and so on. Furthermore, as long as they appropriately match the research design of the original study, multiple unsuccessful direct replications can, taken together, rationally reduce one's confidence in the validity of the original finding. That is, while they cannot be decisive, they can be informative (Earp & Trafimow, 2015). Thus, direct replications could be beneficial for theory in a negative sense, as they may provide good reason to question whether the original finding captures a robust empirical regularity that may indicate the presence of a phenomenon.

In the positive sense, however, direct replications may not provide particularly strong support for the claim that data regularities stand for a valid and generalizable phenomenon (Feest, 2019). While they can support the claim that a given method can, with limited generalizability, reliably produce a certain empirical outcome, it is possible that the detected empirical pattern is a product of problematic research design. So, while direct replications can ensure that a "sensible minimal standard" has been reached for accepting a finding in social and psychological science (Hüffmeier et al., 2016, p. 85), they cannot rule out other problematic causes of a given finding. As Brandt and colleagues note, "if the original study was plagued by confounds or bad methods, then the replication study will similarly be plagued by the same limitations" (Brandt et al., 2014, p. 222).

Failed conceptual replications may, amongst other possibilities, suggest that the empirical regularity detected by the original study is a product of problematic research design. From a theoretical standpoint, different valid operationalizations of the target constructs should not eliminate the detected relationship. If a finding is only detectable using a specific set of operationalizations and not others, then perhaps the detected effect is an artifact of the operationalizations deployed.

However, as noted previously, unsuccessful conceptual replications tend to be ambiguous, as a failure to replicate could be attributed to one or more of the many design differences introduced. To protect against this ambiguity, researchers often advise that scientists provide measures demonstrating that the "new" operationalizations do in fact manipulate and measure the target constructs, for instance, by comparing how well the two measures of a construct agree (i.e., concurrent validity) (Fabrigar & Wegener, 2016). Furthermore, if conceptual replications are conducted against a background of successful direct replications, then a conceptual replication failure arguably becomes more informative, as it may suggest that unknown boundary conditions influence the effect in question.

This being said, if conceptual replications are successful, then these findings suggest that the observed empirical pattern can be detected by different operationalizations of the target constructs, as well as across greater variations in population and research design (Hüffmeier et al., 2016). If an empirical pattern is detectable across variations in research design, then it seems more likely that there is a common effect present that each study using different designs is detecting. As Haig argues, conceptual replications are "the major, perhaps primary, means by which we establish the existence of claims about empirical phenomena" (Haig, 2022, p. 229).

In sum, a series of direct replications supports the validity and limited generalizability of the finding in question, and conceptual replications support the claim that the finding in question may represent a valid and stable phenomenon. As phenomena are what theories aim to explain and predict, replications support theory building by ensuring that these phenomena are in fact robust and generalizable empirical regularities.

4.2 Replication and boundary conditions


Another way in which replications could support theory building is by indicating whether there may or may not be boundary conditions for an effect. As LeBel and colleagues note: "Positive replication evidence shows that an effect is robust across the known design differences. When replication evidence is negative, such differences provide initial clues regarding potential boundary conditions of an effect" (LeBel et al., 2018, p. 6).

As discussed above, when replications fail, they may either indicate that the original study did not detect a meaningful empirical regularity, or that one or more of the deliberate or inadvertent differences introduced by the replication was responsible for the effect not being detected. While some commentators argue that this is a weakness of replications, it can also be a benefit, as introducing limited differences between replications can provide researchers with an idea of what differences in method may be important from a theoretical perspective.

One example of how this approach could be deployed is through what Hüffmeier and colleagues (2016) call constructive replications. In essence, these are direct replications (exact or close replications, using their terminology) followed by an additional replication that introduces further design differences. If the first replication succeeds while the second fails, evidence is provided that some difference introduced by the second was potentially responsible. This may help researchers refine and develop their understanding of the relationships between various variables.

Nosek and Errington (2020) provide an example of how replications have played this role in practice, drawing on Otto Loewi's famous experiment involving (what is presently referred to as) neurotransmitters. In essence, while Loewi's experiment indicated that acetylcholine is released from the vagus nerve of frogs, many attempts to replicate this finding failed. It was later discovered that the time of year was a crucial factor, as only "winter" frogs were sensitive to the experimental manipulations of the original study and not "summer" frogs. Through unsuccessful replications, a moderator was discovered for the effect in question. As Hüffmeier and colleagues note, such replications, at least in principle, have "the potential to illuminate theoretically meaningful processes and contingencies" (Hüffmeier et al., 2016, p. 86).

In other words, replications can provide the justification required for further exploratory research. For example, if a number of direct replications are successful, yet a handful of more conceptual replications fail, then further research seems required to identify why some of the conceptual replications failed. Furthermore, differences in effect size across replications could also indicate that there are moderators present that may be influencing the manifestation of a phenomenon (Brandt et al., 2014; LeBel et al., 2018). However, it may be difficult to tell whether this is simply a consequence of using different operationalizations of the target constructs and not due to the presence of an unmeasured moderator (Hüffmeier et al., 2016).
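To illustrate how such effect-size differences across replications are commonly quantified, the following sketch (hypothetical effect sizes of our own, not data from any study cited here) computes Cochran's Q and the I-squared index; marked heterogeneity is one signal, though not proof, that an unmeasured moderator may be at work.

```python
from scipy import stats

# (standardized effect size d, standard error) for several replications of the
# same nominal effect -- hypothetical values for illustration only.
studies = [(0.50, 0.12), (0.15, 0.10), (0.42, 0.14), (0.05, 0.11)]

weights = [1 / se**2 for _, se in studies]                  # inverse-variance weights
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled estimate.
q = sum(w * (d - pooled) ** 2 for (d, _), w in zip(studies, weights))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variability beyond sampling error
p_value = stats.chi2.sf(q, df)                              # test of homogeneity

print(f"Q = {q:.2f} (p = {p_value:.3f}), I^2 = {i_squared:.0f}%")
```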

In sum, by providing an idea of how invariant an effect is across conditions, replications can prompt further research with the aim of understanding what variables have a significant impact on a phenomenon. This may be particularly valuable when little theoretical work has been done, as these replications can provide initial directions for further and more exploratory research to pursue.

4.3 Testing theoretical predictions


Replications may also more directly support theory building by providing evidence in favor of or against an empirical prediction derived from a theory. As Nosek and Errington put it, "theories make predictions; replications test those predictions" (Nosek & Errington, 2020, p. 7). Much of what has been said above applies to theoretical predictions. Direct replications could provide evidence for the robustness and limited generalizability of a theorized relationship, while also being more informative when unsuccessful due to the presence of fewer alternative explanations. Conceptual replications could provide further evidence for the generalizability of a theorized relationship, as well as provide evidence that the effect is not, for example, an artifact stemming from a particular operationalization.

However, when it comes to testing theoretical predictions, a further level of difficulty is often present. Specifically, predictions are often not derived directly from a theory, but from a theory in conjunction with a number of auxiliary assumptions (Earp & Trafimow, 2015; see also Trafimow, 2023). Auxiliary assumptions are premises that are included in order to render the link between theory and observation deductive. Auxiliary assumptions are also present when conducting replications, with scientists assuming that certain differences, such as replicating in a different lab or varying the stimuli used, should not impact the ability of the replication to provide evidence in favor of, or against, a prediction and its theory. This is why a number of researchers argue that replications should provide evidence that, for example, the operationalizations of the target constructs are in fact manipulating and measuring the theoretically relevant processes in question (Fabrigar & Wegener, 2016; Hüffmeier et al., 2016; LeBel et al., 2018). This is, in essence, an exercise in verifying that certain auxiliary assumptions hold.

The presence of a large number of auxiliary assumptions affects the ability of replications to call into question the underlying theory from which an empirical prediction is drawn. This is often the case in what Oberauer and Lewandowsky (2019) call "discovery-oriented" research, where a given theory is not precisely formulated and requires a number of (possibly questionable) auxiliary assumptions to derive a prediction. Importantly, when a theory is not well specified, not all predictions derived from the theory may be true, as a number of (if not most) auxiliary assumptions will turn out to be false.

One consequence of this is that a given prediction may be false, but the theory itself may, in some more specific form, be true. Accordingly, even if a large number of, for example, direct replications together make it reasonable to question the reality of a particular prediction or hypothesis, that particular prediction may be false because one or more auxiliary assumptions failed to hold, and not the underlying theory. This seems to be a hard limit on what replications can achieve when it comes to "falsifying" empirical predictions: even if they, together, can make it reasonable to doubt a given empirical result, or even a pattern of predicted results stemming from a particular hypothesis, the relevant theory can remain untouched as long as there are auxiliary assumptions that can, as it were, take the blame.

However, replications deployed alongside the testing of unique predictions could be a viable strategy. With each prediction shown to be problematic by numerous unsuccessful direct replications, the space in which a theory might plausibly hold true shrinks until it is demonstrated to be largely irrelevant. As Nosek and Errington state, "accumulating failures-to-replicate could result in a much narrower but more precise set of circumstances in which evidence for the claim is replicable, or it may result in failure to ever establish conditions for replicability and relegate the claim to irrelevance" (Nosek & Errington, 2020, p. 4). If direct replications provide initial support for the empirical prediction, conceptual replications can then expand the field in which a theoretical relationship holds, for example, across different populations and operationalizations. In sum, replications can provide evidence in favor of a theoretical prediction, which in turn supports the theory, but they struggle in isolation when it comes to "falsifying" a theory.

4.4 Limitations
While replications can be informative for theory, they do not always represent a clear-cut approach for bringing evidence to bear on a given phenomenon, theory, or prediction. As touched on above, one key issue lies in interpreting unsuccessful replications, with suggestions that this could be mitigated by minimizing methodological differences, evaluating the influence of potential moderators, and providing evidence of successful operationalizations.

Another source of ambiguity concerns how to interpret what counts as a successful or unsuccessful replication. Is a statistically significant finding with an effect size in the same direction sufficient for a successful replication, or do the effect sizes need to be sufficiently similar? Is a null finding sufficient for a replication to be considered unsuccessful? What if a replication arrives at a statistically significant finding, but the effect is in the opposite direction to the original study? Most commentators tend to suggest that replications should not be judged in isolation, but together, through a meta-analytic process where the evidence can be compared in a principled manner (Brandt et al., 2014; Fabrigar & Wegener, 2016; Hüffmeier et al., 2016; LeBel et al., 2018).
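As a simple illustration of what such meta-analytic aggregation might look like, the sketch below (again using hypothetical effect sizes rather than results from any cited study) pools an original finding with several replication attempts by inverse-variance weighting, so that the combined estimate and its confidence interval, rather than any single study, inform the verdict.

```python
import math

# (standardized effect size d, standard error) for an original study followed
# by three replication attempts -- hypothetical values for illustration only.
studies = [(0.45, 0.15), (0.10, 0.12), (0.22, 0.10), (-0.05, 0.14)]

weights = [1 / se**2 for _, se in studies]          # inverse-variance weights
pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval for the fixed-effect pooled estimate.
lower, upper = pooled_d - 1.96 * pooled_se, pooled_d + 1.96 * pooled_se
print(f"Pooled d = {pooled_d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```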

Furthermore, certain preconditions may need to be present for replications to be worthwhile. LeBel and colleagues (2018) offer a number of examples in their framework. For instance, if the original study does not provide transparent design and methodological details, then it may not be possible to conduct a defensible replication. Furthermore, if reported statistics and findings cannot be re-calculated using the original data set and analysis strategies, especially if this results in a null finding, then there may be no valid or justified finding to support or challenge. Additionally, there may be instances when replicating an effect may simply not be worth the resources or effort due to its (lack of) significance to research and society, or when a number of different types of replications have already been conducted (Hüffmeier et al., 2016).

Lastly, as some researchers point out, replications are not the only, or necessarily the best, way to bring evidence to bear on a finding or prediction. As Haig (2022) notes, there are some research designs that are not easily amenable to replication, for instance, when dealing with longitudinal data spanning large timescales, studies with strong spatial or temporal dependencies, or more qualitative methodologies such as participant observation. This being said, the presence of complicating factors in some fields does not more generally devalue the role replications can play, nor does it mean that replicability is not an ideal to strive for (Sikorski & Andreoletti, 2023). Furthermore, replications often work in tandem with other projects that are arguably of equal importance. As alluded to above, transparency seems to be an important precondition to conducting replications. As such, efforts to increase the transparency of research seem central to meeting this precondition (e.g., Kidwell et al., 2016; Nosek et al., 2018; Wiggins & Christopherson, 2019).

5. Well-specified theory and replication

5.1 The necessity of theory for replication
As shown above, a number of researchers believe that conducting replications can be a fruitful way to support theory development. However, some commentators have argued that a sufficiently well-developed theoretical foundation is important, and perhaps even necessary, for fruitful and justified replications to be conducted or interpreted (Irvine, 2021; Klein, 2014; Muthukrishna & Henrich, 2019). On a strong reading of this position, replication does not fundamentally support theory; theory fundamentally supports replication. If replications are to be useful for theory, theoretical development must come first. This suggests that proponents of replication may have the crisis of confidence confused: replication is not the central concern facing the social and psychological sciences; issues of theory quality are.

For instance, Klein (2014) argues that in order for a study to be considered a good replication of another study, the methodology of the replication study needs to match all of the essential conditions of the original study as closely as possible. These essential conditions are captured and illustrated by the abstract principles of a theory, which, for instance, detail the contexts in which an effect should be observable and the factors that might moderate the effect. Muthukrishna and Henrich (2019) raise a similar concern, arguing that without the presence of an overarching and unifying theoretical framework, scientists cannot derive specific predictions about psychological phenomena. Due to this, scientists cannot determine whether a given finding is surprising or expected, and, by extension, under what conditions a given finding should be expected to replicate. Given this uncertainty, if a replication attempt seems unsuccessful, then some unknown moderator or variation in context may be ultimately responsible.

While the above researchers focus on unsuccessful replications, Irvine (2021) takes this a step further, arguing that both unsuccessful and successful replication attempts are not clearly informative without a fairly well-developed and established background of theory. This background of theory involves having a good understanding of the kinds of variables and shifts in context that may impact the relationship under investigation, such as cultural, demographic, and educational differences, as well as a fairly clear understanding of the causal interactions between the variables in play (e.g., how a given experimental manipulation affects a target phenomenon). Without this understanding, researchers cannot claim that a given replication attempt is of good quality.

Arguably, similar positions can also be found in the wider literature. To begin with, as noted above, some researchers have argued that replications should provide evidence that their operationalizations manipulate or measure the target constructs (Brandt et al., 2014; Fabrigar & Wegener, 2016; Hüffmeier et al., 2016). Doing so is often framed as a means to minimize alternative explanations if the replication fails to result in the same findings as the original study. However, arguably, to provide evidence in favor of this claim, some theoretical understanding of the nature of the target constructs is required, or at least an idea of what the operationalizations are supposed to be manipulating or measuring. For instance, assessing construct validity is one way to provide evidence in favor of a measure's validity. However, as Cronbach and Meehl (1955) argue, specific theories are needed to articulate what a measure is meant to represent so that the measure may be assessed.

Similarly, when a study's manipulations and measures are temporally and contextually bound, uncritically using the same methods is not always the best strategy (Stroebe & Strack, 2014). For instance, as Fabrigar and Wegener (2016) note, if the original study was conducted with students from a specific university, and the materials, manipulations, and measures explicitly capitalize on this association, these would need to be modified if a replication were conducted with students from another institution. Additionally, as Stroebe and Strack (2014) argue, if a manipulation aims, for instance, to invoke feelings of embarrassment in participants by having them read out obscene words to a group, what is considered embarrassing may change over time or may differ between cultures. As such, it seems that replications should sometimes vary their research design or materials from those of the original study. This especially holds if the aim of a given replication is to bring evidence to bear on the generalizability of an underlying theory across populations, rather than merely the generalizability of the specific research design or study materials as deployed in a given context (Trafimow, 2023). In cases where this is not straightforward, having a well-specified theory to guide the design of replications could conceivably aid this process.

Lastly, some researchers define replications in such a way as to make theory directly relevant. Machery (2022) is an example of this. Key to Machery's resampling account of replication is the idea that a study only qualifies as a replication if it samples from the same population as the original study. "Population" here should be taken as applying not only to a study's participants, but also to the range of, for example, stimuli, operationalizations, and contexts over which a particular finding is expected to generalize. In this sense, if the finding of a study is only expected to generalize over an American population, a study conducted with French participants would be a different experiment and would not qualify as a replication. On Machery's account, a theory defining these populations would seem helpful when determining what does and does not qualify as a replication.

Together, such perspectives provide some reasons why theory may be essential for replication, by specifying the kinds of moderators that may impact a finding, and the kinds of contexts, materials, participants, and operationalizations that are necessary for a replication to knowingly repeat the essential conditions of the original study.

5.2 Replication in the absence of a well-specified theory


However, there are a number of reasons to believe that a well-developed theory is not
required to conduct justified and informative replications. When there is theoretical
uncertainty, a number of assumptions or educated guesses need to be made regarding the
processes and constructs that an original study is investigating when designing a replication.
Of course, without a clear understanding of why the to-be-replicated relationship may have
manifested in the first place, there is a degree of uncertainty as to whether the essential
conditions that the original study allegedly instantiated are also instantiated in the
replication. However, arguably, an original study's methodology is a good yet fallible place to
start. As Nosek and Errington note, "using the same procedures is an interim solution for not
having clear theoretical specification of what is needed to produce evidence about a claim"
(Nosek & Errington, 2020, p. 4).

On this view, replications are, at least when there is theoretical uncertainty, at best
reasonably believed, rather than known, to instantiate the essential conditions of the
original study. While, ideally, replications should not be "infused with conceptual and
material presuppositions and uncertainties," as Feest (2019, p. 896) puts it, sometimes this
may be unavoidable. The question is: do these uncertainties preclude the possibility of an
informative replication? Do scientists need good theoretical understanding to conduct an
informative replication? Plausibly, the answer is "no."

Firstly, as Trafimow and Earp (2016) argue, there seem to be multiple instances of well-
replicated findings in the absence of a well-specified theory. They draw on three examples to
support their case. The first example is phlogiston theory, a theory popular in chemistry
around the 17th to 18th centuries (Conant, 1964). While this theory was ultimately
empirically undermined (e.g., certain metals were shown to be heavier after burning, even
though phlogiston theory predicted they should be lighter), and may not have been entirely
well specified (see, however, Chang, 2012), researchers of the period were able to reliably
replicate many key findings, including the existence of what is now called oxygen
("dephlogisticated" air) and nitrogen ("phlogisticated" air). Galileo's experiments rolling balls
down inclined planes represent a similar case (Asimov, 1966). While Galileo did not have a
formal theory to guide his experimentation at the time, he still managed to produce highly
replicable findings describing the relationship between the inclines of planes and the
velocity of balls rolling down them. The last case draws on a contemporary example from
psychology, in which Trafimow and colleagues (1991) were able to successfully prime the
"private" or "collective" self to elicit either more private or more collective self-cognitions.
Importantly, the theory that guided Trafimow and colleagues to this well-replicated finding
(for a meta-analysis, see Oyserman & Lee, 2008) was not itself precisely defined.

Trafimow and Earp argue that all three examples are instances of findings being successfully
replicated in the absence of a well-specified theory, suggesting that well-specified theories
are not necessary for producing replicable findings that are informative to researchers. At
least when it comes to positive replication results, then, a well-specified theory does not
seem necessary.

Secondly, while the situation is more complex with unsuccessful replications, there still seem
to be some key ways in which replications can be informative in the absence of good
theoretical understanding. As noted in the first half of this chapter, unsuccessful replication
results can be ambiguous. However, while ambiguous, they can be informative, especially
when multiple careful replications have been conducted (Earp & Trafimow, 2015). Assuming
the presence of multiple high-quality replications, a fairly consistent failure to replicate a
reported finding suggests either that the original study's finding is flawed in some way or
that there is little understanding of the essential conditions that are required to bring about
the effect. Either way, replications can be informative, minimally suggesting that the
relationship under investigation is not well understood.

There are a couple of examples that can be used to illustrate this point. The first was
mentioned earlier and relates to Otto Loewi's famous experiments on frogs, where persistent
failures to replicate indicated that the process under investigation was not completely
understood (Nosek & Errington, 2020). Once the troublesome factors were identified,
successful replications became more frequent. Importantly, when Loewi first published his
results, they were met with severe criticism until the variability in the results was
sufficiently well explained and replicated (Ungar, 1975). Thus, replications arguably played
an informative role, indicating that the processes under investigation were not fully
understood.

Another example comes from the cognitive dissonance literature as illustrated by Stroebe
and Strack (2014). Festinger and Carlsmith (1959) tested the hypothesis that an individual
will tend to change their belief if they are induced to say or do something contrary to that
belief. While their finding ultimately proved replicable, it depended, for instance, on the
participant being made to feel responsible for the lie. However, arguably, the initial
difficulties in replicating the finding also indicated that the relationship under investigation
was not well understood.

Such examples suggest that frequent replication failures can serve as a kind of "litmus test,"
indicating that theoretical understanding in a given field is weak. In this sense, replications
can be informative in the absence of a well-specified theory as they can, minimally, suggest
that certain aspects of a theory require further development. Additionally, at least in
principle, multiple unsuccessful direct replications can also provide mounting evidence that
a given finding may be false (LeBel et al., 2018).
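
As a simple illustration of how such evidence can mount, the following sketch pools an original effect size with several hypothetical direct replications using inverse-variance (fixed-effect) weighting; this is our own toy example rather than the specific framework proposed by LeBel and colleagues, and all of the numbers are invented for illustration.

```python
import math

def pooled_effect(effects, std_errors):
    """Fixed-effect meta-analytic pooling of standardized effect sizes.

    Each study is weighted by the inverse of its sampling variance, so
    large, precise replications count for more than small, noisy ones.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # 95% confidence interval for the pooled effect.
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical data: an original study reporting d = 0.60 with a small sample,
# followed by four larger direct replications clustered near zero.
effects    = [0.60, 0.05, -0.02, 0.08, 0.01]
std_errors = [0.25, 0.08,  0.07, 0.09, 0.08]

pooled, ci = pooled_effect(effects, std_errors)
print(f"Pooled effect: {pooled:.2f}, 95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

On these invented numbers the pooled estimate is close to zero and its confidence interval includes zero, which captures, in a rough quantitative way, the sense in which repeated failures to replicate can accumulate into evidence against the original claim, or at least against its generality.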

Thirdly, for a justifiably well-specified theory to be developed, one arguably needs robust
empirical findings. This was also touched on in the first half of this chapter, with Eronen and
Bringmann (2021) arguing that one reason why good psychological theories are difficult to
formulate is that there is insufficient knowledge of robust phenomena to constrain theory
choice. As such, arguing that well-developed theory needs to be in place for replications to
be informative may be like putting the cart before the horse. In other words, theories should
not be built on shaky empirical grounds.

Another way of approaching the problem is by recognizing two types of scientific knowledge
and their relationship. The first is a descriptive form of knowledge that certain facts or
events occur, and the second a more explanatory form of knowledge concerning why those
events occur (Salmon, 1989). The latter, more explanatory type of knowledge is generally
seen as being embodied by scientific theories, which "furnish" explanations of descriptions
of fact. As such, there is a sense in which factual knowledge concerning the occurrence of
events is epistemically prior to the more explanatory knowledge that accounts for it. For
instance, how could scientists theorize about the origin of dinosaurs without first having
access to robust fossil and radiometric dating records, and how could astronomers develop
theories to predict the trajectories of planets without first observing these trajectories and
inferring common patterns? Replications could play a key role in supporting the validity of
this kind of factual knowledge, providing evidence for or against its robustness.

However, as Irvine (2021) argues, science should ultimately progress in a manner where
theory and observation inform each other in an iterative, mutually constraining way: that is,
with observations informing theory and theory guiding future observations. In other words,
theoretical development is required alongside observational work. This issue arises when it
comes to conceptual replications, as some kind of understanding of the target construct is
ultimately required to create different and effective operationalizations thereof. However,
this does not mean that a well-specified theory is required for replications to be informative,
as argued previously; it means only that some theoretical development may be required.
Furthermore, in some instances, scientists may need to use their intuition and make an
educated guess if little explicit theoretical development has already occurred.

One key part of this controversy seems to stem from the belief that, for replications to be
informative and useful, it must be known with confidence that they instantiate the essential
conditions of the original study. As Irvine concludes: "Without this information, a researcher
cannot claim to have performed a good replication. And without this claim, the apparent
success or failure of the replication is not obviously informative about whether the original
findings can, in fact, be replicated" (Irvine, 2021, p. 847).

While Irvine maintains that this is a defeasible informational state, the claim still seems too
strong. As indicated above, replications can be conducted under theoretically uncertain
conditions and still be informative, if somewhat ambiguous. However, this ambiguity should
not be considered paralyzing for replication. Knowledge (or at least strongly justified
understanding) is too great a requirement for replication. It is sufficient that a replication is
justified as far as the informational context in which it is conceived allows, which may, at the
frontiers of social and psychological science, be mainly through critical reference to the
original study's methodology, the use of educated guesses, and so on.

What if replications are defined in a way that makes theory essential? If there are strong
grounds to accept an account that necessitates the inclusion of theory, then that would have
an impact on the arguments presented here. However, such accounts tend still to leave open
the possibility that replications are justified only as far as their current informational context
allows. For instance, as Machery (2022) acknowledges in his account, what constitutes a
replication is not stable and may change as theoretical understanding improves (see also
Nosek & Errington, 2020). As an example, when there is little theoretical understanding,
what constitutes a replication may be determined largely by the informed judgment of the
original study's author. However, as a greater understanding develops of the population of
participants, measures, stimuli, and so on, over which an effect is expected to generalize,
what was considered a replication at one point may no longer be considered a replication at
another. This does not rule out that such "replications" were initially informative or played a
role in their eventual supersession.

6. Conclusion
While both the replication and theory crises have caused concern for scientists and
philosophers alike, the reforms that have so far resulted allow for cautious optimism.
Alongside changes to publication practices and incentive structures, as well as greater
recognition and mitigation of various questionable research practices, conducting
high-quality replications seems to be one avenue through which scientists can better
understand and meet the challenges posed by these crises.

As covered in this chapter, replications seem to have the ability to inform theory
development, even when little theoretical development is in place. Through a mix of more
direct and more conceptual replications, they can support theory development in three key
ways. Firstly, they can provide evidence that a finding represents a valid and stable empirical
phenomenon of interest to theorizing. As phenomena are what theories aim to explain and
predict, replications can support theory building by ensuring that these phenomena are
robust and generalizable. Secondly, unsuccessful replications can suggest that theoretical
understanding is weak, indicating that there are unknown moderators or changes in
research design impacting the relationship in question. Lastly, while strict falsifiability may
be too high a bar, replication in conjunction with the testing of specific theoretical
predictions can be a way to reduce, affirm, or expand the empirical space in which a theory
applies.

While indispensable, replication is just one of many useful directions to pursue, with other
social, methodological, and theoretical foci being of potentially similar or equal importance.
This applies especially to theoretical development, as replications can arguably achieve their
full efficacy only if deployed alongside other approaches with the purposeful aim of
developing theory.

Bibliography
Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. A., Fiedler, K., . . .
(2013). Recommendations for increasing replicability in psychology. European Journal
of Personality, 27(2), 108-119.
Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J., Fiedler, K., . . .
(2013). Replication is more than hitting the lottery twice. European Journal of
Personality, 27, 108-119.
Asimov, I. (1966). Understanding physics: motion, sound, and heat. New York, NY: Mentor.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin,
66(6), 423-437.
Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452-454.
Bem, D. J. (2011). Feeling the future: experimental evidence for anomalous retroactive
influences on cognition and affect. Journal of Personality and Social Psychology,
100(3), 407-425.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., . . .
(2018). Redefine statistical significance. Nature Human Behaviour, 2, 6-10.
Bird, A. (2016). Scientific progress. In P. Humphreys (Ed.), The Oxford Handbook of
Philosophy of Science (pp. 544-563). Oxford: Oxford University Press.
Bogen, J., & Woodward, J. (1988). Saving the phenomena. The Philosophical Review, 97(3),
303-352.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., . . . Veer,
A. V. (2014). The replication recipe: what makes for a convincing replication? Journal
of Experimental Social Psychology, 50, 217-224.
Braude, S. E. (1979). ESP and psychokinesis: a philosophical examination. Philadelphia:
Temple University Press.
Callard, F. (2022). Replication and reproduction: crisis in psychology and academic labour.
Review of General Psychology, 26(2), 199-211.
Callaway, E. (2011). Report finds massive fraud at Dutch universities. Nature, 479, 15.
Chang, H. (2012). Is water H2O? Evidence, realism, and pluralism. Dordrecht, the
Netherlands: Springer Dordrecht.
Collins, H. M. (1992). Changing order: replication and induction in scientific practice. Chicago:
University of Chicago Press.
Conant, J. B. (1964). The overthrow of the phlogiston theory: the chemical revolution of
1775-1789. Cambridge, MA: Harvard University Press.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin, 52(4), 281-302.
Driver, J., Davis, G., Ricciardelli, P., Kidd, P., Maxwell, E., & Baron-Cohen, S. (1999). Gaze
perception triggers reflexive visuospatial orienting. Visual Cognition, 6, 509-540.
Dunnette, M. D. (1966). Fads, fashions, and folderol in psychology. American Psychologist,
21(4), 343-352.
Earp, B. D. (2016). What did the OSC replication initiative reveal about the crisis in
psychology? An open review of the draft paper entitled "Replication initiatives will
not salvage the trustworthiness of psychology" by James C. Coyne. BMC Psychology,
4(28), 1-19.
Earp, B. D. (2017). The need for reporting negative results—a 90 year update. Journal of
Clinical and Translational Research, 3(S2), 1-4.
Earp, B. D. (2020). Falsification: How does it relate to reproducibility? In J.-F. Morin, C.
Olsson, & E. O. Atikcan (eds.), Research Methods in the Social Sciences: An A-Z of Key
Concepts (pp. 119-123). Oxford: Oxford University Press.
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in
social psychology. Frontiers in Psychology, 6, 621.
Elms, A. C. (1975). The crisis of confidence in social psychology. American Psychologist,
30(10), 967-976.
Eronen, M. I., & Bringmann, L. F. (2021). The theory crisis in psychology: how to move
forward. Perspectives on Psychological Science, 16(4), 779-788.
Fabrigar, L. R., & Wegener, D. T. (2016). Conceptualizing and evaluating the replication of
research results. Journal of Experimental Social Psychology, 66, 68-80.
Feest, U. (2019). Why replication is overrated. Philosophy of Science, 86(5), 895-905.
Festinger, L., & Carlsmith, J. M. (1959). Cognitive consequences of forced compliance. The
Journal of Abnormal and Social Psychology, 58(2), 203-210.
Fiedler, K. (2017). What constitutes strong psychological science? The (neglected) role of
diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12(1), 46-
61.
Francis, G. (2012). Publication bias and the failure of replication in experimental psychology.
Psychonomic Bulletin & Review, 19, 975-991.
Friese, M., Loschelder, D. D., Gieseler, K., Frankenbach, J., & Inzlicht, M. (2019). Is ego
depletion real? An analysis of arguments. Personality and Social Psychology Review,
23(2), 107-131.
Friesen, C. K., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by
nonpredictive gaze. Psychonomic Bulletin & Review, 5, 490-495.
Frischen, A., Bayliss, A. P., & Tipper, S. P. (2007). Gaze cueing of attention: visual attention,
social cognition, and individual differences. Psychological Bulletin, 133, 694-724.
Giner-Sorolla, R. (2012). Science or art? How aesthetic standards grease the way through the
publication bottleneck but undermine science. Perspectives on Psychological Science,
7(6), 562-571.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis.
Psychological Bulletin, 82(1), 1-20.
Hüffmeier, J., Mazei, J., & Schultze, T. (2016). Reconceptualizing replication as a sequence of
different studies: a replication typology. Journal of Experimental Social Psychology,
66, 81-92.
Hagger, M. S., Chatzisarantis, N. L., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., . . .
(2016). A multilab preregistered replication of the ego-depletion effect. Perspectives
on Psychological Science, 11(4), 546-573.
Haig, B. D. (2022). Understanding replication in a way that is true to science. Review of
General Psychology, 26(2), 224-240.
Hengartner, M. P. (2018). Raising awareness for the replication crisis in clinical psychology by
focusing on inconsistencies in psychotherapy research: how much can we rely on
published findings from efficacy trials? Frontiers in Psychology, 9, 256.
Hietanen, J. K. (1999). Does your gaze direction and head orientation shift my visual
attention? NeuroReport, 10, 3443-3447.
Hutmacher, F., & Franz, D. J. (2024). Approaching psychology's current crises by exploring the
vagueness of psychological concepts: recommendations for advancing the discipline.
American Psychologist. Advance online publication.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8),
e124.
Ioannidis, J. P. (2012). Why science is not necessarily self-correcting. Perspectives on
Psychological Science, 7(6), 645-654.
Irvine, E. (2021). The role of replication studies in theory building. Perspectives on
Psychological Science, 16(4), 844-853.
Jasny, B. R., Chin, G., Chong, L., & Vignieri, S. (2011). Again, and again, and again... Science,
334(6060), 1225.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable
research practices with incentives for truth telling. Psychological Science, 23(5), 524-
532.
Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L. S.,
. . . Nosek, B. A. (2016). Badges to acknowledge open practices: a simple, low-cost,
effective method for increasing transparency. PLoS Biology, 14(5), e1002456.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams Jr., R. B., Bahník, Š., Bernstein, J., . . . (2014).
Investigating variation in replicability: a "many labs" replication project. Social
Psychology, 45(3), 142-152.
Klein, S. B. (2014). What can recent replication failures tell us about the theoretical
commitments of psychology? Theory & Psychology, 24(3), 326-338.
Koole, S. L., & Lakens, D. (2012). Rewarding replications: a sure and simple way to improve
psychological science. Perspectives on Psychological Science, 7(6), 608-614.
Kruglanski, A. W. (1975). Theory, experiment and the shifting publication scene in personality
and social psychology. Personality and Social Psychology Bulletin, 1(3), 489-492.
Kruglanski, A. W., & Higgins, E. T. (2004). Theory construction in social personality
psychology: personal experiences and lessons learned. Personality and Social
Psychology Review, 8(2), 96-97.
Lakens, D. (2023). Concerns about replicability, theorizing, applicability, generalizability, and
methodology across two crises in social psychology. Retrieved from PsyArXiv
[preprint]: https://psyarxiv.com/dtvs7/
Lambdin, C. (2012). Significance tests as sorcery: science is empirical - significance tests are
not. Theory & Psychology, 22(1), 67-90.
Langton, S. R., & Bruce, V. (1999). Reflexive visual orienting in response to the social
attention of others. Visual Cognition, 6, 541-567.
LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified
framework to quantify the credibility of scientific findings. Advances in Methods and
Practices in Psychological Science, 1(3), 389-402.
Loscalzo, J. (2012). Irreproducible experimental results: causes, (mis)interpretations, and
consequences. Circulation, 125(10), 1211-1214.
Lykken, D. T. (1991). What's wrong with psychology anyway? In D. Cicchetti & W. M. Grove
(Eds.), Thinking clearly about psychology (pp. 3-39). Minneapolis: University of
Minnesota Press.
Machery, E. (2022). What is a replication? Philosophy of Science, 87(4), 545-567.
Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics
wars. Cambridge: Cambridge University Press.
Meehl, P. E. (1967). Theory-testing in psychology and physics: a methodological paradox.
Philosophy of Science, 34(2), 103-115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow
progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806-
834.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often
uninterpretable. Psychological Reports, 66(1), 195-244.
Melnikoff, D. E., & Bargh, J. A. (2023). Hoist by its own petard: The ironic and fatal flaws of
dual-process theory. Behavioral & Brain Sciences, 46, e-commentary.
Melnikoff, D. E., & Bargh, J. A. (2018). The mythical number two. Trends in Cognitive
Sciences, 22(4), 280-293.
Muthukrishna, M., & Henrich, J. (2019). A problem with theory. Nature Human Behaviour, 3,
221-229.
Nosek, B. A., & Errington, T. M. (2020). What is replication? PLoS Biology, 18(3), e3000691.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration
revolution. Proceedings of the National Academy of Sciences, 115(11), 2600-2606.
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology.
Psychonomic Bulletin & Review, 26, 1596-1618.
O'Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A., Aldrovandi, S., . . . (2018).
Registered replication report: Dijksterhuis and van Knippenberg. Perspectives on
Psychological Science, 13(2), 268-294.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349(6251), aac4716.
Oyserman, D., & Lee, W. S. (2008). Does culture influence what and how we think? Effects of
priming individualism and collectivism. Psychological Bulletin, 134(2), 311-342.
Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments
examined. Perspectives on Psychological Science, 7(6), 531-536.
Pashler, H., & Wagenmakers, E.-J. (2012). Editors' introduction to the special section on
replicability in psychological science. Perspectives on Psychological Science, 7(6), 528-
530.
Popper, K. (1959). The logic of scientific discovery. London: Hutchinson.
Radder, H. (1996). In and about the world: philosophical studies of science and technology.
Albany, NY: SUNY Press.
Ramscar, M., Shaoul, C., Baayen, R. H., & Tübingen, E. K. U. (2015). Why many priming results
don't (and won't) replicate: A quantitative analysis. Unpublished manuscript.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Bulletin, 86(3), 638-641.
Rosenthal, R. (1993). Cumulating evidence. In G. Keren & C. Lewis (Eds.), A handbook for
data analysis in the behavioral sciences: methodological issues (pp. 519-559).
Lawrence Erlbaum Associates, Inc.
Salmon, W. C. (1989). Four decades of scientific explanation. Pittsburgh: University of
Pittsburgh Press.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is
neglected in the social sciences. Review of General Psychology, 13(2), 90-100.
Sikorski, M., & Andreoletti, M. (2023). Epistemic functions of replicability in experimental
sciences: defending the orthodox view. Foundations of Science.
Simmons, J. P., & Simonsohn, U. (2017). Power posing: p-curving the evidence. Psychological
Science, 28(5), 687-693.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: undisclosed
flexibility in data collection and analysis allows presenting anything as significant.
Psychological Science, 22(11), 1359-1366.
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication.
Perspectives on Psychological Science, 9(1), 59-71.
Trafimow, D. (2023). A new way to think about internal and external validity. Perspectives on
Psychological Science, 18(5), 1028-1046.
Trafimow, D., & Earp, B. (2016). Badly specified theories are not responsible for the
replication crisis in social psychology: comment on Klein. Theory & Psychology,
26(4), 540-548.
Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C. J., Beh, E. J., Bilgiç, Y. K., . . .
(2018). Manipulating the alpha level cannot cure significance testing. Frontiers in
Psychology, 9, 699.
Trafimow, D., Triandis, H. C., & Goto, S. G. (1991). Some tests of the distinction between the
private self and the collective self. Journal of Personality and Social Psychology, 60(5),
649-655.
Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational
science: A critical realist perspective. Academy of Management Review, 24(4), 759-
780.
Ungar, G. (1975). Molecular coding of memory. Minireviews of the Neurosciences from Life
Sciences, 459-468.
Watts, T. W., Duncan, G. J., & Quan, H. (2018). Revisiting the marshmallow test: a conceptual
replication investigating links between early delay of gratification and later outcomes.
Psychological Science, 29(7), 1159-1177.
Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: an
overview for theoretical and philosophical psychology. Journal of Theoretical and
Philosophical Psychology, 39(4), 202-217.
Zwaan, R., Etz, A., Lucas, R., & Donnellan, M. (2018). Making replication mainstream.
Behavioral and Brain Sciences, 41, E120.
