Journal of the Experimental Analysis of Behavior 2021, 115, 115–128 NUMBER 1 (JANUARY)

Historically recontextualizing Sidman’s Tactics: How behavior analysis avoided psychology’s methodological Ouroboros
Abdulrazaq A. Imam
John Carroll University

Psychology is undergoing major cultural changes methodologically, with efforts to redefine how psychologists analyze and report their data. Davidson (2018) argued that psychology’s methodological crises stem from mechanical objectivity involving the adoption of an analytic tool as a source of dependable knowledge. This has led to institutionalization and, eventually, uncritical ritualistic use, as happened with null hypothesis statistical testing. Davidson invoked the mythological symbol of the Ouroboros to represent the endless churning of statistical fads. Sidman (1960), in his Tactics of Scientific Research, provided a shield from these problems in terms of the premium he placed on the experience, expertise, judgement, and decision-making of the scientist, which appear to be absent in psychology’s ritualized processes.
Key words: psychological science, statistical reporting, NHST, behavior analysis, single-subject designs,
small-N designs, Sidman, history

Author Note: An earlier version of this paper was presented at the 45th annual meeting of the Association for Behavior Analysis International, in Chicago, IL. I thank Mike Perone, Kobla Agbota, and two anonymous reviewers for helpful comments on an earlier version of the paper. Address correspondence to the author at Department of Psychology, John Carroll University, 1 John Carroll Blvd, University Heights, OH 44118, USA. Email: aimam@jcu.edu. doi: 10.1002/jeab.661. © 2020 Society for the Experimental Analysis of Behavior.

When Tactics of Scientific Research turned 30 in 1990, The Behavior Analyst devoted a special issue to commemorate its achievements for behavior analysis. In his reply to contributions to the issue, Sidman pointed out the difference in the methodology covered in Tactics from that of psychology, despite sharing important affinities with other scientific disciplines, noting that “[b]ehavior analysts need not act defensively because of their differences with psychology; they can stand proud in their connections with more advanced sciences” (1990, p. 188). By using and advocating reliance on in-depth study of the individual organism, Sidman’s (1960) approach fits very well with what had worked successfully in physiology and medicine (Bernard, 1927/1957), which predated the introduction and then increasing adoption of inferential statistics in psychology in the mid-1940s. The approach was known to some as N = 1 research at the time (Dukes, 1965) and is now sometimes referred to as N-of-1 (American Psychological Association, 2020), single-case (e.g., Boswell & Schwartzman, 2018), or single-subject (e.g., Perone, 1994) research design, but is most appropriately referred to as small-N, within-subject design because the modern version usually involves more than just a single subject or participant. The alternative approach in psychology usually involves groups of subjects or participants whose average data are then compared using null hypothesis statistical testing (NHST; Johnston & Pennypacker, 1993; Poling et al., 1995; Rozeboom, 1960; Runyon et al., 1996; Schneider, 2015). Because of the undue dependence on NHST for data analyses and interpretation in this approach, many have roundly criticized it for contributing to lack of progress in psychological and allied sciences (Branch, 2014, 2019; Falk & Greenbaum, 1995; Lambdin, 2012; Meehl, 1978; Morrison & Henkel, 1970; Rozeboom, 1960).

Davidson (2018) provided a framework for looking at the fate of methodological practices in psychology, such as the ritualized NHST usage, using the Ouroboros metaphor, the mythical symbol of an endless “cycle of destruction and rebirth” (Merriam-Webster, 2020). The general tenor of the Ouroboros metaphor in Davidson’s analysis is that new tools emerge and are improved upon over time until they become perceived as means for producing trustworthy knowledge. As purveyors of trustworthy knowledge, the tools become institutionalized as “forms of thinking,” which then renders them ritualistic. The ritualized tools then become subjects of widespread criticisms as “uncritical science,” and, eventually, other tools emerge to address the criticisms and the process begins anew, triggering a new cycle (Davidson, 2018, pp. 469-470; see Fig. 1A). This is what Davidson argued happened to ritualistic NHST use in psychology, and what he feared is likely to become the fate of statistical effect sizes (ESs), which include the familiar standardized mean difference, Cohen’s d, and other ESs not commonly recognized as such (e.g., mean, mean difference, percentage, etc.; Cohen, 1990; Cumming, 2014b). Currently, ESs are recommended as alternatives to the overemphasis on NHST in psychology today (Cumming, 2014a). Of course, ESs are not new to psychological researchers, but they suffered from institutional neglect until the recent reform efforts of the Association for Psychological Science (APS), as articulated by Cumming (2014a; see also Cohen, 1990). The use of NHST in psychology, of course, entails the ritualistic adoption of inferential test statistics like the t- or F-test to make decisions on null and alternative hypotheses following data collection (Cohen, 1990; Dienes, 2016; Runyon et al., 1996). How did behavior analysis avoid the fate of the cyclical wrath of psychology’s methodological Ouroboros? The path of methodological demarcation for behavior analysis and psychology appears to be laid by historical, epistemological, philosophical, and practical considerations that set the two apart. Sidman’s Tactics provided the essential rationales for the behavior analytic approach, helping to shield it from the same fate that befell the NHST ritual in psychology.

Figure 1. General Outline of Stages in Davidson’s (2018) Methodological Ouroboros Template (A) and the Framework for Historical Elements in Psychology That Map On to the Template (B). Note. Ouroboros. Adapted from Ouroboros by Mohamed Ibrahim (https://www.learnreligions.com/ouroboros-4123019). In the public domain.

The present paper intends to accomplish three objectives. The first is to recontextualize Sidman’s contributions in a broader historical scheme of developments in methodological innovations and adoption in psychology within the Ouroboros framework, including the evidence in support of the framework. A second objective is to give an account of controversial and sometimes conflicting philosophical and epistemological considerations about the purported subject matter of psychology and how to study it. A final objective is to illustrate how Sidman’s positions on important themes and his advocacy for expert judgment shielded behavior analysis from experiencing similar cyclical fates proffered by the Ouroboros analysis.

Historical Framework for Contextualizing Sidman’s Tactics: Psychology’s Methodological Ouroboros

Davidson (2018) argued that psychology lives on mechanical objectivity, by which it tries to arrive at truth, in the methods it uses. These methods are widely adopted in standardized forms. According to Davidson, there is an apparent struggle between the methods and expert interpretation. In psychology, the standardized method wins over expertise by default, largely because of this mechanical objectivity (see also Cohen, 1990), or “the cultural norm of using quantitative methods and following standardized rules when producing scientific knowledge,” which is in sharp opposition to “trained judgement or to expertise” (Davidson, 2018, p. 470). In other words, the standardization did not come about by accident. It emerged from the epistemological wheel of mechanical objectivity.

Figure 1B outlines the general historical elements of the Ouroboros metaphor in general psychology’s methodology. The precursors to the elements of the Ouroboros that Davidson (2018) described illuminate the strength of its impacts on the prevailing methodological practices in mainstream psychology. Historically, the predominance of NHST in psychology today is really a byproduct of a hybridization of two different, controversial statistical approaches to inference introduced and promoted for nearly a decade (1925-1933) by competing voices on the subject, namely, Fisher’s on statistical significance testing (SST) and Neyman-Pearson’s on statistical hypothesis testing (SHT). Schneider (2015) provided a concise summary of the differences between the two positions. Schneider’s account of the contrasting stances in each approach indicates that they could not have been more different from each other (see also Lambdin, 2012). They differed in their interpretations of p-values (as evidence against the null or to determine how much error is tolerable), in how and when (before or after data collection) to use the statistical tools, and in their purported purpose (inductive or deductive), among others. An important point of contention between the two approaches is on the meaning of p-values, especially when reporting nonsignificant results: Fisher’s SST would be silent on the null, but Neyman-Pearson’s SHT would accept the null, which “has nothing to do with the actual truth of H[0]” (Schneider, 2015, p. 415). Despite their fundamental differences, the two approaches were combined, perhaps for expediency, or probably for the purpose of the institutionalization discussed further below. The differences between them point to the reality that the hybrid, by virtue of merging these apparently incompatible positions, represents a bastardization of each approach. Nevertheless, it is what we have today, standardized for the purpose of discovery of “truth” in psychology.

How did the hybrid come about? The answer lies in the textbooks on statistics and experimental design, which provided the rules of the game early on. According to Hubbard and Ryan (2000), Snedecor, who was a chief promoter of the Fisherian perspective, published Calculation and Interpretation of the Analysis of Variance and Covariance and Statistical Methods in 1934 and 1937, respectively. Thereafter, textbooks like Guilford’s Fundamental Statistics in Psychology and Education appeared in 1942, consolidating “Fisher’s logic with the accept-reject dichotomy of Neyman and Pearson” (Woods, 2011, p. 41), and the hybrid NHST has reigned in psychological research ever since. In light of Davidson’s (2018) analysis, the hybrid NHST became a standard tool in psychological research, and the research rituals that emerged became cemented by post-World-War-II textbooks. The ritualization has drawn a myriad of criticisms (e.g., Branch, 1999, 2014; Harlow et al., 1997; Lambdin, 2012; Morrison & Henkel, 1970; Rozeboom, 1960; Woods, 2011), which variously decried the pervasive adoption of the hybridized NHST and the conventional p-value, the “oft-neglected role of interpretation” (Davidson, 2018, p. 470) of statistics, and the mindless rituals involved. Davidson’s main point of contention is that the outcome of the cycle of mechanical objectivity has resulted in the recent recommendations for the adoption of effect sizes such as Cohen’s d as “new tools” to adopt as replacements, which would begin, predictably, another methodological Ouroboros that would visit on ESs the same fate as befell NHST previously in psychological research. Beyond the advocacy of ESs, including Cumming’s (2014a, b) so-called “new” statistics (new because of the entrenchment of NHST), Davidson’s strongly recommended antidote is to include expert judgement in their use to avoid the fate of NHST in psychology. Davidson offered examples of advocacy for expertise, noting, “[r]eintroducing expertise as a justifiable mode of reasoning requires ongoing fostering and rehabilitating of psychological researcher’s content knowledge and self-confidence” (2018, p. 475). Sidman’s (1960) anecdote in the next section is illustrative. Of crucial interest in the present paper is thus the historical circumstances of that first Ouroboros of the NHST phenomenon in psychology that justify the calls for trained expert judgement, which in turn provides the context in which Sidman’s (1960) antidotal contribution in behavior analysis is to be appreciated. One might wonder how speculative Davidson’s (2018) Ouroboros framework is: Is there any evidence to support the historical developments in the introduction, adoption, institutionalization, and then ritualization that marked the life of NHST in psychological research? I argue in what follows that such evidence exists. Hubbard and Ryan (2000) provided the empirical evidence for what happened in psychology in this regard
in their historical analysis of statistical reporting practices, as outlined in the next section.

Empirical Evidence for Psychology’s Methodological Ouroboros

Hubbard and Ryan (2000) examined 12 American Psychological Association (APA) journals, ranging from clinical, educational, and experimental to general review journals, and found highly consistent correspondences with Davidson’s (2018) elements of the Ouroboros outlined in the previous section. Their analyses revealed that between 1894 and 1909, covering some 15 years, there were no reports of inferential statistics in these journals at all. The first reports of inferential statistics were two SST statistics, which appeared in 1910: 1) the critical ratio (cr; of the mean difference to its standard deviation), and 2) the probable error (pe), defined as the equivalent of two z scores (Hubbard & Ryan, 2000). In a graphical presentation of the combined data of crs, pes, and p-values for all the journals included in their study from 1911 to 1998, the period in which these tests were reported in the target journals, Hubbard and Ryan recorded general growths in spurts roughly spanning a decade each, followed by a leveling off of the growths about the middle of the 1950s, which marked the institutionalization of NHST’s mechanical objectivity. As interesting as these growths in the three statistics may appear to be, however, the more important historical development was the process of the emergence of the hybridization of the two aforementioned statistical approaches (viz., SST and SHT) in the form of NHST and p-value reporting.

Figure 2. Percentage of Articles Reporting Combined Probable Errors (PEs) and Critical Ratios (CRs) vs. p-Values in Empirical Articles Appearing in 12 APA Journals between 1911 and 1998. Note. No inferential statistics were reported from 1894 to 1910. Adapted from Hubbard and Ryan (2000). Copyright 2000 by Sage Publishing.

Figure 2 summarizes Hubbard and Ryan’s (2000) presentation of yearly reporting of the three statistics, showing that crs and pes were the only inferential statistics reported from 1911, averaging about 12%, until about 1928. After that, there was a surge in the two statistics to about 36% on average, until 1941, when they declined to about 25%, before finally dropping to 4% in 1948. Reporting of the two statistics peaked through the 1930s, specifically between 1928 and 1941 (A in Fig. 2), just about when p-value reporting started (B in Fig. 2). It was during this immediate period of hybridization (1940s; see Fig. 1B) that we see a gradual appearance of p-value reporting until an intermediate phase between 1942 and 1947, when it became as common as the other two statistics (C in Fig. 2) and crs and pes started to decline (C in Fig. 2) relative to the preceding decade (A in Fig. 2). By the time of the institutionalization of NHST beginning in the early 1950s, the other two indices went down virtually to zero (D in Fig. 2) as p reporting rose to its asymptote in the mid-1950s (E in Fig. 2). As noted above, most importantly, there were 48 empirical articles published in these journals between 1894 and 1909 that did not report any inferential statistic at all (see Hubbard & Ryan, 2000, Table 1, p. 667), indicating
that the latter is not indispensable for empirical psychological research.

The mapping of the historical evidence offered in Hubbard and Ryan’s (2000) data onto Davidson’s (2018) proposition of a methodological Ouroboros in psychology’s epistemology proceeds as follows. Before the 1910s, psychological publications in APA journals reported no inferential statistics at all, despite decades of empirical research in psychology. Figure 1B shows that from about 1925 to 1933, the development of new research tools in psychology was already underway in the debates between Fisher’s and Neyman-Pearson’s respective statistical approaches, with their attendant controversies. During the period, cr and pe reporting appeared and peaked (see Fig. 2, A). By the 1940s, the hybrid NHST had emerged and begun to rise (see Fig. 2, B and C). Trustworthy knowledge was already emanating from the adoption of the hybrid as promoted by postwar textbooks. In 1952, the first edition of the APA publication manual appeared, marking the beginning of the institutionalization of NHST as a universal tool for research in psychology and allied disciplines (see Lambdin, 2012). From 1960 to the present, the use of NHST became ritualized in psychological research (see Fig. 2, E and beyond) and effectively displaced the other two inferential statistics altogether (see Fig. 2, D). Criticisms of NHST began early and continued to be a central controversial feature of research in psychology for decades (e.g., Harlow et al., 1997; Morrison & Henkel, 1970), during and beyond the years covered by Hubbard and Ryan’s data. The previous (6th) edition of the APA manual, for example, appeared in 2010 recommending reporting of ESs and other measures explicitly (Fidler, 2010), representing the introduction of an alternative (to NHST) research tool for psychologists. The new emphasis on ESs served as the impetus for Davidson’s cautionary identification of psychology’s methodological Ouroboros, which might undermine that very emphasis. Davidson fears that the new advocacy for use and reporting of ESs, which emerged from the sundry criticisms of the null ritual in psychology, awaits similar ritualization despite their long-standing availability to researchers. Although there had been limited attention to ESs in textbooks (Capraro & Capraro, 2002), recent evidence of increased reporting of ESs in select journals following the APS efforts to reform statistical reporting in psychology (Imam & Frate, 2019) suggests the initial stages of the new cycle in the Ouroboros for ESs may already be underway. The stage for ritualization of ES reporting and interpretation, particularly of the standardized varieties such as Cohen’s d, is set by the appeal of adopting “rules of thumb for characterizing effect size as low, medium, or large” (Davidson, 2018; Stewart, 2000, p. 687), in the same vein as the adoption of p < .05 in NHST reporting yielded mechanical objectivity.

Before 1911, psychological researchers reporting in APA journals did not employ inferential statistics either of the non-NHST varieties or of the hybrid NHST that emerged in psychology textbooks in the 1940s (Hubbard & Ryan, 2000), as noted above. How then did researchers evaluate experimental and other data in psychology? Hubbard and Ryan noted that before 1940, experimentation with the individual participant, for the most part, formed the basis for “inductive claims in psychology” (2000, p. 666). Schneider (2015) observed that part of Neyman and Pearson’s objections to Fisher’s SST was due to the latter’s “inherent subjective interpretation, as well as the concept of inductive inference” (p. 415). Pre-1940 research was not conducted in a vacuum, therefore; psychological researchers thrived, but with a wholly different approach than was represented by the predominant 20th-century NHST approach that emerged thence. What foundational differences with psychology allowed behavior analysis to evade the same influences of inferential statistics on data evaluation? The apparent marriage of psychological research and statistical inference was wrought with a turmoil that is unnecessary for a productive scientific endeavor. The historical elements that brought the coupling about, however, were born out of philosophical, historical, epistemological, and practical considerations with which earlier generations of researchers had to contend. The next section explores some of these to illuminate how behavior analysis diverged from the rest of psychology, in principle, and in so doing,
avoided the rancor associated with the path psychology took.

Points of Departure: Historical Themes That Separated Behavior Analysis from Mainstream Psychology

A confluence of historical and practical factors surrounding the declared subject matter of psychology helps explain the relationships in the development of probability theory, the treatment of observations and variability, and the attendant issues of reaching conclusions about generality (Cohen, 1990; Cowles, 2001; Stigler, 1992). Cowles’s (2001) historical account places the nexus of evolution, biometrics, and eugenics as providing the foundational terrain for bringing statistics into psychology. According to Cowles, “[c]uriosity about diversity and variability leads to attempts to classify and to measure” (2001, p. 2). In essence, what made doing science possible in that historical context was the marriage of measurement and statistics. Consequently, doing science meant classifying all the varieties and assessing them, which required measuring them by way of quantification and data collection, which then required statistical techniques for data handling and evaluation. The following subsections discuss the respective positions of mainstream psychology and behavior analysis on the historically related issues of the role of variations, of reaching truth by way of error estimation to deal with chance events, and of the use of probability in estimating errors due to chance with reliance on randomization as a baseline.

Roots: Of Variations or Variability

There has been a long-standing interest in the role of variations in nature, from perspectives of what is responsible for them, of their pervasiveness and extent, of their impacts on organisms individually and in whole, and of how to control or manipulate them (Cowles, 2001). Variation is ubiquitous as an essential element in many areas, from biology to psychology and behavior analysis, in natural selection and selection by reinforcement (Cowles, 2001; Donahoe, 2003; Sidman, 1960). Without variation, neither type of selection is possible (Donahoe, 2003; Sidman, 1960). Psychology and behavior analysis have dealt with it in different ways, however. Whereas psychology embraced inferential statistics and carried it further into the realm of individual differences and mental testing (see Cowles, 2001; Cronbach, 1957), behavior analysis treated variability methodologically as subject to control (Sidman, 1960) and analysis (e.g., Page & Neuringer, 1985). Variability is not treated as a given (as the behavioral engineer might do), but rather as something to be analyzed for discovering its sources in functional behavior–environment relations (Sidman, 1960; cf. Cronbach, 1957).

Perhaps because behavior analysis was never as preoccupied with issues associated with mental measurements as the rest of psychology was (Cowles, 2001; Cronbach, 1957), the different attitudes adopted towards variability obviated the historical necessity for inferential statistics that was of particular interest to mainstream psychology. Although Sidman (1960) devoted two whole chapters of Tactics to the multifaceted topic of variability, the first mention of it occurred early in the book on the subject of representativeness, where he admonished against reliance on the use of groups and statistics as a solution. The behavior analytic approach to variability has been, and one hopes still remains, to subject it to experimental manipulation, as available techniques and technology would permit, in order to decipher its role in a functional relationship. In response to remarks on the desirability of a possible allowance for inferential statistics in behavior analysis on the aforementioned 30th anniversary of the publication of Tactics, for example, Sidman asserted that

    Statistical studies can tell us how many subjects—but not which ones—are likely to show a particular effect of, say, age variation. Statistical analysis cannot yield a functional relation between the age of any individual and the effect in which we are interested. (1990, p. 191)

Positions such as this allowed behavior analysts to dodge the implied necessity for inferential statistics in dealing with variability that mainstream psychology had to contend with in practice, within the context of the historical marriage of probability theory and psychology (Cowles, 2001) as it sought to tackle the presenting problem of variations in nature.
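Sidman’s point in the passage above, that statistics can say how many subjects are likely to show an effect but not which ones, can be made concrete with a toy simulation (the subject counts, shift sizes, and threshold below are hypothetical illustrations, not data from any study): a respectable group average can emerge from a mixture of individuals who show a large effect and individuals who show none.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical scores for 20 simulated subjects: 12 "responders" whose
# scores shift upward by about 10 units under a treatment, and 8
# "nonresponders" whose scores do not shift at all.
responders = [random.gauss(10, 1) for _ in range(12)]
nonresponders = [random.gauss(0, 1) for _ in range(8)]
scores = responders + nonresponders

# The group-level summary that a statistical comparison is built on:
group_mean = statistics.mean(scores)

# The individual-level question Sidman insists on: which subjects shifted?
shifted = [score > 5 for score in scores]

print(f"group mean shift: {group_mean:.1f}")
print(f"subjects showing the shift: {sum(shifted)} of {len(scores)}")
```

The group mean lands near 6, a value typical of no individual subject: responders sit near 10 and nonresponders near 0. The average answers “how many” in the aggregate while remaining silent on “which ones,” which is the functional, individual-level question a within-subject analysis is designed to answer.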
Roots: Of Truth, Errors, and Chance

Concerns with observation and measurement are predicated on the assumption of the discovery of truth. Any measurement presumably includes some index of error. According to Stigler (1992), 19th century theory of error naturally assumes that observation by necessity captures the truth along with a measure of error: “observation = truth + error” (p. 62). In NHST, the combination of a true measure and an error component manifests as the test statistic (e.g., t or F value) expressed as a function of effect variance and error variance, where the latter includes “other influences like measurement error or effects of uncontrolled variables (i.e., ‘chance’)” (Branch, 2014, p. 259; see also Cowles, 2001). Hall (2007) quoted Fisher as stating that we have “the two desiderata of the reduction of error and of the valid estimation of error…” (p. 298). Contradistinctively, the behavior analytic approach to research is to focus on controlling variables for their functional effects on the behavior of interest, as illustrated by the quote from Sidman (1990) above; rather than incorporate error or chance in a statistical “treatment” of observed effects, a point decried by Cohen (1990) as well, behavior analysts seek to control it experimentally.

Roots: Of Probabilities, Chance, and Randomization

There are various and often incompatible conceptions of probability in the history of psychology and statistics. In Fisherian SST, for example, the p-value assesses, after the fact, the evidence against the null hypothesis in a given experiment (Schneider, 2015). In that context, when a p-value indicates that we reject the null hypothesis, it does not mean that the p-value is the probability that the null is true, nor does it say anything about the truth of the alternative hypothesis, because the latter does not exist in SST (Cohen, 1990). In the hybrid NHST tradition, where the alternative hypothesis does exist, reflecting the Neyman-Pearson hypothesis testing approach, the p-value represents a decision rule for rejecting or accepting the null, which corresponds to acceptance or rejection of the alternative research hypothesis, respectively (Runyon et al., 1996). In both cases, probability is used in a way that has a wholly different meaning in Bayesian statistics, where probability represents “degree of belief” (Cohen, 1990, p. 1307; Dienes, 2016) in the data. The use of probability, in this case, comes out of the Bayes Factor in terms of the relative probability of the data given different competing theories (Dienes, 2016; Kruschke, 2010), for example. It appears the difference between the NHST and Bayesian approaches to probability is what prompted Rozeboom’s (1960) retort, in favor of the latter, that “the proper inferential task of the experimental scientist is not a simple acceptance or rejection of the tested hypothesis, but determination of the probability conferred upon it by the experimental outcome” (p. 422). Both of the conceptions of probability in the NHST and Bayesian families, nevertheless, are very different from that of behavior analysts (Skinner, 1953/1999; Sidman, 1960).

For Skinner, for example, probability is “simply a way of representing a frequency of occurrence” (1953/1999, p. 102). In addressing the role of hypothesis testing in psychological experimentation, Sidman appears to be using this very sense of probability in this anecdote:

    A colleague asked me what I expect would happen to the on-going avoidance behavior as a result of the pairing of stimulus and unavoidable shock. After some consideration I replied that I could not conceive of there being no change in the behavior, because the experimental operation represented a radical alteration of the subject’s environment. We did not usually find organisms unresponsive to this kind of manipulation. Also, I could not conceive that the probability of the avoidance response would decline, because if such a reaction were to occur under analogous conditions outside the laboratory the species would never have survived to become subjects for my experiments. This left only one more possibility. The probability of the behavior would have to increase. (1960, p. 5; emphasis added)

Substantively, the problem of errors found two distinct solutions in psychology (Cowles, 2001;
Perone, 1991, 1999) historically. Most of mainstream psychology utilizes statistical control in the form of NHST using randomization as baseline, but culminating in the null ritual (see also Sidman, 1952). Variants of the NHST statistical approach are Bayesian (e.g., Dienes, 2016) and estimation (e.g., Cumming, 2014a). They all belong in the same family when contrasted with the alternative approach, namely, experimental control. Behavior analysis is unique in psychology in adopting the latter approach by using “active” manipulation of relevant variables and “passive” observation as baseline (in accordance with Bernard’s (1927/1957) distinction between “active” and “passive” as “experimentation” and “observation,” respectively [p. 6]), thereby avoiding the null ritual altogether. The notion of randomization as a baseline has its roots in the early works of Charles Peirce and Joseph Jastrow, reported in 1885, in which the former used a blind randomized experiment with himself as the subject on the psychophysics of weights (Stigler, 1992). In contrast to Fechner’s earlier self-experimentation on just-noticeable differences, they used shuffled decks of cards to determine which weights to use in what order in their weight judgments. According to Stigler, it marked the beginning of randomization as “an artificial baseline” (1992, p. 65; see also Hall, 2007) that has become prominent in psychological research practices to date. As Johnston and Pennypacker (1993) aptly put it, reflecting further on the connection to the discussion of variations above,

    The problems begin with focusing the experimental question on whether there is a difference between control and experimental groups, instead of asking about the nature of the relations between behavior and environmental variables. In the area of measurement, this approach tends to encourage grouping individual data for analysis, which makes it difficult if not impossible to identify the behavior–environment relations of interest. Experimental comparisons […] the interpretive rules of inferential statistics further shift attention away from important behavioral issues by focusing on the existence of a “significant” difference between control and experimental groups. Embedded in these general problems is an approach to variability that fails to encourage or accommodate controlling its sources so as to collect meaningful data. (p. 324; emphasis added)

“Baselines” in the two approaches serve very different purposes, as is evident in the multifaceted use of the term in behavior analysis. In behavior analysis, the baseline is a complex research tool that may serve multiple functions (Sidman, 1960). It could be the standard stability in data sought before implementation of experimental manipulations (Perone, 1991). It may also serve as a behavioral baseline, such as in the study of drug effects, or as a technique for systematic replication, to study transition states, or learning (Sidman, 1960). The profound irony of the use of randomization as baseline, of course, is that despite the statistical requirement for both random sampling and random assignment in mainstream psychology, in practice the former is hardly ever accomplished, due to heavy reliance on convenience samples obtained from participant pools of college students in place of sampling randomly from a target population (e.g., Jaffe, 2005). Overuse of convenience samples raises serious questions about the representativeness of psychological findings derived from group designs, raising the further irony of the mistaken charge usually made against small-N designs as lacking in generality due to small sample sizes.

This array of different philosophical and epistemological differences between psychology and behavior analysis meant that the historical paths taken by each with regard to the adoption and use of inferential statistics were very different indeed. They thereby contributed in no small way to the avoidance in behavior analysis of the fate that befell psychology methodologically in terms of the place
are typically made by comparing data occupied by inferential statistics in psychologi-
from control and experimental cal research. The next section focuses on the
groups, which mixes treatment effects role of Sidman’s Tactics in providing a formi-
with intersubject variability. Finally, dable alternative to withstand the onslaught of
Psychology’s Methodological Ouroboros 123

the NHST fad in the psychology of its time that turned out to be most productive for behavior analysis.

Avoiding the Problem of Psychology's Methodological Ouroboros in Behavior Analysis

In addition to prominent works in psychology identified as reporting N = 1 research appearing before the mid-1930s, Dukes (1965) reported 246 subsequent (1934-1963) N = 1 studies published in 11 different psychology journals, ranging from the American Journal of Psychology (AJP), to the Journal of Comparative and Physiological Psychology (JCPP), to the Journal of Social Psychology (JSP). Of the 246 studies, only 27 (11%) were on learning and 7 (3%) on motivation; the rest focused on categories in extant psychology, suggesting that the use and reporting of N = 1 research was not limited to the area of the experimental analysis of behavior. The list of journals in the Dukes study did not include the Journal of the Experimental Analysis of Behavior (JEAB), which premiered in 1958, devoted to publishing behavioral research. Indeed, the listed journals did not publish N = 1 research exclusively, but some of them served as outlets for behavior-analytic research. The AJP and the JCPP, for example, were outlets for learning and operant research before the founding of JEAB. A search-term combination of operant conditioning AND learning AND reinforcement in these journals from 1945 to 1957 yielded 38 hits in AJP and 612 in JCPP. Of the 16 articles using mainly rats, a monkey, and humans in the AJP search results, 75% deployed group designs, 56% used group designs with NHST reporting, 63% altogether reported NHST, only 19% used small-N designs, and 6% used small-N designs with NHST reporting. A sample of 20 articles using rats, pigeons, monkeys, chimpanzees, cats, and humans in the JCPP search results revealed that 60% used group designs, 50% used group designs with NHST reporting, 60% altogether reported NHST, only 35% used small-N designs, and 10% used small-N designs with NHST reporting. Incidentally, all nine of Sidman's publications and the lone one by Skinner and Morse in JCPP between 1953 and 1957 used small-N designs without NHST reporting. One can only wonder what role editorial decisions played in the 6% and 10% of small-N studies reporting NHST, respectively, in the two journals.

These cursory surveys revealed that even though these journals provided an avenue for behavior-analytic research to reach the rest of psychology, the preponderance of opportunity went against a line of research that advocated small numbers of subjects without reliance on inferential statistics to evaluate data or to aid reviewers in their editorial decisions. It is not surprising, then, that JEAB would be founded to fill the gap in disseminating behavior-analytic research. The publication of Sidman's (1960) Tactics two years later provided a shield against the mainstream enthusiasm for NHST in psychological research.

The timeliness of Sidman's (1960) publication of Tactics cannot be overemphasized in the impact it was to have in behavior analysis. It arrived on the scene in the nick of time, just as NHST reached its growth asymptote in psychology (see Fig. 2). By 1960, all journals in psychology, including the ones that reportedly had published N = 1 research (Dukes, 1965), had adopted the approach almost exclusively, and a new era of mindless NHST ritual (Cohen, 1990; Davidson, 2018) was well underway. The methodological rationales articulated in Tactics were to provide a shield against the onslaught of the NHST use pervading the rest of psychology at the time. A few of the important positions are provided in what follows to illustrate the antidotal contributions of Sidman's thinking on a methodology that helped the field to evade the ubiquitous statistical misunderstandings that were to engulf the rest of psychology.

Of Replication and Generality

In psychology at large, there is an ongoing crisis of replication and reproducibility of results. The crisis arose in part due to 1) the reliance on the aforementioned statistical control and the attendant NHST ritual; 2) as just noted, the reliance on the often-violated sampling randomization (Jaffe, 2005); and 3) the rampant misconceptions by users of NHST about what a p-value signifies, specifically that it carries any information about likely reproducibility (see Lambdin, 2012). In contrast, in behavior analysis there is no crisis of replication and reproducibility of results, mainly due
to the reliance on experimental control and therefore to the attendant lack of a null ritual, and partly due to the replication inherently built into experimental manipulations, which ensures the repeatability and generality of the results. In behavior analysis, there are three lines of evidence of replication: 1) session-to-session consistency in data within a condition; 2) reproducible differences across conditions; and 3) reproducible results across individual subjects (Perone, 2019), none of which is afforded by the group designs of mainstream psychology and their intimate ties to NHST rituals (see also Johnston & Pennypacker, 1993).

Contrary to what is usually assumed about the small-N experimental approach, namely, that it lacks generality due to sample sizes that are small compared to what is typical in the alternative group-design approaches, generality is of paramount interest and is usually accounted for in behavior-analytic research. Replication is what affirms generality, especially of the type sought after by mainstream psychologists. According to Sidman (1960), "If a series of experiments is performed, each of which yields results consistent with the other, the reliability and generality of the individual experiments are greatly enhanced" (p. 135). In further noting that "[t]he number of such experiments that must be performed cannot be prejudged [and that] [i]t will depend upon the same personal, subjective, pragmatic criteria that science and individual scientists have learned to use in evaluating all types of data," Sidman (1960, p. 135) draws out the relevance of expert judgement arising from establishing generality through replication.

On the Role of Scientific Judgement and Rules

Recall that a major problem identified in Davidson's (2018) Ouroboros allegory of psychology's methodology is the opposition to expert judgement in favor of mechanical objectivity realized via NHST rituals. Davidson was not the first or only one to advocate expert judgement, however. Also in the context of NHST practice, Cohen's (1990) outline of what counted as appropriate use of statistics included what he called expert judgement as well, even as he downplayed the prominence of "inferential statistics applied with informed judgement [as] a useful tool" (p. 1311) in the larger view of the research process. According to Davidson, however, "…the primary approach to results ought to be a researcher's context-appropriate interpretation; any standardized guidelines for interpretation ought to be a last resort" (2018, p. 473). In this regard, Davidson's analysis comes to bear on the relevance of Sidman's (1960) methodological approach to doing the science of psychology in a wider historical context.

In comparing behavior analysis to the rest of psychology, the former's truth scale involves a methodology that entails scientific judgement at various points in the design and implementation of the experiment, data collection, data evaluation, and interpretation. Sidman (1960) expounded at length on many of these levels. To start with, for example, on the role of rules in developing scientific judgement, Sidman advised "…the student [is] not to expect a set of rules of experimental procedure, to be memorized in classic textbook version," remarking that "[t]he pursuit of science is an intensely personal affair" (Sidman, 1960, preface). By "personal affair," of course, he did not mean doing sloppy science, as he clarified decades later (Sidman, 1990).

Characteristically, then, evaluating data in behavior analysis is not conducive to rote ritual, as Sidman variously emphasized. According to Sidman, 1) there are no established "impartial" rules for scientific evaluation of data; 2) there are no firm, safe grounds upon which scientists can claim a "set of rules" to evaluate data without the illusion of "security"; and 3) mindless eclecticism nevertheless is not the answer (1960, p. 41). For Sidman, "the objectivity of science consists not so much in set rules of procedure as in the self-corrective nature of the scientific process" (1960, p. 43). Whether or not behavior-analytic research has evolved its own rituals that hamper its progress, beyond the intrusions of NHST reporting in behavioral journals, remains to be examined systematically.

Sidman on Judgement and Replications

It is on the role of judgment in assessing replication that the premium Sidman (1960) gave to expert judgement comes to full view. In addressing how one decides how often one would need to replicate a finding to be convinced, for example, Sidman noted the conundrum such an issue might pose to the statistician, retorting:
    The answer is likely to vary from one experiment to another. Experimenters take into account such factors as the magnitude of the observed effect, their confidence in the adequacy of their experimental control, the consistency of their findings with related data, the stability of their baseline conditions, etc. (p. 87)

Sidman's conception of the experimenter's considerations in this context is consistent with Davidson's (2018) advocacy of expert knowledge and judgement, quoted above, as an antidote to the null ritual. Of these considerations, perhaps it is with respect to determining the stability of the data that recent developments in behavior analysis (see, e.g., Ferron et al., 2019) may be moving away from the important role of expert judgement toward heavy reliance on statistical approaches, to the detriment of a previously valued approach to data handling in behavior analysis. Ferron et al. (2019), for example, admitted that "[a] limitation of preplanned CCDs [changing criterion designs] is that by setting the phase lengths prior to starting the study, the researchers are unable to respond to the data as they are collected" (p. 10), the latter being a hallmark of behavior-analytic methodology championed by Sidman and demonstrably effective and productive for the field. Such efforts undermine the very thing that Sidman ascribed to the scientist in pointing out,

    Most scientists make such judgements intuitively, unaware that they are continuously making complex computations involving advanced and as yet unformulated probability theory. Such evaluations are almost second nature to them, carried out informally along with the natural everyday activities of planning experiments, watching their progress, changing their course, and interpreting their results. (1960, p. 87)

Concluding Remarks and Cautionary Notes

Sidman (1960) placed a premium on the experience, expertise, judgement, and decision-making of the individual scientist in evaluating experimental data. This, to the extent that behavior analysts subscribe to and use it, is the antidote behavior analysis has against psychology's methodological Ouroboros produced by its mechanical objectivity. Unfortunately, there may be cause for caution in behavior analysis given recent developments in the field, such as the one alluded to in the previous section. Ferron et al. (2019) introduced a randomization procedure that culminated in p-value reporting for evaluating single-participant data, while cognizant of the deviations it entails from longstanding, proven data-handling and evaluation practices in behavior analysis. To what end, however? I am not aware of any critique in the literature of the general or specific tenor of Sidman's (1960) elucidation of the various facets of what has become foundational in behavior-analytic research. Attempts to sidestep the informed judgement required for adequate and effective use of established principles and processes, by way of statistical options that would detract from experimental-control fundamentals, are ill advised. In fact, the potent antidote to the problem warrants that the researcher use expert judgement about their research. Furthermore, it needs to be used in peer review, which in many domains involves ritualized calls for p-values. Peer review can remind researchers that claims of likely reproducibility based on p-values are mistaken.

In another development of concern in behavior analysis, Lanovaz et al. (2019) suggested that within-subject replications might not be necessary due to applied considerations. Why? A problem might arise in situations where treatment withdrawal is ill advised. Usually, that leaves only the option of using the baseline-treatment (AB) design, which is experimentally the weakest design in the behavior-analytic arsenal, largely because the role of extraneous variables during the treatment (B) phase cannot be ruled out (Poling et al., 1995). Within-subject replications come into consideration only in the reversal design, in which the AB sequence might be repeated in an ABAB format or other variants. When treatment withdrawal is not possible, why suggest that within-subject replication is not necessary? First, by definition, applied concerns about ethical, financial, and/or social costs unquestionably allow for the use of the AB
design option so long as the demonstrable outcome is acceptable to the client, stakeholders, and/or the practitioner. Why lower the design standards for that purpose, then? Second, alternatives that allow for both within- and between-subject replications are available, and demonstrably effective, for scientists and practitioners alike (Gast, 2014). Such alternatives include various multiple-baseline designs that keep the AB structure in place, thereby obviating a need for withdrawals, while maintaining acceptable experimental or practical comparisons and demonstrating treatment effectiveness, in contradistinction to the weaker simple AB design alone. The hybridization of SST and SHT into NHST provides an excellent illustration of the adverse consequences of embracing inferential statistics blindfolded by extracurricular considerations, even if well-intentioned. Historic criticisms of NHST were not based on how convenient or cost-worthy it may be, but on fundamental logical flaws in its use and interpretations (see, e.g., Bernard, 1927/1957; Branch, 1999, 2014, 2019; Harlow et al., 1997; Lambdin, 2012; Morrison & Henkel, 1970; Perone, 1999, 2019; Rozeboom, 1960; Woods, 2011). Until such fundamental problems arise and are articulated for small-N designs, questioning the value of replications is ill advised and could begin a slippery slope we may not be able to reverse in practice. Smith's (2012) review indicates there is no agreement yet on what statistical analysis standards are to be brought to bear on these kinds of designs.

The latest journal-article reporting standards of the APA now explicitly provide for what are characterized as N-of-1 studies under a special-designs category (American Psychological Association, 2020). As mentioned in the introduction, it is customary to describe N-of-1 studies as single-case designs (SCDs), especially in practice quarters (e.g., Boswell & Schwartzman, 2018). Poling et al. (1995) made a further distinction of the case study, which involves implementing treatments without a baseline condition. Although some may use these terms interchangeably, they carry different meanings and implications for use and practice. For example, whereas SCDs may be appropriate for practitioners who deliver services in packages to individual clients, they may not be appropriate for applied behavior analysts who actually assess functional relationships, as implied in the "analysis" of their title. Additionally, small-N designs cannot rely on a single subject or participant, literally, as in case studies or single-case designs, because of the value of intersubject replications for establishing reliability and generality of findings (see Sidman, 1960, pp. 74-85). As Sidman put it, "replication of an experiment with two subjects establishes greater generality for the data among individuals of a population than does replication with two groups of subjects whose individual data have been combined" (1960, p. 75). Basic and applied behavior analysts have need of small-N designs for establishing the reliability and generality of findings.

The distinction I am making here requires, perhaps, a further differentiation within behavior analysis to distinguish the concerns of practitioners from the basic/applied research needs of the field. Service delivery may be ritualized because it is packaged, but that is not experimental analysis of behavior, nor applied behavior analysis. As such, a further distinction is suggested between those interested in applied behavior analysis and practitioners who have no interest in contributing to the "science" but are themselves "consumers" (albeit for implementation purposes) of the products of that science, basic and applied alike, akin to the "technological" (à la Baer et al., 1968, p. 95), or packaged, service delivery. In the absence of such differentiation, we run the risk of conflation and confusion of methodological standards for students, researchers, and practitioners alike. The recent developments in the professionalization of behavior analysis have their major advantage in drawing droves of interest the field has not seen in decades. With them, though, come tensions that are reminiscent of the earliest history of behaviorism (O'Donnell, 1985) between basic science and professional interests. We should be cognizant of the distinction between "basic" and "engineering," which, as Sidman noted, "yield different kinds of knowledge" (1990, p. 191), one focused on processes, the other concerned with outcomes in the main (cf. Cronbach, 1957).

Nevertheless, there should be no doubt about the value of proven methodological practices within behavior analysis until there are considered critiques of those practices; when they have started to prove inadequate to
may be desirable. In the meantime, we should not stop "asking question(s) of nature." According to Sidman, "Data can be negative only in terms of a prediction. When one simply asks a question of nature, the answer is always positive. Even an experimental manipulation that produces no change in the dependent variable can provide useful and often important information" (1960, p. 9). Such were Sidman's insights into scientific inquiry in behavior analysis, and it is not time to abandon them. We should not run to statistical control.

References

American Psychological Association. (2020). Publication manual of the American Psychological Association: The official guide to APA style (7th ed.). https://doi.org/10.1037/0000165-000
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97. https://doi.org/10.1901/jaba.1968.1-91
Bernard, C. (1927/1957). An introduction to the study of experimental medicine. Dover Publications.
Boswell, J. F., & Schwartzman, C. M. (2018). An exploration of intervention augmentation in a single case. Behavior Modification. Advance online publication. https://doi.org/10.1177/0145445518796202
Branch, M. (1999). Statistical inference in behavior analysis: Some things significance testing does and does not do. The Behavior Analyst, 22, 87-92. https://doi.org/10.1007/BF03391984
Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology, 24, 256-277. https://doi.org/10.1177/0959354314525282
Branch, M. (2019). The "reproducibility crisis": Might the methods used frequently in behavior analysis help? Perspectives on Behavior Science, 42, 77-89. https://doi.org/10.1007/s40614-018-0158-5
Capraro, R. M., & Capraro, M. M. (2002). Treatments of effect sizes and statistical significance tests in textbooks. Educational and Psychological Measurement, 62, 771-782. https://doi.org/10.1177/001216402236877
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312. https://doi.org/10.1037/0003-066X.45.12.1304
Cowles, M. (2001). Statistics in psychology: An historical perspective. Lawrence Erlbaum.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671-684. https://doi.org/10.1037/h0043943
Cumming, G. (2014a). The new statistics: Why and how? Psychological Science, 25, 7-29. https://doi.org/10.1177/0956797613504966
Cumming, G. (2014b, May 22). The new statistics: Effect sizes and confidence intervals (Part 3: Research integrity and the new statistics) [Video file]. https://www.psychologicalscience.org/members/new-statistics
Davidson, I. J. (2018). The Ouroboros of psychological methodology: The case of effect sizes (mechanical objectivity vs. expertise). Review of General Psychology, 22, 469-476. https://doi.org/10.1037/gpr0000154
Dienes, Z. (2016). How Bayes factors change scientific practice. Journal of Mathematical Psychology, 72, 78-89. https://doi.org/10.1016/j.jmp.2015.10.003
Donahoe, J. W. (2003). Selectionism. In K. A. Lattal & P. N. Chase (Eds.), Behavior theory and philosophy (pp. 103-128). Kluwer Academic.
Dukes, W. F. (1965). N = 1. Psychological Bulletin, 64, 74-79. https://doi.org/10.1037/h0021964
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75-98. https://doi.org/10.1177/0959354395051004
Ferron, J., Rohrer, L. L., & Levin, J. R. (2019). Randomization procedures for changing criterion designs. Behavior Modification. Advance online publication. https://doi.org/10.1177/0145445519847627
Fidler, F. (2010). The American Psychological Association Publication Manual sixth edition: Implications for statistics education. In C. Reading (Ed.), Data and context in statistics education: Towards an evidence-based society. Proceedings of the Eighth International Conference on Teaching Statistics. International Statistical Institute.
Gast, D. L. (2014). Replication. In D. L. Gast & J. R. Ledford (Eds.), Single case research methodology: Applications in special education and behavioral sciences (pp. 105-123). Routledge.
Hall, N. S. (2007). R. A. Fisher and his advocacy of randomization. Journal of the History of Biology, 40, 295-325. https://doi.org/10.1007/s10739-006-9119-z
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.) (1997). What if there were no significance tests? Lawrence Erlbaum.
Hubbard, R., & Ryan, P. A. (2000). The historical growth of statistical significance testing in psychology—and its future prospects. Educational and Psychological Measurement, 60, 661-681. https://doi.org/10.1177/0013164400605001
Imam, A. A., & Frate, M. (2019). A snapshot look at replication and statistical reporting practices in psychology journals. European Journal of Behavior Analysis, 20, 204-229. https://doi.org/10.1080/15021149.2019.1680179
Jaffe, E. (2005, September 10). How random is that? Observer, 18(9). https://www.psychologicalscience.org/observer/how-random-is-that
Johnston, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of behavioral research (2nd ed.). Lawrence Erlbaum.
Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14, 293-300. https://doi.org/10.1016/j.tics.2010.05.001
Lambdin, C. (2012). Significance tests as sorcery: Science is empirical—significance tests are not. Theory & Psychology, 22, 67-90. https://doi.org/10.1177/0959354311429854
Lanovaz, M. J., Turgeon, S., Cardinal, P., & Wheatley, T. L. (2019). Using single case designs in practical settings: Is within-subject replication always necessary? Perspectives on Behavior Science, 42, 153-162. https://doi.org/10.1007/s40614-018-0138-9
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834. https://doi.org/10.1037/0022-006X.46.4.806
Merriam-Webster (2020). Ouroboros. In Merriam-Webster Dictionary (Version 5.2.0) [Mobile app]. App Store. https://apps.apple.com/us/app/merriam-webster-dictionary/id399452287
Morrison, D. E., & Henkel, R. E. (Eds.) (1970). The significance test controversy: A reader. Aldine.
O'Donnell, J. M. (1985). The origins of behaviorism: American psychology, 1870-1920. New York University Press.
Page, S., & Neuringer, A. (1985). Variability is an operant. Journal of Experimental Psychology: Animal Behavior Processes, 11, 429-452. https://doi.org/10.1037/0097-7403.11.3.429
Perone, M. (1991). Experimental design in the analysis of free-operant behavior. In I. H. Iversen & K. A. Lattal (Eds.), Experimental analysis of behavior, Part 1 (pp. 135-171). Elsevier.
Perone, M. (1994). Single-subject designs and developmental psychology. In S. H. Cohen & H. W. Reese (Eds.), Life-span developmental psychology: Methodological considerations (pp. 95-118). Taylor & Francis. https://doi.org/10.4324/9781315792712-5
Perone, M. (1999). Statistical inference in behavior analysis: Experimental control is better. The Behavior Analyst, 22, 109-116. https://doi.org/10.1007/BF03391988
Perone, M. (2019). How I learned to stop worrying and love replication failures. Perspectives on Behavior Science, 42, 91-108. https://doi.org/10.1007/s40614-018-0153-x
Poling, A., Methot, L. L., & LeSage, M. G. (1995). Fundamentals of behavior analytic research. Plenum Press.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416-428. https://doi.org/10.1037/h0042040
Runyon, R. P., Haber, A., Pittenger, D. J., & Coleman, K. A. (1996). Fundamentals of behavioral statistics (8th ed.). McGraw-Hill.
Schneider, J. W. (2015). Null hypothesis significance tests. A mix-up of two different theories: The basis for widespread confusion and numerous misinterpretations. Scientometrics, 102, 411-432. https://doi.org/10.1007/s11192-014-1251-5
Sidman, M. (1952). A note on functional relations obtained from group data. Psychological Bulletin, 49, 263-269. https://doi.org/10.1037/h0063643
Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. Authors Cooperative.
Sidman, M. (1990). Tactics; In reply… The Behavior Analyst, 13, 187-197. https://doi.org/10.1007/BF03392538
Skinner, B. F. (1999). The analysis of behavior. In B. F. Skinner (Ed.), Cumulative record (pp. 101-107). Copley. (Reprinted from "Some contributions of an Experimental Analysis of Behavior to psychology as a whole," 1953, American Psychologist, 8, 69-79, https://doi.org/10.1037/h0054118)
Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods, 17, 510-550. https://doi.org/10.1037/a0029312
Stewart, D. W. (2000). Testing statistical significance testing: Some observations of an agnostic. Educational and Psychological Measurement, 60, 685-690. https://doi.org/10.1177/00131640021970826
Stigler, S. M. (1992). A historical view of statistical concepts in psychology and educational research. American Journal of Education, 101, 60-70. https://doi.org/10.1086/444032
Woods, B. (2011). What's still wrong with psychology, anyway? Twenty slow years, three old issues, and one new methodology for improving psychological research [Unpublished master's thesis]. University of Canterbury.

Received: August 18, 2020
Final Acceptance: November 27, 2020
Editor-in-Chief: Mark Galizio
Associate Editor: Iver Iversen
