
Received: 30 July 2018 | Accepted: 17 September 2018

DOI: 10.1002/pits.22195

RESEARCH ARTICLE

A meta‐analytic review of the evidence for check‐in check‐out

Daniel D. Drevon | Michael D. Hixson | Robert D. Wyse | Alexander M. Rigney

Department of Psychology, Central Michigan University, Mount Pleasant, Michigan

Correspondence
Daniel D. Drevon, Department of Psychology, Central Michigan University, Mount Pleasant, MI 48859.
Email: drevo1dd@cmich.edu

Abstract
The purpose of this study was to conduct a descriptive and quantitative review of the literature examining the effectiveness of check‐in check‐out (CICO), a commonly used behavioral intervention in the context of a Positive Behavior
Interventions and Supports framework. Studies that experi-
mentally investigated the effectiveness of CICO or modified
CICO compared with a baseline or control condition on
student outcomes were included. A between‐case d‐statistic
comparable across single‐case and between‐groups experi-
ments was calculated for 59 dependent variables across 32
studies. The meta‐analysis was conducted using robust
variance estimation allowing for the inclusion of multiple
effect sizes per study by accounting for dependency
between outcomes. The overall effect size, g̅ = 1.22, 95% confidence interval (1.00, 1.44), suggested CICO improves
student outcomes by over one standard deviation. Studies
were characterized by considerable heterogeneity that went
largely unexplained in moderator analyses. Results are
situated in the context of other systematic reviews of CICO
and modified CICO. Overall, CICO appears to be an
effective intervention; however, at this time, it is unclear
what study‐ and sample‐level variables may impact its
effectiveness.

KEYWORDS
behavior education program, check‐in check‐out, evidence‐based practice, meta‐analysis, positive behavior interventions and supports

Psychol Schs. 2019;56:393–412. wileyonlinelibrary.com/journal/pits © 2018 Wiley Periodicals, Inc. | 393


15206807, 2019, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/pits.22195 by Minnesota State University - Moorhead - USA, Wiley Online Library on [06/03/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
394 | DREVON ET AL.

1 | INTRODUCTION

Positive Behavior Interventions and Supports is a school‐wide model of service delivery designed to prevent and
remediate behavioral difficulties via tiered systems of evidence‐based practices (Simonsen & Myers, 2015). Critical
components of the Positive Behavior Interventions and Supports framework include a three‐tiered continuum of
increasingly intensive evidence‐based practices matched to student need and using data to make decisions about
the level of support students need and whether practices are effective across the continuum (Sugai & Horner,
2006). Tier 1, or primary prevention, consists of school‐wide methods for preventing behavioral difficulties. Tier 2,
or secondary prevention, consists of targeted interventions for students who do not respond to Tier 1 supports and
are at‐risk for developing further behavioral difficulties. Tier 1 supports should be broad and effective to ensure
that only about 15% of students require Tier 2 supports. Tier 3, tertiary prevention, consists of intensive,
individualized, function‐based interventions for Tier 1 and Tier 2 nonresponders. Only about 5% of students should
require this level of behavioral support when effective Tier 1 and Tier 2 supports are properly implemented within
a Positive Behavior Interventions and Supports model (Simonsen & Myers, 2015).
Tier 2 interventions are important in reducing the likelihood students develop significant behavioral difficulties.
Interventions at this level allow for efficient resource utilization by using standard protocols to reduce training time
and increase fidelity (Simonsen & Myers, 2015). In general, effective Tier 2 interventions are consistent with
schoolwide expectations, increase monitoring of behavioral performance, involve explicit instruction, provide
opportunities to practice and receive feedback on new skills, and include parent/guardian communication.
Logistically, Tier 2 interventions should be continuously available, require low effort from school personnel, be flexible and function‐based, and include a plan for fading the intervention (Hawken, Adolphson, MacLeod, & Schumann, 2008;
Simonsen & Myers, 2015). There are a variety of empirically‐supported Tier 2 interventions to address behavioral
difficulties, such as social skills training and mentoring (Hawken et al., 2008). One of the most commonly
implemented Tier 2 interventions within the Positive Behavior Interventions and Supports framework is check‐in
check‐out (CICO).
CICO, also referred to as the Behavior Education Program (Crone, Hawken, & Horner, 2010), is a
multicomponent behavior intervention typically consisting of five defining features. First, a student checks in
with the CICO coordinator, or mentor, upon arriving at school. During check‐in, the CICO coordinator reviews
behavioral expectations and goals and the student receives a Daily Progress Report, the second defining feature of
CICO. Third, throughout the day, the student’s teacher(s) provide oral and written feedback on the Daily Progress Report at regular, predetermined intervals, indicating whether the student is meeting the behavioral expectations; this record can also be used for progress monitoring. Fourth, the student checks out with the CICO coordinator at the end of the
school day to review behavioral performance. The student earns an incentive contingent on meeting his/her goal on
the Daily Progress Report. Fifth, to increase school‐home collaboration, students take their Daily Progress Report
home for parent review and signature. The Daily Progress Report is returned to the CICO coordinator the following
morning.
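As a rough illustration of how a Daily Progress Report tally might be scored, consider the following sketch (the 0–2 point scale, the three expectations, and the 80% goal are hypothetical examples, not features prescribed by CICO):

```python
# Hypothetical Daily Progress Report: each row is one rating interval,
# each column one schoolwide expectation, rated 0-2 by the teacher.
ratings = [
    [2, 1, 2],  # interval 1: safe, respectful, responsible
    [2, 2, 2],  # interval 2
    [1, 1, 2],  # interval 3
]
goal = 0.80  # student earns the incentive at >= 80% of possible points

earned = sum(sum(row) for row in ratings)
possible = 2 * sum(len(row) for row in ratings)
percent = earned / possible
print(f"{percent:.0%}, goal met: {percent >= goal}")  # 83%, goal met: True
```

At check‐out, the coordinator would compare the day's percentage against the goal before delivering the incentive.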
Though CICO was originally intended for behavioral difficulties maintained by adult attention, CICO has been
modified in different ways, for example, to address escape‐maintained behavior difficulties (e.g., Turtura,
Anderson, & Boyd, 2014) and internalizing behavior difficulties (e.g., Dart et al., 2015). Despite the wide variety
of applications, CICO has primarily been implemented in elementary schools, though there are also multiple
applications in secondary schools (e.g., Hawken & Horner, 2003) and alternative settings (Swoszowski, Jolivette,
Fredrick, & Heflin, 2012).
To date, the effectiveness of CICO has been investigated in five systematic reviews (Hawken, Bundock,
O’Keefe, & Barrett, 2014; Klingbeil, Dart, & Schramm, 2018; Maggin, Zurheide, Pickett, & Baillie, 2015; Mitchell,
Adamson, & McKenna, 2017; Wolfe et al., 2016). It should be noted Klingbeil et al. (2018) reviewed descriptive
characteristics of studies that modified CICO based on functional behavior assessments (FBAs) of students’
behavioral difficulties, but did not include a research question about the effectiveness of CICO in these

T A B L E 1 Comparison of inclusion criteria and data analysis techniques for existing systematic reviews of CICO

| | Hawken et al. (2014) | Wolfe et al. (2016) | Maggin et al. (2015) | Mitchell et al. (2017) |
| --- | --- | --- | --- | --- |
| Most recent publication year included | 2013 | 2015 | 2014 | 2014 |
| Dissertations/theses included | Yes | No | Yes | No |
| Designs included | E, QE, SCD | E, SCD | E, QE, SCD | E, SCD |
| Methodological quality restrictions | No | No | Yes | Yes |
| Dependent variable restrictions | No | No | Yes | Yes |
| Participant restrictions | No | No | Yes | No |
| Setting restrictions | No | No | Yes | No |
| Number of studies included | 28 | 16 | 11 | 5 |
| Reported intercoder reliability of data extraction for SCDs | NA | No | No | NA |
| Effect sizes for SCDs | PND | Tau‐U | IRD, NAP, RDMA, WC‐SMD | None |
| Effect sizes for BGDs | R2, SMD | SMD | SMD | None |
| Aggregated effect sizes | No | No | Yes | No |
| Conducted moderator analyses | No | No | No | No |

Note. BGD: between‐groups design; E: experimental design; IRD: improvement rate difference; NAP: nonoverlap of all pairs; PND: percent of nonoverlapping data; QE: quasi‐experimental design; RDMA: raw data multilevel analysis; SCD: single‐case design; SMD: standardized mean difference; WC‐SMD: within‐case standardized mean difference.

applications; therefore, it will not be discussed henceforth. Existing reviews varied in their inclusion criteria and
data analysis techniques. These are detailed in Table 1.
Because of differences in inclusion criteria used across the reviews, there is substantial nonoverlap in the
studies included. Hawken et al. (2014) included studies through 2013 yet had the greatest number (k = 28) of
studies in their review, which appears to be partially attributable to the fact that they included dissertations and
theses as well as quasi‐experimental designs. Wolfe et al. (2016) included 16 studies in their review, though they
included some references from 2015. This seems partially attributable to the exclusion of dissertations and theses.
Maggin et al. (2015) included 11 studies, which appears attributable to restrictions placed on the dependent
variables, participant characteristics, and setting of the intervention. Additionally, Maggin et al. only included studies meeting the What Works Clearinghouse single‐case or group design standards (Version 3.0).
Mitchell et al. (2017) included five studies due to exclusion of theses, dissertations, and studies not meeting all of
the quality indicators of the Council for Exceptional Children’s (2014) evidence‐based practice standards.
Along with differences in inclusion criteria, each of the existing reviews utilized different approaches to data
analysis. Hawken et al. (2014) included 20 single‐case designs (SCDs) and eight group designs in their seminal
review of CICO. For SCDs, they calculated and reported the median percent of nonoverlapping data (PND) for each
outcome of each study. The average median PND was 68%, suggesting CICO was questionably effective per criteria
proffered by Scruggs and Mastropieri (1998). At the case level, 55% of cases had PND values less than 70%. For
group designs, Cohen’s d or related effect sizes were reported for six studies and R2 was reported for the other two.
The average median Cohen’s d was 0.37, which was considered small. For studies reporting R2, the average median
effect size was 0.23, which was considered large. Though the authors calculated effect sizes for studies included in
their review, they did not aggregate them using meta‐analytic techniques. The authors concluded that ultimately
there was some evidence to support the effectiveness of CICO; however, effects may vary depending on setting
and dependent variable.
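As an illustration of the PND metric discussed above, a minimal sketch follows (illustrative Python with hypothetical data; this is not code from any of the reviews). PND is the percentage of intervention‐phase data points exceeding the most extreme baseline point in the therapeutic direction:

```python
def pnd(baseline, intervention, increase_is_better=True):
    """Percent of nonoverlapping data: share of intervention-phase points
    that exceed the most extreme baseline point in the desired direction."""
    if increase_is_better:
        threshold = max(baseline)
        nonoverlap = sum(1 for y in intervention if y > threshold)
    else:
        threshold = min(baseline)
        nonoverlap = sum(1 for y in intervention if y < threshold)
    return 100.0 * nonoverlap / len(intervention)

# Hypothetical on-task percentages for one case
baseline = [20, 35, 30, 25]
intervention = [35, 50, 60, 55, 70]
print(pnd(baseline, intervention))  # 80.0
```

A value of 80% would fall in the "effective" range (70–90%) of the Scruggs and Mastropieri (1998) criteria cited above.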

Wolfe et al. (2016) utilized visual analysis supplemented by calculating a common index of nonoverlap, Tau‐U,
for SCDs (Parker, Vannest, Davis, & Sauber, 2011) and Cohen’s d for group designs for each outcome of each study.
Though the authors did not synthesize effect sizes, Tau‐U was variable across studies and indicated that the amount of data improving from baseline to intervention phases ranged from 9% to 91% for problem behavior and from 27% to 87% for academic engagement. Cohen’s d for the one between‐groups study ranged from 0.02 to 0.40. Wolfe et al.
suggested there was evidence to support the effectiveness of CICO for reducing problem behavior maintained by
adult attention. They also expressed reservation at its effectiveness for addressing behavioral outcomes maintained
by peer attention and/or escape.
Maggin et al. (2015) supplemented visual analysis with four different effect sizes for SCDs: nonoverlap of
all pairs (NAP), improvement rate difference (IRD), no assumptions standardized mean difference (SMD), and
raw data multilevel analysis (RDMA). Effect sizes were synthesized using multilevel modeling procedures to
provide an overall estimate of the magnitude of the effectiveness of CICO. This is the only existing review to
aggregate effect sizes using meta‐analysis. Overall effect sizes were significantly different from 0. Effect sizes
based on nonoverlap of data across phases (i.e., NAP and IRD) were considered moderate and those based on
parametric statistics (i.e., SMD and RDMA) were considered weak. Cohen’s d was reported for the two
between‐groups designs included in this review. No homogeneity or moderator analyses were conducted to
investigate variability in effect sizes across studies. Overall conclusions again supported the effectiveness of
CICO, especially for studies that used SCD and those that targeted behavior difficulties maintained by adult
attention.
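Of the nonoverlap indices Maggin et al. (2015) used, NAP is the simplest to state: the proportion of all baseline–intervention data‐point pairs in which the intervention point improves on the baseline point, with ties counted as half. A minimal sketch (illustrative Python with hypothetical data; not the review's analysis code):

```python
def nap(baseline, intervention, increase_is_better=True):
    """Nonoverlap of all pairs: proportion of all (baseline, intervention)
    comparisons favoring the intervention phase; ties count as 0.5."""
    pairs = [(a, b) for a in baseline for b in intervention]
    wins = sum(1.0 for a, b in pairs if (b > a if increase_is_better else b < a))
    ties = sum(0.5 for a, b in pairs if b == a)
    return (wins + ties) / len(pairs)

# Hypothetical on-task percentages for one case
print(nap([20, 35, 30, 25], [35, 50, 60, 55, 70]))  # 0.975
```

NAP ranges from 0 to 1, with 0.5 indicating chance‐level overlap between phases.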
Mitchell et al. (2017) used the Council for Exceptional Children’s (2014) evidence‐based practice standards to
classify studies as having positive, neutral (or mixed), or negative effects. The studies were not quantitatively analyzed or synthesized. Three single‐case studies and one group study were identified as demonstrating positive effects. Two single‐case studies were identified as demonstrating neutral effects.

2 | PURPOSE AND RESEARCH QUESTIONS

Previous reviews of CICO are characterized by substantial nonoverlap in the studies identified for inclusion
and have been limited by either (a) failing to quantify the magnitude of effect, (b) failing to aggregate effect
sizes via meta‐analysis, and/or (c) failing to undertake moderator analyses. The current study extends previous
reviews in various ways and incorporates data analytic techniques that could lead to new findings. First, given
the increasing frequency with which high‐quality CICO research is being published, this review includes an up‐
to‐date and exhaustive search of the literature. Second, in the interest of increasing the number of CICO
studies included, as well as addressing potential publication bias, broad inclusion criteria were utilized (i.e.,
inclusion of theses/dissertations and no restrictions placed on the setting of the intervention, dependent
variables, or participant characteristics). Third, unlike in previous reviews, we evaluated the reliability of the
extracted data by computing multiple metrics of intercoder reliability. Fourth, because the literature base for
CICO includes both single‐case and between‐groups designs, we calculated a between‐case d statistic that is
comparable across these designs. Fifth, because no previous studies conducted an analysis of the variables that
may moderate the effects of CICO, we conducted moderator analyses. The research questions for the following
study were:

1. What are the study‐ and sample‐level characteristics of studies experimentally investigating the effectiveness
of CICO?
2. What is the overall effect of CICO on student outcomes?
3. Do study‐ and/or sample‐level characteristics of included studies moderate the effectiveness of CICO on
student outcomes?

3 | METHODS

3.1 | Literature search and inclusion criteria


In June 2018, we searched four databases through ProQuest: ERIC, ProQuest Dissertations & Theses Global,
PsycARTICLES, and PsycINFO with the search terms “behavior education program” or “check in check out” and
“effective*.” The aforementioned keywords and Boolean operators were entered into the search simultaneously. The
search was restricted to the years 2002–2018 because the first empirical study examining CICO was published in
2002 (March & Horner, 2002). This search yielded 114 results, after removing duplicates.
Inclusion criteria were independently applied to these results by the first author and fourth author. To be
included, studies must have been published in a peer‐reviewed journal or thesis/dissertation published in
English. Studies must have examined the effectiveness of CICO or modified CICO on student outcomes. CICO
or modified CICO must have been compared to a baseline or control (i.e., no intervention) condition. Studies
only comparing CICO to modified CICO were excluded (e.g., Campbell & Anderson, 2008; Harrison, 2013; Ross
& Sabey, 2015). Next, studies must have used a between‐groups experiment or an SCD with at least three
attempts at demonstrating an intervention effect. Finally, theses/dissertations that met the above inclusion
criteria, but later resulted in the publication of a peer‐reviewed journal article, were excluded to avoid
including redundant data in the meta‐analysis (e.g., Boyd, 2011; Kauffman, 2008; Miller, 2013). Application of
these inclusion criteria to the initial search resulted in 32 studies, after resolving initial disagreements on six
studies through discussion.
Finally, the first author searched the references of the five existing systematic reviews on CICO in order of
publication date, which resulted in three additional articles meeting the inclusion criteria after reviewing Hawken
et al. (2014) and two additional articles after reviewing Wolfe et al. (2016). No unique references were included in
Maggin et al. (2015), Mitchell et al. (2017), or Klingbeil et al. (2018). In sum, we identified 37 articles. See Figure 1,
the PRISMA statement (Moher, Liberati, Tetzlaff, Altman & Group, 2009), for a visual depiction of the literature
search and application of the inclusion criteria.

3.2 | Descriptive analysis


3.2.1 | Descriptive data
After recording author(s), year of publication, and journal title, several study‐ and sample‐level variables were coded. At the study level, we coded number of cases, publication type (i.e., journal article or thesis/dissertation), research design (i.e., ABAB, between‐groups experiment, changing criterion, or multiple baseline), setting (i.e., alternative setting, elementary school, or secondary school), and whether an FBA was conducted before intervention. Further, we coded the dependent variable type (i.e., academic engagement or on‐task behavior;
daily report card; office discipline referrals; other; or problem, disruptive, or off‐task behavior) and its
corresponding method of measurement (i.e., records, student ratings, systematic direct observation, and
teacher ratings). At the sample‐level, we coded average age in years of participants by summing stated ages
and dividing by number of participants. When grade‐level was reported instead of age, we added five to each
grade level before calculating the average as described previously. We also coded percentage of participants
who were female, percentage of participants who were racial/ethnic minorities, percentage of participants who
were eligible for special education services, and percentage of participants with behavioral difficulties at least
partially maintained by adult/peer attention.
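The age‐coding rule described above (averaging stated ages, with reported grade levels converted to approximate ages by adding five) can be sketched as follows (illustrative Python; `average_age` is a hypothetical helper, not part of the authors' coding materials):

```python
def average_age(ages=None, grades=None):
    """Average participant age in years; grade levels are converted to
    approximate ages by adding five (so kindergarten corresponds to 5)."""
    values = list(ages or []) + [grade + 5 for grade in (grades or [])]
    return sum(values) / len(values)

# Hypothetical sample: two reported ages and two reported grade levels
print(average_age(ages=[9, 10], grades=[3, 4]))  # (9 + 10 + 8 + 9) / 4 = 9.0
```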
The first author coded the study‐ and sample‐level variables for all 37 studies. The fourth author coded 8
(≈ 22%) randomly selected studies to investigate intercoder agreement of the aforementioned study‐ and sample‐
level variables. Ten variables were coded for each of the studies: number of cases, publication type, research design,
setting, whether an FBA was conducted before implementing CICO, average age, percent female, percent racial/

ethnic minority, percent receiving special education services, and percent with behavioral difficulties at least
partially maintained by adult/peer attention. Percent agreement was calculated by dividing the number of
agreements by number of agreements plus disagreements and multiplying by 100%. Ninety‐five percent of codes
were in exact agreement. Disagreements were resolved through discussion before proceeding with quantitative
analysis.

3.2.2 | Appraisal of methodological quality


The methodological quality of the included studies was evaluated by the fourth author using the What Works
Clearinghouse Standards Handbook, Version 4.0 (What Works Clearinghouse, 2017). The first author coded 8 (≈22%)
of the 37 studies to examine intercoder agreement. Initial intercoder agreement was 75%. The first and fourth
author discussed reasons for disagreement until 100% agreement was reached. Studies similar to those in
disagreement were reevaluated postdiscussion by the fourth author to ensure reliability in coding.

F I G U R E 1 PRISMA Flow Diagram (Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009) for stages of
the meta‐analysis

3.3 | Quantitative analysis


3.3.1 | Data extraction
Data point values were extracted as XY coordinates from graphs using WebPlotDigitizer (Rohatgi, 2018). The first
author extracted data from the 36 studies that utilized SCD. The fourth author coded 8 (≈22%) randomly selected
studies to investigate intercoder reliability of data extraction. Intercoder reliability was examined for 1,084 data
points across the eight studies. Because there is some unreliability in data extraction (i.e., coders may click slightly
different areas on a data point), intercoder reliability for y‐values was examined proportionally; as such, y‐values
within 1% of the y‐axis range were considered to be in agreement. This metric is commonly used to assess
intercoder reliability of plot digitizing tools (Boyle, Samaha, Rodewald, & Hoffmann, 2013; Rakap, Rakap, Evran, &
Cig, 2016). For the current study, 86% of data points were in proportional agreement. The average correlation of
the primary and secondary coders’ data was r = 0.999. The first author’s extracted data were used for calculation
of effect sizes.
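The proportional agreement criterion described above can be sketched as follows (illustrative Python with hypothetical digitized values; not the tooling used in the review):

```python
def proportional_agreement(y1, y2, axis_min, axis_max, tol=0.01):
    """Percent of extracted data points for which two coders' y-values
    fall within tol (here 1%) of the y-axis range of each other."""
    span = axis_max - axis_min
    agree = sum(1 for a, b in zip(y1, y2) if abs(a - b) <= tol * span)
    return 100.0 * agree / len(y1)

# Hypothetical values digitized from a graph with a 0-100% y-axis:
coder1 = [12.0, 45.3, 80.1, 66.0]
coder2 = [12.4, 45.9, 80.0, 64.0]  # last point differs by 2% of the range
print(proportional_agreement(coder1, coder2, 0, 100))  # 75.0
```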

3.3.2 | Effect size calculation


Between‐case d statistics were computed using different procedures for SCD studies and the one between‐groups
experiment. The between‐case d statistic for SCDs developed by Hedges, Pustejovsky, and Shadish (2012); Hedges,
Pustejovsky, and Shadish (2013) is beneficial because it is interpretable as the typical standardized mean difference
statistic used to describe the magnitude of effect in between‐groups experiments. Shadish, Hedges, and
Pustejovsky (2014) outline a number of additional advantages to this analytic approach including formal statistical
derivation, which allows researchers to use typical meta‐analytic techniques.
The SPSS DHPS macro provided and described by Shadish et al. (2014) was used to calculate 54 between‐case d
statistics for 31 of the 36 SCDs identified for inclusion. The macro corrects the obtained between‐case d statistics for
small sample bias resulting in Hedges’ g (Hedges, 1981). To obtain a between‐case d statistic the macro requires data
from studies with multiple baseline designs across cases or reversal designs. This led to the exclusion of three studies
that used changing‐criterion designs (i.e., Fairbanks, Sugai, Guardino, & Lathrop, 2007; Lane, Capizzi, Fisher, & Ennis,
2012; McDaniel & Bruhn, 2016) and one that used a multiple baseline design across settings (i.e., Boden, Jolivette, &
Alberto, 2018). The macro also requires three independent cases to calculate the standard error for the between‐case
d statistic, resulting in the exclusion of Melius, Swoszowski, and Siders (2015). Hunter, Chenier, and Gresham (2014)
included four cases, three of whom were evaluated using a multiple baseline design across cases and one of whom
was evaluated using a reversal design. Thus, for Hunter et al. (2014), the effect size is based only on the three cases
who were evaluated using a multiple baseline design across cases.
For the one between‐groups experiment identified for inclusion (Simonsen, Myers, & Briere, 2011), five
additional d statistics were calculated using ES Version 1.0 (Shadish, Robinson & Lu, 1997). These effect sizes were
calculated from postintervention data for the control and treatment conditions and were also corrected for small
sample bias (Hedges, 1981).
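The small‐sample correction applied to both sets of d statistics multiplies Cohen's d by Hedges' (1981) factor, approximately 1 − 3/(4·df − 1). For the between‐groups case, the computation can be sketched as follows (illustrative Python with hypothetical summary statistics; the review itself used the SPSS DHPS macro and ES software):

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d from posttest means with a pooled SD, then Hedges'
    (1981) small-sample correction J = 1 - 3 / (4 * df - 1)."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    return d * (1 - 3 / (4 * df - 1))

# Hypothetical posttest data for treatment vs. control groups of 27 each
print(round(hedges_g(75, 60, 14, 16, 27, 27), 3))  # 0.983
```

Because the correction factor is below 1, g is always slightly smaller than the uncorrected d, with the difference shrinking as sample size grows.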

3.3.3 | Data analysis


Meta‐analytic procedures for combining between‐groups experiments and SCDs set forth by Zelinsky and Shadish
(2018) were utilized throughout the course of analyses. Data analysis was carried out with R (R Core Team, 2016)
and several associated add‐on packages were used to identify outlying effect sizes (outliers; Komsta, 2011),
conduct influence analyses (metafor; Viechtbauer, 2017), and conduct the meta‐analysis and subsequent
moderator analyses (robumeta; Fisher & Tipton, 2017). Multiple effect sizes and associated standard errors were
averaged within studies to conduct preliminary and influence analyses as well as investigate publication bias
because robumeta does not currently have this capability.

Many of the included studies reported more than one dependent variable, which can be used to compute
multiple effect sizes; however, these effect sizes are nonindependent. If these effect sizes are treated as
independent in meta‐analyses, the associated standard errors are too small, which leads to an increased
probability of type‐I errors (Zelinsky & Shadish, 2018). Historically, this issue has been handled either by selecting the one effect per study most central to the research question at hand or by averaging multiple effect sizes per study (Card, 2012), both of which discard data from analyses. To address the concerns of independence and data
loss, Hedges, Tipton, and Johnson (2010) introduced robust variance estimation (RVE), which adjusts the
standard error for a given effect size given the correlation between effect sizes within a study. Utilizing RVE in
the current study permitted us to include a greater amount of potentially useful information in meta‐analyses and
subsequent moderator analyses.
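The within‐study averaging used here for the preliminary, influence, and publication‐bias analyses (because robumeta lacks that capability) can be sketched as follows (illustrative Python with hypothetical effect sizes; the main model instead retained all effects under RVE):

```python
def average_within_study(effects):
    """Collapse multiple (study, g, se) records to one (g, se) pair per
    study by simple averaging, as done for preliminary analyses; the
    primary RVE model keeps all effects and models their dependency."""
    by_study = {}
    for study, g, se in effects:
        by_study.setdefault(study, []).append((g, se))
    return {
        study: (sum(g for g, _ in pairs) / len(pairs),
                sum(se for _, se in pairs) / len(pairs))
        for study, pairs in by_study.items()
    }

# Hypothetical: study A reports two outcomes, study B reports one
effects = [("A", 1.2, 0.30), ("A", 0.8, 0.50), ("B", 1.5, 0.40)]
print(average_within_study(effects))
```

Averaging avoids treating dependent effects as independent but discards outcome‐level information, which is why RVE was preferred for the meta‐analysis and moderator analyses proper.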

4 | RESULTS

4.1 | Descriptive review


Table 2 provides a detailed overview of the study‐ and sample‐level characteristics coded by study included in the
meta‐analysis.

4.1.1 | Descriptive data


Thirty‐seven studies satisfied the aforementioned inclusion criteria, 43% of which have not been included in any of
the existing reviews of CICO. Though the publication date spans the years 2002–2018, nearly 60% of included
studies were published in 2013 or more recently. Of the journal articles, studies were most frequently published in
Journal of Positive Behavior Interventions (n = 7). Four other journals featured two studies.
Table 3 summarizes study‐level descriptive data. In terms of publication type, 70% were journal articles and
30% were theses/dissertations. Multiple baseline designs were the most commonly used research design,
accounting for 68% of studies. ABAB (22%), changing‐criterion (8%), and between‐groups experiments (3%) were
also utilized. The majority of studies were conducted in elementary schools (54%) with fewer in secondary schools
(27%) and alternative settings (19%). Sixty‐eight percent of studies conducted an FBA before implementing CICO.
Of the 64 dependent variables measured across the 37 studies, problem behavior, or a closely related construct (i.e.,
disruptive behavior, and off‐task behavior), was most common, accounting for 40% of measured outcomes. Academic
engagement, or a closely related construct (i.e., on‐task behavior) was next most common, accounting for 25% of
dependent variables. Total or percent total of points on a daily report card, or similarly named measure (i.e., daily
behavior report card, and daily point card), was third most common, accounting for about 20% of dependent variables.
Dependent variables were most frequently (roughly 66%) measured with interval recording systems for the occurrence
and nonoccurrence of an operationally defined target behavior. Next, slightly over 25% of dependent variables were
based on teacher ratings. Five percent of outcomes were measured by accessing records to determine the frequency of
office discipline referrals. One outcome, accounting for 2% of dependent variables, was based on student ratings.
There were a total of 180 participants in the 37 studies included in the descriptive review. For the 36 SCDs
included in this study, there were 144 total participants, but 136 total cases. Hawken, MacLeod, and Rawlings
(2007) utilized four groups with three students in each group as the unit of analysis. For Simonsen et al. (2011), the
one between‐groups experiment included, there were 27 participants who received intervention.
For sample‐level characteristics, one study‐level value was computed and then these were averaged across the
number of studies with calculable values. The average age of participants was 10.2 years. One study did not provide
information to calculate the average age of participants. Twenty‐eight percent of participants were female. Fifty‐six
percent of participants were racial/ethnic minorities; however, five studies did not report this information. Thirty‐three
percent of participants received special education and related services. One study did not report this information.
T A B L E 2  Study‐ and sample‐level characteristics of studies included in the meta‐analysis

Study n Publication type Design Setting Design strength FBA Average age % Female % Minority % SPED % Attention
Barber (2013) 3 Thesis/dissertation MBL ES Meets Yes 6.67 0 100 0 100
Boden et al. (2018)ᵃ 3 Journal article MBL SS Meets No 17.67 0 100 100 NR
Boyd and Anderson (2013) 3 Journal article ABAB ES Meets w/Res Yes 10.67 0 67 0 0
Bunch‐Crump and Lo (2017) 3 Journal article MBL ES Meets No 9.67 0 100 33 NR
Camacho (2016) 4 Thesis/dissertation MBL ALT Does Not Meet Yes 8.00 0 NR NR 100
Campbell and Anderson (2011) 4 Journal article ABAB ES Meets w/Res Yes 9.25 0 25 75 100
Collins, Gresham, and Dart (2016) 4 Journal article MBL ES Does Not Meet No 11.00 75 75 0 NR
Dart et al. (2015) 3 Journal article MBL ES Does Not Meet No 6.67 100 67 0 NR
Dexter (2015) 6 Thesis/dissertation MBL ES Meets w/Res No 8.17 67 0 17 NR
Ennis, Jolivette, Swoszowski, and Johnson (2012) 6 Journal article MBL ALT Meets w/Res Yes 13.83 33 50 67 50
Fairbanks et al. (2007)ᵃ 10 Journal article CC ES Does Not Meet Yes 7.50 50 25 20 40
Fallon and Feinberg (2017) 3 Journal article MBL ALT Meets Yes 14.33 0 100 100 100
Harpole (2012) 8 Thesis/dissertation ABAB ES Does Not Meet No 7.88 63 38 0 NR
Hawken and Horner (2003) 4 Journal article MBL SS Meets Yes 12.75 0 NR 25 100
Hawken et al. (2007) 12 Journal article MBL ES Does Not Meet No NR 17 17 8 NR
Hunter et al. (2014) 4 Journal article MBL ES Does Not Meet No 9.75 25 25 0 NR
Klein (2014) 6 Thesis/dissertation MBL ES Does Not Meet Yes 5.67 33 83 0 67
Lane et al. (2012)ᵃ 4 Journal article CC SS Does Not Meet Yes 13.25 0 0 25 100
MacLeod et al. (2016) 4 Journal article MBL ES Meets Yes 9.00 0 0 100 75
March and Horner (2002) 3 Journal article MBL SS Meets Yes 12.67 33 NR 67 67
McDaniel and Bruhn (2016)ᵃ 2 Journal article CC SS Meets w/Res No 13.00 100 100 0 NR
McLemore (2016) 3 Thesis/dissertation ABAB SS Meets No 11.00 67 100 33 NR
Melius et al. (2015)ᵃ 2 Journal article ABAB ALT Meets w/Res Yes 8.00 0 50 100 0
Miller, Dufrene, Olmi, et al. (2015) 4 Journal article ABAB ES Does Not Meet No 7.00 25 75 0 NR
Miller, Dufrene, Sterling, et al. (2015) 3 Journal article ABAB ES Meets No 9.00 33 100 33 NR
Mitchell (2012) 3 Thesis/dissertation MBL ES Meets Yes 7.33 33 0 0 25
Mong, Johnson, and Mong (2011) 4 Journal article MBL ES Does Not Meet Yes 8.25 50 50 0 100
Parry (2014) 3 Thesis/dissertation ABAB ES Meets Yes 9.67 33 33 0 100
Sanchez, Miltenberger, Kincaid, and Blair (2015) 3 Journal article MBL ES Meets Yes 9.00 0 NR 0 100
Simonsen et al. (2011) 27 Journal article BG SS Meets Yes 11.59 24 88 17 92
Stuart (2013) 3 Thesis/dissertation MBL ALT Meets w/Res Yes 9.33 67 67 0 100
Swain‐Bradway (2009) 6 Thesis/dissertation MBL SS Does Not Meet Yes 14.67 17 NR 33 17
Swoszowski et al. (2012) 6 Journal article MBL ALT Does Not Meet Yes 13.33 17 17 100 50
Swoszowski, Jolivette, and Melius (2013) 4 Journal article MBL ALT Meets Yes 8.00 50 100 100 100
Todd, Campbell, Meyer, and Horner (2008) 4 Journal article MBL ES Meets Yes 6.50 0 50 25 100
Toms (2012) 3 Thesis/dissertation MBL SS Does Not Meet Yes 15.00 0 100 100 0
Turtura et al. (2014) 3 Journal article MBL SS Meets Yes 12.00 33 0 0 0

Note. ALT: alternative setting; BG: between‐groups experiment; CC: changing‐criterion design; ES: elementary school; FBA: functional behavior assessment; MBL: multiple baseline design; NR: not reported; SPED: special education; attention: behavioral difficulties at least partially maintained by peer/adult attention; SS: secondary school.
ᵃ Study not included in quantitative review.
T A B L E 3 Descriptive data for study‐level characteristics
Study‐level characteristic n (N = 37) % of Total
Publication type
Journal article 26 70
Thesis/dissertation 11 30
Design
ABAB 8 22
Between‐groups experiment 1 3
Changing criterion 3 8
Multiple baseline 25 68
Setting
Alternative setting 7 19
Elementary school 20 54
Secondary school 10 27
Functional behavior assessment
No 12 32
Yes 25 68
Design strength
Meets 16 43
Meets with reservations 7 19
Does not meet 14 38
Dependent variable (N = 64)
Academic engagement 16 25
Daily report card 13 20
Office discipline referrals 3 5
Other 6 9
Problem behavior 26 41
Method for recording dependent variable (N = 64)
Records 3 5
Student rating 1 2
Systematic direct observation 43 67
Teacher rating 17 25
Sixty‐seven percent of participants had behavioral difficulties at least partially maintained by adult/peer attention,
though the 12 studies that did not conduct an FBA before implementing CICO did not report these data.

4.1.2 | Appraisal of methodological quality
The methodological quality rating for each study can be seen in Table 2. A plurality of the studies included in this review (43%) met the standards proposed by the What Works Clearinghouse (2017) without reservations. A smaller percentage (19%) met the standards with reservations, and more than a third (38%) did not meet the standards.

4.2 | Quantitative review
4.2.1 | Preliminary and influence analyses
For preliminary and influence analyses as well as investigating publication bias we averaged effect sizes and their
associated standard errors within studies (k = 32). During preliminary analyses, a Grubbs’ test revealed the effect
size for Toms (2012) was an outlier (G = 4.53, p < 0.001). This effect size, g = 7.18, was more than twice the next highest effect size, and the study measured a dependent variable unique among the included studies (essentially how accurately students carried out the steps of the intervention, which is inconsistent with the aims of CICO). For these two reasons, this study was dropped from further analyses. As Zelinsky and Shadish (2018) noted, excluding a study
such as Toms (2012) necessarily reduces the overall effect. An additional Grubbs’ test revealed that the effect size
for Stuart (2013) was an outlier (G = 2.95, p < 0.01); however, there did not appear to be justification for its removal.
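The Grubbs' statistic used above has a compact definition. The sketch below is illustrative Python, not the authors' actual computation (the `outliers` R package listed in the references): G is the largest absolute deviation from the sample mean, expressed in standard-deviation units, and is then compared against a critical value derived from the t distribution.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Two-sided Grubbs' test statistic: the maximum absolute
    deviation from the sample mean, divided by the sample SD.
    Large G (relative to a t-based critical value for the given
    n and alpha) flags the extreme observation as an outlier."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return max(abs(v - m) for v in values) / s

# Hypothetical effect sizes: one extreme value dominates G.
G = grubbs_statistic([1.1, 0.9, 1.3, 7.18])
```

The statistic alone does not decide significance; in practice the critical value comes from the t distribution for the chosen alpha and sample size, which is what the packaged test supplies.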
Several plots from the influence analyses are available as supplementary materials. These analyses revealed that the effect size residuals were approximately normally distributed, with the exception of Stuart (2013); however, this may have occurred by chance given the number of effect sizes included in this study. Interpreted in light of the guidance provided by Zelinsky and Shadish (2018), these analyses generally suggest individual studies did not have undue influence on the results of the meta‐analysis.

4.2.2 | Overall effect
Using RVE, 59 effect sizes, each corrected for small‐sample bias, were synthesized. The random‐effects model was
statistically significant (g̅ = 1.22, 95% confidence interval [CI] [1.00, 1.44]); associated t(29) = 11.10; p < 0.001.
Compared with baseline or control conditions, CICO improved behavioral outcomes by over one standard
deviation. The estimate of between‐study variance was τ² = 0.30. Tests for homogeneity indicated effect sizes were
heterogeneous, Q(30.14) = 129.05; p < 0.001. Additionally, I² = 76.65%, which suggests about 77% of the variance in
effect sizes was due to systematic differences between studies rather than sampling error. This is a high level of
heterogeneity across studies, given a suggested benchmark of 75% (Higgins, Thompson, Deeks, & Altman, 2003).
Sensitivity analyses accounting for varying dependency between outcomes within studies did not change the
average effect size, standard error, or variance component reported above. Figure 2 depicts individual effect sizes
and associated 95% CI. Boxes depicting individual effect sizes are proportional to their weights.
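Several quantities in this paragraph follow from standard closed-form expressions. The Python sketch below is illustrative only, not the exact RVE computations performed here (e.g., with the robumeta package): the small-sample correction is Hedges' J, and the DerSimonian–Laird τ² shown is the textbook moment estimator, which need not match the τ² produced under RVE weighting.

```python
def hedges_correction(d, m):
    """Hedges' small-sample bias correction: g = J(m) * d,
    where J(m) = 1 - 3 / (4m - 1) and m is the degrees of freedom."""
    return d * (1.0 - 3.0 / (4.0 * m - 1.0))

def i_squared(Q, df):
    """Higgins & Thompson I^2: percentage of total variation in effect
    sizes due to between-study heterogeneity rather than sampling
    error, truncated at zero when Q < df."""
    return max(0.0, (Q - df) / Q) * 100.0

def tau_squared_dl(Q, df, weights):
    """DerSimonian-Laird moment estimator of between-study variance,
    given inverse-variance weights w_i = 1 / v_i."""
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (Q - df) / c)

# Plugging the reported Q(30.14) = 129.05 into I^2 recovers the value
# reported in the text, up to rounding of the inputs.
i2 = i_squared(129.05, 30.14)
```

Because Q exceeds its degrees of freedom by a wide margin, both I² and τ² are well above zero here, which is what motivates the moderator analyses that follow.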

4.2.3 | Overall effect by dependent variable
For studies examining the impact of CICO on problem behavior or a closely related construct, the random‐effects
model was statistically significant (g̅ = 1.16, 95% CI [0.84, 1.48]); associated t(20.1) = 7.51; p < 0.001. The random‐
effects model was also statistically significant (g̅ = 1.53, 95% CI [1.08, 1.98]); associated t(13.5) = 7.36; p < 0.001 for
studies investigating the impact of CICO on academic engagement or a closely related construct. For studies
examining the impact of CICO on percent total points on a daily report card or a closely related construct, the
random‐effects model was statistically significant (g̅ = 1.10, 95% CI [0.61, 1.59]); associated t(6.57) = 5.43; p < 0.01.
Results were not subset further by dependent variable (e.g., office discipline referrals) because there were not
enough degrees of freedom to conduct the meta‐analysis via RVE with confidence.

4.2.4 | Moderator analyses
Moderator analyses were conducted via RVE to explore and potentially explain the considerable heterogeneity in
effect sizes identified by the Q‐test and I2 statistic. Nine study‐ and sample‐level characteristics were examined as
potential moderators: publication type, setting, design strength, whether an FBA was conducted before
implementing CICO, average age, proportion female, proportion racial/ethnic minority, proportion receiving
special education services, and proportion with behavioral difficulties at least partially maintained by adult/peer
attention. Moderators of interest were examined via separate regression models with the exception of setting,
which was a three‐level categorical variable that was dummy coded with elementary school as the reference group.
Results of the moderator analyses are presented in Table 4. No potential moderators significantly explained
heterogeneity in effect sizes.
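Each moderator test is, at its core, a weighted regression of effect sizes on the candidate moderator, with RVE supplying standard errors and approximate degrees of freedom that tolerate dependent effect sizes. The Python sketch below shows only the inverse-variance-weighted point estimates (a simplification: it omits the robust-variance step behind the SE, t, and df columns of Table 4, and the data are hypothetical).

```python
def wls_slope_intercept(effects, variances, moderator):
    """Weighted least-squares fit of effect ~ b0 + b1 * moderator with
    weights 1 / variance. Mirrors meta-regression point estimates; it
    does NOT reproduce RVE standard errors."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, moderator))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, moderator))
    swxy = sum(wi * x * y for wi, x, y in zip(w, moderator, effects))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Hypothetical data: journal articles (1) vs. dissertations (0), with
# identical effects within each group, so the slope recovers the
# between-group difference exactly.
b0, b1 = wls_slope_intercept(
    effects=[1.0, 1.0, 1.5, 1.5],
    variances=[0.04, 0.09, 0.04, 0.09],
    moderator=[0, 0, 1, 1],
)
```

In Table 4, b0 is the expected effect at the moderator's reference level and b1 the change associated with the moderator; none of the b1 estimates reached significance.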
FIGURE 2 Forest plot of effect sizes included in meta‐analysis using robust variance estimation
T A B L E 4 Regression table for moderator analyses using robust variance estimation
Moderator β0 SE(β0) β1 SE(β1) t df p
Publication type (0 = thesis/dissertation; 1 = journal article) 1.23 0.27 −0.02 0.29 −0.06 16.83 0.95
Setting (secondary schools) 1.24 0.14 0.00 0.36 0.01 8.07 1.00
Setting (alternative setting) – – −0.06 0.27 −0.21 7.82 0.84
Design strength (0 = does not meet; 1 = meets with reservations/meets) 1.09 0.13 0.21 0.20 1.03 20.82 0.31
FBA conducted before implementing CICO (0 = no; 1 = yes) 1.19 0.24 0.04 0.27 0.16 17.66 0.88
Age in years 1.26 0.41 0.00 0.04 −0.03 11.80 0.97
Proportion female 1.17 0.13 0.18 0.46 0.38 9.81 0.71
Proportion racial/ethnic minority 1.28 0.28 −0.09 0.41 −0.22 13.47 0.83
Proportion receiving special education services 1.23 0.15 −0.03 0.24 −0.11 10.00 0.92
Proportion whose behavioral difficulties were at least partially 1.31 0.27 −0.12 0.32 −0.38 7.40 0.72
maintained by adult/peer attention
Note. Moderators of interest were examined via separate regression models, with the exception of setting, which was a three‐level categorical variable dummy coded with elementary school as the reference group; the comparison group is indicated in parentheses.
CICO: check‐in check‐out; FBA: functional behavior assessment.
4.2.5 | Publication bias
Publication bias is the possibility that studies with null or small effects are less likely to be published in peer‐
reviewed outlets than those with larger effects (Card, 2012; Zelinsky & Shadish, 2018). This could happen for a
variety of reasons. For example, a researcher utilizing SCD methodology may be unlikely to submit a study to a
journal if there is no functional relationship demonstrated between the independent and dependent variable. If
publication bias is present, this may lead meta‐analytic researchers to report a larger overall effect than if the total
universe of studies was included (Card, 2012).
For this study, a trim‐and‐fill analysis was used to assess publication bias and correct the overall effect size. See
Figure 3 for the results of the trim‐and‐fill analysis. The trim‐and‐fill analysis suggested the funnel plot was
asymmetrical and seven studies were missing from the left side of the funnel plot. The random‐effects model with
the Hartung–Knapp correction remained statistically significant, g̅ = 0.99, 95% CI [0.74, 1.24]. This is somewhat
lower than the overall effect produced via RVE, that is, g̅ = 1.22, 95% CI [1.00, 1.44].

FIGURE 3 Trim‐and‐fill analysis to evaluate publication bias and correct the overall effect size
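Trim-and-fill imputes hypothetical missing studies to symmetrize the funnel plot; the full algorithm is beyond a short sketch, but a related, much simpler asymmetry diagnostic is Egger's regression: regress the standardized effect (g/SE) on precision (1/SE), and an intercept far from zero signals funnel-plot asymmetry. The Python below is illustrative only (the study itself used trim-and-fill, not Egger's test), with hypothetical inputs.

```python
def egger_intercept(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry: ordinary
    least squares of z_i = g_i / SE_i on precision_i = 1 / SE_i.
    Returns (intercept, slope); an intercept well away from zero
    suggests small-study effects and possible publication bias."""
    z = [g / se for g, se in zip(effects, std_errors)]
    p = [1.0 / se for se in std_errors]
    n = len(z)
    p_bar = sum(p) / n
    z_bar = sum(z) / n
    slope = (sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z))
             / sum((pi - p_bar) ** 2 for pi in p))
    intercept = z_bar - slope * p_bar
    return intercept, slope

# With a constant underlying effect and no asymmetry, the intercept
# is near zero and the slope recovers the common effect.
b0, b1 = egger_intercept([1.2, 1.2, 1.2], [0.2, 0.4, 0.5])
```

Like trim-and-fill, this diagnostic cannot distinguish publication bias from other sources of small-study effects, so its result is suggestive rather than conclusive.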

5 | DISCUSSION

The current study was conducted to descriptively and quantitatively aggregate the experimental evidence for the
effectiveness of CICO in improving student outcomes compared to baseline or control conditions. Although there
have been five recent systematic reviews of CICO, this study extended those reviews by conducting an updated
and exhaustive literature search and calculating a between‐case d statistic that allowed for the quantitative
synthesis of data from both single‐case and between‐groups designs. Furthermore, effect sizes were quantitatively
synthesized using RVE, which allowed for the inclusion of multiple effect sizes per study. Subsequent moderator
analyses were also conducted using RVE. Previous reviews have not conducted moderator analyses to investigate
study‐ or sample‐level characteristics that may moderate the effectiveness of CICO.
The updated and exhaustive literature search yielded 16 studies not included in existing systematic reviews of
CICO. The exhaustive literature search and our broader inclusion criteria resulted in the largest synthesis of CICO
outcomes thus far. Studies comparing CICO to a baseline or control condition have almost exclusively been conducted
using single‐case methodology, with multiple baseline designs being most common. Most studies were conducted in
elementary schools; however, there were multiple applications in secondary schools and residential facilities. Although CICO is a Tier 2 intervention, and Tier 2 interventions ordinarily do not incorporate function‐based support for addressing behavioral difficulties, most studies reported FBA data for individual participants. Interestingly, one‐third of research participants
received special education services, which seems somewhat inconsistent with the aims of CICO in that it is packaged as
a Tier 2 intervention designed to prevent the development of more serious behavior problems.
The meta‐analysis conducted via RVE indicated that CICO improved student outcomes by over one standard
deviation compared with baseline or control conditions. Subsetting data by dependent variable resulted
in similar outcomes, with meta‐analyses of both problem behavior and academic engagement yielding effect sizes of
1.16 and 1.53, respectively. This runs counter to the findings by Wolfe et al. (2016) suggesting “there was little
support for the effectiveness of CICO for increasing appropriate behavior” (p. 12). Studies were characterized by
considerable heterogeneity that went unexplained in moderator analyses, possibly due to underpowered analyses.
Effect sizes did not vary by publication type, setting, design strength, whether an FBA was conducted before
implementing CICO, average age, proportion of female participants, proportion of participants who were racial/
ethnic minorities, proportion of participants receiving special education services, or proportion of participants with
behavioral difficulties at least partially maintained by adult/peer attention.
Potential moderators were identified a priori and largely based on conclusions drawn in existing systematic
reviews. Our analyses stand in contrast to two findings discussed in previous reviews. First, though they were
appropriately cautious in their claim, Hawken et al. (2014) speculated that the effectiveness of CICO may vary by
setting, with applications in elementary schools being more effective than secondary schools, particularly for
between‐groups designs. The current study results did not support this conclusion, however. CICO appears to be
similarly effective across settings.
Second, in regard to the role of function in determining the effectiveness of CICO, Wolfe et al. (2016) noted,
“Strong effects were demonstrated only with participants whose problem behavior was maintained by access to
attention” (p. 12). Similarly, Maggin et al. (2015) noted “standard CICO procedures tend to be most effective for
students with attention‐maintained behavioral problems” (p. 206). Both of these systematic reviews appeared to
rely on individual study results or descriptive analyses to arrive at these conclusions. In the current study, the role
of function was examined statistically by investigating whether variability in effect sizes differed as a function of
the percentage of participants with behavioral difficulties at least partially maintained by adult/peer attention. As
indicated previously, they did not. Of course, this does not suggest function of behavioral difficulties does not play a

role in determining the effectiveness of CICO outright. There are at least two reasons why the function of behavioral
difficulties may not have moderated the heterogeneity in effect sizes. First, there were missing data for eight
studies for this variable, which may have affected point estimates and standard errors. Second, this meta‐analysis
treated individual studies examining what these researchers termed “basic” or “standard” CICO the same as studies
termed “modified” or “adapted” CICO for students with behavioral difficulties maintained by escape or avoidance
(e.g., Turtura et al., 2014). Analytically, this might be handled by examining whether there is an interaction effect
between type of CICO (i.e., basic or modified) and the percentage of participants with behavioral difficulties at least
partially maintained by adult/peer attention in explaining heterogeneity in effect sizes. These analyses were not
undertaken, given missing data and the fact that there were few studies examining modified CICO with participants
with attention‐maintained behavioral difficulties.
Although the meta‐analysis results provide an average effect size, the heterogeneity of effect sizes across
studies that was still unexplained through the moderator analysis makes it difficult to predict the effect of CICO in
any given implementation. It is possible to speculate that uncoded and/or unavailable data related to study‐ or
sample‐level characteristics may have accounted for the heterogeneity in effect sizes. Although treatment integrity
was reported as high across studies, it is possible that the treatment integrity measures did not assess some
important features of the intervention, which could plausibly vary across studies.
One potentially important variable is the CICO coordinator. Are there differences across coordinators that
influence the reinforcing properties of their interactions with students? These possible differences could be related
to the role coordinators have played in the participants’ lives—such as counselor versus classroom teacher. A
coordinator who has a strong relationship with the participant may have greater influence than an initially
unfamiliar coordinator. Future research should examine the participants’ relationships with the coordinators and
the extent to which contact with the coordinator functions as reinforcement. Furthermore, there were
differences across studies in the extent to which tangible reinforcers were used, the types of tangible reinforcers,
and the schedules of reinforcement. Future research should focus on systematically evaluating the reinforcing
efficacy of the putative reinforcers, as well as evaluating the schedule of reinforcement. In addition to addressing
issues related to the consequences delivered in CICO, other variables should be carefully reported and studied. For
example, in some cases, the behavioral expectations on the Daily Progress Reports were perfectly aligned with the
dependent variables, but in other cases they were not. The alignment between behavioral expectations and
dependent variables should be evaluated to provide direction in designing these features of CICO.
In summary, the current meta‐analysis of CICO generally aligns with the results of prior meta‐analyses in
finding a positive effect on student outcomes. This study extended prior research by expanding the number of
included studies, using a statistical analysis in terms of standardized effect sizes that allowed SCD and between‐
groups results to be combined, and by conducting a statistical analysis of potential moderator variables. The
moderator analysis was not able to account for the considerable heterogeneity across studies. Future research
needs to be conducted to identify potential moderators of the effects of CICO.


OR CID

Daniel D. Drevon http://orcid.org/0000-0002-4750-4498



REFERENCES

Studies preceded by an asterisk were identified for inclusion in the meta‐analysis.
*Barber, A. L. (2013). An evaluation of check‐in/check‐out with accountability tracking for at risk students in a high‐need
elementary school (Master’s thesis). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 1543038).
*Boden, L. J., Jolivette, K., & Alberto, P. A. (2018). The effects of check‐in, check‐out for students with moderate intellectual
disability during on‐and off‐site vocational training. Journal of Classroom Interaction, 53(1), 4–12.
Boyd, R. J. (2011). An evaluation of a secondary intervention for students whose problem behaviors are escape maintained.
(Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 3466318).
*Boyd, R. J., & Anderson, C. M. (2013). Breaks are better: A tier II social behavior intervention. Journal of Behavioral
Education, 22(4), 348–365. https://doi.org/10.1007/s10864‐013‐9184‐2
Boyle, M. A., Samaha, A. L., Rodewald, A. M., & Hoffmann, A. N. (2013). Evaluation of the reliability and validity of
GraphClick as a data extraction program. Computers in Human Behavior, 29(3), 1023–1027. https://doi.org/10.1016/j.
chb.2012.07.031
*Bunch‐Crump, K. R., & Lo, Y. (2017). An investigation of multitiered behavioral interventions on disruptive behavior and
academic engagement of elementary students. Journal of Positive Behavior Interventions, 19(4), 216–227. https://doi.org/
10.1177/1098300717696939
*Camacho, A. P. (2016) An evaluation of an assessment of check‐in/check‐out with children who are homeless in an after school
care program (Unpublished master’s thesis). University of South Florida, Tampa, FL
Campbell, A., & Anderson, C. M. (2008). Enhancing effects of check‐in/check‐out with function‐based support. Behavioral
Disorders, 33(4), 233–245. Retrieved from http://www.jstor.org/stable/43153457
*Campbell, A., & Anderson, C. M. (2011). Check‐in/check‐out: A systematic evaluation and component analysis. Journal of
Applied Behavior Analysis, 44(2), 315–326. https://doi.org/10.1901/jaba.2011.44‐315
Card, N. A. (2012). Applied meta‐analysis for social science research. New York, NY: Guilford Press.
*Collins, T. A., Gresham, F. M., & Dart, E. H. (2016). The effects of peer‐mediated check in/check‐out on the social skills of
socially neglected students. Behavior Modification, 40(4), 568–588. https://doi.org/10.1177/0145445516643066
Crone, D. A., Hawken, L. S., & Horner, R. H. (2010). Responding to problem behavior in schools: The Behavior Education Program
(2nd ed.). New York, NY: Guilford.
*Dart, E. H., Furlow, C. M., Collins, T. A., Brewer, E., Gresham, F. M., & Chenier, K. H. (2015). Peer‐mediated check‐in/check‐
out for students at‐risk for internalizing disorders. School Psychology Quarterly, 30(2), 229–243. https://doi.org/10.
1037/spq0000092
*Dexter, C. A. (2015) Effects of a modified daily progress report for check in/check out at the elementary level (Unpublished
doctoral dissertation). The Pennsylvania State University, University Park, PA.
*Ennis, R. P., Jolivette, K., Swoszowski, N. C., & Johnson, M. L. (2012). Secondary prevention efforts at a residential facility
for students with emotional and behavioral disorders: Function‐based check‐in, check‐out. Residential Treatment for
Children & Youth, 29(2), 79–102. https://doi.org/10.1080/0886571X.2012.669250
*Fairbanks, S., Sugai, G., Guardino, D., & Lathrop, M. (2007). Response to intervention: Examining classroom behavior
support in second grade. Exceptional Children, 73(3), 288–310. https://doi.org/10.1177/001440290707300302
*Fallon, L. M., & Feinberg, A. B. (2017). Implementing a tier 2 behavioral intervention in a therapeutic alternative high
school program. Preventing School Failure, 61(3), 189–197. https://doi.org/10.1080/1045988X.2016.1254083
Fisher, Z., & Tipton, E. (2017). robumeta. R package version 2.0.
*Harpole, L. L. (2012) Evaluation of performance‐based and pre‐set conventional criterion for reinforcement in check in‐check out
(Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 3530730).
Harrison, C. D. (2013). An evaluation of the effects of the academics and behavior check in/check‐out intervention (Doctoral
dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 3589510).
Hawken, L. S., Adolphson, S. L., MacLeod, K. S., & Schumann, J. (2008). Secondary‐tier interventions and supports. In W.
Sailor, G. Dunlap, G. Sugai, & R. Horner (Eds.), Handbook of positive behavior support (pp. 395–420). New York, NY:
Springer.
Hawken, L. S., Bundock, K., Kladis, K., O’Keeffe, B., & Barrett, C. A. (2014). Systematic review of the check‐in, check‐out
intervention for students at risk for emotional and behavioral disorders. Education and Treatment of Children, 37(4),
635–658. https://doi.org/10.1353/etc.2014.0030
*Hawken, L. S., & Horner, R. H. (2003). Evaluation of a targeted intervention within a schoolwide system of behavior
support. Journal of Behavioral Education, 12(3), 225–240. https://doi.org/10.1023/A:1025512411930
*Hawken, L. S., MacLeod, K. S., & Rawlings, L. (2007). Effects of the Behavior Education Program (BEP) on office
discipline referral of elementary school students. Journal of Positive Behavior Interventions, 9(2), 94–101. https://doi.org/
10.1177/10983007070090020601
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational
Statistics, 6(2), 107–128. https://doi.org/10.3102/10769986006002107

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single‐case designs.
Research Synthesis Methods, 3(3), 224–239. https://doi.org/10.1002/jrsm.1052
Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2013). A standardized mean difference effect size for multiple baseline
designs across individuals. Research Synthesis Methods, 4(4), 324–341. https://doi.org/10.1002/jrsm.1086
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta‐regression with dependent effect size
estimates. Research Synthesis Methods, 1(1), 39–65. https://doi.org/10.1002/jrsm.5
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta‐analyses. British Medical Journal, 327(7414), 557–560. https://doi.
org/10.1136/bmj.327.7414.557
*Hunter, K. K., Chenier, J. S., & Gresham, F. M. (2014). Evaluation of check in/check out for students with internalizing
behavior problems. Journal of Emotional and Behavioral Disorders, 22(3), 135–148. https://doi.org/10.1177/
1063426613476091
Kauffman, A. L. (2008). Stimulus fading within check‐in/check‐out (Doctoral dissertation). Retrieved from ProQuest
Dissertations and Theses Global. (UMI No. 3335183).
*Klein, C. J. (2014). An evaluation of the relationship between function of behavior and a modified check‐in, check‐out intervention
using a daily behavior report card (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global.
(UMI No. 3631808).
Klingbeil, D. A., Dart, E. H., & Schramm, A. L. (2018). A systematic review of function‐modified check‐in/check‐out. Journal of
Positive Behavior Interventions, https://doi.org/10.1177/1098300718778032
Komsta, L. (2011). outliers. R package version 0.14.
*Lane, K. L., Capizzi, A. M., Fisher, M. H., & Ennis, R. P. (2012). Secondary prevention efforts at the middle school level: An
application of the Behavior Education Program. Education and Treatment of Children, 35(1), 51–90. https://doi.org/10.
1353/et.2012.0002
*MacLeod, K. S., Hawken, L. S., O’Neill, R. E., & Bundock, K. (2016). Combining tier 2 and tier 3 supports for students with
disabilities in general education settings. Journal of Educational Issues, 2(2), 331–351. https://doi.org/10.5296/jei.v2i2.
10183
Maggin, D. M., Zurheide, J., Pickett, K. C., & Baillie, S. J. (2015). A systematic evidence review of the check‐in/check‐out
program for reducing student challenging behaviors. Journal of Positive Behavior Interventions, 17(4), 197–208. https://
doi.org/10.1177/1098300715573630
*March, R. E., & Horner, R. H. (2002). Feasibility and contributions of functional behavioral assessment in schools. Journal of
Emotional and Behavioral Disorders, 10(3), 158–170. https://doi.org/10.1177/10634266020100030401
*McDaniel, S. C., & Bruhn, A. L. (2016). Using a changing‐criterion design to evaluate the effects of check‐in/check‐out with goal
modification. Journal of Positive Behavior Interventions, 18(4), 197–208. https://doi.org/10.1177/1098300715588263
*McLemore (2016). The effects of peer‐mediated check‐in, check‐out with a self‐monitoring component on disruptive behavior and
appropriate engagement in the classroom (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses
Global.
*Melius, P., Swoszowski, N. C., & Siders, J. (2015). Developing peer led check‐in/check‐out: A peer‐mentoring program for
children in residential care. Residential Treatment for Children & Youth, 32(1), 58–79. https://doi.org/10.1080/0886571X.
2015.1004288
Miller, L. M. (2013). Effects of check‐in/check‐out with a fading procedure on the academic engagement and problem behavior of
elementary school students (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No.
3574381).
*Miller, L. M., Dufrene, B. A., Olmi, D. J., Tingstrom, D., & Filce, H. (2015). Self‐monitoring as a viable fading option in
check‐in/check‐out. Journal of School Psychology, 53(2), 121–135. https://doi.org/10.1016/j.jsp.2014.12.004
*Miller, L. M., Dufrene, B. A., Sterling, H. E., Olmi, D. J., & Bachmayer, E. (2015). The effects of check‐in/check‐out on
problem behavior and academic engagement in elementary school students. Journal of Positive Behavior Interventions,
17(1), 28–38. https://doi.org/10.1177/1098300713517141
*Mitchell, B. S. (2012). Investigating use of the Behavior Education Program for students with internalizing behavioral concerns
(Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 3537889).
Mitchell, B. S., Adamson, R., & McKenna, J. W. (2017). Curbing our enthusiasm: An analysis of the check‐in/check‐out
literature using the Council for Exceptional Children’s evidence‐based practice standards. Behavior Modification, 41(3),
343–367. https://doi.org/10.1177/0145445516675273
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta‐
analyses: The PRISMA statement. PLOS Medicine, 6(7), 1–6. https://doi.org/10.1371/journal.pmed.1000097
*Mong, M. D., Johnson, K. N., & Mong, K. W. (2011). Effects of check‐in/check‐out on behavioral indices and mathematics
generalization. Behavioral Disorders, 36(4), 225–240. Retrieved from http://www.jstor.org/stable/43153837
Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single‐case research:
Tau‐U. Behavior Therapy, 42(2), 284–299. https://doi.org/10.1016/j.beth.2010.08.006
15206807, 2019, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/pits.22195 by Minnesota State University - Moorhead - USA, Wiley Online Library on [06/03/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DREVON ET AL. | 411

*Parry, M. J. (2014). Evaluating the effectiveness and feasibility of integrating self‐monitoring into an existing tier II intervention for
elementary school students (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses Global. (UMI No.
3640219).
R Core Team (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical
Computing. http://www.R‐project.org
Rakap, S., Rakap, S., Evran, D., & Cig, O. (2016). Comparative evaluation of the reliability and validity of three data
extraction programs: UnGraph, GraphClick, and DigitizeIt. Computers in Human Behavior, 55(Part A), 159–166. https://
doi.org/10.1016/j.chb.2015.09.008
Rohatgi, A. (2018). WebPlotDigitizer (Version 4.1) [Computer software]. Retrieved from http://arohatgi.info/
WebPlotDigitizer
Ross, S. W., & Sabey, C. V. (2015). Check‐in check‐out + social skills: Enhancing the effects of check‐in check‐out for students
with social skill deficits. Remedial and Special Education, 36(4), 246–257. https://doi.org/10.1177/0741932514553125
*Sanchez, S., Miltenberger, R. G., Kincaid, D., & Blair, K. C. (2015). Evaluating check‐in check‐out with peer tutors for
children with attention maintained problem behaviors. Child & Family Behavior Therapy, 37(4), 285–302. https://doi.org/
10.1080/07317107.2015.1104769
Scruggs, T. E., & Mastropieri, M. A. (1998). Summarizing single‐subject research: Issues and applications. Behavior
Modification, 22(3), 221–242. https://doi.org/10.1177/01454455980223001
Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta‐analysis of single‐case designs with a
standardized mean difference statistic: A primer and applications. Journal of School Psychology, 52(2), 123–147. https://
doi.org/10.1016/j.jsp.2013.11.005
Shadish, W. R., Robinson, L., & Lu, C. (1997). ES: A computer program and manual for effect size calculation. Retrieved from
http://faculty.ucmerced.edu/wshadish/software/es‐computer‐program
Simonsen, B., & Myers, D. (2015). Classwide positive behavior interventions and supports: A guide to proactive classroom
management. New York, NY: Guilford Press.
*Simonsen, B., Myers, D., & Briere, D. E. (2011). Comparing a behavioral check‐in/check‐out (CICO) intervention to
standard practice in an urban middle school setting using an experimental group design. Journal of Positive Behavior
Interventions, 13(1), 31–48. https://doi.org/10.1177/1098300709359026
*Stuart, C. (2013). An evaluation on the effects of check‐in/check‐out with school‐aged children residing in a mental health
treatment facility (Master's thesis). Retrieved from ProQuest Dissertations and Theses Global. (UMI No. 1548514).
Sugai, G., & Horner, R. H. (2006). A promising approach for expanding and sustaining school‐wide positive behavior support.
School Psychology Review, 35(2), 245–259. http://www.nasponline.org/publications/periodicals/spr/volume‐35/volume‐
35‐issue‐2/a‐promising‐approach‐for‐expanding‐and‐sustaining‐school‐wide‐positive‐behavior‐support
*Swain‐Bradway, J. L. (2009). An analysis of a secondary level intervention for high school students at risk of school failure: The
high school Behavior Education Program (Unpublished doctoral dissertation). University of Oregon, Eugene, OR.
*Swoszowski, N. C., Jolivette, K., Fredrick, L. D., & Heflin, L. J. (2012). Check in/check out: Effects on students with
emotional and behavioral disorders with attention‐ or escape‐maintained behavior in a residential facility.
Exceptionality, 20(3), 163–178. https://doi.org/10.1080/09362835.2012.694613
*Swoszowski, N. C., McDaniel, S. C., Jolivette, K., & Melius, P. (2013). The effects of Tier II check‐in/check‐out including
adaptation for non‐responders on the off‐task behavior of elementary students in a residential setting. Education and
Treatment of Children, 36(3), 63–79. https://doi.org/10.1353/etc.2013.0024
Tanner‐Smith, E. E., & Tipton, E. (2014). Robust variance estimation with dependent effect sizes: Practical considerations
including a software tutorial in Stata and SPSS. Research Synthesis Methods, 5(1), 13–30. https://doi.org/10.1002/
jrsm.1091
*Todd, A. W., Campbell, A. L., Meyer, G. G., & Horner, R. H. (2008). The effects of a targeted intervention to reduce problem
behaviors: Elementary school implementation of check in‐check out. Journal of Positive Behavior Interventions, 10(1), 46–
55. https://doi.org/10.1177/1098300707311369
*Toms, O. M. (2012). The effects of check‐in check‐out on the social and academic planning and outcomes of African‐American
males in an urban secondary setting (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses New
Platform. (UMI No. 3510233).
*Turtura, J. E., Anderson, C. M., & Boyd, R. J. (2014). Addressing task avoidance in middle school students: Academic
behavior check‐in/check‐out. Journal of Positive Behavior Interventions, 16(3), 159–167. https://doi.org/10.1177/
1098300713484063
Viechtbauer, W. (2017). metafor. R package version 2.0‐0.
What Works Clearinghouse (2017). Standards handbook (v. 4.0). Washington, DC: Institute of Education Sciences.
Wolfe, K., Pyle, D., Charlton, C. T., Sabey, C. V., Lund, E. M., & Ross, S. W. (2016). A systematic review of the empirical
support for check‐in check‐out. Journal of Positive Behavior Interventions, 18(2), 74–88. https://doi.org/10.1177/
1098300715595957
Zelinsky, N. A. M., & Shadish, W. (2018). A demonstration of how to do a meta‐analysis that combines single‐case designs
with between‐groups experiments: The effects of choice making on challenging behaviors performed by people with
disabilities. Developmental Neurorehabilitation, 21(4), 266–278. https://doi.org/10.3109/17518423.2015.11

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the
article.

How to cite this article: Drevon DD, Hixson MD, Wyse RD, Rigney AM. A meta‐analytic review of the
evidence for check‐in check‐out. Psychol Schs. 2019;56:393–412. https://doi.org/10.1002/pits.22195