

Graduate Theses, Dissertations, and Problem Reports

2024

Effects of Commission Errors on Behavior Intervention Plan Outcomes

Olivia Brianne Harvey

Follow this and additional works at: https://researchrepository.wvu.edu/etd

Part of the Applied Behavior Analysis Commons

Recommended Citation
Harvey, Olivia Brianne, "Effects of Commission Errors on Behavior Intervention Plan Outcomes" (2024).
Graduate Theses, Dissertations, and Problem Reports. 12469.
https://researchrepository.wvu.edu/etd/12469

This Thesis is protected by copyright and/or related rights. It has been brought to you by The Research
Repository @ WVU with permission from the rights-holder(s). You are free to use this Thesis in any way that is
permitted by the copyright and related rights legislation that applies to your use. For other uses you must obtain
permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license
in the record and/or on the work itself. This Thesis has been accepted for inclusion in the WVU Graduate Theses,
Dissertations, and Problem Reports collection by an authorized administrator of The Research Repository @ WVU.
For more information, please contact researchrepository@mail.wvu.edu.
Effects of Commission Errors on Behavior Intervention Plan Outcomes

Olivia B. Harvey

Thesis submitted
to the Eberly College of Arts and Sciences
at West Virginia University

in partial fulfillment of the requirements for the degree of

Master of Science in
Psychology

Claire St. Peter, PhD, Chair


Kathryn Kestner, PhD
Kathleen Morrison, PhD

Department of Psychology

Morgantown, West Virginia


2024

Keywords: treatment integrity, commission error, school-based intervention

Copyright 2024 Olivia Harvey


ABSTRACT

Effects of Commission Errors on Behavior Intervention Plan Outcomes

Olivia B. Harvey

When implemented well (with fidelity), behavior intervention plans (BIPs) improve student outcomes. Teachers tend to implement BIPs with poor overall fidelity, but little is known about the specific errors occurring during BIP implementation or the subsequent impacts these errors have on student outcomes. One possibility is that teachers learn which strategies suppress challenging behavior and implement those strategies regardless of what is written in the formal BIP. These added intervention components, termed commission errors, have not yet been evaluated in the context of BIP implementation. The present studies begin to address these gaps. In Study 1, we identified the prevalence and types of errors that three teachers made when implementing BIPs. A frequent commission error was selected for each student-teacher dyad to be assessed in Study 2. In Study 2, we manipulated the identified error to determine its impact on student outcomes. To accomplish this aim, we compared rates of challenging behavior when the error was present or absent during implementation of the BIP by a behavior analyst, using a reversal design. Teachers engaged in frequent errors, and one commission error enhanced the efficacy of a student's BIP.

Table of Contents

Introduction
General Method
    Recruitment and Consenting Process
    Demographic Questionnaires, Record Review, and Likert Scale
Study 1
    Method
    Setting
    Response Measurement
    Data Analysis
    Interobserver Agreement (IOA)
    Results and Discussion
Study 2
    Method
    Implementer, Consenting Procedures, & Setting
    Response Measurement and IOA
    Procedural Fidelity
    Error Selection
    Experimental Manipulation
    Experimental Design
    Results and Discussion
General Discussion
References
Tables
Figures
Appendices
Effects of Commission Errors on Behavior Intervention Plan Outcomes

Procedural fidelity is the extent to which a procedure is implemented as planned (Cook et

al., 2015). Procedures implemented with high procedural fidelity yield better outcomes for skill-

acquisition tasks (Bergmann et al., 2021; DiGennaro Reed et al., 2011; Holcombe et al., 1994;

Jenkins et al., 2015; Leon et al., 2014; Noell et al., 2002), self-care skills (Donnelly & Karsten,

2017), and behavior-reduction procedures (Foreman et al., 2022; St. Peter et al., 2016; St. Peter

Pipkin et al., 2010) relative to procedures implemented with reduced fidelity. However, the

specific effects of reduced procedural fidelity may depend on the type of error made.

There are at least two types of procedural fidelity errors: omission errors and commission

errors. An omission error consists of the absence of a step of an intervention. For example,

failing to deliver a reinforcer following an alternative (appropriate) response as specified in a

procedure would be an omission error. A commission error consists of adding a step to an

intervention context. For example, delivering a reinforcer following a target (challenging)

response, which is not specified in the procedure, would be a commission error.

There are relatively few studies evaluating effects of commission errors in isolation on

behavior-analytic procedures (Brand et al., 2019). Of the three studies that have evaluated

commission errors in isolation, all evaluated the same form of commission error: delivering a

reinforcer following unwanted behavior (DiGennaro Reed et al., 2011; Leon et al., 2014; St.

Peter Pipkin et al., 2010). For example, St. Peter Pipkin et al. (2010) evaluated commission

errors during differential reinforcement of alternative behavior with college students who

engaged in arbitrary responses. During Experiment 1, commission errors consisted of

periodically delivering a point when participants engaged in a target behavior that the

experimenters deemed analogous to challenging behavior. Commission errors resulted in higher

rates of target behavior relative to the intervention implemented without errors. Similarly,

DiGennaro Reed et al. (2011) evaluated commission errors during a skill-acquisition task with

elementary students diagnosed with autism spectrum disorder. Commission errors consisted of

praising incorrect responses. Commission errors resulted in low accuracy relative to teaching

without commission errors.

Several other forms of commission errors could occur. For example, Donnelly and

Karsten (2017) identified at least six forms of commission errors during observations of an

intervention for teaching self-care skills. One commission error was prompting the client to

complete the self-care skill steps out of order; this error produced undesirable outcomes

(participants did not reach mastery criterion) for both participants. A second commission error

was offering a choice of items prior to teaching when a choice should not have been offered.

Offering a choice of items may have improved outcomes (Bannerman et al., 1990), but its effects

were not evaluated.

To my knowledge, only one experiment (Carroll et al., 2013) has identified and evaluated

a commission error that was empirically demonstrated to improve outcomes. Carroll et al.

conducted a two-part study. In Study 1, experimenters observed four special-education teachers,

one regular-education teacher, one speech pathologist, and three paraprofessionals implement

discrete trial teaching with children diagnosed with autism spectrum disorder in a classroom

setting. During an average of 45% of trials, participants presented an instruction twice when the

procedure specified a single instruction. The researchers subsequently evaluated effects of this

error in Study 2. Introducing the error resulted in detrimental impacts (slower learning) for two

participants, but facilitative impacts (faster learning) for a third participant. Thus, in some

circumstances, commission errors could facilitate positive outcomes.

The two-part approach to first identifying errors and then manipulating them was also

adopted by Foreman et al. (2021) in the context of teachers’ implementation of school-based

behavior intervention plans (BIP) that included timeout procedures. Foreman et al. used

descriptive observation to identify common errors. The descriptive data demonstrated that

teachers frequently omitted timeout. Foreman et al. then experimentally manipulated the

frequency of timeout. For example, teachers implemented timeout following an average of one in

20 instances of challenging behavior for one student, so the researcher compared the efficacy of

timeout at 100% procedural fidelity (timeout after each response) and 5% procedural fidelity

(timeout following one in 20 responses, on average). Challenging behavior was suppressed for

two participants even when procedural fidelity was reduced. However, the timeout procedure

was only one component in more complex BIPs. Although Foreman et al. did not identify

frequent commission errors with the use of timeout, the limited scope of the evaluation may have

reduced the types and rates of commission errors that could have been observed.

Identifying a broader range of errors may be particularly important because teachers often

implement BIPs with notably low procedural fidelity. For example, Wickstrom et al. (1998)

observed 27 teachers with varying levels of teaching experience implement behavior-change

procedures (e.g., differential reinforcement of alternative behavior, response cost) in their

classrooms. Observers collected data on the extent to which teachers implemented consequences

for student behavior as planned. Teachers provided planned consequences with 0% to 21%

procedural fidelity (M = 4%).

More recently, Codding et al. (2008) observed three teachers who had received an 8-hr

didactic training to implement a class-wide behavior management plan. Observers collected data

on the teachers’ implementation of 14 procedural components. Teachers implemented the

behavior management plan with 0% to 57% procedural fidelity. Although Codding et al. reported

a higher procedural fidelity percentage than did Wickstrom et al. (1998), these percentages are

still low in comparison to the recommended standard of at least 80% procedural fidelity

(National Autism Center, 2015).

Although data from Wickstrom et al. (1998) and Codding et al. (2008) demonstrate that

teachers implement procedures with low overall fidelity, neither study included the types of

errors that teachers made. For example, Wickstrom et al. reported that teachers did not

implement the planned consequence following each instance of student behavior, but not what

the teacher did instead. Similarly, Codding et al. recorded teachers’ erroneous implementation of

a component as “not implemented as written,” which limits the interpretation of the types of

errors teachers made (p. 331).

Recently, Morris et al. (2024) provided recommendations for measuring procedural

fidelity using a checklist during direct observation of a procedure. Morris et al. suggested that

BIP steps be operationalized into observable and measurable units. Then, the steps should be

organized sequentially or under subheadings of the contexts in which the steps should occur.

Finally, each step should have a measure that accurately captures the relevant dimensions of the

step (e.g., frequency, duration). The authors note that modifying the measurement system to

include commission errors when observed may be important. However, none of the previous

studies on naturally occurring commission errors have used the strategies suggested by Morris et

al. to broadly capture various kinds of errors.

One possibility is that teachers add steps to BIPs to suppress challenging behavior. For

example, a teacher may deliver additional access to reinforcers (attention or access to items)

when the student is on-task, relative to what is specified in the BIP. This error may increase on-

task behavior, thus increasing the likelihood the teacher continues to make the error. When

teachers make such errors, the BIP may not result in the same student outcomes when the student

transitions to a new teacher. This difference in outcomes may even (or especially) occur if the

new teacher implements the BIP as written (with high fidelity) because they would be delivering

fewer reinforcers relative to the previous teacher, which may result in an increase in challenging

behavior.

Fidelity errors have predominantly been found to be detrimental to outcomes. However,

several other types of errors could exist, and such errors may have different effects on outcomes.

Teachers are known to make errors, but little is known about the types of errors teachers make.

One possibility is that teachers add steps to the BIP (i.e., commission errors not in the plan) and

that these additions are maintained by the suppression of challenging behavior (i.e., they facilitate

outcomes). A gap exists in the literature on how to capture a broader array of errors and the

impact such errors may have on outcomes. Given that errors occur, it is critical to (a) determine

the form and frequency of specific errors, and (b) evaluate impacts of those errors on outcomes.

Therefore, the purposes of the present studies were to identify the types and prevalence of

commission and omission errors in BIP implementation by teachers and to evaluate effects of an

identified commission error on student outcomes. Study 1 was a descriptive assessment to collect

data on types and frequencies of commission errors that occurred when teachers implemented

BIPs. Study 2 was an experimental manipulation of an identified commission error in a reversal

design to determine if an observed commission error from Study 1 affected rates of challenging

behavior.

General Method

Recruitment and Consenting Process

Student-teacher dyads were recruited from public elementary schools. We received

approval from the school district and contacted teachers who typically provided services to

students with BIPs. Researchers met with teachers who expressed interest to garner their consent.

The teachers were asked to provide information about the study to the legal guardian(s) of a

student(s) in their class who had a formal, written BIP. Legal guardian(s) who expressed interest

were sent a consent form. Once teacher and parent consent was secured, assent was obtained for

children above the age of 7 years without significant cognitive impairments.

Eight student-teacher dyads were recruited. Four dyads did not participate in the study:

two students were unenrolled from the study, one student changed schools, and one teacher

withdrew consent to participate before data collection began. Four dyads completed Study 1 and

were recruited for Study 2. Each of these dyads was from a public, alternative-education

elementary school.

Demographic Questionnaires, Record Review, and Likert Scale

Table 1 summarizes student demographic information and Table 2 summarizes teacher

demographic information and responses to the Likert Scale. See Appendix A and Appendix B for

the Questionnaires and Likert Scales.

Fabian (all names are pseudonyms) was an 8-year-old white male whose primary language was English. Fabian’s

parent reported that he had attention deficit disorder, depression, a learning disability, and

oppositional defiant disorder and that Fabian took guanfacine and methylphenidate medication

daily. Fabian’s BIP had been written 8 months before his enrollment in the study. Fabian’s BIP

specified he may protest, swear, insult others, leave his area, destroy property, or aggress.

Fabian’s educational records specified challenging behavior was maintained by escape. Fabian’s

BIP also included a pass that could be exchanged anytime to temporarily escape academic tasks

(tag-out area) and replace the ongoing academic task. Fabian’s BIP included a token system in

which tokens could be exchanged for items or activities with teachers on a reward menu (e.g.,

coloring, Legos). Fabian’s BIP contained two class-wide management strategies: a group token

system and the Good Behavior Game. Fabian’s teacher, Kelly, was a 40-year-old white female

whose primary language was English. Kelly had a master’s degree and over 10 years of

experience as a teacher. She held a credential as a Board Certified Behavior Analyst. Kelly had

known Fabian for 4 years and worked with him for 1.5 years. Kelly had assisted in writing Fabian’s

BIP.

Dakota was a 9-year-old white female whose primary language was English. Her legal

guardian reported that Dakota had attention deficit hyperactivity disorder and took Focalin,

Remeron, clonidine, risperidone, Zoloft, and melatonin daily. Dakota’s BIP had been written 5

months before her enrollment in the study. Dakota’s BIP specified she may refuse to comply,

leave her area, destroy property, aggress, elope, or injure herself. Dakota’s educational records

specified challenging behavior may be triggered by denied requests or not being within

proximity to a teacher and that challenging behavior was maintained by access to tangibles and

attention. Dakota’s BIP included certificates that could be exchanged anytime for access to adult

attention and frequent praise. Dakota’s BIP also included a token system in which tokens could

be exchanged for escape from academic tasks and access to tangible items. Dakota’s teacher,

Rachel, was a 45-year-old white female whose primary language was English. Rachel had a

master’s degree and 22 years of experience. Rachel had known and been working with Dakota

for 5 months. Rachel had assisted in writing Dakota’s BIP.

Warren was a 7-year-old white male whose primary language was English. The legal

guardian did not report any diagnoses or medications. Warren’s BIP had been written 6 months

before his enrollment in the study. Warren’s BIP specified he may bargain, yell, swear, leave his

area, destroy property, aggress, and elope. Warren’s educational records specified challenging

behavior was maintained by escape and attention. Warren’s BIP included a reward system for

completing worksheets without challenging behavior. For each worksheet that Warren completed

without challenging behavior, he could select from various rewards on a menu (e.g., skip-an-

assignment pass, play with a peer for 10 min). Warren’s BIP also included a morning check-in

and access to tangible items or teacher attention during transitions. Warren’s teacher, Rhett, was

a 27-year-old white male whose primary language was English. Rhett was obtaining a master’s

degree in education. Rhett was a long-term substitute with 2 months of experience. Rhett was

also the teacher for Wybie. Wybie was a 10-year-old white male whose primary language was

English. The legal guardian reported that Wybie had diagnoses of attention deficit hyperactivity

disorder, oppositional defiant disorder, and sensory processing disorder and took Focalin and

guanfacine. Wybie’s BIP had been written 9 months before his enrollment in the study. Wybie’s

BIP specified he may refuse to comply, negatively interact with others, leave his area, destroy

property, aggress, and elope. Wybie’s educational records specified challenging behavior was

maintained by escape, attention, and access to tangibles. Wybie’s BIP included curricular

revision for English Language Arts activities, frequent praise, and teacher attention for cleaning

up tangible items in a timely manner. Wybie’s BIP also included a tiered reward system for

completing academic activities and refraining from challenging behavior. Wybie could earn

access to a novel reward if he completed academic activities and refrained from challenging

behavior, a regularly available reward if he did not complete academic activities and refrained

from challenging behavior, or no reward if he engaged in challenging behavior. Rhett had known

and been working with Warren and Wybie for 2 months, and had not assisted with writing either

student’s BIP.

Immediately after providing consent, teachers were provided with a questionnaire about

the importance of and their experiences with BIPs. Teachers completed a 5-point Likert Scale

(ranging from strongly disagree to strongly agree) assessing their agreement or disagreement

with statements about BIP implementation and student behavior (see Appendix C). The

researcher accessed the students’ BIPs and educational records from the students’ school files.

The BIPs were used to create fidelity checklists (see Response Measurement below), and the

educational records (e.g., assessments) were used in the selection process of an error to

manipulate in Study 2 (see Error Selection section in Study 2).

Study 1

The purpose of Study 1 was to determine the types and frequency of errors that occurred

when teachers implemented BIPs during regularly occurring classroom routines.

Method

Setting

Observations occurred in the alternative-education classrooms during the students' typical

daily activities. Rachel’s classroom had students in 3rd and 4th grade and had an educational

assistant. Kelly’s and Rhett’s classrooms had students in Kindergarten through 2nd grade, and

each classroom had two educational assistants. Each classroom served fewer than 10 students.

Observations did not occur during irregular activities (e.g., fire drills, school assemblies)

or when the BIP was intentionally not implemented (e.g., assessments). Observers sat in a

location designed to minimize distraction to teachers and students and avoided interacting with

students. Each observation would have ended after 15 min or when the student transitioned to

working with a different teacher, whichever came first. However, the latter never occurred, and

all observations were 15 min in duration. The median number of observations per day was 2

(range: 2-3), and observations occurred across an average of 4 days for each student (range: 2-7).

The descriptive assessment for a student-teacher dyad was considered complete after a total of 50

commission errors had been recorded. If 50 commission errors were not recorded after 5 hr of

data collection, observations would have ended. However, this never happened.

Response Measurement

Each student’s existing BIP was used to create a procedural-fidelity checklist (see

Appendices D – G). A BIP step must have met three criteria to be included on the checklist (see

Table 3 for examples and nonexamples). First, only proactive steps were included to consistently

evaluate similar steps across observations, regardless of the occurrence of challenging behavior.

Proactive steps were procedures designed to prevent the occurrence of challenging behavior or

teach an adaptive response. Second, the step must have constituted a directly observable

interaction between the teacher and student. Third, the step must have been observable during a

15-min period. See Table 4 for a summary of the number of included and excluded steps for each

student-teacher dyad.

Four structural modifications were made to included BIP steps when applicable. First,

any step of the BIP that specified multiple teacher actions was divided into multiple steps on the

checklist. For example, “Set the timer and tell the student they have 3 min to engage in the

activity” would be divided into two steps (“set the timer for 3 min” and “tell the student they

have 3 min to engage in the activity”). This modification allowed us to detect the specific actions

for which teachers made errors. Second, any BIP step that was a duplicate (i.e., listed in two

places in the BIP) was only listed once on the checklist. Third, time windows were inferred for

the purposes of data collection because they were necessary for capturing errors. Each window specified a time

frame for the teacher to interact with their student after a context occurred. This was crucial for

determining whether a step should be scored as an omission or commission error. If the teacher

did not implement a step within the designated time window, the step was scored as an omission

error. If the teacher implemented a step outside the time window, the step was scored as a

commission error. The temporal window for a step to be considered correct was based on the

frequency or immediacy of implementation specified in the BIP. Steps that were specified to

occur immediately or within 2 min had a 15-s window. Thus, a step specified to occur

“immediately” was still considered correct if it occurred within 15 s, and a step specified to occur

“after 2 min” was considered correct if it occurred from 1.75 min to 2.25 min. Steps that should

occur in a range of 2 to 5 min had a 30-s window to be considered correct. For example, a step

that specified “praise every 5 min” would be considered correct if the teacher praised after 4.5

min or 5.5 min. Steps scheduled to occur every 5 min to 30 min had a 1-min window. Thus,

token exchanges scheduled to occur every 30 min were considered correct if they occurred

from 29 min to 31 min. We standardized time windows to ease data collection. Fourth,

additional contexts were provided for steps that were dependent on correct performance of a

previous step. For example, “When the [5 min] timer sounds…” is a context that would not have

occurred if the teacher did not set the timer. Therefore, we added the context of “(or 5 min

elapses)” to make the steps independent.
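
To illustrate, the standardized time windows can be expressed as a simple scoring rule. The following Python sketch is our own illustration rather than part of the thesis materials; the function names and the simplified omission/commission classification are assumptions based on the description above.

```python
def tolerance_s(scheduled_interval_s):
    """Return the +/- tolerance (in seconds) around a step's scheduled
    time, per the standardized windows described above (assumed rule)."""
    if scheduled_interval_s <= 2 * 60:    # "immediately" up to every 2 min
        return 15
    elif scheduled_interval_s <= 5 * 60:  # every 2 min to 5 min
        return 30
    else:                                 # every 5 min to 30 min
        return 60


def score_step(scheduled_interval_s, observed_latency_s):
    """Classify one scheduled step: correct (within the window), an
    omission error (never implemented), or a commission error
    (implemented outside the window)."""
    if observed_latency_s is None:
        return "omission error"
    if abs(observed_latency_s - scheduled_interval_s) <= tolerance_s(scheduled_interval_s):
        return "correct"
    return "commission error"


# Example: praise scheduled every 5 min, delivered at 5.5 min -> "correct"
print(score_step(5 * 60, 5.5 * 60))
```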

In some cases, additional clarification was needed to transform the written BIP into the

data-collection checklist. Once the checklist was drafted, two individuals with a BCBA-D

credential ensured that each BIP step that met the inclusion criteria appeared on the checklist and

identified possible remaining ambiguities on the checklist. If ambiguities existed, the researcher

interviewed the teacher to further operationalize ambiguous steps. This process iterated until

both BCBA-D reviewers agreed that the checklist was complete, and components were

operationalized. The researcher then piloted the checklist with the secondary observer. The two

observers discussed the discrepancies between recorded responses and resolved how the

interactions would be recorded going forward. If a discrepancy arose that could not be resolved

with only the language specified on the checklist, a scoring rule was added in the margins of the

checklist for observers to reference.

The checklists included space to narratively record and tally the frequency of commission

errors that were not in the BIP but met the inclusion criteria. For example, a student’s BIP may

not mention praise; however, a teacher may deliver praise following appropriate behavior. After

every observation, observers were allotted 2 min to add details to qualitative descriptions. Each

narratively recorded commission error was then operationally defined and added to the student’s

checklist before the next observation.

During each observation, observers categorized the teacher’s responses as being correct,

an omission error, a commission error, or not applicable. Correct implementation was defined as

the teacher implementing the step as described in the plan. For example, the teacher delivers a

token when the student meets the criterion to be awarded a token. An omission error was defined

as any portion of the step not occurring. For example, the teacher does not deliver a token when

the student meets the criterion. A commission error was defined as implementing a step that was

not specified in the plan or implementing it differently than as specified. There were two forms

of commission errors. A commission error in the plan was defined as the teacher adding or

modifying a step that appeared in the plan. For example, the teacher delivers two tokens when

the student meets the criterion, or the teacher delivers a token when the student does not meet the

criterion. A commission error not in the plan was defined as a student-teacher interaction that

was not a step described in the plan. For example, the teacher delivers candy when the student

engages in an appropriate response, which is not a step in the plan. Any BIP step that included

negative language (e.g., “do not comment on the behavior”) was scored as a commission error if

the behavior specified to be omitted occurred.

Brief interactions between the student and nonparticipating teachers in the classroom

were not recorded on the checklist, but the primary observer narratively noted these interactions.

If the participating teacher omitted a BIP step, it was counted as an omission error even if a

nonparticipating teacher implemented the step (but this rarely occurred). The observation would

have ended if the nonparticipating teacher implemented three BIP steps in succession, but this

never occurred.

Data Analysis

A global measure of procedural fidelity was calculated by dividing the number of correct

student-teacher interactions by the total count of student-teacher interactions (corrects and errors)

and multiplying by 100 to yield a percentage.

The rate of specific categories of errors per hour was calculated by dividing the count of

errors for each BIP step (including commission errors not listed on the BIP) by the total hours of

observation.
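
As a concrete sketch of these two calculations (our own illustration with hypothetical counts, not data from the study):

```python
def global_fidelity_pct(correct, omission, commission):
    """Global procedural fidelity: correct student-teacher interactions
    divided by the total count of interactions (corrects and errors),
    multiplied by 100."""
    total = correct + omission + commission
    return 100 * correct / total if total else 0.0


def error_rate_per_hr(error_count, observation_min):
    """Rate of a specific error category per hour of observation."""
    return error_count / (observation_min / 60)


# Hypothetical example: 10 correct interactions, 40 omission errors, and
# 25 commission errors over 120 min of observation.
print(global_fidelity_pct(10, 40, 25))  # ~13.3% global fidelity
print(error_rate_per_hr(40, 120))       # 20.0 omission errors per hour
```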

Interobserver Agreement (IOA)

Observers were trained to collect data during the initial pilot of each procedural fidelity

checklist. Training consisted of instructions on how to complete the checklist and feedback on

data collection by comparing their checklist to that of a trained researcher.

Because each form of error was typically infrequent during an observation, traditional

methods of calculating IOA resulted in very low agreement estimates. For example, if one

observer scored one instance of correct implementation of a step and the second observer scored

two instances, proportional agreement on that step would be 50% despite deviation by only one

count. Using a total-agreement calculation (i.e., smaller count divided by larger count for each category of data), the mean IOA score for “commission errors not in plan” was 44% for Fabian-Kelly, 64% for Dakota-Rachel, 34% for Warren-Rhett, and 50% for Wybie-Rhett (see Tables 5-8). Therefore, we used a correlation analysis to evaluate the believability of the data.

Correlations between counts obtained by the primary and secondary data collectors are shown in

Figure 1. Data for each participant are shown in a separate graph. Each data point represents the

counts obtained by each observer for that category (correct implementation, omission error,

commission error in plan, commission error not in plan) for a single observation. We collapsed

the categories of interactions and obtained significant Spearman r correlations for each student-

teacher dyad. No clear patterns in the differences across observers were obtained, although

commission errors not in the plan seemed less likely to be detected identically across both

observers (with only three of 12 observations [25%] resulting in perfect correspondence of these

kinds of errors across the observers, relative to 75% for commission errors in plan, 50% for

correct implementation, and 33% for omission errors). Although perfect correspondence was

relatively rare, with observers only having perfect correspondence in 48% of all interactions,

observers’ records typically deviated by only a few instances per category per observation.

Nonetheless, the imperfect agreement suggests that the results should be interpreted with caution.
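
As an illustration of the two agreement analyses described above, the following sketch is ours (with hypothetical counts, not study data); the thesis does not specify the software used for the Spearman correlations, so scipy is an assumption.

```python
from scipy.stats import spearmanr


def total_agreement_pct(count_a, count_b):
    """Total-agreement IOA for one category in one observation: smaller
    count divided by larger count, times 100 (100 when counts match)."""
    if count_a == count_b:
        return 100.0
    return 100 * min(count_a, count_b) / max(count_a, count_b)


# Hypothetical per-observation counts of one category from two observers.
primary = [3, 0, 5, 2, 1, 4]
secondary = [2, 0, 5, 3, 1, 2]

scores = [total_agreement_pct(a, b) for a, b in zip(primary, secondary)]
print(sum(scores) / len(scores))  # mean total-agreement IOA

# Believability check: rank-order correlation between observers' counts.
rho, p = spearmanr(primary, secondary)
print(rho, p)
```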

Results and Discussion

The Spearman correlations between the observers’ data were statistically significant for

all student-teacher dyads. However, agreement was far from perfect, and more typical measures

of IOA yielded low correspondence (34%-64% for commission errors not in the plan).

Disagreements may have been related to the number of discriminations required for the

observational method, how errors were categorized, or the exceptionally large difference

between the BIP as written and the BIP as implemented.

Observers had to engage in a lengthy discrimination process when recording data.

Observers had to (1) determine if the interaction occurred during a specified context, (2) and if

so, if the interaction was a BIP step, (3) and if so, the type of the interaction, and (4) if it was not

a context or BIP step, determine if the step met the three inclusion criteria, (5) and if so, if the

interaction was already operationally defined as a commission error not in the plan to tally, and

(6) if not operationalized, how to write a narrative description of the error. This process had to

occur quickly and repeatedly during live observations. Observers could make an error at any

point in the discrimination process or miss student-teacher interactions while recording previous

interactions.

The discrimination process was lengthy in part because interactions were categorized.

Procedural-fidelity scholars have categorized errors into the subtypes of omission and

commission (Brand et al., 2019). However, there is little information on the criteria or features of

an error that determine its classification (Vollmer et al., 2008). It is impractical to attempt to

classify errors when there is not sufficient information on how to classify errors (Han et al.,

2022). Researchers may consider identifying the features of an error that inform its

classification, and the best methods for classifying errors (Harvey & St. Peter, 2024).

The lack of correspondence between the procedural-fidelity checklist and the teacher’s

implementation made it difficult to determine if interactions were specified in the plan,

deviations in the plan, or deviations from the plan. This complicated consistently detecting

deviations from the plan because an interaction could look different each time it occurred. After every

observation, any recorded commission error(s) not specified in the plan were operationally

defined, even if only recorded once. Therefore, observers could have only one definitive example

of a commission error not in the plan for that specific teacher. Although alternative possible

examples were written on the checklist for observers to reference, it was not possible to

identify every example of what the teacher interaction could look like. For example, there are

many different praise statements a teacher could make (e.g., “Awesome job!” “Keep up the

good work!”). Even with an operational definition, it could be challenging to decide in the

moment if the specific wording of a phrase was praise or not.

It is uncertain what methods could improve IOA because our observers were experts

(specialization in procedural fidelity and extensive experience and training with data collection),

and we made several efforts to improve data collection (operationalization of the BIP,

modifications of the checklist during piloting, and addition of scoring rules). It may be that the

scope of the measurement system needs to be narrowed further. Behavior analysts could consider

using information about the client (e.g., functional assessments) to their advantage by selecting a

few BIP steps that appear to be function-based and are hypothesized to influence behavior. For

example, a behavior analyst may collect data on a student’s token system for earning breaks from

academic activities if they have an escape function. However, if the goal is to capture deviations

from the intervention, one would have to ensure that intervention steps not listed on the

checklist are not inadvertently captured as deviations from the plan.

It may be infeasible to collect procedural fidelity data on complex multistep BIPs. In a

pilot study, we determined that collecting procedural fidelity data on proactive and reactive steps

of a BIP was infeasible. Therefore, we narrowed our inclusion criteria to only proactive steps.

However, even after excluding reactive procedures (critical components of BIPs for which

procedural fidelity data should be collected), we did not obtain the minimum recommended

standard of 80% for IOA (Cooper et al., 2007). And, notably, two students (Warren and Wybie)

had more reactive than proactive steps in their BIPs. It is troublesome that observers could not

agree on whether or to what extent a treatment was implemented, especially because our

methods for developing a procedural-fidelity checklist aligned with recent recommendations

(Morris et al., 2024). The data should be interpreted with this caveat in mind, and our findings

are framed to suggest areas of future research to improve observational methods for procedural

fidelity data collection.

Figure 2 shows the total count of each category of teacher implementation: correct

implementation, omission error, commission error of a step in the plan (e.g., repeatedly

prompting when the BIP specified a single prompt), and commission error of a step not in the

plan (e.g., adding a potential reinforcer following appropriate behavior). Data were collected for a

total of 45 min, 120 min, 105 min, and 75 min for Fabian-Kelly, Dakota-Rachel, Warren-Rhett,

and Wybie-Rhett, respectively. These durations of data collection were sufficient to capture at

least 75 BIP-related student-teacher interactions (M = 104.5 interactions, range 75-143

interactions).

All teachers implemented BIPs with low fidelity (range: 0-17% correct) and engaged in

high rates of errors (range: 39-162 errors/hr). For two of the teachers (Kelly and Rachel),

omission errors were most common. In contrast, Rhett most commonly added potentially

impactful BIP steps (i.e., commission errors not in the plan).

These results replicate previous research demonstrating that teachers implement

procedures with notably low fidelity. The global fidelity percentages were particularly low in

comparison to the minimum recommended standard of 80% fidelity (National Autism Center,

2015). However, the recommended standard appears to be set arbitrarily. Some procedures may

need to be implemented with more than 80% fidelity (e.g., Jones et al., 2022), whereas others

may still be efficacious when implemented with less than 80% fidelity (e.g., Foreman et al.,

2021). Future research should continue to parametrically investigate the impact various global-

fidelity percentages have on intervention outcomes. Identifying interventions that are efficacious

despite low fidelity may be advantageous for behavior analysts in school settings. That is,

because teachers implement with low fidelity, behavior analysts in school settings could consider

recommending interventions more resistant to the impacts of low fidelity than alternatives.

The complexity of a BIP (e.g., the number of steps or the contiguity of interactions) may

reduce fidelity of teacher implementation. For example, Fabian’s BIP had the most steps (56)

and the densest schedule of reinforcement (VR 2 token delivery for appropriate behavior), and

Kelly never implemented a step correctly. Alternatively, Dakota’s BIP had the leanest schedule

of reinforcement (30-min DRO token delivery for absence of challenging behavior), and Rachel

had the highest fidelity. Procedural-fidelity scholars have hypothesized several factors, like

complexity, that may make procedures more prone to low fidelity (Allen & Warzak, 2000).

However, few studies have evaluated the impacts that factors hypothesized to reduce fidelity

have on fidelity. Of the studies that have examined these variables, the findings are mixed (see

Fiske, 2008, for a review). Moreover, there may be school-specific factors that impact teachers’

fidelity, like staff buy-in, staff burnout, and school environment or support (Garcia et al., 2022;

Kincaid et al., 2007; Schlichte et al., 2005) or teacher-specific factors like skill sets and training.

Research evaluating the influence of barriers in school settings on teacher fidelity is warranted.

The errors that reduce global fidelity may be more influential on student outcomes than

the global-fidelity percentage. A sub-optimal global-fidelity percentage may represent a few

errors across all steps of an intervention or several errors in only one step of an intervention. The

lack of specificity in a global measure of fidelity may overlook crucial errors in the

implementation of an intervention (Cook et al., 2015). All steps of an intervention might be

important. However, there are likely crucial steps of an intervention (e.g., reinforcer delivery).

Few studies have reported the types of errors made (Han et al., 2022; cf. Carroll et al., 2013;

Donnelly & Karsten, 2017). Therefore, it was important to examine not just the global-fidelity

scores and kinds of errors, but also the distribution of errors across the various components of the

BIPs.

The rate of each specific error is displayed as a bar graph (see Figures 3-6).

Kelly engaged in 12 forms of errors. The three most frequent errors were omitting token

deliveries (e.g., a token was not placed on a token board; 88/hr), adding praise/acknowledgment

(e.g., “Great job!”; 43/hr), and adding flexible seating (e.g., standing at his desk; 7/hr). Rachel

engaged in 24 forms of errors. The three most frequent errors were omitting praise directed

specifically to Dakota (12/hr), adding proximity to teacher (e.g., Rachel standing next to

Dakota’s desk or Dakota moving her chair next to the teacher; 9/hr), and omitting placing a

“dollar” (token) for each domain of behavior that met expectations in her “wallet” (e.g., not

delivering a dollar when expectations had been met; 8/hr). Rhett engaged in 21 kinds of errors

with Warren and 16 kinds of errors with Wybie. For Warren-Rhett, the three most frequent errors

were adding proximity to teacher (e.g., standing directly in front of Warren’s desk; 13/hr),

adding restrictions of access to items (e.g., removing potentially distracting items in Warren’s

academic area; 6/hr), and adding token deliveries (e.g., placing a token on a token board; 5/hr).

For Wybie-Rhett, the three most frequent errors were adding proximity (e.g., Rhett standing over

Wybie from behind at his desk; 14/hr), adding access to items (e.g., giving Wybie an item from

his lunch box; 11/hr), and omitting praise (7/hr).

Few previous studies have reported on errors involving teachers adding new steps to the

plan, despite recent discussion emphasizing the importance of capturing these forms of errors

(e.g., Colón & Wallander, 2023). The narrative descriptions of commission errors not in the plan

were idiosyncratic for each student-teacher dyad, with only a few common errors across all

dyads. The idiosyncrasy of procedural fidelity measures may hinder their adoption in practice.

Each procedural fidelity measure must be tailored to reflect a client’s individualized intervention.

Then, observers should capture the unique steps specified in the plan and deviations from the

plan. This complicates the creation of a measurement system and the identification of correct and

erroneous implementation during observation.

All dyads had two common commission errors not in the plan. All teachers moved in

proximity to their students (e.g., standing next to the student’s desk) and provided physical

attention (e.g., patting the student on the back). Proximity to the student was a notable

commission error not in the plan because the teacher often engaged in a second interaction, such

as physical attention. For example, Rachel stood next to Dakota’s desk, patted her on the back,

and said a statement of encouragement. For other examples, Kelly stood next to Fabian’s desk

and provided extra help, or Rhett stood over Wybie at his desk and provided gesture prompts.

Proximity, or one-on-one interactions with the teacher, may influence student outcomes. These

errors may have been common across all teachers because the classrooms had small teacher-to-

student ratios. Alternatively, these interactions may be common across teachers because teachers

engage in these interactions with their students or view these interactions as general classroom

management.

Teachers' experiences with and perceptions of BIPs may be important factors when

attempting to identify commission errors not in plans. Table 2 shows teachers’ demographic

information and responses to the questionnaire about the importance of BIPs. Teachers had

similar responses on three of six Likert scale questions. Teachers reported that they sometimes or

infrequently made errors when implementing BIPs. However, our descriptive assessment

findings suggest otherwise. Although teachers engaged in frequent errors, they agreed that

consistency in BIP implementation was critical to its success and disagreed that BIPs were

difficult to implement. Even with a formal data collection system, teachers may still not be able

to accurately record their behavior and overestimate their procedural fidelity (Hagermoser

Sanetti & Kratochwill, 2011). Given that the teacher’s perception is that errors are infrequent, the

teacher may not be highly motivated to seek out or receive feedback on performance. Additional

information is needed on the actions teachers view as teaching versus implementing a BIP.

The West Virginia Department of Education (2023) has five professional teaching

standards that serve as a basis for assessing the expected performance of teachers. The standard

“The Learner and the Learning Environment” sets the expectation that teachers will demonstrate

a fundamental understanding of student development and foster an environment that promotes

learning for all students. An indicator that a teacher is distinguished in the substandard of setting

expectations for student behavior is “The teacher has established with students a mutually agreed

upon set of behaviors that foster standards of conduct and consequences in an environment that

focuses on learning” (p. 26). Teachers are expected to establish rules and consequences for

behavior for all students regardless of a BIP. In addition, an indicator that a teacher is

distinguished in the substandard of differentiating learning is “The teacher guides students in

developing individual learning processes by demonstrating extensive and subtle understanding of

the needs, interest, learning style, cultural heritage, gender, and environment of students” (p. 18).

The teacher may have viewed their interactions with students as meeting these expected

performance standards, whereas we may have captured these interactions as errors. When

attempting to identify interactions added to procedures, it may be important to determine the

foundational expectations of worker performance to inform data-based decisions (e.g., feedback

to the implementer). Nonetheless, capturing interactions that are expectations of worker

performance is important. Future studies might first evaluate or establish that teachers are

engaging in the expected behavior to support all learners before measuring implementation of

specific BIP components. If teachers lack these core skills, building them first may facilitate

positive outcomes for multiple students, and excluding these steps from BIP fidelity may ease

subsequent measurement.

In sum, Study 1 replicates previous findings that teachers implement procedures with low

fidelity and provides preliminary evidence that teachers engage in high rates of errors, including

adding steps not in BIPs. Given that it may be infeasible to measure fidelity for all components

of a BIP simultaneously, knowing which errors are potentially impactful could inform

streamlined initial measurement systems.

Study 2

The purpose of Study 2 was to experimentally manipulate a commission error identified

during Study 1 to determine if it impacted BIP outcomes. To accomplish this aim, we

implemented the BIP with high fidelity (HF), with a programmed and possibly facilitative

commission error (FCE), or with no fidelity (NF) in a reversal design and compared the rate of

challenging behavior between conditions to draw conclusions about the error’s impact on treatment outcomes.

Method

Implementer, Consenting Procedures, & Setting

Because Study 2 required implementation of the BIP precisely as specified in each

experimental condition, a doctoral student in behavior analysis with three years of experience

developing and implementing BIPs (hereafter referred to as “researcher”) conducted all

procedures. The legal guardian again provided consent as described for Study 1. Assent was

obtained for all children because they all were above the age of 7 years and did not have

significant cognitive impairments. Assent was monitored before each block of sessions by asking

students if they would work with the researcher in an alternative room. If the student declined,

the session was not conducted. If the student declined for three consecutive sessions, they were

asked if they wanted to be asked again the next school day (Dakota and Wybie), or they were

withdrawn from the study (Fabian and Warren). If Dakota or Wybie wanted to be asked again

the next school day, the experimenter repeated the assent procedures. We changed the assent

procedures for Dakota and Wybie after challenges with obtaining assent from Fabian.

Assent issues occurred for each of the four participants. Warren did not provide initial

assent to participate, so no Study 2 data were collected. After the ninth session (in the No-

Fidelity condition), Fabian withdrew assent to participate. Although Fabian did not label a reason

for his withdrawal of assent, it seems likely due to there being more access to reinforcement in

the classroom than in the research sessions during that condition. Although we changed

procedures to ask across separate days, Wybie withdrew assent after four sessions. Dakota initially

participated consistently in April and May of 2023, but withdrew assent as the school year came

to a close. Like Fabian, she did not provide a reason for withdrawing assent. Anecdotally, the

rigor of academics in the classroom seemed to wane as the school year came to an end, and it

seems likely that participating in research sessions increased the amount of academic work that

Dakota was expected to complete. We resumed sessions with Dakota in October 2023. She

assented to only the first session, during which she contacted the academic tasks required

(journal prompts), which exceeded those expected of her in the classroom. Therefore, we

changed the academic task from journal prompts to mirror the tasks Dakota was expected to

complete in the classroom. This resulted in Dakota assenting to participate in five additional

sessions before she declined to participate in a session on one day. As with other refusals to

assent, no rationale was provided, but we hypothesized that Dakota may have not wanted to

leave her peers. After participating in three additional sessions, Dakota completed the

experiment.

Sessions were not conducted in the classroom to minimize disruption (as we presumed

challenging behavior would occur) and to prevent possible confounding variables (e.g., a peer

interrupting a session). Study 2 occurred in a barren office at the student’s school (hereafter

referred to as “research room”). Only the researcher, student, and data collectors were present.

All sessions were video recorded. Each session was 15 min in duration.

Response Measurement and IOA

The primary dependent variable was the rate of the individually defined challenging

behavior, which was expressed as aggregate responses per minute collapsed across all

topographies. Each student’s operational definitions of challenging behavior included any

topographies specified in the BIP. For example, Fabian’s property destruction definition included

ripping materials as that topography was written in his BIP, whereas Dakota’s property

destruction definition included swiping materials out of place as that topography was written in

her BIP. To maintain a consistent termination criterion, each student had the same definition of

elopement (which captured leaving the research room) regardless of whether it was specified in

the BIP. See Table 9 for each student’s operational definitions of challenging behavior.

Observers independently collected data from video using the Behavior Logger

Observation Coding System (BLOCS) software. The BLOCS software recorded timestamps as

observers press designated keyboard keys associated with participant responses. Data were

output to an Excel file, which included total session duration, a timestamp of each response, and

a summary of responses per minute or percentage of the session during which the response

occurred.

Observers were trained using a performance-based system, consisting of a self-instruction

manual, video models, and automated performance feedback. Training continued until observers

achieved IOA coefficients of at least 90% for all responses across two consecutive sessions

during the experiment, using the calculations described in the IOA section below. If IOA values

had decreased below 90% accuracy for any key for two consecutive sessions, the researcher

would have identified errors, and both observers would have returned to training. However, this

never occurred.

Interobserver agreement was calculated using a partial-agreement method with a 10-s

window, using the IOA calculator in BLOCS. The program divided each observer’s data into 10-

s intervals, divided the smaller count of behavior by the larger count for each interval, averaged

the values across intervals, and multiplied by 100 to yield a percentage. If observers agreed on

the absence of behavior in an interval, IOA for that interval was considered 100%. For Fabian,

IOA data were collected and calculated for 33% of HF sessions and 33% of NF sessions. IOA

was 100% for HF sessions and 98% for NF sessions. For Dakota, IOA data were collected and

calculated for 36% of HF sessions, 50% of FCE sessions, and the NF probe. IOA was 99% (97-

100%) for HF sessions, 100% for FCE sessions, and 100% for the NF probe. For Wybie, IOA

data were collected and calculated for 25% of HF sessions. IOA was 100% for HF sessions.
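
For illustration, the partial-agreement calculation described above can be sketched as follows. This is our own sketch with hypothetical timestamps; BLOCS performs this calculation internally, and its actual implementation may differ.

```python
def partial_agreement_ioa(ts_a, ts_b, session_s=900, bin_s=10):
    """Partial-agreement IOA with 10-s windows: bin each observer's
    response timestamps (in seconds) into 10-s intervals, score each
    interval as smaller count / larger count (1.0 when both observers
    agree, including agreement on the absence of behavior), average
    across intervals, and multiply by 100."""
    n_bins = session_s // bin_s
    counts_a, counts_b = [0] * n_bins, [0] * n_bins
    for t in ts_a:
        counts_a[min(int(t // bin_s), n_bins - 1)] += 1
    for t in ts_b:
        counts_b[min(int(t // bin_s), n_bins - 1)] += 1
    scores = [1.0 if a == b else min(a, b) / max(a, b)
              for a, b in zip(counts_a, counts_b)]
    return 100 * sum(scores) / len(scores)


# Hypothetical timestamps (s) scored by two observers from the same video
# of a 15-min (900-s) session.
print(partial_agreement_ioa([12.0, 47.5, 300.2], [12.4, 48.1, 299.0]))
```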

Procedural Fidelity

We measured the researcher’s procedural fidelity (i.e., programmed BIP errors would be

scored as correct). Procedural-fidelity data were collected using the modified version of the

procedural-fidelity checklist that included the possible facilitative error and reactive BIP steps.

Observers were trained to collect procedural fidelity data for each participant by collecting data

on video recordings of research sessions. Training continued until observers achieved IOA

coefficients of at least 90% across two consecutive sessions during the experiment, using the

calculations described below.

Procedural-fidelity scores were calculated by dividing the total number of correct

researcher responses by the total number of researcher responses and multiplying by 100 to yield

a percentage for each checklist. For Fabian, global procedural fidelity was 100%. For Dakota,

global procedural fidelity was 96% (67%-100%). For the HF condition, the mean procedural

fidelity was 95% (67%-100%). For the NF condition, the mean procedural fidelity was 100%.

For the FCE condition, the mean procedural fidelity was 95% (86%-100%). See Table 10 for a

summary of the errors made. For Wybie, global procedural fidelity was 64% (27%-100%). See

Table 11 for a summary of the errors made.
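
For illustration, the score for one checklist reduces to a one-line calculation; the sketch below assumes each researcher response was coded as correct, an omission, or a commission (the coding labels are illustrative).

```python
# Sketch of the per-checklist procedural-fidelity score: correct
# researcher responses divided by all researcher responses, times 100.
def fidelity_score(codes):
    correct = sum(1 for c in codes if c == "correct")
    return 100 * correct / len(codes)

session_codes = ["correct", "correct", "omission", "correct",
                 "commission", "correct"]
print(f"{fidelity_score(session_codes):.0f}%")  # 4 of 6 correct -> 67%
```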

Correlations between counts obtained by the primary and secondary data collectors are

shown in Figure 7. We evaluated IOA identically to Study 1 and obtained significant Spearman r

correlations for two of three students. Observers had perfect correspondence for 75% of all

interactions.
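
This correlational check can be reproduced with standard statistical software. A minimal sketch using SciPy, with made-up tallies for illustration:

```python
# Sketch of the tally-correlation analysis: Spearman correlation between
# the two observers' total tallies per interaction category (tallies
# below are invented for illustration).
from scipy.stats import spearmanr

primary_tallies = [12, 3, 0, 5, 1, 7]
secondary_tallies = [11, 3, 1, 5, 0, 7]
rho, p = spearmanr(primary_tallies, secondary_tallies)
print(f"Spearman r = {rho:.2f}, p = {p:.4f}")
```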

Error Selection

We selected a commission error that occurred frequently during Study 1 and was

expected to facilitate intervention outcomes. In the context of proactive BIP steps, a possibly

facilitative error is a form of error that may reinforce desirable behavior, attenuate an

establishing operation, or provide additional instructions or discriminative stimuli. For

example, the delivery of additional possible reinforcers (e.g., praise, proximity, access to items)

following appropriate behavior was categorized as a possibly facilitative error.

For Dakota, four commission errors were selected to simulate the student-teacher

interaction observed in the classroom. The four errors were as follows: proximity, access to a

new space, physical interactions, and statements of encouragement or assurance. Because these

errors co-occurred during Study 1, we treated all four error topographies as one combined error

during Study 2. Thus, each time an error occurred, the researcher moved close to Dakota, walked

with her into the corridor outside the session room, provided physical attention (e.g., patted

Dakota’s back), and made an encouraging statement (e.g., “You can do it!”). Dakota had a

certificate on her desk that had the phrase “talk to the teacher privately” to signal the presence of

these commission errors during the FCE condition. Given that one selected commission error for

Dakota was proximity to the teacher, the researcher measured the distance between Dakota’s

desk and the teacher’s desk in the classroom to simulate the classroom. Dakota’s desk was 7 ft

away from the researcher’s desk during all sessions.

Fabian, Wybie, and Warren did not experience commission errors because they either did

not provide assent or withdrew it.

Experimental Manipulation

Sessions consisted of a school routine like the one during which the error was observed in

the classroom. Fabian was presented with addition, subtraction, and sight word flashcards, as

Kelly had nominated these as skills Fabian should practice. The researcher presented all

flashcards for a particular skill before moving on to the next deck. The decks were presented in a

randomized order. Dakota was presented with two different kinds of academic activities. During

Sessions 1-18, she was instructed to write a response to a journal entry prompt. To address issues

with refusal of assent, during Sessions 19-27, she was instructed to complete activities that

would have been occurring in the classroom (e.g., educational computer program, worksheet).

Wybie’s BIP specified revising his curriculum based on his most recent assessment results.

Therefore, Wybie was presented with first-grade (rather than second-grade) English Language

Arts worksheets.

For Fabian, one to four sessions were conducted per day for four days, spanning 10

calendar days. For Dakota, sessions spanned 51 days with an average of 2.5 sessions (range: 1-4)

conducted on an average of 2 days per week (range: 1-5). For Wybie, two sessions were conducted

per day for two days, spanning 18 calendar days. Sessions were 15 min unless a termination

criterion was met. The researcher stopped the session if the student aggressed, injured

themselves, or eloped.
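
Because terminated sessions were shorter than 15 min, response rates presumably reflect the actual session duration rather than the programmed duration; a minimal sketch under that assumption:

```python
# Sketch of a response-rate calculation: count of responses divided by
# the actual session duration (which could be under 15 min when a
# termination criterion was met).
def responses_per_minute(timestamps, session_duration_s):
    return len(timestamps) / (session_duration_s / 60)

# Example: 3 responses in a session terminated at 300 s (5 min)
print(round(responses_per_minute([12.0, 95.5, 288.2], 300), 2))  # 0.6
```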

Experimental Design

Comparisons between a high-fidelity (HF) baseline and facilitative commission error

(FCE) condition were conducted using a reversal design for Dakota. Although reversal designs

were also planned for the other participants, either no data or only baseline (HF or No-Fidelity)

data were collected.

High-Fidelity (HF) Baseline. The purpose of the HF condition was to determine whether a

student's BIP was effective. All individualized components of the BIP relevant to

the academic routine were implemented exactly as written (100% procedural fidelity). The HF

baseline was conducted for at least three sessions and continued until there was no trend in response rates.

Fabian was provided a pass to take a break from academics and replace the activity;

however, he never used it. The researcher delivered tokens for every 1-3 appropriate responses.

The appropriate responses were answering flashcards (regardless of whether the answers were

correct) and staying in his seat for an entire work block (i.e., up until the delivery of the last

token). After he earned 12 tokens, the tokens were exchanged for 3-5 min of access to an activity

of his choosing from a reward menu. Fabian selected coloring or Legos. The duration of the

reward activity was signaled with a timer, and Fabian was provided two advanced notices of

when reward-menu activities would end. The first advanced notice was when 1 min remained,

and the second when 30 s remained. Given that Fabian’s BIP had ranges for the number of

appropriate responses and min of access, two lists of values were generated using an online

random number generator, one for each BIP step. The values within the range were block-

randomized by three. A data collector seated behind Fabian discreetly signaled to the researcher

when to deliver a reinforcer, and the number of minutes of access to an activity. If Fabian

refused or did not provide an answer, the researcher re-prompted the instruction.
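
A minimal sketch of this block randomization, assuming each block of three contains every value in the BIP-specified range exactly once (the study generated its lists with an online random number generator; the function here is only illustrative):

```python
# Sketch of block randomization by three: each block of three contains
# every value in the range once, in shuffled order.
import random

def block_randomized(values, n_blocks):
    sequence = []
    for _ in range(n_blocks):
        block = list(values)
        random.shuffle(block)
        sequence.extend(block)
    return sequence

token_criteria = block_randomized([1, 2, 3], n_blocks=4)  # responses per token
reward_minutes = block_randomized([3, 4, 5], n_blocks=4)  # min of reward access
print(token_criteria, reward_minutes)
```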

Dakota was provided with two certificates. The first certificate could be exchanged to

help a teacher complete a task, and the second could be exchanged to work at a teacher’s desk

for 5 min. To maintain consistency across sessions, Dakota was not permitted to exchange a

certificate during sessions; instead, the researcher provided an alternative option if Dakota

requested to use a certificate. The researcher rephrased the instructions for 1 min when

presenting a journal prompt. The researcher praised every 5 min and attended to every hand

raise. Dakota’s BIP specified that she could earn up to three tokens every 30 min in clock time

(i.e., 10:00 a.m.) for being safe, respectful, and responsible. Tokens were delivered during

sessions in which the clock time occurred. The researcher stated a reminder if Dakota put her

head down, vocalized loudly, refused to start an academic activity, or left her area. The

researchers removed materials when Dakota destroyed property.
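
The clock-time token rule described above can be made concrete with a short sketch that finds the :00 and :30 marks falling within a session window (the date and times below are invented for illustration):

```python
# Sketch of the clock-time token rule: tokens could only be delivered in
# sessions whose window contained a :00 or :30 clock mark.
from datetime import datetime, timedelta

def token_marks_in_session(start, duration_min=15):
    end = start + timedelta(minutes=duration_min)
    marks = []
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        for m in (0, 30):
            mark = t.replace(minute=m)
            if start <= mark <= end:
                marks.append(mark.strftime("%I:%M %p"))
        t += timedelta(hours=1)
    return marks

print(token_marks_in_session(datetime(2023, 5, 1, 9, 50)))  # ['10:00 AM']
```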

Wybie’s academic activities were revised to be at the first-grade level, and the researcher

praised every 5 min. If Wybie refused, the researcher asked Wybie if he needed help. If Wybie

left his area, the researcher stated a reminder.

Facilitative Commission Error (FCE); Dakota Only. The purpose of the FCE

condition was to determine whether a programmed error facilitated outcomes by comparing the rate of

challenging behavior to the HF condition. The BIP was implemented with 100% procedural

fidelity with the commission error, which consisted of the researcher moving close to Dakota,

walking with her to the corridor outside the session room, providing physical attention, and

making an encouraging statement. Before Session 11, the researcher told Dakota what the "talk to

the teacher privately" certificate was and how to use it. In correspondence with Dakota's BIP, the

researcher could attend to Dakota when she raised her hand. Any request by Dakota to use the

certificate following a called-upon hand raise was honored. We opted for Dakota to initiate the

interaction, rather than the researcher, for clinical reasons. We were near the end of the school

year, and Dakota would be transitioning to a different school; a student-initiated certificate would be

more readily available to embed in her new BIP.

No-Fidelity (NF) Baseline. The purpose of the NF condition was to determine whether a student

required a BIP. The NF condition consisted of omission of all proactive steps of the BIP.

Reactive steps were implemented correctly to maintain safety.

Fabian was not provided a pass to take a break from academics and replace the activity.

The researcher presented flashcards but never delivered tokens, and, therefore, did not permit

token exchange.

Dakota was not provided certificates. The researcher gave an initial

instruction but then did not provide praise, attend to hand raises, deliver tokens, or re-state the

instructions. The researcher stated a reminder if Dakota put her head down, vocalized loudly,

refused to start an academic activity, or left her area. The researchers removed materials when

Dakota destroyed property.

Results and Discussion

Fabian, Dakota, and Wybie’s rate of challenging behavior across conditions is displayed

in Figure 8.

Fabian participated in nine research sessions. Rates of challenging behavior were low

during the HF condition. Following the transition to the NF condition, Fabian did not engage in

challenging behavior for the first two sessions (Sessions 4 and 5). A higher rate of challenging

behavior (1.19 responses/minute) occurred in Session 6, but it decreased across subsequent

sessions. Fabian withdrew assent after Session 9, thus terminating his participation.

Although mean response rates were lower in the HF condition than the NF condition, the

delayed change in rates, high degree of variability, and lack of replication in the planned reversal

design preclude conclusion regarding the efficacy of the BIP. Fabian’s withdrawal of assent also

precluded the evaluation of the planned commission error of praise. Thus, Fabian’s data do not

permit the identification of functional relations.

Dakota participated in 27 research sessions. Sessions were terminated if Dakota

aggressed, eloped, or injured herself; terminated sessions are depicted on the graph by white data

points. During the first HF condition, Dakota initially engaged in variable rates of challenging

behavior and met the termination criteria in one session because of self-injury. However, overall

rates of challenging behavior were low. Therefore, we conducted an NF session to determine if

the BIP effectively suppressed responding. Dakota engaged in a high rate (1.28 responses/min)

of challenging behavior during the NF session, ultimately leading to the termination of the

session due to aggression. Although we had initially planned an NF condition, the dangerous

escalation in challenging behavior during the first NF session caused us to pivot back to HF,

using a single NF session as a probe. Like the first several HF sessions, Dakota initially did not

engage in challenging behavior upon the return to the HF condition. However, a

high rate (0.83 responses/min) of challenging behavior then occurred, ultimately leading to the

termination of that session due to aggression. Thus, across the HF condition, Dakota infrequently engaged in

challenging behavior, but the challenging behavior that occurred was severe and posed imminent

harm to herself and others. These results were replicated in the second HF condition.

During the first FCE condition, Dakota did not engage in challenging behavior across six

sessions. Technical difficulties occurred with the camera during Session 15, and we could not

retrieve the data. Although the frequency of severe behavior had increased in the replication of

the HF condition, no challenging behavior occurred across five consecutive sessions in the

replication of the FCE condition. Thus, implementing a BIP with high fidelity and a commission

error can actually facilitate student outcomes compared to the BIP without a commission error.

Dakota did not use the "talk to the teacher privately" certificate during the final FCE

condition and thus did not experience the commission error. One possibility is that the availability of

the certificate was sufficient to mitigate challenging behavior. This effect has been demonstrated

in other interventions for which the availability of reinforcers was signaled by a certificate or

pass. For example, Ravid et al. (2021) provided children with a bedtime pass that could be

exchanged for nighttime parental attention. The children initially exchanged the passes often but,

over time, kept all the passes instead of exchanging them. The intervention effectively reduced

co-sleeping and increased independent sleeping even when the children kept the passes. When

Dakota withdrew assent in June of 2023, we recommended to Rachel that a “talk to teacher

privately” certificate be added to Dakota’s BIP. When we re-obtained assent in October of 2023,

Dakota’s BIP included the talk to teacher privately certificate, and the date of the BIP indicated

the intervention had been in place for approximately 3 months of school. Dakota’s educational

records indicated Dakota had been successful, and the new teacher reported Dakota was not

using the "talk to the teacher privately" certificate in the classroom. Given that the

intervention had been in place for an extended time, Dakota may have started to keep the passes instead

of exchanging them, with the intervention still effectively suppressing challenging behavior.

However, the lack of certificate use during research sessions limits our ability to draw firm

conclusions about effects of the commission errors.

All three students withdrew their assent at some point in the study, with two of the three

students withdrawing their assent when the BIP was implemented with high fidelity. Upon

reporting assent issues to the teachers, the teachers offered to let us conduct the study in the

classroom. We declined because the students' data up to that point made it seem likely that

implementing the BIP as written would be disruptive to the classroom environment. However, in

the classroom, elementary students do not generally have opportunities to vocally assent to

participating in their BIPs. It has been suggested that students should have the opportunity to assent to the

interventions they receive, and that efforts should be made to teach students to self-advocate. In

behavior analysis, Breaux and Smith (2023) called for the adoption of assent-based practices,

which include teaching clients about assent and how to advocate on their own behalf. In schools,

student’s self-advocacy and academic outcomes improved when they were provided explicit

instruction on how to participate in IEP meetings (Blackwell & Rossetti, 2014). In the present

study, the student’s withdrawal of assent may have suggested that the BIP as written was

aversive. Given that teachers are federally mandated to implement BIPs, it may be worthwhile to

include students in discussions about the services they receive (Individuals with Disabilities

Education Improvement Act, 2004).

General Discussion

The current studies aimed to identify the types of errors teachers make when

implementing BIPs (Study 1) and the impacts such errors have on student outcomes (Study 2).

However, low agreement between observers in Study 1 requires the data to be interpreted

cautiously, and the lack of student assent in Study 2 limits the extent to which we could identify

possible functional relations. During Study 1, our process for creating procedural-fidelity

checklists aligned with recent best-practice recommendations (Morris et al., 2024), but we had

disagreements in IOA that reduced the believability of our data despite additional training and

feedback for observers. Nevertheless, our findings are consistent with previous research indicating

that teachers implement behavior-reduction procedures with sub-optimal fidelity in naturalistic

contexts (Codding et al., 2008; DiGennaro Reed et al., 2010; Foreman et al., 2021; Mouzakitis et

al., 2015; Sanetti et al., 2014). Our findings additionally extend the literature by identifying that

teachers engage in commission errors that are not in the plan. During Study 2, we had challenges

obtaining complete datasets due to a lack of student assent. However, we were able to obtain one

complete dataset. Contrary to previous research that identified commission errors as detrimental

to behavior-reduction procedure outcomes, our findings suggest that a commission error could

actually facilitate outcomes (Foreman et al., 2022; St. Peter et al., 2016; St. Peter Pipkin et al.,

2010). Collectively, these findings broaden our conceptualization of procedural fidelity as they

suggest that there are other errors that may need to be captured in data collection and that errors

may have a variety of impacts on outcomes. Additional research is still needed to identify

procedural fidelity measurement systems that produce better IOA to increase the believability of

data, and experimental analyses are needed to determine the various effects of errors on

behavior-analytic procedure outcomes.

Research studies that have identified naturalistic errors and then manipulated them in

experiments have mainly focused on evaluating errors that are already specified in the plan (Carroll et

al., 2013; Foreman et al., 2021). However, this approach may not be sufficient to

identify all the errors that occur. Sanetti et al. (2014) stated that they used the procedural fidelity

measurement system with the most empirical support to measure behavior plan implementation

in the classroom, which captured only errors specified in the plan. However, Sanetti et al.

acknowledged that additional research on fidelity would enhance our understanding, and the

measurement system would need to change to ensure all relevant variables are accounted for.

Given our findings, it is crucial to broaden the scope of errors identified in procedural fidelity

measurement systems to capture errors not in the plan. However, this approach has challenges,

such as the need for a sufficient level of expertise to understand how potential reinforcers impact

behavior in nuanced ways. Researchers may need to conduct exploratory studies using different

methods to identify common themes in commission errors that are not in plans. However, it is

important to note that the most common errors may not necessarily have the most significant

impact, and there is a vast array of possible commission errors to identify. As a result, further

investigation in this area may require some trial and error to identify the most efficacious

measurement systems as our understanding of procedural fidelity expands.

Teacher’s implementation of behavior plans in the classroom has been measured using

categorical measurement systems (Codding et al., 2005; Codding et al., 2008; DiGennaro Reed

et al., 2007; DiGennaro Reed et al., 2010; Mouzakitis et al., 2015; Sanetti et al., 2014). For

example, Codding et al. (2005) had observers score each procedure component as (1)

implemented as written, (2) not implemented as written, or (3) no opportunity to observe. Similar

to Study 1, Codding et al. also narratively recorded examples of deviations in implementation

and did not factor agreement of narrative descriptions into IOA calculations. Codding et al.

(2005) suggested that perhaps a more conservative method should be used that directly compares

observers' agreement on the type and frequency of deviations from the plan. Our

measurement system captured more detailed aspects of implementation, including the type and

frequency of errors, and even recorded implementation for each procedure step rather than by

procedure component. However, our system yielded low agreement. It may be useful to have more

conservative measures of implementation, but they need to be reliable. Additional research is

needed to identify features that make more detailed procedural fidelity measurement systems

feasible. Researchers might parametrically manipulate the number of components or steps in a

procedural-fidelity checklist and determine the point at which observers no longer have

acceptable levels of agreement.

There may have been several other variables contributing to low IOA. We obtained low

IOA in Study 1 and high IOA in Study 2, despite using the same checklist across studies. There

were methodological differences that may have made the measurement system more feasible in

Study 2 compared to Study 1. First, observers collected data live during Study 1 but collected

data from video in most cases during Study 2. When procedural fidelity data were collected from

video, observers could pause, rewind, and fast-forward. Observers could have additional time to

collect data or re-watch the interaction when needed to make accurate discriminations. Second,

observers for Study 2 were provided a cheat sheet on how to classify interactions. The sheet

listed each BIP step and described what researcher interactions would be classified as correct,

omission, or commission. Observers could reference this sheet at any time to inform how they

classified interactions. Third, there was more correspondence between the checklist and the BIP

as implemented during Study 2 relative to Study 1 because the researcher was trained to

implement the BIP steps precisely. Even when the BIP was implemented with no fidelity (all

omission errors), it was evident when the researcher did not engage in an interaction as planned.

For example, the researcher attended to Dakota’s hand raises in the HF condition and did not

attend to Dakota’s hand raises in the NF condition. Collection of procedural fidelity data for

complex BIPs may be more feasible when observers can use video recording, which would allow

for additional time and reference to supporting resources. When video recording is not feasible,

data collection may need to be initially limited to just a few components of the BIP.

We had difficulty operationalizing the details of the BIPs in the students' files, even with

the teachers' input and despite teachers often having input on the development of the BIP. The

challenges with operationalization may be due to a lack of guidance on how to write BIPs to

precisely describe interventions while maintaining language that is accessible to teachers.

Existing recommendations about BIP writing focus mainly on components to include in BIPs,

such as providing background information on the client, defining target and alternative behavior,

or including interventions that reward positive behavior (Higgins et al., 2023; Horner et al.,

2000; Quigley et al., 2018; Williams & Vollmer 2015). Teachers bring their own histories to the

classroom, which likely influences how they interpret and implement BIPs. Personal

interpretation may be particularly likely when the BIP is not operationalized.

Interventions need to be clearly described and easy to carry out

with high fidelity. Teachers receive inadequate training in classroom management, and student

disruptive behavior contributes to burnout, which in turn contributes to teacher shortages (National

Council on Teacher Quality, 2014; Kollerová et al., 2021). Additionally, teacher buy-in is a

barrier to the adoption of positive behavior supports in the classroom (Kincaid et al., 2007). If

interventions are incomprehensible or difficult to implement, it may only compound the problem

as disruptive behavior may persist and further reduce buy-in from the teacher. The next step for

improving measures of procedural fidelity may actually be addressing how to operationalize

BIPs, as identifying the steps in the plan is critical to be able to identify deviations from the plan.

Teachers may deviate from the plan because those deviations suppress challenging

behavior. In some cases, deviations appear to be valuable strategies that could be incorporated

into a student’s BIP. For example, Rachel’s one-on-one interactions with Dakota, which

provided encouragement, assurance, and physical interaction, facilitated outcomes. However,

just because a deviation suppresses challenging behavior does not make it an advantageous

deviation. Deviations may appear beneficial in the moment but may be detrimental in the long

term. For example, Wybie’s teacher was provided access to food whenever Wybie requested and

sometimes offered food even before a request. If Wybie was no longer provided access to food

throughout the day, it presumably may result in a dramatic increase in challenging behavior. This

deviation might not be sustainable long term. Researchers and clinicians should look for

deviations from the plan and hypothesize the possible effects the deviation has on outcomes.

Then, the practicality and longevity of deviations should be assessed by determining the

available resources and reviewing the long-term goals for the client.

References

Allen, K. D., & Warzak, W. J. (2000). The problem with parental nonadherence in clinical

behavior analysis: Effective treatment is not enough. Journal of Applied Behavior

Analysis, 33(3), 373-391. https://doi.org/10.1901%2Fjaba.2000.33-373

Bannerman, D. J., Sheldon, J. B., Sherman, J. A., & Harchik, A. E. (1990). Balancing the right

to habilitation with the right to personal liberties: The rights of people with

developmental disabilities to eat too many doughnuts and take a nap. Journal of Applied

Behavior Analysis, 23(1), 79-89. https://doi.org/10.1901%2Fjaba.1990.23-79

Bergmann, S., Kodak, T., & Harman, M. J. (2021). When do errors in reinforcer delivery affect

learning? A parametric analysis of treatment integrity. Journal of the Experimental

Analysis of Behavior, 115(2), 561-577. https://doi.org/10.1002/jeab.670

Blackwell, W. H., & Rossetti, Z. S. (2014). The development of Individualized Education

Programs: Where have we been and where should we go now? Sage Open, 4(2).

https://doi.org/10.1177/2158244014530411

Brand, D., Henley, A. J., DiGennaro Reed, F. D., Gray, E., & Crabbs, B. (2019). A review of

published studies involving parametric manipulation of treatment integrity. Journal of

Behavioral Education, 28(1), 1-26. https://doi.org/10.1007/s10864-018-09311-8

Breaux, C. A., & Smith, K. (2023). Assent in applied behaviour analysis and positive behavior

support: ethical considerations and practical recommendations. International Journal of

Developmental Disabilities, 69(1), 111-121.

https://doi.org/10.1080/20473869.2022.2144969

Carroll, R. A., Kodak, T., & Fisher, W. W. (2013). An evaluation of programmed treatment-

integrity errors during discrete-trial instruction. Journal of Applied Behavior Analysis,

46(2), 379-394. https://doi.org/10.1002/jaba.49

Codding, R. S., Feinberg, A. B., Dunn, E. K., & Pace, G. M. (2005). Effects of immediate

performance feedback on implementation of behavior support plans. Journal of Applied

Behavior Analysis, 38(2), 205-219. https://doi.org/10.1901/jaba.2005.98-04

Codding, R. S., Livanis, A., Pace, G. M., & Vaca, L. (2008). Using performance feedback to

improve treatment integrity of classwide behavior plans: An investigation of observer

reactivity. Journal of Applied Behavior Analysis, 41(3), 417-422.

https://doi.org/10.1901/jaba.2008.41-417

Colón, C. L., & Wallander, R. (2023). Treatment integrity. In J. L. Matson (Ed.), Handbook of

applied behavior analysis: Integrating research into practice (pp. 439-463). Springer

Cham. https://doi.org/10.1007/978-3-031-19964-6

Cook, J. E., Subramaniam, S., Brunson, L. Y., Larson, N. A., Poe, S. G., & St. Peter, C. C.

(2015). Global measures of treatment integrity may mask important errors in discrete-trial

training. Behavior Analysis in Practice, 8(1), 37-47. https://doi.org/10.1007/s40617-014-

0039-7

Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.).

Pearson.

DiGennaro Reed, F. D., Reed, D. D., Baez, C. N., & Maguire, H. (2011). A parametric analysis

of errors of commission during discrete-trial training. Journal of Applied Behavior

Analysis, 44(3), 611-615. https://doi.org/10.1901/jaba.2011.44-611

Donnelly, M. G., & Karsten, A. M. (2017). Effects of programmed teaching errors on acquisition

and durability of self-care skills. Journal of Applied Behavior Analysis, 50(3), 511-528.

https://doi.org/10.1002/jaba.390

Fiske, K. E. (2008). Treatment integrity of school-based behavior analytic interventions: A

review of the research. Behavior Analysis in Practice, 1(2), 19-25.

https://doi.org/10.1007%2FBF03391724

Foreman, A. P., St. Peter, C. C., Mesches, G. A., Robinson, N., & Romano, L. M. (2021).

Treatment integrity failures during timeout from play. Behavior Modification, 45(6), 988-

1010. https://doi.org/10.1177/0145445520935392

Foreman, A. P., Romano, L. M., Mesches, G. A., & St. Peter, C. C. (2022). A translational

evaluation of commission fidelity errors on differential reinforcement of other behavior.

The Psychological Record. https://doi.org/10.1007/s40732-022-00528-8

Garcia, E., Han, E., & Weiss, E. (2022). Determinants of teacher attrition: Evidence from

district-teacher matched data. Education Policy Analysis Archives, 30(25).

https://doi.org/10.14507/epaa.30.6642

Hagermoser Sanetti, L. M., & Kratochwill, T. R. (2011). An evaluation of treatment integrity

planning protocol and two schedules of treatment integrity self-report: Impact on

implementation and report accuracy. Journal of Educational and Psychological

Consultation, 21, 284-308. https://doi.org/10.1080/10474412.2011.620927

Han, J. B., Bergmann, S., Brand, D., Wallace, M. D., St. Peter, C. C., Feng, J. & Long, B. P.

(2022). Trends in reporting procedural integrity: A comparison. Behavior Analysis in

Practice, 16, 388-398. https://doi.org/10.1007/s40617-022-00741-5

Harvey, O. B., & St. Peter C. C. (2024). Classifying fidelity errors in the context of behavioral

treatment [Unpublished Manuscript]. Department of Psychology, West Virginia

University.

Higgins, J. P., Riggleman, S., & Lohmann, M. J. (2023). A practical guide to writing behavior

intervention plans for young children. The Journal of Special Education Apprenticeship,

12(1). https://doi.org/10.58729/2167-3454.1160

Holcombe, A., Wolery, M., & Snyder, E. (1994). Effects of two levels of procedural fidelity with

constant time delay on children’s learning. Journal of Behavioral Education, 4(1), 49-73.

https://doi.org/10.1007/BF01560509

Horner, R. H., Sugai, G., Todd, A. W., & Lewis-Palmer, T. (2000). Elements of behavior support

plans: A technical brief. Exceptionality, 8(3), 205-215.

https://doi.org/10.1207/S15327035EX0803_6

Individuals with Disabilities Education Improvement Act, 20 U.S.C. § 300.530(f) (2004).

https://sites.ed.gov/idea/regs/b/e/300.530/f

Jenkins, S. R., Hirst, J. M., & DiGennaro Reed, F. D. (2015). The effects of discrete-trial training

commission errors on learner outcomes: An extension. Journal of Behavioral Education,

24(2), 196-209. https://doi.org/10.1007/s10864-014-9215-7

Jones, S. H., & St. Peter, C. C. (2022). Nominally acceptable integrity failures affect

interventions involving intermittent reinforcement. Journal of Applied Behavior Analysis,

55(4), 1109-1123. https://doi.org/10.1002/jaba.944

Kincaid, D., Childs, K., Blase, K. A., & Wallace, F. (2007). Identifying barriers and facilitators

in implementing schoolwide positive behavior support. Journal of Positive Behavior

Interventions, 9(3), 174-184. https://doi.org/10.1177/10983007070090030501

Kollerová, L., Květon, P., Zábrodská, K., & Janošová, P. (2021). Teacher exhaustion: The

effects of disruptive student behaviors, victimization by workplace bullying, and social

support from colleagues. Social Psychology of Education, 26, 885-902.

https://doi.org/10.1007/s11218-023-09779-x

Leon, Y., Wilder, D. A., Majdalany, L., Myers, K., & Saini, V. (2014). Errors of omission and

commission during alternative reinforcement of compliance: The effects of varying levels

of treatment integrity. Journal of Behavioral Education, 23(1), 19-33.

http://dx.doi.org/10.1007/s10864-013-9181-5

Morris, C., Jones, S. H., & Oliveira, J. P. (2024). A practitioner’s guide to measuring procedural

fidelity. Behavior Analysis in Practice. https://doi.org/10.1007/s40617-024-00910-8

National Autism Center. (2015). Findings and conclusions: National standards project, phase 2.

Randolph, MA: Author

National Council on Teacher Quality. (2014). Training our future teachers: Classroom

management.

https://www.nctq.org/dmsView/Future_Teachers_Classroom_Management_NCTQ_Repo

rt

Noell, G. H., Gresham, F. M., & Gansle, K. A. (2002). Does treatment integrity matter? A

preliminary investigation of instructional implementation and mathematics performance.

Journal of Behavioral Education, 11(1), 51-67. https://doi.org/10.1023/A:1014385321849

Peterson, L., Homer, A. L., & Wonderlich, S. A. (1982). The integrity of independent variables

in behavior analysis. Journal of Applied Behavior Analysis, 15(4), 477-492.

https://doi.org/10.1901/jaba.1982.15-477

Quigley, S. P., Ross, R. K., Field, S., & Conway, A. A. (2018). Toward an understanding of the

essential components of behavior analytic service plans. Behavior Analysis in Practice,

11(4), 436-444. https://doi.org/10.1007/s40617-018-0255-7

Ravid, A., Lagbas, E., Johnson, M., & Osborne, T. L. (2021). Targeting co-sleeping in children

with anxiety disorders using a modified bedtime pass intervention: A case series using a

changing criterion design. Behavior Therapy, 52(2), 298-312.

https://doi.org/10.1016/j.beth.2020.03.004

Schlichte, J., Yssel, N., & Merbler, J. (2005). Pathways to burnout: Case studies in teacher

isolation and alienation. Preventing School Failure: Alternative Education for Children

and Youth, 50(1), 35-40. http://dx.doi.org/10.3200/PSFL.50.1.35-40

St. Peter Pipkin, C., Vollmer, T. R., & Sloman, K. N. (2010). Effects of treatment integrity

failures during differential reinforcement of alternative behavior: A translational model.

Journal of Applied Behavior Analysis, 43(1), 47-70. https://doi.org/10.1901/jaba.2010.43-

47

St. Peter, C. C., Byrd, J. D., Pence, S. T., & Foreman, A. P. (2016). Effects of treatment-integrity

failures on a response-cost procedure. Journal of Applied Behavior Analysis, 49(2), 308-

328. https://doi.org/10.1002/jaba.291

Solomon, B. G., Klein, S. A., & Politylo, B. C. (2012). The effect of performance feedback on

teachers’ treatment integrity: A meta-analysis of the single-case literature. School

Psychology Review, 41(2), 160-175. https://doi.org/10.1080/02796015.2012.12087518

Vollmer, T. R., Sloman, K. N., & St. Peter Pipkin, C. (2008). Practical implications of data

reliability and treatment integrity monitoring. Behavior Analysis in Practice, 1(2), 4-11.

https://doi.org/10.1007%2FBF03391722

Wickstrom, K. F., Jones, K. M., LaFleur, L. H., & Witt, J. C. (1998). An analysis of treatment

integrity in school-based behavioral consultation. School Psychology Quarterly, 13(2),

141-154. https://doi.org/10.1037/h0088978

West Virginia Department of Education. (2023). West Virginia professional teaching standards.

https://wvde.us/wp-content/uploads/2023/05/WV-Professional-Teaching-Standards-

Final_5-3-2023.pdf

Williams, D. E., & Vollmer, T. R. (2015). Essential components of written behavior treatment

plans. Research in Developmental Disabilities, 36, 323-327.

https://doi.org/10.1016/j.ridd.2014.10.003

Table 1

Student Demographic Information

Participant | Age | Grade in school | Primary language | Sex | Race & ethnicity | Diagnoses | Number of proactive BIP steps
Fabian | 8 | 2 | English | Male | White | ADD, ODD, Depression, Learning Disability | 56
Dakota | 9 | 3 | English | Female | White | ADHD, Anxiety, Mood Regulation Disorder | 30
Warren | 7 | 2 | English | Male | White | N/A | 18
Wybie | 10 | 2 | English | Male | White | ADHD, ODD, Sensory Processing | 22

Note. ADD = attention deficit disorder, ADHD = attention deficit hyperactivity disorder, ODD = oppositional defiant disorder

Table 2

Teacher Demographic Information

Table 3

Examples and Nonexamples of BIP Steps that Meet the Inclusionary Criterion of a Directly

Observable Interaction Between the Teacher and Student

Criterion | Example | Nonexample
Proactive step | When the student completes a worksheet, deliver a token. | If the student refuses to start a worksheet, prompt a directive once every 30 s.
Proactive step | When the student raises their hand, call upon them. | If the student walks out of the classroom, follow them to monitor their safety.
Directly observable interaction between the teacher and student | At the start of each academic task, set one behavioral expectation. | Write the date on the daily data collection document. (not an interaction)
Directly observable interaction between the teacher and student | Throughout activities, honor every appropriate request to leave an area without challenging behavior. | Prepare the token economy materials before the student arrives to school. (not an interaction)
Directly observable interaction between the teacher and student | Before tests for which the student may not feel confident (i.e., math and writing), state a reminder that it is okay to be unsure of answers. | If the student leaves their seat on the bus, the bus driver should prompt the student to sit back in their seat. (interaction is not by the teacher; step is not proactive)
Directly observable interaction between the teacher and student | At the end of every day, acknowledge areas of success that school day. | If the student engages in an inappropriate void, call their parent. (interaction is not with the student; step is not proactive)
Identifiable in a 15-min observation | At the start of the first work period of the day, orient the student to the token board. | If the student meets the behavioral criteria for 10 consecutive school days, provide a special lunch.
Identifiable in a 15-min observation | When the timer has 10 seconds left, provide a verbal countdown. | Evaluate the students' daily behavioral data weekly.
Identifiable in a 15-min observation | Allow the student to select a quiet toy to take with them to the general education classroom. | Provide a consistent and predictable routine across school days.

Table 4
Number of BIP Steps Included and Excluded for Each Student-Teacher Dyad

Dyad | Included | Excluded (total) | Excluded: Reactive | Excluded: Not a Directly Observable Interaction | Excluded: Not Observable During a 15-min Period | Excluded: Duplicate
Fabian-Kelly | 61 | 48 | 35 | 3 | 6 | 4
Dakota-Rachel | 37 | 22 | 20 | 2 | 0 | 0
Warren-Rhett | 18 | 41 | 29 | 8 | 0 | 4
Wybie-Rhett | 22 | 39 | 35 | 1 | 2 | 1

Table 5

Interobserver Agreement Per Commission Error Not in Plan for Fabian-Kelly

Commission Error Not in Plan | Primary Observer Tallies | Secondary Observer Tallies | IOA Percentage
Acknowledgement/Praise | 16 | 21 | 76%
Allowed to switch or move seats or stand instead of sit | No IOA data
High-probability sequence | No IOA data
Proximity | 2 | 2 | 100%
Extra help | 2 | 0 | 0%
Physical attention | No IOA data
Restricted access to items | No IOA data
Stated a contingency or rule | 0 | 1 | 0%
Access to food during instruction | No IOA data
Allowed the student to show off a skill they requested to | No IOA data
Answered hand raise during independent work | No IOA data

Table 6

Interobserver Agreement Per Commission Error Not in Plan for Dakota-Rachel

Commission Error Not in Plan | Primary Observer Tallies | Secondary Observer Tallies | IOA Percentage
Proximity | 5 | 5 | 100%
Attend… | No IOA data
Reminders | 4 | 2 | 50%
Statements of encouragement or assurance | No IOA data
Provided choices | 1 | 2 | 50%
Let… | No IOA data
Break from academic demands | 1 | 2 | 50%
Access to peers during break | 1 | 2 | 50%
Access to items (Not in the BIP) | 2 | 2 | 100%
Told Dakota to get her dollars | No IOA data
Physical attention | No IOA data
Access to extra teachers | No IOA data
Awarded time with teacher certificate | 1 | 2 | 50%
Access to snack | No IOA data
Access to new classroom | No IOA data

Table 7

Interobserver Agreement Per Commission Error Not in Plan for Warren-Rhett

Commission Error Not in Plan | Primary Observer Tallies | Secondary Observer Tallies | IOA Percentage
Proximity | 5 | 1 | 20%
Restricted access to items | 1 | 0 | 0%
Awarded a token | 2 | 1 | 50%
Praise with token delivery | 0 | 1 | 0%
Statement of gratitude | 1 | 1 | 100%
Advanced notice | 1 | 0 | 0%
Access to items | 0 | 1 | 0%
Allowing the student to sit where he wants | No IOA data
Provided a choice | No IOA data
Granted requests to have access to an activity longer | No IOA data
Screening his view from peers | 1 | 1 | 100%
Physical attention | No IOA data

Table 8

Interobserver Agreement Per Commission Error Not in Plan for Wybie-Rhett

Commission Error Not in Plan | Primary Observer Tallies | Secondary Observer Tallies | IOA Percentage
Proximity | 8 | 12 | 66%
Access to items | 5 | 6 | 83%
Gesture prompt | No IOA data
Honoring appropriate requests for items | 1 | 0 | 0%
Statement of gratitude | 1 | 2 | 50%
Provided a choice | 1 | 1 | 100%
Advanced notice | 1 | 0 | 0%
Awarded a token | 1 | 2 | 50%
Allowing him to sit incorrectly | 1 | 1 | 100%
Physical attention | 1 | 0 | 0%
Statement of encouragement | No IOA data
Visual prompt | No IOA data
Offering access to food | No IOA data

Table 9

Operational Definitions of Student Challenging Behavior

Student | Behavior | Operational Definition
Fabian | Protesting | Refuses to engage in an activity with a reference to his own behavior.
Fabian | Swearing | Says a swear word.
Fabian | Insulting | Says a negative statement about appearance or intellect.
Fabian | Destroying property | Swiping, ripping, crumpling, throwing, or making forceful contact with items, furniture, or the building.
Fabian | Aggressing | Attempting to or making forceful contact with another or throwing large items towards the implementer.
Fabian | Eloping | The student takes one step outside of the research room door without permission from the researcher.
Dakota | Vocalizing loudly | Saying a statement or making noise above conversational volume.
Dakota | Protesting | Refuses to engage in an activity with reference to own behavior.
Dakota | Destroying property | Physically removing academic work from her proximity (swiping work off desk) or swiping materials out of place, throwing, breaking, or otherwise damaging materials or items.
Dakota | Aggressing | Attempting to or making forceful contact (hitting or kicking) towards another person that could cause harm.
Dakota | Injuring self | Attempting to or making forceful contact (hitting) towards self that could cause harm.
Dakota | Eloping | The student takes one step outside of the research room door without permission from the researcher.
Wybie | Protesting | Refuses to engage in an activity with a reference to his own behavior.
Wybie | Negative interactions | Making an insulting, cursing, or threatening statement towards another.
Wybie | Destroying property | Throwing, tipping, or making forceful contact with items.
Wybie | Aggressing | Attempting to or making forceful contact towards another person that could cause harm.
Wybie | Eloping | The student takes one step outside of the research room door without permission from the researcher.

Table 10

Procedural-Fidelity Errors in Study 2 for Dakota

Session Number | BIP Step | Error Type | Error Description
6 | Let her know she can raise her hand if she wants to talk to the teacher (within 15 s) | Omission | Researcher did not implement the step within 15 s
6 | Let her know she can raise her hand if she wants to talk to the teacher (within 15 s) | Commission | Researcher implemented the step after 15 s
13 | Attend to the hand raise | Commission | Researcher attended to Dakota without Dakota raising her hand
14 | Engage in at least one physical interaction (programmed commission error not in plan) | Omission | Did not engage in a physical interaction (limited visibility of interaction due to camera difficulties)
20 | Talk to Dakota for a minute about the task (rephrase the instructions) | Commission | Researcher implemented the step for more than 1 min
21 | Ask Dakota if she would like help with the assignment or to chat with a teacher | Omission (4) | Researcher did not ask
21 | Talk to Dakota for a minute about the task (rephrase the instructions) | Commission | Researcher implemented the step for more than 1 min
24 | Ask Dakota if she would like help with the assignment or to chat with a teacher | Commission | Researcher asked if she would like any help
25 | Talk to Dakota for a minute about the task (rephrase the instructions) | Omission | Researcher did not talk to Dakota for a minute about the task
26 | Talk to Dakota for a minute about the task (rephrase the instructions) | Omission | Researcher did not talk to Dakota for a minute about the task
27 | Talk to Dakota for a minute about the task (rephrase the instructions) | Omission | Researcher did not talk to Dakota for a minute about the task
Note. The numbers in parentheses next to the error type indicate the count of times the error was
made during the session.

Table 11

Procedural-Fidelity Errors in Study 2 for Wybie

Session Number | BIP Step | Error Type | Error Description
1 | N/A | Commission Error Not in Plan (8) | Gesture prompt
2 | Tell him he can raise his hand if he needs help | Omission | Researcher did not tell him that he can raise his hand if he needs help
2 | Tell him to sit in his seat | Omission | Researcher did not tell him to sit in his seat
2 | N/A | Commission Error Not in Plan (5) | Gesture prompt
3 | Praise Wybie once every 5 minutes | Omission | Researcher did not praise Wybie
Note. The numbers in parentheses next to the error type indicate the count of times the error was
made during the session.

Figure 1

Interobserver Agreement of Total Tallies for Each Type of Teacher Interaction

[Scatterplots, one panel per student, of total tallies scored by Observer 1 (x-axis) against Observer 2 (y-axis) for each interaction type: correct, omission, commission error in plan, and commission error not in plan. Spearman correlations: Fabian, r = 1.00, p = 0.0006; Dakota, r = 0.8587, p = 0.0006; Warren, r = 0.6885, p = 0.0008; Wybie, r = 1.00, p = 0.0001.]

Figure 2

Overall Count and Type of Errors by Student-Teacher Dyad

[Stacked bar graph of the count of teacher interactions (0-160) for each student-teacher dyad (Fabian-Kelly, Dakota-Rachel, Warren-Rhett, Wybie-Rhett), partitioned into correct, omission error, commission error in plan, and commission error not in plan.]

Figure 3

Type of Errors by Hour for Fabian-Kelly

[Graph of errors per hour by interaction not reproduced. X-axis label key:]

A: When Fabian is complying with academic instruction, gets to work immediately, raises his hand, does not interrupt a peer when it is their turn to answer, or stays in his area for an entire activity, deliver a token for every 1-3 instances of appropriate behavior
B: Acknowledgement or praise
C: Flexible seating
D: High-probability sequence
E: Proximity
F: Extra help
G: Physical attention
H: Restricted access to possibly distracting items
I: Stated a contingency or rule
J: Access to food during instruction
K: Allowed the student to show off a skill they requested to
L: Answered hand raise during independent work

Figure 4

Type of Errors by Hour for Dakota-Rachel

[Bar graph of errors per hour (0-20) by interaction (x-axis labels A-Z), partitioned into correct, omission error, commission error in plan, and commission error not in plan. X-axis label key:]

A: Throughout academic instruction, when Dakota attends to academic instruction, uses kind words, starts work right away, or completes a teacher directive, provide praise directed specifically to Dakota a minimum of once every 5 minutes
B: Proximity
C: When the 30-minute interval timer sounds or the minutes on the clock are 0 or 30, place a "dollar" for each domain of behavior that met expectations in her "wallet"
D: If Dakota raises her hand, attend to the hand raise
E: When the 30-minute interval timer sounds or the minutes on the clock are 0 or 30, deliver behavior-specific praise to Dakota that describes what she is doing well
F: Reminders
G: Statements of encouragement or assurance
H: Provided choices
I: When presenting a writing instruction, let her know she can raise her hand if she wants to talk to the teacher
J: Break from academic demands
K: Access to peers during break
L: Access to items (not in the BIP)
M: Before the end of the morning block and afternoon block, if Dakota was safe for the entire block, state that if you have enough for break you can take a break right now
N: Before the end of the morning block and afternoon block, if Dakota was safe for the entire block, state that it is time for dollars to be exchanged for reward(s)
O: Before the end of the morning block and afternoon block, if Dakota was safe for the entire block, remind the student that they can borrow from their bank if they need to
P: Before the end of the morning block and afternoon block, if Dakota was safe for the entire block, tell Dakota to count her dollars to determine what she would like
Q: Told Dakota to get her dollars
R: When presenting a writing instruction, talk to Dakota for a minute about the task
S: Physical attention
T: Access to extra teachers
U: Awarded time with teacher certificate
V: Access to snack
W: If Dakota requests help from a teacher on a task within her skill set, ask Dakota if she would like help with the assignment or to chat with a teacher
X: If Dakota would like to exchange a certificate at a time an exchange cannot occur or makes a request that cannot be granted, provide an alternative option
Y: If Dakota would like to exchange a certificate at a time an exchange cannot occur or makes a request that cannot be granted, praise acceptance of the choice of an alternative option
Z: Access to new classroom

Figure 5

Type of Errors by Hour for Warren-Rhett

[Graph of errors per hour by interaction not reproduced. X-axis label key:]

A: Proximity
B: Once per activity while he is on-task and working, praise or acknowledge him
C: Each time an assignment is to be presented, if a group assignment is available, ask him if he would like to work with a teacher or do a group activity
D: Restricted access to possibly distracting items
E: Awarded a token
F: Praise with token delivery
G: Statement of gratitude
H: Advanced notice
I: Once per activity while he is on-task and working, engage in a physical interaction
J: If he completes an activity without major challenging behavior, provide access to the selected reward
K: If he completes an activity without major challenging behavior, allow him to pick a reward
L: Access to items
M: Flexible seating
N: If Warren makes a reasonable request and is not engaging in challenging behavior, honor it
O: Provided a choice
P: Granted requests to have access to an activity longer
S: Screening his view from peers
T: Physical attention (not in BIP)
U: During transitions, allow and/or encourage Warren to bring a book with him or have an adult walk with him

Figure 6

Type of Errors by Hour for Wybie-Rhett

[Graph of errors per hour by interaction not reproduced. X-axis label key:]

A: Proximity
B: Praise Wybie once every 5 minutes
C: Access to items
D: Gesture prompt
E: Honoring appropriate requests for items
F: Statement of gratitude
G: If an activity is above his level in English Language Arts, state instructions about how he can request help or an alternative activity
H: Provided a choice
I: Advanced notice
J: Awarded a token
K: Flexible seating
L: Physical attention
M: Statement of encouragement
N: Visual prompt
O: Offering access to food
P: When Wybie must complete an academic activity independently, offer for him to sit near the teacher or in a seat of his choice


Figure 7

Interobserver Agreement of Total Tallies for Each Type of Experimenter Interaction

[Scatterplots, one panel per student, of total tallies scored by Observer 1 (x-axis) against Observer 2 (y-axis) for each interaction type: correct, omission, commission error in plan, and commission error not in plan. Spearman correlations: Fabian, r = 1.00, p = 0.0008; Dakota, r = 0.90, p < 0.0001; Wybie, r = 0.81, p = 0.25.]


Figure 8

Study 2 Experimental Designs


[Line graphs of all challenging behavior per minute across sessions, one panel per student: Fabian (HF, then NF), Dakota (HF, FCE, HF, FCE, with a single NF probe), and Wybie (HF only).]

Note. HF = High-Fidelity condition, NF = No-Fidelity condition, and FCE = Facilitative


Commission Error condition.


APPENDIX A. Teacher Demographic Questionnaire.


APPENDIX B. Student Demographic Questionnaire.


APPENDIX C. Teacher Questionnaire.


APPENDIX D. Procedural Fidelity Checklist for Fabian.


APPENDIX E. Procedural Fidelity Checklist for Dakota.


APPENDIX F. Procedural Fidelity Checklist for Warren.


APPENDIX G. Procedural Fidelity Checklist for Wybie.

