
J Autism Dev Disord (2016) 46:2464–2479

DOI 10.1007/s10803-016-2782-9

ORIGINAL PAPER

Measuring Changes in Social Communication Behaviors: Preliminary Development of the Brief Observation of Social Communication Change (BOSCC)

Rebecca Grzadzinski1,2 • Themba Carr3 • Costanza Colombi4 • Kelly McGuire5,6 • Sarah Dufek1 • Andrew Pickles7 • Catherine Lord1

Published online: 9 April 2016


© Springer Science+Business Media New York 2016

Abstract

Psychometric properties and initial validity of the Brief Observation of Social Communication Change (BOSCC), a measure of treatment-response for social-communication behaviors, are described. The BOSCC coding scheme is applied to 177 video observations of 56 young children with ASD and minimal language abilities. The BOSCC has high to excellent inter-rater and test–retest reliability and shows convergent validity with measures of language and communication skills. The BOSCC Core total demonstrates statistically significant amounts of change over time compared to a no change alternative while the ADOS CSS over the same period of time did not. This work is a first step in the development of a novel outcome measure for social-communication behaviors with applications to clinical trials and longitudinal studies.

Keywords Autism · Autism spectrum disorder (ASD) · Autism Diagnostic Observation Schedule (ADOS) · Brief Observation of Social Communication Change (BOSCC) · Social communication · Restricted and Repetitive Behaviors and Interests (RRB) · Toddlers · Preschoolers

Electronic supplementary material The online version of this article (doi:10.1007/s10803-016-2782-9) contains supplementary material, which is available to authorized users.

Correspondence: Catherine Lord, cal2028@med.cornell.edu

1 Center for Autism and the Developing Brain, Weill Cornell Medical College, New York-Presbyterian Hospital, 21 Bloomingdale Road, Rogers Building, White Plains, NY 10605, USA
2 Teachers College, Columbia University, New York, NY, USA
3 Center for Autism Research and Treatment, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
4 University of Michigan, Ann Arbor, MI, USA
5 Center for Autism and Developmental Disorders, Maine Behavioral Health Care, South Portland, ME, USA
6 New York Presbyterian Hospital, Columbia University Medical Center and New York State Psychiatric Institute, New York, NY, USA
7 Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, UK

Introduction

There is a critical need for the development of outcome measures that assess changes in social communication behaviors. Though most treatments for autism spectrum disorder (ASD) focus on improvements of social communication behaviors (Rogers and Vismara 2008; Wolery and Garfinkle 2002), the field of ASD intervention research in particular has struggled to find measures of treatment response that adequately capture changes in these behaviors (Anagnostou et al. 2015; Fletcher-Watson and McConachie 2015; McConachie et al. 2015). This is in part because changes in social communication behaviors are often subtle, making it difficult to find measures that are sensitive enough to capture small, though potentially meaningful, changes (Anagnostou et al. 2015; Cunningham 2012; Matson 2007; McConachie et al. 2015; Yoder et al. 2013).

Moreover, few of the measures available are flexible or standardized enough to be used across sites and studies, resulting in numerous studies that use a variety of different measures (Cunningham 2012; McConachie et al. 2015). A recent review noted that out of 195 behavioral intervention trials for ASD, over 200 different measurement tools were


used to assess treatment response (Bolte and Diehl 2013). Sixty percent of these tools were used in only a single study, with only three tools used in more than 2 % of studies (Bolte and Diehl 2013). In addition, another study noted that 75 commonly used tools have little validity as outcome measures (McConachie et al. 2015).

Recently, a panel of ASD experts determined that only a handful of existing measures are appropriate for identifying treatment response in ASD and even these recommended instruments have significant limitations (Anagnostou et al. 2015; Scahill et al. 2015). For example, researchers often measure skills, such as cognitive or adaptive skills, which are not directly targeted in intervention (Matson 2007; Spence and Thurm 2010; Wolery and Garfinkle 2002). Alternatively, researchers use treatment response measures that are study-specific (Green et al. 2010; Kaale et al. 2012; Kasari et al. 2012; Rogers et al. 2012; Yoder et al. 2014). For example, researchers create a measure that captures the frequency of a specific operationalized behavior that is targeted in treatment, such as joint attention (Kaale et al. 2012) or imitation (Rogers et al. 2012). Although these measures may be helpful to identify changes in single behaviors in particular studies, they do not capture broader social communication changes (Spence and Thurm 2010). Even though researchers often measure similar social communication behaviors across different studies, these behaviors are often operationalized differently across measures, which interferes with interpretation of results beyond single studies (Wolery and Garfinkle 2002). Furthermore, focusing outcome measures on behaviors that are highly specific to a particular treatment could accentuate treatment effects that may not generalize beyond these specific measures (Yoder et al. 2013).

Other measures used are intended for screening, diagnosis, or measuring symptom severity (Anagnostou et al. 2015; Scahill et al. 2015). As such, these measures are usually not sufficiently sensitive for measuring change over short periods of time (e.g., months rather than years). For example, the Autism Diagnostic Observation Schedule (ADOS-2; Lord et al. 2012b, c), a measure intended for diagnostic purposes, has frequently been applied as an outcome measure. Using raw scores from the ADOS-2 has generally been unsuccessful in assessing changes (Owley et al. 2001), perhaps because ADOS-2 raw scores are not intended for use as interval data or for measuring change. When changes have been identified with ADOS-2 raw scores, the clinical significance of these changes may be limited since changes are also present in treatment-as-usual conditions when comparison groups are presented (Green et al. 2010; Gutstein et al. 2007).

The ADOS-2 Calibrated Severity Score (CSS; Esler et al. 2015; Gotham et al. 2009) has been useful in identifying changes over the course of years (Gotham et al. 2012; Lord et al. 2012a), but has been less successful in identifying changes over shorter periods of time (Brian et al. 2015; Dawson et al. 2010; Estes et al. 2015; Shumway et al. 2012; Thurm et al. 2015). Analyses of the Autism Diagnostic Interview, Revised (ADI-R; Lord et al. 1994), a parent interview used for diagnostic assessment, have also proven useful in identifying trajectories of change over the course of years (Gutstein et al. 2007; Lord et al. 2015; Sallows and Gaupner 2005), but its utility over shorter periods of time is unclear. A further hindrance to using measures such as the ADOS-2 and ADI-R is that they require significant training to administer and score reliably, as well as substantial time from patients and clinicians, limiting feasibility in large-scale, multi-site studies. Given these limitations, a recent review (Anagnostou et al. 2015) recommended against using ASD diagnostic measures, like the ADOS-2, as outcome measures though use of these tools was previously encouraged for this purpose (Cunningham 2012; Matson 2007).

An additional limitation to measures commonly used in clinical trials is the reliance on caregiver or clinician report (Anagnostou et al. 2015; Bolte and Diehl 2013), such as Clinical Global Impressions (CGI; Busner and Targum 2007), because placebo effects are particularly strong for caregiver or clinician report measures. These effects may outweigh more subtle changes that occur over time or in response to interventions (Guastella et al. 2015; Lord et al. 2012a; Owley et al. 2001). In a recent paper, caregiver-report measures of response to treatment were more related to the caregiver’s belief that the child was receiving the experimental treatment than to the treatment itself (Guastella et al. 2015). A second, related issue that limits measurement of treatment effects is “unblinding,” which is often inherent in caregiver or clinician reports. For example, in treatments that have significant side effects, caregivers and clinicians are frequently aware if the child is experiencing these other changes. Bias associated with unblinding is often inherent since few studies use reporters of outcome measures who are blind to the child’s treatment status (Wolery and Garfinkle 2002). Last, measures used to capture a broad range of social-communication behaviors are often confounded by co-occurring intellectual deficits and behavior and/or language problems (Hus et al. 2013). The influence of these confounds may make it difficult to disentangle meaningful changes in ASD-specific social-communication behaviors from other non-ASD-specific behaviors.

The limitations of currently used measures interfere with the ability of clinicians and researchers to measure effectiveness of interventions, perhaps contributing to the phenomenon that few ASD interventions have met standard criteria for efficacy (Chambless and Hollon 1998; Danial and Wood 2013; Levy et al. 2009; Rogers and Vismara


2008; Spreckley and Boyd 2009). Given this critical need, researchers have begun to focus efforts on developing measures that are sensitive to change (Fletcher-Watson and McConachie 2015; McConachie et al. 2015). The Brief Observation of Social Communication Change (BOSCC) is an initial attempt by our group to address the limitations of commonly used measures. The BOSCC is a new measure consisting of specific items that were developed to identify changes in social-communication behaviors over relatively short periods of time by quantifying subtleties in both the frequency and the quality of specific behaviors. The goal of the BOSCC is to provide researchers and clinicians with an outcome measure that is flexible, easy to code, and minimally-biased by caregiver or clinician report. The BOSCC is flexible enough to be used across a variety of settings (e.g., across multi-site studies, in clinics or at home) and to be coded by a clinician/researcher who is blind to the child’s treatment status, and is, for example, new to ASD research (e.g., research assistant).

The BOSCC described in this work is applicable to minimally-verbal, young children. The BOSCC is a coding scheme that was developed by modifying and expanding codes from the ADOS-2 (Lord et al. 2012b) to capture more subtle variations in behaviors. In this initial paper, we use the BOSCC to measure behaviors in children with ASD, although applications of the BOSCC may extend to other disorders with deficits in social-communication (e.g., language impairments, social/pragmatic communication disorder, and social anxiety disorder). The goal of this paper is to provide preliminary evidence for the utility of the BOSCC as a treatment outcome measure. The specific aims are to (1) determine items for inclusion in the final BOSCC coding scheme through exploration of item correlations, (2) explore the relationship between items using factor analysis, (3) examine its psychometric properties, including inter-rater and test–retest reliability, and (4) provide initial evidence for validity through explorations of changes in BOSCC scores over time compared with changes in scores from other standard measures. In order to provide an example of how BOSCC data could be used in a clinical trial, we present data in a similar manner to existing early intervention research.

Method

Participants

Fifty-six children (44 males) with a Best Estimate Clinical Diagnosis (BEC; Anderson et al. 2014) of ASD were included in this study. Diagnoses of ASD were determined based on thorough diagnostic evaluations, including administration of the ADI-R (Lord et al. 1994) and the ADOS-2 (Lord et al. 2012b, c). All participants had elected to join various treatment studies (Kasari et al. 2010; Rogers et al. 2012; Wetherby et al. 2014) depending on which studies were available at the time and were then randomized into a treatment condition at the University of Michigan Autism and Communication Disorders Center (UMACC; n = 49) or the Center for Autism and the Developing Brain (CADB; n = 6), with the exception of one participant. Data from this one child was extracted from an existing database of children whose parents had provided written informed consent for their child’s clinical information/assessments to be included in an Institutional Review Board (IRB)-approved database. All children included in this study with the exception of one child were receiving some form of intervention while participating, either through the treatment condition in the clinical trials or elsewhere, though the interventions varied in frequency and type (see Kasari et al. 2010; Rogers et al. 2012; Wetherby et al. 2014 for details regarding intervention trials). For the one child who was not receiving any form of intervention, only one BOSCC observation was available; accordingly, this child was not included in analyses of change over time. For the purposes of this initial work, which focuses on the validity and reliability of the BOSCC, the effects of specific treatment conditions are not explored; future work will address this question. All children included in the study were between 1 and 5 years of age with minimal spontaneous language (simple phrase speech or less, equivalent to ADOS-2 Module Toddler, 1, or 2), as is appropriate for the current BOSCC coding scheme (described below). See Table 1 for demographic and initial observation information.

Primary Measure (BOSCC)

For the purposes of assessing the initial psychometric properties and validity, the BOSCC coding scheme was applied to 10-min videos of free-play interactions between a caregiver and a child, gathered over the course of the child’s participation in an intervention trial. A parent was the play partner for the majority of the BOSCC observations (97 %, n = 171), with 94 % (n = 160) of these conducted with mothers. For the remaining videos, the interaction was gathered with the child and another caregiver (e.g., grandparent). The majority of observations were gathered in the clinic setting (n = 147, 83 %) while the remaining observations were gathered at the child’s home. These play samples were determined to be adequate for applying the BOSCC coding scheme as they comprised many of the elements recommended for BOSCC observations including minimal structure, a variety of toys (such as cause and effect and pretend objects), and duplicates of toys (in order to promote interactive play). Caregivers were


given minimal instruction and simply told to play “how you typically would” with the child.

Table 1 Background and first observation information (n = 56)

                                          Mean (SD)      Range
Age (months)                              28.9 (10.5)    12–56
VABS (standard score) (n = 55)
  Communication                           78.7 (17.5)    29–121
  Socialization                           79.0 (12.1)    32–110
  Daily living                            84.0 (13.3)    36–117
  Motor skills                            89.1 (13.8)    34–113
MSEL (ratio) (n = 54)
  VIQ                                     62.9 (21.9)    29–123
  NVIQ                                    78.5 (23.7)    30–145
ADOS-2 (n = 55)
  CSS                                     7.6 (2.0)      3–10
  SA CSS                                  7.7 (2.1)      3–10
  RRB CSS                                 7.0 (2.1)      1–10

                                          n (%)
ADOS-2 Module (Toddler or 1)              47 (85)
Sex (males)                               44 (79)
Race (a)
  Caucasian                               34 (61)
  African American                        5 (9)
  Other                                   5 (9)
Ethnicity (b) (Hispanic)                  6 (11)
Maternal education (c) (4+ years college) 30 (57)

ADOS-2 Autism Diagnostic Observation Schedule, 2nd Edition, CSS Calibrated Severity Score, MSEL Mullen Scales of Early Learning, RRB CSS Restricted, Repetitive Behavior Calibrated Severity Score, SA CSS Social Affect Calibrated Severity Score, SD standard deviation, VABS Vineland Adaptive Behavior Scales
(a) Twelve participants (21 %) did not provide race information
(b) Four (7 %) participants did not provide ethnicity information
(c) Three participants (5 %) did not provide information about maternal education

Between one and eight videos were available per child. Two or more videos were available for a subset of children (n = 50) with an average of 5.9 months (SD = 3.1) from first to last video observation. Children were between the ages of 12 and 56 months at their first observation (M = 29, SD = 11) and between the ages of 18 and 62 months at their last observation (M = 35, SD = 11).

The original BOSCC coding scheme consisted of 16 items coded on a 6-point scale from 0 (abnormality is not present) to 5 (abnormality is present and significantly impairs functioning). Nine items related to social communication behaviors; one of these items was subsequently eliminated (see “Preliminary Analyses” section below). One item related to play and three items related to restricted, repetitive behaviors/interests seen in ASD. Three items were used as markers of Other Abnormal Behaviors often seen in ASD, although these behaviors were rarely observed in this sample of children playing with their caregiver(s). Nevertheless, these items were deemed relevant for the BOSCC because they may be useful in the future for determining whether a BOSCC observation is valid (e.g., high scores on these items may suggest that the BOSCC observation was not representative of the child’s typical behavior).

Each BOSCC item is coded using a novel, empirically-based decision tree, which captures detailed information about specific behaviors, including, for example, information about a behavior’s frequency and quality (see Supplementary Figure 1 for example item). At each branch of the decision tree, the coder answers a question about the child’s behavior before proceeding on to the next question or arriving at a code. For example, the Directed Vocalizations item first asks whether the child directs vocalizations to another person (branch 1), then asks whether this ever occurs beyond directed echoed or highly routinized speech (branch 2), how often these more flexible directed vocalizations occur (branch 3), in what pragmatic contexts these occur (branches 4 and 5), and in how many activities (branch 6). The BOSCC is coded in two 5-min segments of a 10-min video (first 5-min segment = Segment A, second 5-min segment = Segment B), with codes averaged across segments. The initial coding process relied on viewing each video segment (5-min) one time and then coding. Over the course of development, this process was modified such that each video segment was watched and coded twice, with the second codes deemed final and used for analyses in this study. Observing and coding each segment twice resulted in greater accuracy in capturing behaviors, higher reliability amongst coders on individual items, and greater confidence in coding decisions. Coding a BOSCC video takes a trained coder about 30 min to complete.

Coders of data presented here were one psychologist, one psychiatrist, one clinical psychology graduate student, and several research assistants. All coders were blind to the child’s treatment status as well as the treatment time point. Before coding independently, coders obtained inter-rater agreement standards that the authors deemed adequate across both segments A and B: no more than three items with more than one point disagreement AND within three points across summed totals for all items, across three consecutive videos. Training involved review of the BOSCC coding scheme, practice watching and coding video observations, and participation in coding discussions with reliable coders. How quickly trainees reached these inter-rater agreement standards varied though most met standards after practice coding approximately ten to twelve videos. Codes from coders that were “in training” (had not


yet met the above inter-rater agreement standards) were never included in datasets. Most coders of the BOSCC used in this study (September 2015 version) had been involved in coding that used previous versions of the BOSCC coding scheme while it was under development; as such, these coders, though many were bachelor-level assistants with limited previous ASD experience, had had exposure to the BOSCC measure over several months. In addition, coders began training on the BOSCC at different points in the study, each participating in coding and consensus discussions of videos. Codes were only used in this study from coders who had attained the inter-rater agreement standards described above.

A random sub-sample of videos (approximately every 6th video) was chosen for coding by multiple coders (ranging from 2 coders to 7 coders, depending on coder availability) in order to ensure that inter-rater agreement standards were retained over time and to assess inter-rater reliability (see below). During consensus meetings for these multiply-coded videos, coders determined consensus codes; validity data presented here use consensus codes (16 %, 28 videos) when applicable (but these codes were not used for inter-rater reliability, see below).

Additional Measures

As part of participation in the intervention trials, children completed several assessments, including assessments of cognitive functioning, adaptive functioning, and diagnostic assessments. These additional measures provided an opportunity to explore the convergent validity of the BOSCC. See Table 2 for a summary of measures included.

Adaptive Functioning

The Vineland Adaptive Behavior Scales (VABS; Sparrow et al. 2005) was completed with the caregiver(s) of a subset of children (n = 31) at two or more time points. The VABS is a caregiver interview of adaptive functioning that provides standard scores in the domains of socialization, communication, daily living, and motor skills as well as an overall adaptive behavior composite standard score (ABC). See Table 1 for information about VABS Domain scores at the initial observation.

Cognitive Functioning

Children were administered either the Mullen Scales of Early Learning (MSEL; Mullen 1995) or the Differential Abilities Scales (DAS; Elliot 2007), depending on the child’s ability level. The MSEL (collected from 36 children at two or more time points) provides standard scores in the domains of expressive language, receptive language, visual reception (non-verbal problem-solving), and fine motor skills. The DAS provides standard scores in the domains of verbal and nonverbal cognition. Ratio IQs were calculated for some children due to the inability to calculate norm-referenced standard scores because the child’s age exceeded standard cut-offs and/or their developmental level was too low to be calculated using standard metrics (see Bishop et al. 2011). None of the children received the DAS at more than one time point. As a result, only the participants with multiple MSEL scores were explored in analyses addressing change in cognitive scores. See Table 1 for information about cognitive functioning at the first observation.

ASD Symptoms

The Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2; Lord et al. 2012b, c) was administered to 55 children at one time point with most children receiving ADOS-2 Module 1 or the Toddler Module (85 %, n = 47), while the remaining children (n = 8) received ADOS-2 Module 2. A subset of children (n = 41) received an ADOS-2 at two or more time points, allowing for exploration of change over time. The ADOS-2 obtains information about a diagnosis of ASD through direct observation by a clinician. All clinicians involved in administering the ADOS-2 established research reliability on the measure prior to administration. None of the clinicians involved in administering/scoring of the ADOS were involved in coding of the BOSCC. The ADOS-2 provides CSS for the algorithm total (CSS Overall) and domain severity scores in the areas of Social Affect (CSS SA) and

Table 2 Information about assessments gathered

Assessment   N with ≥2       # of observations   # of observations   # Months between first and
             observations    (mean)              (range)             last observation (mean)
BOSCC        50              3.4                 1–8                 5.9
ADOS-2       41              2.5                 1–5                 5.9
MSEL         36              2.0                 1–3                 9.2
VABS         31              2.1                 1–3                 9.5

ADOS-2 Autism Diagnostic Observation Schedule, 2nd Edition, BOSCC Brief Observation of Social Communication Change, MSEL Mullen Scales of Early Learning, VABS Vineland Adaptive Behavior Scales


Restricted and Repetitive Behavior (CSS RRB; Esler et al. 2015; Gotham et al. 2009). These scores provide a cross-module comparison that takes into account language level and age. See Table 1 for information about the ADOS-2 CSS at the first observation.

Clinical Global Impression-Improvement (CGI)

The CGI is a measure used by clinicians to evaluate whether an individual is responding to treatment (Busner and Targum 2007). Clinicians rate the participant’s level of improvement on a 7-point scale ranging from “very much improved” (1) to “very much worse” (7). The CGI was collected on six children who participated in an intervention trial at CADB, for whom we also had BOSCCs at multiple time points. None of the clinicians who rated the CGI also coded the BOSCC.

Data Analysis

Preliminary Analyses

Over several versions of the BOSCC coding scheme, numerous codes and coding structures were generated and tested. Given the goals of the BOSCC, a uniform distribution over the coding range for items was desirable, although not always attained. Item codes were re-written over several versions to better achieve this distribution. As changes were made to the BOSCC while under development, videos were re-coded to reflect these changes. Other studies have used a preliminary version of the BOSCC (from February 2014; Fletcher-Watson et al. 2015; Kitzerow et al. 2015). All data in this study used an updated version of the BOSCC coding scheme (September 2015 version).

Using the final version of the BOSCC described in this paper (September 2015 version), Fig. 1 depicts the averaged (across segment A and B) item distributions for the 12 BOSCC items (BOSCC Core). Since many children with ASD do not show all of the coding ranges for RRBs (Kim and Lord 2010), we did not expect normal or uniform distributions for the three items related to these behaviors, namely Sensory Interests, Hand/Finger Mannerisms, and Restricted/Repetitive Behaviors/Interests. Few children were scored as having Other Abnormal Behaviors (Supplementary Figure 2) and these items were therefore not included in subsequent analyses; however, these items were retained in the BOSCC coding scheme as they may provide valuable information when determining whether the BOSCC observation provides a valid representation of

Fig. 1 Distributions for 12 core BOSCC items (averaged across segments A and B). Note: Solid red represents items in the social communication domain; striped blue represents items in the restricted, repetitive behavior domain (Color figure online)


the child’s behavior (e.g., if child was very irritable, this observation may not be representative). In addition, in order to ensure that no coder was coding any item significantly differently than other coders, ongoing reliability checks of individual coders were conducted; overall, there were no significant coding discrepancies between coders except for one coder who consistently under-scored behaviors in the RRB domain. As such, Sensory Interests, Hand/Finger Mannerisms, and Restricted/Repetitive Behaviors/Interests that were coded by this coder were re-coded by other coders.

A correlation matrix of the BOSCC items was constructed which indicated that the correlation between the Shared Enjoyment and Facial Expressions items exceeded 0.7, suggesting substantial overlap in the behaviors captured by these codes. Facial Expressions had a more uniform distribution across the coding range (0–5) and was thus retained while Shared Enjoyment was eliminated from the measure and subsequent analyses.

In order to determine domain scores, exploratory factor analyses (EFA) were conducted for the 12 Core BOSCC items (Shared Enjoyment removed, see above). For the EFA, scores for the three RRB items with skewed distributions (Sensory Interests, Hand/Finger Mannerisms, and Restricted/Repetitive Behaviors/Interests) were collapsed to 3 or 4 categories based on the item distribution and treated as ordinal scores. EFA was conducted using all available codings, which include multiple codings from different coders of the same video (308 total available codings). Analyses were undertaken in Mplus (Muthen and Muthen 1998–2012) using a promax oblique rotation, taking into account the multiple codings by using the complex survey adjustment with the child as the cluster-level unit. EFA of the 12 Core items, of which the last three items were treated categorically, gave eigenvalues of 5.48, 1.58, and 1.05 and RMSEA values of 0.107, 0.067, and 0.037 for the one, two, and three factor solutions, respectively (see Tables 3, 4).

For subsequent analyses, the two-factor solution was chosen as a plausible parsimonious fit for the data as eigenvalues were substantially greater than 1, with an RMSEA value under 0.07 (Browne and Cudeck 1993), and theoretical overlaps with other two-factor solutions found in ASD literature (Guthrie et al. 2013; Mandy et al. 2012; Shuster et al. 2014). Factor 1, the social communication domain, consisted of items 1–8 (SC domain). Although some studies suggest that play is a separate factor (Boomsma et al. 2008; van Lang et al. 2006), the

Table 3 Brief Observation of Social Communication Change (BOSCC) exploratory factor analysis model comparison

Model (df)       χ² test of model fit   df   p        Eigenvalue   RMSEA
1-Factor (54)    221.29                 54   <0.001   5.48         0.107
2-Factor (43)    101.85                 43   <0.001   1.58         0.067
3-Factor (33)    46.70                  33   0.057    1.05         0.037

df degrees of freedom, RMSEA root mean square error of approximation
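
As a rough cross-check of the fit indices in Table 3, the RMSEA point estimate can be sketched from each model's χ² and degrees of freedom. This is a minimal sketch, not the authors' Mplus computation: the function name `rmsea` is hypothetical, the textbook formula sqrt(max(χ² − df, 0) / (df · (N − 1))) is assumed, and N = 308 assumes that the number of available codings served as the sample size. Because the published analysis treated some items as categorical and applied a complex survey adjustment for clustering within child, values from this formula need not match the table exactly.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate (assumed formula, not Mplus output)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N_CODINGS = 308  # assumption: the 308 available codings are the sample size

# (chi-square, df) per model, copied from Table 3
models = {
    "1-Factor": (221.29, 54),
    "2-Factor": (101.85, 43),
    "3-Factor": (46.70, 33),
}

for name, (chi2, df) in models.items():
    print(f"{name}: RMSEA ~ {rmsea(chi2, df, N_CODINGS):.3f}")
```

Under these assumptions the two- and three-factor estimates land close to the tabled 0.067 and 0.037, while the one-factor estimate comes out somewhat below the tabled 0.107, which may reflect the estimator and clustering adjustments noted above.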

Table 4 1, 2, and 3-Factor model factor loadings for Brief Observation of Social Communication Change (BOSCC) items

                                     1-Factor model   2-Factor model (promax)   3-Factor model (promax)
Item name (abbreviated)              Factor 1         Factor 1   Factor 2       Factor 1   Factor 2   Factor 3
Eye contact                          0.66             0.78       -0.06          1.00       -0.16      0.11
Facial expressions                   0.51             0.62       -0.09          0.59       0.05       -0.02
Gestures                             0.50             0.73       -0.21          0.42       0.36       -0.27
Vocalizations                        0.77             0.63       0.24           0.10       0.80       -0.05
Integration of vocal and non-vocal   0.84             0.87       0.07           0.65       0.32       0.05
Social overtures                     0.79             0.71       0.16           0.51       0.34       0.09
Social responses                     0.76             0.56       0.31           0.14       0.67       0.05
Engagement                           0.62             0.40       0.32           -0.06      0.72       0.03
Play                                 0.50             0.27       0.32           -0.15      0.69       0.01
Unusual sensory interests            0.57             -0.07      0.85           0.09       -0.05      0.95
Hand/finger/body mannerisms          0.40             -0.12      0.67           -0.03      0.09       0.55
Repetitive interests/behaviors       0.58             0.17       0.55           0.06       0.32       0.38

All factor loadings ≥0.4 shown in bold
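
To make the 0.4 threshold concrete, the two-factor (promax) loadings from Table 4 can be screened programmatically. The sketch below is illustrative only: the names `LOADINGS_2F` and `salient_factors` are hypothetical, and treating an absolute loading of at least 0.4 as "salient" simply mirrors the table note rather than any domain-assignment rule stated by the authors.

```python
# Two-factor (promax) loadings copied from Table 4: item -> (Factor 1, Factor 2)
LOADINGS_2F = {
    "Eye contact": (0.78, -0.06),
    "Facial expressions": (0.62, -0.09),
    "Gestures": (0.73, -0.21),
    "Vocalizations": (0.63, 0.24),
    "Integration of vocal and non-vocal": (0.87, 0.07),
    "Social overtures": (0.71, 0.16),
    "Social responses": (0.56, 0.31),
    "Engagement": (0.40, 0.32),
    "Play": (0.27, 0.32),
    "Unusual sensory interests": (-0.07, 0.85),
    "Hand/finger/body mannerisms": (-0.12, 0.67),
    "Repetitive interests/behaviors": (0.17, 0.55),
}

THRESHOLD = 0.4  # cut-off taken from the Table 4 note


def salient_factors(loadings, threshold=THRESHOLD):
    """Map each item to the 1-based factors whose |loading| >= threshold."""
    return {
        item: [i + 1 for i, value in enumerate(values) if abs(value) >= threshold]
        for item, values in loadings.items()
    }


result = salient_factors(LOADINGS_2F)
for item, factors in result.items():
    label = factors if factors else "no loading >= 0.4"
    print(f"{item}: {label}")
```

With these values, the first eight items reach the threshold on Factor 1 only and the three RRB items on Factor 2 only, while Play reaches it on neither factor, with similarly modest loadings on both (0.27 and 0.32).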


Fig. 2 BOSCC items, domains, and total. Note: RRB restricted, repetitive behavior/interest
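
The scoring structure summarised in Fig. 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' scoring software: the function name `score_boscc` and the list-based input format are hypothetical; each of the 12 Core items is assumed to be coded 0–5 in Segment A and Segment B; the two segment codes are assumed to be averaged per item and the averages then summed into the SC domain (items 1–8), the RRB domain (items 9–12), and the Core total (items 1–12). The example codes are invented purely for illustration and are not study data.

```python
from statistics import mean

SC_ITEMS = range(0, 8)    # items 1-8 (zero-based indices 0-7)
RRB_ITEMS = range(8, 12)  # items 9-12 (Play plus the three RRB items)


def score_boscc(segment_a, segment_b):
    """Average Segment A/B codes per item, then sum into domain and Core totals."""
    if len(segment_a) != 12 or len(segment_b) != 12:
        raise ValueError("expected 12 Core item codes per segment")
    for code in list(segment_a) + list(segment_b):
        if not 0 <= code <= 5:
            raise ValueError("item codes must lie between 0 and 5")
    averaged = [mean(pair) for pair in zip(segment_a, segment_b)]
    sc = sum(averaged[i] for i in SC_ITEMS)
    rrb = sum(averaged[i] for i in RRB_ITEMS)
    return {"SC": sc, "RRB": rrb, "Core": sc + rrb}


# Hypothetical codes for one video (not study data)
seg_a = [3, 2, 4, 3, 3, 2, 2, 1, 2, 1, 0, 1]
seg_b = [3, 3, 4, 2, 3, 2, 3, 1, 2, 0, 0, 1]
scores = score_boscc(seg_a, seg_b)
print(scores)
```

Keeping the item-to-domain mapping in two named ranges makes it easy to adjust if a different item ordering or domain assignment is used.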

play item, which cross-loaded both on factor 1 and 2, was placed in the RRB domain (items 9–12) for subsequent analyses due to item content that most closely related to play with materials rather than social aspects of play. The two domains (SC and RRB) will be referred to in subsequent analyses as well as the Core total (items 1–12; see Fig. 2). As described above, the three items related to Other Abnormal Behaviors were not included due to the rare presentation of these behaviors in this sample of children.

Primary Statistical Analyses

Inter-rater Reliability. Sums for items in the factors (domains) defined by the EFA results were calculated as well as a sum for Core items (1–12). As described above, approximately every 6th video (28 videos) was coded by multiple coders; these double codings were used to obtain estimates for inter-rater reliability by randomly selecting two coders when more than two coders (up to 7 coders) coded a video. Consensus codes (mutually agreed upon codes for multiply-coded videos) for these 28 (16%) videos were not used for inter-rater reliability. Rather, original codes from two randomly selected coders were used for this purpose. Intraclass correlations (ICCs) for inter-rater reliability on domains (SC and RRB defined from the EFA) and Core total (items 1–12) were obtained from linear mixed models (xtmixed in Stata 14). ICCs were the square root of the ratio of common variance component to the sum of common and error variances, with confidence intervals obtained using the delta-method. For the three skewed items (Sensory Interests, Hand/Finger Mannerisms, and Restricted/Repetitive Behaviors/Interests) these results should be interpreted cautiously. ICCs for individual items (summed from segment A and B) were also calculated.

Test–Retest Reliability. For estimation of test–retest reliability, a test–retest sub-sample of 40 videos from 20 individuals that were gathered on two occasions less than one month apart were randomly assigned to available coders. ICCs on domains (defined from the EFA) and the Core total were estimated as described above. ICCs were also estimated for individual items (summed from segment A and B).

Validity. To assess the validity of the BOSCC as a measure of relevant change, first, paired t tests with α = 0.05 and effect sizes of changes (Cohen’s d; the mean difference between first and last observation divided by the pooled standard deviation) were used to examine whether significant amounts of change in BOSCC and ADOS-2 CSS scores were present from the first to last observation. Second, individual change models were fitted to all the available data on each child for the BOSCC Core total (items 1–12), the VABS communication score, the MSEL receptive language score, and the ADOS-2 CSS (treated as


a 10-point ordinal scale) in order to include the multiple observations available on the same individual (see Table 2). These analyses were also conducted on the BOSCC SC (items 1–8) domain. Specifically, for each participant in turn, a linear regression was fitted and the coefficient associated with the age at assessment was used as the average rate of change score for that participant. To assist comparison for each measure we standardized the expected change over 6 months by its standard deviation at baseline. This can be thought of as the effect size (Cohen’s d) that would have been obtained using each measure had these intervention children all been followed for 6 months from baseline, and compared to a randomized control group showing no change. A comparison of these effect sizes for the ADOS-2 CSS and BOSCC was constructed using the delta-method following a multivariable regression. Third, correlations of cross-sectional and change scores across these measures were conducted to assess convergent validity. Fourth, discriminant validity and coding contamination from maternal education and family income was tested by examining their association with BOSCC scores when included as fixed predictors within a mixed effects model for the repeated BOSCCs.

Post-hoc Analyses. Given the phenotypic heterogeneity of ASD, it was expected that not all children would respond to treatments (Rogers and Vismara 2008; Spence and Thurm 2010). Therefore, responders and non-responders were identified based on changes from first to last observation on the basis of other measures of social and communication skills used as outcomes in previous studies (MSEL, VABS, ADOS; Dawson et al. 2010; Wetherby et al. 2014). First, responders were defined based on MSEL Receptive Language and, second, based on VABS Communication Standard Scores, consistent with changes observed in recent intervention trials (Dawson et al. 2010; Wetherby et al. 2014). Specifically, children who demonstrated an increase in MSEL Receptive Language standard score of ≥5 points (1/2 standard deviation) were defined as responders (n = 15) while the remaining children were defined as non-responders (n = 21). Using the VABS standard Communication score, children were defined as responders if they demonstrated an increase of ≥8 points (1/2 standard deviation; n = 16), while the remaining children were defined as non-responders (n = 15). Third, children were defined as responders if they demonstrated an ADOS-2 CSS score decrease of ≥1 point (1 standard deviation; n = 16), while the remaining children were defined as non-responders (n = 25). Convergent validity was assessed using t tests comparing the amount of change in BOSCC SC and RRB domains and Core Totals between responder and non-responder groups as defined by the above definitions on these measures.

Finally, in order to explore whether decreases in the BOSCC domain scores align with clinician’s impressions of improvement, BOSCC scores for six children participating in an early intervention trial at CADB (DeGeorge et al. in preparation) were separated into responders and non-responders based on CGI scores. Specifically, four children received CGI scores of “much improved” (responders) while two children received CGI scores of “no change” (non-responders). No statistical analyses were conducted on these six children due to small sample size.

Results

Inter-rater Reliability

The estimated inter-rater reliability from the 28 videos that were coded by multiple coders (two randomly selected coders) was excellent for SC and RRB domains, as well as for the Core Total, with ICCs ranging from 0.97, 95% CI [0.94–0.99], to 0.98, 95% CI [0.96–0.99]. ICCs of individual items (sums across segment A and B) ranged from 0.72, 95% CI [0.53–0.91] to 0.96, 95% CI [0.93–0.99] (see Supplementary Table 1).

Test–Retest Reliability

Using a sub-set of children (n = 20) with two video observations separated by less than one month (40 videos total), the estimated test–retest reliabilities (ICCs) were high: 0.89, 95% CI [0.77, 0.98], for the social-communication domain, 0.79, 95% CI [0.62, 0.96], for the RRB domain, and 0.90, 95% CI [0.81, 0.98], for the Core total. ICCs of individual items (sums across segment A and B) ranged from 0.44, 95% CI [-0.05, 0.92] to 0.89, 95% CI [0.81, 0.98] (see Supplementary Table 2).

Validity

First, results of paired t tests indicated that from first to last BOSCC observation (n = 50), statistically significant changes were found in the Core total (M = -2.53, SD = 8.01), [t(49) = 2.23, p < 0.05], corresponding to an effect size of 0.26, although changes in the separate SC and RRB domains were not statistically significant. Paired t tests from first to last ADOS-2 observation (n = 41) indicated that there were no statistically significant changes in ADOS-2 CSS (M = -0.29, SD = 1.75, d = 0.15), ADOS-2 SA CSS (M = -0.42, SD = 1.91, d = 0.21), or ADOS-2 RRB CSS (M = 0.42, SD = 1.84, d = 0.20). See Table 1 for amount of time between first and last ADOS and BOSCC observations.
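
The paired t statistic reported for the BOSCC Core total can be checked from the summary statistics alone. The sketch below assumes that the reported SD (8.01) is the standard deviation of the first-to-last difference scores; the function name is hypothetical. The effect size of 0.26 is not recomputed, because it is defined against the pooled standard deviation of the two observations, which is not given in this passage.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic and degrees of freedom from the mean and SD of
    the within-child difference scores (assumed inputs)."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# BOSCC Core total, first to last observation: M = -2.53, SD = 8.01, n = 50.
t, df = paired_t_from_summary(-2.53, 8.01, 50)

# The sign only reflects the direction of change (scores decreased);
# the text reports the magnitude.
print(f"t({df}) = {abs(t):.2f}")
```

Under that assumption the result agrees with the reported t(49) = 2.23.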


Second, results from individual growth curve models indicated that the average rate of change in the ADOS-2 CSS score over 6 months was 0.33, which corresponded to an effect size of -0.15, 95% CI [-0.44, 0.15]. The average rate of change in the BOSCC Core Total over 6 months was -4.2, corresponding to an effect size of -0.37, 95% CI [-0.73, -0.01]. Corresponding values for the BOSCC SC domain score were -3.4 with an effect size of -0.38, 95% CI [-0.81, 0.05]. Though the effect sizes were larger for the BOSCC, a comparison of the difference in effect sizes between changes in BOSCC Core Total and changes in ADOS-2 CSS indicated no statistically significant difference (p = 0.35). However, the effect size of the BOSCC Core total was statistically different from a no change alternative (p < 0.05) while the effect sizes of the ADOS-2 CSS and BOSCC SC domain were not statistically different from a no change alternative (p = 0.33 and p = 0.08, respectively).

Third, in cross-sectional correlations, the BOSCC Core total and the ADOS-2 CSS score were strongly associated (Pearson correlation of 0.48, cluster robust p < 0.001). When correlating change scores to assess convergent validity, the MSEL Receptive Language and VABS Communication Standard scores showed highly correlated change scores (r = 0.69, p < 0.001). For the ADOS-2 CSS change score, evidence for convergent validity with the MSEL Receptive Language and the VABS Communication Standard score was neither significant nor consistent, while for the BOSCC Core total, correlations were in the expected direction and, in the case of the MSEL Receptive Language, approached significance (r = -0.35, p = 0.05). The correlation of ADOS-2 CSS to change in ADOS-2 CSS was 0.28 (p = 0.08) and of the BOSCC Core total to change in BOSCC Core total was -0.37 (p = 0.08).

Fourth, results of discriminant validity analyses indicated no associations of maternal education or family income with the BOSCC social communication domain (χ²(2) = 1.94, p = 0.38), RRB (χ²(2) = 1.75, p = 0.42) domain or the BOSCC Core Total (χ²(2) = 1.53, p = 0.47). There was also no association of maternal education and family income with the ADOS-2 CSS (χ²(2) = 3.40, p = 0.18).

Post-hoc Analyses

T tests comparing the amount of change in BOSCC scores between the groups indicated that the MSEL responder group demonstrated significantly more change in the BOSCC SC domain (t(34) = 3.04, p < 0.01) and the BOSCC Core total (t(34) = 3.58, p < 0.01) than the MSEL non-responders group (see Fig. 3). Results of t tests also indicated that the VABS responder group demonstrated significantly more change in the BOSCC RRB domain (t(29) = 2.51, p < 0.05) and the BOSCC Core Total (t(29) = 2.40, p < 0.05) than the VABS non-responder group. In contrast, BOSCC domains and the BOSCC Core total did not statistically differ in the ADOS-2 CSS responder and non-responder groups (non-significant results).

As shown in Fig. 4, with the exception of the BOSCC RRB domain, from first to last time point, BOSCC scores for the CGI responders consistently decreased more than the CGI non-responders. Figure 4 is provided for illustrative purposes since no statistical analyses were conducted on these groups given the small sample size.

Discussion

Results of these initial analyses suggest that the BOSCC is a promising outcome measure that is sensitive to subtle changes in social communication behaviors over time. To our knowledge, the BOSCC is the first briefly assessed, observation-based measure of treatment response specific to a broad range of social communication behaviors. A two-factor model, consistent with other models of ASD symptoms, supporting a social communication domain separate from RRBs (Guthrie et al. 2013; Mandy et al. 2012; Shuster et al. 2014), fitted the item-data satisfactorily. The separation of the two domains allows future researchers to explore changes in social communication skills in children with social-communication impairments who do not necessarily have RRBs or meet criteria for ASD. Analyses of the psychometric properties of the BOSCC indicate that the BOSCC has excellent inter-rater reliability and high test–retest reliability, meeting recommended standards (Cunningham 2012) and consistent with other work using an earlier version of the measure (Kitzerow et al. 2015).

Results indicate that changes in BOSCC scores over a 6-month period demonstrated small to medium effect sizes, though the effect size varied a little depending on the statistical method used. Although these changes were not statistically different than the effect sizes of change seen in the ADOS-2 CSS, the effect size itself, considering the small sample size, is promising. In addition, the BOSCC scores demonstrated statistically significant changes over time while the ADOS-2 CSS scores did not, when compared to a no change alternative. The BOSCC may be more sensitive to changes in social communication behavior than the ADOS-2 CSS, and hence more successful in identifying changes in response to treatments over shorter periods of time (Brian et al. 2015; Dawson et al. 2010; Shumway et al. 2012; Thurm et al. 2015). Additional work is clearly needed to confirm this hypothesis.

This work is the first indication that the BOSCC has convergent validity with social communication changes


Fig. 3 Responder groups defined by MSEL, VABS, or ADOS-2 in early intervention studies. Note *p < 0.05, **p < 0.01; ADOS-2 Autism Diagnostic Observation Schedule, 2nd Edition, BOSCC Brief Observation of Social Communication Change, CSS ADOS Calibrated Severity Score, MSEL Mullen Scales of Early Learning, n.s. not significant, RRB Restricted and Repetitive Behaviors BOSCC domain, SC Social Communication BOSCC domain, VABS Vineland Adaptive Behavior Scales

Fig. 4 Responder groups defined by Clinical Global Impression (CGI) in community-based intervention study. BOSCC Brief Observation of Social Communication Change, CGI Clinical Global Impression-Improvement, RRB Restricted and Repetitive Behaviors BOSCC domain, SC Social Communication BOSCC domain

seen in other measures, including a caregiver report measure (VABS) and a standardized cognitive measure (MSEL). There is also some preliminary evidence of convergent validity with a clinician’s impression of improvement (CGI) in a very small sample. The BOSCC Core total and the ADOS-2 CSS score were highly correlated with each other, although there was not a significant correlation between change in the BOSCC Core total and change in the ADOS-2 CSS score. These findings suggest that the BOSCC may be measuring behaviors, especially


subtle behaviors that are improving, differently than the ADOS-2 CSS. Alternatively, this finding may be related to the limited range of change or limited range of scores overall in the ADOS-2 CSS scores, consistent with other studies (Dawson et al. 2010).

In contrast, correlations of BOSCC change scores with changes in receptive language (MSEL) and communication skills (VABS) were, though not statistically significant, in the expected direction, indicating some evidence for convergent validity. The lack of a statistically significant correlation with changes in the MSEL or VABS is not discouraging because we would not necessarily expect the BOSCC to correlate highly with the MSEL and VABS; the BOSCC is a more global measure of social-communicative skills than either the MSEL Receptive Language domain or the VABS Communication domain. Yet, when children were defined as either responders or non-responders based on the VABS and MSEL, BOSCC scores decreased significantly more in responders than non-responders. This further suggests that although correlations of the BOSCC and other measures of communication did not reach statistical significance, the possibility is good for some convergence with measures of change when samples are larger.

It should be noted that, despite the significant correlation between the BOSCC and ADOS-2 CSS score, the BOSCC is not intended to be a measure of diagnostic classification. Rather, the BOSCC was developed to capture nuanced social communication behaviors that may change over relatively brief periods of time. This distinction is important to prevent misuse of this new measure. This also highlights that BOSCC scores at any single time point are only meaningful in relation to another time point; a BOSCC score at one time point cannot stand on its own.

When considering the importance of the two BOSCC domains, improvements (decreases) in the BOSCC Core total (items 1–12, combining social communication and RRB domains) most consistently converged with improvement (increases) in other standard measures of communication (VABS, MSEL), while changes in the separate BOSCC SC and RRB domains were less consistent. Although the separate SC and RRB domains may prove useful in non-ASD populations or when assessing change specific to one domain, this work suggests that the BOSCC Core total may be the most appropriate domain to identify improvement in young, minimally verbal children with ASD. This needs to be confirmed in future work with larger samples.

Of note, only three items on the BOSCC attempt to capture RRB behaviors across a continuum. Item distributions indicated that obtaining a continuum for these behaviors was challenging. It may be that these behaviors are either clearly present or not present at all (with little variation in between) or that subtle variations in these behaviors are difficult to capture, especially within a 5-min time frame. Though still adequate, the RRB domain score demonstrated lower test–retest reliability than the SC domain, consistent with earlier iterations of the BOSCC (Kitzerow et al. 2015) and the ADOS, from which initial drafts of these items were developed. As mentioned, it was the BOSCC Core total (combining SC and RRB domains) that was most successful in identifying changes, indicating the importance of these behaviors in combination with the SC behaviors, at least in this ASD sample. Perhaps this is a result of the strong relationship between these domains in the ASD population (Richler et al. 2010). The BOSCC RRB domain may not prove to be a useful domain in which to measure change on its own but additional studies are needed. In the meantime, it may be helpful to use other measures of RRB behaviors to complement the BOSCC, such as the Repetitive Behavior Scale-Revised (RBS-R; Lam and Aman 2007). Although there are biases with the reliance on such a likely unblinded caregiver response measure, concordance with the BOSCC may prove useful in both providing validity for the BOSCC and in confirming the presence of meaningful change in caregiver reports.

In line with the goals for development, research assistants can reliably code the BOSCC (coding does not require a highly experienced or credentialed coder), unlike other commonly used measures (Bolte and Diehl 2013). In fact, our group has been successful at training several undergraduate-level research assistants as well as one highly motivated high school student to code the BOSCC reliably. Since the BOSCC measures changes within an individual, high levels of agreement amongst coders in a coding team are particularly crucial, though agreement across sites is less important (unlike reliability training for the ADOS, for example). The high inter-rater agreements in our group suggest that this level of agreement is possible across a coder’s level of experience, though one experienced coder (a child psychiatry fellow) in our group tended to consistently under-score behaviors in the RRB domain. The BOSCC may initially be more challenging for someone who has more advanced training or experience, particularly in a specific framework, though this remains to be thoroughly explored. Also in line with goals of development, the BOSCC does not rely on caregiver report of symptoms, minimizing measurement bias (Anagnostou et al. 2015; Bolte and Diehl 2013; Guastella et al. 2015) and allowing truly “blinded” coding. In addition, the BOSCC’s minimally structured, naturalistic context places little demand on administration and contributes to the measure’s ecological validity. Through the use of video coding, coders have more time to consider behaviors without the pressure of assessing every behavior quickly,


as in live coding situations. Despite the advantage of video coding, our group eventually aims to explore the utility of the BOSCC in live coding situations, as this method would not require video cameras or adequate audio/visual recordings.

Given the subtlety of social communication behaviors that the BOSCC measures, it is currently recommended that each BOSCC video segment (5 min) be viewed twice and the second set of codes be considered final for interpretation. This method takes approximately 30 min per video. Although our group found little difference in averaged totals between the first and second set of codes (data not presented), changes at the item level were present. In addition, coders reported having more confidence in their coding after their second viewing, suggesting the need to continue this practice.

Another aspect to consider in relation to the BOSCC is that any changes in a child’s behavior during an interaction with a caregiver must be considered in light of changes in the caregiver’s behavior. Parent–child interaction is often described as bi-directional: the child’s behaviors impact the parent and vice versa (Ginn et al. 2015; Rutgers et al. 2004; Siller and Sigman 2008; Slaughter and Ong 2014; Zhou and Yi 2014). A recent parent-focused intervention study found that changes in ASD symptoms, as measured by the ADOS-2 CSS, were mediated by parental synchrony (Pickles et al. 2015). Similarly, work has also shown that children’s language development may be influenced by a parent’s responsiveness during play interactions (Siller and Sigman 2008). Another study found a high correlation between the quality of the parent–child interaction and the child’s ASD severity (using the ADOS-2 CSS; Hobson et al. 2015). Our study did not assess whether the caregiver’s behavior significantly impacted the child’s BOSCC scores or if the child’s severity of ASD or other behaviors impacted the caregiver’s behavior. Given these potential confounds, some researchers may choose to have an examiner who is blind to the child’s treatment status interact with the child during the BOSCC. If the caregiver is chosen as a BOSCC partner, researchers should consider collecting additional measures of generalization and/or caregiver behaviors that may contribute to observed changes in the child’s behavior (Pickles et al. 2015). Previous work has emphasized the importance of the context in which changes are assessed (Yoder et al. 2013), therefore whichever social and environmental context is chosen for the BOSCC observation, the context should be as consistent as possible (e.g., same play partner, same materials, same location) in order to ensure the validity of the observations gathered. At the same time, measures that go beyond a single context are clearly necessary to ensure generalization of skills gained in treatment. Future work assessing the impact of the context in which the BOSCC is gathered is clearly warranted.

Although the initial results of the BOSCC are promising, they should be interpreted in light of several limitations of this project, including the small sample size. This study focuses on a sample of 56 young children with ASD, with even smaller samples of children with multiple observations of other measures (e.g., VABS, MSEL, ADOS-2) used for convergent validity. Our small sample also did not allow for analyses of differences by sex, race, or ethnicity. All children included in this paper used simple phrase speech or less and the majority had completed the Toddler Module or Module 1 of the ADOS-2. A subsample of eight children completed Module 2 of the ADOS-2. It may be that this version of the BOSCC coding scheme is not maximally effective at capturing change in this more verbal (Module 2, phrase speech) group. Future work will address whether modifications to the BOSCC coding scheme are necessary to capture adequate change in children using phrase speech. Also, test–retest reliability ICCs were high in domain and total scores, but there was some variability amongst item-level reliabilities. Though we do not recommend the use of BOSCC items individually, it is possible that one month between observations may be too long to adequately assess test–retest reliability on the BOSCC.

In addition, this paper did not explore the effects of specific treatment or control conditions. We hope to expand this work to a larger sample comparing different treatment conditions, employing the BOSCC as an independent measure of treatment response. It is also important to consider the limited endorsement of other abnormal behaviors in this context of free-play with a parent. Nevertheless, other researchers may want to consider these items in future analyses; these behaviors may impact social communication and RRB behaviors captured in other codes or be more common in other contexts. These behaviors may also be useful in determining whether the BOSCC observation is a valid representation of the child’s behavior.

Although this study focused on a sample of children with diagnoses of ASD, future work should also address whether the BOSCC can capture changes in children with social communication deficits who do not have ASD (e.g., social/pragmatic communication disorder, social anxiety disorder). Our group is also working on several lines of research related to the development of the BOSCC, including applying the BOSCC to school-age children who have limited speech and expanding the BOSCC to individuals with verbal fluency. Researchers outside our group have successfully applied an earlier version of the BOSCC to segments of ADOS-2 videos in a small sample (Kitzerow et al. 2015). We aim to confirm the validity of this method in future research, which would allow researchers


to explore pre- and post-treatment ADOS-2 videos from previously collected data.

Our ongoing work and the work of other researchers (Fletcher-Watson et al. 2015; Kitzerow et al. 2015) will continue to provide larger samples across multiple sites in order to contribute to our continued understanding of the value and limitations of the BOSCC. Because the BOSCC is new and additional testing of its ability to capture meaningful change needs to be completed, we recommend that the BOSCC be used in combination with other measures of change. This is consistent with recommendations from other researchers endorsing multiple means of assessing treatment outcome (Cunningham 2012). The utility of the BOSCC, which measures broad ASD symptoms, may be strengthened when used in combination with other measures that assess more specific behaviors (e.g., joint attention) in detail. Also, the BOSCC may be useful in clarifying potential placebo effects often found in caregiver reports, allowing for more effective use of parent report measures. As the field focuses efforts on finding appropriate outcome measures for longitudinal studies and randomized controlled trials, we look forward to the continued validation of measures such as the BOSCC that will hopefully provide unique, objective observational data to aid in assessing the efficacy and course of treatments aimed at improving social communication skills.

Acknowledgments This work was supported by a Dennis Weatherstone Predoctoral Fellowship from Autism Speaks and a Graduate Student fellowship with Weill Cornell Medical College and Teachers College, Columbia University awarded to author R.G. Work for this project was also supported by Grants awarded to author C.L. from NIMH (R01MH081757, 1RC1MH089721, R01RFAMH14100, R01MH078165), Autism Speaks (5766), and HRSA (UA3MC11055) and author K.M. from Marilyn and James Simons Family Giving and a NIH T32 (5T32MH016434-35). In addition, this work was partially funded by the UK National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the UK NHS, the NIHR or the Department of Health. The authors would like to sincerely thank Catherine Dick, Kyle Frost, Michelle Heyman, Natalie Hong, and Sophie Manevich for assistance with data coding and Sheri Stegall at Western Psychological Services for copyright assistance.

Author Contribution R.G. participated in study conceptualization, measure development, data coding, analysis, interpretation, and manuscript preparation. T.C. and C.C. participated in conceptualization and development of the Brief Observation of Social Communication Change (BOSCC). K.M. and S.D. participated in study conceptualization, measure development and data coding. A.P. assisted with data analyses and interpretation. C.L. participated in study conceptualization, measure development, data analysis and interpretation, and manuscript preparation.

Funding This study was funded by NIMH (R01MH081757, 1RC1MH089721, R01RFAMH14100, R01MH078165), NIH (5T32MH016434-35), Autism Speaks (5766, 9650), HRSA (UA3MC11055), a Graduate Student fellowship with Weill Cornell Medical College and Teachers College, Columbia University, Marilyn and James Simons Family Giving, and the UK National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the UK NHS, the NIHR or the Department of Health.

Compliance with Ethical Standards

Conflict of interest C.L. receives royalties from the sale of the ADI-R and the ADOS-2. All royalties related to the research were donated to a non-profit organization. No other authors have conflicts of interest with regard to this study.

Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent Informed consent was obtained from all individual participants included in the study.

References

Anagnostou, E., Jones, N., Huerta, M., Halladay, A. K., Wang, P., Scahill, L., et al. (2015). Measuring social communication behaviors as a treatment endpoint in individuals with autism spectrum disorder. Autism, 19(5), 622–636. doi:10.1177/1362361314542955.
Anderson, D. K., Liang, J. W., & Lord, C. (2014). Predicting young adult outcome among more and less cognitively able individuals with autism spectrum disorders. Journal of Child Psychology and Psychiatry, 55(5), 485–494. doi:10.1111/jcpp.12178.
Bishop, S. L., Guthrie, W., Coffing, M., & Lord, C. (2011). Convergent validity of the mullen scales of early learning and the differential ability scales in children with Autism Spectrum Disorders. American Association on Intellectual and Developmental Disabilities, 116(5), 331–343. doi:10.1352/1944-7558-116.5.331.
Bolte, E. E., & Diehl, J. J. (2013). Measurement tools and target symptoms/skills used to assess treatment response for individuals with autism spectrum disorder. Journal of Autism and Developmental Disorders, 43(11), 2491–2501. doi:10.1007/s10803-013-1798-7.
Boomsma, A., Van Lang, N. D., De Jonge, M. V., De Bildt, A. A., Van Engeland, H., & Minderaa, R. B. (2008). A new symptom model for autism cross-validated in an independent sample. Journal of Child Psychology and Psychiatry, 49(8), 809–816. doi:10.1111/j.1469-7610.2008.01897.x.
Brian, J., Smith, I., Zwaigenbaum, L., Roberts, W., & Bryson, S. (2015). The social ABCs caregiver-mediated intervention for toddlers with autism spectrum disorder: feasibility, acceptability, and evidence of promise from a multisite study. Autism Research. doi:10.1002/aur.1582.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Busner, J., & Targum, S. D. (2007). The Clinical Global Impression Scale. Psychiatry, 4(7), 28–37.
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66(1), 7–18.

Cunningham, A. (2012). Measuring change in social interaction skills Hobson, J. A., Tarver, L., Beurkens, N., & Peter Hobson, R. (2015).
of young children with autism. Journal of Autism and Develop- The relation between severity of autism and caregiver–child
mental Disorders, 42, 593–605. interaction: a study in the context of relationship development
Danial, J. T., & Wood, J. J. (2013). Cognitive behavioral therapy for intervention. Journal of Abnormal Child Psychology,. doi:10.
children with autism: review and considerations for future 1007/s10802-015-0067-y.
research. Journal of Developmental and Behavioral Pediatrics, Hus, V., Bishop, S., Gotham, K., Huerta, M., & Lord, C. (2013).
34(9), 702–715. doi:10.1097/DBP.0b013e31829f676c. Factors influencing scores on the social responsiveness scale.
Dawson, G., Rogers, S., Munson, J., Smith, M., Winter, J., Greenson, Journal of Child Psychology and Psychiatry, 54(2), 216–224.
J., et al. (2010). Randomized, controlled trial of an intervention doi:10.1111/j.1469-7610.2012.02589.x.
for toddlers with autism: the Early Start Denver Model. Kaale, A., Smith, L., & Sponheim, E. (2012). A randomized controlled
Pediatrics, 125(1), e17–e23. doi:10.1542/peds.2009-0958. trial of preschool-based joint attention intervention for children
DeGeorge, A., Dufek, S., & Lord, C. (in preparation). Effects of with autism. Journal of Child Psychology and Psychiatry, 53(1),
parent mediated coaching versus psyhoeducation on social 97–105. doi:10.1111/j.1469-7610.2011.02450.x.
communication development of underserved young children Kasari, C., Gulsrud, A., Freeman, S., Paparella, T., & Hellemann, G.
with autism: A pilot study. (2012). Longitudinal follow-up of children with autism receiving
Elliot, C. D. (2007). Differential ability scales—second edition. New targeted interventions on joint attention and play. Journal of the
York: Harcourt Brace Jovanovich. American Academy of Child and Adolescent Psychiatry, 51(5),
Esler, A. N., Bal, V. H., Guthrie, W., Wetherby, A., Ellis Weismer, S., 487–495. doi:10.1016/j.jaac.2012.02.019.
& Lord, C. (2015). The autism diagnostic observation schedule, Kasari, C., Gulsrud, A., Wong, C., Kwon, S., & Locke, J. (2010).
toddler module: standardized severity scores. Journal of Autism Randomized controlled caregiver mediated joint engagement
and Developmental Disorders, 45(9), 2704–2720. doi:10.1007/ intervention for toddlers with autism. Journal of Autism and
s10803-015-2432-7. Developmental Disorders, 40(9), 1045–1056. doi:10.1007/s1080
Estes, A., Munson, J., Rogers, S., Greenson, J., Winter, J., & Dawson, 3-010-0955-5.
G. (2015). Long-term outcomes of early intervention in 6-year- Kim, S. H., & Lord, C. (2010). Restricted and repetitive behaviors in
old children with autism spectrum disorder. Journal of the toddlers and preschoolers with autism spectrum disorders based
American Academy of Child and Adolescent Psychiatry, 54(7), on the Autism Diagnostic Observation Schedule (ADOS).
580–587. Autism Research, 3(4), 162–173. doi:10.1002/aur.142.
Fletcher-Watson, S., & McConachie, H. (2015). The search for an Kitzerow, J., Teufel, K., Wilker, C., & Freitag, C. M. (2015). Using
early intervention outcome measurement tool in autism. Focus the brief observation of social communication change (BOSCC)
on Autism and Other Developmental Disabilities. doi:10.1177/ to measure autism-specific development. Autism Research,.
1088357615583468. doi:10.1002/aur.1588.
Fletcher-Watson, S., Petrou, A., Scott-Barrett, J., Dicks, P., Graham, Lam, K. S., & Aman, M. G. (2007). The Repetitive Behavior Scale-
C., O’Hare, A., et al. (2015). A trial of an iPad intervention Revised: Independent validation in individuals with autism
targeting social communication skills in children with autism. spectrum disorders. Journal of Autism and Developmental
Autism. doi:10.1177/1362361315605624. Disorders, 37(5), 855–866. doi:10.1007/s10803-006-0213-z.
Ginn, N. C., Clionsky, L. N., Eyberg, S. M., Warner-Metzger, C., & Levy, S., Mandell, D., & Schultz, R. (2009). Autism. Lancet, 374,
Abner, J. P. (2015). Child-directed interaction training for young 1627–1638.
children with autism spectrum disorders: Parent and child Lord, C., Bishop, S., & Anderson, D. (2015). Developmental
outcomes. Journal of Clinical Child and Adolescent Psychol- trajectories as autism phenotypes. American Journal of Medical
ogy,. doi:10.1080/15374416.2015.1015135. Genetics Part C: Seminars in Medical Genetics, 169(2),
Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing ADOS 198–208. doi:10.1002/ajmg.c.31440.
scores for a measure of severity in autism spectrum disorders. Lord, C., Luyster, R. J., Gotham, K., & Guthrie, W. (2012a). Autism
Journal of Autism and Developmental Disorders, 39(5), Diagnostic Observation Schedule, second edition (ADOS-2) toddler
693–705. doi:10.1007/s10803-008-0674-3. module. Los Angeles, CA: Western Psychological Services.
Gotham, K., Pickles, A., & Lord, C. (2012). Trajectories of autism Lord, C., Luyster, R., Guthrie, W., & Pickles, A. (2012b). Patterns of
severity in children using standardized ADOS scores. Pediatrics, developmental trajectories in toddlers with autism spectrum
130(5), e1278–e1284. doi:10.1542/peds.2011-3668. disorder. Journal of Consulting and Clinical Psychology, 80(3),
Green, J., Charman, T., McConachie, H., Aldred, C., Slonims, V., 477–489. doi:10.1037/a0027214.
Howlin, P., et al. (2010). Parent-mediated communication- Lord, C., Rutter, M., & Couteur, A. (1994). Autism diagnostic
focused treatment in children with autism (PACT): A ran- interview-revised: A revised version of a diagnostic interview for
domised controlled trial. Lancet, 375(9732), 2152–2160. doi:10. caregivers of individuals with possible pervasive developmental
1016/S0140-6736(10)60587-9. disorders. Journal of Autism and Developmental Disorders,
Guastella, A. J., Gray, K. M., Rinehart, N. J., Alvares, G. A., Tonge, 24(5), 659–685. doi:10.1007/BF02172145.
B. J., Hickie, I. B., et al. (2015). The effects of a course of Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop,
intranasal oxytocin on social behaviors in youth diagnosed with S. L. (2012c). Autism Diagnostic Observation Schedule, second
autism spectrum disorders: a randomized controlled trial. edition (ADOS-2) modules 1–4. Los Angeles, CA: Western
Journal of Child Psychology and Psychiatry, 56(4), 444–452. Psychological Services.
doi:10.1111/jcpp.12305. Mandy, W. P., Charman, T., & Skuse, D. H. (2012). Testing the
Guthrie, W., Swineford, L. B., Wetherby, A. M., & Lord, C. (2013). construct validity of proposed criteria for DSM-5 autism
Comparison of DSM-IV and DSM-5 factor structure models for spectrum disorder. Journal of the American Academy of Child
toddlers with autism spectrum disorder. Journal of the American and Adolescent Psychiatry, 51(1), 41–50. doi:10.1016/j.jaac.
Academy of Child and Adolescent Psychiatry, 52(8), 797–805 2011.10.013.
e792. doi:10.1016/j.jaac.2013.05.004. Matson, J. (2007). Determining treatment outcome in early interven-
Gutstein, S., Burgess, A., & Montfort, K. (2007). Evaluation of the tion programs for autism spectrum disorders: A critical analysis
relationship development intervention program. Autism, 11(5), of measurement issues in learning based interventions. Research
397–411. in Developmental Disabilities, 28, 207–218.

McConachie, H., Parr, J., Glod, M., Hanratty, J., Livingstone, N., disorders. Journal of Autism and Developmental Disorders,
Oono, I., et al. (2015). Systematic review of tools to measure 44(1), 90–110. doi:10.1007/s10803-013-1854-3.
outcomes for young children with autism spectrum disorder. Siller, M., & Sigman, M. (2008). Modeling longitudinal change in the
Health Technology Assessment, 19(41), 1–506. language abilities of children with autism: Parent behaviors and
Mullen, E. M. (1995). Mullen scales of early learning. Circle Pines, child characteristics as predictors of change. Developmental
MN: American Guidance Service. Psychology, 44(6), 1691–1704.
Muthen, B., & Muthen, L. (1998–2012). Mplus User’s Guide. (7th Slaughter, V., & Ong, S. S. (2014). Social behaviors increase more
ed.). Los Angeles, CA: Muthen and Muthen. when children with ASD are imitated by their mother versus an
Owley, T., McMahon, W., Cook, E. H., Laulhere, T., South, M., unfamiliar adult. Autism Research, 7(5), 582–589. doi:10.1002/
Mays, L. Z., et al. (2001). Multisite, double-blind, placebo- aur.1392.
controlled trial of porcine secretin in autism. Journal of the Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2005). Vineland
American Academy of Child and Adolescent Psychiatry, 40(11), adaptive behavior scales, (Vineland-II). Circle Pines, MN:
1293–1299. doi:10.1097/00004583-200111000-00009. American Guidance Services.
Pickles, A., Harris, V., Green, J., Aldred, C., McConachie, H., Spence, S., & Thurm, A. (2010). Testing autism interventions: Trials
Slonims, V., et al. (2015). Treatment mechanism in the MRC and tribulations. Lancet, 375, 2124–2125.
preschool autism communication trial: Implications for study Spreckley, M., & Boyd, R. (2009). Efficacy of applied behavioral
design and parent-focussed therapy for children. Journal of intervention in preschool children with autism for improving
Child Psychology and Psychiatry, 56(2), 162–170. doi:10.1111/ cognitive, language, and adaptive behavior: A systematic review
jcpp.12291. and meta-analysis. Journal of Pediatrics, 154, 338–344.
Richler, J., Huerta, M., Bishop, S. L., & Lord, C. (2010). Develop- Thurm, A., Manwaring, S. S., Swineford, L., & Farmer, C. (2015).
mental trajectories of restricted and repetitive behaviors and Longitudinal study of symptom severity and language in
interests in children with autism spectrum disorders. Develop- minimally verbal children with autism. Journal of Child
ment and Psychopathology, 22, 55–69. doi:10.1017/S095457940 Psychology and Psychiatry, 56(1), 97–104. doi:10.1111/jcpp.
9990265. 12285.
Rogers, S. J., Estes, A., Lord, C., Vismara, L., Winter, J., Fitzpatrick, van Lang, N. D., Boomsma, A., Sytema, S., de Bildt, A. A., Kraijer,
A., et al. (2012). Effects of a brief Early Start Denver model D. W., Ketelaars, C., & Minderaa, R. B. (2006). Structural
(ESDM)-based parent intervention on toddlers at risk for autism equation analysis of a hypothesised symptom model in the
spectrum disorders: a randomized controlled trial. Journal of the autism spectrum. Journal of Child Psychology and Psychiatry,
American Academy of Child and Adolescent Psychiatry, 51(10), 47(1), 37–44. doi:10.1111/j.1469-7610.2005.01434.x.
1052–1065. doi:10.1016/j.jaac.2012.08.003. Wetherby, A. M., Guthrie, W., Woods, J., Schatschneider, C., Holland,
Rogers, S., & Vismara, L. (2008). Evidence-based comprehensive R. D., Morgan, L., & Lord, C. (2014). Parent-implemented social
treatments for early autism. Journal of Clinical Child and intervention for toddlers with autism: an RCT. Pediatrics, 134(6),
Adolescent Psychology, 37(1), 8–38. doi:10.1080/15374410701 1084–1093. doi:10.1542/peds.2014-0757.
817808. Wolery, M., & Garfinkle, A. (2002). Measures in intervention
Rutgers, A. H., Bakermans-Kranenburg, M. J., van Ijzendoorn, M. H., research with young children who have autism. Journal of
& van Berckelaer-Onnes, I. A. (2004). Autism and attachment: A Autism and Developmental Disorders, 32(5), 463–478.
meta-analytic review. Journal of Child Psychology and Psychi- Yoder, P., Bottema-Beutel, K., Woynaroski, T., Chandrasekhar, R., &
atry, 45(6), 1123–1134. doi:10.1111/j.1469-7610.2004.t01-1- Sandbank, M. (2013). Social communication intervention effects
00305.x. vary by dependent variable type in preschoolers with autism
Sallows, G., & Gaupner, T. (2005). Intensive behavioral treatment for spectrum disorders. Evidence-Based Communication Assessment
children with autism: Four-year outcome and predictors. Amer- and Intervention, 7(4), 150–174.
ican Journal on Mental Retardation, 110(6), 417–438. Yoder, P., Woynaroski, T., Fey, M., & Warren, S. (2014). Effects of
Scahill, L., Aman, M. G., Lecavalier, L., Halladay, A. K., Bishop, S. dose frequency of early communication intervention in young
L., Bodfish, J. W., et al. (2015). Measuring repetitive behaviors children with and without Down syndrome. American Journal on
as a treatment endpoint in youth with autism spectrum disorder. Intellectual and Developmental Disabilities, 119(1), 17–32.
Autism, 19(1), 38–52. doi:10.1177/1362361313510069. doi:10.1352/1944-7558-119.1.17.
Shumway, S., Farmer, C., Thurm, A., Joseph, L., Black, D., & Zhou, T., & Yi, C. (2014). Parenting styles and parents’ perspectives
Golden, C. (2012). The ADOS calibrated severity score: on how their own emotions affect the functioning of children
Relationship to phenotypic variables and stability over time. with autism spectrum disorders. Family Process, 53(1), 67–79.
Autism Research, 5(4), 267–276. doi:10.1002/aur.1238. doi:10.1111/famp.12058.
Shuster, J., Perry, A., Bebko, J., & Toplak, M. E. (2014). Review of
factor analytic studies examining symptoms of autism spectrum

Journal of Autism & Developmental Disorders is a copyright of Springer, 2016. All Rights
Reserved.
