A Pilot Study Examining the Test–Retest and Internal Consistency Reliability of the ABLLS-R
Brief Article
Journal of Psychoeducational Assessment, 1–6
© The Author(s) 2016
Reprints and permissions: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0734282916678348
jpa.sagepub.com
Abstract
The literature contains a variety of assessment tools for measuring the skills of individuals with
autism or other developmental delays, but most lack adequate empirical evidence supporting
their reliability and validity. The current pilot study sought to examine the reliability of scores
obtained from the Assessment of Basic Language and Learning Skills–Revised (ABLLS-R). Two
forms of reliability were measured: internal consistency and test–retest reliability. Analyses
using data obtained from neuro-typical children (N = 50) yielded strong evidence of internal
consistency and test–retest reliability. These preliminary findings suggest that the ABLLS-R can
yield reliable scores.
Keywords
autism, assessments, ABLLS-R, reliability, psychometric properties
Numerous practitioners (e.g., educators, board certified behavior analysts, and speech and language
pathologists) provide services to children with autism. Best practice requires practitioners to
obtain an accurate assessment of the strengths and skill deficits of each child to identify appropriate
teaching strategies (Guldberg, 2010). Furthermore, professional organizations such as the
American Educational Research Association (1999) and federal law (i.e., Individuals With
Disabilities Education Act, 2004) call for the use of valid and reliable assessments. Although
many of the comprehensive assessment tools available to professionals who serve children with
autism possess strong clinical significance, most lack sufficient empirical validation of their
psychometric properties. In contrast, other, norm-referenced assessments (e.g., Vineland II;
Sparrow, Cicchetti, & Balla, 2005) assess only a limited range of skills but have empirical
support for their validity and reliability. An ideal assessment would both yield a thorough review
of skills and have empirical support for its psychometric properties.
The Assessment of Basic Language and Learning Skills–Revised (ABLLS-R; Partington, 2010)
is an internationally recognized, criterion-referenced assessment tool that provides a comprehensive
review of 544 skills from 25 skill areas (i.e., repertoires) including language, social interaction,
self-help, academic, and motor skills. Leading researchers and professional organizations have identified
the ABLLS-R as a useful tool to guide teaching language and other critical learner skills to children
with autism (American Medical Association, 2014; Thompson, 2011). Despite its widespread popularity,
the ABLLS-R has limited empirical support for its psychometric properties.
Corresponding Author:
James W. Partington, Behavior Analysts, Inc., 309 Lennon Lane, #104, Walnut Creek, CA 94598, USA.
Email: partington@behavioranalysts.com
Currently, only one research report has examined the reliability and validity of the ABLLS-R (Usry,
2015). Usry asked experts in the field of autism treatment to rate the extent to which each ABLLS-R
item accurately measured the construct of interest. In addition, a second panel of experts who had
never previously conducted an ABLLS-R assessment watched a video and scored the performance
of an individual with autism on 86 ABLLS-R items. The results yielded evidence of good content
validity and interrater reliability. The assessment literature would benefit from a further examination
of the psychometric properties of the ABLLS-R. The current pilot study aimed to measure
other forms of reliability, including internal consistency and test–retest reliability.
Method
Participants and Setting
Parents and professionals who received training on the administration of the ABLLS-R from the
first author volunteered to collect data from their typically developing children. All assessors (N
= 50; 48 women and two men) had received at least a bachelor’s degree (74% had a master’s
degree or higher), and 80% of the assessors had administered the ABLLS-R prior to
the study. The number of participants whose data were used in each analysis varied due to our
data analysis methodology. Our initial participant sample (N = 53) was reduced to only include
participants whose data were used in one or more of the analyses (N = 50; 29 girls and 21 boys,
age range: 6 to 72 months). Most participants scored within the average range of adaptive functioning
per their Adaptive Behavior Composite Score (M = 110.54, range = 87–131) obtained
from the Vineland II questionnaire (Sparrow et al., 2005), and all parents reported their children
as healthy and as having neither mental health disorders nor learning disabilities. We did not offer
compensation for participation, but all participants received free access to their online ABLLS-R
account (i.e., WebABLLS) for the duration of the study.
Materials
Participants entered data using their WebABLLS account. This criterion-referenced assessment
consists of 544 items from 25 repertoires (e.g., Labeling and Requesting), with each repertoire
containing between six and 57 items. For example, the Labeling repertoire contains 47 items,
which assess how well one can label objects, actions, adjectives, and so on. Depending upon the
skills of the participant, one could score between 0 and 4 points per item based upon the scoring
criteria specified in the assessment.
A completed ABLLS-R grid allows one to calculate a total score and the percentage of possible
points for each repertoire. To calculate the total score for each repertoire, add the points
accumulated from each of the specific items within the repertoire. For example, adding scores
obtained from items F1 to F29 yields a total score for the Requesting repertoire with a maximum
score of 74. To obtain a percentage of possible points, divide the total score obtained for a repertoire
by the maximum points possible for that repertoire and multiply the quotient by 100.
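The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names and the item scores are hypothetical, and the only value taken from the text is the Requesting maximum of 74 points.

```python
def repertoire_total(item_scores):
    """Sum the points earned on every scored item in one repertoire."""
    return sum(item_scores.values())


def percent_of_possible(total, max_points):
    """Divide the total by the maximum possible points and multiply by 100."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    return total / max_points * 100


# Hypothetical scores for four Requesting items; a real grid would hold
# every item from F1 to F29, each scored against the ABLLS-R criteria.
requesting_scores = {"F1": 2, "F2": 4, "F3": 1, "F4": 0}
REQUESTING_MAX = 74  # maximum score for the Requesting repertoire (from the text)

total = repertoire_total(requesting_scores)
percent = percent_of_possible(total, REQUESTING_MAX)
print(f"Requesting: {total}/{REQUESTING_MAX} points ({percent:.1f}%)")
```

Repeating the same two steps for each of the 25 repertoires would give the per-repertoire percentages used in the analyses.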
Procedure
All participants provided informed consent and demographic information prior to receiving
instructions on the data collection process. Parents or other professionals who both knew the
child and received training on administering the ABLLS-R carried out the assessment in the
home setting. The data-collection period was between January 2007 and May 2013. Data collection
occurred as early as 6 months of age and at each 3-month interval thereafter through 72
months. Participants could begin collecting data at any time once their child reached the next
3-month stage in development (e.g., a parent of a 34-month-old child would begin collecting data
at age 36 months). Although we encouraged data collection through 6 years of age, participants
could terminate data collection at their discretion.
We asked participants to update their WebABLLS grids at each 3-month stage in development
(i.e., 6 months, 9 months, 12 months, and so on), within the 2-week window running from 1 week
before to 1 week after the day of the 3-month mark. We informed all participants that our data
analyses would use only fully completed ABLLS-R grids submitted within that window.
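As a rough illustration of the scheduling rule, the sketch below finds the next 3-month mark for a child of a given age and checks whether a grid update falls inside the 2-week window. The function names are hypothetical, ages are assumed to be whole months, and the window is assumed to mean 7 days on either side of the mark; the study does not describe any software used for this step.

```python
from datetime import date

FIRST_MARK, LAST_MARK, STEP = 6, 72, 3  # months, as described in the procedure


def next_mark(age_months):
    """Return the next 3-month mark at or after the given age (whole months).

    Returns None once the child is older than the last mark (72 months).
    """
    if age_months > LAST_MARK:
        return None
    if age_months <= FIRST_MARK:
        return FIRST_MARK
    return -(-age_months // STEP) * STEP  # round up to a multiple of 3


def within_window(mark_day, entry_day, half_width_days=7):
    """True if the grid was updated within 7 days before or after the mark.

    The 7-day half-width is an assumption about the 2-week window.
    """
    return abs((entry_day - mark_day).days) <= half_width_days


print(next_mark(34))  # a 34-month-old starts at the 36-month mark
print(within_window(date(2010, 3, 15), date(2010, 3, 20)))
```

Under these assumptions, a grid entered 10 days after the mark would fall outside the window and be excluded from the analyses.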
Data Analysis
Internal consistency reliability. We measured the internal consistency reliability of the ABLLS-R
scores by computing a Cronbach’s alpha coefficient for each repertoire. The age selected for
computing alpha for each repertoire minimized the number of items with no variance (i.e., all
children at 0% or 100% for any given item). Researchers generally consider coefficients greater
than .70 as “acceptable,” greater than .80 as “good,” and greater than .90 as “excellent” (George
& Mallery, 2003).
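For readers unfamiliar with the statistic, here is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The data are invented, the function names are hypothetical, and the use of sample variances (n - 1 denominator) is an assumption; the article does not describe the computation at this level of detail.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix.

    rows: one list of item scores per child; every list has the same length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two children")
    item_vars = [variance(column) for column in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def describe(alpha):
    """Label alpha using the three cutoffs quoted above (George & Mallery, 2003)."""
    if alpha > 0.90:
        return "excellent"
    if alpha > 0.80:
        return "good"
    if alpha > 0.70:
        return "acceptable"
    return "below .70"  # labels under .70 are not quoted in the text


# Invented scores: five children (rows) on three items (columns), 0 to 4 points.
scores = [[0, 1, 0], [2, 2, 1], [4, 3, 4], [1, 1, 2], [3, 4, 3]]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f} ({describe(alpha)})")
```

An item on which every child earns the same score contributes zero variance and carries no information about consistency, which is why the age used for each repertoire was chosen to minimize the number of such items.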
Test–retest reliability. The researchers established, a priori, a general first age point to use in the test–retest
correlations: the first age at which the median score was approximately 30% (see Table 2). This point
was chosen because scores from most repertoires contained minimal variability for the youngest and oldest
participants.
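The sketch below illustrates one way the two steps could look in code: picking the first age whose median percentage score is close to 30%, then correlating scores at that age with scores from the same children 3 months later. Several things here are assumptions rather than details from the study: the function names, the 5-point tolerance used to interpret "approximately 30%", the use of Pearson's r, and all of the data.

```python
from math import sqrt
from statistics import mean, median


def first_age_near(target, scores_by_age, tolerance=5.0):
    """First age (months) whose median score is within `tolerance` of `target`.

    Returns None if no age qualifies. The tolerance is an assumed reading
    of "approximately".
    """
    for age in sorted(scores_by_age):
        if abs(median(scores_by_age[age]) - target) <= tolerance:
            return age
    return None


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have no variance")
    return sxy / sqrt(sxx * syy)


# Invented percentage scores for the same five children at three ages (months).
scores_by_age = {
    12: [5, 10, 8, 12, 6],
    15: [22, 35, 28, 40, 30],
    18: [45, 60, 50, 66, 52],
}

first_age = first_age_near(30, scores_by_age)
r = pearson_r(scores_by_age[first_age], scores_by_age[first_age + 3])
print(f"first age: {first_age} months, 3-month retest r = {r:.2f}")
```

Longer retest intervals (6, 9, and 12 months) would reuse `pearson_r` with the scores from the corresponding later age.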
Test–Retest Reliability
The analysis revealed strong test–retest correlations at the 3-, 6-, 9-, and 12-month marks across
the majority of the repertoires (see Table 2). The average test–retest correlation
coefficients were strong at the 3-month (r = .84) and 6-month (r = .77) retests and weaker at the
9-month (r = .62) and 12-month (r = .54) retests.
Discussion
The current study examined the reliability of scores obtained from the ABLLS-R by obtaining
measures of internal consistency and test–retest reliability. The analyses yielded strong evidence
of both types of reliability. In addition to the primarily strong alpha coefficients obtained, we
found that removing an ABLLS-R item would only minimally improve the internal consistency
of nine ABLLS-R repertoires, which supports the inclusion of the existing ABLLS-R items. In
the test–retest reliability analysis, we observed a general trend of decreasing correlation coefficients
at each increasing retest period. This trend is to be expected, however: children continuously
acquire skills, which likely accounts for the systematic decreases observed in the retest
correlations over time.
These findings complement the research by Usry (2015), who found evidence of interrater reliability
and content validity. The preliminary empirical evidence thus far suggests that the
ABLLS-R can produce reliable scores.
Note. n range: 21–34. ABLLS-R = Assessment of Basic Language and Learning Skills–Revised.
*p < .05. **p < .01. ***p < .001.
Acknowledgments
The authors wish to thank Frank Quinn for organizing the data set and Barry Collins, PhD, for his assistance
with conducting the statistical analyses.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
References
American Educational Research Association. (1999). Standards for educational and psychological testing.
Washington, DC: Author.
American Medical Association. (2014). CPT assistant. Chicago, IL: Author.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference. 11.0
update (4th ed.). Boston, MA: Allyn & Bacon.
Guldberg, K. (2010). Educating children on the autism spectrum: Preconditions for inclusion and notions
of “best autism practice” in the early years. British Journal of Special Education, 37, 168-174.
doi:10.1111/j.1467-8578.2010.00482.x
Individuals With Disabilities Education Act. (2004). Evaluation procedures. Retrieved from
http://idea.ed.gov/explore/view/p/,root,regs,300,D,300%252E304,
Partington, J. W. (2010). The ABLLS-R—The Assessment of Basic Language and Learning Skills–Revised.
Walnut Creek, CA: Behavior Analysts, Inc.
Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2005). Vineland Adaptive Behavior Scales (2nd ed.).
Bloomington, MN: Pearson. doi:10.1037/t15164-000
Thompson, T. (2011). Individualized autism intervention for young children. Baltimore, MD: Paul H.
Brookes.
Usry, J. N. (2015). Validation of the Assessment of Basic Language and Learning Skills–Revised for students
with autism spectrum disorder using an expert review panel (Unpublished doctoral dissertation).
Lynchburg, VA: Liberty University.