07 2005 Foyd Properties of Assessments

School Psychology Review,
2005, Volume 34, No. 1, pp. 58-73
Measurement Properties of Indirect Assessment Methods

for Functional Behavioral Assessment: A
Review of Research
Randy G. Floyd and Robin L. Phaneuf

The University of Memphis
Susan M. Wilczynski
University of Nebraska Medical Center
Abstract. Indirect assessment instruments used during functional behavioral as-

sessment, such as rating scales, interviews, and self-report instruments, represent
the least intrusive techniques for acquiring information about the function of prob-
lem behavior. This article provides criteria for examining the measurement prop-
erties of these instruments designed to identify functional relations and reviews 46
studies examining the use and interpretation of the Motivation Assessment Scale
(Durand & Crimmins, 1992) or the Functional Assessment Interview (O’Neill et
al., 1997). Results indicate at least three inadequacies in the research: (a) insuffi-
cient attention paid to evidence based on instrument content and informant re-
sponse processes, (b) minimal evidence indicating consumer satisfaction and the
singular or unique effects of instrument data on intervention development, and (c)
excessive emphasis on research designs and measurement properties that assume
stability in behavior–environment interactions across contexts.
Functional behavioral assessment (FBA) ing the beginning stages of an FBA. Indirect
has received considerable attention in the pro- methods are characterized by being removed
fessional literature of school psychology since in time and place from the phenomena they
the passage of the 1997 revision of the Indi- measure (i.e., behaviors and functional rela-
viduals with Disabilities Education Act (IDEA, tions). Examples include ratings of others, in-
1997), which mandated the use of FBA proce- terviews, and self-report instruments (Cone,
dures and positive behavioral supports in some 1978).
instances. Although IDEA did not specify what Indirect methods of assessment offer a
constitutes an FBA, a number of heuristic number of potential benefits to the process of
models for FBA have been proposed (e.g., FBA. For example, interviews may be useful
O’Neill et al., 1997; Sugai, Lewis-Palmer, & in defining problem behaviors, determining
Hagan-Burke, 2000; Witt, Daly, & Noell, their severity, and specifying optimal condi-
2000). Across these models, indirect methods tions under which to observe the behavior.
of assessment are typically recommended dur- Additionally, they may contribute to hypoth-
Authors’ Note. We thank numerous instrument authors for responding to our requests for information,
David M. Murray for assistance in locating resources describing categorical data-analytic procedures, and
Renee Bergeron for proofing activities. We also appreciate editorial guidance from Cathy F. Telzrow and
feedback from several anonymous reviewers about earlier versions of this article.
Correspondence concerning this article should be addressed to Randy G. Floyd, PhD, The University of
Memphis, Department of Psychology, Memphis, TN 38152; E-mail: rgfloyd@memphis.edu.
Copyright 2005 by the National Association of School Psychologists, ISSN 0279-6015
58
Measurement Properties
esis-driven functional analysis1 and reduce the evaluation of traditional measurement proper-
time required to identify functional relations ties, such as construct or criterion-related va-
because fewer conditions may need to be ma- lidity (Ebel, 1961; Johnston & Pennypacker,
nipulated. The informant-based nature of these 1993). Second, the ability of an FBA instru-
methods also engages valuable stakeholders ment to inform effective interventions (i.e.,
(e.g., parents and teachers) who will later be treatment utility) is viewed by some as the most
involved in the development, implementation, important measurement property—if not ulti-
and monitoring of behavior support plans mately the only important property (Hayes,
(Dunlap, Newton, Fox, Benito, & Vaughn, Nelson, & Jarrett, 1987; Nelson-Gray, 2003).
2001). They logically are less time-consum- Third, the individual or idiographic nature of
ing than direct observations, and they require behavioral assessment does not appear to lend
notably less training, expertise, and staff col- itself to group-level statistical analyses that are
laboration to complete than functional analy- typically used to examine traditional measure-
sis. Furthermore, they may be the only assessment properties (Nelson, 1983; Nelson, Hay,
ment method available when problem behav- & Hay, 1977). Finally, the methodology and
iors occur at a low frequency or when func- research designs examining traditional mea-
tional analysis is unethical or untenable surement properties may be incongruent with
(O’Neill et al., 1997). what is known about behavior–environment
Although the benefits of indirect meth- interactions (Hartman, Roper, & Bradford,
ods for FBAs are numerous, informant reports 1979; Nelson, 1983; Nelson et al., 1977). For
stem from recollections of the problem behav- example, because functional relations are ex-
iors and personal judgments about behavior– pected to be variable across contexts—unlike
environment interactions and not their direct traits targeted by many traditional measure-
measurement. As a consequence, erroneous hy- ment properties—they are less likely to dem-
potheses may be developed or faulty con- onstrate adequate levels of test–retest reliabil-
clusions may be drawn. Despite this signifi- ity or agree with other measures of functional
cant limitation, no agreed-upon guidelines have relations obtained from different contexts.
emerged to evaluate the quality of the results These reasons for reluctance ground
obtained from indirect methods. Without a FBAs in low-inference assessment of directly
sound framework for examining their quality, observed behaviors and in assessments and
consumers may be left not only misguided by interventions that are useful in addressing the
their results but also uninformed about which target concerns for individuals. However, re-
instruments provide the most accurate and most luctance to develop and utilize measurement
useful information. Just as measurement stan- guidelines for assessment of functional rela-
dards are applied to the evaluation of norm- tions hinders a broader understanding of which
referenced tests to promote quality of results, properties produce the best results and which
so too must criteria be applied to indirect meth- properties produce the most error. Consider-
ods for FBAs. ation of the measurement properties does not
Despite several publications focusing on negate a practical focus on assessing and ad-
evaluating measures used during FBAs (e.g., dressing the target concerns of individuals.
Cone, 1997, 1998; Gresham, 2003; Shriver, Rather, measurement guidelines assist in evalu-
Anderson, & Proctor, 2001), there appears to ating the quality of assessment information in
be reluctance to embrace measurement stan- and of itself. Such guidelines ultimately lead
dards for assessment of functional relations, users to develop stronger hypotheses and con-
particularly among individuals who embrace clusions that are more accurate. Therefore, the
applied behavior analysis (Kratochwill & purpose of this article is threefold. It offers re-
Shapiro, 2000). There are at least four reasons search design characteristics and statistical
for this apparent reluctance. First, it can be ar- analyses appropriate for examining assessment
gued that direct assessment of behavior through instruments designed to identify functional
observation largely eliminates the need for relations. It evaluates the research examining
59
School Psychology Review, 2005, Volume 34, No. 1
the measurement properties of two such instru- primary research examining the use and inter-
ments. It also provides recommendations for pretation of the instruments. In addition, dur-
future research and instrument use. ing contact with instrument authors, research
including the instruments and the existence of
Method manuals or other published materials summa-
Identification of Instruments rizing characteristics of the instruments were
reported.
Several strategies were employed to
identify indirect assessment instruments mea- Instrument Selection and Description
suring functional relations to include in this Instruments selected for review had at
review (Cooper, 1998; Lipsey & Wilson, least (a) three studies examining their measure-
2001). First, electronic bibliographic searches ment properties, and (b) two studies conducted
of PsychInfo and ERIC (from January 1887 by independent researchers (i.e., those who
through August 2001) were conducted for pub- were not authors of the instruments or associ-
lished resources2 describing indirect assess- ated with their research group). These two se-
ment instruments for FBA. Search terms in- lection criteria were based on procedures de-
cluded functional behavioral assessment, func- signed to identify empirically supported thera-
tional analysis, behavioral assessment, infor- pies and interventions (Chambless & Hollon,
mant, questionnaire, behavior ratings scale, 1998; Kratochwill & Stoiber, 2002; Wolery,
checklist, interview, psychometric properties, 2003). The first criterion was used to include
accuracy, validity, reliability, and instruments with an established body of re-
generalizability. Second, after the resources search examining their use. The second crite-
were obtained, their content was reviewed to rion was used to include research conducted
identify additional instruments measuring across contexts, participants, and researchers.
functional relations. Third, testing information Only two instruments met both selection cri-
clearinghouses on the World Wide Web (e.g., teria.3
Buros Institute of Mental Measurements) and
assessment instrument catalogs from promi- Motivation Assessment Scale. The
nent test publishers were searched. Finally, the MAS (Durand & Crimmins, 1992) is a rating
primary authors of the instruments identified scale designed to identify the function of prob-
during the search were contacted by mail, elec- lem behaviors. The MAS consists of 16 items
tronic mail, or phone. They provided names of scored on a 7-point scale (ranging from Never
additional indirect assessment instruments to to Always) that may be completed indepen-
include in the review (Cooper, 1998). Based dently or through an interview with parents,
on these search strategies, 19 indirect assess- teachers, or careproviders. Items measure four
ment instruments (or collections of instru- classes of functional relations (Attention, Tan-
ments) measuring functional relations were gible, Escape, and Sensory), and the instrument
identified. yields four subscales representing these classes.
Users identify functional relations based on the
Identification of Research subscale with the highest rank.
In order to identify all research support- Functional Assessment Interview.
ing the use and interpretation of the identified The FAI (O’Neill et al., 1997) is a lengthy in-
instruments, additional electronic bibliographic terview for parents, teachers, or careproviders.
searches of articles, books, and book chapters The FAI is an extension of the Functional
were conducted using PsychInfo and ERIC Analysis Interview (O’Neill, Horner, Albin,
(from January 1887 to January 2003) using the Storey, & Sprague, 1990), and it has recently
names of the instruments and their authors’ been revised and abbreviated for use in school
names. Subsequently, abstracts and reference settings (Crone & Horner, 2003). The interview
lists obtained in the search were reviewed, and includes 11 sections soliciting information
the texts of resources were scoured to identify about problem behaviors, setting events, im-
60
mediate antecedents, consequences that may tions applying measurement standards to

maintain the problem behaviors, and the effi- evaluation of behavior rating scales and intel-
ciency of problem behaviors in obtaining re- ligence and achievement tests (e.g., Bracken,
inforcement. Other sections facilitate identifi- 1987; Floyd & Bose, 2003; Hammill, Brown,
cation of replacement behaviors, probable re- & Bryant, 1992). A final type of source was
inforcers, and discriminative stimuli. Addi- recent publications describing the assumptions
tional sections include questions about the of assessment of behaviors and their functions,
individual’s communication skills and about which were reviewed to tailor the criteria to
previous interventions used to treat the prob- these assumptions (e.g., Barnett, Lentz, &
lem behavior. The FAI concludes with the in- Macmann, 2000; Cone, 1977, 1997, 1998;
terviewer and interviewee constructing pre- Foster & Cone, 1995; Gresham, 1984, 2003;
liminary hypotheses regarding the functions of Gresham & Davis, 1988; Hayes et al., 1987;
the problem behaviors and identifying alter- Haynes & O’Brien, 2000; Haynes, Richard, &
nate or replacement behaviors. Kubany, 1995; Johnston & Pennypacker, 1993;
Supporting research. The search for Shriver et al., 2001; Silva, 1993; Sturmey,
research using the MAS or the FAI yielded 39 1994a).
resources: 35 articles and four books. Three Based on a synthesis of these four sets
resources were excluded after full review be- of information, coding criteria, coding sheets,
cause (a) the MAS was administered as a self- and a codebook were constructed.4 All materi-
report form to children with mental retarda- als were circulated among the authors and were
tion (Akande, 1994, 1998) and (b) only a gen- piloted with five articles not included in this
eral description of a case using the MAS was review. This process resulted in several revi-
sions of the coding criteria and materials. The
described (Lennox & Miltenberger, 1989).
first section of the coding criteria focused on
From the remaining resources, 46 studies were
variables associated with the design and de-
identified—35 examining the MAS and 11 the
scription of the studies, such as characteristics
FAI. Although no studies were identified in the
of participants being assessed, descriptions of
books, some validity evidence was presented
the informants, and descriptions of the contexts
in one book. Eight articles included more than
in which the problem behaviors occurred.
one study, and two articles included studies that
The second section focused on the pres-
were coded for both the MAS and the FAI.
ence of research providing reliability and va-
Outline of Qualitative Analysis lidity evidence and on the characteristics of
the studies yielding this evidence. As appar-
Development of review criteria. Four ent in Table 1, this section included (a) items
sets of information were used to develop the for the presence of test–retest reliability
review criteria for coding the research. The analysis, the interval between ratings, and
foundation of the criteria was the Standards associated statistical analyses; (b) items for
for Educational and Psychological Testing the presence of interrater reliability analysis
(American Educational Research Association and associated statistical analyses; and (c)
[AERA], American Psychological Association items focusing on the five sources of validity
[APA], & National Council on Measurement evidence described in Standards (AERA, APA,
in Education [NCME], 1999). Because the & NCME, 1999). The presence of evidence
Standards provides only general guidelines for based on test content, evidence based on re-
evaluation, three other sets of information were sponse processes, and evidence based on con-
reviewed to identify guidelines that are more sequences presented in Table 1 was coded. For
specific. These included (a) publications de- evidence based on internal structure, the pres-
scribing excellence in psychological measure- ence of statistical analyses was coded. For evi-
ment (e.g., Aiken, 1997; Anastasi & Urbina, dence based on relations with other variables,
1997; Merrell, 2003; Messick, 1989, 1995; information was coded for (a) studies compar-
Nunnally & Bernstein, 1994) and (b) publica- ing the results of the MAS or FAI to other as-
61
sessment techniques or instruments and (b) Results

studies where groups were compared using the
MAS or FAI. For studies examining relations Participants Being Assessed
with other variables by comparing different Across all studies, sample sizes ranged
measures (see Table 1), the other assessment from 1 to 118 (M = 20, Mdn = 5). For the MAS,
techniques or instruments used in the compari- the mean sample size was 22.7 and the me-
son, informants, temporal proximity of the re- dian 13. For the FAI, the mean sample size was
lated measurements, and statistical analyses 12.5 and the median 3. Participants ranged in
were coded. For the studies examining group age from children below 4 years of age (13
differences, the manner in which groups were studies or 28.2%), children of preschool and
formed was coded. In addition to coding these school age (21 studies or 45.7%), adolescents
variables associated with the measurement (21 studies or 45.7%), to adults (19 studies or
property, as evident in the right column in Table 41.3%). Diagnoses and classifications of par-
1, several research design characteristics ticipants were most frequently mental retarda-
deemed necessary for valid conclusions or con- tion (34 studies or 73.9%) and autistic disor-
sistency with the assumptions of behavioral der (21 studies or 45.7%). Two studies included
assessment were coded. children classified with emotional disturbance,
Review process. The first author re- 1 included a child with a learning disability,
viewed and coded all 46 studies. To provide and 3 reported that children had no classifica-
an index of interrater agreement, the second tion. Problem behaviors were most often self-
author reviewed and coded 30% of all studies injurious behaviors (34 studies or 73.9%), ag-
(31% of the studies examining the MAS and gression (32 studies or 69.6%), and disruptive
27% of the studies examining the FAI). Per- behaviors (27 studies or 58.7%). Across stud-
centage agreement and the more conservative ies, 12 (26.1%) reported the actual frequen-
kappa statistic, which controls for agreement cies of the problem behaviors, and 34 (73.9%)
by chance, were calculated for items from both used either general descriptors to describe fre-
sections of the review criteria. For the section quency (e.g., “frequent” or “problematic lev-
describing the design and description of the els”) or made no reference to the frequency of
studies, there was 95% agreement, and kappa problem behaviors.
was .88. For the reliability and validity evi- Informants
dence section as a whole (in Table 1), there
was 98.3% agreement, and kappa was .86. Informants who most frequently re-
When items indicating presence of reliability sponded to the indirect assessment instruments
and validity evidence were targeted and items either independently or through an interview
nested under them (e.g., statistical analyses for format were teachers or teacher assistants (19
test–retest reliability) were omitted, the per- studies or 41.3%), residential facility or devel-
centage agreement for each of the seven gen- opmental center staff (14 studies or 30.4%),
eral measurement properties presented in Table and parents or guardians (10 studies or 21.7%).
1 was as follows: test–retest reliability (95%), Across all studies, 16 (34.8%) quantified the
interrater reliability (99.3%), evidence based extent of the informants’ experience with the
on content (100%), evidence based on response problem behaviors, and 30 (65.2%) used only
processes (100%), evidence based on internal general descriptors (e.g., “knowledgeable in-
structure (97.7%), evidence based on relations formants”) or made no reference to the infor-
with other variables (94.8%), and evidence mants’ experiences with the behaviors. Simi-
based on consequences (100%). These esti- larly, the vast majority of studies (36 or 78.3%)
mates of interrater agreement indicate a high did not report the informants’ training in be-
level of consistency in coding. All disagree- havioral assessment or behavioral principles.
ments were evaluated and resolved by consen- Only 4 studies (8.7%) reported that informants
sus. were trained in these areas, and only 6 studies
62
Table 1
Criteria for Coding Reliability and Validity Evidence
Measurement Property Variables Associated With Measurement Property Research Design Characteristics
Test–retest reliability or stability Presence of analysis Informant did not observe problem behavior
Interval between ratings
Short-term (< 1 month)
Long-term (> 1 month)
Statistical analyses
Correlations between subscales or groups of items
Correlations between items
Agreement between subscales, groups of items, or indicators of function
Agreement between items or individual question
Interrater reliability Presence of analysis Experience of different raters controlled so

Statistical analyses (see test–retest reliability or stability above) that raters share all common experiences
with problem behavior
Evidence based on content Development of items based on literature review, review of case histories, or both
Development of items based on review of other instruments
Development of items based on direct observations
Development of items based on caregiver (parent, teacher, staff) input
Development of items based on expert opinion (other than authors)
Review and evaluation of item clarity
Review and evaluation of bias (gender, racial, special needs)
Review and item tryouts by users in applied settings
Readability analysis of items
Readability analysis of written informant instructions
Questions about function grouped by experts or caregivers
Evidence based on Evaluation of instrument instructions and response formats

response processes Interviews with informants about thought processes
“Think aloud” protocols with informants
Analysis of match between interviewer verbalizations and administration criteria (Table 1 continues)
63
64
(Table 1 continued)
Measurement Property Variables Associated With Measurement Property Research Design Characteristics
Evidence based on internal Item intercorrelations within scales or subgroups of items Informant blind to the results of other
structure Item intercorrelations across all items (within total score) assessment methods
Item correlations with total score (i.e., item–total correlations)
Internal consistency coefficients including only items within scales or subgroups of items
Internal consistency coefficients including all items (within total score)
Exploratory factor analysis of items/Confirmatory factor analysis of items
Correlations between subscales or subgroups of items
Evidence based on relations Results of the instrument were compared to results from a distinctly different
with other variables assessment instrument or technique that identifies function
Presence of analysis
Instrument or technique (and informant, if applicable)
Timing of measurements
Concurrent
Predictive
Statistical analyses
Correlations between subscales, groups of items, or indicators of function
Agreement between subscales, groups of items, or indicators of function
Groups were formed based on an external characteristic and results of the
instrument were compared between groups
Presence of analysis
Grouping
Grouped by classification of participant on whom data are collected

Grouped by problem behavior or response class
Evidence based on consequences Evaluation of treatment utility Instrument used alone to
Evaluation of acceptability of instrument or assessment method/technique assess function
for purpose of assessment Instrument used with a larger
Evaluation of use of instrument in relation to intended outcome assessment battery to assess function
Cost–benefit analysis Not reported if instrument
Cost–effectiveness analysis used alone or in larger battery
(13.0%) reported the general level of training amined the administration or scoring of the
or education of informants. MAS, and 3 examined the FAI. Statistics used
to examine interrater reliability included cor-
Context of Problem Behaviors relations between subscales or groups of items
Informants most often assessed problem (13 studies) and correlations between items (8
behaviors occurring in school or preschool set- studies). Also evident were studies examining
tings (21 studies or 45.7%) and in residential agreement between subscales, groups of items,
treatment or clinical settings (17 studies or or indicators of function (11 studies) and those
37.0%). Although it is likely that informants examining agreement between items or indi-
routinely delineated the specific contexts or vidual questions (6 studies). None of the stud-
settings (e.g., classroom) in which the prob- ies reported controlling the observations of
lem behaviors occurred before completing the informants so that their experiences with the
FBA, only 16 studies (34.8%) reported this problem behaviors were identical. However, 4
information. Even fewer studies (8 or 17.4%) studies reported the specific settings in which
reported the specific situations in which the the problem behaviors occurred, and 1 reported
specific situations. Thus, for most all studies
problem behaviors occurred, such as during
reviewed, it is likely that informants observed
independent reading time or during small group
problem behaviors in different settings and
instruction.
situations and, as a result, completed the as-
Measurement Properties sessment based on observations of different
behavior–environment interactions. Despite
Reliability. Five studies (10.9%) exam- the sizeable number of studies examining this
ined test–retest reliability or stability. All of measurement property, error attributed prima-
these studies focused on the MAS; none fo- rily to different perceptions and memories of
cused on the reliability of the functions identi- informants cannot be readily identified.
fied using the FAI. All but one of these studies
Evidence based on content. Across
used samples of children, and all but two were
the resources for the MAS and FAI, little in-
conducted in school settings. Of these five stud- formation was located documenting the appro-
ies, four examined short-term generalizability priateness and completeness of instrument
across time (<1 month), and three examined items or questions. As described very briefly
long-term stability (>1 month). Statistics used in only two resources, the content of the MAS
to examine test–retest reliability or stability items is based on a strong research base exam-
included correlations between subscales or ining the environmental contingencies main-
groups of items (three studies); correlations taining the problem behaviors of children and
between items (three studies); formulas exam- adults with developmental disabilities. Items
ining agreement between subscales, groups of were developed and piloted over a 4-year pe-
items, or indicators of function (one study); and riod, and items were added, deleted, or altered
formulas examining agreement between items during item tryouts with the aid of
or individual questions (three studies). Only careproviders of children with mental retarda-
two studies reported the specific settings in tion, autistic disorder, and other developmen-
which the problem behaviors occurred, and tal disabilities. However, the procedure for
only one reported specific situations. None of assigning items to subscales is unclear. No for-
the studies reported controlling the informants’ mal item-sorting procedures drawing upon the
exposure to the problem behaviors between decisions of experts appear to have been uti-
ratings. Thus, even in the studies that reported lized for assigning items to subscales. There
the specific contexts of the problem behaviors, were no descriptions of consultation with ex-
error attributed primarily to the passage of time ternal content experts during the development
between ratings cannot be identified. and evaluation of items, no evaluation of item
Eighteen studies (39.1%) examined bias, and no evaluation of item or instruction
interrater reliability. Of the 18 studies, 15 ex- readability or clarity.
65
It is apparent that the FAI has gone Behavioral Functioning Scale [Vollmer &
through several stages of development (see Matson, 1999]), 3 analyses compared it to
O’Neill et al., 1990; O’Neill et al., 1997) and semistructured interviews (e.g., the FAI), and
that it is thorough in leading informants to de- 2 analyses compared it to direct questioning
scribe and prioritize problem behaviors, to iden- about functional relations. For the FAI, 3 analy-
tify variables affecting behavior, and to promote ses compared it to rating scales, 1 analysis com-
the development of hypotheses. Thus, it ap- pared it to another semistructured interview,
pears to be based on expert opinion and feed- and 1 analysis compared it to direct question-
back from users. However, this review revealed ing about functional relations. Notably, 14
no description of its development or evidence analyses (10 for the MAS, 4 for the FAI) com-
based on content to support its validity. pared the functional relations identified by the
Evidence based on response pro- MAS or FAI to those identified through func-
cesses. When the characteristics of the infor- tional analysis. Ten analyses compared their
mant and the interaction between the informant results to functions identified through direct
and the assessment situation were considered, observation. Of these analyses including direct
observation, 9 used data collected using an
no evidence was found. This finding is similar
antecedent–behavior–consequence (ABC) for-
to other behavior rating scales and behavioral
mat. These comparative analyses most fre-
assessment interviews (Beaver & Busse, 2000;
quently examined the agreement between
Floyd & Bose, 2003).
subscales, groups of items, or other indicators
Evidence based on internal struc- of function (30 or 83.3%). Only 6 analyses
ture. Internal relations between items from the (16.7%) used correlations between measures.
MAS have been investigated through item- Although the number of analyses examining
level factor analysis and internal consistency this measurement property is impressive, very
analysis. More specifically, four analyses ex- few studies containing them (9 or 19.6%) re-
amined the factor structure of the MAS using ported that informants were blind to the results
exploratory factor analysis, and five analyses of the other assessment instruments and tech-
examined the relations between items within niques. In addition, only 15 analyses (41.6%)
each of its four scales. Most studies included reported the specific settings in which the prob-
only adolescents and adults with mental retar- lem behaviors occurred, and only 5 (13.8%)
dation displaying self-injurious behavior and reported specific situations.
other problem behaviors. No studies used con-
Evidence based on consequences.
firmatory factor analysis. Although meaning
Evidence of the treatment utility of the MAS
is most frequently derived from the MAS via
and the FAI was identified in nine studies
identifying its highest ranked scale, three
(19.6%).5 However, it is challenging to deter-
analyses examined the statistical relations
mine the unique contribution of the instruments
among all items, which is more appropriate for to intervention development from these stud-
an instrument measuring the severity of a wide ies because, in all cases but one, the instru-
variety of problem behaviors (e.g., Achenbach, ments were used as part of larger assessment
1991). batteries measuring functional relations. No
Evidence based on relations with studies addressing acceptability of these instru-
other variables. The review yielded 36 analy- ments and their outcomes have been published
ses (across 25 studies) comparing the results to date. In addition, no resource was identified
of the MAS or FAI to other instruments or tech- that included a cost–benefit or cost–effective-
niques yielding information about functional ness analysis.
relations. From the studies comparing the MAS Discussion
to other indirect assessment instruments, 2
analyses compared it to other rating scales (i.e., A number of problems plague the exist-
the Problem Behavior Questionnaire [Lewis, ing research examining the measurement prop-
Scott, & Sugai, 1994] and the Questions About erties of FBA measures. In addition to the ab-
66
sence of well-designed research examining lated to examine their effects on the accuracy
some properties, the research designs and sta- of resulting data. Informants also could be
tistical analyses used in many studies appear asked to verbalize their internal speech (i.e.,
to be incongruent with the assumptions of the “think aloud”) while completing instruments,
assessment of behavior and its function. In and verbal protocols could be analyzed to iden-
addition, other research design characteristics tify components deemed necessary for accu-
and statistical analyses appear to have been rate identification of functional relations and
flawed. apparent failure to use these components
(Ericsson & Simon, 1993). Researchers exam-
Implications for Research and ining FBA interviews may benefit from use or
Evaluation modification of existing coding schemes for
Well-designed research is needed to pro- verbalizations during interviews, such as the
vide sound evidence supporting the use and Consultant Analysis Record (Bergan &
interpretation of indirect assessment instru- Kratochwill, 1990; Bergan & Tombari, 1971).
ments for FBAs. Such research will be im- For example, research could examine verbal
proved if authors and journal editors include interactions and their correspondence to inter-
details in publications pertaining to (a) the as- view content to determine integrity of assess-
sumptions of behavioral assessment, such as ment procedures.
variability of behavior according to setting and Internal structure. Although statisti-
situation, and (b) research designs and statistical analysis of the internal relations between
cal analyses that promote greater precision in segments of most interviews for FBAs, such
identifying the measurement characteristics of as the FAI, do not appear to be tenable, the
these instruments. As evident from the size- application of such analysis to items from rat-
able number of studies included in this review ing scales and select self-report instruments
that included fewer than 10 participants, re- seems appropriate—if evidence based on test
searchers need not abandon a focus on indi- content supports item groupings. This review
viduals in the search for general principles re- revealed five analyses examining internal
lated to measurement properties of instruments. structure of the MAS. It is logical to expect
Content and response processes. This that some groups of items (i.e., subscales)
review indicates that researchers have invested would covary in a meaningful manner. How-
too little effort identifying and reporting evi- ever, there would be no reason to expect inter-
dence based on test content and response pro- nal relations across all items (e.g., for a total
cesses. Although the content of most indirect score) because groups of items representing
assessment instruments for FBA is drawn from each function are designed to measure distinct
the rich research base of applied behavioral (and largely mutually exclusive) environment–
analysis, it is important that authors make ex- behavior interactions. Increasing the number
plicit the resources used and steps taken dur- of well-constructed items within a related
ing instrument development. Potential consum- group and conducting item-level analysis with
ers and content experts independent of the de- large groups of informants early in the devel-
velopment could assist in refining instrument opment of the instrument would likely be ben-
wording and ensuring that information about eficial. An unpublished revision of the MAS
all relevant variables, such as setting events, seems to have benefited from both more items
is elicited (Haynes et al., 1995). and initial item-level analysis (Durand, 1997;
A number of naturalistic and analogue Haim, 2003).
studies could examine the response processes Relations with other variables. More
of informants completing these instruments. carefully designed studies should be conducted
For example, some informant characteristics to examine the correspondence between mea-
(e.g., training in observation of behavior and sures of functional relations. To increase the
its function) could be systematically manipu- integrity of results, research designs should
67
ensure that assessment data are collected in a complished through carefully controlled ana-
manner in which informants, observers, or logue studies or through meticulous documen-
other experimenters are blind to the results of tation of intervention development based on a
other assessment methods. This review indi- stepwise review of assessment results.
cated that fewer than 20% of the studies pro- This review also revealed that research
viding evidence based on external relations examining the MAS and FAI has not exam-
reported this effort. Continued attention ined the degree to which consumers are satis-
should also be paid to statistical analyses fied with the assessment process and their out-
appropriate for categorical-level data. In comes (Eckert & Hintze, 2000; Gresham &
addition to raw agreement indexes, the use of Lopez, 1996; Wolf, 1978). If consumers do not
other statistical analyses for determining agree- find an assessment instrument acceptable, they
ment between categorical-level data, such as may be unlikely to actively engage in the as-
kappa coefficients, tests of marginal homoge- sessment process or to trust and utilize the as-
neity, and generalizability coefficients, also can sessment results. In addition, the costs and
be explored (Kraemer, Periyakoil, & Noda, benefits of use of these instruments have not
2002; Li & Lautenschlager, 1997; Powers & been quantified and compared (Yates & Taub,
Xie, 2000). As evident from this review, 2003). For example, the use of staff time is a
Pearson product-moment correlations between variable that is likely to drive the choice of
measures may continue to be used when data, assessment instruments—especially under con-
such as items, are continuous. Spearman cor- ditions where there is limited information about
relations may be used when the probability of the quality of the instruments (Shriver & Proc-
functions indicated by instruments can be rank tor, 2004). The recent evolution of the FAI into
ordered. an abbreviated interview is evidence that in-
Research examining external relations of strument authors are attentive to this variable
FBA instruments should continue to anticipate (Crone & Horner, 2003).
the confounding influences of context on mea- Consistency across time and infor-
surement of functional relations. Most assess- mant. Because behaviors may serve different
ment instruments focusing on symptoms of functions in different contexts, research focus-
psychopathology (e.g., Achenbach, 1991; ing on test–retest reliability and interrater reli-
Reich, 2000) yield the most meaningful results ability should examine the consistency of re-
when informants observe symptoms across ports of functional relations derived from ob-
contexts. However, the truest measures of func- servations of identical settings and situations.
tional relations result from deliberate specifi- As a result of the variability of behavior and
cation and observation of settings (e.g., a class- its function across contexts and time, infor-
room) and situations (e.g., independent seat mants’ repeated observations of behaviors in
work) in which the behaviors occur (Shriver applied settings between initial and subsequent
et al., 2001). This review identified few stud- completion of the instruments undermines ac-
ies that reported information about the specific curate test–retest reliability estimates. Simi-
contexts of the problem behaviors. larly, informants observing different contexts
Consequences of assessment. Al- in which the behaviors occur weaken estimates
though treatment utility is considered by some of inter-rater reliability. Influences on consis-
to be the ultimate measurement property, this tency of assessment results across time and
review revealed only one study that examined informant will be best pinpointed by carefully
the isolated effects of the MAS. In all other controlled laboratory or analogue studies, such
cases, the instruments were included in a larger as those including observation of problem be-
assessment battery to inform the development haviors by videotape (Johnston &
of interventions. Research should partial out Pennypacker, 1993). In addition, study of these
the contributions of specific assessment results measurement properties will benefit from
to treatment development (Hayes et al., 1987; greater attention paid to statistical analyses
Nelson-Gray, 2003). This research can be ac- appropriate for categorical-level data.
68
Limitations tion. In particular, to demonstrate evidence of

response processes, practitioners can study
This review is not without limitations. adherence to interviews to determine whether
Although efforts were made to ensure inclu- they are being completed as intended and
sion of all related studies, it is possible that the whether specific variables, such as training
search strategies for the instruments and the scripts, increase assessment integrity
research examining them missed important (Ehrhardt, Barnett, Lentz, Stollar, & Reifin,
resources. Similarly, inclusion of instruments 1996). Practitioners can describe the effects
that did not meet the criteria for inclusion, such of well-monitored interventions developed
as some self-report inventories, may have led solely from indirect assessment instruments
to different conclusions. Finally, although in- and examine the incremental benefits of more
ter-rater agreement in coding the studies was direct assessment on interventions. Practitio-
high, it is possible that the decision-making was ners can also assess the acceptability of assess-
consistent but biased in a critical manner. ment instruments through surveys of teachers,
Implications for Practice parents, and other consumers of these instru-
ments.
Given the limitations of indirect assess-
ment instruments used during FBAs and the Footnotes
restricted and sometimes flawed research ex- 1
In this article the term functional analysis
amining the two most often studied instru- is used to refer to the process of systematically al-
ments, school psychologists should continue tering antecedents of behavior, consequences of
to rely on direct, minimally inferential assess- behavior, or both to examine effects on problem
ment techniques to identify functional rela- behavior.
2
tions. However, indirect assessment instru- Throughout this article, the term study is
ments, such as comprehensive interviews, con- used to describe a systematic investigation examin-
ing data from a set of participants. The term resource
ducted with knowledgeable informants may
is used to describe the medium for these studies and
yield unique and relevant information that aids for other evidence of measurement properties. Ex-
in the identification of functional relations. For amples include journal articles, books, and book
example, setting events or person variables chapters. The term analysis is used to describe the
(e.g., establishing operations) affecting prob- part of a study in which statistical techniques are
lem behaviors may not be apparent if observa- used to answer a research question. For example,
tions, functional analysis, or both techniques one resource, an article, contained four studies. One
are used exclusively. If indirect assessment of these studies contained two analyses examining
instruments are used, school psychologists relations with other variables.
3
At least a pair of studies examining the
should carefully review the content of the in-
measurement properties of the Questions About
struments due to minimal evidence based on Behavioral Functioning Scale (Vollmer &
content identified in this review. They should Matson, 1999), the Functional Assessment In-
determine if content is applicable to the set- formant Record for Teachers (Edwards, 2002),
tings of interest, if important information is the Problem Behavior Questionnaire (Lewis,
elicited, and if items are worded well and ap- Scott, & Sugai, 1994), the Student-Assisted
propriately for informants. Functional Assessment Interview (Kern, Dunlap,
Practitioners can and should contribute Clarke, & Childs, 1994), and the Student-Di-
to the research base regarding these assessment rected Functional Assessment Interview (O’Neill
et al., 1997) were identified. However, none were
instruments. Although some measurement
supported by sufficient independent research.
properties are most applicable to carefully con- 4
The coding sheets and coding manual can
trolled research and group-level analyses, ac- be obtained from the first author.
cumulated study of individual cases of FBA 5
Treatment utility evidence was defined lib-
can also be useful in building understanding erally as substantiation of links between assessment
of these instruments, especially about response results and resulting interventions (cf. Hayes et al.,
processes, treatment utility, and user satisfac- 1987 and Nelson-Gray, 2003).
69
References adequacy. Journal of Psychoeducational Assessment,

5, 313–326.
References marked with an asterisk were in- Chambless, D. L., & Hollon, S. D. (1998). Defining em-
pirically supported therapies. Journal of Consulting
cluded in the literature review. and Clinical Psychology, 66, 7–18.
Cone, J. D. (1977). The relevance of reliability and valid-
Achenbach, T. M. (1991). Integrative guide for the 1991 ity for behavioral assessment. Behavior Therapy, 8,
CBCL/4–18, YSR, and TRF profiles. Burlington: Uni- 427–430.
versity of Vermont, Department of Psychiatry. Cone, J. D. (1978). The behavioral assessment grid (BAG):
Aiken, L. R. (1997). Psychological testing and assessment A conceptual framework and a taxonomy. Behavior
(9th ed.). Needham Heights, MA: Allyn & Bacon. Therapy, 9, 882–888.
Akande, A. (1994). The motivation assessment profiles of Cone, J. D. (1997). Issues in functional analysis in behav-
low-functioning children. Early Child Development ioral assessment. Behavioral Research and Therapy,
and Care, 101, 101–107. 35, 259–275.
Akande, A. (1998). Some South African evidence of the Cone, J. D. (1998). Psychometric considerations: Concepts,
inter-rater reliability of the Motivation Assessment contents, and methods. In M. Hersen & A. S. Bellack
Scale. Educational Psychology, 18, 111–115. (Eds.), Behavioral assessment: A practical handbook
American Educational Research Association, American (4th ed., pp. 22–46). Boston: Allyn & Bacon.
Psychological Association, & National Council on * Conroy, M. A., Fox, J. J., Bucklin, A., & Good, W. (1996).
Measurement in Education. (1999). Standards for edu- An analysis of the reliability and stability of the Moti-
cational and psychological testing. Washington, DC: vation Assessment Scale in assessing the challenging
American Educational Research Association. behaviors of persons with developmental disabilities.
Anastasi, A., & Urbina, S. (1997). Psychological testing Education and Training in Mental Retardation and
(7th ed.). Upper Saddle River, NJ: Prentice–Hall. Developmental Disabilities, 31, 243–250.
*Arndorfer, R. E., Miltenberger, R. G., Woster, S. H., Cooper, H. M. (1998). Synthesizing research: A guide for
Rortvedt, A. K., & Gaffaney, T. (1994). Home-based literature reviews (3rd. ed). Thousand Oaks, CA: Sage.
descriptive and experimental analysis of problem be- * Crawford, J., Brockel, B., Schauss, S., & Miltenberger,
haviors in children. Topics in Early Childhood Special R. G. (1992). A comparison of methods for the func-
Education, 14, 64–87. tional assessment of stereotypic behavior. Journal of
Barnett, D. W., Lentz, F. E., Jr., & Macmann, G. (2000). the Association for Persons with Severe Handicaps,
Psychometric qualities of professional practice. In E. 17, 77–86.
S. Shapiro & T. R. Kratochwill (Eds.), Behavioral as- * Crone, D. A., & Horner, R. H. (2003). Building positive
sessment in schools: Theory, research, and clinical behavior support systems in schools: Functional be-
foundations (2nd ed., pp. 355–386). New York: havioral assessment. New York: Guilford Press.
Guilford Press. * Cunningham, E., & O’Neill, R. E. (2000). Comparison
* Barton-Arwood, S. M., Wehby, J. H., Gunter, P. L., & of results of functional assessment and analysis meth-
Lane, K. L. (2003). Functional behavior assessment ods with young children with autism. Education and
rating scales: Intrarater reliability with students with Training in Mental Retardation and Developmental
emotional or behavioral disorders. Behavioral Disor- Disabilities, 35, 406–414.
ders, 28, 386–400. * Duker, P. C., & Sigafoos, J. (1998). The Motivation As-
Beaver, B. R., & Busse, R. T. (2000). Informant reports: sessment Scale: Reliability and construct validity
Conceptual and research bases of interviews with par- across three topographies of behavior. Research in
ents and teachers. In E. S. Shapiro & T. R. Kratochwill Developmental Disabilities, 19, 131–141.
(Eds.), Behavioral assessment in schools: Theory, re- Dunlap, G., Newton, J., S., Fox, L., Benito, N., & Vaughn,
search, and clinical foundations (2nd ed., pp. 257– B. (2001). Family involvement in functional assess-
287). New York: Guilford Press. ment and positive behavior support. Focus on Autism
Bergan, J. R., & Kratochwill, T. R. (1990). Behavioral and Other Developmental Disabilities, 16, 215–221.
consultation and therapy. New York: Plenum Press. * Dunlap, G., White, R., Vera, A., Wilson, D., & Panacek,
Bergan, J. R., & Tombari, M. L. (1971). The analysis of L. (1996). The effects of multi-component, assessment-
verbal interactions occurring during consultation. Jour- based curricular modifications on the classroom be-
nal of School Psychology, 13, 209–225. havior of children with emotional and behavioral dis-
* Bihm, E. M., Kienlen, T. L., Ness, M. E., & Poindexter, orders. Journal of Behavioral Education, 4, 481–500.
A. R. (1991). Factor structure of the Motivation As- Durand, V. M. (1997). Motivation Assessment Scale II:
sessment Scale for persons with mental retardation. Test version. Unpublished instrument.
Psychological Reports, 68, 1235–1238. * Durand, V. M., & Carr, E. G. (1991). Functional com-
* Bird, F., Dores, P., Moniz, D., & Robinson, F. (1989). munication training to reduce challenging behavior:
Reducing severe aggression and self-injurious behav- Maintenance and application in new settings. Journal
ior with functional communication training: Direct, of Applied Behavior Analysis, 24, 251–264.
collateral and generalized results. American Journal * Durand, V. M., & Crimmins, D. B. (1988). Identifying
of Mental Retardation, 94, 37–48. the variables maintaining self-injurious behavior. Jour-
Bracken, B. A. (1987). Limitations of preschool instru- nal of Autism and Developmental Disorders, 18, 99–
ments and standards for minimal levels of technical 117.
70
* Durand, V. M., & Crimmins, D. B. (1992). The Motiva- Hammill, D. D., Brown, L., & Bryant, B. R. (1992). A
tion Assessment Scale administrative guide. Topeka, consumer’s guide to tests in print (2nd ed.). Austin, TX:
KS: Monaco & Associates. PRO–ED.
* Durand, V. M., Crimmins, D. B., Caulfield, M., & Tay- * Haring, T. G., & Kennedy, C. H. (1990). Contextual
lor, J. (1989). Reinforcer assessment: I. Using prob- control of problem behavior in students with severe
lem behavior to select reinforcers. Journal of the As- disabilities. Journal of Applied Behavior Analysis, 23,
sociation for Persons With Severe Handicaps, 14, 113– 235–243.
126. Hartman, D. P., Roper, B. L., & Bradford, C. C. (1979).
* Durand, V. M., & Kishi, G. (1987). Reducing severe Some relationships between behavioral and traditional
behavior problems among persons with dual sensory assessment. Journal of Behavioral Assessment, 1, 3–
impairments: An evaluation of a technical assistance 21.
model. Journal of the Association for Persons with Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The
Severe Handicaps, 12, 2–10. treatment utility of assessment: A functional approach
Ebel, R. (1961). Must all tests be valid? American Psy- to evaluating assessment quality. American Psycholo-
chologist, 15, 546–553. gist, 42, 963–974.
Eckert, T. L., & Hintze, J. M. (2000). Behavioral concep- Haynes, S. N., & O’Brien, W. H. (2000). Principals and
tions and applications of acceptability: Issues related practice of behavioral assessment. New York: Kluwer
to service delivery and research methodology. School Academic/Plenum.
Psychology Quarterly, 15, 123–148. Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995).
Edwards, R. P. (2002). A tutorial for using the Functional Content validity in psychological assessment: A func-
Assessment Informant Record for Teachers. Proven tional approach to concepts and methods. Psychologi-
Practice: Prevention and Remediation Solutions for cal Assessment, 7, 238–247.
Schools, 4, 31–33. Individuals with Disabilities Education Act Amendments
Ehrhardt, K. E., Barnett, D. W., Lentz, F. E., Jr., Stollar, S. of 1997, (Pub L No. 105–17. 20 USC Chapter 33, Sec-
A., & Reifin, L. H. (1996). Innovative methodology tions 1400 et seq. (Statute)
in ecological consultation: Use of scripts to promote Johnston, J. M., & Pennypacker, H. S. (1993). Strategies
treatment acceptability and integrity. School Psychol- and tactics of behavioral research (2nd ed.). Hillsdale,
ogy Quarterly, 11, 149–168. NJ: Lawrence Erlbaum.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analy- * Kearney, C. A. (1994). Interrater reliability of the Moti-
sis: Verbal reports as data (rev. ed.). Cambridge, MA: vation Assessment Scale: Another, closer look. Jour-
MIT Press. nal of the Association for Persons with Severe Handi-
Floyd, R. G., & Bose, J. E. (2003). A critical review of caps, 19, 139–142.
rating scales assessing emotional disturbance. Jour- Kern, L., Dunlap, G., Clarke, S., & Childs, K. E. (1994).
nal of Psychoeducational Assessment, 21, 43–78. Student-assisted functional assessment interview.
Foster, S. L., & Cone, J. D. (1995). Validity issues in clini- Diagnostique, 19(2,3), 29–39.
cal assessment. Psychological Assessment, 7, 248–260. * Kinch, C., Lewis-Palmer, T., Hagan-Burke, S., & Sugai,
* Gibson, A. K., Hetrick, W. P., Taylor, D. V., Sandman, G. (2001). A comparison of teacher and student func-
C. A., & Touchette, P. (1995). Relating the efficacy of tional behavior assessment interview information from
naltrexone in treating self-injurious behavior to the low-risk and high-risk classrooms. Education and
Motivation Assessment Scale. Journal of Developmen- Treatment of Children, 24, 480-494.
tal and Physical Disabilities, 7, 215–220. Kraemer, H. C., Periyakoil, V. S., & Noda, A. (2002).
Gresham, F. M. (1984). Behavioral interviews in school Kappa coefficients in medical research. Statistics in
psychology: Issues in psychometric adequacy and re- Medicine, 21, 2109–2129.
search. School Psychology Review, 13, 17–25. Kratochwill, T. R., & Shapiro, E. S. (2000). Conceptual
Gresham, F. M. (2003). Establishing the technical ad- foundations of behavioral assessment in schools. In E.
equacy of functional behavioral assessment: Concep- S. Shapiro & T. R. Kratochwill (Eds.), Behavioral as-
tual and measurement challenges. Behavioral Disor- sessment in schools: Theory, research, and clinical
ders, 28, 282–298. foundations (2nd ed., pp. 3–15). New York: Guilford
Gresham, F. M., & Davis, C. J. (1988). Behavioral in- Press.
terviews with teachers and parents. In E. S. Shapiro Kratochwill, T. R., & Stoiber, K. C. (2002). Evidence-
& T. R. Kratochwill (Eds.), Behavioral assessment based interventions in school psychology: Conceptual
in schools: Conceptual foundations and practical foundations of the Procedural and Coding Manual of
applications (pp. 455–493). New York: Guilford Division 16 and the Society for the Study of School
Press. Psychology Task Force. School Psychology Quarterly,
Gresham, F. M., & Lopez, M. F. (1996). Social valida- 17, 341–389.
tion: A unifying concept for school-based consultation * Lawry, J. R., Storey, K., & Danko, C. D. (1993). Ana-
research and practice. School Psychology Quarterly, lyzing problem behaviors in the classroom: A case
11, 204–227. study of functional analysis. Intervention in School and
Haim, A. (2003). The analysis and validation of the Moti- Clinic, 29, 96-100.
vation Assessment Scale-II Test Version: A structural Lennox, D. B., & Miltenberger, R. G. (1989). Conducting
equation model. Dissertation Abstracts International: a functional assessment of problem behavior in applied
Section B: The Sciences and Engineering, 63(10-B): settings. Journal of the Association of Persons with
4903. Severe Handicaps, 14, 304–311.
71
Lewis, T. J., Scott, T., & Sugai, G. (1994). The Problem Reich, W. (2000). Diagnostic Interview for Children and
Behavior Questionnaire: A teacher-based assessment Adolescents. Journal of the American Academy of
instrument to develop interventions to develop func- Child and Adolescent Psychiatry, 39, 59–66.
tional hypotheses of problem behavior in general edu- * Scotti, J. R., Schulman, D. E., & Hojnacki, R. M. (1994).
cation classrooms. Diagnostique, 19, 103–115. Functional analysis and unsuccessful treatment of
* Lewis, T. J., & Sugai, G. (1996). Functional assessment Tourette’s syndrome in a man with profound mental
of problem behaviors: A pilot investigation of the com- retardation. Behavior Therapy, 25, 721–738.
parative and interactive effects of teacher and peer so- * Shogren, K. A., & Rojahn, J. (2003). Convergent reli-
cial attention on students in general education settings. ability and validity of the Questions About Behavioral
School Psychology Quarterly, 11, 1–19. Function and the Motivation Assessment Scale: A rep-
Li, M. F., & Lautenschlager, G. (1997). Generalizability lication study. Journal of Developmental and Physi-
theory applied to categorical data. Educational and cal Disabilities, 15, 367–375.
Psychological Measurement, 57, 813–822. Shriver, M. D., Anderson, C. M., & Proctor, B. (2001).
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta- Evaluating the validity of functional behavior assess-
analysis. Thousand Oaks, CA: Sage. ment. School Psychology Review, 30, 180–192.
Merrell, K. W. (2003). Behavioral, social, and emotional Shriver, M. D., & Proctor, B. (2005). Social validity in
assessment of children and adolescents. Mahwah, NJ: disseminating functional behavior assessment technol-
Lawrence Erlbaum. ogy: Research directions for meeting consumer needs.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Manuscript submitted for publication.
measurement (3rd ed., pp. 13–103). Washington, DC: * Sigafoos, J., Kerr, M., & Roberts, D. (1994). Interrater
American Council on Education. reliability of the Motivation Assessment Scale: Fail-
Messick, S. (1995). Validity of psychological assessment: ure to replicate with aggressive behavior. Research in
Validity of inferences from persons’ responses and Developmental Disabilities, 15, 333–342.
performances as scientific inquiry into score meaning. Silva, F. (1993). Psychometric foundations and behavioral
American Psychologist, 50, 741–749. assessment. Newbury Park, CA: Sage.
Nelson, R. O. (1983). Behavioral assessment: Past, present, * Singh, N. N., Donatelli, L. S., Best, A., Williams, E. E.,
and future. Behavioral Assessment, 5, 195–206. Barrera, F. J., Lenz, M. W., Landrum, T. J., Ellis, C.
Nelson, R. O., Hay, I. R., & Hay, W. M. (1977). Com- R., & Moe, T. L. (1993). Factor structure of the Moti-
ments on Cone’s “The relevance of reliability and va- vation Assessment Scale. Journal of Intellectual Dis-
lidity for behavioral assessment.” Behavior Therapy, abilities Research, 37, 65–75.
8, 427–430. * Spreat, C., & Connelly, L. (1996). Reliability analysis
Nelson-Gray, R. O. (2003). Treatment utility of psycho- of The Motivation Assessment Scale. American Jour-
logical assessment. Psychological Assessment, 15, nal of Mental Retardation, 100, 528–532.
521–531. Sturmey, P. (1994a). Assessing the function of aberrant be-
* Newton, J. T., & Sturmey, P. (1991). The Motivation haviors: A review of psychometric instruments. Jour-
Assessment Scale: Interrater reliability and internal nal of Autism and Developmental Disorders, 24, 293–
consistency in a British sample. Journal of Mental 304.
Deficiency Research, 35, 472–474. * Sturmey, P. (1994b). Assessing the functions of self-in-
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric jurious behavior: A case of assessment failure. Jour-
theory (3rd ed.). New York: McGraw Hill. nal of Behavior Therapy and Experimental Psychia-
* O’Neill, R. E., Horner, R. H., Ablin, R. W., Storey, K., & try, 25, 331–336.
Sprague, J. R. (1990). Functional analysis of problem Sugai, G., Lewis-Palmer, T., & Hagan-Burke, S. (2000).
behaviors: A practical guide. Sycamore, IL: Sycamore. Overview of the functional behavioral assessment pro-
* O’Neill, R. E., Horner, R. H., Ablin, R. W., Sprague, J. cess. Exceptionality, 8(3), 149–160.
R., Storey, K., & Newton, J. S. (1997). Functional as- * Thompson, S., & Emerson, E. (1995). Inter-informant
sessment and program development for problem be- agreement on the Motivation Assessment Scale: An-
haviors: A practical handbook. New York: Brooks/ other failure to replicate. Mental Handicap Research,
Cole. 8, 203–208.
* Paclawskyj, T. R., Matson, J. L., Rush, K. S., Smalls, Y., Vollmer, T. R., & Matson, J. L. (1999). Questions About
& Vollmer, T. R. (2001). Assessment of the conver- Behavioral Function manual. Baton Rouge, LA: Sci-
gent validity of the Questions About Behavioral Func- entific Publishers.
tion scale with analogue functional analysis and the Witt, J. C., Daly, E. M., & Noell, G. (2000). Func-
Motivation Assessment Scale. Journal of Intellectual tional assessments: A step-by-step guide to solv-
Disability Research, 45, 484–494. ing academic and behavior problems. Longmont,
Powers, D. A., & Xie, Y. (2000). Statistical methods for CO: Sopris West.
categorical data analysis. San Diego, CA: Academic Wolery, M. (2003, October). Single-subject methods: Util-
Press. ity for establishing evidence for practice. Paper pre-
* Reese, R. M., Richman, D. M., Zarcone, J., & Zarcone, sented at the 19th Annual International Conference on
T. (2003). Individualizing functional assessments for Young Children with Special Needs and Their Fami-
children with autism: The contribution of perseverative lies, Washington, DC.
behavior and sensory disturbances to disruptive behav- Wolf, M. M. (1978). The case of subjective measurement
ior. Focus on Autism and Other Developmental Dis- or how applied behavior analysis is finding its heart.
abilities, 18, 89–94. Journal of Applied Behavior Analysis, 11, 203–214.
72
* Yarbrough, S. C., & Carr, E. G. (2000). Some relation- ment: We should, we can, here’s how. Psychological
ships between informant assessment and functional assessment, 15, 478–495.
analysis of problem behavior. American Journal of * Zarcone, J. R., Rodgers, T. A., Iwata, B. A., Rourke, D.
Mental Retardation, 105, 130–151. A., & Dorsey, M. F. (1991). Reliability analysis of the
Yates, B. T., & Taub, J. (2003). Assessing the costs, cost- Motivation Assessment Scale: A failure to replicate.
effectiveness, and cost-benefit of psychological assess- Research in Developmental Disabilities, 12, 349–360.
Randy G. Floyd, PhD is an Assistant Professor of Psychology at The University of Mem-

phis. He received his doctoral degree in School Psychology from Indiana State University.
His research interests include assessment of cognitive abilities, identification of reading
and mathematics aptitudes, and improving behavioral assessment methods.
Robin L. Phaneuf, PhD is an Assistant Professor of Psychology at The University of Mem-

phis. She received her doctoral degree in School Psychology from the University of Mas-
sachusetts Amherst. Her research interests include assessment and intervention for early
learning and social emotional behavior of young children and application of school-based
consultation procedures to community settings.
Susan M. Wilczynski, PhD, BCBA, is an Assistant Professor of Pediatric Psychology at

Munroe-Meyer Institute for Genetics and Rehabilitation, a unit of the University of Ne-
braska Medical Center. She is a licensed psychologist and certified behavior analyst whose
applied and research interests address treatment of children with Autistic Spectrum Disor-
ders. She developed and administrates Project BEST-CASE, an intensive early childhood
intervention program for children with autism and related disorders.
73

07 2005 Foyd Properties of Assessments

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

07 2005 Foyd Properties of Assessments

Uploaded by

Copyright:

Available Formats

School Psychology Review,

2005, Volume 34, No. 1, pp. 58-73

Measurement Properties of Indirect Assessment Methods

Randy G. Floyd and Robin L. Phaneuf

Abstract. Indirect assessment instruments used during functional behavioral as-

mediate antecedents, consequences that may tions applying measurement standards to

sessment techniques or instruments and (b) Results

Interrater reliability Presence of analysis Experience of different raters controlled so

Evidence based on Evaluation of instrument instructions and response formats

Grouped by classification of participant on whom data are collected

Limitations tion. In particular, to demonstrate evidence of

References adequacy. Journal of Psychoeducational Assessment,

Randy G. Floyd, PhD is an Assistant Professor of Psychology at The University of Mem-

Robin L. Phaneuf, PhD is an Assistant Professor of Psychology at The University of Mem-

Susan M. Wilczynski, PhD, BCBA, is an Assistant Professor of Pediatric Psychology at

You might also like