Susan M. Wilczynski
University of Nebraska Medical Center
Functional behavioral assessment (FBA) has received considerable attention in the professional literature of school psychology since the passage of the 1997 revision of the Individuals with Disabilities Education Act (IDEA, 1997), which mandated the use of FBA procedures and positive behavioral supports in some instances. Although IDEA did not specify what constitutes an FBA, a number of heuristic models for FBA have been proposed (e.g., O'Neill et al., 1997; Sugai, Lewis-Palmer, & Hagan-Burke, 2000; Witt, Daly, & Noell, 2000). Across these models, indirect methods of assessment are typically recommended during the beginning stages of an FBA. Indirect methods are characterized by being removed in time and place from the phenomena they measure (i.e., behaviors and functional relations). Examples include ratings of others, interviews, and self-report instruments (Cone, 1978).

Indirect methods of assessment offer a number of potential benefits to the process of FBA. For example, interviews may be useful in defining problem behaviors, determining their severity, and specifying optimal conditions under which to observe the behavior. Additionally, they may contribute to hypothesis-driven functional analysis¹ and reduce the time required to identify functional relations because fewer conditions may need to be manipulated. The informant-based nature of these methods also engages valuable stakeholders (e.g., parents and teachers) who will later be involved in the development, implementation, and monitoring of behavior support plans (Dunlap, Newton, Fox, Benito, & Vaughn, 2001). They are typically less time-consuming than direct observations, and they require notably less training, expertise, and staff collaboration to complete than functional analysis. Furthermore, they may be the only assessment method available when problem behaviors occur at a low frequency or when functional analysis is unethical or untenable (O'Neill et al., 1997).

Although the benefits of indirect methods for FBAs are numerous, informant reports stem from recollections of the problem behaviors and personal judgments about behavior–environment interactions, not from their direct measurement. As a consequence, erroneous hypotheses may be developed or faulty conclusions may be drawn. Despite this significant limitation, no agreed-upon guidelines have emerged for evaluating the quality of the results obtained from indirect methods. Without a sound framework for examining their quality, consumers may be left not only misguided by their results but also uninformed about which instruments provide the most accurate and most useful information. Just as measurement standards are applied to the evaluation of norm-referenced tests to promote quality of results, so too must criteria be applied to indirect methods for FBAs.

Despite several publications focusing on evaluating measures used during FBAs (e.g., Cone, 1997, 1998; Gresham, 2003; Shriver, Anderson, & Proctor, 2001), there appears to be reluctance to embrace measurement standards for assessment of functional relations, particularly among individuals who embrace applied behavior analysis (Kratochwill & Shapiro, 2000). There are at least four reasons for this apparent reluctance. First, it can be argued that direct assessment of behavior through observation largely eliminates the need for evaluation of traditional measurement properties, such as construct or criterion-related validity (Ebel, 1961; Johnston & Pennypacker, 1993). Second, the ability of an FBA instrument to inform effective interventions (i.e., treatment utility) is viewed by some as the most important measurement property—if not ultimately the only important property (Hayes, Nelson, & Jarrett, 1987; Nelson-Gray, 2003). Third, the individual or idiographic nature of behavioral assessment does not appear to lend itself to the group-level statistical analyses that are typically used to examine traditional measurement properties (Nelson, 1983; Nelson, Hay, & Hay, 1977). Finally, the methodology and research designs examining traditional measurement properties may be incongruent with what is known about behavior–environment interactions (Hartman, Roper, & Bradford, 1979; Nelson, 1983; Nelson et al., 1977). For example, because functional relations are expected to be variable across contexts—unlike traits targeted by many traditional measurement properties—they are less likely to demonstrate adequate levels of test–retest reliability or to agree with other measures of functional relations obtained from different contexts.

These reasons for reluctance ground FBAs in low-inference assessment of directly observed behaviors and in assessments and interventions that are useful in addressing the target concerns of individuals. However, reluctance to develop and utilize measurement guidelines for assessment of functional relations hinders a broader understanding of which properties produce the best results and which properties produce the most error. Consideration of measurement properties does not negate a practical focus on assessing and addressing the target concerns of individuals. Rather, measurement guidelines assist in evaluating the quality of assessment information in and of itself. Such guidelines ultimately lead users to develop stronger hypotheses and conclusions that are more accurate. Therefore, the purpose of this article is threefold. It offers research design characteristics and statistical analyses appropriate for examining assessment instruments designed to identify functional relations. It evaluates the research examining the measurement properties of two such instruments. It also provides recommendations for future research and instrument use.

Authors' Note. We thank numerous instrument authors for responding to our requests for information, David M. Murray for assistance in locating resources describing categorical data-analytic procedures, and Renee Bergeron for proofing activities. We also appreciate editorial guidance from Cathy F. Telzrow and feedback from several anonymous reviewers about earlier versions of this article.

Correspondence concerning this article should be addressed to Randy G. Floyd, PhD, The University of Memphis, Department of Psychology, Memphis, TN 38152; E-mail: rgfloyd@memphis.edu.

Copyright 2005 by the National Association of School Psychologists, ISSN 0279-6015. School Psychology Review, 2005, Volume 34, No. 1.

Method

Identification of Instruments

Several strategies were employed to identify indirect assessment instruments measuring functional relations to include in this review (Cooper, 1998; Lipsey & Wilson, 2001). First, electronic bibliographic searches of PsycINFO and ERIC (from January 1887 through August 2001) were conducted for published resources² describing indirect assessment instruments for FBA. Search terms included functional behavioral assessment, functional analysis, behavioral assessment, informant, questionnaire, behavior rating scale, checklist, interview, psychometric properties, accuracy, validity, reliability, and generalizability. Second, after the resources were obtained, their content was reviewed to identify additional instruments measuring functional relations. Third, testing information clearinghouses on the World Wide Web (e.g., Buros Institute of Mental Measurements) and assessment instrument catalogs from prominent test publishers were searched. Finally, the primary authors of the instruments identified during the search were contacted by mail, electronic mail, or phone. They provided names of additional indirect assessment instruments to include in the review (Cooper, 1998). Based on these search strategies, 19 indirect assessment instruments (or collections of instruments) measuring functional relations were identified.

Identification of Research

In order to identify all research supporting the use and interpretation of the identified instruments, additional electronic bibliographic searches of articles, books, and book chapters were conducted using PsycINFO and ERIC (from January 1887 to January 2003) using the names of the instruments and their authors' names. Subsequently, abstracts and reference lists obtained in the search were reviewed, and the texts of resources were scoured to identify primary research examining the use and interpretation of the instruments. In addition, during contact with instrument authors, research including the instruments and the existence of manuals or other published materials summarizing characteristics of the instruments were reported.

Instrument Selection and Description

Instruments selected for review had at least (a) three studies examining their measurement properties, and (b) two studies conducted by independent researchers (i.e., those who were not authors of the instruments or associated with their research group). These two selection criteria were based on procedures designed to identify empirically supported therapies and interventions (Chambless & Hollon, 1998; Kratochwill & Stoiber, 2002; Wolery, 2003). The first criterion was used to include instruments with an established body of research examining their use. The second criterion was used to include research conducted across contexts, participants, and researchers. Only two instruments met both selection criteria.³

Motivation Assessment Scale. The MAS (Durand & Crimmins, 1992) is a rating scale designed to identify the function of problem behaviors. The MAS consists of 16 items scored on a 7-point scale (ranging from Never to Always) that may be completed independently or through an interview with parents, teachers, or careproviders. Items measure four classes of functional relations (Attention, Tangible, Escape, and Sensory), and the instrument yields four subscales representing these classes. Users identify functional relations based on the subscale with the highest rank.

Functional Assessment Interview. The FAI (O'Neill et al., 1997) is a lengthy interview for parents, teachers, or careproviders. The FAI is an extension of the Functional Analysis Interview (O'Neill, Horner, Albin, Storey, & Sprague, 1990), and it has recently been revised and abbreviated for use in school settings (Crone & Horner, 2003). The interview includes 11 sections soliciting information about problem behaviors, setting events, im-
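As a concrete illustration of the MAS scoring rule described above (four subscale scores, with the highest-ranked subscale indicating the hypothesized function), here is a minimal Python sketch. The item-to-subscale mapping and the example ratings are hypothetical placeholders; the actual assignment of the 16 items to subscales is specified in the MAS administration guide (Durand & Crimmins, 1992).

```python
# Sketch of the MAS scoring logic described in the text: 16 items rated on a
# 7-point scale (0-6), grouped into four subscales; the highest-ranked subscale
# mean indicates the hypothesized function.
# NOTE: this item-to-subscale assignment is hypothetical, for illustration only.
SUBSCALE_ITEMS = {
    "Sensory":   [0, 4, 8, 12],
    "Escape":    [1, 5, 9, 13],
    "Attention": [2, 6, 10, 14],
    "Tangible":  [3, 7, 11, 15],
}

def score_mas(ratings):
    """Return subscale means and the highest-ranked (hypothesized) function."""
    if len(ratings) != 16:
        raise ValueError("MAS requires 16 item ratings")
    means = {name: sum(ratings[i] for i in items) / len(items)
             for name, items in SUBSCALE_ITEMS.items()}
    return means, max(means, key=means.get)

ratings = [1, 5, 2, 0] * 4          # one informant's 0-6 ratings for items 1-16
means, function = score_mas(ratings)
print(function)                      # "Escape" — items 1, 5, 9, 13 all rated 5
```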
Table 1
Criteria for Coding Reliability and Validity Evidence

Test–retest reliability or stability
  Variables associated with the measurement property:
    Presence of analysis
    Interval between ratings: short-term (< 1 month) or long-term (> 1 month)
    Statistical analyses:
      Correlations between subscales or groups of items
      Correlations between items
      Agreement between subscales, groups of items, or indicators of function
      Agreement between items or individual questions
  Research design characteristic: Informant did not observe problem behavior

Evidence based on content
  Variables associated with the measurement property:
    Development of items based on literature review, review of case histories, or both
    Development of items based on review of other instruments
    Development of items based on direct observations
    Development of items based on caregiver (parent, teacher, staff) input
    Development of items based on expert opinion (other than authors)
    Review and evaluation of item clarity
    Review and evaluation of bias (gender, racial, special needs)
    Review and item tryouts by users in applied settings
    Readability analysis of items
    Readability analysis of written informant instructions
    Questions about function grouped by experts or caregivers
    Analysis of match between interviewer verbalizations and administration criteria

Evidence based on internal structure
  Variables associated with the measurement property:
    Item intercorrelations within scales or subgroups of items
    Item intercorrelations across all items (within total score)
    Item correlations with total score (i.e., item–total correlations)
    Internal consistency coefficients including only items within scales or subgroups of items
    Internal consistency coefficients including all items (within total score)
    Exploratory factor analysis of items
    Confirmatory factor analysis of items
    Correlations between subscales or subgroups of items

Evidence based on relations with other variables
  Variables associated with the measurement property:
    Results of the instrument were compared to results from a distinctly different assessment instrument or technique that identifies function:
      Presence of analysis
      Instrument or technique (and informant, if applicable)
      Timing of measurements: concurrent or predictive
      Statistical analyses:
        Correlations between subscales, groups of items, or indicators of function
        Agreement between subscales, groups of items, or indicators of function
    Groups were formed based on an external characteristic and results of the instrument were compared between groups:
      Presence of analysis
      Grouping
  Research design characteristic: Informant blind to the results of other assessment methods
(13.0%) reported the general level of training or education of informants.

Context of Problem Behaviors

Informants most often assessed problem behaviors occurring in school or preschool settings (21 studies, or 45.7%) and in residential treatment or clinical settings (17 studies, or 37.0%). Although it is likely that informants routinely delineated the specific contexts or settings (e.g., classroom) in which the problem behaviors occurred before completing the FBA, only 16 studies (34.8%) reported this information. Even fewer studies (8, or 17.4%) reported the specific situations in which the problem behaviors occurred, such as during independent reading time or during small-group instruction.

Measurement Properties

Reliability. Five studies (10.9%) examined test–retest reliability or stability. All of these studies focused on the MAS; none focused on the reliability of the functions identified using the FAI. All but one of these studies used samples of children, and all but two were conducted in school settings. Of these five studies, four examined short-term generalizability across time (<1 month), and three examined long-term stability (>1 month). Statistics used to examine test–retest reliability or stability included correlations between subscales or groups of items (three studies); correlations between items (three studies); formulas examining agreement between subscales, groups of items, or indicators of function (one study); and formulas examining agreement between items or individual questions (three studies). Only two studies reported the specific settings in which the problem behaviors occurred, and only one reported specific situations. None of the studies reported controlling the informants' exposure to the problem behaviors between ratings. Thus, even in the studies that reported the specific contexts of the problem behaviors, error attributed primarily to the passage of time between ratings cannot be identified.

Eighteen studies (39.1%) examined interrater reliability. Of the 18 studies, 15 examined the administration or scoring of the MAS, and 3 examined the FAI. Statistics used to examine interrater reliability included correlations between subscales or groups of items (13 studies) and correlations between items (8 studies). Also evident were studies examining agreement between subscales, groups of items, or indicators of function (11 studies) and those examining agreement between items or individual questions (6 studies). None of the studies reported controlling the observations of informants so that their experiences with the problem behaviors were identical. However, 4 studies reported the specific settings in which the problem behaviors occurred, and 1 reported specific situations. Thus, for almost all studies reviewed, it is likely that informants observed problem behaviors in different settings and situations and, as a result, completed the assessment based on observations of different behavior–environment interactions. Despite the sizeable number of studies examining this measurement property, error attributed primarily to different perceptions and memories of informants cannot be readily identified.

Evidence based on content. Across the resources for the MAS and FAI, little information was located documenting the appropriateness and completeness of instrument items or questions. As described very briefly in only two resources, the content of the MAS items is based on a strong research base examining the environmental contingencies maintaining the problem behaviors of children and adults with developmental disabilities. Items were developed and piloted over a 4-year period, and items were added, deleted, or altered during item tryouts with the aid of careproviders of children with mental retardation, autistic disorder, and other developmental disabilities. However, the procedure for assigning items to subscales is unclear. No formal item-sorting procedures drawing upon the decisions of experts appear to have been utilized for assigning items to subscales. There were no descriptions of consultation with external content experts during the development and evaluation of items, no evaluation of item bias, and no evaluation of item or instruction readability or clarity.
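The interrater statistics described above, correlations between two informants' subscale scores and agreement on the top-ranked indicator of function, can be sketched as follows. The subscale scores are invented for illustration; the Pearson formula is standard.

```python
# Two informants' MAS-style subscale scores for the same student (made-up data).
rater_a = {"Attention": 4.2, "Tangible": 1.5, "Escape": 5.0, "Sensory": 2.1}
rater_b = {"Attention": 3.8, "Tangible": 2.0, "Escape": 4.6, "Sensory": 1.9}

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

keys = sorted(rater_a)
r = pearson([rater_a[k] for k in keys], [rater_b[k] for k in keys])

# Agreement on the categorical indicator of function: do the two informants
# rank the same subscale highest?
agree = max(rater_a, key=rater_a.get) == max(rater_b, key=rater_b.get)
print(round(r, 2), agree)   # 0.98 True
```

Note that a high correlation between subscale scores and agreement on the highest-ranked function are distinct questions, which is why the studies reviewed report both kinds of statistics.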
It is apparent that the FAI has gone through several stages of development (see O'Neill et al., 1990; O'Neill et al., 1997) and that it is thorough in leading informants to describe and prioritize problem behaviors, to identify variables affecting behavior, and to promote the development of hypotheses. Thus, it appears to be based on expert opinion and feedback from users. However, this review revealed no description of its development or evidence based on content to support its validity.

Evidence based on response processes. When the characteristics of the informant and the interaction between the informant and the assessment situation were considered, no evidence was found. This finding is similar to findings for other behavior rating scales and behavioral assessment interviews (Beaver & Busse, 2000; Floyd & Bose, 2003).

Evidence based on internal structure. Internal relations between items from the MAS have been investigated through item-level factor analysis and internal consistency analysis. More specifically, four analyses examined the factor structure of the MAS using exploratory factor analysis, and five analyses examined the relations between items within each of its four scales. Most studies included only adolescents and adults with mental retardation displaying self-injurious behavior and other problem behaviors. No studies used confirmatory factor analysis. Although meaning is most frequently derived from the MAS by identifying its highest-ranked scale, three analyses examined the statistical relations among all items, an approach more appropriate for an instrument measuring the severity of a wide variety of problem behaviors (e.g., Achenbach, 1991).

Evidence based on relations with other variables. The review yielded 36 analyses (across 25 studies) comparing the results of the MAS or FAI to other instruments or techniques yielding information about functional relations. From the studies comparing the MAS to other indirect assessment instruments, 2 analyses compared it to other rating scales (i.e., the Problem Behavior Questionnaire [Lewis, Scott, & Sugai, 1994] and the Questions About Behavioral Functioning Scale [Vollmer & Matson, 1999]), 3 analyses compared it to semistructured interviews (e.g., the FAI), and 2 analyses compared it to direct questioning about functional relations. For the FAI, 3 analyses compared it to rating scales, 1 analysis compared it to another semistructured interview, and 1 analysis compared it to direct questioning about functional relations. Notably, 14 analyses (10 for the MAS, 4 for the FAI) compared the functional relations identified by the MAS or FAI to those identified through functional analysis. Ten analyses compared their results to functions identified through direct observation. Of these analyses including direct observation, 9 used data collected using an antecedent–behavior–consequence (ABC) format. These comparative analyses most frequently examined the agreement between subscales, groups of items, or other indicators of function (30, or 83.3%). Only 6 analyses (16.7%) used correlations between measures. Although the number of analyses examining this measurement property is impressive, very few of the studies containing them (9, or 19.6%) reported that informants were blind to the results of the other assessment instruments and techniques. In addition, only 15 analyses (41.6%) reported the specific settings in which the problem behaviors occurred, and only 5 (13.8%) reported specific situations.

Evidence based on consequences. Evidence of the treatment utility of the MAS and the FAI was identified in nine studies (19.6%).⁵ However, it is challenging to determine the unique contribution of the instruments to intervention development from these studies because, in all cases but one, the instruments were used as part of larger assessment batteries measuring functional relations. No studies addressing the acceptability of these instruments and their outcomes have been published to date. In addition, no resource was identified that included a cost–benefit or cost–effectiveness analysis.

Discussion

A number of problems plague the existing research examining the measurement properties of FBA measures. In addition to the absence of well-designed research examining some properties, the research designs and statistical analyses used in many studies appear to be incongruent with the assumptions of the assessment of behavior and its function. In addition, other research design characteristics and statistical analyses appear to have been flawed.

Implications for Research and Evaluation

Well-designed research is needed to provide sound evidence supporting the use and interpretation of indirect assessment instruments for FBAs. Such research will be improved if authors and journal editors include details in publications pertaining to (a) the assumptions of behavioral assessment, such as variability of behavior according to setting and situation, and (b) research designs and statistical analyses that promote greater precision in identifying the measurement characteristics of these instruments. As evident from the sizeable number of studies included in this review that included fewer than 10 participants, researchers need not abandon a focus on individuals in the search for general principles related to measurement properties of instruments.

Content and response processes. This review indicates that researchers have invested too little effort in identifying and reporting evidence based on test content and response processes. Although the content of most indirect assessment instruments for FBA is drawn from the rich research base of applied behavior analysis, it is important that authors make explicit the resources used and steps taken during instrument development. Potential consumers and content experts independent of the development could assist in refining instrument wording and ensuring that information about all relevant variables, such as setting events, is elicited (Haynes et al., 1995).

A number of naturalistic and analogue studies could examine the response processes of informants completing these instruments. For example, some informant characteristics (e.g., training in observation of behavior and its function) could be systematically manipulated to examine their effects on the accuracy of resulting data. Informants also could be asked to verbalize their internal speech (i.e., "think aloud") while completing instruments, and verbal protocols could be analyzed to identify components deemed necessary for accurate identification of functional relations and apparent failures to use these components (Ericsson & Simon, 1993). Researchers examining FBA interviews may benefit from use or modification of existing coding schemes for verbalizations during interviews, such as the Consultant Analysis Record (Bergan & Kratochwill, 1990; Bergan & Tombari, 1971). For example, research could examine verbal interactions and their correspondence to interview content to determine the integrity of assessment procedures.

Internal structure. Although statistical analysis of the internal relations between segments of most interviews for FBAs, such as the FAI, does not appear to be tenable, the application of such analysis to items from rating scales and select self-report instruments seems appropriate—if evidence based on test content supports item groupings. This review revealed five analyses examining the internal structure of the MAS. It is logical to expect that some groups of items (i.e., subscales) would covary in a meaningful manner. However, there would be no reason to expect internal relations across all items (e.g., for a total score) because groups of items representing each function are designed to measure distinct (and largely mutually exclusive) environment–behavior interactions. Increasing the number of well-constructed items within a related group and conducting item-level analysis with large groups of informants early in the development of the instrument would likely be beneficial. An unpublished revision of the MAS seems to have benefited from both more items and initial item-level analysis (Durand, 1997; Haim, 2003).

Relations with other variables. More carefully designed studies should be conducted to examine the correspondence between measures of functional relations. To increase the integrity of results, research designs should
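The subscale-level internal consistency analysis endorsed above (coefficients computed within an item group rather than across all 16 items) can be sketched as a small computation of Cronbach's alpha. The ratings below are invented: rows are informants, columns are the items of one hypothetical subscale.

```python
# Cronbach's alpha for one subscale's items (rows = informants, cols = items).
# Data are invented for illustration only.
def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items in the subscale
    def var(xs):                          # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in rows]) for j in range(k)]
    total_var = var([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

subscale_ratings = [
    [5, 6, 5, 6],
    [2, 1, 2, 2],
    [4, 4, 5, 4],
    [0, 1, 0, 1],
    [6, 5, 6, 6],
]
print(round(cronbach_alpha(subscale_ratings), 2))   # 0.99
```

Computing alpha for a total score pooling all four function subscales would, as the text argues, be conceptually inappropriate, since the subscales are designed to measure largely mutually exclusive environment–behavior interactions.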
ensure that assessment data are collected in a manner in which informants, observers, or other experimenters are blind to the results of other assessment methods. This review indicated that fewer than 20% of the studies providing evidence based on external relations reported this effort. Continued attention should also be paid to statistical analyses appropriate for categorical-level data. In addition to raw agreement indexes, the use of other statistical analyses for determining agreement between categorical-level data, such as kappa coefficients, tests of marginal homogeneity, and generalizability coefficients, also can be explored (Kraemer, Periyakoil, & Noda, 2002; Li & Lautenschlager, 1997; Powers & Xie, 2000). As evident from this review, Pearson product-moment correlations between measures may continue to be used when data, such as items, are continuous. Spearman correlations may be used when the probability of functions indicated by instruments can be rank ordered.

Research examining external relations of FBA instruments should continue to anticipate the confounding influences of context on measurement of functional relations. Most assessment instruments focusing on symptoms of psychopathology (e.g., Achenbach, 1991; Reich, 2000) yield the most meaningful results when informants observe symptoms across contexts. However, the truest measures of functional relations result from deliberate specification and observation of the settings (e.g., a classroom) and situations (e.g., independent seat work) in which the behaviors occur (Shriver et al., 2001). This review identified few studies that reported information about the specific contexts of the problem behaviors.

Consequences of assessment. Although treatment utility is considered by some to be the ultimate measurement property, this review revealed only one study that examined the isolated effects of the MAS. In all other cases, the instruments were included in a larger assessment battery to inform the development of interventions. Research should partial out the contributions of specific assessment results to treatment development (Hayes et al., 1987; Nelson-Gray, 2003). This research can be accomplished through carefully controlled analogue studies or through meticulous documentation of intervention development based on a stepwise review of assessment results.

This review also revealed that research examining the MAS and FAI has not examined the degree to which consumers are satisfied with the assessment process and their outcomes (Eckert & Hintze, 2000; Gresham & Lopez, 1996; Wolf, 1978). If consumers do not find an assessment instrument acceptable, they may be unlikely to actively engage in the assessment process or to trust and utilize the assessment results. In addition, the costs and benefits of using these instruments have not been quantified and compared (Yates & Taub, 2003). For example, the use of staff time is a variable that is likely to drive the choice of assessment instruments—especially under conditions where there is limited information about the quality of the instruments (Shriver & Proctor, 2004). The recent evolution of the FAI into an abbreviated interview is evidence that instrument authors are attentive to this variable (Crone & Horner, 2003).

Consistency across time and informant. Because behaviors may serve different functions in different contexts, research focusing on test–retest reliability and interrater reliability should examine the consistency of reports of functional relations derived from observations of identical settings and situations. As a result of the variability of behavior and its function across contexts and time, informants' repeated observations of behaviors in applied settings between initial and subsequent completions of the instruments undermine accurate test–retest reliability estimates. Similarly, informants observing different contexts in which the behaviors occur weaken estimates of interrater reliability. Influences on the consistency of assessment results across time and informant will be best pinpointed by carefully controlled laboratory or analogue studies, such as those including observation of problem behaviors by videotape (Johnston & Pennypacker, 1993). In addition, study of these measurement properties will benefit from greater attention paid to statistical analyses appropriate for categorical-level data.
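For the categorical-level analyses recommended above, a kappa coefficient corrects a raw agreement index for chance agreement. A minimal sketch of Cohen's kappa applied to the functions identified by two methods across a set of cases follows; the labels are invented for illustration.

```python
# Chance-corrected agreement (Cohen's kappa) between the functions identified
# by two assessment methods across several cases. Labels are invented.
from collections import Counter

def cohen_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n       # raw agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement expected from each method's marginal label frequencies.
    expected = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

mas = ["Escape", "Attention", "Escape", "Sensory", "Tangible", "Escape"]
fa  = ["Escape", "Attention", "Escape", "Attention", "Tangible", "Sensory"]

raw = sum(x == y for x, y in zip(mas, fa)) / len(mas)
print(round(raw, 2), round(cohen_kappa(mas, fa), 2))   # 0.67 0.54
```

The gap between the raw index (0.67) and kappa (0.54) illustrates why chance-corrected statistics are worth reporting alongside raw agreement when functions are identified categorically.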
* Durand, V. M., & Crimmins, D. B. (1992). The Motivation Assessment Scale administrative guide. Topeka, KS: Monaco & Associates.
* Durand, V. M., Crimmins, D. B., Caulfield, M., & Taylor, J. (1989). Reinforcer assessment: I. Using problem behavior to select reinforcers. Journal of the Association for Persons with Severe Handicaps, 14, 113–126.
* Durand, V. M., & Kishi, G. (1987). Reducing severe behavior problems among persons with dual sensory impairments: An evaluation of a technical assistance model. Journal of the Association for Persons with Severe Handicaps, 12, 2–10.
Ebel, R. (1961). Must all tests be valid? American Psychologist, 15, 546–553.
Eckert, T. L., & Hintze, J. M. (2000). Behavioral conceptions and applications of acceptability: Issues related to service delivery and research methodology. School Psychology Quarterly, 15, 123–148.
Edwards, R. P. (2002). A tutorial for using the Functional Assessment Informant Record for Teachers. Proven Practice: Prevention and Remediation Solutions for Schools, 4, 31–33.
Ehrhardt, K. E., Barnett, D. W., Lentz, F. E., Jr., Stollar, S. A., & Reifin, L. H. (1996). Innovative methodology in ecological consultation: Use of scripts to promote treatment acceptability and integrity. School Psychology Quarterly, 11, 149–168.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.
Floyd, R. G., & Bose, J. E. (2003). A critical review of rating scales assessing emotional disturbance. Journal of Psychoeducational Assessment, 21, 43–78.
Foster, S. L., & Cone, J. D. (1995). Validity issues in clinical assessment. Psychological Assessment, 7, 248–260.
* Gibson, A. K., Hetrick, W. P., Taylor, D. V., Sandman, C. A., & Touchette, P. (1995). Relating the efficacy of naltrexone in treating self-injurious behavior to the Motivation Assessment Scale. Journal of Developmental and Physical Disabilities, 7, 215–220.
Gresham, F. M. (1984). Behavioral interviews in school psychology: Issues in psychometric adequacy and research. School Psychology Review, 13, 17–25.
Gresham, F. M. (2003). Establishing the technical adequacy of functional behavioral assessment: Conceptual and measurement challenges. Behavioral Disorders, 28, 282–298.
Gresham, F. M., & Davis, C. J. (1988). Behavioral interviews with teachers and parents. In E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral assessment in schools: Conceptual foundations and practical applications (pp. 455–493). New York: Guilford Press.
Gresham, F. M., & Lopez, M. F. (1996). Social validation: A unifying concept for school-based consultation research and practice. School Psychology Quarterly, 11, 204–227.
Haim, A. (2003). The analysis and validation of the Motivation Assessment Scale-II Test Version: A structural equation model. Dissertation Abstracts International: Section B: The Sciences and Engineering, 63(10-B), 4903.
Hammill, D. D., Brown, L., & Bryant, B. R. (1992). A consumer's guide to tests in print (2nd ed.). Austin, TX: PRO-ED.
* Haring, T. G., & Kennedy, C. H. (1990). Contextual control of problem behavior in students with severe disabilities. Journal of Applied Behavior Analysis, 23, 235–243.
Hartmann, D. P., Roper, B. L., & Bradford, C. C. (1979). Some relationships between behavioral and traditional assessment. Journal of Behavioral Assessment, 1, 3–21.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963–974.
Haynes, S. N., & O’Brien, W. H. (2000). Principles and practice of behavioral assessment. New York: Kluwer Academic/Plenum.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238–247.
Individuals with Disabilities Education Act Amendments of 1997, Pub. L. No. 105–17, 20 USC Chapter 33, Sections 1400 et seq.
Johnston, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of behavioral research (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
* Kearney, C. A. (1994). Interrater reliability of the Motivation Assessment Scale: Another, closer look. Journal of the Association for Persons with Severe Handicaps, 19, 139–142.
Kern, L., Dunlap, G., Clarke, S., & Childs, K. E. (1994). Student-assisted functional assessment interview. Diagnostique, 19(2–3), 29–39.
* Kinch, C., Lewis-Palmer, T., Hagan-Burke, S., & Sugai, G. (2001). A comparison of teacher and student functional behavior assessment interview information from low-risk and high-risk classrooms. Education and Treatment of Children, 24, 480–494.
Kraemer, H. C., Periyakoil, V. S., & Noda, A. (2002). Kappa coefficients in medical research. Statistics in Medicine, 21, 2109–2129.
Kratochwill, T. R., & Shapiro, E. S. (2000). Conceptual foundations of behavioral assessment in schools. In E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral assessment in schools: Theory, research, and clinical foundations (2nd ed., pp. 3–15). New York: Guilford Press.
Kratochwill, T. R., & Stoiber, K. C. (2002). Evidence-based interventions in school psychology: Conceptual foundations of the Procedural and Coding Manual of Division 16 and the Society for the Study of School Psychology Task Force. School Psychology Quarterly, 17, 341–389.
* Lawry, J. R., Storey, K., & Danko, C. D. (1993). Analyzing problem behaviors in the classroom: A case study of functional analysis. Intervention in School and Clinic, 29, 96–100.
Lennox, D. B., & Miltenberger, R. G. (1989). Conducting a functional assessment of problem behavior in applied settings. Journal of the Association of Persons with Severe Handicaps, 14, 304–311.
Lewis, T. J., Scott, T., & Sugai, G. (1994). The Problem Behavior Questionnaire: A teacher-based assessment instrument to develop functional hypotheses of problem behavior in general education classrooms. Diagnostique, 19, 103–115.
* Lewis, T. J., & Sugai, G. (1996). Functional assessment of problem behaviors: A pilot investigation of the comparative and interactive effects of teacher and peer social attention on students in general education settings. School Psychology Quarterly, 11, 1–19.
Li, M. F., & Lautenschlager, G. (1997). Generalizability theory applied to categorical data. Educational and Psychological Measurement, 57, 813–822.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Merrell, K. W. (2003). Behavioral, social, and emotional assessment of children and adolescents. Mahwah, NJ: Lawrence Erlbaum.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education.
Messick, S. (1995). Validity of psychological assessment: Validity of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Nelson, R. O. (1983). Behavioral assessment: Past, present, and future. Behavioral Assessment, 5, 195–206.
Nelson, R. O., Hay, I. R., & Hay, W. M. (1977). Comments on Cone’s “The relevance of reliability and validity for behavioral assessment.” Behavior Therapy, 8, 427–430.
Nelson-Gray, R. O. (2003). Treatment utility of psychological assessment. Psychological Assessment, 15, 521–531.
* Newton, J. T., & Sturmey, P. (1991). The Motivation Assessment Scale: Interrater reliability and internal consistency in a British sample. Journal of Mental Deficiency Research, 35, 472–474.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
* O’Neill, R. E., Horner, R. H., Albin, R. W., Storey, K., & Sprague, J. R. (1990). Functional analysis of problem behaviors: A practical guide. Sycamore, IL: Sycamore.
* O’Neill, R. E., Horner, R. H., Albin, R. W., Sprague, J. R., Storey, K., & Newton, J. S. (1997). Functional assessment and program development for problem behaviors: A practical handbook. New York: Brooks/Cole.
* Paclawskyj, T. R., Matson, J. L., Rush, K. S., Smalls, Y., & Vollmer, T. R. (2001). Assessment of the convergent validity of the Questions About Behavioral Function scale with analogue functional analysis and the Motivation Assessment Scale. Journal of Intellectual Disability Research, 45, 484–494.
Powers, D. A., & Xie, Y. (2000). Statistical methods for categorical data analysis. San Diego, CA: Academic Press.
* Reese, R. M., Richman, D. M., Zarcone, J., & Zarcone, T. (2003). Individualizing functional assessments for children with autism: The contribution of perseverative behavior and sensory disturbances to disruptive behavior. Focus on Autism and Other Developmental Disabilities, 18, 89–94.
Reich, W. (2000). Diagnostic Interview for Children and Adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 39, 59–66.
* Scotti, J. R., Schulman, D. E., & Hojnacki, R. M. (1994). Functional analysis and unsuccessful treatment of Tourette’s syndrome in a man with profound mental retardation. Behavior Therapy, 25, 721–738.
* Shogren, K. A., & Rojahn, J. (2003). Convergent reliability and validity of the Questions About Behavioral Function and the Motivation Assessment Scale: A replication study. Journal of Developmental and Physical Disabilities, 15, 367–375.
Shriver, M. D., Anderson, C. M., & Proctor, B. (2001). Evaluating the validity of functional behavior assessment. School Psychology Review, 30, 180–192.
Shriver, M. D., & Proctor, B. (2005). Social validity in disseminating functional behavior assessment technology: Research directions for meeting consumer needs. Manuscript submitted for publication.
* Sigafoos, J., Kerr, M., & Roberts, D. (1994). Interrater reliability of the Motivation Assessment Scale: Failure to replicate with aggressive behavior. Research in Developmental Disabilities, 15, 333–342.
Silva, F. (1993). Psychometric foundations and behavioral assessment. Newbury Park, CA: Sage.
* Singh, N. N., Donatelli, L. S., Best, A., Williams, E. E., Barrera, F. J., Lenz, M. W., Landrum, T. J., Ellis, C. R., & Moe, T. L. (1993). Factor structure of the Motivation Assessment Scale. Journal of Intellectual Disability Research, 37, 65–75.
* Spreat, C., & Connelly, L. (1996). Reliability analysis of the Motivation Assessment Scale. American Journal of Mental Retardation, 100, 528–532.
Sturmey, P. (1994a). Assessing the function of aberrant behaviors: A review of psychometric instruments. Journal of Autism and Developmental Disorders, 24, 293–304.
* Sturmey, P. (1994b). Assessing the functions of self-injurious behavior: A case of assessment failure. Journal of Behavior Therapy and Experimental Psychiatry, 25, 331–336.
Sugai, G., Lewis-Palmer, T., & Hagan-Burke, S. (2000). Overview of the functional behavioral assessment process. Exceptionality, 8(3), 149–160.
* Thompson, S., & Emerson, E. (1995). Inter-informant agreement on the Motivation Assessment Scale: Another failure to replicate. Mental Handicap Research, 8, 203–208.
Vollmer, T. R., & Matson, J. L. (1999). Questions About Behavioral Function manual. Baton Rouge, LA: Scientific Publishers.
Witt, J. C., Daly, E. M., & Noell, G. (2000). Functional assessments: A step-by-step guide to solving academic and behavior problems. Longmont, CO: Sopris West.
Wolery, M. (2003, October). Single-subject methods: Utility for establishing evidence for practice. Paper presented at the 19th Annual International Conference on Young Children with Special Needs and Their Families, Washington, DC.
Wolf, M. M. (1978). The case of subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203–214.
* Yarbrough, S. C., & Carr, E. G. (2000). Some relationships between informant assessment and functional analysis of problem behavior. American Journal of Mental Retardation, 105, 130–151.
Yates, B. T., & Taub, J. (2003). Assessing the costs, cost-effectiveness, and cost-benefit of psychological assessment: We should, we can, here’s how. Psychological Assessment, 15, 478–495.
* Zarcone, J. R., Rodgers, T. A., Iwata, B. A., Rourke, D. A., & Dorsey, M. F. (1991). Reliability analysis of the Motivation Assessment Scale: A failure to replicate. Research in Developmental Disabilities, 12, 349–360.