
Journal of Autism and Developmental Disorders, Vol. 19, No. 2, 1989

Autism Diagnostic Observation Schedule:
A Standardized Observation of Communicative
and Social Behavior¹
Catherine Lord²
Department of Pediatrics, University of Alberta and Glenrose Rehabilitation Hospital

Michael Rutter and Susan Goode


MRC Child Psychiatry Unit, University of London

Jacquelyn Heemsbergen and Heather Jordan


Department of Pediatrics, University of Alberta and Glenrose Rehabilitation Hospital

Lynn Mawhood
MRC Child Psychiatry Unit, University of London

Eric Schopler
University of North Carolina

The Autism Diagnostic Observation Schedule (ADOS), a standardized protocol
for observation of social and communicative behavior associated with
autism, is described. The instrument consists of a series of structured and
semistructured presses for interaction, accompanied by coding of specific
target behaviors associated with particular tasks and by general ratings of

¹This research was funded in part by grants from the Social Sciences and Humanities Research
Council of Canada, the Alberta Heritage Foundation for Medical Research, and the Medical
Sciences Institute to the first author. Susan Goode was supported by a grant from the Bethlem-
Maudsley Research Fund. Lynn Mawhood was supported by a postgraduate studentship funded
by the John D. and Catherine T. MacArthur Foundation. We acknowledge the work of Joyce
Magill, Deborah Dewey, and numerous members of the Department of Child and Adolescent
Psychiatry at the Institute of Psychiatry, Division TEACCH, and Glenrose Rehabilitation Hospi-
tal for their help in the development of this scale.
²Address all correspondence to Catherine Lord, Department of Psychology, Glenrose Rehabili-
tation Hospital, 10230-111 Avenue, Edmonton, Alberta T5G 0B7, Canada.
0162-3257/89/0600-0185$06.00/0 © 1989 Plenum Publishing Corporation

the quality of behaviors. Interrater reliability for five raters exceeded weighted
kappas of .55 for each item and each pair of raters for matched samples of
15 to 40 autistic and nonautistic, mildly mentally handicapped children (M
IQ = 59) between the ages of 6 and 18 years. Test-retest reliability was adequate.
Further analyses compared these groups to two additional samples
of autistic and nonautistic subjects with normal intelligence (M IQ = 95),
matched for sex and chronological age. Analyses yielded clear diagnostic
differences in general ratings of social behavior, specific aspects of communication,
and restricted or stereotypic behaviors and interests. Clinical guidelines
for the diagnosis of autism in the draft version of ICD-10 were
operationalized in terms of abnormalities on specific ADOS items. An algorithm
based on these items was shown to have high reliability and discriminant
validity.

A crucial part of the diagnostic evaluation of autism is the assessment of
social and communicative behavior. Although information about these skills
must be acquired from a number of sources, such as observations in familiar
settings and parent interviews (see LeCouteur et al., 1989; Volkmar et
al., 1987; Rutter et al., 1988), most clinicians also wish to incorporate
observations made during their own interactions into their decisions about
diagnosis. However, often such interactions or the conclusions based on them
are not standardized in any way, either across examiner or across patients
or clients. The purpose of the Autism Diagnostic Observation Schedule
(ADOS) is to provide a standard series of contexts for the observation of
communicative and social behavior of persons with autism and related dis-
orders. The goals of the schedule are both diagnostic, that is, to discriminate
autism from other handicaps and from normal functioning, and research,
that is, to study directly the quality of social and communicative behaviors
associated with autism. The objective of this paper is to describe the ADOS
and provide a preliminary report of its psychometric properties.
Several observational scales for the diagnosis of autism are currently
available (see Parks, 1983, for a selective review). These scales include the
Childhood Autism Rating Scale (CARS; Schopler, Reichler, & Renner, 1986),
the Behavior Observation Scale for Autism (BOS; Freeman, Ritvo, Guthrie,
Schroth, & Ball, 1978; Freeman, Ritvo, & Schroth, 1984), the Behavior Rat-
ing Instrument for Autistic and Atypical Children (BRIAAC; Ruttenberg,
Kalish, Wenar, & Wolf, 1977), the Autism Observation Scale (Siegel, Anders,
Ciaranello, Bienenstock, & Kraemer, 1986), and the Autism Behavior Check-
list (ABC), which is part of the Autism Screening Instrument for Education-
al Planning (ASIEP; Krug, Arik, & Almond, 1980). Adequate interrater
reliability has been demonstrated for each of these instruments; however,
their validity in discriminating among different diagnostic groups has varied,
depending on the selection of both the autistic subjects and the comparison
group (Cohen et al., 1978; Parks, 1983; Wenar, Ruttenberg, Kalish, Weiss,
& Wolf, 1986; Volkmar et al., 1988). In general, discrimination has been
most clear when samples of autistic children with quite severe mental retarda-
tion have been compared to normally developing children (Wenar et al., 1986)
or to other nonautistic mentally handicapped children without specific
matching for intellectual level (Teal & Wiebe, 1986; Volkmar et al., 1986).
Because items associated not only with autism but with developmental ages
of less than 1 or 2 years were included in most of these scales (Wenar et al.,
1986), these schedules are less effective in identifying higher functioning au-
tistic children and adolescents than autistic children who are severely han-
dicapped (Krug et al., 1980).
The ADOS differs from other scales in a number of ways. One of the
underlying purposes of the ADOS was to facilitate observation of social and
communicative features specific to autism rather than those accounted for
or exacerbated by severe mental retardation. The development of the ADOS
began with the goal of discriminating autistic children with mild or no men-
tal handicap across the age span of 6 years to young adulthood from age-
and IQ-matched normally developing and nonautistic mildly retarded in-
dividuals. Moreover, the focus of the ADOS is on social and communica-
tive behaviors. It is less comprehensive in its attention to specific autistic-type
movements, behavior difficulties, and sensory interests than many scales
(Krug et al., 1980; Siegel et al., 1986).
Two aspects of this social and communicative focus are most novel.
First, the ADOS is an interactive schedule. What is standardized in the ADOS
are the contexts that provide the background for all observations and, more
specifically, the behaviors of the examiner, not the sample. Social behavior
and communication generally involve more than one person. We wanted to
be able to take into account the examiner's actions in the objective evalua-
tion of the subject's behavior. In the ADOS, the examiner is considered a
participant/observer or confederate in a social experiment. He or she fol-
lows a set protocol that provides not just social but also contextual presses
for social behavior. "Press" is a term borrowed from Murray (1938). We use
it here without pejorative connotations to refer to aspects of the immediate
environment that have direct implications for the subject's behavior.
One critical issue was how to create clear presses for some sort of social
or communicative behaviors to occur, without overtly structuring the social
aspects of the task so much that the quality of the behavior was predeter-
mined by the setting. The general format of the schedule is to encourage an
interaction that appears natural, during which preplanned "occasions" for
certain behaviors arise, with the imposed structure as invisible to the subject
as possible. However, in reality, this structure has been carefully determined
in terms of social tasks that are defined in detail by variations in cognitive
demands, in the type of materials, and in the behavior of the interviewer.
These standard situations thus provide comparable social stimuli for all
subjects.
In its use of standardized presses for communication and social behavior,
the ADOS is more like psychoeducational or developmental tests such
as the Psychoeducational Profile (Schopler & Reichler, 1980) or early-
communication assessments (MacDonald & Horstmeier, 1978; Seibert & Ho-
gan, 1982) than it is like most diagnostic rating scales for autism. In contrast
to other diagnostic instruments that have tended to focus on identifying the
behaviors to be observed and providing rules for rating these behaviors, a
very important aspect of the ADOS is the provision of a standard series of
social contexts, through the interviewer's behavior and the materials and cog-
nitive tasks. The ADOS is intended to be videotaped. Thus, the instrument
offers the potential of creating a standard data base that can be used in fur-
ther analyses of specific communicative or social behaviors, in addition to
the ratings offered in the ADOS itself.
The emphasis on the examiner's behavior as one aspect of the social
stimuli means that he or she must have had some experience with autistic
persons. The ADOS is not a screening instrument such as the ABC (Krug
et al., 1980; Volkmar et al., 1988). It is not intended to be used by persons
unfamiliar with autism to make decisions concerning educational placement
or need for referral.
Second, the ADOS allows the rating of the quality of social behavior,
not just its absence or its occurrence in limited quantities. Although it is
known that autistic persons often produce fewer spontaneous social and com-
municative behaviors than other people, one of the assumptions behind the
ADOS was that it is how these behaviors are carried out that is specific to
autism, not necessarily whether or not they occur. Thus, the ADOS consists
of a range of situations that serve as presses for autistic persons to interact.
The schedule is an attempt to combine aspects of previous scales by both
emphasizing the nature of social and communicative behavior in autism (as
in the CARS; Schopler et al., 1986) and evaluating the qualities of autistic
interactions and communication in terms of specific behaviors (as in the BOS;
Freeman et al., 1984; or ABC, Krug et al., 1980; Volkmar et al., 1988).
The development of the schedule involved three steps. First, behaviors
and aspects of behavior that were of interest were identified on the basis of
diagnostic formulations (American Psychiatric Association, 1987; World
Health Organization, 1987), recent research (Rutter & Schopler, 1987), and
clinical experience (Rutter, 1985). Second, appropriate contexts in which these
behaviors could reasonably be expected to occur were designed. Third, reliable
and valid ways of describing the behaviors that occurred within these con-
texts were defined. For each task, an immediate rating describes the sub-
ject's behavior in that particular situation. In addition, a set of general ratings
is scored at the end of the schedule on the basis of the subject's behavior
throughout the assessment. These two levels of ratings were employed because
of evidence of greater generalizability of molar ratings from "laboratory"
analogs of natural social situations to other settings (Pettit, McClaskey,
Brown, & Dodge, 1987), and because of our interest in identifying those situations
most likely to facilitate the observation of autistic-type social behavior
and communication (Rutter et al., 1988).
The present paper provides a description of the instrument as well as
a preliminary report of its psychometric properties, including interrater and
test-retest reliability, and an initial evaluation of its concurrent and discriminant
validity. In addition, clinical guidelines for the diagnosis of autism
in the draft version of ICD-10 (WHO, 1987) were operationalized in
terms of abnormalities on specific ADOS items. A diagnostic algorithm based
on these items was generated, and a preliminary assessment was made of
its reliability and validity.

DESCRIPTION OF THE INSTRUMENT

As shown in Table I, the ADOS consists of eight tasks presented by
an examiner that generally require 20-30 min to administer. There are two
sets of materials for most tasks so that content and cognitive demands can
be varied according to the chronological age and developmental level of the
subject. The order of the tasks is flexible and is determined by how the interaction
flows. The first task consists of a construction activity (i.e., a puzzle
or pegboard) presented in such a way that, in order to complete the task,
the subject has to obtain more pieces from the examiner. Scored at that time
is whether or not the subject indicates the need for more pieces, and, if so,
how. Second, a set containing both familiar and unusual miniatures is used
to provide the opportunity for spontaneous, individual imaginative play. The
examiner then attempts to join in with the subject's play, in order to observe

Table I. Components of Autism Diagnostic Observation Schedule

Task                                Target behavior(s)
Construction task                   Asking for help
Unstructured presentation of toys   Symbolic play
                                    Reciprocal play
                                    Giving help to interviewer
Drawing game                        Taking turns in a structured task
Demonstration task                  Descriptive gesture and mime
Poster task                         Description of agents and actions
Book task                           Telling a sequential story
Conversation                        Reciprocal communication
Socioemotional questions            Ability to use language to
                                    discuss socioemotional topics

whether the subject can engage in reciprocal play and whether he or she takes
some initiative in extending the interaction. Third, turn-taking and joint object
use are observed in a structured drawing game. Fourth, the subject is asked
to demonstrate through mime and gesture, as well as language, how to carry
out a familiar series of actions (e.g., how to brush one's teeth).
Along with the demonstration task, the remaining tasks are intended
to provide standardized contexts for the collection of a language sample as
well as an assessment of nonverbal behaviors. The fifth task involves the
description of a poster depicting a variety of familiar activities (e.g., a shopping
center with people in and out of stores). In the sixth task, subjects
are presented with a book that portrays a simple story without any text. The
requirement is for the subject to tell a story, indicating in some way an understanding
of events as they occur within a story line. For the seventh task,
the examiner seeks to engage the subject in conversation about topics that
have arisen in the course of the interview. The eighth task consists of specific
questions about emotions and social relationships, and is designed to assess
the subject's ability to describe social and emotional situations and concepts
(Wolff & Barlow, 1979).
Behaviors targeted for observation in each task are coded as the inter-
view proceeds, with general ratings made immediately after the interview.
General ratings are provided for four areas: (a) reciprocal social interaction,
(b) communication/language, (c) stereotyped/restricted behaviors, and (d)
mood and nonspecific abnormal behaviors. Finally, an overall autism rat-
ing is included to facilitate comparisons with the CARS (Schopler et al., 1986).
In most cases, general ratings are made on a 3-point ordinal scale, from
0 = within normal limits, to 1 = infrequent or possible abnormality, to 2
= definite abnormality. For a few items requiring categorical scoring, an
additional rating of 7 is used to indicate behavior in the same category that
is abnormal in a way not encompassed by the codings (e.g., pronoun errors
other than those involving first-person references, such as "they" for "it").
Scores are not strictly intended to judge severity but to allow room for some
uncertainty on the part of the examiner. The same aspect of the same behavior
cannot be coded as abnormal more than once in different ratings,
although different aspects of the same behavior can result in an abnormal
code for more than one item. Ratings are described in terms of principles,
supported by examples. For example, the scores for quality of social overtures
are as follows: 0 = integrates appropriate facial expression, gesture,
and vocalization to communicate social intentions; 1 = slightly odd quality
of social overtures. Overtures may often be for personal demands or related
to own interests, but there is some attempt to involve the examiner in that
interest; 2 = inappropriate overtures that lack social quality and integration
of own and other's behavior. Includes subject bringing up preoccupations

Table II. Description of Samples for Interrater Reliability and Validity Studiesᵃ

Groupᵇ                      Chronological age   Verbal IQ
Autistic/mildly retarded    13.00               60.80
                            (3.87)              (12.01)
                            [6.1-18.8]          [44-85]
Mentally handicapped        12.97               57.25
                            (2.52)              (15.53)
                            [7.3-16.75]         [45-79]
Autistic/nonretarded        12.99               95.35
                            (5.35)              (15.74)
                            [6.2-28.1]          [82-146]
Normally developing         12.91               --
                            (4.40)
                            [6.4-22.4]
ᵃAutistic/mildly retarded and mentally handicapped groups
were involved in the interrater reliability study. Parentheses indicate
standard deviations. Brackets indicate ranges.
ᵇn = 20 for each group.

with no attempt to involve examiner in them; 8 = negligible social overtures
of any kind.

Method

Subjects

As shown in Table II, the subjects for the assessment of interrater relia-
bility were 20 autistic children and adolescents and 20 mentally han-
dicapped/lower IQ children and adolescents matched individually for
chronological age, verbal IQ (on WISC-R or WAIS-R; Wechsler, 1974; 1981)
and sex (12 male, 8 female). Verbal IQ was used for matching as an approxi-
mate index of the complexity of each subject's productive and receptive lan-
guage skills. All subjects were between 6 and 18 years of chronological age,
and between 50 and 80 full-scale IQ on the appropriate intelligence test.
Autistic subjects were recruited from three sources: 10 subjects from
Division TEACCH (North Carolina), 5 subjects from the Department of
Child and Adolescent Psychiatry, Institute of Psychiatry (London, England),
and 5 subjects from the Child and Family Psychiatric Unit, Glenrose Re-
habilitation Hospital (Edmonton, Canada). Prior to inclusion in the study,
all subjects were judged to meet Rutter (1978) and DSM III-R (APA, 1987)
criteria for autism by one of the authors (C.L. or M.R.) on the basis of per-
sonal contact. In addition, all subjects from North Carolina and Canada had
CARS scores over 30 (Schopler et al., 1986).
Nonautistic, mentally handicapped subjects were recruited from three
sources: 10 subjects from a special school (ESN) for mildly mentally handicapped
children in London, 7 subjects from a vocational junior/senior high
school in Edmonton, and 3 from an Edmonton social group for elementary
school age children with a variety of handicaps. Children were selected on
the basis of chronological age, sex, and IQ from school and group registers,
from which children with sensory or physical handicaps or previous diag-
noses of autism had been excluded. It should be noted that because these
subjects came from categorical educational and psychological services for
children having academic or social difficulties, they represented a more be-
haviorally abnormal group of children and adolescents than mentally han-
dicapped children selected in other ways (e.g., from integrated classrooms).
When relatively low reliabilities were found for several items scored
during videotaped assessments, interrater reliabilities were reassessed for these
items using live ratings. At the same time, items modified in any way be-
cause of results of the first analyses (see below) were reassessed. For these
live assessments of interrater reliability, 5 of the original Edmonton subjects
(3 autistic/mildly retarded, 2 mentally handicapped) were reassessed, as well
as 3 additional mentally handicapped boys from the same school as the others
and 7 additional autistic/mildly retarded boys. The autistic subjects were those
children within the same age and IQ ranges as above who were seen as con-
secutive referrals to an autism clinic, and who, after the entire intake, were
judged independently by a clinical psychologist (C.L.) and a child psychiatrist
to meet Rutter (1978) and DSM III-R (APA, 1987) criteria for autism.
Test-retest reliability was determined using the 5 Edmonton subjects
involved in both the initial videotaped and second live assessments (3 au-
tistic/mildly retarded, 2 mentally handicapped) described above, as well as
3 nonretarded autistic subjects and 3 normally developing subjects from the
validity study (described below).

Procedures

Except for the live interrater reliability assessment (where there were
two examiners), subjects were videotaped during the ADOS while alone in
a room with the examiner. The assessments took place in a variety of set-
tings including schools, homes, and clinics in all three countries. The sub-
jects and examiner sat next to or diagonally across from each other at a table,
with sufficient distance from the camera that both were in the picture. An
external microphone was placed on the table. The ADOS generally took 20-30
min to complete. The examiners were not usually blind to the diagnosis of
the subject because subjects were often seen in residential and school place-
ments. However, all except live reliability statistics were computed using cod-
ings of videotapes made by raters who were blind to diagnosis.
During the live interrater reliability assessments, two examiners sat at
the table with the subject, with only the examiner currently interacting with
the child appearing on camera. Midway through the ADOS, the examiners
exchanged places, and the second examiner administered the second half of
the schedule. Both examiners scored the tasks whether or not they carried
them out, and both made notes throughout the interview, as well as scoring
the general ratings immediately after the scale's completion. In this case, be-
cause all consecutive referrals were seen (though not used in the study), the
examiners were blind to diagnosis. This procedure was decided upon when
agreement for practice ratings carried out live with one examiner administer-
ing the entire ADOS and the other examiner watching from inside the same
room was found to be closely similar to earlier ratings made from videotapes.
Test-retest subjects were seen by two different examiners in the same
location on two occasions, separated by 3 to 9 months. Alternate sets of
materials were used in the tasks for which they were available. Scores from
live ratings during both assessments were used.

Raters

Five persons (two from London, three from Edmonton) served as ex-
aminers and raters. Before this study was undertaken, all examiners had
worked together for over a year to devise the scale, modify its procedures
and codings, and standardize its administration.
A balanced incomplete block design was used to assess interrater relia-
bility (Fleiss, 1986). Four of the raters (all except C.L.) were each randomly
assigned 16-20 videotapes to rate, with the following constraints: Each rater
coded at least one subject in every cell, defined by diagnosis (autistic/mildly
retarded vs. mentally handicapped) and continent (UK vs. USA/Canada);
each pair rated 6-8 subjects in common; and no rater rated videotapes of
herself. Because the assignment of the examiners to subjects was not ran-
dom due to varying availability of children of different diagnoses in differ-
ent locations, assignment of raters to subjects could not be perfectly balanced.
An additional rater (C.L.) scored four tapes with each other rater, balanced
across diagnostic group and continent. Thus, the videotape of each subject's
ADOS was coded twice and, in some cases, three times (with C.L. always
as the third rater). Live ratings and test-retest administration and scoring
were carried out only by the three Edmonton raters.
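
As a small check on the pairing arithmetic, five raters give ten distinct rater pairs, which is the number of pairs referred to in the reliability results. The sketch below only enumerates pairs. The rater labels are placeholders, and it does not reproduce the actual assignment of videotapes to raters.

```python
from itertools import combinations

# Hypothetical labels for the five raters (placeholders, not the authors' identifiers).
raters = ["R1", "R2", "R3", "R4", "R5"]

# Every unordered pair of raters; agreement can then be computed per pair.
pairs = list(combinations(raters, 2))

print(len(pairs))  # 10
```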

Results

Interrater Reliability

Weighted kappas (Cicchetti, Lee, Fontana, & Dowds, 1978; Cicchetti
& Sparrow, 1981; Cohen, 1968) were selected in order to take into account
differences in degree of disagreement. They were computed for each item
for each pair of raters separately, as well as pooled across pairs of raters,
except for items where a code other than 0 was made fewer than three times
per rater. These items were mode of communication, pronoun reversal, ne-
ologisms, compulsions, mannerisms not necessarily associated with autism,
tics, vocal tics, appearance, disruptive or aggressive behavior, self-injury,
and misery. They were not eliminated from the scale; however, they cannot
be considered reliable until additional samples are studied.
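
For readers who want to see how a weighted kappa credits partial agreement on a 0-1-2 scale, a minimal sketch follows. It assumes linear disagreement weights, so that a 0-versus-2 split counts twice as much as a 0-versus-1 split. The weighting scheme actually used in this study is not restated here, so the weights and the function name `weighted_kappa` are illustrative assumptions only.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories=(0, 1, 2)):
    """Cohen's weighted kappa with linear disagreement weights (an assumed weighting)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint counts and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the most extreme split
            observed += w * joint[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use the same single category")
    return 1.0 - observed / expected
```

Identical rating lists give a value of 1, and two raters whose disagreements match chance expectation give a value of 0.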
Although the number of subjects rated by each pair of raters was too
small to allow appropriate evaluation of each separate kappa statistic, the
above procedure allowed us to check whether the summary weighted kappa
accurately reflected the scores of most of the individual pairs, or was the
average of very different degrees of agreement across pairs (Conger, 1980;
Hubert, 1977; Fleiss, 1986). No item with a summary weighted κ > .55 had
more than 2 pairs of raters out of 10 who received a weighted κ < .50. Thus,
on the basis of this finding and the estimated variance of the weighted scores,
a summary κ < .55 was used to identify problematic items (Hubert, 1978; Rice
et al., 1986). Eleven items were identified as such, all of which were general
ratings. Two items, curiosity and socially unacceptable habits, were elimi-
nated from the scale because of very poor reliability. Two related items, in-
tonation and rhythm/rate, were combined into a single score, resulting in
a weighted kappa of an acceptable level (i.e., .58).
One other task item, sequential play, was eliminated because of con-
ceptual overlap with the general rating of symbolic play, an overlap con-
firmed by a correlation of .89 between the two items. Another task item,
shared enjoyment, showed relatively poor reliability and substantial concep-
tual overlap (and a correlation of .92) with the task rating for reciprocal play.
Consequently, shared enjoyment was rewritten as a general rating to describe
the subject's behavior over the course of the schedule. In its new form, the
interrater reliability for shared enjoyment was reassessed in the live ratings;
however, because of changes from the codings used in the original data col-
lection, it could not be included in the test-retest or validity analyses. For
this reason, it was not included in the algorithm.
The remaining items were also reassessed using live ratings. Overac-
tivity, attention, and anxiety were rated again as they were written in the
earlier version. Because of the frequent overlaps of 1 with both 0 and 2, scores

Table III. Estimates of Interrater Agreement for Task Items

                                       κw
Item                          Interrater   Test-retest
Asking for help               .66          .61
Symbolic play                 .73          .57
Reciprocal play               .68          .60
Giving help to interviewer    .88          .70
Turn-taking in drawing        .89          .62
Demonstration/mime            .82          .78
Description of poster         .90          .62
Telling a story               .92          .84
Question about emotions       .61          .60
Quality of person             .64          .57
Concept of friendship         .74          .59
Questions about marriage      .82          .61

for intelligibility and unusual eye contact were made dichotomous by eliminat-
ing the 1 code completely. Codes of 1 for social distance and inappropriate
questions and statements were combined with 2's because of consistent overlap
between the two scores.
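
The merging of 1 codes with 2 codes can be sketched as a simple recoding step. The function name `collapse_to_dichotomous` is hypothetical, and only the 0/1-2 variant is shown, since for the items scored 0/2 the 1 code was dropped from the rating itself.

```python
def collapse_to_dichotomous(scores):
    """Merge codes of 1 into 2, giving a 0/1-2 dichotomy (other codes are rejected)."""
    collapsed = []
    for s in scores:
        if s not in (0, 1, 2):
            raise ValueError(f"unexpected code: {s}")
        collapsed.append(0 if s == 0 else 2)
    return collapsed


print(collapse_to_dichotomous([0, 1, 2, 1, 0]))  # [0, 2, 2, 2, 0]
```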
Reliability statistics for final versions of items retained in the scale are
portrayed in Table III (tasks) and IV (general ratings). Overall, weighted kappas
for task items ranged from .61 to .92; weighted kappas for general ratings
ranged from .58 to .87. For the 10 pairs of raters, mean weighted kappas
across items ranged from .68 to .75 for each subject, with a grand mean of
.72 (SD = 0.01). Although the distribution of scores from 0 to 2 varied across
items and diagnostic groups, of greatest diagnostic significance are disagreements
between 0 (i.e., normal) and 2 (i.e., definitely abnormal). On the whole,
such disagreements were very rare. They are reported for each item with the
reliability statistics in Table IV.

Test-Retest Reliability

Weighted kappas for test-retest reliabilities were adequate for all task
items (range .57-.84) and general ratings (range .58-.92). Mean weighted kap-
pas, combined across items for individual subjects, ranged from .58 to .92,
with no consistent differences across diagnostic groups.

Rater Bias

Cochran's Q (Marascuilo & McSweeney, 1977) was used to assess rater
bias by contrasting each rater's scores with those of all other raters for shared

Table IV. Estimates of Interrater Agreement for General Ratings

                                               No. of disagreements
                                               between scores of definite            κw
                                               abnormality (2) and
Item                                           within normal limits (0)     Interrater   Test-retest
Reciprocal social interaction
  Amount of overtures                          0                            .71          .67
  Quality of social overtures                  0                            .73          .58
  Quality of social response                   0                            .70          .63
  Amount of reciprocal social communication    2                            .74          .79
  Overall quality of rapport                   4                            .74          .69
  Unusual use of eye contactᵃ'ᵇ                0                            .61          .60
  Social disinhibition                         1                            .73          .65
  Social distanceᵃ'ᵇ                           1                            .60          .59
  Facial expression                            0                            .60          .62
  Smiling                                      4                            .70          .73
  Nonverbal communication linked with language 0                            .66          .59
  Shared enjoyment                             0                            .78          -
  Level of nonechoed language                  1                            .61          .70
  Conversation                                 1                            .68          .67
  Reports                                      0                            .73          .64
  Intelligibilityᵃ'ᵇ                           0                            .80          .80
  Intonation/rhythm/rateᵃ'ᵇ'ᶜ                  2                            .58          .57
  Immediate echolalia                          1                            .68          .62
  Idiosyncratic language                       2                            .66          .64
  Inappropriate questions and statementsᵃ'ᵇ    1                            .60          .58
  Imagination and creativity                   0                            .84          .60
  Unusual preoccupations                       3                            .69          .80
  Unusual sensory interests                    1                            .70          .66
  Autistic mannerisms and stereotyped movements 0                           .80          .77
Mood and nonspecific abnormal behaviors
  Overactivityᵇ                                0                            .64          .65
  Attentionᵇ                                   2                            .61          .60
  Negativism                                   1                            .78          .78
  Overall distress                             1                            .63          .60
  Anxietyᵇ                                     1                            .82          .62
  Inappropriate cheerfulness                   0                            .79          .81
Overall clinical rating of autism              0                            .80          1.00
ᵃItems in which codes combined to make dichotomous score (either 0/1-2 or 0/2).
ᵇItems scored during live observation during which rater interacted with subject; 15 pairs total.
All other scores are for 56 pairs.
ᶜTwo items (intonation, rhythm/rate) scored separately, combined to form one rating.

subjects. Using the final data sets reported in Tables III and IV, no significant
results were found.
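
To make the statistic concrete, a minimal sketch of Cochran's Q follows. It assumes a complete subjects-by-raters table of 0/1 scores (for example, 1 where a rater gave an abnormal code). How scores were dichotomized and how each rater was contrasted with the others in this study is not specified here, so the table layout and the function name `cochrans_q` are illustrative assumptions.

```python
def cochrans_q(table):
    """Cochran's Q for a complete subjects-by-raters table of 0/1 scores.

    Returns (Q, degrees_of_freedom); Q is referred to a chi-square distribution.
    """
    if not table:
        raise ValueError("table is empty")
    n_raters = len(table[0])
    if n_raters < 2 or any(len(row) != n_raters for row in table):
        raise ValueError("need a rectangular table with at least two raters")
    if any(x not in (0, 1) for row in table for x in row):
        raise ValueError("scores must be 0 or 1")

    row_totals = [sum(row) for row in table]        # one total per subject
    col_totals = [sum(col) for col in zip(*table)]  # one total per rater
    grand = sum(row_totals)

    denom = n_raters * grand - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when all raters agree on every subject")
    numer = (n_raters - 1) * (n_raters * sum(c * c for c in col_totals) - grand * grand)
    return numer / denom, n_raters - 1
```

When every rater assigns the same number of abnormal codes, Q is 0. With two raters, Q reduces to McNemar's statistic without continuity correction.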

VALIDITY STUDIES

Methods

Subjects

Ratings of four groups of 20 subjects each (12 male, 8 female) were
used in the analyses of concurrent criterion-related and discriminant validity:
autistic/mildly retarded, mentally handicapped, nonretarded autistic, and
normally developing. Descriptive data for these groups are reported in Ta-
ble II. Autistic/mildly retarded and mentally handicapped groups were
described earlier as part of the reliability analyses. Nonretarded autistic sub-
jects all had full-scale IQs on the WISC-R or WAIS-R over 80. Nonretarded
autistic subjects were recruited from the same sources as the lower function-
ing autistic group: 6 subjects were seen in North Carolina, 2 in London, and
12 in Edmonton. Normally developing children and adolescents were recruited
from nonhandicapped volunteers attending integrated social groups for han-
dicapped children in Edmonton.

Results

Discriminant Validity: Comparison of Differences in Distribution of
Scores of Autistic and Nonautistic Samples

General Ratings. Distributions for general ratings are portrayed in Table V,
according to the number of nonautistic and autistic subjects who
received scores of 1 (i.e., possible abnormality) or 2 (i.e., definite abnormal-
ity) on each general rating. Kruskal-Wallis ANOVAs (Marascuilo &
McSweeney, 1977) were used to assess differences in these distributions for
the four samples on each item. Probability levels for significant differences
were set at p < .01 because of the number of separate analyses. As shown
in Table V, pairwise contrasts of ordered means using Tukey tests indicated
that, for more than half of the general ratings, all groups differed signifi-
cantly from each other. Two items, idiosyncratic language and inappropri-
ate questions and statements, yielded significant differences between all
groups except mentally handicapped and normally developing subjects. On
all items for which significant differences occurred between autistic samples,
the mildly retarded autistic sample was rated as more abnormal than the non-
retarded autistic group. Four general ratings showed significant differences
between autistic and nonautistic groups, but not between the autistic sam-
ples. These items were unusual use of eye contact, nonverbal communica-
tion linked with language, amount of overtures, and unusual preoccupations.
Two other ratings had significant differences within IQ ranges (i.e., between
normally developing and nonretarded autistic groups and between mentally
handicapped and autistic/mildly retarded groups) but not between the non-
retarded autistic group and the mentally handicapped subjects. These rat-
ings were level of nonechoed language and social distance. General ratings
for which no significant group differences were found were overactivity, at-
tention, negativism, overall distress, anxiety, and inappropriate cheerfulness.
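The group comparisons above rest on Kruskal-Wallis tests applied to ordinal codes from four samples. As a rough illustration only, and not the authors' analysis code, the sketch below computes the tie-corrected H statistic in plain Python and compares it with the chi-square critical value for 3 degrees of freedom at p = .01 (about 11.345). The function name and the ratings are hypothetical; the ratings are invented for illustration.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Give every distinct value the average of the ranks it occupies (midrank).
    midrank = {}
    start = 1
    for value in sorted(counts):
        c = counts[value]
        midrank[value] = start + (c - 1) / 2.0
        start += c

    # Uncorrected H from the per-group rank sums.
    total = 0.0
    for g in groups:
        rank_sum = sum(midrank[x] for x in g)
        total += rank_sum ** 2 / len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Correction for ties, which are unavoidable with 0/1/2 codes.
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1.0 - ties / float(n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Critical chi-square value for df = 3 (four groups) at p = .01.
CHI2_CRIT_DF3_P01 = 11.345

# Hypothetical general-rating codes (0, 1, or 2) for one item in four groups.
normal = [0, 0, 0, 0, 1]
handicapped = [0, 0, 1, 1, 1]
autistic_high = [1, 1, 2, 2, 2]
autistic_low = [1, 2, 2, 2, 2]

h = kruskal_wallis_h([normal, handicapped, autistic_high, autistic_low])
significant = h > CHI2_CRIT_DF3_P01
```

A significant H only says that the four distributions differ somewhere; the pairwise contrasts reported in the text are a separate follow-up step that this sketch does not implement.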
Ratings of Tasks. Out of 12 task items, 3, describing a poster, asking
for help, and giving help to interviewer, yielded no significant specific group
comparisons. For two items, demonstration/mime and turn-taking, the only
group that differed significantly from all others was the autistic/mildly retard-
ed group. Two task ratings, reciprocal play and telling a story, discriminat-
ed autistic from nonautistic groups within the same IQ range, but did not
differentiate the nonretarded autistic group from the mentally handicapped
sample. These results are summarized in Table VI. Symbolic play, questions
about emotions, and questions about marriage each yielded significant differ-
ences between autistic and nonautistic groups, but not between the two autis-
tic samples. Finally, for two other socioemotional questions concerning
concept of friendship and quality of person, significant differences were iden-
tified between all groups.

Algorithm

In order to test concurrent criterion-related validity, clinical guidelines
for the diagnosis of autism in the draft version of ICD-10 (WHO, 1987) were
operationalized in terms of abnormalities on specific items of the ADOS.
As described below, an algorithm based on these items was then used to determine
how many of the subjects in each diagnostic group, as assigned earlier by clinical
judgment, met diagnostic criteria for autism on the basis of behaviors observed
during the ADOS.
First, as shown in Table VII, reliable items on the ADOS related to
specific diagnostic criteria were identified. Several items (i.e., pronoun rever-
sal, neologisms and idiosyncratic language, compulsions and rituals) with
incidences too low to be evaluated statistically were included, because agree-
ment for occurrence was very high. Second, items were grouped according
to the three guidelines for diagnosis in the ICD-10 draft (i.e., reciprocal social
interaction, communication/language, and restricted/stereotyped behaviors),
and a score for the total of these items for each guideline was
arbitrarily set to conform as much as possible to ICD-10 research criteria.
Next, total scores were computed for the mentally handicapped children, the
normal children, and both groups of autistic children from the validity samples.
Cutoffs of 6 points for reciprocal social interaction, 4 points for
communication/language, and 2 points for restricted/stereotyped behaviors were
set, again adhering as closely as possible to the ICD-10 draft guidelines. Ana-
lyses were then conducted to assess the discriminant validity of each guide-
line. In addition, interrater and test-retest reliability for each algorithm area
score and the overall scores was assessed.
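The scoring procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each algorithm item is coded 0, 1, or 2, that a guideline is met when its summed score is at or above the cutoff, and that the names `CUTOFFS`, `guideline_totals`, and `meets_guidelines` are hypothetical. The item codes for the example subject are invented.

```python
# Cutoffs reported in the text for the three ICD-10 draft guidelines.
CUTOFFS = {"social": 6, "communication": 4, "restricted": 2}


def guideline_totals(item_scores):
    """Sum the item codes within each guideline.

    item_scores maps a guideline name to a dict of {item name: code}.
    """
    return {area: sum(items.values()) for area, items in item_scores.items()}


def meets_guidelines(totals, cutoffs=CUTOFFS):
    """Assumption: a guideline is met when its total is at or above the cutoff."""
    return {area: totals.get(area, 0) >= cut for area, cut in cutoffs.items()}


# Hypothetical subject with invented item codes (only some items shown).
subject = {
    "social": {
        "use of eye contact": 2,
        "facial expression": 1,
        "social distance": 1,
        "overall rapport": 2,
        "social disinhibition": 0,
    },
    "communication": {
        "conversation": 2,
        "reports": 1,
        "intonation/rhythm/rate": 0,
    },
    "restricted": {
        "unusual preoccupation": 1,
        "autistic mannerisms and stereotyped movements": 0,
    },
}

totals = guideline_totals(subject)
met = meets_guidelines(totals)
```

In this invented case the subject reaches the social cutoff but falls 1 point short on communication, which is the kind of near-miss a fixed cutoff produces.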

Validity of Algorithm

The discriminant validity of the algorithm was quite good; both the
social criteria and the communication criteria were indeed successful in
differentiating autistic subjects. Kruskal-Wallis ANOVAs yielded highly significant
differences, χ²(3, N = 80) = 57.40 for social, 53.12 for communication,
p < .0001. Paired comparisons (p < .01) indicated significant
differences for scores for the social guideline, with the normally developing
group scoring lower than the mentally handicapped group, who scored lower
than both autistic groups. Paired comparisons (p < .01) for scores for
the communication/language guideline revealed significant differences for
all pairs of groups, including between the two autistic groups. These sum-
mary scores are depicted according to each guideline in Figures 1 and 2.
On the other hand, though criterion-related validity was highly significant
for the two guidelines considered together, Kruskal-Wallis χ²(3, N = 80) = 63.47,
p < .0001, when cutoffs of 6 points for the social guideline
and 4 points for communication were used, not all individual subjects were
appropriately classified as autistic or nonautistic. As shown in Table VIII,
three nonretarded autistic subjects failed to meet the communica-
tion/language criteria for autism. For two of these subjects, summary scores
were 1 point below the cutoff. All three of these subjects had verbal IQs and
full-scale scores above 100, and scores on language tests (such as the Pea-
body Picture Vocabulary Test-Revised Version; Dunn & Dunn, 1981) above
a 7-year age equivalent.
Fig. 1. ADOS algorithm for reciprocal social interaction: distribution of
scores by diagnostic group. [Bar chart; x-axis: Total Scores; groups: Normal,
Autistic - High, Autistic - Low, Mentally Handicapped.]

Fig. 2. ADOS algorithm for communication/language: distribution of scores by
diagnostic group. [Bar chart; x-axis: Total Scores; groups: Normal,
Autistic - High, Autistic - Low, Mentally Handicapped.]

Table VII. ADOS Items for Algorithm Derived From Draft of ICD-10 Criteria

Qualitative impairments in reciprocal social interaction

  Criterion: Qualitative impairments in eye gaze, facial expression, and
  gesture to regulate social interaction
  ADOS items: Use of eye contact; Facial expression; Social distance;
  Nonverbal communication linked with language

  Criterion: A lack of socioemotional reciprocity as shown by an impaired or
  deviant response to other people's emotions; and/or a lack of modulation of
  behavior according to social context; and/or a weak integration of social
  and communicative behaviors
  ADOS items: Overall rapport; Quality of social response; Quality of social
  overtures; Social disinhibition

Qualitative impairments in communication/language

  Criterion: Relative failure to initiate or sustain conversational
  interchange with to-fro responsivity to the communication of other person
  ADOS items: Conversation; Reports

  Criterion: Stereotypic and repetitive use of language
  ADOS items: Idiosyncratic language; Neologisms; Inappropriate questions and
  statements; Pronoun reversals

  Criterion: Abnormalities in pitch, stress, rate, rhythm, and intonation
  ADOS items: Intonation/rhythm/rate

  Criterion: Lack of varied, spontaneous make-believe play
  ADOS items: Imagination and creativity

Restricted, repetitive and stereotyped interests, activities, and patterns of behavior

  Criterion: Encompassing preoccupation with stereotyped and restricted
  patterns of interest
  ADOS items: Unusual preoccupation

  Criterion: Apparently compulsive adherence to specific, non-functional,
  routines or rituals
  ADOS items: Compulsions/rituals

  Criterion: Stereotyped and repetitive motor mannerisms that involve either
  hand/finger flapping or twisting, or complex whole body movements
  ADOS items: Autistic mannerisms and stereotyped movements

  Criterion: Preoccupations with part-objects or non-functional elements of
  play materials
  ADOS items: Unusual sensory interests

Fig. 3. ADOS algorithm for restricted/stereotyped behaviors: distribution of
scores by diagnostic group. [Bar chart; x-axis: Total Scores; groups: Normal,
Autistic - High, Autistic - Low, Mentally Handicapped.]

In addition, two mentally handicapped subjects met criteria for autism
in both social and communication areas. These two children were quite different
from each other. One was an exceptionally active 12-year-old girl with mild
mental retardation who was very disinhibited. Her eye contact was unusual,
and she showed inappropriate social distance, very odd intonation, and repetitive
language. The other subject was a painfully shy 14-year-old boy with mild
mental retardation. He was very restricted in his movements and nonverbal
communication. He had a particularly odd way of speaking. The rhythm and volume
of his speech were unusual and, in this situation, he had virtually no reciprocal
conversation. However, neither child was judged to be autistic on the overall
rating by any of three raters and, clinically, neither child would have even been
considered autistic-like in their social behavior. Thus, while statistical analyses
indicated significantly different distributions across groups and between each
pair of groups (Tukey, ps < .01) for combined scores of social and communication
criteria, the assignment of individuals to diagnostic groups did not
perfectly reflect clinical judgments.
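The individual-level results reported above can be expressed as simple proportions. The sketch below is only an illustration of that arithmetic, with a loudly stated assumption: it supposes that every autistic subject other than the three described met both the social and the communication cutoffs, which the text implies but does not state outright. The variable and function names are hypothetical, and the comparison is restricted to the mentally handicapped group because no counts are given here for the normally developing group.

```python
def proportion(count, total):
    """Return count / total as a float, rejecting an empty group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / float(total)


# Counts taken from the text.
AUTISTIC_TOTAL = 40             # two autistic groups of 20 subjects each
AUTISTIC_BELOW_CUTOFF = 3       # nonretarded autistic subjects below the communication cutoff
HANDICAPPED_TOTAL = 20          # mentally handicapped comparison group
HANDICAPPED_ABOVE_CUTOFFS = 2   # met both social and communication criteria

# Assumption: all remaining autistic subjects met both cutoffs.
sensitivity = proportion(AUTISTIC_TOTAL - AUTISTIC_BELOW_CUTOFF, AUTISTIC_TOTAL)

# Proportion of mentally handicapped subjects correctly left unclassified.
specificity_vs_handicapped = proportion(
    HANDICAPPED_TOTAL - HANDICAPPED_ABOVE_CUTOFFS, HANDICAPPED_TOTAL
)
```

Under that assumption the two proportions come to 37 of 40 and 18 of 20, which is consistent with the statement that group distributions differed strongly while individual assignment was imperfect.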
Identifying restricted and stereotypic behaviors for some subjects was
difficult within the context of the ADOS, even though, once again, distributions
across diagnostic groups were significantly different, Kruskal-Wallis χ²(3,
N = 80) = 48.11, p < .0001. In this case, the two nonautistic groups did not
differ from each other, but all other comparisons were significant. As shown
in Figure 3, for the autistic/mildly retarded children, it was possible to observe,
during the ADOS, clear examples of sensory anomalies, preoccupations, or
autistic-type mannerisms (yielding a total score of 2) in 12 of 20 children. Seven
other autistic/mildly retarded children scored 1, with only 1 autistic/mildly
retarded child showing no evidence of this type of behavior. On the other hand,
it was relatively unusual to observe (or at least recognize) compulsions or ritu-
als during the ADOS, even in the autistic/mildly retarded children.
Restricted and stereotypic behaviors and interests were seen with sufficient
clarity (i.e., received a total score of 2) in only 7 of 20 of the nonretarded
autistic subjects, with 6 nonretarded autistic subjects receiving a 1, and 7 more
receiving a 0, even though all of these children and adults were reported by
parents as regularly showing such behaviors in other contexts. Behaviors falling
in this category occurred very rarely in the mildly mentally handicapped
and normally developing subjects.
The distribution of scores summed across all three guidelines was also compared
across diagnostic groups, as portrayed in Figure 3, Kruskal-Wallis χ²(3,
N = 80) = 57.30, p < .0001. Again, all pairs of groups differed from each
other, with the distribution for the two autistic groups becoming slightly more
skewed.

Reliability of Algorithm

The interrater reliability of each of the algorithm guidelines and the
overall scores given to each subject by each rater was assessed using intraclass
correlations. Intraclass correlations ranged from .92 to .96 for all analyses
except restricted/stereotyped behaviors, which was .75 (all ps < .0001). Stan-
dard errors of measurement were as follows: social = 1.55, communication
= 1.26, restricted/stereotyped behavior = .51, social and communication
combined = 2.89, and total = 4.08. When total scores were compared across
raters, two mentally handicapped subjects were classified differently by each
rater on social criteria, one mentally handicapped subject was classified differ-
ently on communication criteria, and four autistic subjects were classified
differently for restricted/stereotyped behavior. All disagreements were within
2 points. Only two disagreements (both for autistic subjects on restricted/stereotyped
behavior) resulted in a change of overall diagnosis; both
subjects met all three criteria according to the one rater, but only the social
and communication criteria according to the other. There were no changes
in classification when only social and communication area scores were used.
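To make the reliability figures above more concrete, here is a hedged sketch of one common way to obtain an intraclass correlation and a standard error of measurement from paired ratings. The text does not say which form of the intraclass correlation was used or how the standard error was derived, so the one-way random-effects form and the formula SEM = SD × sqrt(1 − reliability) are assumptions, as are the function names and the invented scores.

```python
import math


def icc_oneway(ratings):
    """One-way random-effects intraclass correlation for single ratings.

    ratings is a list with one entry per subject; each entry lists the scores
    given to that subject by the raters (the same number for every subject).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / float(n * k)
    means = [sum(r) / float(k) for r in ratings]
    # Mean squares between subjects and within subjects.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def standard_error_of_measurement(ratings, reliability):
    """SEM = SD of all observed scores times sqrt(1 - reliability)."""
    scores = [x for r in ratings for x in r]
    mean = sum(scores) / float(len(scores))
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (len(scores) - 1))
    return sd * math.sqrt(max(0.0, 1.0 - reliability))


# Invented social-guideline totals from two raters for six subjects.
pairs = [[8, 9], [2, 2], [11, 10], [5, 6], [0, 1], [12, 12]]
icc = icc_oneway(pairs)
sem = standard_error_of_measurement(pairs, icc)
```

A standard error of roughly 1 to 2 points matters most for subjects whose totals sit next to a cutoff, which is where the rater disagreements described above changed classifications.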
Test-retest reliability of the algorithm was assessed in the same fashion
using live ratings for the 11 subjects who were tested twice. Intraclass corre-
lations ranged from .70 for restricted/stereotyped behavior, to between .82
and .92 for all other scores. The mentally handicapped boy described earlier
no longer met the social criteria for autism on his second assessment, though
his total communication score remained the same. One autistic child's score
on restricted/stereotyped behavior increased so that he came to meet criteria
on all three guidelines, as opposed to two. These shifts involved changes
of 1 and 2 points, respectively. Otherwise, all classifications remained the
same.

DISCUSSION

The ADOS provides a protocol for the observation and assessment of
communicative and social behaviors of autistic children and adults. Adequate
interrater and test-retest reliability was found for task ratings and general
codings, though some items require further study with larger populations.
The need for live ratings, made after both examiners had actually interacted
with the subject, illustrates the importance of the interactive nature of the
assessment of social behavior (Farrell, Mariotto, Conger, Curran, & Wal-
lander, 1979). Similarly, the finding that general ratings provided clearer dis-
crimination between groups and showed higher test-retest reliabilities than
did task codings indicates the importance of noting the quality of interac-
tions (Dewey, Lord, & Magill, 1988; Pettit et al., 1987). The semistruc-
tured method allows the examiner to observe behaviors in response to
standardized presses while participating in the interaction and adjusting the
particular contexts to the skills of the subject.
One of the restrictions of the ADOS is that it is dependent on the clini-
cal skills of the examiner. Substantial training is required for its administra-
tion and scoring. The protocol for the examiner consists of a hierarchy of
prompts that range from providing the autistic person with an opportunity
for spontaneous behavior to directing the autistic person or instructing his
or her behavior in quite specific ways if no response is made earlier in the
hierarchy. The aim is to achieve a balance between provoking some behavior
from the autistic subject and not overcuing or directing the kind of behavior
that the autistic person produces. This can be a difficult task even for ex-
perienced clinicians or teachers. Supervised practice is required, much the
same as for someone learning to administer an individual test of intelligence,
development, or motor skills.
The ADOS, as reported here, was designed originally for subjects with
an estimated mental age of 3 years or greater. The reliability and validity
samples included only verbal children and adolescents, whose estimated age
equivalents in expressive language, as well as nonverbal performance scores,
were equivalent to 3 years or higher. Suggestions for using the ADOS with
younger and/or lower-functioning children are available from the first author,
but reliability and validity have not yet been documented. This is an impor-
tant clinical limitation in the scale, since initial diagnoses of autism are usually
made for children under age 6 years, many of whom will not have language
at a 3-year-old level. Differentiation between severely retarded subjects with
and without autism using the ADOS remains an additional query.
Significant differences were found in the distributions of the two au-
tistic groups and the two nonautistic control groups. On the whole, general
ratings differed both for autistic-nonautistic comparisons and for comparisons
of autistic subjects of different intellectual levels. However, as indicated
by the algorithm, some overlap occurred for a small number of individuals,
particularly when nonretarded autistic subjects were contrasted with nonautistic
mentally handicapped subjects.
In addition, significant differences between nonautistic groups (i.e., nor-
mally developing and mentally handicapped) on most general ratings, on
several task items, and on all algorithm guideline scores except reciprocal
social interaction, suggest a developmental factor underlying some aspect
of the schedule. The question of how to minimize the contribution of de-
velopmental effects in the diagnosis of autism is a familiar one (Parks, 1983;
Volkmar et al., 1988). This issue, as it applies to the ADOS, obviously re-
quires additional investigation, though it did not interfere with the correct
classification of almost all of even a very high-functioning group of children
and adolescents with autism.
Further work is also needed in order to describe more exactly the na-
ture of the social and communicative deficits, verbal and nonverbal, that
are specific to autism. For example, the fine-grained analyses of Mundy, Sig-
man, Ungerer, and Sherman (1986) have yielded differences between the use
of eye contact in young autistic children and in other children in particular
situations involving the initiation of indicating. Magill and Lord (1989; Lord
& Magill, 1989) found clear differences, specific to autism, in smiling and
the coordination of eye gaze, whole body movements, gesture, and talking,
during the greetings and initiations made by school age high-functioning au-
tistic children to familiar adults and peers in a group setting. The difficulty
for an instrument such as the ADOS lies in identifying such behaviors and
contexts that differentiate persons with autism from nonautistic persons across
age and IQ ranges.
The ADOS does not provide an adequate assessment of preoccupations,
compulsions, restricted interests, or autistic mannerisms, particularly in the
nonretarded autistic group. The time is too brief, the tasks too structured,
and the contexts too narrow. Thus, the occurrence of any stereotyped
or restricted behaviors or interests can be meaningful for a diagnosis, be-
cause these behaviors were almost always associated with autism in the popu-
lations studied here. However, the absence of such behaviors during the
ADOS cannot be interpreted. Consequently, information is required from
other sources for an accurate diagnosis (Le Couteur et al., 1989).
Confirmation of the algorithm employing data from additional sam-
ples is needed. It seems likely that, with additional data, specific items on
the algorithm may vary. For example, the item, shared enjoyment, was not
included in the algorithm because its codings were changed after many sub-
jects in the validity study had already been rated. Other items, such as social
disinhibition, may not be appropriate in discriminating normally develop-
ing younger populations or those with more severe mental handicaps from
autistic children with severe mental retardation.

Altogether, the planned series of social and communicative presses that
constitute the ADOS was successful in allowing us to focus on the qualitative
aspects of social skills and communication. Within the constraints of
the current samples, this focus provided the opportunity to make judgments
about autistic behaviors that were relatively independent of developmental
level. Further work will permit evaluation of the present findings with
independent and more varied samples, as well as continued investigation of
the schedule's psychometric properties. At this point, the ADOS seems most
valuable as a standardized set of contexts in which to observe behaviors
already identified as of diagnostic significance, and as a standard database
from which to probe for more accurate understanding of the particular
qualities of social and communication skills that characterize autism. A wealth
of information concerning specific aspects of social behavior and language
in autism awaits analysis.

REFERENCES

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disord-
ers (3rd ed. rev.) Washington: Author.
Cicchetti, D. V., Lee, C., Fontana, A. F., & Dowds, B. N. (1978). A computer program for
assessing specific category rater agreement for qualitative data. Educational and Psy-
chological Measurement, 38, 805-813.
Cicchetti, D. V., & Sparrow, S. A. (1981). Developing criteria for establishing interrater relia-
bility of specific items: Applications to assessment of adaptive behavior. American Journal
of Mental Deficiency, 86, 127-137.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagree-
ment or partial credit. Psychological Bulletin, 70, 213-220.
Cohen, D. J., Caparulo, B. K., Gold, J. R., Waldo, M. C., Shaywitz, B., Ruttenberg, B. A.,
& Rimland, B. (1978). Agreement in diagnosis: Clinical assessment and behavior rating
scales for pervasively disturbed children. Journal of the American Academy of Child
Psychiatry, 17, 589-603.
Conger, A. J. (1980). Integration and generalization of kappa for multiple raters. Psychologi-
cal Bulletin, 88, 322-328.
Dunn, L. M., & Dunn, L. M. (1981). Peabody Picture Vocabulary Test-Revised: Manual for
Forms L and M. Circle Pines, MN: American Guidance Service.
Farrell, A. D., Mariotto, M. J., Conger, A. J., Curran, J. P., & Wallander, J. L. (1979). Self-
ratings and judges' ratings of heterosexual social anxiety and skill: A generalizability study.
Journal of Consulting and Clinical Psychology, 47, 164-175.
Fleiss, J. (1986). The design and analysis of clinical experiments. New York: Wiley.
Freeman, B. J., Ritvo, E. R., Guthrie, D., Schroth, P., & Ball, J. (1978). The behavior obser-
vation scale for autism: Initial methodology, data analysis, and preliminary findings on
89 children. Journal of the American Academy of Child Psychiatry, 17, 576-588.
Freeman, B. J., Ritvo, E. R., & Schroth, P. C. (1984). Behavior assessment of the syndrome
of autism: Behavior Observation System. Journal of American Academy of Child Psy-
chiatry, 23, 588-594.
Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289-297.
Hubert, L. J. (1978). A general formula for the variance of Cohen's weighted kappa. Psycho-
logical Bulletin, 85, 183-184.
Krug, D. A., Arick, J., & Almond, P. (1980). Behavior checklist for identifying severely han-
dicapped individuals with high levels of autistic behavior. Journal of Child Psychology
and Psychiatry, 21, 221-229.
Le Couteur, A., Rutter, M., Lord, C., Rios, P., Robertson, S., Holdgrafer, M., & McLennan,
J. D. (1989). Autism diagnostic interview. Journal of Autism and Developmental Disorders, 19.
Lord, C., & Magill, J. (1989). Observing social behavior in an asocial population: Methodo-
logical and clinical issues. In G. Dawson (Ed.), Autism: New perspectives on diagnosis,
nature and treatment. New York: Guilford Press.
Magill, J., & Lord, C. (1988). An observational study of greetings of autistic, behavior-disordered
and normally developing children. Manuscript submitted for publication.
Marascuilo, L. A., & McSweeney, M. (1977). Nonparametric and distribution-free methods
for the social sciences. Monterey, CA: Brooks/Cole Publishing Co.
Mundy, P., Sigman, M., Ungerer, J., & Sherman, T. (1986). Defining the social deficits of
autism: The contribution of nonverbal communication measures. Journal of Child Psy-
chology and Psychiatry, 27, 657-669.
Murray, H. A. (1938). Explorations in personality. New York: Oxford.
Parks, S. L. (1983). The assessment of autistic children: A selective review of available instru-
ments. Journal of Autism and Developmental Disorders, 13, 225-268.
Pettit, G. S., McClaskey, C. L., Brown, M. M., & Dodge, K. A. (1987). The generalizability
of laboratory assessments of children's socially competent behavior in specific situations.
Behavioral Assessments, 9, 81-96.
Rice, J. P., McDonald-Scott, P., Endicott, J., Coryell, W., Grove, W. M., Keller, M. B., &
Altis, D. (1986). The stability of diagnosis with an application to bipolar II disorder.
Psychiatry Research, 19, 285-296.
Ruttenberg, B. A., Kalish, B. I., Wenar, C. & Wolf, E. G. (1977). Behavioral rating instru-
ment for autistic and other atypical children. Philadelphia: The Developmental Center
for Autistic Children.
Rutter, M. (1978). Diagnosis and definition of childhood autism. Journal of Autism and De-
velopmental Disorders, 8, 139-161.
Rutter, M. (1985). Infantile autism. In D. Shaffer, A. Erhardt, & L. Greenhill (Eds.). A clini-
cal guide to child psychiatry (pp. 48-78) . New York: Free Press.
Rutter, M., Le Couteur, A., Lord, C., MacDonald, H., Rios, P., & Folstein, S. (1988).
Diagnosis and subclassification of autism: Concepts and instrument development. In E.
Schopler & G. B. Mesibov (Eds.), Diagnostic and assessment issues in autism (pp. 239-260).
New York: Plenum Press.
Rutter, M., & Schopler, E. (1987). Autism and pervasive developmental disorders: Concepts
and diagnostic issues. Journal of Autism and Developmental Disorders, 17, 159-186.
Schopler, E., Reichler, R. J., & Renner, B. R. (1986). The Childhood Autism Rating Scale (CARS)
for diagnostic screening and classification of autism. New York: Irvington Publishers.
Siegel, B., Anders, T. F., Ciaranello, R. D., Beinenstock, B., & Kraemer, H. C. (1986). Empir-
ically derived classification of the autistic syndrome. Journal of Autism and Develop-
mental Disorders, 16, 275-293.
Teal, M. B., & Wiebe, M. J. (1986). A validity analysis of selected instruments used to assess
autism. Journal of Autism and Developmental Disorders, 16, 485-494.
Volkmar, F. R., Cicchetti, D. V., Dykens, E., Sparrow, S. S., Leckman, J. F., & Cohen, D.
J. (1988). An evaluation of the Autism Behavior Checklist. Journal of Autism and De-
velopmental Disorders, 18, 81-98.
Volkmar, F. R., Sparrow, S. S., Goudreau, D., Cicchetti, D. V., Paul, R., & Cohen, D. J.
(1987). Social deficits in autism: An operational approach using the Vineland Adaptive
Behavior Scales. Journal of the American Academy of Child and Adolescent Psychiatry,
26, 156-161.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children-Revised. New York: Psychologi-
cal Corp.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. New York: Psychologi-
cal Corp.
Wenar, C., Ruttenberg, B. A., Kalish-Weiss, B., & Wolf, E. G. (1986). The development of
normal and autistic children: A comparative study. Journal of Autism and Developmental
Disorders, 16, 317-334.
Wolff, S., & Barlow, A. (1979). Schizoid personality in childhood: A comparative study of
schizoid, autistic and normal children. Journal of Child Psychology and Psychiatry, 20,
29-46.
World Health Organization. (1987). ICD-10 1986 draft of chapter 5 categories F00-F99. Men-
tal, behavioral and developmental disorders. Geneva: Author.
