
(2003). In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment: Intelligence, aptitude, and achievement (2nd ed., pp. 217-242). New York: Guilford.

10
Stanford-Binet Intelligence Scale:
Fourth Edition (SB4):
Evaluating the Empirical Bases for Interpretations

ERIC A. YOUNGSTROM
JOSEPH J. GLUTTING
MARLEY W. WATKINS

The Stanford-Binet Intelligence Scale: Fourth Edition (SB4; Thorndike, Hagen, & Sattler, 1986b) is the most recent edition in a line of instruments going back almost a century (viz., Binet & Simon, 1905). The 1986 revision revitalized the Stanford-Binet by both maintaining links with previous editions of the scale and simultaneously incorporating more recent developments found in other popular tests of intelligence. The SB4 retains as much item content as possible from the Stanford-Binet Intelligence Scale: Form L-M (SB-LM; Thorndike, 1973). SB4 also respects tradition by covering approximately the same age range as SB-LM (ages 2-23); it incorporates familiar basal and ceiling levels during testing; and it provides an overall score that appraises general cognitive functioning. As this chapter is being written, the fifth edition of the Stanford-Binet (SB5) is beginning item tryouts in preparation for standardization. The plan for this newest edition also shares a commitment both to the Stanford-Binet tradition and to incorporating current theories about psychometrics (e.g., item response theory) and the structure of intelligence.

Despite these similarities, these revisions are substantially different from their predecessors. The SB4 eliminated the traditional age scale format. In its place are 15 subtests whose age-corrected scaled scores make it possible to interpret profile elevations and profile depressions. Four "area" scores, derived from theoretically based subtest groupings, are also new. These reformulations add to interpretative possibilities, and they attempt to broaden the coverage of cognitive ability over that offered by SB-LM. SB4 permits calculation of the Composite (overall IQ) for performances based on specific "abbreviated batteries," as well as for any combination of subtests psychologists wish to regroup, promoting flexibility in administration and interpretation.

This chapter familiarizes readers with the structure and content of SB4. It also evaluates selected aspects of the test's psychometric and technical properties. In addition, we hope to sensitize psychologists to factors pertinent to the administration of SB4 and to the interpretation of its test scores. The chapter aims to present a balanced treat-

217
218 II. ASSESSMENT OF INTELLIGENCE AND LEARNING STYLES/STRATEGIES

ment of strengths and limitations. The highest professional standards were applied throughout the development of SB4. Prior to publication, the authors and publisher dedicated over 8 years to development and 2 years to extensive data analyses of the final product. Thus, it is no surprise that we identify unique and praiseworthy features. Similarly, no test is without faults, and this thought should place the potential shortcomings of SB4 in context.

The first section of this chapter outlines the theoretical model underlying SB4. We turn then to a general description of the structure of SB4 and issues related to its test materials, administration, and scaling. Thereafter, we discuss the strengths and weaknesses associated with SB4's standardization, its reliability and validity, and factors related to the interpretation of its test scores. Finally, we take a look at the development of SB5.

THEORETICAL FOUNDATION

Perhaps the most fundamental change incorporated into SB4 is the expansion of its theoretical model. Figure 10.1 shows that SB4 has three levels, which serve both traditional and new Binet functions. At the apex is the Composite, or estimate of general ability, traditionally associated with Binet scales. The second level is new to SB4. It proposes three group factors: Crystallized Abilities, Fluid-Analytic Abilities, and Short-Term Memory. The first two dimensions originate from the Cattell-Horn theory of intelligence (Cattell, 1940; Horn, 1968; Horn & Cattell, 1966). These are shaded in the figure, because the published interpretive system for the SB4 does not emphasize the calculation of observed scores corresponding with these factors. The additional component, Short-Term Memory, is not contained in the Cattell-

Level I:   Composite
Level II:  Crystallized Abilities; Fluid-Analytic Abilities; Short-Term Memory
Level III: Verbal Reasoning; Quantitative Reasoning; Abstract/Visual Reasoning

Subtests: Vocabulary, Comprehension, Absurdities, and Verbal Relations (Verbal Reasoning); Quantitative, Number Series, and Equation Building (Quantitative Reasoning); Pattern Analysis, Copying, Matrices, and Paper Folding and Cutting (Abstract/Visual Reasoning); Bead Memory, Memory for Sentences, Memory for Digits, and Memory for Objects (Short-Term Memory). Crystallized Abilities and Fluid-Analytic Abilities are shaded in the original figure.

FIGURE 10.1. Theoretical model for SB4.



Horn theory. Its inclusion reflects the way in which psychologists used previous editions of the Binet (Thorndike, Hagen, & Sattler, 1986a); to some extent, it also reflects factor-analytic work with other intelligence tests, suggesting that short-term memory is related to long-term memory and to more complex learning and problem solving (Thorndike, Hagen, & Sattler, 1986c).

The third level illustrates another difference between the SB4 and earlier editions of the scale. Here, factors are identified in terms of three facets of reasoning: Verbal Reasoning, Quantitative Reasoning, and Abstract/Visual Reasoning. These components resemble the third level of Vernon's (1950) hierarchical model of intelligence, wherein well-known Verbal-Educational and Practical-Mechanical factors are subdivided to obtain even more homogeneous estimates of ability. Vernon, for example, splits the Verbal-Educational factor into the scholastic content of verbal fluency, numerical operations, and so on. SB4 follows this orientation by incorporating dimensions for the assessment of Verbal Reasoning and Quantitative Reasoning. Similarly, SB4's Abstract/Visual Reasoning dimension parallels the Practical-Mechanical component of the Vernon model.

The three group factors at the third level (Verbal Reasoning, Quantitative Reasoning, Abstract/Visual Reasoning), plus the Short-Term Memory factor at the second level, form the four "area" scores derived by SB4. The Abstract/Visual Reasoning score at the third level corresponds to the Fluid-Analytic Abilities dimension at the second level. No area score is readily available for the third dimension at the second level, Crystallized Abilities; nevertheless, scores for this broad-band factor can be estimated by collapsing results across the remaining two of the four areas (Verbal Reasoning and Quantitative Reasoning).

Thus the SB4 model is an eclectic unification of multiple theories of intelligence. Such synthesis is not unique to SB4. The Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983) also accounts for test performance through interrelationships among theories (i.e., the Luria-Das and Cattell-Horn theories of ability). In addition, both tests share the desirable quality of using explicit theoretical frameworks as guides for item development and for the alignment of subtests within modeled hierarchies.

TEST STRUCTURE

Subtest Names and Content

SB4's subtests retain some reliable variance that is distinct from the score variation captured by area scores or the Composite. Because of this specificity within each subtest, the developers described the "unique abilities" evaluated by SB4's subtests. Profile analysis is a popular method for explicating an examinee's strengths and weaknesses on these abilities (see Delaney & Hopkins, 1987; Naglieri, 1988a, 1988b; Rosenthal & Kamphaus, 1988; Sattler, 1992; Spruill, 1988). Therefore, inasmuch as SB4 supports comparisons among subtest scores, it is worthwhile to understand the identity and composition of these measures.

Descriptions of the 15 SB4 subtests are provided in Table 10.1, organized according to the theoretical area each occupies in the scale.

Content Similarity with Other IQ Tests

SB4 items appear representative of the item content found in intelligence tests (see Jensen, 1980, for a detailed analysis of item types common among IQ tests). Visual inspection reveals that six SB4 subtests share core content with the Wechsler Intelligence Scale for Children-Third Edition (WISC-III; Wechsler, 1991). For example, both SB4 Vocabulary and WISC-III Vocabulary assess word knowledge. SB4 Comprehension and WISC-III Comprehension measure breadth of knowledge of social and interpersonal situations, and the visual-perceptual abilities evaluated by SB4 Pattern Analysis generally apply to WISC-III Block Design. Likewise, there are marked similarities between the SB4 Quantitative subtest and WISC-III Arithmetic, between SB4 Memory for Digits and WISC-III Digit Span, and between SB4 Verbal Relations and WISC-III Similarities. Resemblances in subtest content are also ap-

TABLE 10.1. SB4 Subtests: Age Range, Median Reliability, and Content

Area/subtest       Ages    Reliability   Content

Verbal Reasoning
  Vocabulary       2-23    .87    Examinees supply word definitions. The first 15 items tap receptive word knowledge (examinees name pictured objects), and vocabulary items 16 through 46 are presented both orally and in writing.
  Comprehension    2-23    .89    Items 1 through 6 require the receptive identification of body parts. Items 7 through 42 elicit verbal responses associated with practical problem solving and social information.
  Absurdities      2-14    .87    This subtest presents situations that are essentially false or contrary to common sense. Examinees point to the inaccurate picture among three alternatives (items 1 through 4), or they verbalize the absurdity in a single picture (items 5 through 32).
  Verbal Relations 12-23   .91    Examinees state how three words, out of a four-word set, are similar. The fourth word in each item is always different from the three words preceding it.

Quantitative Reasoning
  Quantitative     2-23    .88    Examinees are required to count, add, seriate, or complete other numerical operations (e.g., count the number of blocks pictured; how many 12" by 12" tiles would be needed to cover a floor that is 7 feet by 9 feet?).
  Number Series    7-23    .90    A row of four or more numbers is presented; the task is to identify the principle underlying the series and to apply it to generate the next two numbers (e.g., 1, 3, 7, 15, _, _).
  Equation Building 12-23  .91    Examinees resequence numerals and mathematical signs into a correct solution (e.g., 15, 12, 2, 25, =, +, -).

Abstract/Visual Reasoning
  Pattern Analysis 2-23    .92    Items 1 through 6 require examinees to complete formboards. Items 7 through 42 involve the replication of visual patterns through block manipulations.
  Copying          2-13    .87    Examinees either reproduce block models (items 1 through 12) or draw geometric designs, such as lines, rectangles, and arcs, that are shown on cards (items 13 through 28).
  Matrices         7-23    .90    Each item presents a matrix of figures in which one element is missing. The task is to identify the correct element among multiple-choice alternatives.
  Paper Folding and Cutting  12-23  .94   Figures are presented in which a piece of paper has been folded and cut. Examinees choose among alternatives that show how the paper might look if it were unfolded.

Short-Term Memory
  Bead Memory      2-23    .87    Examinees recall the identity of one or two beads exposed briefly (items 1 through 10), or they reproduce bead models in a precise sequence (items 11 through 42).
  Memory for Sentences  2-23  .89   Examinees are required to repeat each word in a sentence in the exact order of presentation.
  Memory for Digits     7-23  .83   Examinees repeat digits either in the sequence presented or in reverse order.
  Memory for Objects    7-23  .73   Pictures of objects are viewed briefly. Examinees then identify the objects in correct order from a larger array.
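The Number Series item type above invites a mechanical illustration. The sketch below is ours, not SB4's scoring procedure; it checks two simple rules (a constant difference, and differences that grow by a constant ratio), which is enough to solve the sample item 1, 3, 7, 15, _, _.

```python
def next_terms(seq, k=2):
    """Extend a number-series item by testing two simple rules:
    a constant difference, or differences growing by a constant ratio.
    (A hypothetical solver for illustration, not SB4's procedure.)"""
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    out = list(seq)
    if len(set(diffs)) == 1:              # arithmetic: 5, 10, 15, ...
        for _ in range(k):
            out.append(out[-1] + diffs[0])
        return out[len(seq):]
    ratios = [b / a for a, b in zip(diffs, diffs[1:]) if a != 0]
    if ratios and len(set(ratios)) == 1:  # doubling gaps: 1, 3, 7, 15, ...
        d = diffs[-1]
        for _ in range(k):
            d *= ratios[0]
            out.append(out[-1] + int(d))
        return out[len(seq):]
    raise ValueError("no simple rule detected")
```

For the sample item, the differences are 2, 4, 8 (each double the last), so the solver continues with 16 and 32, yielding 31 and 63.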
10 Stanford-Binet Intelligence Sc::::.e Fourth Edition (584) 221

parent between SB4 and the K-ABC. The four most striking parallels occur between (1) SB4 Pattern Analysis and K-ABC Triangles, (2) SB4 Matrices and K-ABC Matrix Analogies, (3) SB4 Memory for Digits and K-ABC Number Recall, and (4) SB4 Memory for Objects and K-ABC Word Order. These comparisons suggest that there exists a core set of subtests (generally including those with the highest g saturation) that are shared across commonly used measures of ability.

MATERIALS

Three manuals accompany SB4: the Guide for Administering and Scoring (Thorndike et al., 1986a), the Technical Manual (Thorndike et al., 1986c), and the supplementary Examiner's Handbook (Delaney & Hopkins, 1987). All three manuals are well written and informative. Chapters pertinent to test administration are especially well organized in the Examiner's Handbook. Psychologists new to SB4 are encouraged to read these sections of the handbook prior to reviewing the Guide for Administering and Scoring.

SB4 materials are attractive, well packaged, and suitable to the age groups for which they are applied. The Bead Memory subtest is a noteworthy exception. Directions for Bead Memory caution psychologists to "BE SURE THAT EXAMINEES DO NOT PLAY WITH THE BEADS. THERE IS A DANGER THAT YOUNG EXAMINEES MAY TRY TO PUT THE BEADS IN THEIR MOUTHS" (Thorndike et al., 1986b, p. 23; boldface capitals in original). This caution is insufficient for the danger presented. Two of the four bead types fit easily in a "choke tube," an apparatus used to determine whether objects are sufficiently small that young children will gag or suffocate on them. Psychologists, therefore, should never allow young children to play with these objects.1

Publishers are increasingly adding color to test stimuli. Rich colors enhance the attractiveness of test stimuli, and they have the positive effect of making test materials more child-oriented (Husband & Hayden, 1996). Color helps to maintain children's interest during testing, and it augments the probability of obtaining valid test scores. However, sometimes color is not equally salient or perceptually unambiguous to examinees. Such a situation can arise when persons with color-blindness are being assessed. For these individuals, color represents an additional source of score variance that can reduce test validity. They are most likely to experience difficulty when confronted by the following color combinations: red-brown, green-orange, red-grey, blue-purple, and red-green (Coren, Ward, & Enns, 1999).

Two examples are presented where color may alter SB4 item difficulties. Item 1 of the Vocabulary subtest shows a red car on a brown background. This color combination makes it more difficult for some individuals with color-blindness to distinguish the important foreground stimulus (the car) from its background. Another example can be seen in the formboard items in Pattern Analysis. The red puzzle pieces and green background make the formboard more difficult for examinees with red-green color blindness. (See also Husband & Hayden, 1996, for an investigation of the effects of varying stimulus color on several SB4 subtests.) Fortunately, the problems associated with color stimuli can be corrected by simply not pairing these colors within test items. By adopting such changes, test publishers will be able to continue offering the benefits of color stimuli and simultaneously reduce the visual discrimination problems of examinees with color-blindness.

ADMINISTRATION

SB4 uses "adaptive testing" to economize on administration time. This format offers the added benefit of decreasing frustration, because examinees are exposed only to those test items most appropriate to their ability level. The Vocabulary subtest serves as a "routing" measure at the beginning of each assessment. Performance on the Vocabulary subtest, in conjunction with an examinee's chronological age, is used to determine the appropriate entry level for succeeding subtests. Entry levels are arranged hierarchically by item pairs (labeled "A" through "Q" on the test protocol). Basal and ceiling rules are then applied within subtests. A basal level is


established when all items are passed at two consecutive levels. A ceiling is reached, and testing advances to the next subtest, when three failures (out of four possible) take place across adjacent levels. There is some concern that the entry levels may be too high for youths and adults with mental retardation (Sattler, 1992; Spruill, 1991). This routing system also can be confusing to examiners unfamiliar with the SB4 testing format (Vernon, 1987; Wersh & Thomas, 1990). Supervisors and instructors should make certain that trainees are comfortable navigating the routing system (Choi & Proctor, 1994), and trainees should be vigilant for possible difficulty at subtest entry points when testing children with suspected cognitive deficits.

SB4 deserves credit for its efficient testing format and for directions that are readable and straightforward. In contrast to SB-LM, SB4 administration is simpler due to such features as incorporating most of the directions, stimuli, and scoring criteria within the easel kits. The use of sample items helps familiarize examinees with directions and item formats prior to actual testing. In addition, SB4 is a "power" test (as opposed to a "speeded" test; Anastasi & Urbina, 1997). Pattern Analysis is the only subtest requiring mandatory time limits. Doing away with the need for accurate timekeeping coincidentally makes SB4's administration more convenient.

Administration times appear reasonable. The Technical Manual (Thorndike et al., 1986c) does not offer administration times by age level. Delaney and Hopkins (1987) provide administration times by entry level (A through M or higher), and we used this information to approximate testing times by age. Based on these estimates, testing would take between 30 and 40 minutes for preschool-age children; 60 minutes for children between the ages of 6 and 11; and between 70 and 90 minutes for those at higher age levels. These values may underestimate actual testing times. Sattler (1992) reports that the full battery is much too long to complete in most circumstances, and he indicates that it may take 2 hours to administer the entire test to an adolescent. The length of time required for the full battery has spurred the development of a plethora of short forms, which are discussed below.

One final area of concern is the developmental appropriateness of an instrument for use with young children. Preschoolers vary in their knowledge of basic concepts (e.g., "top," "behind," "same as"). As a result, basic concepts in test directions may hinder preschool children's understanding of what is expected of them. Kaufman (1978) examined this issue by comparing the number of basic concepts in the Boehm Test of Basic Concepts (BTBC; Boehm, 1971) to those found in the directions for several preschool-level ability tests, including the following: SB-LM; the McCarthy Scales of Children's Abilities (MSCA; McCarthy, 1972); and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; Wechsler, 1967). Results revealed that scores from SB-LM (5 basic concepts) were less susceptible to this influence than scores from the MSCA (7 basic concepts) or WPPSI (14 basic concepts).

We compared directions in SB4 to basic concepts in the BTBC.2 In particular, directions were analyzed for the eight SB4 subtests routinely administered to preschoolers. Our findings show that SB4 assumes young children know eight BTBC basic concepts. Although this represents an increase over the number found for SB-LM, it compares favorably to the number of basic concepts in the MSCA, and it is fewer than that found for the WPPSI. Thus SB4 directions are at least as likely to be understood by preschoolers as those contained in other IQ tests.

SCALING

Raw SB4 scores are converted to standard age scores (SASs). SASs for the four areas and the Composite are synonymous with deviation IQs (M = 100, SD = 16, consistent with Binet tradition). Subtest SASs are normalized standard scores with M = 50 and SD = 8. This metric is highly unusual. We find no compelling reasoning for this choice, and we share Cronbach's (1987) criticism of SB4 that there is no advantage for choosing these units over conventional T-scores.

Percentile ranks are available for subtests, area scores, and the Composite. Although SB4 is no longer an age scale, age equivalents are supplied for the 15 subtests. Moreover, a conversion table is produced for professionals who wish to interpret area scores and the Composite in a metric identical to the Wechsler series (M = 100, SD = 15).

A historical advantage of Binet scales has been an extended floor for detecting moderate to severe mental retardation. Psychologists will no doubt be disappointed that this benefit is generally unavailable for young children on SB4 (Bradley-Johnson, 2001; Grunau, Whitfield, & Petrie, 2000; McCallum & Whitaker, 2000; Saylor, Boyce, Peagler, & Callahan, 2000). Table 10.2 presents minimum overall ability scores attainable for preschoolers on SB-LM, SB4, the WPPSI-R, and the K-ABC. Column 2 indicates that SB-LM was fully capable of diagnosing mild intellectual retardation by age 3, and moderate retardation by age 3 years, 6 months. In contrast, column 3 reveals that for all practical purposes, SB4's Composite is unable to diagnose mild intellectual deficits prior to age 4, and it shows no capacity for detecting moderate retardation until age 5.

Tests such as the WPPSI, WPPSI-R, and the K-ABC have been criticized for being insensitive to preschoolers who perform at the lower end of the ability continuum (Bracken, 1985; Olridge & Allison, 1968; Sattler, 1992). Column 5 in Table 10.2 shows that SB4 is somewhat more precise in this regard than the K-ABC. However, column 4 also reveals that SB4 is no more sensitive than the WPPSI-R. These comparisons, combined with existing views on the limitations of the WPPSI-R and K-ABC, lead to the conclusion that SB4 provides an insufficient floor for testing young children suspected to perform at lower levels of ability (Flanagan & Alfonso, 1995). These findings are disappointing, since SB-LM was the only IQ test capable of diagnosing mental retardation with preschoolers between the ages of 2 years, 6 months (the upper age range of the Bayley Scales) and 4 years, 0 months.

Problems are compounded for younger preschoolers by the fact that area scores evidence even higher floors than the Composite. For example, the lowest SAS for Quantitative Reasoning between the ages of 2 years, 0 months and 4 years, 6 months is 72. This score is above the range for mental retardation, and the median lowest attainable SAS is 87 for children between these ages. With this instrument, it is impossible for younger preschoolers to show deficient or abnormal functioning in Quantitative Reasoning. Even more disturbing, the truncated floor makes it more probable that an artifactual "pattern" of strength in Quantitative Reasoning will emerge for any such preschooler whose Composite is in the deficient range. Floor limitations dissipate by the age of kindergarten entry. SB4's Composite is able to identify both mild and moderate intellectual retardation at the age of 5 years, 0 months. Similarly, shortcomings noted for the Quantitative Reasoning area are resolved essentially by the age of 4 years, 6 months (cf. Grunau et al., 2000).

Table 10.3 illustrates SB4's facility to detect functioning at the upper extreme. By in-
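The entry-level routing and basal/ceiling rules described in the ADMINISTRATION section can be sketched in code. This is our simplification for illustration only: the level labels and pass/fail record are hypothetical, and the published rules in the Guide for Administering and Scoring contain additional details.

```python
def basal_and_ceiling(levels):
    """Apply simplified SB4-style discontinue rules to one subtest.

    `levels` is an ordered list of (label, [item1_passed, item2_passed])
    pairs, starting at the entry level suggested by the Vocabulary
    routing subtest. Basal: all items passed at two consecutive levels.
    Ceiling: three failures out of the four items across two adjacent
    levels, at which point testing advances to the next subtest.
    """
    basal = ceiling = None
    for (lab1, r1), (lab2, r2) in zip(levels, levels[1:]):
        if basal is None and all(r1) and all(r2):
            basal = lab1                       # two consecutive perfect levels
        if sum(not p for p in r1 + r2) >= 3:
            ceiling = lab2                     # three of four items failed
            break
    return basal, ceiling

# Hypothetical record for one subtest, entered at level E:
record = [("E", [True, True]), ("F", [True, True]),
          ("G", [True, False]), ("H", [False, False])]
```

With this record, the basal is level E (levels E and F are both perfect) and the ceiling is level H (three of the four items at G and H were failed).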

TABLE 10.2. Preschoolers' Minimum Overall Ability Scores on SB-LM, SB4, the WPPSI-R, and the K-ABC

Age in years and months    SB-LM    SB4a    WPPSI-R    K-ABC
2 years, 0 months          87b,c    94b,c
2 years, 6 months          69b,c    87b,c              79b,c
3 years, 0 months          57c      73b,c   62c        70b,c
3 years, 6 months          47       66c     57c        60c
4 years, 0 months          40       55c     48         60c
4 years, 6 months          31       50      45         54
5 years, 0 months          27       44      43         58c
5 years, 6 months          24       41      42         55c

Note. M = 100, SD = 16 for SB-LM and SB4; M = 100, SD = 15 for the WPPSI-R and K-ABC.
aSB4 Composites are based on the assumption that a valid score (i.e., raw score > 1) is obtained on each subtest appropriate for administration at a given age level and ability level.
bPrincipal indicator is insensitive to performances more than two standard deviations below the test mean.
cPrincipal indicator is insensitive to performances more than three standard deviations below the test mean.
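Because SB-LM and SB4 use SD = 16 while the WPPSI-R and K-ABC use SD = 15, the columns of Table 10.2 are not on a common metric; comparisons have to pass through z scores. A short sketch of the conversion (ours, not a published conversion table):

```python
def rescale(score, from_sd=16, to_sd=15, mean=100.0):
    """Re-express a deviation score on another scale via its z score."""
    z = (score - mean) / from_sd
    return round(mean + z * to_sd)
```

For example, an SB4 score of 68 (2 SDs below the mean) corresponds to a Wechsler-style 70, and an SB4 52 (3 SDs below) to a Wechsler-style 55, which is why identical-looking cutoffs sit at slightly different values across the columns.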

TABLE 10.3. Maximum Overall Ability Scores for Select Age Groups on SB4, SB-LM, the Wechsler Scales, and the K-ABC

Age in years and months    SB4a    SB-LM    Wechsler scaleb    K-ABC
2 years, 0 months          164     162
4 years, 0 months          164     160      160                160
6 years, 0 months          164     159      160                160
8 years, 0 months          164     164      160                160
10 years, 0 months         164     160      160                160
12 years, 0 months         164     164      160                155
14 years, 0 months         158     154      160
16 years, 0 months         152     138      155
18 years, 0 months         149     136      155
20 years, 0 months         149              155

Note. M = 100, SD = 16 for SB4 and SB-LM; M = 100, SD = 15 for all Wechsler scales and the K-ABC.
aFor any given age level, SB4 Composites are based on the maximum number of subtests specified in Appendix F of the Guide for Administering and Scoring (Thorndike et al., 1986a).
bThe WPPSI-R Full Scale IQ (FSIQ) is the principal Wechsler indicator at age 4 years, 0 months; the WISC-III FSIQ is used at ages 6 years, 0 months through 16 years, 0 months; and the WAIS-III FSIQ is used at ages 18 years, 0 months and 20 years, 0 months.
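A detail worth keeping in mind when reading Table 10.3: because of the different standard deviations, SB4's ceiling of 164 and the Wechsler ceiling of 160 sit the same distance above the mean. A two-line check:

```python
def z(score, mean=100, sd=16):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd

# SB4's 164 (SD = 16) and the Wechsler 160 (SD = 15) are both
# exactly 4 SDs above the mean of 100; SB-LM's 138 at age 16
# (SD = 16) is only about 2.4 SDs above it.
```

So the apparent 4-point gap between the SB4 and Wechsler ceilings is a metric artifact, whereas SB4's advantage over SB-LM at ages 16 and above is real.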

elusion of standard scores three or more cooperation. The second is the weighting of
standard deviations above the test mean, test scores to compensate for discrepancies
SB4 discriminates talent as adequately as between the designated sampling plan for
SB-LM did at all age levels, and it possesses socioeconomic status (SES) and SES levels in
slightly higher ceilings at ages 16 and above the obtained sample.
(columns 2 and 3). The Composite also
compares favorably to optimal performance
on the Wechsler scales and the K-ABC
Nonrandomness and General Referents
(columns 4 and 5, although the latest revi- One popular view holds that the strength of
sions of the Wechsler scales have eliminated an IQ test depends upon the degree to
most of SB4's previous advantage in this which its sample represents the general pop-
area). These comparisons suggest that the ulation. "Stratified random sampling"
SB4 would be a good choice for evaluations would be a relatively efficient method for
assessing potentially gifted youths, although obtaining such a representation. Many
it provides significantly higher scores than practitioners, as well as notable measure-
the more recent WISC-III (Simpson et ai., ment specialists (e.g., Hopkins, 1988), as-
2002). sume that individually administered IQ tests
are normed on stratified random samples.
This, however, is never the case. Test devel-
STANDARDIZATION opers must request examinees' cooperation.
The net effect is a loss of randomness, be-
The goal in developing the standardization cause people who volunteer are rarely like
sample for the SB4 was to approximate the those who do not (Jaeger, 1984).
demographics of the United States based on The common alternative to stratified ran-
the 1980 census (Thorndike et al., 1986c). dom sampling is to select examinees purpo-
There have been important demographic sively through "quota sampling." The
changes in the two decades since then. Most shortcoming of quota sampling is that its se-
notable has been the increase in ethnic mi- lections are likely to be biased, unless of
nority populations, particularly Spanish- course cooperation rates are high and uni-
speaking groups (Hernandez, 1997). Two form across strata (Hansen, Hurwitz, &
interrelated issues must be considered in re- Madow, 1953; Kish, 1965; Thorndike,
gard to the representativeness of SB4 norms. 1982). SB4 was normed on 5,013 individu-
The first is the loss of randomness that re- als arranged into 17 age groups (2 years, 0
sulted from the need to obtain examinees' months through 23 years, 11 months). Quo-
i 0 Stanford-Binet Intelligence Scaie Fourth Edition (SB4) 225
ta sampling was employed to approximate ulation. Such an approach assumes that
the U.S. population in terms of geographic prevalence rates are known for the various
region, community size, race, gender, and exceptionality subtypes. This assumption is
SES. Unfortunately, lower-SES examinees problematic for such conditions as learning
were underrepresented in the sample disabilities, for which there is no uniformly
(10.6% vs. 29.2% of the U.S. population), - accepted rate of occurrence in the general
and higher-SES examinees were overrepre- population and for which diagnostic rates
sented (43.1 % vs. 19.0%, respectively). continue to escalate (see Ysseldyke &
It would be simplistic to discredit SB4 for Stevens, 1986). The dilemma in such in-
sampling problems. The quota sampling in stances becomes this: "What is the appro-
SB4, as well as the differential rates of coop- priate percentage of individuals with learn-
eration, are common to all individually ad- ing disabilities to include in stadardization
ministered IQ tests, including the K-ABC, samples?"
WISC-III, and the Woodcock-Johnson Psy- Unsettling problems arise even when
cho-Educational Battery-Revised (WJ-R) Tests of Cognitive Ability (Woodcock & Johnson, 1990a). The standardization samples of even the best available instruments are imperfect approximations of the general U.S. population at the time any given instrument was developed.

Nonrandomness and Other Referents

An alternative perspective is that it is not necessary for IQ tests to reference the general population. There are other legitimate referents to which test scores can be compared. Instruments such as SB4 are most often administered to two groups, namely, examinees who truly volunteer to be tested and examinees with suspected disabilities. Consequently, it is essential that IQ tests accurately reflect the capabilities of these two groups. Examinees who willingly consent to testing (self-referred individuals, those who may be gifted, certain segments of the population receiving special education) do not necessarily differ from the "volunteer" subjects in standardization samples. At least in this regard, IQ test norms should be appropriate for volunteers. The second group is more problematic. Individuals with disabilities are administered IQ tests for special purposes (e.g., assignment to special education categories, mandatory reevaluations). As such, most of these individuals cannot be truly regarded as volunteers. Clearly linked to this phenomenon is the need to consider persons with disabilities systematically, if not directly in test norms, then through special studies.

One proposal for test development is to sample individuals with disabilities in proportion to their presence in the general pop- ...... prevalences are known. A prevalence of 3% is the standard endorsed for mental retardation (Grossman, 1983). Yet it would be improper to systematically target individuals identified as having mental retardation to form 3% of a test's sample. Probability theory dictates that a percentage of the volunteers in the sample who are not thus identified will also have mental retardation. When the two groups are merged, individuals with mental retardation will be overrepresented. Counterintuitively, this overrepresentation increases the likelihood that test norms will be diagnostically insensitive to persons with mental retardation. The overrepresentation of low-scoring examinees (i.e., those with retardation) will affect the conversion of raw scores to normalized standard scores. As a result, a lower raw score will be needed to obtain an IQ in the range for mental retardation (i.e., an IQ < 70) than if such examinees had not been oversampled. The diagnostic result is that test norms will fail to qualify higher-functioning individuals with mental retardation for needed services.

One final issue is that either approach, whether including specific conditions in the standardization sample or developing separate specialized samples, assumes that developers will include all potentially relevant diagnostic categories. Unfortunately, at present we have incomplete knowledge of the different exceptionalities that might influence performance on a cognitive ability battery. There is evidence that some psychiatric diagnoses, such as attention-deficit/hyperactivity disorder (e.g., Saklofske, Schwean, Yackulic, & Quinn, 1994; Schwean, Saklofske, Yackulic, & Quinn, 1993) or autism (Carpentieri & Morgan, 1994; Harris, Han-

226 II. ASSESSMENT OF INTELLIGENCE AND LEARNING STYLES/STRATEGIES
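The norming argument above lends itself to a brief simulation. Every figure below is an assumption for illustration (a general population with mean 100 and SD 15, a low-scoring subgroup with mean 55, arbitrary rates and sample sizes); the point is only the direction of the effect the authors describe: when low scorers are overrepresented in a norm sample, the raw score marking a low percentile drops, so a lower raw score is needed before the norms yield an IQ below 70.

```python
import random

random.seed(0)

def sample_scores(n, low_rate):
    """Mix of a general population (mean 100, SD 15) and a low-scoring
    subgroup (mean 55, SD 8) included at the given rate. All figures
    are illustrative assumptions, not SB4 data."""
    return [random.gauss(55, 8) if random.random() < low_rate
            else random.gauss(100, 15) for _ in range(n)]

def score_at_percentile(scores, p):
    """Raw score sitting at the p-th percentile (simple order statistic)."""
    return sorted(scores)[int(p / 100 * len(scores))]

proportional = sample_scores(20000, 0.03)  # subgroup at an assumed 3% prevalence
oversampled = sample_scores(20000, 0.10)   # subgroup volunteers overrepresented

# The raw score at the 2nd percentile (roughly the IQ < 70 region) is
# lower in the oversampled norms, i.e., the norms become harder to
# qualify under for higher-functioning members of the subgroup.
print(score_at_percentile(proportional, 2))
print(score_at_percentile(oversampled, 2))
```

With these assumed parameters the oversampled cutoff falls several points below the proportional one; real norm construction involves smoothing and continuous norming, but the direction of the distortion is the same.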

dleman, & Burton, 1990), may be associated with mean differences in performance on at least some aspects of ability. Currently it is unclear whether these group differences reflect changes in cognitive processing, or whether the effect is mediated by changes in motivation or test session behavior (Glutting, Youngstrom, Oakland, & Watkins, 1996).

Thus there are at least four links in the chain connecting knowledge to the design of an appropriate standardization sample: (1) awareness of all the conditions and exceptionalities that may influence performance on an ability test; (2) accurate data about the prevalence rate of these conditions in the population; (3) efficient and affordable ways of identifying potential participants meeting criteria for the conditions, either by doing so among the "volunteers" or by generating a special reference sample; and (4) a clear theoretical rationale about the appropriateness of developing a separate set of norms for a particular group (e.g., is it meaningful to know how the working memory performance of a youth with depression compares to other such youths, or only to other youths the same age, regardless of exceptionality?). Given these hurdles, the most practical solution for test developers will probably continue to be approximation of stratified sampling, with the hope that participation biases do not lead to serious underrepresentation of important conditions. A statistical alternative might be to explicitly model the selection process for participants, and then use estimates based on this model to correct observed values for "nonsampling bias" (see Wainer, 1999, for discussion and examples). Either way, it is important for test consumers and users to remain aware of these assumptions about the representativeness of the standardization sample.

Enhancing Diagnostic Utility

For the reasons discussed above, proportional sampling of individuals with disabilities is likely to create as many problems as it solves. A more practical response is to systematically oversample these individuals, but not necessarily to include them in test norms. Instead, special studies should be conducted to determine how the test behaves in these populations. Confirmatory factor analysis, for example, could identify whether test dimensions are similar for persons with and without disabilities (e.g., Keith & Witta, 1997). Comparisons based on item response theory (IRT; e.g., Hambleton, Swaminathan, & Rogers, 1991) could verify whether item difficulties are identical among exceptional and nonexceptional groups. IRT would also uncover whether item calibrations are sufficient for the maximum differentiation of low-scoring and high-scoring exceptionalities (Embretson, 1999). Multiple-regression slope comparisons (and not bivariate correlations) should supply information relevant to whether test scores predict equally for persons with and without disabilities (Jensen, 1980). Finally, univariate and multivariate contrasts could shed light on whether all possible test scores (e.g., overall IQs, factor scores, subtest scores) differ between the general sample and the various exceptionality subtypes (persons with mental retardation, learning disabilities, etc.).

Compared to these ideals, SB4 leaves room for improvement. This finding is disappointing, since sufficient data were gathered during SB4's development to complete many of the analyses identified above. SB4 is to be commended for verifying that its Composite and area scores (but not necessarily subtest scores) differ between exceptional and nonexceptional samples. Nevertheless, no attempt was made to determine whether SB4's items are unbiased for those with disabilities, or that its test dimensions are similar for individuals functioning normally and exceptionally. Likewise, although criterion-related validity is reported for those with disabilities, quantitative comparisons were not conducted for the relative accuracy of predictions between those with and without disabilities.

In fairness, no IQ test has met all of these standards at the time of its publication. However, the issue is not whether SB4 should be excused because it is no more deficient than other ability tests. Rather, the issue is why IQ tests are marketed without adequate evidence that they reflect the aptitudes of individuals with disabilities. We, as professionals responsible for the welfare of clients, must demand this information at the time of a test's publication. Otherwise, we
10. Stanford-Binet Intelligence Scale: Fourth Edition (SB4) 227
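One of the item-level checks just described can be sketched in miniature. As a stand-in for a full IRT calibration, the fragment below treats the log-odds of failing an item as a rough Rasch-style difficulty and flags items whose between-group difficulty gap departs from the average gap, a crude screen in the spirit of differential-item-functioning analysis. The proportions correct and the flagging threshold are invented for illustration, not taken from SB4.

```python
import math

# Hypothetical proportions correct on five items for a reference group
# and a focal (exceptional) group; invented numbers, not SB4 data.
p_reference = [0.90, 0.75, 0.60, 0.45, 0.30]
p_focal = [0.86, 0.70, 0.52, 0.10, 0.24]  # the fourth item looks anomalous

def difficulty(p):
    """Rough Rasch-style difficulty: the log-odds of failing the item."""
    return math.log((1 - p) / p)

# A uniform shift in difficulty is consistent with an overall ability
# difference between groups; an item whose shift departs sharply from
# the average shift merits scrutiny as possibly functioning differently.
shifts = [difficulty(f) - difficulty(r) for r, f in zip(p_reference, p_focal)]
mean_shift = sum(shifts) / len(shifts)
flagged = [i for i, s in enumerate(shifts) if abs(s - mean_shift) > 1.0]
print(flagged)  # → [3]
```

In practice, formal procedures (e.g., Mantel-Haenszel statistics or IRT likelihood-ratio tests) would replace the arbitrary 1.0-logit threshold used here.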
must accept the fact that we are willing to apply tests whose diagnostic capabilities are unknown.

Elimination versus Weighting

There is often "slippage" between a test's sampling plan and the testing as executed (Thorndike, 1982). Two methods can bring a sample back into alignment with its sampling plan. The first option is to eliminate examinees randomly from oversampled strata. The second option is to weight scores from each stratum to their correct percentage of the population. Whereas both methods have their benefits, neither can fully compensate for a loss of randomness in the sampling process (Glutting & Kaplan, 1990).

Until SB4, norms for individually administered IQ tests were typically aligned by eliminating examinees from oversampled strata. The benefit of elimination is that there is no redundancy in subject-generated variance (i.e., an examinee is not counted as more, or less, than one case). Moreover, the practice is tidy. "Final" samples often align well with the population, in part, because test manuals provide little discussion of discarded cases. Therefore, had SB4 used elimination, it would have been easy to marvel at how well the sample approximated the general population on race, gender, SES, and so on. Instead, SB4 retained all 5,013 participants in the standardization sample, even though higher-SES families were more likely to provide informed consent and to participate than were lower-SES families. In an effort to correct for these sampling biases, the SES variables of occupation and education were weighted so that examinees' scores would conform to their correct percentages in the U.S. population. That is, "each child from an advantaged background was counted as only a fraction of a case (as little as 0.28), while each child from a less advantaged background was counted as more than one case" (Thorndike et al., 1986c, p. 24).

One advantage of weighting is that it accounts for all scores in the sample. Relatedly, it produces estimates of higher reliability than does elimination. A potential flaw is that weighted estimates are not based entirely on actual cases. Examinees in underrepresented strata are counted more than once by multiplying the original sample variance upward to the desired population estimate. The process is dependent upon the assumption that examinees in the sample are representative of the entire population, including those individuals who, for whatever reason, were not sampled.

There is no guarantee that the scores of examinees already in the sample are similar to the scores of potential examinees who were not tested. However, this assumption becomes more plausible when the obtained sample has large numbers of examinees in each stratum who are representative of that particular population segment. SB4's standardization sample is quite large (n = 5,013), and its strata are probably of sufficient subject size for weighting. Moreover, weighting is an accepted procedure for standardizing group tests. Consequently, from this perspective, the weighting of test scores in SB4 appears as reasonable as the weighting used to norm group tests.

RELIABILITY

By and large, SB4's reliabilities are quite good. Internal consistency for the Composite is excellent, with Kuder-Richardson 20 coefficients ranging from .95 to .99 across age levels. Reliabilities for area scores are also substantial. Internal-consistency estimates for the two-, three-, and four-subtest groupings vary from .86 to .97 for Verbal Reasoning (median r = .95). Coefficients for Abstract/Visual Reasoning range from .85 to .97 and show a median of .95. Similarly, estimates for Quantitative Reasoning vary from .80 to .97 (median r = .94), and internal consistency for Short-Term Memory ranges from .86 to .95 (median r = .86). It is worth noting that only the Composite achieves reliability coefficients consistently greater than Kelley's (1927) recommended threshold of .94 for making decisions about individuals. Most of the area scores attain the less conservative threshold (reliabilities ≥ .85) proposed by Weiner and Stewart (1984) for individual classification.

Subtest internal consistencies are lower, as would be expected from their shorter test lengths. Nonetheless, with the exception of one subtest, median coefficients are reasonably high (range = .83 to .94 across age groups). The exceptional subtest (Memory for Objects) is located in the Short-Term Memory area, and it produces coefficients of marginal reliability (median r = .73). The subtest with the second lowest reliability is also in the Short-Term Memory area (Memory for Digits; median r = .83). As a result, psychologists should be alert that subtest scores from the Short-Term Memory area are likely to be less precise than subtest scores from other areas in SB4.

Standard errors of measurement (SEMs), and "confidence bands" derived from SEMs, are the reliability issues most likely to affect everyday practice. Confidence bands produce information relevant to the fallibility of test scores, and consequently help to clarify the relative verity and utility of test scores in decision making about individuals (Glutting, McDermott, & Stanley, 1987).

Memory for Objects provides the least precise scores in SB4 (i.e., the largest confidence bands). Its SEM shows a median of 4.15 points across age groups. The subtest with the second largest SEM is Memory for Digits (median = 3.25). However, the SEMs of these two subtests (and for all other more reliable subtests) are within reasonable limits. Also, as might be expected, greater precision in scores is found when interpretations are based on the four area scores. Median SEMs for Verbal Reasoning, Abstract/Visual Reasoning, Quantitative Reasoning, and Short-Term Memory are as follows: 3.9, 3.6, 3.8, and 4.8, respectively. Finally, the most precise score in SB4 is the Composite (median SEM = 2.3; all SEMs as reported in Sattler, 1992).

The Technical Manual (Thorndike et al., 1986c) calculates score stability for samples of preschoolers (5-year-olds) and children attending elementary school (8-year-olds). Preschoolers' test-retest coefficients are reasonable for the Composite (r = .91) and for area scores (range = .71 to .78). Less stability is evident for individual subtests, and in particular for Bead Memory (r = .56). The pattern of test-retest coefficients of elementary school children is similar to that found for preschoolers. Appreciable stability is present for the Composite (r = .90) and for the areas of Verbal Reasoning, Abstract/Visual Reasoning, and Short-Term Memory (r's = .87, .67, and .81, respectively). However, somewhat lower stability is found for the Quantitative Reasoning area (r = .51).

Preschoolers' Composites will, on average, increase approximately 8.2 points from test to retest administrations. Similarly, Composites are likely to increase by 6.4 points for elementary school children who are tested twice across short intervals. SB4 offers no stability data for examinees of junior high or high school age, or for young adults, making it difficult to approximate the score increases that might be expected of these age groups.

VALIDITY

An impressive amount of validity information has been gathered in support of SB4. In particular, investigations have addressed developmental changes of raw scores by age; quantitative analyses of item fairness across gender and ethnic groups; correlations with other IQ tests, using samples of both normal and exceptional examinees; correlations with achievement tests; score differences between the standardization sample and special groups (individuals who are gifted, have learning disabilities, or have mental retardation); and the factor structure of SB4's test dimensions.

Concurrent Validity

Table 10.4 presents concurrent correlations between SB4 and other IQ tests administered to normal samples. This compilation was obtained from studies reported in the Technical Manual (Thorndike et al., 1986c) and in a review by Laurent, Swerdlik, and Ryburn (1992), as well as from studies conducted by independent investigators. Laurent and colleagues present validity data for exceptional samples, too. Results show substantial associations between SB4's Composite and overall scores on SB-LM, all Wechsler scales, the K-ABC, the WJ-R (Woodcock & Johnson, 1990a), and the Differential Ability Scales (DAS; Elliott, 1990). Correlations ranged from .53 to .91 (average r = .78 using Fisher's z' transformation). The consistency and magnitude of these relationships speak well for the Composite's construct validity.

Spearman's (1923) principle of the "indif-

TABLE 10.4. Score Characteristics and Correlations of SB4 with Other IQ Tests Administered to Nonexceptional Samples

| Study | n | Mean age (years) | Mean SB4 Composite | Other IQ test | Other test mean IQ | IQ difference | Correlation |
| Elliott (1990) | 55 | 9.9 | 109.8 | DAS | 106.3 | +3.5 | .88 |
| Thorndike, Hagen, & Sattler (1986c, Study 5) | 175 | 7.0 | 112.7 | K-ABC | 112.3 | +0.4 | .89 |
| Hendershott, Searight, Hatfield, & Rogers (1990) | 36 | 4 | 110.5 | K-ABC | 118.2 | -7.7 | .65 |
| Krohn & Lamp (1989)a,b | 89 | 4.9 | 93.4 | K-ABC | 96.0 | -2.6 | .86 |
| Krohn & Lamp (1989)a,b | 65 | 4 | 93.8 | K-ABC | 95.8 | -2.0 | |
| Krohn & Lamp (1989)a,b | 65 | 6 | 93.3 | K-ABC | 99.7 | -6.4 | |
| Krohn & Lamp (1989)a,b | 65 | 9 | 96.5 | K-ABC | 97.9 | -1.4 | |
| Kaufman & Kaufman (1983) | 121 | School-age | 116.5 | K-ABC | 114.5 | +2.0 | .61 |
| Smith & Bauer (1989) | 30 | 4.9 | | K-ABC | | | .57 |
| Clark, Wortman, Warnock, & Swerdlik (1987) | 47 | | | SB-LM | | | .53 |
| Hartwig, Sapp, & Clayton (1987) | 30 | 11.3 | 113.1 | SB-LM | 114.4 | -1.3 | .72 |
| Thorndike, Hagen, & Sattler (1986c, Study 1) | 139 | 6.9 | 105.8 | SB-LM | 108.1 | -2.3 | .81 |
| Krohn & Lamp (1989)a,b | 89 | 4.9 | 93.4 | SB-LM | | | .69 |
| Lukens (1988) | 31 | 16.75 | 44.8 | SB-LM | 46.7 | -1.9 | .86 |
| Psychological Corporation (1997) | 26 | 28.6 | 114.8 | WAIS-III | 113.3 | +1.5 | .88 |
| Carvajal, Gerber, Hughes, & Weaver (1987) | 32 | 18.0 | 100.9 | WAIS-R | 103.5 | -2.6 | .91 |
| Thorndike, Hagen, & Sattler (1986c, Study 4) | 47 | 19.4 | 98.7 | WAIS-R | 102.2 | -3.5 | .91 |
| Lavin (1996) | 40 | 10.6 | 108.0 | WISC-III | 107.0 | +1.0 | .82 |
| Rust & Lindstrom (1996) | 57 | 6-17 | 109.9 | WISC-III | 111.3 | -1.4 | .81 |
| Rothlisberg (1987) | 32 | 7.8 | 105.5 | WISC-R | 112.5 | -7.0 | .77 |
| Thorndike, Hagen, & Sattler (1986c, Study 2) | 205 | 9.4 | 102.4 | WISC-R | 105.2 | -2.8 | .83 |
| Carvajal & Weyand (1986) | 23 | 9.5 | 113.3 | WISC-R | 115.0 | -1.7 | .78 |
| Greene, Sapp, & Chissom (1990)c | 51 | Grades 1-8 | 80.5 | WISC-R | 78.1 | +2.4 | .87 |
| Wechsler (1991) | 205 | 6-16 | | WISC-R | | | .83 |
| Woodcock & Johnson (1990b, Study 1) | 64 | 2.9 | | WJ-R | | | .69 |
| Woodcock & Johnson (1990b, Study 2) | 70 | 9.5 | | WJ-R | | | .69 |
| Woodcock & Johnson (1990b, Study 3) | 51 | 17.5 | | WJ-R | | | .65 |
| Thorndike, Hagen, & Sattler (1986c, Study 3) | 75 | 5.5 | 105.3 | WPPSI | 110.3 | -5.0 | .80 |
| Carvajal, Hardy, Smith, & Weaver (1988) | 20 | 5.5 | 114.4 | WPPSI | 115.6 | -1.2 | .59 |
| Carvajal, Parks, Bays, & Logan (1991) | 51 | 5.7 | 103.0 | WPPSI-R | 109.5 | -6.5 | .61 |
| McCrowell & Nagle (1994) | 30 | 5.0 | 95.9 | WPPSI-R | 94.1 | +1.8 | .77 |
| Wechsler (1989) | 105 | 5.6 | 107.2 | WPPSI-R | 105.3 | +1.9 | .74 |

Note. DAS, Differential Ability Scales; K-ABC, Kaufman Assessment Battery for Children; SB-LM, Stanford-Binet, Form L-M; WAIS-R, Wechsler Adult Intelligence Scale-Revised; WAIS-III, Wechsler Adult Intelligence Scale-Third Edition; WISC-R, Wechsler Intelligence Scale for Children-Revised; WISC-III, Wechsler Intelligence Scale for Children-Third Edition; WPPSI, Wechsler Preschool and Primary Scale of Intelligence; WPPSI-R, Wechsler Preschool and Primary Scale of Intelligence-Revised; WJ-R, Woodcock-Johnson Psycho-Educational Battery-Revised.
a Same sample appears multiple times in table, because participants completed multiple ability tests.
b Head Start sample, followed longitudinally.
c African American sample.
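Two of the summary figures attached to Table 10.4 can be reproduced mechanically: the practice of averaging correlations on Fisher's z' scale (the chapter reports an average r of .78 across these studies), and the squared correlation as an index of shared variance. The five correlations below are a small subset of the table, used only to show the arithmetic.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z' transformation (z' = arctanh r)."""
    return math.atanh(r)

# Averaging on the z' scale and back-transforming avoids the bias of
# averaging correlations directly; these five values are a subset of
# Table 10.4, not the full compilation behind the chapter's r = .78.
rs = [0.53, 0.72, 0.81, 0.88, 0.91]
mean_r = math.tanh(sum(fisher_z(r) for r in rs) / len(rs))
print(round(mean_r, 2))

# Squaring an average correlation gives shared variance: the chapter's
# average r of .78 implies .78 ** 2 = .608, i.e., the roughly 61% of
# Composite variance attributed to g in the discussion that follows.
print(round(0.78 ** 2, 3))  # → 0.608
```

The same two lines of arithmetic underlie the text's statement that at least 60.8% of the Composite's variance is accounted for by g.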

ference of the indicator" suggests that the specific item content in intelligence tests is unimportant to the evaluation of general ability (or g). The truly important phenomenon for g is that IQ tests measure inductive and deductive reasoning. Thus correlations between one IQ test and IQ tests with dissimilar content can help evaluate the extent to which the first test measures g. Based on the correlations in Table 10.4, at least 60.8% of the Composite's variance is accounted for by g. These data suggest that the Composite provides a reasonably trustworthy estimate of general intelligence.

Of applied interest are score differences that can be expected between SB4 and other IQ tests. Column 7 in Table 10.4 (labeled "IQ difference") shows that the Composite averages 2.5 points lower than the IQs from other intelligence tests published prior to SB4. Interestingly, scores on the SB4 also average 0.5 points higher than scores obtained on tests published after SB4 (i.e., the WPPSI-R, DAS, WJ-R, WISC-III, and WAIS-III). Individual comparisons are less precise because of the smaller number of studies between SB4 and any one test. With this caveat in mind, psychologists might expect SB4 to produce IQs that are about 2 points lower than those from SB-LM; 5 points lower than those from the WISC-R; 3 points lower than those from the Wechsler Adult Intelligence Scale-Revised (WAIS-R); 3 points lower than those from the WPPSI; and 2.5 points lower than those from the K-ABC. Given the difference in times when these respective standardization samples were collected, it is likely that the "Flynn effect" accounts for much of this variance in scores (Flynn, 1984, 1999). By virtue of the Flynn effect, which refers to apparent secular gains in average performance on ability tests, it is likely that SB4 scores would be about 3 points higher than scores derived from tests normed a decade later, such as the WAIS-III, the forthcoming revision of the K-ABC, and the new WJ III.

Factor Structure

The most controversial aspect of SB4 concerns the interpretability of its area scores. That is, do the capabilities evaluated by SB4 actually conform to the four-factor model of intelligence that has been advanced for the test? This question of construct validity is open to empirical verification, and it is one that usually can be answered through factor analysis.

It is disconcerting that the authors of SB4 themselves disagree about the number of interpretable factors. Thorndike, for example, in light of his own factor-analytic results, offers no explanation for why the four-factor model should be applied to examinees younger than age 12. He "confirms" only two factors between ages 2 and 6 (Verbal, Abstract/Visual). His analyses then support a three-factor model between ages 7 and 11 (Verbal, Abstract/Visual, Memory). Most importantly, the proposed four-factor model does not emerge until ages 12 through 23.

Sattler (1992), on the other hand, eschews the SB4 model. He proposes a two-factor solution between ages 2 and 6 (Verbal Comprehension, Nonverbal Reasoning/Visualization) and a three-factor solution at ages 7 through 23 (Verbal Comprehension, Nonverbal Reasoning/Visualization, Memory). Conspicuously absent in Sattler's findings is the dimension of Quantitative Reasoning, and at no age level does he recommend the interpretation of all four area scores.

Perhaps because of this open disagreement, investigators have extensively reanalyzed the SB4 normative data as well as conducting independent replications. We found a dozen different published factor analyses of SB4. Four provided evidence consistent with the published four-factor structure (Boyle, 1989, 1990 [especially if one is willing to exclude certain subtests]; Keith, Cool, Novak, & White, 1988; Ownby & Carmin, 1988). Five studies challenge the four-factor structure, suggesting anywhere from one general ability factor (Reynolds, Kamphaus, & Rosenthal, 1988) to two or three factors, depending on age (Gridley & McIntosh, 1991; Kline, 1989; Molfese, Yaple, Helwig, & Harris, 1992; Sattler, 1992). The remaining studies are equivocal about the competing models (McCallum, Karnes, & Crowell, 1988; Thorndike et al., 1986c; Thorndike, 1990). There is a tendency to detect more factors in older age groups, with a two-factor structure describing preschool data, and three factors describing data for youths above 7 years of age. These differences in factor structure, if true, could reflect either developmental change or alterations in the subtest battery administered at each age. Older youths completed more subtests on average, increasing the likelihood of statistically recovering additional factors (if such additional dimensions of ability were measured by the subtests).³

Interestingly, we are not aware of any published study that has used either Horn's parallel analysis (Horn, 1965) or the method of minimum average partials (Velicer, 1976) as decision rules to determine the appropriate number of factors to retain for the SB4. Methodological evidence strongly suggests that these are the two techniques most likely to recover the accurate number of factors, and they tend to retain fewer factors than more commonly used procedures, such as the maximum-likelihood chi-square test or the Kaiser criterion (Zwick & Velicer, 1986). The common element in all this is that no single study has definitively substantiated the existence of four area factors. It therefore stands to reason that psychologists should refrain from interpreting area scores until more evidence is offered on their behalf.

It should be kept in mind that current inabilities to support a four-factor model may not necessarily represent a failure of SB4 per se. Rather, the difficulty may lie in the sensitivity of factor analysis to data-related issues in SB4. This is particularly true when confirmatory factor analysis is applied. The relationship between confirmatory factor analysis and SB4 was explored in detail by Glutting and Kaplan (1990).

SCORE INTERPRETATION

The SB4 can potentially support clinical interpretation at a variety of levels of analysis. The battery yields a single, global estimate of general cognitive ability, the Composite score, which represents the most general level of analysis available on the SB4. Beneath the Composite, the SB4 also theoretically could yield scores for Fluid and Crystallized cognitive ability, which are referred to as "Level II scores" in the SB4 manuals. SB4 also includes the Short-Term Memory factor score in Level II. Level III scores on the Binet include factor-based scores measuring the specific cognitive abilities of Verbal Reasoning, Quantitative Reasoning, and Abstract/Visual Reasoning. SB4, unlike SB-LM, also provides standardized age scores for specific subtests, enabling potential interpretation of subtest profiles. This exemplifies the most fine-grained level of clinical interpretation that would be considered in most cases (cf. Sattler, 1992, for discussion of attention to responses to specific items). This structure for the SB4 is similar to the hierarchical structures adopted by most contemporary measures of cognitive ability, and this format lends itself readily to the "top-down" models of interpretation advocated by many assessment authorities (e.g., Aiken, 2000; Kamphaus, 1993; Kaufman, 1994; Sattler, 1992). It is important to consider the evidence supporting these different levels of interpretation; assessment practice is better driven by scientific evidence than by convention and appeals to authority.

The Level I score, the Composite, possesses good evidence of validity. The preponderance of research involving the SB4 and its predecessors has concentrated on the Composite score, so there is considerable accumulated evidence about the Composite score's convergent, criterion, and predictive validity for such constructs as academic achievement. The Composite score also has gained fairly consistent support from factor analyses of the SB4 subtests (which typically have indicated either several correlated factors or one general ability factor). Although some have questioned the treatment validity of even these most global scores from cognitive ability tests (Macmann & Barnett, 1997; McCallum et al., 1988), a good case can be made for using and interpreting these global scores (Neisser et al., 1996), particularly in terms of psychoeducational and vocational assessment.

Level II scores are less well supported. The SB4 as published provides an observed score for the Short-Term Memory area, but the Abstract/Visual Reasoning area score is the only potential indicator of Fluid-Analytic Abilities in the battery. This limits the construct validity of estimates of fluid ability derived from the SB4, inasmuch as fluid ability may involve processes beyond abstract/visual reasoning (Carroll, 1993; Horn & Noll, 1997). Furthermore, the SB4 manuals and interpretive aids do not formally present a way of calculating a summary score for Crystallized Abilities, although it is possible to estimate such a score by combining the Verbal and Quantitative Reasoning area scores. The proposed Level II structure of the SB4 has not consistently been confirmed by secondary analyses of the standardization data or independent samples. Perhaps most crucially, there is a dearth of research addressing the criterion validity of Level II scores from the SB4 (cf. Caruso, 2001). The paucity of research is probably largely related to the lack of emphasis on Level II interpretation in the SB4 materials, and it is possible that future research will demonstrate value in interpreting these more discrete ability estimates (e.g., Moffitt & Silva, 1987). At present, however, there is minimal literature to guide clinical hypothesis generation or interpretation of Level II scores, and there is little guidance offered to the practitioner about how to calculate the Level II scores beyond Short-Term Memory. This level of analysis has not received much attention in practice, and it probably should not be emphasized until more evidence is available demonstrating clear incremental validity above and beyond the information derived from the Composite score.

Level III scores are also problematic, because of disagreement about factor structure as well as a lack of information about incremental validity. The purported structure of Verbal Reasoning, Quantitative Reasoning, and Abstract/Visual Reasoning as three distinct factors has not consistently emerged across the ages covered by SB4, or in analyses of independent samples (and not always in secondary analyses of the standardization data). Currently there is insufficient evidence to permit us to conclude whether the subtests on the SB4 adequately assess these three different dimensions of ability. More importantly from a practical perspective, at present there is no information about incremental validity for these area scores after the Composite score is interpreted. Although researchers have begun to explore the possibility that more discrete ability scores might provide additional clinical data about achievement or behavior problems not subsumed in a more global ability score (cf. Glutting, Youngstrom, Ward, Ward, & Hale, 1997; Youngstrom, Kogos, & Glutting, 1999), this work still needs to begin with the SB4. Area score interpretation imposes burdens on the practitioner and consumer in terms of longer tests, greater complexity of results, and potentially greater likelihood of diagnostic errors (Silverstein, 1993). In light of these costs, it would seem premature to emphasize Level III area scores in SB4 interpretations. The lack of consensus about the construct validity for these scores, based on factor analyses, further calls for caution.

Psychologists may be tempted to make "area" interpretations (e.g., Naglieri, 1988a), even though there is little justification for this practice. Indeed, Hopkins (1988) appears to believe that SB4's four area scores should be interpreted, and that practitioners need not "become emotionally involved in the 'great debate' regarding the theoretical structure of intelligence as determined by the factor analytic method" (p. 41). Hopkins's position is incorrect, because it implies that clinical necessity should supersede what can be supported empirically. However, the need to generate hypotheses about an examinee is never sufficient grounds for the interpretation of a test score. This is especially true in the case of SB4's four area scores, since claims for their construct validity have yet to be substantiated, in spite of a considerable amount of investigation. Even if this were accomplished, it would also be necessary to document criterion validity and incremental validity (above more parsimonious g-based models) before clinical interpretation of area scores could be justified.

The addition of standard age scores for subtests to the SB4 created the possibility of subtest profile interpretation, which has become a prevalent practice in the use of other major ability tests (e.g., Kaufman, 1994; Naglieri, 1988b; Sattler, 1992). Many clinicians and researchers welcomed this addition as an opportunity to improve the perceived clinical value of the SB4, hoping that more detailed attention to patterns of performance on subtests would lead to improved psychoeducational prescription (Lavin, 1995) or to identification of profiles characterizing the performance of specific diagnostic groups (e.g., Carpentieri & Morgan, 1994; Harris et al., 1990). Procedures and recommendations are available to promote this sort of analysis with the SB4 (Naglieri, 1988b; Rosenthal & Kamphaus, 1988; Spruill, 1988). Sattler (1992) also provides a detailed table (Table C-52) listing the abilities thought to be reflected in each subtest, background factors thought to affect subtest performance, possible implications of high and low scores on each subtest, and instructional implications of unusual performance on each subtest. Sat- ...... tion confronting the clinician is complex: Who can generate a hypothesis with any confidence when faced with an average of eight or nine different abilities and another three background factors that could contribute to performance on a specific subtest? In addition, subtest analysis faces substantial psychometric challenges (Macmann & Barnett, 1997; McDermott, Fantuzzo, & Glutting, 1990; McDermott, Fantuzzo, Glutting, Watkins, & Baggaley, 1992) that make it unlikely to deliver on the promise of improved assessment or treatment planning. In fact, the studies available to date for the SB4 clearly indicate that there is no significant improvement in assessment when subtest interpretation is added to the analytic strategy (Kline, Snyder, Guilmette, & Castellanos, 1992, 1993). This is consistent with growing evidence from investigations with other tests, indicating that subtest analysis is problematic at best when applied to routine assessment goals such as predicting academic achievement or diagnosis (e.g., Watkins, 1996; Watkins, Kush, & Glutting, 1997). In short, it appears that the SB-LM was not missing much by failing to include subtest scores, and that practitioners would do well to avoid relying much on SB4 subtests as a distinct level of analysis in conducting evaluations.

SHORT FORMS

Cognitive assessment is a time-consuming enterprise (Meyer et al., 1998). This expense, combined with the lack of validity information supporting the clinical use of scores beyond the Composite as described above, strongly suggests the potential value of short forms of the SB4 that provide reliable estimates of general ability without entailing the costs of a complete administra-
tler's table is thorough. For example, Sattler tion. SB4 offers four short forms that result
lists from 3 to 18 distinct abilities for each in a substantial savings of testing time: the
of the 15 subtests (M = 8.5, SD = 3.8), and six-subtest General Purpose Assessment
an average of five implications for every Battery (GPAB; Vocabulary, Bead Memory,
high or low score per subtest. This presenta- Quantitative, Memory for Sentences, Com-
tion clearly encourages the clinical interpre- prehension, and Pattern Analysis); the four-
tation of individual strengths and weakness- subtest Quick Screening Battery (Vocabu-
es at the subtest level. Although Sattler lary, Bead Memory, Quantitative, and
provides some cautionary statements about Pattern Analysis); the four- to six-subtest
not interpreting a subtest score in isolation, Battery for the Assessment of Students for
such tables seem prone to abuse. The situa- Gifted Programs; and the six-subtest Battery

j
""""

for Students Having Problems Learning in School. Short forms with four or fewer subtests are intended for screening purposes, but batteries composed of at least six subtests can be used for placement decisions (Thorndike et al., 1986c, p. 50). This latter possibility makes it essential that test scores from six-subtest abbreviated batteries be psychometrically equivalent to those from the full test.

According to the data presented in the Technical Manual (Thorndike et al., 1986c), split-half reliabilities for two-, four-, and six-subtest short forms are fairly constant and appreciable for examinees of different ages. Correlations between Composites from short forms and the complete battery are also acceptable. However, the Technical Manual fails to present information about differences between estimated Composites and area scores from abbreviated batteries and actual scores on the full test. Since publication of the SB4, more than a dozen independent studies have investigated the psychometric properties of various abridged forms, using samples ranging from low to high ability and from preschool to college. The majority of these investigations concluded that the six-subtest GPAB was the most acceptable substitute for a complete administration (Atkinson, 1991; Carvajal & Gerber, 1987; DeLamatre & Hollinger, 1990; Kyle & Robertson, 1994; McCallum & Karnes, 1990; Prewett, 1992; Volker, Guarnaccia, & Scardapane, 1999), possessing both good correspondence with the full battery and good external validity with other measures of ability (Carvajal, Hayes, Lackey, & Rathke, 1993; Carvajal, McVey, Sellers, & Weyand, 1987). On the other hand, two investigations concluded that the four-subtest battery performs essentially as well as the six-subtest version, and argued that the four-subtest version is preferable for screening purposes in light of its brevity (Prewett, 1992; Volker et al., 1999). Finally, Nagle and Bell (1993) found that all of the short forms produced what they considered to be unacceptable levels of disagreement for individual classification purposes. Instead, these authors recommend the use of item reduction short forms rather than subtest reduction versions (Nagle & Bell, 1995). On the whole, these studies alleviate earlier concerns that the short forms might show substantially lower external validity, in spite of correlating well with the full-battery composite (Levy, 1968; McCormick, 1956). It is less clear that short forms provide an adequate substitute for the full battery when individual classification decisions are required; in addition, the two-subtest battery clearly is suitable only for group research and not individual assessment.

In spite of the burgeoning literature examining SB4 short forms, important questions remain unanswered. One problem is that practitioners often may develop idiosyncratic short forms that have not been empirically validated. Norms tables in SB4 make it possible to calculate Composites from practically any combination of subtests. Thus practitioners can develop their own short forms by "picking and choosing" among favorite subtests. No matter how the particular combination is chosen, problems are likely to arise for short forms if the administration sequence of the subtests is disturbed.⁴ Assume, for example, that a psychologist elects to administer a short form consisting of subtests 1, 3, 4, 5, 6, and 13. The psychologist in such an instance is operating under the belief that norms for subtest 13 (Paper Folding and Cutting) will remain constant, regardless of the fact that this subtest now occupies position 6 in the new battery. Thus the validity of the procedure is critically dependent on the assumption that norms and examinees' performances are independent of a subtest's location in the battery.

Such assumptions of independence are certainly open to question. Decreases in testing time may lessen an examinee's frustration and improve test scores on the shorter battery. Differences in the fatigue of the psychologist or examinee, or the fact that the full test offers more opportunities to gain experience in understanding test directions and familiarity with test materials, could also affect performance. Learning or "carryover" effects from one subtest to the next are particularly likely for measures that require examinees to manipulate objects (i.e., nonverbal/performance subtests). Finally, even if these assumptions were satisfied, the psychologist must consider whether the external validity of the shorter battery is the same as that of the full test
(see Smith, McCarthy, & Anderson, 2000, for further recommendations about the development and evaluation of short forms). This limitation also applies to the majority of extant research with SB4 short forms: Researchers typically administer the full battery and then extract different short forms from that battery. Thus practice, fatigue, and motivation effects are based on a full administration, which would not be the case when a short form was administered clinically.

Without more information about the effects of subtest sequencing and battery length, as well as short-form external validity, it could be argued that psychologists should administer SB4 in its entirety (or possibly use the six-subtest GPAB) and should refrain from selectively administering alternative batteries. We acknowledge that this recommendation runs counter to current neuropsychological practice and "multibattery" approaches to assessment, both of which appropriate subtests from a variety of sources to construct idiosyncratic batteries intended to test clinical hypotheses and address specific referral needs. Our position is a conservative one, recognizing that multibattery approaches represent a departure from the standardized administration procedures used to develop test norms. An alternative would be to use brief ability tests that were designed for short administration and that have known reliability and validity when used in this manner (e.g., Glutting, Adams, & Sheslow, 2000; Psychological Corporation, 1999). Practitioners must be cautious about trading away the advantages inherent in following a standardized protocol in exchange for a briefer, more flexible, and allegedly more "focused" battery of unknown validity.

FUTURE DIRECTIONS: THE DEVELOPMENT OF SB5

As this chapter is being written, preliminary item tryouts are beginning for the development of the SB5, which is planned to be released in Spring 2003. There obviously may be substantial changes between the proposed version of the test and the final published edition, with actual data playing a substantial role in the translation from theory to the published incarnation. Even so, the theory and planning behind the SB5 deserve some comment.

The plans for the SB5 seek to honor the Binet tradition while also incorporating current methodology and theories of intelligence (J. Wasserman, personal communication, February 17, 2000). One major change is explicit adoption of the multi-level model of intelligence expounded by Cattell, Horn (see Horn & Noll, 1997), and Carroll (see Carroll, 1993). The goal in developing the SB5 is to include items that will adequately sample all eight hypothesized specific ability factors: fluid reasoning, general knowledge, quantitative reasoning, working memory (previously short-term memory), long-term memory, auditory processing, visual-spatial ability, and processing speed. The battery is also expected to include measures of procedural knowledge in an effort to measure Gc, or crystallized ability. If the data support the desired model, then the plan would be for SB5 to yield factor scores for each of these specific abilities. In a departure from tradition, the SB5 will probably express these standard scores in a metric with M = 100 and SD = 15 (not the SD = 16 of previous Stanford-Binet scales). The expectation is that the SB5 will also yield three superordinate scores: Verbal Ability, Nonverbal Ability, and a full-scale Composite score reflecting the single best estimate of psychometric g obtained from the test. Each of these constructs will also have an observed scaled score that practitioners will calculate as part of the standard scoring of the battery.

Current plans also include other features designed to make the test more appealing to clinicians. One is to utilize a balanced set of verbal and nonverbal indicators for each of the eight specific ability factors, addressing a historical criticism of the SB instruments as overemphasizing verbal abilities. A second feature is the plan to generate linking samples of youths completing the SB5 and either the Wechsler Individual Achievement Test-Second Edition or the Achievement tests from the WJ III. This would substantially facilitate the analysis of IQ-achievement discrepancies when these popular measures of academic achievement are used. Perhaps most notable of all, the SB5 is expected to extend through adulthood, with
new norms reaching ages 80-90. Extensive validity studies are also planned, comparing the SB5 with a variety of other measures of cognitive ability, as well as looking at performance on the SB5 within special populations (defined using independent research and diagnostic criteria). This would be an important contribution to the body of knowledge, in addition to being useful data for test interpretation, because such an approach would avoid the circular reasoning that plagues much research in this area. Too often researchers have used a test to define a diagnosis (e.g., mental retardation or learning disabilities), and then demonstrated that this group shows different performance on other measures of the same construct, without acknowledging the tautology of this approach (Glutting, McDermott, Watkins, Kush, & Konold, 1997).

Also under consideration is a return to the age scale format used in versions prior to SB4. This would eliminate individual subtest scores from the SB5 and make the factor scores the lowest level of analysis. This approach would be consistent with the goal of making SB5 developmentally sensitive, allowing a blending of items designed to measure the same construct across different ages, without requiring a formal change in subtest. Item-level factor analysis (or analysis of parcels developed using IRT) would guide the organization of items as indicators of the specific ability factors.

This return to an age scale format is likely to be controversial, given the amount of clinical lore surrounding the practice of subtest interpretation. However, this change is also consistent with the best evidence currently available, which shows that subtest interpretation is fraught with psychometric problems (Macmann & Barnett, 1997; McDermott et al., 1990, 1992) and generally has failed to deliver the promised improvements in interpretation, diagnosis, or intervention (Watkins & Kush, 1994). Because of their greater reliability and larger amount of variance attributable to an underlying cognitive ability (i.e., greater validity), factor scores are more likely to enable clinicians to make finer-grained analyses than simple interpretation of a global score. The planned format for the SB5 could do much to promote good clinical practice in this regard. Excluding the subtests would certainly discourage the scientifically unwarranted practice of interpreting them. At the same time, providing better measures of the specific ability constructs of the Cattell-Horn-Carroll model would equip practitioners to measure distinct cognitive abilities underlying g. It would still be necessary to demonstrate the treatment validity of the different factor scores (cf. Glutting, Youngstrom, et al., 1997; Youngstrom et al., 1999), but these scores would inherently possess better construct validity than subtest scores. We hope that the finished product for the SB5 achieves the goals its developers have set for this revision.

RECOMMENDATIONS

On the basis of this review, we offer the following general recommendations affecting use of the SB4. As with any recommendation or clinical practice, these are subject to change in light of new research findings.

1. Reasonable construct validity is present for the Composite, and the Composite also has accumulated the most evidence for external and criterion-related validity. This is a score that psychologists can interpret on the basis of psychometric principles, empirical evidence, and best practices.

2. SB4 area scores are problematic because of the continued controversy about SB4's factor structure, as well as the current lack of any data showing incremental validity of the area scores surpassing the interpretive value of the Composite. We have also advanced the position that current disagreement about the adequacy of a four-factor model may not necessarily represent a failure of SB4 per se. Nevertheless, until optimal methodological procedures are applied and empirical evidence supports four underlying factors, psychologists would do well to avoid comparing or interpreting these scores.

3. Subtest interpretation should be deemphasized or avoided, on both psychometric and scientific grounds. Subtest interpretation increases the possibility of Type I errors and complicates the assessment process. Most importantly, subtest analysis has yet to demonstrate incremental validity or treatment validity with the SB4 or other major tests of ability.

4. We believe we have amply demonstrated the hazards psychologists face in constructing their own SB4 short forms. Cases could be made either for administering the test in its entirety, or for using one of the established and validated short forms. The two most documented and empirically supported short forms currently appear to be the four-subtest form (especially as a screener) and the six-subtest GPAB. Better documentation of the effects of subtest sequencing, as well as the establishment of short forms' external validity, should remain high priorities on the research agenda. Though less glamorous than some investigations, this work would have important applications in an era focusing on efficiency and cost containment in the provision of psychological assessment.

5. SB4 should not be administered to preschoolers believed to have mental retardation. Because of floor effects, the test shows little capacity for detecting moderate to severe retardation at these age levels. Moreover, the WPPSI-R generally supports floors equal to or slightly lower than those of SB4.

6. SB4 provides a sufficient ceiling for the identification of examinees who may be gifted at any age. The breadth of constructs measured and its extended age range also increase the likelihood that SB4 will become a favored instrument for the assessment of giftedness (Laurent et al., 1992). However, it is worth noting that the revisions of the Wechsler scales published after the SB4 also extended their norms to 3.67 or 4 standard deviations (i.e., maximum standard scores of 155 to 160), essentially establishing parity with the SB4 in this respect.

7. We have argued that IQ tests should not be marketed without adequate evidence that they reflect the aptitudes of individuals with disabilities. However, we cannot reasonably hold SB4 to standards that have never been imposed on any IQ tests at the time of their publication. The SB4 clearly met the standard of practice in test development when it was published. It is this standard of practice itself that needs improvement. Currently marketed tests have yet to do an adequate job of documenting the appropriateness of the instrument for individuals with disabilities or other specific populations. In practical terms, the SB4 appears comparable to the other best tests available in technical adequacy in this area.

8. It is critical that examiners control test pieces when evaluating young children (especially the Bead Memory pieces, due to the potential choking hazard).

9. Examiners should inquire about color-blindness or family history of color-blindness, as well as remaining alert to this possibility in their clinical observations during testing. The prevalence of color-blindness is high enough that clinicians will encounter this issue with some frequency, and it can influence performance on some subtests of SB4.

CONCLUSION

At the beginning of this chapter, we stated that no test is entirely without fault or virtue. Perhaps SB4's greatest limitation is that it tries too hard to offer everything psychologists want in an IQ test. Nevertheless, SB4's potential for meeting the avowed purposes of IQ tests is great, and, as is far too rare in the field of test development, the positive features of this instrument outweigh its limitations.

ACKNOWLEDGMENTS

Special thanks to John Wasserman, project coordinator for the development of the Stanford-Binet Intelligence Scale: Fifth Edition (SB5), for discussing the planned revisions while the SB5 project was still in progress. Thanks also to Carla Kmett Danielson, Shoshana Kahana, and Erin McMullen for their help in tracking down references for the various tables.

NOTES

1. The problem of small test pieces extends beyond SB4. Several other tests administered to young children, including the Bayley Scales of Infant Development (Bayley, 1969), contain item pieces so small that they are dangerous. Of course, test publishers could argue that it is the responsibility of psychologists to exercise due caution with test materials. Such a position, however, ignores the likelihood that the publisher will be named in any lawsuit stemming from accidents with test materials. Superseding any financial considerations, it is in the best interest of children that test materials be safe.

2. Although the BTBC was replaced recently by
the Boehm Test of Basic Concepts-Revised (Boehm, 1986), the original BTBC (Boehm, 1971) was used so that current results would be comparable to those reported by Kaufman (1978).

3. Some of the best evidence for four-factor solutions relies on data using median subtest correlations collapsed across age ranges (e.g., Keith et al., 1988; Thorndike, 1990). Two considerations argue for caution in interpreting these solutions: (a) Using median correlations may hide developmental change (Sattler, 1992); and (b) such approaches have ignored the problem of missing data. Vastly different numbers of participants completed subtests within each age group. Tables B.1 to B.17 in the Technical Manual report the "pairwise" n's for each correlation, and numbers can fluctuate dramatically (e.g., n's from 38 to 314 for age 12; see Table B.11) within a given age group. These sampling problems are likely to contribute to technical difficulties in estimating factor structure, and they bias observed results in unknown ways.

4. The abbreviated batteries discussed earlier do not suffer from this problem, because they are composed of subtests 1 through 6 in SB4's administration sequence.

REFERENCES

Aiken, L. R. (2000). Psychological testing and assessment. Boston: Allyn & Bacon.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.
Atkinson, L. (1991). Short forms of the Stanford-Binet Intelligence Scale, Fourth Edition, for children with low intelligence. Journal of School Psychology, 29, 177-181.
Bayley, N. (1969). Bayley Scales of Infant Development: Birth to two years. New York: Psychological Corporation.
Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191-244.
Boehm, A. E. (1971). Boehm Test of Basic Concepts: Manual. New York: Psychological Corporation.
Boehm, A. E. (1986). Boehm Test of Basic Concepts-Revised: Manual. New York: Psychological Corporation.
Boyle, G. J. (1989). Confirmation of the structural dimensionality of the Stanford-Binet Intelligence Scale (Fourth Edition). Personality and Individual Differences, 10, 709-715.
Boyle, G. J. (1990). Stanford-Binet IV Intelligence Scale: Is its structure supported by LISREL congeneric factor analyses? Personality and Individual Differences, 11, 1175-1181.
Bracken, B. A. (1985). A critical review of the Kaufman Assessment Battery for Children (K-ABC). School Psychology Review, 14, 21-36.
Bradley-Johnson, S. (2001). Cognitive assessment for the youngest children: A critical review of tests. Journal of Psychoeducational Assessment, 19, 19-44.
Carpentieri, S. C., & Morgan, S. B. (1994). Brief report: A comparison of patterns of cognitive functioning of autistic and nonautistic retarded children on the Stanford-Binet-Fourth Edition. Journal of Autism and Developmental Disorders, 24, 215-223.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Caruso, J. C. (2001). Reliable component analysis of the Stanford-Binet Fourth Edition for 2- to 6-year-olds. Psychological Assessment, 13, 261-266.
Carvajal, H. H., & Gerber, J. (1987). 1986 Stanford-Binet abbreviated forms. Psychological Reports, 61, 285-286.
Carvajal, H. H., Gerber, J., Hewes, P., & Weaver, K. A. (1987). Correlations between scores on Stanford-Binet IV and Wechsler Adult Intelligence Scale-Revised. Psychological Reports, 61, 83-86.
Carvajal, H. H., Hardy, K., Smith, K. L., & Weaver, K. A. (1988). Relationships between scores on Stanford-Binet IV and Wechsler Preschool and Primary Scale of Intelligence. Psychology in the Schools, 25, 129-131.
Carvajal, H. H., Hayes, J. E., Lackey, K. L., & Rathke, M. L. (1993). Correlations between scores on the Wechsler Intelligence Scale for Children-III and the General Purpose Abbreviated Battery of the Stanford-Binet IV. Psychological Reports, 72, 1167-1170.
Carvajal, H. H., McVey, S., Sellers, T., & Weyand, K. (1987). Relationships between scores on the General Purpose Abbreviated Battery of Stanford-Binet IV, Peabody Picture Vocabulary Test-Revised, Columbia Mental Maturity Scale, and Goodenough-Harris Drawing Test. Psychological Record, 37, 127-130.
Carvajal, H. H., Parks, J. P., Bays, K. J., & Logan, R. A. (1991). Relationships between scores on Wechsler Preschool and Primary Scale of Intelligence-Revised and Stanford-Binet IV. Psychological Reports, 69, 23-26.
Carvajal, H. H., & Weyand, K. (1986). Relationships between scores on Stanford-Binet IV and Wechsler Intelligence Scale for Children-Revised. Psychological Reports, 59, 963-966.
Cattell, R. B. (1940). A culture-free intelligence test, I. Journal of Educational Psychology, 31, 161-179.
Choi, H.-S., & Proctor, T. B. (1994). Error-prone subtests and error types in the administration of the Stanford-Binet Intelligence Scale: Fourth Edition. Journal of Psychoeducational Assessment, 12, 165-171.
Clark, R. D., Wortman, S., Warnock, S., & Swerdlik, M. (1987). A correlational study of Form L-M and the 4th edition of the Stanford-Binet with 3- to 6-year-olds. Diagnostique, 12, 112-130.
Coren, S., Ward, L. M., & Enns, J. T. (1999). Sensation and perception. New York: Harcourt Brace Jovanovich.
Cronbach, L. J. (1987). Review of the Stanford-Binet
Intelligence Scale: Fourth Edition (1007-310). Lincoln, NE: Buros Institute of Mental Measurements.
DeLamatre, J. E., & Hollinger, C. L. (1990). Utility of the Stanford-Binet IV abbreviated form for placing exceptional children. Psychological Reports, 67, 973-974.
Delaney, E. A., & Hopkins, T. F. (1987). Examiner's handbook: An expanded guide for Fourth Edition users. Chicago: Riverside.
Elliott, C. D. (1990). Differential Ability Scales: Introductory and technical handbook. San Antonio, TX: Psychological Corporation.
Embretson, S. E. (1999). Issues in the measurement of cognitive abilities. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 1-16). Mahwah, NJ: Erlbaum.
Flanagan, D. P., & Alfonso, V. C. (1995). A critical review of the technical characteristics of new and recently revised intelligence tests for preschool children. Journal of Psychoeducational Assessment, 13, 66-90.
Flynn, J. R. (1984). IQ gains and the Binet decrements. Journal of Educational Measurement, 21, 283-290.
Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time. American Psychologist, 54, 5-20.
Glutting, J. J., Adams, W., & Sheslow, D. (2000). Wide Range Intelligence Test manual. Wilmington, DE: Wide Range.
Glutting, J. J., & Kaplan, D. (1990). Stanford-Binet Intelligence Scale, Fourth Edition: Making the case for reasonable interpretations. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children: Intelligence and achievement (pp. 277-295). New York: Guilford Press.
Glutting, J. J., McDermott, P. A., & Stanley, J. C. (1987). Resolving differences among methods of establishing confidence limits for test scores. Educational and Psychological Measurement, 47, 607-614.
Glutting, J. J., McDermott, P. A., Watkins, M. M., Kush, J. C., & Konold, T. R. (1997). The base rate problem and its consequences for interpreting children's ability profiles. School Psychology Review, 26, 176-188.
Glutting, J. J., Youngstrom, E. A., Oakland, T., & Watkins, M. (1996). Situational specificity and generality of test behaviors for samples of normal and referred children. School Psychology Review, 25, 94-107.
Glutting, J. J., Youngstrom, E. A., Ward, T., Ward, S., & Hale, R. (1997). Incremental efficacy of WISC-III factor scores in predicting achievement: What do they tell us? Psychological Assessment, 9, 295-301.
Greene, A. C., Sapp, G. L., & Chissom, B. (1990). Validation of the Stanford-Binet Intelligence Scale: Fourth Edition with exceptional black male students. Psychology in the Schools, 27, 35-41.
Gridley, B. E., & McIntosh, D. E. (1991). Confirmatory factor analysis of the Stanford-Binet: Fourth Edition for a normal sample. Journal of School Psychology, 29, 237-248.
Grossman, H. J. (Ed.). (1983). Classification in mental retardation. Washington, DC: American Association on Mental Deficiency.
Grunau, R. E., Whitfield, M. F., & Petrie, J. (2000). Predicting IQ of biologically "at risk" children from age 3 to school entry: Sensitivity and specificity of the Stanford-Binet Intelligence Scale IV. Journal of Developmental and Behavioral Pediatrics, 21, 401-407.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hansen, M. H., Hurwitz, W. N., & Madow, W. G. (1953). Sample survey methods and theory. New York: Wiley.
Harris, S. L., Handleman, J. S., & Burton, J. L. (1990). The Stanford-Binet profiles of young children with autism. Special Services in the Schools, 6, 135-143.
Hartwig, S. S., Sapp, G. L., & Clayton, G. A. (1987). Comparison of the Stanford-Binet Intelligence Scale: Form L-M and the Stanford-Binet Intelligence Scale Fourth Edition. Psychological Reports, 60, 1215-1218.
Hendershott, J. L., Searight, H. R., Hatfield, J. L., & Rogers, B. J. (1990). Correlations between the Stanford-Binet, Fourth Edition and the Kaufman Assessment Battery for Children for a preschool sample. Perceptual and Motor Skills, 71, 819-825.
Hernandez, D. J. (1997). Child development and the social demography of childhood. Child Development, 68, 149-169.
Hopkins, T. F. (1988). Commentary: The Fourth Edition of the Stanford-Binet: Alfred Binet would be proud.... Measurement and Evaluation in Counseling and Development, 21, 40-41.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Horn, J. L. (1968). Organization of abilities and the development of intelligence. Psychological Review, 75, 242-259.
Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 57, 253-270.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53-91). New York: Guilford Press.
Husband, T. H., & Hayden, D. C. (1996). Effects of the addition of color to assessment instruments. Journal of Psychoeducational Assessment, 14, 147-151.
Jaeger, R. M. (1984). Sampling in education and the social sciences. New York: Longman.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Kamphaus, R. W. (1993). Clinical assessment of children's intelligence. Boston: Allyn & Bacon.
Kaufman, A. S. (1978). The importance of basic concepts in the individual assessment of preschool children. Journal of School Psychology, 16, 207-211.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
Keith, T. Z., Cool, V. A., Novak, C. G., & White, L. J. (1988). Confirmatory factor analysis of the Stanford-Binet Fourth Edition: Testing the theory-test match. Journal of School Psychology, 26, 253-274.
Keith, T. Z., & Witta, E. L. (1997). Hierarchical and cross-age confirmatory factor analysis of the WISC-III: What does it measure? School Psychology Quarterly, 12, 89-107.
Kelley, T. L. (1927). Interpretation of educational measurements. Yonkers, NY: World Books.
Kish, L. (1965). Survey sampling. New York: Wiley.
Kline, R. B. (1989). Is the Fourth Edition Stanford-
WISC-III. School Psychology Quarterly, 12, 197-234.
McCallum, R. S., & Karnes, F. A. (1990). Use of a brief form of the Stanford-Binet Intelligence Scale (Fourth) for gifted children. Journal of School Psychology, 28, 279-283.
McCallum, R. S., Karnes, F. A., & Crowell, M. (1988). Factor structure of the Stanford-Binet Intelligence Scale (4th ed.) for gifted children. Contemporary Educational Psychology, 13, 331-338.
McCallum, R. S., & Whitaker, D. P. (2000). The assessment of preschool children with the Stanford-Binet Intelligence Scale Fourth Edition. In B. A. Bracken (Ed.), The psychoeducational assessment of preschool children (3rd ed., pp. 76-102). New York: Grune & Stratton.
McCarthy, D. A. (1972). Manual for the McCarthy
Binet a four-factor test?: Confirmatory factor analy- Scale of Children's Abilities. New York: Psychologi-
ses of alternative models for ages 2 through 23. cal Corporation.
Journal of Psychoeducational Assessment, 7, 4-13. McCormick, R. L. (1956). A criticism of studies com-
Kline, R. B., Snyder, J., Guilmette, S., & Castellanos, paring item-weighting methods. Journal of Applied
M. (1992). Relative usefulness of elevation, variabil- Psychology, 40, 343-344.
ity, and shape information from WISC-R, K-ABC, McCrowell, K. L., & Nagle, R. J. (1994). Comparabil-
and Fourth Edition Stanford-Binet profiles in pre- ity of the WPPSI-R and the S-B:IV among preschool
dicting achievement. Psychological Assessment, 4, children. Journal of Psychoeducational Assessment,
426-432. 12, 126-134.
Kline, R. B., Snyder, J., Guilmette, S., & Castellanos, McDermott, P. A., Fantuzzo, J. W., & Glutting, J. J.
M. (1993). External validity of the profile variability (1990). Just say no to subtest analysis: A critique on
index for the K-ABC, Stanford-Binet, and WISC-R: Wechsler theory and practice. Journal of Psychoedu-
Another cul-de-sac. Journal of Learning Disabilities, cational Assessment, 8, 290-302.
26, 557-567. McDermott, P. A., Fantuzzo, J. W., Glutting, J. ].,
Krohn, E. J., & Lamp, R. E. (1989). Concurrent validi- Watkins, Y1. W., & Baggaley, A. R. (1992). Illusions
ty of the Stanford-Binet Fourth Edition and K-ABC of meaning in the ipsative assessment of children's
for Head Start children. Journal of School Psycholo- ability. Journal of Special Education, 25, 504-526.
gy, 27, 59-67. NIeyer, G. ]., Finn, S. E., Eyde, L. D., Kay, G. G., Ku-
Krohn, E. ]., & Lamp, R. E. (1999). Stability of the biszyn, T. W., Moreland, K. L., Eisman, E. ]., &
SB:FE and K-ABC for young children from low- Dies, R. R. (1998). Benefits and costs of psychologi-
income families: A 5-year longitudinal study. Jour- cal assessment in health care delivery: Report of the
nal of School Psychology, 37, 315-332. Board of Professional Affairs Psychological Assess-
Kyle, J. M., & Robertson, C. M. T. (1994). Evaluation ment Workgroup, Part I. Washington, DC: Ameri-
of three abbreviated forms of the Stanford-Binet In- can Psychological Association.
telligence Scale: Fourth Edition. Canadian Journal of Moffitt, T. E., & Silva, P. A. (1987). WISC-R Verbal
School Psychology, 10, 147-154. and Performance IQ discrepancy in an unselected
Laurent, ]., Swerdlik, M., & Ryburn, M. (1992). Re- cohort: Clinical significance and longitudinal stabili-
view of validity research on the Stanford-Binet Intel- ty. Journal of Consulting and Clinical Psychology,
ligence Scale: Fourth Edition. Psychological Assess- 55, 768-774.
ment, 4, 102-112. Molfese, V., Yaple, K., Helwig, S., & Harris, L.
Lavin, C. (1995). Clinical applications of the Stan- (1992). Stanford-Binet Intelligence Scale (Fourth
ford-Binet Intelligence Scale: Fourth Edition to read- Edition): Factor structure and verbal subscale scores
ing instruction of children with learning disabilities. for three-year-olds. Journal of Psychoeducational
Psychology in the Schools, 32, 255-263. Assessment, 10, 47-58.
Lavin, C. (1996). The Wechsler Intelligence Scale for Nagle, R.]., & Bell, N. L. (1993). Validation of Stan-
Children-Third Edition and the Stanford-Binet In- ford-Binet Intelligence Scale: Fourth Edition abbre-
telligence Scale: Fourth Edition: A preliminary study viated batteries with college students. Psychology in
of validity. Psychological Reports, 78, 491-496. the Schools, 30, 227-231.
Levy, P. (1968). Short-form tests: A methodological re- Nagle, R. J., & Bell, N. L. (1995). Validation of an
view. Psychological Bulletin, 69,410-416. item-reduction short form of the Stanford-Binet In-
Lukens, J. (1988). Comparison of the Fourth Edition telligence Scale: Fourth Edition with college stu-
and the L-M edition of the Stanford-Binet used with dents. Journal of Clinical Psychology, 51, 63-70.
mentally retarded persons. Journal of School Psy- Naglieri, J. A. (1988a). Interpreting area score varia-
chology, 26, 87-89. tion on the fourth edition of the Stanford-Binet
Macmann, G. NL, & Barnett, D. W. (1997). Myth of Scale of Intelligence. Journal of Clinical Child Psy-
the master detective: Reliability of interpretations chology, 17, 225-228.
for Kaufman's "intelligent testing" approach to the Naglieri, J. A. (1988b). Interpreting the subtest profile
1D Stor,[ord-Binet Interlige~ce Scale Fourth Edition (SB4) 241

on the fourth edition of the Stanford-Binet Scale of Smith, D. K., & Bauer,].]. (1989). Relationship of the
Intelligence. Journal of Clinical Child Psychology, K-ABC and S-B:FE in a preschool sample, Paper
17,62-65. presented at the annual meeting of the National As-
Neisser, U., Boodoo, G., Bouchard, T ]., Jr., Boykin, sociation of School Psychologists, Boston,
A. W., Brody, N., Ceci, S. J., Halpern, D. F., Smith, G, T, McCarthy, D, ""1., & Anderson, K. G,
Loehlin, ]. C, Perloff, R., Sternberg, R. J., &- (2000). On the sins of short-form development. Psy-
Urbina, S. (1996). Intelligence: Knowns and un- chological Assessment, 12, 102-111.
knowns. American Psychologist, 51, 77-101. Spearman, C (1923). The nature of intelligence and
Olridge, O. A., & Allison, E. E. (1968). Review of the principles of cognition. London: :Vlacmillan.
Wechsler Preschool and Primary Scale of Intelli- Spruill, ]. (1988). Two types of tables for use with the
gence. Journal of Educational Measurement, 5, Stanford-Binet Intelligence Scale: Fourth Edition.
347-348. Journal of Psychoeducational Assessment, 6, 78-86.
Ownby, R. L., & Carmin, C 0J. (1988). Confirmatory Spruill,]. (1991). A comparison of the Wechsler Adult
factor analyses of the Stanford-Binet Intelligence Intelligence Scale-Revised with the Stanford-Binet
Scale, Fourth Edition. Journal of Psychoeducational Intelligence Scale (4th Edition) for mentally retarded
Assessment, 6, 331-340. adults. Psychological Assessment, 3, 133-135.
Prewett, ,Po 0;. (1992). Short forms of the Stanford- Thorndike, R. L. (1973). Stanford-Binet Intelligence
Binet Intelligence Scale: Fourth Edition. Journal of Scale: Form L-M, 1972 norms tables. Boston:
Psychoeducational Assessment, 10,257-264. Houghton Mifflin.
Psychological Corporation. (1997). Wechsler Adult In- Thorndike, R. L. (1990). Would the real factors of the
telligence Scale-Third Edition, Wechsler Memory Stanford-Binet Fourth Edition please come for-
Scale-Third Edition technical manual. San Anto- ward? Journal of Psychoeducational Assessment, 8,
nio, TX: Author. 412-435.
Psychological Corporation. (1999). Wechsler Abbrevi- Thorndike, R. L. (1982). Applied psychometrics.
ated Scale of Intelligence manual. San Antonio, TX: Boston: Houghton Mifflin.
Author. Thorndike, R. L., Hagen, E. P., & Sattler, J. M.
Reynolds, CR., Kamphaus, R. W., & Rosenthal, B. L. (1986a). Guide for administering and scoring the
(1988). Factor analysis of the Stanford-Binet Fourth Stanford-Binet Intelligence Scale: Fourth Edition.
Edition for ages 2 years through 23 years. Measure- Chicago: Riverside.
ment and Evaluation in Counseling and Develop- Thorndike, R. L., Hagen, E. P., & Sattler, ]. M.
ment, 21, 52-63. (1986b). Stanford-Binet Intelligence Scale: Fourth
Rosenthal, B. L., & Kamphaus, R. W. (1988). Interpre- Edition. Chicago: Riverside.
tive tables for test scatter on the Stanford-Binet In- Thorndike, R. L., Hagen, E. p" & Sattler, ]. M,
telligence Scale: Fourth Edition. Journal of Psychoe- (1986c). Technical manual, Stanford-Binet Intelli-
ducational Assessment, 6,359-370, gence Scale: Fourth Edition. Chicago: Riverside.
Rothlisberg, B. A, (1987). Comparing the Stanford-Bi- Velicer, W. F. (1976). Determining the number of com-
net, Fourth Edition to the WISC-R: A concurrent va- ponents from the matrix of partial correlations. Psy-
lidity study, Journal of School Psychology, 25, chometrika, 41,321-327.
193-196. Vernon, p, E. (1950). The structure of human abilities.
Rust, J. 0., & Lindstrom, A. (1996). Concurrent valid- London: Methuen.
ity of the WISC-III and Stanford-Binet IV. Psycho- Vernon, P. E. (1987). The demise of the Stanford-Binet
logical Reports, 79, 618-620. Scale. Canadian Psychology, 28, 251-258.
Saklofske, D. H., Schwean, V. L., Yackulic, R. A., & Volker, M. A., Guarnaccia, V., & Scardapane, j. R.
Quinn, D. (1994). WISC-III and SB:FE performance (1999). Short forms of the Stanford-Binet Intelli-
of children with attention deficit hyperactivity disor- gence Scale: Fourth Edition for screening potentially
der. Canadian Journal of School Psychology, 10, gifted preschoolers. Journal of Psychoeducational
167-171. Assessment, 17, 226-235.
Sattler,]. (1992). Assessment of children (3rd ed.). San Wainer, H. (1999). The most dangerous profession: A
Diego, CA: Author. note on nonsampling error. Psychological Methods, 4,
Saylor, C F., Boyce, G. c., Peagler, S. M., & Callahan, 250-256.
S. A. (2000). Brief report: Cautions against using the Watkins, M, W. (1996). Diagnostic utility of the
Stanford-Binet-IV to classify high-risk preschoolers. WISC-III Developmental Index as a predictor of
Journal of Pediatric Psychology, 25, 179-183. learning disabilities. Journal of Learning Disabili-
Schwean, V. L., Saklofske, D. H., Yackulic, R. A., & ties, 29, 305-312.
Quinn, D. (1993). WISC-III performance of ADHD Watkins, M. W., & Kush,]. C (1994). Wechsler sub-
children. Journal of Psychoeducational Assessment test analysis: The right way, the wrong way, or no
(WISC-III Monograph), pp, 56-70. way? School Psychology Review, 23, 640-651.
Silverstein, A. B. (1993). Type I, Type II, and other Watkins, M. W., Kush,]. C, & Glutting,]. J. (1997).
types of errors in pattern analysis. Psychological As- Prevalence and diagnostic utility of the WISC-III
sessment, 5, 72-74. SCAD profile among children with disabilities.
Simpson, M., Carone, D. A., Burns, W. J., Seidman, T, School Psychology Quarterly, 12, 235-248.
Montgomery, D., & Sellers, A. (2002). Assessing Wechsler, D. (1967). Manual for the Wechsler
giftedness with the WISC-III and the SB4. Psycholo- Preschool and Primary Scale of Intelligence. New
gy in the Schools, 39, 515-524. York: Psychological Corporation.

Wechsler, D. (1989). Wechsler Preschool and Primary Scale of Intelligence-Revised: Manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1991). Manual for the Wechsler Intelligence Scale for Children-Third Edition. San Antonio, TX: Psychological Corporation.
Weiner, E. A., & Stewart, B. J. (1984). Assessing individuals. Boston: Little, Brown.
Wersh, J., & Thomas, M. R. (1990). The Stanford-Binet Intelligence Scale-Fourth Edition: Observations, comments and concerns. Canadian Psychology, 31, 190-193.
Woodcock, R. W., & Johnson, M. B. (1990a). Woodcock-Johnson Psychoeducational Battery-Revised Edition. Allen, TX: DLM Teaching Resources.
Woodcock, R. W., & Johnson, M. B. (1990b). Woodcock-Johnson Psychoeducational Battery-Revised Edition: Examiner's manual. Allen, TX: DLM Teaching Resources.
Youngstrom, E. A., Kogos, J. L., & Glutting, J. J. (1999). Incremental efficacy of Differential Ability Scales factor scores in predicting individual achievement criteria. School Psychology Quarterly, 14, 26-39.
Ysseldyke, J. E., & Stevens, L. J. (1986). Specific learning deficits: The learning disabled. In R. T. Brown & C. R. Reynolds (Eds.), Psychological perspectives on childhood exceptionality: A handbook (pp. 381-422). New York: Wiley.
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.
