ARTICLE IN PRESS

Developmental Review xxx (2004) xxx–xxx


www.elsevier.com/locate/dr

On the law of intelligence


William Lichten*
Koerner Center for Emeritus Faculty, Yale University, New Haven, CT 06520-8368, USA
Received 21 November 2003; revised 26 March 2004
Available online

Abstract

The law of intelligence is presented in test-independent form. Mental abilities, physical
brain size, and infant motor capacity follow the same law of growth from birth to adolescence.
Mental growth is independent of race, SES, or the Flynn effect. The vitality of the mental age
scale calls for a reexamination of Wechsler's deviation IQ. This paper builds on Yen's method
of standardized differences (1986). The main theoretical advance here is to put development
back into intelligence testing and to show a universality among different measures of the
growth of the human nervous system.
© 2004 Elsevier Inc. All rights reserved.

This paper suggests a new theoretical structure for psychoeducational measurement
and uses it to derive a scale of growth of mental ability. Over the years, many
researchers sought the ‘‘law of intelligence,’’ the growth curve of mental ability, a
goal to be addressed by this paper (Bloom, 1964; Bock, 1983; Gesell, 1928; Heinis,
1924; Jensen, 1973; Keats, 1982; Thorndike, Bregman, Cobb, & Woodward, 1927;
Thurstone, 1925, 1928; Thurstone & Ackerson, 1929; and many others).

Remarks on natural laws

We consider quantitative relations in physics and psychophysics, fields that are
sometimes emulated by mental testers.

* Fax: 1-203-432-8247.
E-mail address: William.lichten@yale.edu.

0273-2297/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.dr.2004.04.001

Scales

A pound of meat is a pound of meat. It weighs the same on the butcher's scales
whether by itself or added to another piece of meat. The scale divisions are uniform
in meaning over the entire range of measurement. Similarly, an inch is the same
anywhere on a yardstick. Likewise, 1 s has a simple, well-defined, and measurable
meaning at any place or time. Thus the basic units of physics (mass, length, and time)
and the laws based on them are measured on uniform, well-standardized scales.
Psychophysical scales measure the relation between subjective sensations (such
as loudness, pitch, and brightness) and their objective physical correlates (sound
intensity, frequency, and light intensity). The scales of psychophysics purport to be
uniform. For example, one asks an observer to vary the intensity of one sound until it
seems half as loud as a second sound. In this way, one can set up a loudness scale
with equal divisions (Licklider, 1951; Stevens, 1951, 1975; Stevens & Davis,
1938; Woodworth & Schlosberg, 1954). An example of a psychophysical law is the
Weber–Fechner law: the sensation of pitch or loudness of a sound, brightness
of a light, etc., is proportional to the logarithm of the corresponding physical
variable. (For a sampling of the many discussions of this law and alternatives, see
Engen, 1971; Luce, Bush, & Galanter, 1963; Luce & Krumhansl, 1988; Luce &
Suppes, 2002; Stevens, 1975; Suppes & Zinnes, 1963; Thurlow, 1971; Woodworth
& Schlosberg, 1954.)

Is a law of intelligence possible?

Quantitative physical and psychophysical laws hinge on measurement scales. Can
we set up a law of intelligence by merely following the examples of physics and
psychophysics? Unfortunately, the matter is not so simple. As Jensen (1993, p. 141) put
it, ‘‘There are no existing tests that could render such statements as the following at
all meaningful: ‘A person gains half of his adult level of mental ability by the age of
five.’’’ The rationale underlying this statement is the impossibility of directly
comparing the growth of intelligence at different ages.
For example, infants are in Piaget's sensorimotor stage:
. . .during the first year. . .intelligence, strictly speaking, is not yet observed.
(Piaget & Inhelder, 1969, p. 9)

On the other hand, adults are in a formal operational stage. Comparing the two
would be a case of apples and oranges.
The earliest intelligence measurements were expressed on a mental age (MA) scale
(Binet & Simon, 1916). On Binet's scale, test scores advanced by even amounts each
year. But almost all subsequent mental tests showed a quite different growth pattern.
Terman and Merrill (1937) noted that MA was a very uneven scale, with rapid mental
growth among infants and young children and near stasis in late adolescence. Thus
neither the MA scale nor its grade-level achievement twin can answer Jensen's
rhetorical question. Yet both are quite useful and are still widely used in the clinical,
developmental, and educational literature. In Terman and Merrill's words (1937, p. 25):
The expression of a test result in terms of age norms is simple and unambiguous, resting
upon no statistical assumptions. A test so scaled does not pretend to measure intelligence
as linear distance is measured by the equal units of a foot-rule, but tells us merely that
the ability of a given subject corresponds to the average ability of children of such and such
an age.

It was a pity that the MA scale was dropped from IQ testing. This paper will use
the MA scale to derive the law of intelligence. For further discussion of MA, see the
later section IQ and Mental Age Scales.
Layzer (1972, p. 276) pointed out a related difficulty with IQ as a measure of the
deviation of intelligence from the average at a given age: ‘‘IQ does not measure an
individual phenotypic character like height or weight; it is a measure of the rank order
or relative standing of test scores in a given population.’’ (See also Jensen, 1993.) To
illustrate his point, consider this question: which step in intelligence is the greater,
from 100 to 130 or from 70 to 100 IQ points? The gap from 100 to 130 might be the
difference between a butcher and a research medical doctor. On the other hand, the gap
from 70 to 100 separates average from mentally retarded, which under a recent Supreme
Court decision can be the difference between life and death (Atkins v. Virginia, 2002;
Greenhouse, 2002). Both intervals span 30 points, but how can you compare
the two? Thus it is again an apples-and-oranges problem to equate units at different
points on existing mental ability scales.
Although we may not yet say what a mental growth scale is, we can certainly say
what it is not. Mental ability is not like a pound of meat; it cannot be put on a scale
where each interval is exactly equal to every other one in meaning and in size. If it
were that simple, the problem of finding the law of intelligence would be solved and
there would be no need for this paper. We now turn to the measurements which can
be the basis of such a law.

The measurement of mental ability (IQ, achievement, etc.)

Without a simple, linear scale, how can we deal quantitatively with intelligence? In
the words of Jensen (1969, pp. 5–6):
Intelligence, like electricity, is easier to measure than to define. And if the measurements
bear some systematic relationship to other data, it means we can make meaningful state-
ments about the phenomenon we are measuring. There is no point in arguing the question
to which there is no answer, the question of what intelligence really is. The best we can do is
to obtain measurements of certain kinds of behavior and look at their relationship to other
phenomena and see if these relationships make any kind of sense and order.

Luce and Krumhansl (1988) pointed out that the situation is similar to the early
days of the study of heat. Nobody knew exactly what temperature was. The basis of
the concept was the subjective feeling of hot and cold. It took centuries to develop
the laws of thermodynamics before temperature was really understood. Nevertheless,
pioneers went about constructing thermometers based on the expansion of
liquids. They marked their instruments at two standard temperatures and divided the
interval between them into equal steps. For example, Fahrenheit's scale had its zero at
a mixture of ice and salt and its 96 at body temperature.
A simple way to compare scales is to match midpoints. For example, consider
thermometers with standard temperatures at 0 and 100 °F. The scale midpoints for
gas thermometers differ from each other by only a few thousandths of a degree. The
midpoint of the mercury scale differs from that of gas thermometers by 0.1 °F and from
that of the alcohol scale by 1 °F. The excellent agreement among most thermometric
materials means that thermometer scales are independent of the material used.
Water is an exception. It would make a poor thermometer. It would read 81.3 °F
at the midpoint of the 32–100 °F scale (66 °F). The reason for this gross discrepancy
is the non-linear expansion of water.
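The midpoint comparison can be sketched in a few lines of Python. The sketch assumes a hypothetical thermometric substance whose expansion is a known function of true temperature; the instrument is marked at two fixed points and divided into equal steps (linear interpolation), then read at the true midpoint. A linearly expanding substance reads exactly at the midpoint; a nonlinear one (here an invented quadratic expansion, not real data for water or any other material) does not. The function names are illustrative only.

```python
from typing import Callable


def scale_reading(true_temp: float,
                  expansion: Callable[[float], float],
                  low: float, high: float) -> float:
    """Reading of a thermometer marked at two fixed points and divided into
    equal steps, i.e. linear interpolation of the substance's expansion."""
    span = expansion(high) - expansion(low)
    fraction = (expansion(true_temp) - expansion(low)) / span
    return low + fraction * (high - low)


def linear_expansion(t: float) -> float:
    # Hypothetical ideal substance: expansion proportional to temperature.
    return 1.0 + 0.001 * t


def quadratic_expansion(t: float) -> float:
    # Hypothetical nonlinear substance, invented purely for illustration.
    return 1.0 + 0.00001 * t * t


midpoint = (0.0 + 100.0) / 2
print(scale_reading(midpoint, linear_expansion, 0.0, 100.0))     # ≈ 50
print(scale_reading(midpoint, quadratic_expansion, 0.0, 100.0))  # ≈ 25
```

Both instruments agree at the two calibration points by construction; only the interior of the scale reveals the nonlinearity, which is why matching midpoints is a useful consistency test.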
If one were to plot temperature from almost any scale against another, the plot
would be a straight line. This linear agreement among scales made it reasonable
to use any one to define temperature. Such scales preceded and agree with the
now well understood laws of thermodynamics.
Note that we cannot directly compare different parts of the temperature scale with
each other, as we might with two yardsticks by laying one on top of the other. There
is no simple, direct way to compare the temperature intervals 0–10 °C and 90–100 °C.
Yet the consistency and the linearity among scales make the measurement of temper-
ature exact.
In conclusion, this paper is a search for scales of the growth of mental ability
which do not depend on the specifics of the test used to measure it. A simple,
practical test of the consistency of such scales is to compare growth midpoints. We follow
Jensen (1969) and avoid the claim that this is the way that ‘‘intelligence’’ really
grows. Rather, we shall compare the current scale with others in an effort to gain
insight into the nature of intelligence and other mental abilities.

Growth and variation. Local vs. global properties

The developmental psychologist Wohlwill (1973) split the growth of any quantitative
psychological trait into a universal growth function (Allport's nomothetic,
1942) and the individual variation about that function (Allport's idiographic).
McCall, Eichorn, and Hagerty (1977, p. 3) noted that developmental psychologists
tend to slight one of these two factors:
Ironically, most empirical research has stemmed from an individual difference tradition in
which cross age correlations were calculated between indices of mental performance, while
the major theorist, Piaget, deals only with developmental function.

McCall's criticism is particularly germane to IQ testing, where Wechsler's (1939)
deviation scale slights developmental function. One aim here is to overcome this lack.
Luce and Krumhansl (1988, p. 39) distinguished between local psychophysics,
‘‘which is concerned. . .with stimuli which are physically little different’’ vs. ‘‘global
. . . sensations over the full dynamic range of the physical stimuli.’’ In physics, local
properties are differential; global features are integral. Thus a differential equation
governs the acceleration of a falling body at a given position. Integrating this
equation gives the overall motion of the body or projectile. Either form is derived
from the other by means of calculus. The integral form contains more information
(the boundary conditions) than the differential version. Newton's laws give the
acceleration of a projectile at each point of its trajectory, but it takes a whole chapter
in physics textbooks to relate this local condition to global information, such as the
time the object takes to fall to the ground or the path followed by a projectile.
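To make the local/global distinction concrete, the following Python sketch integrates the local law (constant downward acceleration g) step by step until the body reaches the ground, and compares the result with the closed-form global answer t = sqrt(2h/g), which follows from integrating with the boundary conditions y(0) = h and v(0) = 0. The value of g, the step size, and the function names are assumptions of the sketch.

```python
import math

G = 9.81  # m/s^2, assumed value of gravitational acceleration


def fall_time_numeric(height: float, dt: float = 1e-4) -> float:
    """Integrate the local (differential) law step by step:
    dv/dt = -G and dy/dt = v, starting at rest from `height`."""
    y, v, t = height, 0.0, 0.0
    while y > 0.0:
        v -= G * dt   # local law: acceleration is constant
        y += v * dt   # update position with the new velocity
        t += dt
    return t


def fall_time_closed_form(height: float) -> float:
    """Global (integral) solution with y(0) = height and v(0) = 0."""
    return math.sqrt(2.0 * height / G)


h = 20.0  # metres, illustrative
print(round(fall_time_numeric(h), 3), round(fall_time_closed_form(h), 3))
```

The step-by-step loop never sees the fall time directly; it emerges only after the local rule is accumulated over the whole trajectory, which is the sense in which the global form carries more information.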
Likewise, in psychophysics, Weber's law, that the just noticeable difference (j.n.d.)
of a stimulus is proportional to its magnitude, is a local relation stated in differential
form. The Weber–Fechner law, which states that the sensation is proportional to the
logarithm of the stimulus magnitude, is a global version. As in physics, the global
and local relations can be derived mathematically from each other. However, in the
psychophysical case the global law involves further assumptions. (For references,
see the Remarks on natural laws section at the beginning of this paper.)
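Summing Weber's local rule yields Fechner's global one. The Python sketch below assumes a constant Weber fraction k (a simplification; real Weber fractions vary with intensity), counts whole j.n.d. steps from a threshold intensity I0 up to I, and compares the count with the logarithmic law ln(I/I0)/ln(1 + k). Equal intensity ratios (1 to 10, and 10 to 100) take the same number of steps, which is the logarithmic property. The value k = 0.1 and the function names are illustrative assumptions.

```python
import math


def count_jnds(i0: float, i: float, k: float) -> int:
    """Local (Weber) form: each just noticeable step raises the intensity
    by the fraction k of its current value. Count whole steps from i0 to i."""
    steps, level = 0, i0
    while level * (1.0 + k) <= i:
        level *= 1.0 + k
        steps += 1
    return steps


def fechner_steps(i0: float, i: float, k: float) -> float:
    """Global (Fechner) form: the number of steps grows with log(i / i0)."""
    return math.log(i / i0) / math.log(1.0 + k)


k = 0.1  # assumed Weber fraction, for illustration only
print(count_jnds(1.0, 10.0, k), count_jnds(10.0, 100.0, k))       # 24 24
print(count_jnds(1.0, 1000.0, k), round(fechner_steps(1.0, 1000.0, k), 2))  # 72 72.48
```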

Importance of growth

Growth was at the heart of the first intelligence tests (Binet & Simon, 1916), which
measured on the mental age scale but neglected to measure variation at a given age.
On the other hand, the modern deviation IQ scale is based on variation only and
gives no information about growth (Wechsler, 1939).
For developmental psychology and education, growth is the sine qua non. Casual
observations as far back as Aristotle have shown that two adults of the same age are
more alike than a baby and an adult. The total growth of mental ability from birth
to adulthood is large compared to population variations occurring at a given age
(see Appendix).
The global nature of mental ability goes beyond variation (IQ) at a given CA and
also includes the much larger growth and decay over the entire life cycle. We can view
IQ and growth over a short term (such as a year) as local and thus as incomplete.
Factor analyses of IQ tests are normally performed on data from a single CA. The
general factor of such analyses is g, general intelligence (Jensen, 1998). If a factor
analysis were instead made of data across the range of ages, for example in the
WISC (Wechsler Intelligence Scale for Children), the general factor would by far be
CA.
The problems of different fields are much more alike than their practitioners think. . .the
physical sciences have learned much by storing up amounts, not just directions. . .being so
uninterested in our variables that we do not care about their units can hardly be desirable.
Tukey (1969, pp. 83, 86, 89)

The need for units

This paper's goal is nomothetic: to find universal properties of the growth of
ability. Accordingly, it aims to express measurements and derived scales in well-defined
units. However, in psychology units are hard to come by. The psychophysicist, in the
classical 19th-century tradition of Wundt, Hering, and Helmholtz, measured mental
events, such as brightness, loudness, and pitch, in terms of tangible physical
quantities like intensity and frequency. During the 20th century the word
‘‘psychophysics’’ became ‘‘psychometrics.’’ Mental testers had one physical variable (CA)
upon which to hang their hats. In addition, standardized tests often use deviation-based
units like IQ or percentiles.
The present paper is based on normed, aggregated mental test data: the average
and SD as a function of CA (for intelligence tests) or grade (for achievement
tests). It works equally well and consistently on a variety of ability
measures and derived scales: raw scores; MA; IQ; Thurstone (1925), Rasch,
and Item Response Theory scales. Examples can be found in the Appendix and in Yen
(1986). This paper limits its data to well-standardized mental ability tests. The
generic term ‘‘mental ability’’ covers a wide variety of standardized IQ, achievement,
and infant development tests. Although some authors treat an even wider
range of abilities (Gardner, 1983; Salovey & Mayer, 1990; Sternberg, 1997;
Torrance, 1988; Torrance & Goff, 1989), none has produced a standardized test that
could be used in this monograph (Jensen, 1998). The goal of this paper is to
find out to what extent an objective growth scale can be based on these measures.
Intelligence is what the tests test.
Boring (1923, p. 35)

The tests

This paper builds scales from aggregated data (group or population averages),
which are ‘‘true scores’’ (Gulliksen, 1987). It uses IQ, infant development, and
achievement tests from birth to adolescence. Standardized tests for teenagers and
adults, such as the SAT (formerly the Scholastic Aptitude/Assessment Test), ACT
(formerly American College Testing Program), GRE (Graduate Record Examination),
LSAT (Law School Admission Test), MCAT (Medical College Admission
Test), etc., as well as individual scores, show idiosyncratic rather than lawful behavior
and thus receive limited consideration in this paper. The tests consist of a nested
series of components.

Items

The smallest unit of mental ability tests is the item, which consists of a single
question or task.

Subscales

Similar items, such as vocabulary or arithmetic problems, are grouped together to
form subscales.

Scales

Subscales in turn are combined to form scales. For example, the verbal scale in the
Wechsler IQ tests is a combination of information, similarities, arithmetic, vocabu-
lary, and comprehension subscales; the performance scale combines five subscales
such as picture completion and block design.
The true (error-free) raw score T for persons of a certain ability is simply the total
number of correctly answered or performed items listed in the norms manual for that
group. Testing companies take a representative sample of the population to
approach error-free tables in the manual. For example, an average 10-year-old would
get 14 items correct on the WISC-III information subtest, which translates into a
scale score of ten, according to the manual.
The subscale scores are combined to arrive at scale scores, which are then
combined to give an IQ for intelligence. Achievement test scores usually are given in
percentiles (a deviation score) or in grade level as a growth measure. The full scale score
(or composite score) combines all scales. This paper uses the terms subscales, scales,
and full scale. Scale score should be distinguished from the term scaled score or
standard score z, which is the deviation of a raw score T from the mean M, expressed in
standard deviation units: $z = (T - M)/\sigma_T$.
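The standard score is a one-line computation. A minimal Python sketch follows; the deviation IQ convention it uses (mean 100, SD 15) is the one stated in the Standard scores section, while the function names and the example norms (mean 40 items, SD 8) are hypothetical.

```python
def standard_score(raw: float, mean: float, sd: float) -> float:
    """z = (T - M) / sigma_T: deviation of a raw score from the group mean,
    expressed in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def deviation_iq(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Re-express a standard score on the deviation IQ scale (mean 100, SD 15)."""
    return mean + sd * z


# Hypothetical norms: group mean 40 items correct, SD 8 items.
z = standard_score(52, 40, 8)
print(z, deviation_iq(z))  # 1.5 122.5
```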

IQ and mental age scales

The IQ and MA scales are test independent (all error-free tests give the same true
IQ and MA). On an easy test, an average 10-year-old may get 75 right answers on
100 items; on a hard test, the score might be only 25 items correct. On a sufficiently
large sample of a representative population, both scores would assign the same IQ
and MA to the average 10-year-old or to a sample of persons of any age with the
same mental ability. For intelligence tests, MA is a single, easily understood quantity
in well defined units of years (see Fig. 1), as is grade level for achievement.
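The test independence of MA follows from its definition: MA is the age whose norm-group average matches the score. The Python sketch below reads MA off a table of age norms by linear interpolation. Both norm tables are invented for illustration (an easy and a hard 100-item test, echoing the 75-versus-25 example above), and interpolating linearly between tabulated ages is an assumption of the sketch, not a published scoring rule.

```python
from bisect import bisect_left


def mental_age(raw: float, ages: list[float], mean_scores: list[float]) -> float:
    """Return the age at which the average raw score equals `raw`.

    `mean_scores` must increase with age; scores between tabulated ages are
    linearly interpolated. Scores outside the table cannot be assigned an MA,
    mirroring the scale-range limitation of MA and grade level.
    """
    if not mean_scores[0] <= raw <= mean_scores[-1]:
        raise ValueError("score outside the range of the age norms")
    j = bisect_left(mean_scores, raw)
    if mean_scores[j] == raw:
        return ages[j]
    frac = (raw - mean_scores[j - 1]) / (mean_scores[j] - mean_scores[j - 1])
    return ages[j - 1] + frac * (ages[j] - ages[j - 1])


ages = [8.0, 9.0, 10.0, 11.0, 12.0]
easy_test = [55.0, 66.0, 75.0, 82.0, 87.0]  # hypothetical age norms
hard_test = [12.0, 18.0, 25.0, 31.0, 36.0]  # hypothetical age norms

# The same average 10-year-old scores 75 on one test and 25 on the other.
print(mental_age(75, ages, easy_test), mental_age(25, ages, hard_test))  # 10.0 10.0
```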
MA's leveling off in late adolescence is characteristic of mental tests. Terman and
Merrill (1937, p. 25) noted:
. . .the mental age unit. . .appears definitely to decrease with age. . .the difference between
1-year and 2-year intelligence (100 IQ points-auth.) is so great that any one can sense it. . .The
difference in intellectual ability between the average child of fifteen and the average child of
16 (then 5 IQ points-auth.) is so small that it can barely be detected by the most elaborate
mental tests.

Conversely, the SD or IQ unit, measured in MA or grade units, increases with age
(see Fig. 1 and Eq. (A.2)). At birth, $\sigma_{MA}$ should vanish.
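One way to see why the SD in MA units grows with age and vanishes at birth: if, as an assumption for this sketch, IQ is taken as the ratio IQ = 100·MA/CA with the same SD σ_IQ at every age, then σ_MA = σ_IQ·CA/100, which is proportional to CA. The default σ_IQ = 15 is illustrative only, and this sketch is not necessarily the form of Eq. (A.2).

```python
def sd_mental_age(ca_years: float, sd_iq: float = 15.0) -> float:
    """SD of mental age implied by a ratio IQ (IQ = 100 * MA / CA) whose SD is
    the same at every chronological age: linear in CA and zero at birth."""
    return sd_iq * ca_years / 100.0


for ca in (0, 2, 5, 10, 15):
    print(ca, sd_mental_age(ca))
```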
This behavior may seem strange for MA and its SD (Fig. 1) in isolation. When
both are combined to form a standardized growth function (in Appendix to this pa-
per), MA falls in line with other mental test scales (Fig. 18 and Table 1).
MA (and grade level) have the shortcoming that neither can handle children who
fall outside the scale range at either end. The reason is that both MA and grade level
can only refer to average abilities, and average ability never reaches above-average
children at the top of the scale, nor does it reach below-average children at the
bottom of the scale. For example, the top of the California Achievement Test (CAT) is
at grade level 12.8 (12th grade, 8th month), the last grade at which exams are given.
When a student, class, or school average is higher than that, an arbitrary score of
‘‘12.8’’ is assigned to it. The Iowa Tests of Basic Skills (ITBS) use a fictitious scale
going up to the 18th grade to handle above-average 12th graders. In either case, the
number assigned has little meaning.

Fig. 1. The mental age (MA) scale (Terman & Merrill, 1937). Vertical bars: ±1 SD.
The growth scale used here is given in units of standard scores for each age. This
scale is extended simply by adding standard scores to it. Likewise, deviation IQ has
no problem handling exceptional persons at any age.
Since the time that MA was dropped by Wechsler (1939), IQ test scales have been
deviation based. These scales measure variation but not growth and thus are
subject to the criticism voiced by McCall et al. (1977) and others. Indeed, one must
make correlational, longitudinal studies to study mental development (Anderson,
1939; Bloom, 1964; Furfey & Muehlenbein, 1932; McCall et al., 1977). To remedy
that lack, this paper uses the MA scale inter alia to handle both growth and variation.

Standard scores

The familiar standard scale z equates tests by aligning the population means and
standard deviations (SD). For example, IQ has a mean of 100 and SD = 15. Figs. 2
and 3 show another example: distributions for two college entrance examinations,
the SAT-Verbal (mean = 505, SD = 111) and the ACT-English (mean = 20.4,
SD = 5.4).
Fig. 4 plots both distributions against standard scores. (The vertical heights of
both distributions also are normalized.) The distributions of standard scores equate
well.
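Equating by standard scores amounts to mapping a score to z on one test and back from z on the other. A minimal Python sketch using the SAT-Verbal and ACT-English means and SDs quoted above (the function name is hypothetical); as the text notes, the mapping is only trustworthy for |z| up to about 2.

```python
SAT_V = (505.0, 111.0)  # (mean, SD) quoted in the text
ACT_ENG = (20.4, 5.4)   # (mean, SD) quoted in the text


def equate(score: float, source: tuple[float, float],
           target: tuple[float, float]) -> float:
    """Linear (standard-score) equating: give the score the same z on both tests."""
    z = (score - source[0]) / source[1]
    return target[0] + z * target[1]


for sat in (283, 394, 505, 616, 727):  # z = -2, -1, 0, +1, +2
    print(sat, round(equate(sat, SAT_V, ACT_ENG), 1))
```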

Fig. 2. SAT-Verbal score distribution.

Fig. 3. ACT English score distribution.

Differences between two sets of measurements with effect sizes of less than 0.2 SD
are considered small (Cohen, 1988). Inspection of Fig. 4 shows both
tests to be interchangeable within this precision over the range of z values between −2
and +2.
The SAT and ACT align because both have the same shaped distributions and the
huge number of test takers irons out statistical fluctuations. For most tests, the
distributions near the mean ($|z| \le 2$) are close to normal, which makes alignment of
standard scores practical. For larger $|z|$ (Figs. 2 and 3), the curves deviate from each
other and results become less comparable.

Fig. 4. Scaled score distributions of SAT-V and ACT English.

Norming samples at each age for IQ tests are only a few hundred at most. For
deviations of $|z| \gg 1$ (IQ well outside the normal range of 85–115), there are so
few cases in the norming samples that one cannot talk of distributions at all, and
it is impossible to compare different IQ tests. For example, the WISC-IV standardization
(norming) sample consisted of 200 children at each age. Assignments of
exceptional children (gifted or retarded: IQ > 130 or < 70) rest on only about five
children per age in each of those IQ ranges of the norming sample and are questionable.
One shudders to think that life and death decisions hinge on such data (Atkins v.
Virginia, 2002; Greenhouse, 2002). Especially worrisome are indications of
inconsistency between Stanford–Binet and Wechsler tests (Table 2 and Lichten & Wainer, 2004).
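The arithmetic behind these small tail counts is easy to check. Assuming normally distributed IQ (mean 100, SD 15) and the 200-child age sample quoted for the WISC-IV, the expected number of children above IQ 130 (z = 2) is about 4.6, with a binomial standard error of about 2.1. The normality assumption and the function names belong to this Python sketch, not to the test publisher.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_count(n: int, z: float) -> tuple[float, float]:
    """Expected number of cases beyond z in a sample of n, and its binomial SD."""
    p = upper_tail(z)
    return n * p, math.sqrt(n * p * (1.0 - p))


expected, se = tail_count(200, (130 - 100) / 15)
print(f"expected above IQ 130: {expected:.2f} +/- {se:.2f}")  # 4.55 +/- 2.11
```

With a standard error nearly half the expected count, the norms in each tail rest on a handful of children whose number could easily have been two or eight.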

The method of standardized growth

Testers have long argued as to whose scales most accurately reflect ability. Related
disputes were over whether or not the standard deviation of mental ability really
increases or decreases with age. As far back as 1928, Thurstone dismissed
statements like ‘‘the distribution of intelligence follows the normal curve.’’ One
can only discuss the distribution of observable quantities like the raw score or scales
which are derived unambiguously from it, such as the latent traits of IRT (Item
Response Theory) or Rasch theory. Likewise, more recently Yen (1986) rejected disputes
over which mental ability scales better represented intuitive, operationally
undefined concepts.

Yen compared pairs of subtest scores (reading vocabulary and math computation)
from two school achievement tests (CAT/C, California Achievement Test,
based on Thurstone number-right scaling, and CTBS/U, Comprehensive Tests of Basic
Skills, based on IRT) and found them to be inconsistent (Figs. 5–8).
At first glance, Figs. 5–8 are difficult to understand, with an apparently
random pattern of change. For example, Figs. 7 and 8 show different patterns of
mean change, with much more pronounced early growth in Fig. 8 than in Fig. 7. On the
other hand, the changes in the SD (the error bars) seem to go in the opposite direction
(small to large in Fig. 7, large to small in Fig. 8). How can one make sense of such a
pattern, in which mean and SD seem to work in opposite directions?

Fig. 5. Manufacturers' scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CAT/C Reading
vocabulary. Thurstone number right scaling.

Fig. 6. Manufacturers' scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CTBS/U
Reading vocabulary. IRT scaling.

Fig. 7. Manufacturers' scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CAT/C
Mathematics. Thurstone number right scaling.

Fig. 8. Manufacturers' scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CTBS/U
Mathematics. IRT scaling.
Furthermore, the growth curves for corresponding tests had dissimilar shapes,
patterns of growth which are contrary to the claims that both scales were equal
interval (1 unit represents the same amount of ability at any age). If both scales were
equal interval, the growth curves would have the same shape.
As we turn to Yen's (1986) method of resolving these contradictions, we make
some general observations. First of all, neither the means nor the SDs of either scale
can be meaningful by themselves, since both show patterns of growth with grade level
which are inconsistent. On the other hand, Yen found a consistent relation between
mean and SD, which she showed by comparing two corresponding tests (math vs.
math or vocabulary vs. vocabulary). At a given grade, if the SD of one test is
relatively large, the growth curve is relatively steep; when the SD of a test is relatively
small, the growth curve is flatter, with relatively small growth.
This relation is the basis of Yen's method of standardized growth. By taking the
ratio of annual growth to SD, she found a quantity that did not depend on whether the
test was CAT/C or CTBS. (For mathematical formulae, see Appendix.) In place of the
inconsistency of either quantity (mean or SD) alone, she found lawfulness in the
combination of both. This is the genius of Yen's method, to which we now turn.
In the example of the SAT and ACT scales, which had very different values for
means and standard deviations, plotting standard scores z made both distributions
the same. In a similar fashion, Yen's standardized differences made the annual growth
in both tests the same.
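A minimal Python sketch of standardized differences follows. It assumes a common definition, the difference between adjacent grade means divided by the pooled SD of the two grades; the paper's Appendix gives the exact formulae used. The grade norms and the rescaling constants are invented. The check shows one property behind Yen's result: any linear rescaling of a test's scale leaves the standardized differences unchanged. Thurstone and IRT scales are not linearly related, so for them the agreement Yen found is an empirical finding rather than a guarantee of this identity.

```python
import math


def standardized_differences(means: list[float], sds: list[float]) -> list[float]:
    """Growth between adjacent grades divided by the pooled SD of the two grades."""
    diffs = []
    for g in range(len(means) - 1):
        pooled_sd = math.sqrt((sds[g] ** 2 + sds[g + 1] ** 2) / 2.0)
        diffs.append((means[g + 1] - means[g]) / pooled_sd)
    return diffs


# Hypothetical grade-by-grade norms on a manufacturer's scale.
means = [300.0, 340.0, 370.0, 392.0, 408.0]
sds = [40.0, 42.0, 44.0, 45.0, 46.0]

# The same ability expressed on a second, linearly rescaled scale.
a, b = 2.5, -100.0
means2 = [a * m + b for m in means]
sds2 = [a * s for s in sds]

d1 = standardized_differences(means, sds)
d2 = standardized_differences(means2, sds2)
print([round(d, 3) for d in d1])
print(all(math.isclose(x, y) for x, y in zip(d1, d2)))  # True
```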

Yen’s results

Yen applied her method to the results shown in Figs. 5–8 (see Fig. 9, and Yen,
1986, for tables). The standardized differences were nearly the same for
corresponding subtests.
The close subtest agreement for each trait made moot the dispute over the relative
merits of the test scales. In her words,
standardized differences lead to essentially the same conclusion regardless of which scale is
used.
(Yen, 1986, p. 305)

Fig. 9. Standardized growth differences for the same subtests as in Figs. 5–8. SD = 1.

Two remarks on standardized growth

Yen's method was an important first step towards the law of intelligence.
However, it had two limitations:
1. The method considers only local aspects of mental growth. That is, it deals only
with the standard deviation and 1 year's growth of the test score. For a law of
intelligence to handle global growth (the entire amount from birth to adulthood),
her calculation needs to be extended.
2. Standardization only brought together the growth for like scales (for a single
ability such as math or vocabulary). Unlike scales, such as vocabulary and math, did not
show the same growth. There is no a priori reason to expect growth functions in tests for
different subjects to be the same. This leaves us short of a law of intelligence, which
should not depend upon which test is used. Achieving that goal will involve finding
a class of mental tests for which standardized growth is the same.

Extension of the method of standardized growth

This part of the present paper constructs a growth measure that extends and
broadens Yen's results.
Fig. 10 shows curves obtained by the author by simply adding Yen's differences to
obtain a growth function. This procedure has two advantages: it changes local to
global measures and irons out statistical fluctuations. (Compare Figs. 9 and 10.)
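Turning local differences into a global growth curve is a running sum. The Python sketch below uses invented standardized differences, accumulates them grade by grade, reports total growth, and finds the growth midpoint (the grade at which half of the total is reached). Interpolating linearly between grades to locate the midpoint is an assumption of the sketch, and the function names are hypothetical.

```python
from itertools import accumulate


def growth_curve(diffs: list[float]) -> list[float]:
    """Cumulative standardized growth, starting at 0 for the first grade."""
    return [0.0] + list(accumulate(diffs))


def growth_midpoint(grades: list[float], curve: list[float]) -> float:
    """Grade at which cumulative growth first reaches half of its total,
    interpolating linearly between adjacent grades."""
    half = curve[-1] / 2.0
    if half <= 0.0:
        raise ValueError("total growth must be positive")
    for g in range(1, len(curve)):
        if curve[g] >= half:
            frac = (half - curve[g - 1]) / (curve[g] - curve[g - 1])
            return grades[g - 1] + frac * (grades[g] - grades[g - 1])
    raise ValueError("curve never reaches half of its total")


grades = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
diffs = [1.2, 0.9, 0.7, 0.5, 0.4]  # hypothetical grade-to-grade standardized differences
curve = growth_curve(diffs)
print("total growth (SD):", round(curve[-1], 2))
print("midpoint grade:", round(growth_midpoint(grades, curve), 2))
```

Because each point of the curve is a sum of many differences, random errors in individual differences partly cancel, which is the smoothing the text describes.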

A reminder

As pointed out earlier, we cannot assume the units keep the same meaning at all
grades. It would be nice if mental ability were that simple, but that is not in the cards
at this stage of inquiry. Nevertheless, the growth function, as given here, is
unambiguously defined. As such, it is a valid measure.

Fig. 10. Growth curves obtained from Fig. 9.



This paper explores standardized growth of the composite, or ‘‘full scale,’’ score
of a mental ability test, which combines all subjects, such as math computation,
math concepts, reading ability, vocabulary, spelling, geometric visualization, and
factual knowledge. By combining subjects which involve many portions of the brain
and occur at every stage of development, one might expect to find a more lawful
measure of human development that depends less on the specifics of the test.

Remarks on standardized growth

Growth curves for different subtests, such as ITBS reading vs. CAT/C math
computation, sometimes do not agree; growth curves for subtests of like abilities agree.
Calculations of growth are made without any adjustable parameters, a stringent test.
Typical coefficients of variation (c.o.v.) of total growth among unlike subtests are
25%, as compared with a c.o.v. among like subtests of 10%. Standardized growth
reduces variance by a factor of 6, the square of the ratio of the two c.o.v. values. As
noted earlier, cumulative growth is more precise than standardized differences, since it
statistically averages out random fluctuations of individual pairs of data (Figs. 9 and 10).
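The factor of 6 is the squared ratio of the two coefficients of variation, provided the means being compared are of similar size (an assumption here): variance equals (c.o.v. × mean)², so (0.25/0.10)² ≈ 6.25. A Python sketch, with hypothetical helper names and invented example values:

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def variance_reduction(cov_before: float, cov_after: float) -> float:
    """Ratio of variances for equal means: the squared ratio of the c.o.v. values."""
    return (cov_before / cov_after) ** 2


print(round(variance_reduction(0.25, 0.10), 2))
print(round(coefficient_of_variation([6.0, 7.5, 9.0]), 3))  # invented totals
```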
It matters not so much what the questions ask, as long as they are numerous.
(Binet & Simon, 1916, p. 329)

The indifference of the indicator.
(Spearman, 1927, p. 198)

Full scale tests

The expectation that full scale tests would probe universal aspects of the human
mind goes back to the earliest days of mental testing. Ever since Binet and Stern,
IQ tests have assessed a broad mixture of mental skills. The universal results did
not depend on the details of the test. This is the rationale for what follows in this
paper.

Achievement tests

Fig. 11 plots full scale scores for two widely used achievement tests, the CAT E,F
and the Iowa Tests of Basic Skills (ITBS). For comparison, the linear scales of both
tests are adjusted to make both coincide at grades 1 and 12. The curves' different
shapes affect the growth midpoints. The CAT E,F (based on IRT scaling) midpoint
is in the 2nd grade; the ITBS (based on a proprietary scale) rises more gradually and
reaches the halfway mark in the 4th grade.
Fig. 12 shows standardized growth for the two tests. The halfway growth marks
are much closer together than in Fig. 11.
Table 1 compares a sample of widely used achievement tests. The standardized growth mid grades are in better agreement than the manufacturers' scales. The c.o.v. is again reduced by standardization, from approximately 25% to 10%, a reduction of variance by a factor of six. The method works for full scale tests as it did in Yen's work on subtests.

Fig. 11. Manufacturers' growth curves for composite CAT/E,F and ITBS achievement tests. Arrows: score midpoints.

Fig. 12. Standardized growth curves for ITBS and CAT tests. Compare with Fig. 17.

Table 1
K-12 growth (SD); half-way growth points for achievement and IQ tests

Test                Scale   K-12 growth   Mid grade or mid age in years
                            (SD)          Standardized    Original
Achievement tests
CAT                 IRT     7.7           2.4 (grade)     2.4
CTBS                IRT     8.4           2.3             2.1
ITBS                ITBS    8.4           2.8             5.2
MAT                 Rasch   7.8           2.7             2.7
SAT                 Rasch   8.7           2.3             2.5
W-J                 Rasch   10.4          2.7             2.1
WRAT                Rasch   8.8           2.1             2.1
Mean, raw                   8.6           2.5 (grade)     2.7
Mean, corrected(a)          6.9           8.2 years       —
SD                          0.7           0.26 years      1.1
IQ tests
Mean                        7.3           7.9 years
SD                          0.7           —

(a) To convert grades for achievement tests to age in years to match IQ tests, corrections were made for retention in grade and exclusion of special education students from testing.

The dispute as to whether mental tests are unidimensional is nearly a century old, goes back to Spearman and Thurstone, and is still current. However, Wilks' (1938) theorem implies that correlations between different tests approach unity as test lengths become infinite. Thus long, full scale tests may appear to be unidimensional even if the subtest contents are multidimensional. Yen (1985) found evidence of multidimensionality in an achievement test, especially in the mathematics subtests. However, this occurred in the high school grades, an age range excluded from the present paper because growth there does not behave lawfully.
In Table 1, the ITBS appears to be an outlier among the tests. However, examination of Figs. 5–8 shows that the Thurstone-scaled CAT C tests also differ from the IRT-scaled CAT E,F tests in a similar fashion. Whether the CAT C–CAT E,F difference is caused by the change in scaling from Thurstone to IRT or by the revision of the test is anybody's guess. Rather than try to untangle this complexity, it suffices to say that Yen's method of standardized differences eliminates the problem.
The correlation between different subtests is often relatively low; that between full scale tests is high. Full scale tests consist of many diverse items, and statistical sampling theory tells us that the correlation between such tests will be high (Wilks, 1938). Likewise, the statistical averaging in full scale tests irons out differences in subtest growth.
Incidentally, the choice of scale makes little difference in the end: the outcome is a universal growth function. It was predicted, and confirmed, that this function also governs brain growth (Lichten, 1993, 1996). Unexpectedly, the function extends beyond mental ability; it applies also to infant motor ability (Bayley, 1993).

Intelligence tests

IQ test correlations and growth follow suit. For example, in the Wechsler Intelligence Scale for Children (WISC-III) the average correlation among subtests is r = 0.44; between the verbal and performance scales, each consisting of five subtests, it is r = 0.66. The standardized growth curves for the subtests and for the verbal, performance, and full scale tests are shown in Figs. 13–15. Growth varies among subtests, whereas performance, verbal, and full scale growth are close to each other. Similarly, different full scale IQ tests usually have high correlations with each other and with other standardized tests.
The standardized growth for IQ tests is calculated from the well-known MA definition of IQ (see Appendix). The standardized growth differences of MA (plotted in Fig. 16) are inversely proportional to CA. On the log–log plot of Fig. 16, this inverse relation becomes a straight line with slope −1. Fig. 16 compares commonly used infant and children's IQ tests. The data fit such a line well between 4 months and 12 years, a factor of 36 in age and in growth rate. Table 2 lists the empirical growth coefficient b in the inverse relation for a variety of IQ tests (for mathematical details, see Appendix). The theoretical value of b (based on early 20th century tests) is 100/15 ≈ 6.7, in fair agreement with the data from late 20th century tests.
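Estimating b from a set of standardized differences amounts to fitting a straight line on log–log axes. The sketch below is an illustration, not the procedure used for Table 2: the ages are hypothetical, and the "data" are generated noise-free from the inverse relation with b = 100/15, so the fit should recover a slope of −1 and b ≈ 6.67. With real test data one would substitute observed standardized differences for `rates`.

```python
import math


def fit_log_log(ages, rates):
    """Ordinary least-squares line ln(rate) = intercept + slope * ln(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical ages from 4 months to 12 years (a factor of 36), in years
ages = [1 / 3, 1 / 2, 1, 2, 3, 4, 6, 8, 10, 12]
# Noise-free standardized growth rates from the inverse relation G.R. = b / CA
rates = [(100 / 15) / ca for ca in ages]

slope, intercept = fit_log_log(ages, rates)
b_hat = math.exp(intercept)  # fitted growth rate at CA = 1 year, i.e. b
print(f"slope = {slope:.3f}, b = {b_hat:.2f}")  # slope = -1.000, b = 6.67
```

If the slope is instead fixed at −1, the least-squares estimate of ln b reduces to the mean of ln(CA × G.R.) over the data points.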

Fig. 13. Standardized growth: WISC-III verbal.

Fig. 14. Standardized growth: WISC-III performance.



Fig. 15. WISC-III performance, verbal, and full scale standardized growth. SD = 15.

Fig. 16. Standardized differences for IQ tests and predicted line from Eq. (A.3). See also Table 2.

The growth constant b is slightly larger (ca. 10%) in earlier tests, presumably because less inclusive samples were used for standardization; a less inclusive sample gives a smaller SD and therefore a larger growth function. The Stanford–Binet appears to be the exception that proves the rule; it merits further investigation.
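The link between the standardization sample and b can be made explicit. Generalizing Eq. (A.3) to an IQ standard deviation σ_IQ other than 15 gives G.R. = 100/(σ_IQ · CA), so b = 100/σ_IQ. A minimal sketch, in which the narrower SD of 13.6 is a hypothetical value chosen only to show the size of the effect:

```python
def growth_constant(sd_iq):
    """b = 100 / SD_IQ, from G.R. = 100 / (SD_IQ * CA); Eq. (A.3) has SD_IQ = 15."""
    return 100.0 / sd_iq


b_nominal = growth_constant(15.0)
b_narrow = growth_constant(13.6)  # hypothetical, less inclusive norming sample
excess = b_narrow / b_nominal - 1
print(f"b = {b_nominal:.2f} vs {b_narrow:.2f}: {100 * excess:.0f}% larger")
```

An SD about 10% below 15 thus yields a b about 10% larger, which is the size of the difference noted for the earlier tests.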

Construction of mental ability growth functions

We construct a growth function J by summing the standardized differences from IQ data from birth onward (Fig. 16). Fig. 17 shows the growth function J, which is logarithmic. Remarkably, J rises to half of its adult value in only a little more than a year. The Appendix gives the mathematics, together with a simplified model of standardized growth that shows why such a function is independent of the test.
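The construction of J can be sketched numerically. Under the Stern relation of the Appendix (mean MA = CA, σ_MA = 0.15 CA), summing small-step standardized differences from 4 months to 12 years should reproduce the closed form (100/15) ln(CA₂/CA₁) of Eq. (A.4′). This is a model calculation, not the author's computation from test data; the one-month step and the midpoint evaluation of the SD are choices made here.

```python
import math

B = 100 / 15  # growth constant implied by an IQ SD of 15


def sigma_ma(ca):
    """Eq. (A.2): the SD of mental age is 15% of chronological age."""
    return 0.15 * ca


def cumulative_growth(ca_start, ca_end, step=1 / 12):
    """Sum standardized differences of mean MA (= CA) over small age steps."""
    total, ca = 0.0, ca_start
    while ca < ca_end - 1e-12:
        nxt = min(ca + step, ca_end)
        mid = 0.5 * (ca + nxt)               # SD evaluated at the step midpoint
        total += (nxt - ca) / sigma_ma(mid)  # mean MA grows one year per year
        ca = nxt
    return total


j_sum = cumulative_growth(1 / 3, 12)
j_log = B * math.log(12 / (1 / 3))  # Eq. (A.4'), a factor of 36 in age
print(f"summed: {j_sum:.2f} SD, closed form: {j_log:.2f} SD")
```

The summed and closed-form values differ only in the second decimal place, which is the sense in which J is simply the running total of standardized differences.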

Table 2
The growth constant b for IQ tests

Test                          Year        Ages       b(a)    SEM
Binet–Simon (Burt)            1922        3–12       7.0     0.2
Yerkes point scale            1923        5–12       6.6     1.1
                                          3½–7       5.3     0.3
Stutsman (Merrill–Palmer)     1931        1½–5¼      6.8     0.4
Stanford–Binet (Terman)       1916        4–12       7.9     0.2
  (Terman and Merrill)        1937        2.3–12     6.7     —
  (R.L. Thorndike)            1986        2.3–12     5.2     0.3
  (G.H. Roid)                 2003        2–12       4.5(b)  0.5
Wechsler
  Wechsler–Bellevue           1939        7.5–12     7.1     0.6
  WPPSI-R                     1989        3–6        6.2     0.2
  WISC-III                    1991        6–12       6.2     0.5
Bayley
  First publication           1933        0.33–3     10.5    0.4
  BSID                        1969        0.33–2.5   7.6     0.3
  BSID-R                      1994        0.33–3     6.2     0.5
CogAT (Riverside)             1992        5–12       6.3     0.4
Average: all tests            1917–1994   0.33–12    6.5     0.4
  Newer tests                 1986–1994   0.33–12    5.9     0.2
  Corrected value             1986–1994   0.33–12    6.8     0.3

(a) Uncorrected, unless so mentioned.
(b) Corrected for reliability and step size.

Unintuitive as this result may seem, it is undeniable. Moreover, it parallels physical brain development. Furthermore, Chugani (1994) found by means of positron emission tomography that every part of the brain becomes functional in the first year. Along these lines, it can be shown, inter alia, that at birth variation is almost entirely due to maturation, whereas for adults test score differences reflect ability, not maturation. These conclusions of psychological significance are among those that follow from the growth function developed here, but they do not depend on any comparison of the real size of the SD of mental ability. Parenthetically, some of the conclusions in this paper could be derived without the growth function, from standardized differences alone. However, the procedures would be much less direct and less transparent.

Fig. 17. Growth functions for IQ tests from Fig. 16 and for the human brain.
It should be clear from the discussion given earlier in this paper that this result does not necessarily mean that a 1-year-old has half the intelligence of an adult. Such a statement would presuppose that the unit of the J scale, the SD, means the same thing throughout the life cycle. This paper makes no such apples-and-oranges claim. That such a claim would be unwarranted has been pointed out by many previous investigators (Flanagan, 1951; Jensen, 1969; Schulz & Nicewander, 1997; and especially Yen, 1986, pp. 312 ff.).
Gesell's (1928) intelligence scale, a prescient conjecture, was close to the present growth function. He modeled it after the Weber–Fechner law, one of psychology's oldest principles, which is discussed in the remarks on natural laws section at the beginning of this paper. In his model the growth rate of intelligence was inversely proportional to chronological age CA; integrating it, he also obtained a logarithmic relation. However, he did not solve the problem of the divergence of the log function at both ends.
Because J is defined in SD units, the extension to non-average test scores is trivial: one simply adds the standard score z to J. For example, an average 10-year-old (IQ = 100) has a growth function J = 29.5. For a 10-year-old with IQ = 85, the growth function is J − 1 = 28.5; for an IQ of 115, it is J + 1 = 30.5.
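The example can be reproduced in a few lines. The sketch below fixes the constant in Eq. (A.4) from the value J = 29.5 quoted above for an average 10-year-old, uses b = 100/15, and adds z = (IQ − 100)/15. It also checks the earlier statement that J reaches half its adult value in a little more than a year; the "adult" age of 16 is a hypothetical choice, and the result moves only slightly for other ages in the late teens.

```python
import math

B = 100 / 15            # growth constant of Eq. (A.4); the text rounds it to 6.7
J_AVERAGE_AT_10 = 29.5  # J for an average 10-year-old, as quoted in the text

# Fix the integration constant in J = B ln(CA) + Const from the quoted value
CONST = J_AVERAGE_AT_10 - B * math.log(10)


def growth_function(ca_years, iq=100.0, sd_iq=15.0):
    """J for a child of age ca_years with deviation IQ iq: average J plus z."""
    z = (iq - 100.0) / sd_iq
    return B * math.log(ca_years) + CONST + z


def age_at(j_value):
    """Invert the average curve: the age (years) at which J reaches j_value."""
    return math.exp((j_value - CONST) / B)


for iq in (85, 100, 115):
    print(iq, round(growth_function(10, iq), 1))  # 28.5, 29.5, 30.5

ADULT_AGE = 16  # hypothetical end of mental growth, for illustration only
half_age = age_at(growth_function(ADULT_AGE) / 2)
print(f"J reaches half its age-{ADULT_AGE} value at about {half_age:.1f} years")
```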

Comparison of mental ability and other growth scales

Intelligence and achievement

Fig. 18 and Table 1 compare the standardized growth of achievement and intelligence test scores (averages of the most widely used tests). Other than aligning both functions at the beginning, there are no adjustable constants. The difference in K-12 growth is less than 10%. This is comparable to the coefficient of variation among full scale tests of the same kind (IQ or achievement), which shows that the growth of the two types of tests is indistinguishable.

Fig. 18. Standardized growth for IQ and achievement tests. See Table 1.
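Using the Table 1 entries, the comparison works out as follows. This is a sketch; the spread among achievement tests is taken from the SD row, which Table 1 places alongside the raw mean, since no separate SD is given for the corrected mean.

```python
# Entries from Table 1 (K-12 standardized growth, SD units)
achievement_raw_mean, achievement_sd = 8.6, 0.7
achievement_corrected_mean = 6.9
iq_mean, iq_sd = 7.3, 0.7

gap = (iq_mean - achievement_corrected_mean) / iq_mean
cov_achievement = achievement_sd / achievement_raw_mean
cov_iq = iq_sd / iq_mean
print(f"IQ vs. achievement: {100 * gap:.1f}%")  # 5.5%
print(f"c.o.v. achievement: {100 * cov_achievement:.1f}%, IQ: {100 * cov_iq:.1f}%")
```

The gap of about 5% lies below both coefficients of variation (about 8% and 10%), which is the sense in which the two kinds of tests are indistinguishable.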

Infant mental and motor scales

Fig. 19 compares the standardized growth curves of the Mental and Motor Scales of the Bayley Scales of Infant Development (B.S.I.D., 1993):
The Mental Scale includes items that assess memory, habituation, problem solving, early number concepts, generalization, classification, vocalizations, language, and social skills. The Motor Scales assesses control of the gross and fine muscle groups. This includes. . .rolling, crawling and creeping, sitting, standing, walking, running, and jumping. . .items. . .not concerned with functions generally perceived as 'mental' or included in intelligence scales.
(Bayley, 1993, p. 1)

This claim is supported by the average correlation between the Bayley Mental and Motor Scales, which is only 0.45 (Bayley, 1993). The mental scales correlate well with the WPPSI (Wechsler Preschool and Primary Scale of Intelligence: r = 0.73, typical for two IQ tests); the motor scales correlate less well with the WPPSI (r = 0.41).
Yet the two growth curves are remarkably alike. Here correlation and growth do not go hand in hand. It appears that growth reflects more fundamental and general factors than those revealed by correlation.
Discussion: Theme-park psychology?

Growth is independent of race, SES, and Flynn effect

Sternberg (2000) has criticized what he calls ‘‘theme-park psychology,’’ the study of human behavior under narrowly limited conditions. A psychological principle, like any natural law, should hold for all times, places, and cultures. By going to different eras, different races, and varying socio-economic status (SES), the author has gone as far as possible with currently available data to avoid theme-park psychology. Table 2 shows that the law of growth has held over the entire history of IQ tests (but see the remarks on Table 2). A comparison of mental growth between industrialized and non-industrial cultures is not available, because the raison d'être of this paper is the well-standardized test, which is a hallmark of the developed world.

Fig. 19. Bayley Mental and Motor standardized growth functions.
Within this sector, a pronounced source of diversity is race. The black–white mental test gap, about 1 SD, has been known for nearly a century (Jencks & Phillips, 1998). Fig. 20 plots achievement as standardized growth (in SD units) for two races, based on the CAT (CTB, 1987). African-Americans have the same standard score at all school ages. When gains in achievement between two ages are compared, the black–white difference cancels, and African-Americans neither lose nor gain relative to the general population. The actual K.8–12.8 gains are 5.82 SD for the entire population and 5.91 SD for African-Americans, a difference of only 1.5%. Similar results hold for other SES indices, such as income, parental education, ethnicity, suburban–urban–rural, south–north, etc., which often have consistent standard scores.
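Why a constant standard score drops out of a gain can be seen directly: if a group's standardized position is J + z and z is the same at both ages, the group's gain equals the population's. In the sketch below the J values 20.0 and 25.82 are hypothetical placeholders; only the two reported gains, 5.82 and 5.91 SD, come from the text.

```python
def group_gain(j_start, j_end, z_start, z_end):
    """Gain of a group whose standardized position is population J plus z."""
    return (j_end + z_end) - (j_start + z_start)


# Hypothetical population J values at two ages; the group sits 1 SD below at both
population = group_gain(20.0, 25.82, 0.0, 0.0)
group = group_gain(20.0, 25.82, -1.0, -1.0)
print(f"population {population:.2f} SD, group {group:.2f} SD")  # both 5.82

# Reported K.8-12.8 CAT gains (SD units)
gain_all, gain_aa = 5.82, 5.91
rel_diff = (gain_aa - gain_all) / gain_all
print(f"relative difference: {100 * rel_diff:.1f}%")  # 1.5%
```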
There are no racial differences in the mental scale at birth (Bayley, 1965). Differences in test scores at school age must then reflect corresponding deviations in growth rate at some age. However, the differences are relatively small on the growth scale (expression (A.4′)). A 1 SD difference at adolescence represents only a 3% difference in standardized growth over the entire period from birth onward. (The exact meaning of this number is, of course, subject to the apples-and-oranges caveat expressed throughout this paper.) This difference is established approximately in the second year of life, when the infant is learning to talk (Bayley, 1954, Fig. 1; Garber, 1988, Fig. 4-2). This may be connected with the observation that permanent IQ differences are largely formed in the home and depend on the vocabulary and other characteristics of the infant's parent(s) or caregiver (Farkas, 2001; Farkas & Beron, in press; Hart & Risley, 1995).

Fig. 20. Black and White standardized growth on the CAT achievement test is the same (see Fig. 22).
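The 3% figure can be checked with the same calibration used for the J example (J = 29.5 for an average 10-year-old, b = 100/15): a 1 SD difference is compared with the average J at adolescence. Which age counts as "adolescence" is an assumption here, so the sketch tries several.

```python
import math

B = 100 / 15
CONST = 29.5 - B * math.log(10)  # so that J(10) = 29.5, the value quoted in the text


def j_average(ca_years):
    """Average standardized growth J at age ca_years, Eq. (A.4)."""
    return B * math.log(ca_years) + CONST


shares = {age: 1.0 / j_average(age) for age in (14, 16, 18)}  # ages are assumptions
for age, share in shares.items():
    print(f"age {age}: 1 SD = {100 * share:.1f}% of J")
```

Every choice of age in the mid-to-late teens gives roughly 3%.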
Like racial and class differences, the Flynn effect, the worldwide secular rise in intelligence test scores over time, is approximately independent of age and thus leaves the standardized growth of mental ability unaffected (Flynn, 1984, 1987; Neisser, 1998).
The present ‘‘law of intelligence’’ is too universal to accuse it of being theme-park
psychology. Nevertheless the tests used were standardized largely on US popula-
tions. It would be desirable to see if the law holds in other countries, especially in
underdeveloped nations.
How does the mind grow? It grows like the nervous system; it grows with the nervous system.
(Gesell & Ilg, 1943, p. 9)

Physical and mental growth

Physical growth is measured in objective units, such as meters and kilograms. Mental growth is defined here in SD units. Fig. 21 compares typical growth curves for different human body systems:
Lymphoid type: thymus, lymph nodes, and intestinal lymph masses.
Brain and head type: brain and head size, standardized mental ability (redrawn by author).
General type: body as a whole, external dimensions (except head), respiratory and digestive organs, kidneys, aortic and pulmonary trunks, musculature, and blood volume.
Reproductive type: testis, ovary, epididymis, prostate, seminal vesicles, and fallopian tubes.

Fig. 21. Growth curves for several body organ systems (Scammon, 1930).
Fig. 17 compares the mental growth function with that of the human brain. The similarity between the mental and neural growth curves is remarkable and, among the body systems of Fig. 21, unique.

Objections and trivializations of the growth function

Growth of the mind and of big toes

Skeptics mock Gesell and Ilg along these lines: ‘‘The big toe grows with the mind.
Therefore the mind resides inside of our big toes.’’

Reply

This joke is a simplification that treats all growth as the same. On the contrary, each body system has a growth spurt at its own age (see Fig. 21). The reproductive system shows a growth spurt at puberty, when mental growth is almost frozen; conversely, the mental-motor-brain growth spurt occurs perinatally, when reproductive organ growth is nearly nil. That reproductive and mental ability growth spurts occur at different times in the life cycle jibes with the smallness of male–female IQ differences. Epstein (1979) claimed the existence of non-perinatal growth spurts, but these have not been confirmed.
Skeletal growth shows a very different pattern from the brain and mental ability
(see Fig. 21). Hence growth patterns confirm Plato in that the mind resides in our
heads, not in our big toes.
Some have even suggested that ability and achievement measures are so much alike as to be
virtually the same thing.
(Gridley & Roid, 1998, p. 257)

Reply

The tests considered in this paper are not necessarily ‘‘the same thing.’’ IQ and achievement tests have common features, such as vocabulary, but also have differences, such as spelling and maze performance. Tests or subtests can have substantial correlations with each other (typically ca. 0.7), grow in the same way, and yet be quite different (Campbell & Fiske, 1959). Examples are the Wechsler Verbal and Performance scales (r = 0.66; see Fig. 15). The Bayley Mental and Motor Scales correlate only modestly (average value 0.45) but grow together in lock step. Their items (fine hand movements vs. verbal proficiencies) are different, yet they share a common law of growth.

Does J apply to education? An achievement gap

Critics of American education have claimed that racial minorities and the poor are short-changed by the public schools. Minority children start school only a few months behind majority children, fall further behind, and finally graduate from high school at only the eighth or ninth grade level on standardized achievement tests (Coleman et al., 1966). On the other hand, Jensen (1973, pp. 97–102) noted that the educational achievement gap, expressed in SD units, remained essentially the same throughout the school years (see Fig. 20). Thus, according to this statistic, schools are educating minority pupils as well as the majority.
But on IRT scaled tests like the CTBS or CAT, the decrease of the SD with grade
shows minority students catching up! Could this mean that their education is better
than that of the majority? Hardly (see Fig. 22).
The inconsistency of these conclusions results from the difficulty of comparing an SD at one age with an SD at another age, as emphasized earlier in this paper (for example, see the paragraph on Extension of the method of standardized growth). One cannot accept a conclusion that depends on which test scale one uses. On the contrary, the sameness of standardized growth on full scale tests, despite differences among social, ethnic, and racial groups and eras, merely shows the underlying lawfulness of mental growth; it says little about the schools. One might view skeptically the often-stated but seldom-achieved goal of eliminating group test score differences (Jencks & Phillips, 1998).

Evaluation

Based on Yen's powerful method of standardized growth, this paper presents a new psychometric yardstick of mental development. We now address several questions to evaluate this measure. Is the new approach really needed? If so, how does it meet the needs? Does it have advantages over the highly developed techniques presently used in psychometrics?

Fig. 22. Based on IRT scales of the CAT E/F, minority achievement catches up (see Fig. 20).

Is it really needed?

This paper has emphasized that growth is a pre-eminent aspect of mental ability,
whether it be intelligence or achievement. The current deviation scale of intelligence
gives no direct measures of growth and must rely on correlational data obtained
from difficult longitudinal studies. The situation is better in achievement testing,
where some methods, such as Rasch and IRT, agree fairly well with each other
and with the standardized growth method, as Table 1 shows.
It appears that the reason for the agreement is that IRT scales are also expressed in SD units. Lord (1975, 1980) pointed out that the IRT method does not lead to a unique ability scale. He showed that any monotonic transformation of the IRT ability scale can be used in principle. The Appendix proves that the standardized growth functions based on any such transformation of the IRT scale are mathematically identical. Thus the J scale is unique.
However, achievement tests cover only the school ages, which make up only the last 20% of brain and standardized mental growth. The crucial early years, which contain the major developments of language, social, and motor functioning, are left out. Furthermore, psychometric measures of growth have been disconnected from one another: prior to this paper, there were few direct comparisons of the growth of intelligence with that of achievement.
Although current item analysis techniques function well in constructing mental ability tests (especially computerized ones), they leave us with little insight. The IRT method of grading a mental ability test is so complex that it requires a computer program that only a specialized coterie of psychometricians can fathom. Furthermore, a typical achievement battery is not just a single test. Because of the rapid growth among young schoolchildren, psychometricians must administer a series of tests, each designed to fit a particular grade level. Linking the scales of these smaller tests into a grand K-12 scale involves vertical scaling procedures (Kolen & Brennan, 1995), which do nothing to remove the mystery of test construction. This disconnect between measurement expertise, on the one hand, and developmental psychology and educational practice, on the other, is unhealthy.

A comprehensive mental growth measure: Birth to adulthood. How it meets the needs

Yen's method of standardized growth translates among mental ability tests and puts them all on a common footing. It is used here to put growth back into IQ and to link IQ with achievement. It also connects infant, child, and adult tests.

Advantages over present techniques

A central point of the method of standardized growth is its simplicity. The author's calculations were often done on a pocket calculator and, at times, with pencil and paper, since only elementary arithmetic was needed. To see how simple the calculations are, read Yen's paper (1986) and follow her calculations, or do the same for the example in the Appendix. Item analysis and test equating may remain useful for test construction, especially for computerized exams. However, standardized growth functions are simpler and more versatile.
The controversy over which tests and scales are best, which should have been ended by Yen's paper (1986), becomes moot. For example, the mental age scale, over its range of applicability (0–12 years) and when its growth is standardized, is as good a measure as any and has been shown here to possess some simplicities and advantages. It is the only function that covers growth from birth to the teens (ca. 90% of the total) in an unambiguous numerical formula.
This paper agrees with the social psychologist Allport (1942), who concluded that neither a purely nomothetic nor a purely idiographic approach to psychology can suffice. Human mental ability, its magnitude, its variation among individuals, and its growth with age are inextricably connected.

Further research. Probing the limits of validity of the law of growth

The present paper has extended the law of growth (Stern relation: Fig. 1; Eq. (A.1); and $\sigma_{\mathrm{IQ}} = 15$) downward in age, from its former lower limit of 2 years (Stanford–Binet test) to 4 months (B.S.I.D.). As pointed out by the author (Lichten, 2002), corrections for gestation may account for at least part of the deviations from the Stern relation below a CA of 4 months. The corrected Stern relation may hold at even earlier ages.

Summary

This paper has found a new measure of growth which applies quantitatively, consistently, and lawfully to intelligence, achievement, infant motor ability, and brain development. When measured by total standardized growth, differences on full scale tests among social groups, epochs, and schools are small.
The long-sought law of intelligence, given here in a test-independent form, applies to full scale measurements of mental ability, both for IQ and achievement, at all ages up to adolescence. It is expressed in the Appendix in three mathematically equivalent ways.

Acknowledgments

Visitor 1998–1999 at the Educational Testing Service, Fellow 1994–2003 of the Yale Institution for Social and Policy Studies. The author thanks W. Gilliam, T. Goldsmith, R. E. Keen, J. Kihlstrom, C. Levinson, L. Mazes, U. Neisser, W. R. Overton, R. Sternberg, H. Wainer, R. Wyman, K. Wynn, and E. Zigler for helpful suggestions and encouragement. Preliminary versions of this paper were given by the author (Lichten, 1993, 1996) and at meetings held by the Eastern Psychological Association, Wash. DC, 1997 and Boston, MA, 1998; New England Psychological Association, New London, CT, 1996; American Psychological Association, Wash., DC, 1998 and New Orleans, 2002.

Appendix

This appendix collects mathematical equations and also presents a simplified, non-mathematical model of standardized growth for readers who prefer to skip equations.

IQ, MA, and the mathematical form of the law of intelligence


IQ and growth were related before Wechsler (1939) by the well-known mental age definition of IQ (Stern, 1914):

$$\mathrm{IQ} = 100\,\frac{\mathrm{MA}}{\mathrm{CA}}, \tag{A.1}$$

where CA is chronological age and MA is mental age.

Example

A 10-year-old with the mental ability of an 11.5-year-old has

$$\mathrm{IQ} = 100 \times \frac{11.5}{10} = 115.$$
The standard deviation of the IQ distribution was approximately 15 at each age. This fact was the basis of Wechsler's deviation definition of IQ, which dropped the mental age Eq. (A.1) and defined the mean IQ to be 100 and the standard deviation to be 15 points. It is also the basis of the law of intelligence as presented here.
To compare growth and variation of mental ability, we return to our 10-year-old with MA = 11.5 and IQ = 115, one SD above average, a deviation within the normal range of ability. This deviation is equivalent to 11.5 − 10 = 1.5 years, only 15% of the child's total age. Thus global features of growth overshadow local variations (at a given age) in mental ability.
Strictly speaking, this argument violates the earlier statement in this paper that one cannot equate mental growth scores at different chronological ages. However, the dominance of growth over variation is huge on any scale. For example, a 10-year-old who scored at the level of an average 4-year-old on the Stanford–Binet4 would have an IQ of 40 on the MA scale and about zero on either the deviation (standard score ca. −6) or the standardized growth (see expression (A.3)) scale. Such an IQ is far outside the normal range of variation in test scores.
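The arithmetic of this paragraph and of the example before it can be reproduced with Eq. (A.1) and Eq. (A.4′). Applying Eq. (A.4′) to the child who scores at the 4-year-old level is an illustration added here of where the "about 6 SD" comes from; the function names are hypothetical.

```python
import math


def ratio_iq(ma, ca):
    """Stern's mental-age IQ, Eq. (A.1)."""
    return 100.0 * ma / ca


def standardized_growth_between(ca1, ca2, sd_iq=15.0):
    """Eq. (A.4'): standardized growth (SD units) between ages ca1 and ca2."""
    return (100.0 / sd_iq) * math.log(ca2 / ca1)


print(ratio_iq(11.5, 10))  # 115.0  (one SD above average)
print(ratio_iq(4, 10))     # 40.0   (10-year-old at the 4-year-old level)
gap = standardized_growth_between(4, 10)
print(f"{gap:.1f} SD")     # 6.1 SD, i.e. an IQ near 100 - 15 * 6, close to zero
```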

The law of intelligence: Derivation and three mathematically equivalent formulae

From expression (A.1), and since $\sigma_{\mathrm{IQ}} = 15$, we have the first statement of the law of intelligence:

$$\sigma_{\mathrm{MA}} = 0.15\,\mathrm{CA}. \tag{A.2}$$


We calculate the standardized growth of the MA scale as follows. To find the standardized differences, take the quotient of the annual increase of MA and its SD from expression (A.2). Since $\mathrm{MA}_{\mathrm{av}} = \mathrm{CA}$, its annual growth is 1 year, i.e., $\mathrm{MA}_{\mathrm{av}}(\mathrm{CA}+1) - \mathrm{MA}_{\mathrm{av}}(\mathrm{CA}) = 1$ year.
Combining these relations gives a second expression: the standardized growth rate of mental ability (the quotient of annual growth and standard deviation) is

$$\mathrm{G.R.} = \frac{100}{15\,\mathrm{CA}}. \tag{A.3}$$
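Integrating this expression gives the third relation, for the standardized growth $J$:

$$J = \int \frac{100}{15\,\mathrm{CA}}\,d(\mathrm{CA}) = \frac{100}{15}\,\ln(\mathrm{CA}) + \mathrm{Const} \approx 6.7\,\ln(\mathrm{CA}) + \mathrm{Const}. \tag{A.4}$$

An alternative form gives the standardized growth of mental ability between two ages $\mathrm{CA}_1$ and $\mathrm{CA}_2$:

$$J_2 - J_1 = \frac{100}{15}\,\ln\frac{\mathrm{CA}_2}{\mathrm{CA}_1}. \tag{A.4$'$}$$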

A simplified model of standardized growth


Three imaginary achievement tests measure the growth of a mental ability, which
steadily increases with grade level. We further imagine the standard deviation of this
trait to be one grade level at all ages. Three tests show different patterns of growth
for average raw test score as shown in Fig. 23.
The raw score at a particular grade is the total number of dots from the left-hand
point (beginning of grade 0, i.e., kindergarten) to the grade in question.
In test ‘‘S’’ the items are evenly spaced. Test ‘‘C’’ has items concentrated at lower
grade levels; test ‘‘I’’ items crowd together at higher grade levels.
In test ‘‘S,’’ since there are simply five dots per grade, the average raw score (total
number of dots up to the grade in question in Fig. 23 and Table 3) increases evenly.
The raw score, shown in Fig. 24, falls on a simple straight line with slope equal to five. The raw score SD is the number of dots (simply five in this case) in an interval of one standard deviation = one grade. Since the SD is constant, its graph is a horizontal straight line in Fig. 24.

Fig. 23. A model of standardized growth. Each dot is for a single item; total number: 60.
In test ‘‘C’’ the items are unevenly spaced, with the items crowded together at the beginning grades (see Fig. 23). Correspondingly, the annual steps in the mean raw score become smaller with grade level (see Table 4).
The ‘‘C’’ raw score graph has negative curvature (is concave downward) and its SD graph slopes downward in Fig. 25. Mean and SD are again obtained by a simple count of dots in Fig. 23. Since the items (dots) are crowded to the left, both the mean and SD plots are curved. Both the slope of the curve for the means and the height of the SD decrease toward the right.
In test ‘‘I,’’ the item spacing becomes small at higher grades (see Fig. 23 and Table
5). The test score has positive curvature (is concave upward) and the SD slopes up-
ward, as shown in Fig. 26.

Table 3
A model of growth. Test ‘‘S’’: Raw score has uniform growth rate

Grade  Raw score  Yearly growth  Standard deviation  Standardized growth rate  Standardized growth
0      0                         5                                             0
1      5          5              5                   1                         1
2      10         5              5                   1                         2
3      15         5              5                   1                         3
4      20         5              5                   1                         4
5      25         5              5                   1                         5
6      30         5              5                   1                         6
7      35         5              5                   1                         7
8      40         5              5                   1                         8
9      45         5              5                   1                         9
10     50         5              5                   1                         10
11     55         5              5                   1                         11
12     60         5              5                   1                         12

Fig. 24. Average raw score and standard deviation for test ‘‘S’’ (count dots in Fig. 23).

Table 4
Test ‘‘C’’: Raw score growth slows down with grade

Grade  Raw score  Yearly growth  Standard deviation  Standardized growth rate  Standardized growth
K      0.0                       7.4                                           0
1      7.2        7.2            7.0                 1                         1
2      14.0       6.8            6.6                 1                         2
3      20.3       6.4            6.2                 1                         3
4      26.3       6.0            5.8                 1                         4
5      31.9       5.6            5.4                 1                         5
6      37.1       5.2            5.0                 1                         6
7      41.9       4.8            4.6                 1                         7
8      46.3       4.4            4.2                 1                         8
9      50.4       4.0            3.8                 1                         9
10     54.0       3.6            3.4                 1                         10
11     57.2       3.2            3.0                 1                         11
12     60.0       2.8            2.6                 1                         12

Fig. 25. Mean and SD for test ‘‘C.’’

Tables 3–5 and Fig. 27 show the computation and plots of standardized growth for the three types of tests. In standardized units, the growth and the SD are the same for all three tests; the process of standardization irons out the apparent differences among the three types of tests to reveal the true (test-independent) growth of the trait.
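The standardized growth columns of Tables 3–5 can be recomputed from the yearly growth and SD columns. One detail is an assumption on our part: the tables list the SD at each grade, and dividing each yearly gain by the average of the SDs at the two ends of its interval reproduces the 1s in the standardized growth rate column exactly (for test ‘‘C’’, for example, 7.2 = (7.4 + 7.0)/2). The columns are generated by formula rather than copied cell by cell.

```python
def standardized_growth(yearly_growth, sd_by_grade):
    """Cumulative standardized growth from yearly raw-score gains.

    Assumes the SD for each one-year interval is the average of the SDs at
    its two end grades (a reading of Tables 3-5, not stated in the text).
    """
    total, out = 0.0, [0.0]
    for i, gain in enumerate(yearly_growth):
        sd_interval = 0.5 * (sd_by_grade[i] + sd_by_grade[i + 1])
        total += gain / sd_interval
        out.append(total)
    return out


# Yearly growth (grades 1-12) and SD (grades 0/K-12) from Tables 3-5
tests = {
    "S": ([5.0] * 12, [5.0] * 13),
    "C": ([7.2 - 0.4 * k for k in range(12)], [7.4 - 0.4 * k for k in range(13)]),
    "I": ([2.8 + 0.4 * k for k in range(12)], [2.6 + 0.4 * k for k in range(13)]),
}

for name, (growth, sd) in tests.items():
    print(name, [round(j, 2) for j in standardized_growth(growth, sd)])
```

All three tests print the same sequence, 0 through 12, as in the last column of each table.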

Proof of the equivalence of classical and IRT growth functions


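We connect true score $T$ of classical theory and ability $\theta(\mathrm{CA})$ of item response theory, which are functions of two variables, CA and $z$:

$$\begin{aligned} T(\mathrm{CA}, z) &= T_{\mathrm{av}}(\mathrm{CA}) + \sigma_T(\mathrm{CA})\,z,\\ \theta(\mathrm{CA}, z) &= \theta_{\mathrm{av}}(\mathrm{CA}) + \sigma_\theta(\mathrm{CA})\,z. \end{aligned} \tag{A.5}$$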

Table 5
Test ‘‘I’’: Growth speeds up with grade

Grade  Raw score  Yearly growth  Standard deviation  Standardized growth rate  Standardized growth
0      0.0                       2.6                                           0
1      2.8        2.8            3.0                 1                         1
2      6.1        3.2            3.4                 1                         2
3      9.7        3.6            3.8                 1                         3
4      13.7       4.0            4.2                 1                         4
5      18.1       4.4            4.6                 1                         5
6      22.9       4.8            5.0                 1                         6
7      28.1       5.2            5.4                 1                         7
8      33.7       5.6            5.8                 1                         8
9      39.7       6.0            6.2                 1                         9
10     46.1       6.4            6.6                 1                         10
11     52.9       6.8            7.0                 1                         11
12     60.0       7.2            7.4                 1                         12

Fig. 26. Mean and SD for test “I.”

Fig. 27. Standardized growth for all three tests is the same.

We consider the number-right score for a given test as a function of ability, the test
characteristic function f(θ). That we can write f as a function of a single variable θ
is a consequence of the usual IRT assumption that the test items are unidimensional.
The dispute as to whether mental tests are unidimensional is nearly a century old,
goes back to Spearman and Thurstone, and is still current. But, as discussed in this
paper, Wilks’ theorem implies that correlations between different tests approach
unity as test lengths become infinite. Thus full-scale, long tests may appear to be
unidimensional even if the subtest contents are multidimensional. Yen (1985) found
evidence of multidimensionality in an achievement test, especially in the mathematics
subtests. However, this occurred in the high school grades, an age range excluded from
the present paper because mental growth there does not behave lawfully.
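Wilks’ (1938) result can be illustrated numerically. The sketch below assumes n items with unit variances and a common intercorrelation ρ = 0.3 (a hypothetical value) and computes the correlation between two differently weighted composites of the same items; the function name `composite_correlation` is illustrative only.

```python
import math


def composite_correlation(w, v, rho):
    """Correlation of the weighted sums w.X and v.X when the items X have unit
    variance and a common intercorrelation rho (an assumed structure)."""
    def cov(p, q):
        # p' S q with S = (1 - rho) I + rho 11'
        return (1 - rho) * sum(a * b for a, b in zip(p, q)) + rho * sum(p) * sum(q)
    return cov(w, v) / math.sqrt(cov(w, w) * cov(v, v))


RHO = 0.3  # hypothetical average inter-item correlation
for n in (2, 10, 100, 1000):
    equal = [1.0] * n                            # unit weights
    unequal = [1.0 + (j % 2) for j in range(n)]  # alternating weights 1, 2
    print(n, round(composite_correlation(equal, unequal, RHO), 4))
```

The correlation rises toward 1 as n grows, although the two weightings never coincide, which is consistent with the argument that long, full-scale tests can appear unidimensional.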
We return to the argument and consider the standard growth rate of the test
characteristic function:

$$\mathrm{G.R.}(f) = \frac{1}{\sigma_f}\,\frac{df_{av}}{d(CA)}. \tag{A.6}$$
We take differentials,

$$df = \frac{df}{d\theta}\, d\theta, \tag{A.7}$$

which imply, to first order,

$$\sigma_f = \frac{df}{d\theta}\, \sigma_\theta. \tag{A.8}$$
We also have, by the chain rule, the relation

$$\frac{df_{av}}{d(CA)} = \frac{df}{d\theta}\,\frac{d\theta_{av}}{d(CA)}. \tag{A.9}$$
Dividing (A.9) by (A.8), we obtain the standard growth rate for f:

$$\mathrm{G.R.}(f) = \frac{1}{\sigma_f}\,\frac{df_{av}}{d(CA)} = \frac{1}{\sigma_\theta}\,\frac{d\theta_{av}}{d(CA)} = \mathrm{G.R.}(\theta). \tag{A.10}$$
Denoting by J and H the corresponding total growths of f and θ between two ages CA₀ and CA,
we have

$$J(CA) - J(CA_0) = \int_{CA_0}^{CA} \frac{1}{\sigma_f}\,\frac{df_{av}}{d(CA)}\, d(CA) = \int_{CA_0}^{CA} \frac{1}{\sigma_\theta}\,\frac{d\theta_{av}}{d(CA)}\, d(CA) = H(CA) - H(CA_0). \tag{A.11}$$


The growth function found from the number-right score for any test formed from
a unidimensional set of test items will be the same as the growth function of the
ability variable itself.
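The argument can be checked numerically. The sketch below assumes a made-up growth model for θ and a made-up monotone score scale f (neither comes from this paper), computes G.R.(f) from (A.6) with σ_f taken from (A.8), and compares it, and the integrated growth of (A.11), with the corresponding quantities for θ. All function names are hypothetical.

```python
import math


# Hypothetical growth model for ability theta (illustrative, not from the paper).
def theta_av(ca):
    return 3.0 * math.log1p(ca)      # mean ability; growth slows with age


def sigma_theta(ca):
    return 0.5 + 0.05 * ca           # SD of ability; widens slowly with age


# Hypothetical monotone score scale f(theta), e.g. a 60-item number-right score.
def f(theta):
    return 60.0 / (1.0 + math.exp(-0.8 * (theta - 4.0)))


def derivative(fn, x, h=1e-5):
    """Central-difference numerical derivative."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


def growth_rate_theta(ca):
    """G.R.(theta): right-hand side of (A.10)."""
    return derivative(theta_av, ca) / sigma_theta(ca)


def growth_rate_f(ca):
    """G.R.(f) from (A.6), with sigma_f given by the first-order relation (A.8)."""
    def f_av(age):
        return f(theta_av(age))      # f_av = f(theta_av), as used in (A.9)
    sigma_f = derivative(f, theta_av(ca)) * sigma_theta(ca)
    return derivative(f_av, ca) / sigma_f


def total_growth(rate, ca0, ca1, n=200):
    """Trapezoid-rule approximation of the integrals in (A.11)."""
    h = (ca1 - ca0) / n
    inner = sum(rate(ca0 + k * h) for k in range(1, n))
    return h * (0.5 * (rate(ca0) + rate(ca1)) + inner)


J = total_growth(growth_rate_f, 0.0, 12.0)
H = total_growth(growth_rate_theta, 0.0, 12.0)
print(round(J, 4), round(H, 4))
```

The two growth rates agree to within numerical-differentiation error at every age, so J and H agree to the printed precision. Because σ_f comes from the linearization in (A.8), the check confirms the algebra of (A.10); it does not test how well that first-order approximation holds for a strongly nonlinear f.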
A corollary is that any monotonically increasing transformation x(θ) of the ability
variable has the same growth function as θ itself: we merely rename the transform f and
follow the proof immediately above. Thus the growth function obtained from Yen’s
standardized differences, whether formed from the classical test score or from the IRT
ability variable, is an invariant measure of mental ability. This gives a theoretical
basis for the agreement between standardized growth and item analysis shown in
Table 1.

References

Allport, G. W. (1942). The use of personal documents in psychological science. New York: Social Science Research Council, Bulletin 49.
Anderson, J. E. (1939). The limitations of infant and preschool tests in the measurement of intelligence. Journal of Psychology, 8, 351–379.
Atkins v. Virginia (2002). US Supreme Court decision 00-8452.
Bayley, N. (1954). Some increasing parent–child similarities during the growth of children. Journal of Educational Psychology, 45, 1–21.
Bayley, N. (1965). Comparison of mental and motor test scores for ages 1–15 months by sex, birth order, race, geographical location and education of parents. Child Development, 36, 379–411.
Bayley, N. (1993). Bayley scales of infant development. Manual (2nd ed.). San Antonio, TX: Psychological Corporation.
Binet, A., & Simon, T. (1916). The development of intelligence in children (The Binet–Simon scale) (E. Kite, Trans.). Baltimore, MD: Williams and Wilkins (reprinted by Ayer, Salem, NH).
Bloom, B. S. (1964). Stability and change in human characteristics. New York: Wiley.
Bock, R. D. (1983). The mental growth curve reexamined. In D. Weiss (Ed.), New horizons in testing. Latent trait test theory and computerized adaptive testing (pp. 205–218). New York: Academic Press.
Boring, E. G. (1923). Intelligence as the tests test it. The New Republic (June 6), 35–37.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Chugani, H. (1994). Development of regional brain glucose metabolism in relation to behavior and plasticity. In G. Dawson & K. W. Fischer (Eds.), Human behavior and the developing brain. New York: Guilford Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Coleman, J. S., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, F., & York, R. (1966). Equality of educational opportunity. Washington, DC: US Department of Health, Education and Welfare (OE-38001, US Government Printing Office, Catalog number FS 5.328.38001).
CTB/McGraw-Hill (1987). California achievement tests. Forms E and F. Levels 10–20. Technical report. Table 49. Monterey, CA: CTB/McGraw-Hill.
Engen, T. (1971). Psychophysics. In J. W. Kling & L. A. Riggs (Eds.), Woodworth & Schlosberg’s experimental psychology (pp. 11–86). New York: Holt, Rinehart & Winston.
Epstein, H. (1979). Growth spurts during brain development: Implications for educational policy. In J. S. Chall & A. F. Mirsky (Eds.), 1978 Yearbook of the national society for the study of education (pp. 343–370). Chicago: University of Chicago Press.
Farkas, G. (2001). Family linguistic culture and social reproduction: Verbal skill from parent to child in the preschool and school years. Paper presented at the session on Consequences of child poverty and deprivation, Annual Meeting of the Population Association of America, Washington, DC, March 31. Available http://www.pop.psu.edu/~farkas/paa301.pdf.
Farkas, G., & Beron, K. (in press). The detailed age trajectory of oral vocabulary knowledge: Differences by class and race. Social Science Research.
Flanagan, J. C. (1951). Units, scores and norms. In E. F. Lindquist (Ed.), Educational measurement (pp. 695–763). Washington: American Council on Education.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932–1978. Psychological Bulletin, 95, 29–50.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
Furfey, P. H., & Muehlenbein, J. (1932). The validity of infant intelligence tests. Journal of Genetic Psychology, 40, 219–223.
Garber, H. (1988). The Milwaukee project. Preventing mental retardation in children at risk. Washington: American Association on Mental Retardation.
Gardner, H. (1983). Frames of mind. The theory of multiple intelligences. New York: Basic Books.
Gesell, A. (1928). Infancy and human growth. New York: Macmillan.
Gesell, A., & Ilg, F. L. (1943). Infant and child in the culture of today. New York: Harper.
Greenhouse, L. (2002). The Supreme Court: The death penalty; citing ‘national consensus,’ justices bar death penalty for retarded defendants. New York Times, June 21, A1.
Gridley, B. E., & Roid, G. H. (1998). The use of the WISC-III with achievement tests. In A. Prifitera & D. Saklofske (Eds.), WISC-III (pp. 249–288). San Diego, CA: Academic Press.
Gulliksen, H. (1987). Theory of mental tests. Hillsdale, NJ: Erlbaum (reprint of 1950 book).
Hart, B., & Risley, T. (1995). Meaningful differences in the everyday experience of young American children. Baltimore: Paul Brookes.
Heinis, H. (1924). La loi du développement mental. Archives de Psychologie, 19, 97–127.
Jencks, C., & Phillips, M. (Eds.). (1998). The black–white test score gap. Washington: Brookings.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. R. (1973). Educability and group differences. New York: Harper & Row.
Jensen, A. R. (1993). Psychometric g and achievement. In B. R. Gifford (Ed.), Policy perspectives on educational testing (pp. 117–227). Boston: Kluwer.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Keats, J. A. (1982). Ability measures and theories of cognitive development. In H. A. Wainer & S. Messick (Eds.), Principals of modern psychological measurement: A festschrift for Frederic M. Lord. Hillsdale, NJ: Erlbaum.
Kolen, M. J., & Brennan, R. L. (1995). Test equating. Methods and practice. New York: Springer.
Layzer, D. (1972). Science or superstition. A physical scientist looks at the IQ controversy. Cognition: International Journal of Cognitive Science, 1, 265–299.
Lichten, W. (1993). The big bang model of the growth of intelligence. Proceedings and abstracts of the annual meeting of the Eastern Psychological Association (p. 66). Arlington, VA: Eastern Psychological Association (unpublished).
Lichten, W. (1996). The big bang model of the growth of intelligence. Confirmation. Proceedings and abstracts of the annual meeting of the Eastern Psychological Association (p. 86). Washington, DC: Eastern Psychological Association (unpublished).
Lichten, W. (2002). Are all men created equal? Unpublished paper delivered at the New Orleans meeting of the American Psychological Society.
Lichten, W., & Wainer, H. (2004). IQ: A matter of life and death. Paper given at the American Psychological Society, 16th Annual Convention, Chicago, IL, May 25–30.
Licklider, J. C. R. (1951). Basic correlates of the auditory stimulus. In S. S. Stevens (Ed.), Handbook of experimental psychology (1st ed., pp. 985–1039). New York: Wiley.
Luce, R. D., Bush, R. R., & Galanter, E. (1963). Psychological scaling. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 245–307). New York: Wiley.
Luce, R. D., & Krumhansl, C. L. (1988). Measurement, scaling and psychophysics. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology (2nd ed., pp. 3–74). New York: Wiley.
Luce, R. D., & Suppes, P. (2002). Representational measurement theory. In H. Pashler & J. Wixted (Eds.), Stevens’ handbook of experimental psychology (3rd ed., Vol. 4, pp. 1–41). New York: Wiley.
McCall, R. B., Eichorn, D. H., & Hogarty, P. S. (1977). Transitions in early development. Monographs of the Society for Research in Child Development (Ser. No. 171, 42, No. 3).
Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures. Washington: American Psychological Association.
Piaget, J., & Inhelder, B. (1969). The psychology of the child. New York: Basic.
Salovey, P., & Mayer, J. D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9, 185–211.
Scammon, R. E. (1930). The measurement of the body in childhood. In J. A. Harris, C. M. Jackson, D. G. Paterson, & R. E. Scammon (Eds.), The measurement of man (pp. 173–215). Minneapolis: University of Minnesota.
Schulz, E. M., & Nicewander, W. A. (1997). Grade equivalent and IRT representations of growth. Journal of Educational Measurement, 34, 315–331.
Spearman, C. (1927). The abilities of man: Their nature and measurement. New York: Macmillan.
Stern, W. (1914). The psychological methods of testing intelligence (G. Whipple, Trans.). Baltimore: Warwick & York.
Sternberg, R. (1997). The concept of intelligence and its role in lifelong learning and success. American Psychologist, 52, 1030–1037.
Sternberg, R. (2000). Theme-park psychology: A case study regarding human intelligence and its implications for education. Educational Psychology Review, 12, 247–268.
Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (1st ed.). New York: Wiley.
Stevens, S. S. (1975). In G. Stevens (Ed.), Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: Wiley.
Stevens, S. S., & Davis, H. (1938). Hearing. Its psychology and physiology. New York: Wiley.
Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1, pp. 1–76). New York: Wiley.
Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.
Terman, L. M., & Merrill, M. A. (1937). Measuring intelligence: A guide to the administration of the new revised Stanford–Binet tests of intelligence. Boston: Houghton Mifflin.
Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodward, E. (1927). The measurement of intelligence. New York: Bureau of Publications, Teachers College, Columbia University (reprint edition 1973, Arno Press).
Thurlow, W. R. (1971). Audition. In J. W. Kling & L. A. Riggs (Eds.), Woodworth & Schlosberg’s experimental psychology (pp. 223–272). New York: Holt, Rinehart & Winston.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16, 433–451.
Thurstone, L. L. (1928). The absolute zero in intelligence measurement. Psychological Review, 35, 175–197.
Thurstone, L. L., & Ackerson, L. (1929). The mental growth curve for the Binet tests. Journal of Educational Psychology, 20, 569–583.
Torrance, E. P. (1988). The nature of creativity as manifest in its testing. In R. J. Sternberg (Ed.), The nature of creativity. Contemporary psychological perspectives (pp. 43–75). New York: Cambridge University Press.
Torrance, E. P., & Goff, K. (1989). A quiet revolution. Journal of Creative Behavior, 23, 136–145.
Tukey, J. W. (1969). Analyzing data. Sanctification or detective work? American Psychologist, 24, 83–89.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams & Wilkins.
Wilks, S. S. (1938). Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, 3, 23–40.
Wohlwill, J. F. (1973). The study of behavioral development. New York: Academic.
Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt, Rinehart & Winston.
Yen, W. M. (1985). Increasing item complexity: A possible cause of scale shrinkage for unidimensional item response theory. Psychometrika, 50, 399–410.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23, 299–325.
