
Chapter 1

Applications and Consequences of Psychological Testing

Learning Objectives
1.1 Analyze the varied applications of psychological testing, its ethical and social consequences and factors that determine the soundness of the test
1.2 Illustrate the need for ethical and professional standards in testing and the responsibilities of test publishers

1.1: The Nature and Uses of Psychological Testing

1.1 Analyze the varied applications of psychological testing, its ethical and social consequences and factors that determine the soundness of the test

If you ask average citizens “What do you know about psychological tests?” they might mention something about intelligence tests, inkblots, and true-false inventories such as the widely familiar MMPI. Most likely, their understanding of tests will focus on quantifying intelligence and detecting personality problems, as this is the common view of how tests are used in our society. Certainly, there is more than a grain of truth to this common view: Measures of personality and intelligence are still the essential mainstays of psychological testing. However, modern test developers have produced many other kinds of tests for diverse and imaginative purposes that even the early pioneers of testing could not have anticipated. The purpose of this chapter is to discuss the varied applications of psychological testing and also to review the ethical and social consequences of this enterprise.

The chapter begins with a panoramic survey of psychological tests and their often surprising applications. In Module 1.1, The Nature and Uses of Psychological Testing, we summarize the different types and varied applications of modern tests. We also introduce the reader to a host of factors that can influence the soundness of testing such as adherence to standardized procedures, establishment of rapport, and the motivation of the examinee to deceive. In Module 1.2, Ethical and Social Implications of Testing, we further develop the theme that testing is a consequential endeavor. In this topic, we survey professional guidelines that impact testing and review the influence of cultural background on test results.

1.1.1: The Consequences of Testing
From birth to old age, we encounter tests at almost every turning point in life. The baby’s first test conducted immediately after birth is the Apgar test, a quick, multivariate assessment of heart rate, respiration, muscle tone, reflex irritability, and color. The total Apgar score (0 to 10) helps determine the need for any immediate medical attention. Later, a toddler who previously received a low Apgar score might be a candidate for developmental disability assessment. The preschool child may take school-readiness tests. Once a school career begins, each student endures hundreds, perhaps thousands, of academic tests before graduation, not to mention possible tests for learning disability, giftedness, vocational interest, and college admission. After graduation, adults may face tests for job entry, driver’s license, security clearance, personality function, marital compatibility, developmental disability, brain dysfunction; the list is nearly endless. Some persons even encounter one final indignity in the frailness of their later years: a test to determine their competency to manage financial affairs.

Tests are used in almost every nation on earth for counseling, selection, and placement. Testing occurs in settings as diverse as schools, civil service, industry, medical clinics, and counseling centers. Most persons have taken dozens of tests and thought nothing of it. Yet, by the time the typical individual reaches retirement age, it is likely that psychological test results will have helped to shape his or her destiny. The deflection of the life course


by psychological test results might be subtle, such as when a prospective mathematician qualifies for an accelerated calculus course based on tenth-grade achievement scores. More commonly, psychological test results alter individual destiny in profound ways. Whether a person is admitted to one college and not another, offered one job but refused a second, diagnosed as depressed or not: all such determinations rest, at least in part, on the meaning of test results as interpreted by persons in authority. Put simply, psychological test results change lives. For this reason it is prudent, indeed almost mandatory, that students of psychology learn about the contemporary uses and occasional abuses of testing. In Case Exhibit 1.1, the life-altering aftermath of psychological testing is illustrated by means of several true case history examples.

Case Exhibit 1.1

True-Life Vignettes of Testing
The influence of psychological testing is best illustrated by example. Consider these brief vignettes:

• A shy, withdrawn 7-year-old girl is administered an IQ test by a school psychologist. Her score is phenomenally higher than the teacher expected. The student is admitted to a gifted and talented program where she blossoms into a self-confident and gregarious scholar.
• Three children in a family living near a lead smelter are exposed to the toxic effects of lead dust and suffer neurological damage. Based in part on psychological test results that demonstrate impaired intelligence and shortened attention span in the children, the family receives an $8 million settlement from the company that owns the smelter.
• A candidate for a position as police officer is administered a personality inventory as part of the selection process. The test indicates that the candidate tends to act before thinking and resists supervision from authority figures. Even though he has excellent training and impresses the interviewers, the candidate does not receive a job offer.
• A student, unsure of what career to pursue, takes a vocational interest inventory. The test indicates that she would like the work of a pharmacist. She signs up for a prepharmacy curriculum but finds the classes to be both difficult and boring. After three years, she abandons pharmacy for a major in dance, frustrated that she still faces three more years of college to earn a degree.

These cases demonstrate that test results impact individual lives and the collective social fabric in powerful and far-reaching ways. In the first story about the hidden talent of a 7-year-old girl, cognitive test results changed her life trajectory for the better. In the second case involving the tragic saga of children exposed to lead poisoning, the test data helped redress a social injustice. In the third situation, that of the impulsive candidate for police officer, personality test results likely served the public interest by tipping the balance against a questionable applicant. But test results do not always provide a positive conclusion. In the last case mentioned above, a young student wasted time and money following the seemingly flawed guidance of a well-known vocational inventory.

The idea of a test is thus a pervasive element of our culture, a feature we take for granted. However, the layperson’s notion of a test does not necessarily coincide with the more restrictive view held by psychometricians. A psychometrician is a specialist in psychology or education who develops and evaluates psychological tests. Because of widespread misunderstandings about the nature of tests, it is fitting that we begin this topic with a fundamental question, one that defines the scope of the entire book: What is a test?

1.1.2: Definition of a Test
A test is a standardized procedure for sampling behavior and describing it with categories or scores. In addition, most tests have norms or standards by which the results can be used to predict other, more important behaviors. We elaborate these characteristics in the sections that follow, but first it is instructive to portray the scope of the definition. Included in this view are traditional tests such as personality questionnaires and intelligence tests, but the definition also subsumes diverse procedures that the reader might not recognize as tests. For example, all of the following could be tests according to the definition used in this book: a checklist for rating the social skills of a youth with mental retardation; a nontimed measure of mastery in adding pairs of three-digit numbers; microcomputer appraisals of reaction time; and even situational tests such as observing an individual working on a group task with two “helpers” who are obstructive and uncooperative.

In sum, tests are enormously varied in their formats and applications. Nonetheless, most tests possess these defining features:

• Standardized procedure
• Behavior sample
• Scores or categories
• Norms or standards
• Prediction of nontest behavior

In the sections that follow, we examine each of these characteristics in more detail. The portrait that we draw pertains especially to norm-referenced tests, that is, tests that use a well-defined population of persons for their interpretive

framework. However, the defining characteristics of a test differ slightly for the special case of criterion-referenced tests, which measure what a person can do rather than comparing results to the performance levels of others. For this reason, we provide a separate discussion of criterion-referenced tests.

Standardized procedure is an essential feature of any psychological test. A test is considered to be standardized if the procedures for administering it are uniform from one examiner and setting to another. Of course, standardization depends to some extent on the competence of the examiner. Even the best test can be rendered useless by a careless, poorly trained, or ill-informed tester, as the reader will discover later in this topic. However, most examiners are competent. Standardization, therefore, rests largely on the directions for administration found in the instructional manual that typically accompanies a test.

The formulation of directions is an essential step in the standardization of a test. In order to guarantee uniform administration procedures, the test developer must provide comparable stimulus materials to all testers, specify with considerable precision the oral instructions for each item or subtest, and advise the examiner how to handle a wide range of queries from the examinee.

To illustrate these points, consider the number of different ways a test developer might approach the assessment of digit span, the maximum number of orally presented digits a subject can recall from memory. An unstandardized test of digit span might merely suggest that the examiner orally present increasingly long series of numbers until the subject fails. The number of digits in the longest series recalled would then be the subject’s digit span. Most readers can discern that such a loosely defined test will lack uniformity from one examiner to another. If the tester is free to improvise any series of digits, what is to prevent him or her from presenting, with the familiar inflection of a television announcer, “1-800-325-3535”? Such a series would be far easier to recall than a more random set, such as “7-2-8-1-9-4-6-3-7-4-2.” The speed of presentation would also crucially affect the uniformity of a digit span test. For purposes of standardization, it is essential that every examiner present each series at a constant rate, for example, one digit per second. Finally, the examiner needs to know how to react to unexpected responses such as a subject asking, “Could you repeat that again?” For obvious reasons, the usual advice is “No.”

An interesting point, and one little understood by the lay public, is that the test items need not resemble the behaviors that the test is attempting to predict. The essential characteristic of a good test is that it permits the examiner to predict other behaviors, not that it mirrors the to-be-predicted behaviors. If answering “true” to the question “I drink a lot of water” happens to help predict depression, then this seemingly unrelated question is a useful index of depression. Thus, the reader will note that successful prediction is an empirical question answered by appropriate research. While most tests do sample directly from the domain of behaviors they hope to predict, this is not a psychometric requirement.

A psychological test must also permit the derivation of scores or categories. Thorndike (1918) expressed the essential axiom of testing in his famous assertion, “Whatever exists at all exists in some amount.” McCall (1939) went a step further, declaring, “Anything that exists in amount can be measured.” Testing strives to be a form of measurement akin to procedures in the physical sciences whereby numbers represent abstract dimensions such as weight or temperature. Every test furnishes one or more scores or provides evidence that a person belongs to one category and not another. In short, psychological testing sums up performance in numbers or classifications.

The implicit assumption of the psychometric viewpoint is that tests measure individual differences in traits or characteristics that exist in some vague sense of the word. In most cases, all people are assumed to possess the

trait or characteristic being measured, albeit in different amounts. The purpose of the testing is to estimate the amount of the trait or quality possessed by an individual.

In this context, two cautions are worth mentioning. First, every test score will always reflect some degree of measurement error. The imprecision of testing is simply unavoidable: Tests must rely on an external sample of behavior to estimate an unobservable and, therefore, inferred characteristic. Psychometricians often express this fundamental point with an equation:

X = T + e

where X is the observed score, T is the true score, and e is a positive or negative error component. The best that a test developer can do is make e very small. It can never be completely eliminated, nor can its exact impact be known in the individual case.

The second caution is that test consumers must be wary of reifying the characteristic being measured. Test results do not represent a thing with physical reality. Typically, they portray an abstraction that has been shown to be useful in predicting nontest behaviors. For example, in discussing a person’s IQ, psychologists are referring to an abstraction that has no direct, material existence but that is, nonetheless, useful in predicting school achievement and other outcomes.

A psychological test must also possess norms or standards. An examinee’s test score is usually interpreted by comparing it with the scores obtained by others on the same test. For this purpose, test developers typically provide norms, a summary of test results for a large and representative group of subjects (Petersen, Kolen, & Hoover, 1989). The norm group is referred to as the standardization sample.

The selection and testing of the standardization sample is crucial to the usefulness of a test. This group must be representative of the population for whom the test is intended or else it is not possible to determine an examinee’s relative standing. In the extreme case when norms are not provided, the examiner can make no use of the test results at all. An exception to this point occurs in the case of criterion-referenced tests, discussed later.

Finally, tests are not ends in themselves. In general, the ultimate purpose of a test is to predict additional behaviors, other than those directly sampled by the test. Thus, the tester may have more interest in the nontest behaviors predicted by the test than in the test responses per se. Perhaps a concrete example will clarify this point. Suppose an examiner administers an inkblot test to a patient in a psychiatric hospital. Assume that the patient responds to one inkblot by describing it as “eyes peering out.” Based on established norms, the examiner might then predict that the subject will be highly suspicious and a poor risk for individual psychotherapy. The purpose of the testing is to arrive at this and similar predictions, not to determine whether the subject perceives eyes staring out from the blots.

The ability of a test to predict nontest behavior is determined by an extensive body of validational research, most of which is conducted after the test is released. But there are no guarantees in the world of psychometric research. It is not unusual for a test developer to publish a promising test, only to read years later that other researchers find it deficient. There is a lesson here for test consumers: The fact that a test exists and purports to measure a certain characteristic is no guarantee of truth in advertising. A test may have a fancy title, precise instructions, elaborate norms, attractive packaging, and preliminary findings, but if in the dispassionate study of independent researchers the test fails to predict appropriate nontest behaviors, then it is useless.

1.1.3: Further Distinctions in Testing
The chief features of a test previously outlined apply especially to norm-referenced tests, which constitute the vast majority of tests in use. In a norm-referenced test, the performance of each examinee is interpreted in reference to a relevant standardization sample (Petersen, Kolen, & Hoover, 1989). However, these features are less relevant in the special case of criterion-referenced tests, since these instruments suspend the need for comparing the individual examinee with a reference group. In a criterion-referenced test, the objective is to determine where the examinee stands with respect to very tightly defined educational objectives (Berk, 1984). For example, one part of an arithmetic test for 10-year-olds might measure the accuracy level in adding pairs of two-digit numbers. In an untimed test of 20 such problems, accuracy should be nearly perfect. For this kind of test, it really does not matter how the individual examinee compares to others of the same age. What matters is whether the examinee meets an appropriate, specified criterion, for example, 95 percent accuracy. Because there is no comparison to the normative performance of others, this kind of measurement tool is aptly designated a criterion-referenced test. The important distinction here is that, unlike norm-referenced tests, criterion-referenced tests can be meaningfully interpreted without reference to norms.
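
The contrast between criterion-referenced and norm-referenced interpretation can be made concrete with a short sketch. The Python below is an illustration only, not a procedure from any published test: the function names, the ten-score norm group, and the examinee's score of 19 are all hypothetical, and the 95 percent criterion is simply the example figure from the arithmetic test described above. The percentile rank shown here uses one common convention (the percentage of the norm group scoring below the examinee); other conventions exist.

```python
from bisect import bisect_left


def meets_criterion(num_correct, num_items, criterion_percent=95):
    """Criterion-referenced view: compare accuracy with a fixed standard.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    return num_correct * 100 >= criterion_percent * num_items


def percentile_rank(score, norm_scores):
    """Norm-referenced view: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)


# Hypothetical standardization sample for a 20-item addition test.
norm_sample = [12, 14, 15, 16, 17, 17, 18, 18, 19, 20]

examinee_score = 19  # hypothetical examinee: 19 of 20 problems correct

# Criterion-referenced interpretation needs no norm group at all.
print(meets_criterion(examinee_score, 20))

# Norm-referenced interpretation is meaningless without the norm group.
print(percentile_rank(examinee_score, norm_sample))
```

Note that the first function never touches the norm sample, which is the sense in which a criterion-referenced test can be interpreted without reference to norms.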

Another important distinction is between testing and assessment, which are often considered equivalent. However, they do not mean exactly the same thing. Assessment is a more comprehensive term, referring to the entire process of compiling information about a person and using it to make inferences about characteristics and to predict behavior. Assessment can be defined as appraising or estimating the magnitude of one or more attributes in a person. The assessment of human characteristics involves observations, interviews, checklists, inventories, projectives, and other psychological tests. In sum, tests represent only one source of information used in the assessment process. In assessment, the examiner must compare and combine data from different sources. This is an inherently subjective process that requires the examiner to sort out conflicting information and make predictions based on a complex gestalt of data.

The term assessment was invented during World War II (WWII) to describe a program to select men for secret service assignment in the Office of Strategic Services (OSS Assessment Staff, 1948). The OSS staff of psychologists and psychiatrists amassed a colossal amount of information on candidates during four grueling days of written tests, interviews, and personality tests. In addition, the assessment process included a variety of real-life situational tests based on the realization that there was a difference between know-how and can-do:

We made the candidates actually attempt the tasks with their muscles or spoken words, rather than merely indicate on paper how the tasks could be done. We were prompted to introduce realistic tests of ability by such findings as this: that men who earn a high score in Mechanical Comprehension, a paper-and-pencil test, may be below average when it comes to solving mechanical problems with their hands.
(OSS Assessment Staff, 1948, pp. 41–42)

The situational tests included group tasks of transporting equipment across a raging brook and scaling a 10-foot-high wall, as well as individual scrutiny of the ability to survive a realistic interrogation and to command two uncooperative subordinates in a construction task.

On the basis of the behavioral observations and test results, the OSS staff rated the candidates on dozens of specific traits in such broad categories as leadership, social relations, emotional stability, effective intelligence, and physical ability. These ratings served as the basis for selecting OSS personnel.

1.1.4: Types of Tests
Tests can be broadly grouped into two camps: group tests versus individual tests. Group tests are largely pencil-and-paper measures suitable to the testing of large groups of persons at the same time. Individual tests are instruments that by their design and purpose must be administered one on one. An important advantage of individual tests is that the examiner can gauge the level of motivation of the subject and assess the relevance of other factors (e.g., impulsiveness or anxiety) on the test results.

For convenience, we will sort tests into the eight categories depicted in Table 1.1. Each of the categories contains norm-referenced, criterion-referenced, individual, and group tests. The reader will note that any typology of tests is a purely arbitrary determination. For example, we could argue for yet another dichotomy: tests that seek to measure maximum performance (e.g., an intelligence test) versus tests that seek to gauge a typical response (e.g., a personality inventory).

Table 1.1 The Main Types of Psychological Tests
Intelligence Tests: Measure an individual’s ability in relatively global areas such as verbal comprehension, perceptual organization, or reasoning and thereby help determine potential for scholastic work or certain occupations.
Aptitude Tests: Measure the capability for a relatively specific task or type of skill; aptitude tests are, in effect, a narrow form of ability testing.
Achievement Tests: Measure a person’s degree of learning, success, or accomplishment in a subject or task.
Creativity Tests: Assess novel, original thinking and the capacity to find unusual or unexpected solutions, especially for vaguely defined problems.
Personality Tests: Measure the traits, qualities, or behaviors that determine a person’s individuality; such tests include checklists, inventories, and projective techniques.
Interest Inventories: Measure an individual’s preference for certain activities or topics and thereby help determine occupational choice.
Behavioral Procedures: Objectively describe and count the frequency of a behavior, identifying the antecedents and consequences of the behavior.
Neuropsychological Tests: Measure cognitive, sensory, perceptual, and motor performance to determine the extent, locus, and behavioral consequences of brain damage.

In a narrow sense, there are hundreds, perhaps thousands, of different kinds of tests, each measuring a slightly different aspect of the individual. For example, even two tests of intelligence might be arguably different types of measures. One test might reveal the assumption that intelligence is a biological construct best measured through brain waves, whereas another might be rooted in the traditional view that intelligence is exhibited in the capacity to learn acculturated skills such as vocabulary. Lumping both measures under the category of intelligence tests is certainly an oversimplification, but nonetheless a useful starting point.

Intelligence tests were originally designed to sample a broad assortment of skills in order to estimate the individual’s general intellectual level. The Binet-Simon scales were successful, in part, because they incorporated heterogeneous tasks, including word definitions, memory for designs, comprehension questions, and spatial visualization tasks. The group intelligence tests that blossomed with

such profusion during and after World War I also tested diverse abilities; witness the Army Alpha with its eight different sections measuring practical judgment, information, arithmetic, and reasoning, among other skills.

Modern intelligence tests also emulate this historically established pattern by sampling a wide variety of proficiencies deemed important in our culture. In general, the term intelligence test refers to a test that yields an overall summary score based on results from a heterogeneous sample of items. Of course, such a test might also provide a profile of subtest scores, but it is the overall score that generally attracts the most attention.

Aptitude tests measure one or more clearly defined and relatively homogeneous segments of ability. Such tests come in two varieties: single aptitude tests and multiple aptitude test batteries. A single aptitude test appraises, obviously, only one ability, whereas a multiple aptitude test battery provides a profile of scores for a number of aptitudes.

Aptitude tests are often used to predict success in an occupation, training course, or educational endeavor. For example, the Seashore Measures of Musical Talents (Seashore, 1938), a series of tests covering pitch, loudness, rhythm, time, timbre, and tonal memory, can be used to identify children with potential talent in music. Specialized aptitude tests also exist for the assessment of clerical skills, mechanical abilities, manual dexterity, and artistic ability.

The most common use of aptitude tests is to determine college admissions. Almost every college student is familiar with the SAT (Scholastic Assessment Test, previously called the Scholastic Aptitude Test) of the College Entrance Examination Board. This test contains a Verbal section stressing word knowledge and reading comprehension; a Mathematics section stressing algebra, geometry, and insightful reasoning; and a Writing section. In effect, colleges that require certain minimum scores on the SAT for admission are using the test to predict academic success.

Achievement tests measure a person’s degree of learning, success, or accomplishment in a subject matter. The implicit assumption of most achievement tests is that the schools have taught the subject matter directly. The purpose of the test is then to determine how much of the material the subject has absorbed or mastered. Achievement tests commonly have several subtests, such as reading, mathematics, language, science, and social studies.

The distinction between aptitude and achievement tests is more a matter of use than content (Gregory, 1994a). In fact, any test can be an aptitude test to the extent that it helps predict future performance. Likewise, any test can be an achievement test insofar as it reflects how much the subject has learned. In practice, then, the distinction between these two kinds of instruments is determined by their respective uses. On occasion, one instrument may serve both purposes, acting as an aptitude test to forecast future performance and an achievement test to monitor past learning.

Creativity tests assess a subject’s ability to produce new ideas, insights, or artistic creations that are accepted as being of social, aesthetic, or scientific value. Thus, measures of creativity emphasize novelty and originality in the solution of fuzzy problems or the production of artistic works. A creative response to one problem is illustrated in Figure 1.1.

Tests of creativity have a checkered history. In the 1960s, they were touted as a useful alternative to intelligence tests and used widely in U.S. school systems. Educators were especially impressed that creativity tests required divergent thinking (putting forth a variety of answers to a complex or fuzzy problem) as opposed to convergent thinking (finding the single correct solution to a well-defined problem). For example, a creativity test might ask the examinee to imagine all the things that would happen if clouds had strings trailing from them down to the ground. Students who could come up with a large number of consequences were assumed to be more creative than their less-imaginative colleagues. However, some psychometricians are skeptical, concluding that creativity is just another label for applied intelligence.

Figure 1.1 Solutions to the Nine-Dot Problem as Examples of Creativity
Note: Without lifting the pencil, draw through all the dots with as few straight lines as possible. The usual solution is shown in a. Creative solutions are depicted in b and c.

Personality tests measure the traits, qualities, or behaviors that determine a person’s individuality; this information helps predict future behavior. These tests come in several different varieties, including checklists, inventories, and projective techniques such as sentence completions and inkblots (Table 1.2).

Interest inventories measure an individual’s preference for certain activities or topics and thereby help determine occupational choice. These tests are based on the explicit assumption that interest patterns determine and, therefore, also predict job satisfaction. For example, if the examinee has the same interests as successful and satisfied accountants, it is thought likely that he or she would enjoy the work of an accountant. The assumption that interest patterns predict job satisfaction is largely borne out by empirical studies, as we will review in a later chapter.

Table 1.2 Examples of Personality Test Items

(a) An Adjective Checklist
Check those words which describe you:
( ) relaxed       ( ) assertive
( ) thoughtful    ( ) curious
( ) cheerful      ( ) even-tempered
( ) impatient     ( ) skeptical
( ) morose        ( ) impulsive
( ) optimistic    ( ) anxious

(b) A True-False Inventory
Circle true or false as each statement applies to you:
T F  I like sports magazines.
T F  Most people would lie to get a job.
T F  I like big parties where there is lots of noisy fun.
T F  Strange thoughts possess me for hours at a time.
T F  I often regret the missed opportunities in my life.
T F  Sometimes I feel anxious for no reason at all.
T F  I like everyone I have met.
T F  Falling asleep is seldom a problem for me.

(c) A Sentence Completion Projective Test
Complete each sentence with the first thought that comes to you:
I feel bored when ______
What I need most is ______
I like people who ______
My mother was ______

Many kinds of behavioral procedures are available for assessing the antecedents and consequences of behavior, including checklists, rating scales, interviews, and structured observations. These methods share a common assumption that behavior is best understood in terms of clearly defined characteristics such as frequency, duration, antecedents, and consequences. Behavioral procedures tend to be highly pragmatic in that they are usually interwoven with treatment approaches.

Neuropsychological tests are used in the assessment of persons with known or suspected brain dysfunction. Neuropsychology is the study of brain–behavior relationships. Over the years, neuropsychologists have discovered that certain tests and procedures are highly sensitive to the effects of brain damage. Neuropsychologists use these specialized tests and procedures to make inferences about the locus, extent, and consequences of brain damage. A full neuropsychological assessment typically requires three to eight hours of one-on-one testing with an extensive battery of measures. Examiners must undergo comprehensive advanced training in order to make sense out of the resulting mass of test data.

1.1.5: Uses of Testing
By far the most common use of psychological tests is to make decisions about persons. For example, educational institutions frequently use tests to determine placement levels for students, and universities ascertain who should be admitted, in part, on the basis of test scores. State, federal, and local civil service systems also rely heavily on tests for purposes of personnel selection.

Even the individual practitioner exploits tests, in the main, for decision making. Examples include the consulting psychologist who uses a personality test to determine that a police department hire one candidate and not another, and the neuropsychologist who employs tests to conclude that a client has suffered brain damage.

But simple decision making is not the only function of psychological testing. It is convenient to distinguish five uses of tests:

• Classification
• Diagnosis and treatment planning
• Self-knowledge
• Program evaluation
• Research

These applications frequently overlap and, on occasion, are difficult to distinguish one from another. For example, a test that helps determine a psychiatric diagnosis might also provide a form of self-knowledge. Let us examine these applications in more detail.
The term classification encompasses a variety of procedures that share a common purpose: assigning a person to one category rather than another. Of course, the assignment to categories is not an end in itself but the basis for differential treatment of some kind. Thus, classification can have important effects such as granting or restricting access to a specific college or determining whether a person is hired for a particular job. There are many variant forms of classification, each emphasizing a particular purpose in assigning persons to categories. We will distinguish placement, screening, certification, and selection.

Placement is the sorting of persons into different programs appropriate to their needs or skills. For example, universities often use a mathematics placement exam to determine whether students should enroll in calculus, algebra, or remedial courses.

Screening refers to quick and simple tests or procedures to identify persons who might have special characteristics or needs. Ordinarily, psychometricians acknowledge that screening tests will result in many misclassifications. Examiners are, therefore, advised to do follow-up testing with additional instruments before making important decisions on the basis of screening tests. For example, to identify children with highly exceptional talent in spatial thinking, a psychologist might administer a 10-minute paper-and-pencil test to every child in a school system. Students who scored in the top 10 percent might then be singled out for more comprehensive testing.

Certification and selection both have a pass/fail quality. Passing a certification exam confers privileges. Examples include the right to practice psychology or to drive a car. Thus, certification typically implies that a person has at least a minimum proficiency in some discipline or activity. Selection is similar to certification in that it confers privileges such as the opportunity to attend a university or to gain employment.

Another use of psychological tests is for diagnosis and treatment planning. Diagnosis consists of two intertwined tasks: determining the nature and source of a person’s abnormal behavior, and classifying the behavior pattern within an accepted diagnostic system. Diagnosis is usually a precursor to remediation or treatment of personal distress or impaired performance.

Psychological tests often play an important role in diagnosis and treatment planning. For example, intelligence tests are absolutely essential in the diagnosis of mental retardation. Personality tests are helpful in diagnosing the nature and extent of emotional disturbance. In fact, some tests such as the MMPI were devised for the explicit purpose of increasing the efficiency of psychiatric diagnosis.

Diagnosis should be more than mere classification, more than the assignment of a label. A proper diagnosis conveys information about strengths, weaknesses, etiology, and best choices for remediation/treatment. Knowing that a child has received a diagnosis of learning disability is largely useless. But knowing in addition that the same child is well below average in reading comprehension, is highly distractible, and needs help with basic phonics can provide an indispensable basis for treatment planning.

Psychological tests also can supply a potent source of self-knowledge. In some cases, the feedback a person receives from psychological tests can change a career path or otherwise alter a person’s life course. Of course, not every instance of psychological testing provides self-knowledge. Perhaps in the majority of cases the client already knows what the test results divulge. A high-functioning college student is seldom surprised to find that his IQ is in the superior range. An architect is not perplexed to hear that she has excellent spatial reasoning skills. A student with meager reading capacity is usually not startled to receive a diagnosis of “learning disability.”
Another use for psychological tests is the systematic evaluation of educational and social programs. We have more to say about the evaluation of educational programs when we discuss achievement tests in a later chapter. We focus here on the use of tests in the evaluation of social programs. Social programs are designed to provide services that improve social conditions and community life. For example, Project Head Start is a federally funded program that supports nationwide pre-school teaching projects for underprivileged children (McKey and others, 1985). Launched in 1965 as a precedent-setting attempt to provide child development programs to low-income families, Head Start has provided educational enrichment and health services to millions of at-risk preschool children.

But exactly what impact does the multi-billion-dollar Head Start program have on early childhood development? Congress wanted to know if the program improved scholastic performance and reduced school failure among the enrollees. But the centers vary by sponsoring agencies, staff characteristics, coverage, content, and objectives, so the effects of Head Start are not easy to ascertain. Psychological tests provide an objective basis for answering these questions that is far superior to anecdotal or impressionistic reporting. In general, Head Start children show immediate gains in IQ, school readiness, and academic achievement, but these gains dissipate in the ensuing years (Figure 1.2).

Figure 1.2 Longitudinal Test Results from the Head Start Project
Source: From McKey, R. H., and others. (1985). The impact of Head Start on children, families and communities. Washington, DC: U.S. Government Printing Office. In the public domain.

So far we have discussed the practical application of psychological tests to everyday problems such as job selection, diagnosis, or program evaluation. In each of these instances, testing serves an immediate, pragmatic purpose: helping the tester make decisions about persons or programs. But tests also play a major role in both the applied and theoretical branches of behavioral research. As an example of testing in applied research, consider the problem faced by neuropsychologists who wish to investigate the hypothesis that low-level lead absorption causes behavioral deficits in children. The only feasible way to explore this supposition is by testing normal and lead-burdened children with a battery of psychological tests. Needleman and associates (1979) used an array of traditional and innovative tests to conclude that low-level lead absorption causes decrements in IQ, impairments in reaction time, and escalations of undesirable classroom behaviors. Their conclusions inspired a tumultuous and bitter exchange of opinions that we will not review here (Needleman et al., 1990). However, the passions inspired by this study epitomize an instructive point: Academicians and public policymakers respect psychological tests. Why else would they engage in lengthy, acrimonious debates about the validity of testing-based research findings?

1.1.6: Factors Influencing the Soundness of Testing

Psychological testing is a dynamic process influenced by many factors. Although examiners strive to ensure that test results accurately reflect the traits or capacities being assessed, many extraneous factors can sway the outcome of psychological testing. In this section, we review the potentially crucial impact of several sources of influence: the manner of administration, the characteristics of the tester, the context of the testing, the motivation and experience of the examinee, and the method of scoring.

The sensitivity of the testing process to extraneous influences is obvious in cases where the examiner is cold, hurried, or incompetent. However, invalid test results do not originate only from obvious sources such as blatantly nonstandard administration, hostile tester, noisy testing room, or fearful examinee. In addition, there are numerous, subtle ways in which method, examiner, context, or motivation can alter test results. We provide a comprehensive survey of these extraneous influences in the remainder of this topic.

1.1.7: Standardized Procedures in Test Administration

The interpretation of a psychological test is most reliable when the measurements are obtained under the standardized conditions outlined in the publisher’s test manual. Nonstandard testing procedures can alter the meaning of the test results, rendering them invalid and, therefore, misleading. Standardized procedures are so important that they are listed as an essential criterion for valid testing in the Standards for Educational and Psychological Testing (1999),
a reference manual published jointly by the American Psychological Association and other groups:

In typical applications, test administrators should follow carefully the standardized procedures for administration and scoring specified by the test publisher. Specifications regarding instructions to test takers, time limits, the form of item presentation or response, and test materials or equipment should be strictly observed. Exceptions should be made only on the basis of carefully considered professional judgment, primarily in clinical applications. (AERA, APA, NCME, 1999)

Suppose the instructions to the vocabulary section of a children’s intelligence test specify that the examiner should ask, “What does sofa mean, what is a sofa?” If a subject were to reply, “I’ve never heard that word,” an inexperienced tester might be tempted to respond, “You know, a couch, what is a couch?” This may strike the reader as a harmless form of fair play, a simple rephrasing of the original question. Yet, by straying from standardized procedures, the examiner has really given a different test. The point in asking for a definition of sofa (and not couch) is precisely that sofa is harder to define and, therefore, a better index of high-level vocabulary skills.

Even though standardized testing procedures are normally essential, there are instances in which flexibility in procedures is desirable or even necessary. As suggested in the APA Standards, such deviations should be reasoned and deliberate. An analogy to the spirit of the law versus the letter of the law is relevant here. An overly zealous examiner might capture the letter of the law, so to speak, by adhering literally and strictly to testing procedures outlined in the publisher’s manual. But is this really what most test publishers intend? Is it even how the test was actually administered to the normative sample? Most likely publishers would prefer that examiners capture the spirit of the law even if, on occasion, it is necessary to adjust testing procedures slightly.

The need to adjust standardized procedures for testing is especially apparent when examining persons with certain kinds of disabilities. A subject with a speech impediment might be allowed to write down the answers to orally presented questions or to use gesture and pantomime in response to some items. For example, a test question might ask, “What shape is a ball?” The question is designed to probe the subject’s knowledge of common shapes, not to examine whether the examinee can verbalize “round.” The written response round and the gestured response (a circular motion of the index finger) are equally correct, too.

Minor adjustments in procedures that heed the spirit in which a test was developed occur on a regular basis and are no cause for alarm. These minor adjustments do not invalidate the established norms; on the contrary, the appropriate adaptation of procedures is necessary so that the norms remain valid. After all, the testers who collected data from the standardization sample did not act like heartless robots when posing questions to subjects. Examiners who wish to obtain valid results must likewise exercise a reasoned flexibility in testing procedures.

However, considerable clinical experience is needed to determine whether an adjustment in procedure is minor or so substantial that existing norms no longer apply. This is why psychological examiners normally receive extensive supervised experience before they are allowed to administer and interpret individual tests of ability or personality.

In certain cases an examiner will knowingly depart from standard procedures to a substantial degree; this practice precludes the use of available test norms. In these instances, the test is used to help formulate clinical judgments rather than to determine a quantitative index. For example, when examining aphasic patients, it may be desirable to ignore time limits entirely and accept roundabout answers. The examiner might not even calculate a score. In these rare cases, the test becomes, in effect, an adjunct to the clinical interview. Of course, when the examiner does not adhere to standardized procedures, this should be stated explicitly in the written report.

1.1.8: Desirable Procedures of Test Administration

A small treatise could be written on desirable procedures of test administration, but we will have to settle for a brief listing of the most essential points. For more details, the interested reader can consult Sattler (2001) on the individual testing of children and Clemans (1971) on group testing. We discuss individual testing first, then briefly list some important points about desirable procedures in group testing.

The uninitiated student of assessment often assumes that examination procedures are so simple and straightforward that a quick once-through reading of the manual will suffice as preparation for testing. Although some individual tests are exceedingly rudimentary and uncomplicated, many of them have complexities of administration that, unheeded, can cause the examinee to fail items unnecessarily. For example, Choi and Proctor (1994) found that 25 of 27 graduate students made serious errors in the administration of the Stanford-Binet: Fourth Edition, even though
the sessions were videotaped and the students knew their testing skills were being evaluated. Ramos, Alfonso, and Schermerhorn (2009) reviewed 108 protocols from the Woodcock Johnson III Tests of Cognitive Abilities administered by 36 first-year graduate students in a school psychology doctoral program. The researchers found an average of almost 5 errors per test, including the use of incorrect ceilings, failure to record errors, and failure to encircle the correct row for the total number correct. Loe, Kadlubek, and Williams (2007) reviewed 51 WISC-IV protocols administered by graduate students and found an average of almost 26 errors per protocol. The two most common errors were the failure to query incomplete or ambiguous verbal responses, and granting too many points for substandard answers. In many cases, these errors materially affected the Full Scale IQ, shifting it upward or downward from the likely true score. What these studies confirm is that appropriate attention to the details of administration and scoring is essential for valid results.

The necessity for intimate familiarity with testing procedures is well illustrated by the Block Design subtest of the WAIS-IV (Wechsler, 2008). The materials for the subtest include nine blocks (cubes) colored red on two sides, white on two sides, and red/white on two sides. The examinee’s task is to use the blocks to construct patterns depicted on cards. For the initial designs, four blocks are needed, while for more difficult designs, all nine blocks are provided (Figure 1.3).

Figure 1.3 Materials Similar to WAIS-IV Block Design Subtest

Bright examinees have no difficulty comprehending this task and the exact instructions do not influence their performance appreciably. However, persons whose intelligence is average or below average need the elaborate demonstrations and corrections that are specified in the WAIS-IV manual (Wechsler, 2008). In particular, the examiner demonstrates the first two designs and responds to the examinee’s success or failure on these according to a complex flow of reaction and counterreaction, as outlined in three pages of instructions. Woe to the tester who has not rehearsed this subtest and anticipated the proper response to examinees who falter on the first two designs.

Sensitivity to Disabilities  Another important ingredient of valid test administration is sensitivity to disabilities in the examinee. Impairments in hearing, vision, speech, or motor control may seriously distort test results. If the examiner does not recognize the physical disability responsible for the poor test performance, a subject may be branded as intellectually or emotionally impaired when, in fact, the essential problem is a sensory or motor disability.

Vernon and Brown (1964) reported the tragic case of a young girl who was relegated to a hospital for the mentally retarded as a consequence of the tester’s insensitivity to physical disability. The examiner failed to notice that the child was deaf and concluded that her Stanford-Binet IQ of 29 was valid. She remained in the hospital for five years, but was released after she scored an IQ of 113 on a performance-based intelligence test! After dismissal from the hospital, she entered a school for the deaf and made good progress.

Persons with disabilities may require specialized tests for valid assessment. In this section, we concentrate on the vexing issues raised when standardized tests for normal populations are used with mildly or moderately disabled subjects. We include separate discussions of the testing process for examinees with a hearing, vision, speech, or motor control problem. However, the reader needs to know that many exceptional examinees have multiple disabilities.

Valid testing of a subject with a hearing impairment requires first of all that the examiner detect the existence of the disability! This is often more difficult than it seems. Many persons with mild hearing loss learn to compensate for this disability by pretending to understand what others say and waiting for further conversational cues to help clarify faintly perceived words or phrases. As a result, other persons, including psychologists, may not perceive that an individual with mild hearing loss has any disability at all.

Failure to notice a hearing loss is particularly a problem with young examinees, who are usually poor informants about their disabilities. Young children are also prone to fluctuating hearing losses due to the periodic accumulation of fluid in the middle ear during intervals of mild illness (Vernon & Alles, 1986). A child with a fluctuating hearing loss may have normal hearing in the morning, but perceive conversational speech as a whisper just a few hours later.
Indications of possible hearing difficulty include lack of normal response to sound, inattentiveness, difficulty in following oral instructions, intent observation of the speaker’s lips, and poor articulation (Sattler, 1988). In all cases in which hearing impairment is suspected, referral for an audiological examination is crucial. When testing persons with a mild hearing loss, it is essential for the examiner to face the subject squarely, speak loudly, and repeat instructions slowly. It is also important to find a quiet room for testing. Ideally, a testing room will have curtains and textured wall surfaces to minimize the distracting effects of background noises.

In contrast to those with hearing loss, subjects with visual disabilities generally attend well to verbally presented test materials. The examinee with visual impairment introduces a different kind of challenge to the examiner: detecting that a visual impairment exists, and then ensuring that the subject can see the test materials well.

Detecting visual impairment is a straightforward matter with adult subjects: in most cases, a mature examinee will freely volunteer information about visual impairment, especially if asked. However, children are poor informants about their visual capacities, so testers need to know the signs and symptoms of possible visual impairment in a young examinee. Common sense is a good starting point: Children who squint, blink excessively, or lose their place when reading may have a vision problem. Holding books or testing materials up close is another suspicious sign. Blurred or double vision may signify visual problems, as may headaches or nausea after reading. In general, it is so common for children to require corrective lenses that examiners should be on the lookout for a vision problem in any young subject who does not wear glasses and has not had a recent vision exam.

Depending on the degree of visual impairment, examiners need to make corresponding adjustments in testing. If the child’s vision is of no practical use, special instruments with appropriate norms must be used. For example, the Perkins-Binet is available for testing children who are blind. For obvious reasons, only the verbal portions of tests should be administered to sighted children with an uncorrected visual problem.

Speech impairments present another problem for diagnosticians. The verbal responses of subjects with speech impairment are difficult to decipher. Owing to the failed comprehension of the examiner, subjects may receive less credit than is due. Sattler (1988) relates the lamentable case of Daniel Hoffman, a youngster with speech impairment who spent his entire youth in classes for those with mental retardation because his Stanford-Binet IQ was 74. In actuality, his intelligence was within the normal range, as revealed by other performance-based tests. In another tragic miscarriage of assessment, a patient in England was mistakenly confined to a ward for those with severe retardation because cerebral palsy rendered his speech incomprehensible. The patient was wheelchair-bound and had almost no motor control, so his performance on nonverbal tests was also grossly impaired. The staff assumed he was severely retarded, so the patient remained on the back ward for decades. However, he befriended a fellow resident who could comprehend the patient’s guttural rendition of the alphabet. The friend was severely retarded but could nonetheless recognize keys on a typewriter. With laborious letter-by-letter effort, the patient with incapacitating cerebral palsy wrote and published an autobiography, using his friend with mental disability as a conduit to the real world.

Even if their disability is mild, persons with cerebral palsy or other motor impairments may be penalized by timed performance tests. When testing a person with a mild motor disability, examiners may wish to omit timed performance subtests or to discount these results if they are consistently lower than scores from untimed subtests. If a subject has an obvious motor disability, such as difficulty in manipulating the pieces of a puzzle, then standard instruments administered in the normal manner are largely inappropriate. A number of alternative instruments have been developed expressly for examinees with cerebral palsy and other motor impairments, and standard tests have been cleverly adapted and renormed.

Desirable Procedures of Group Testing  Psychologists and educators commonly assume that almost any adult can accurately administer group tests, so long as he or she has the requisite manual. Administering a group test would appear to be a simple and straightforward procedure of passing out forms and pencils, reading instructions, keeping time, and collecting the materials.

In reality, conducting a group test requires as much finesse as administering an individual test, a point recognized years ago by Traxler (1951). There are numerous ways in which careless administration and scoring can impair group test results, causing bias for the entire group or affecting only certain individuals. We outline only the more important inadequacies and errors in the following paragraphs, referring the reader to Traxler (1951) and Clemans (1971) for a more complete discussion.

Undoubtedly the greatest single source of error in group test administration is incorrect timing of tests that require a time limit. Examiners must allot sufficient time for the entire testing process: setup, reading instructions out loud, and the actual test taking by examinees. Allotting sufficient time requires foresightful scheduling. For example, in many school settings, children must proceed to the next class at a designated time, regardless of ongoing activities. Inexperienced examiners might be tempted to cut short the designated time limit for a test so that the school schedule can be maintained. Of course, reduced time on a
test renders the norms completely invalid and likely low- be, on average, 3 wrong answers for every correct answer,
ers the score for most subjects in the group. so for 9 wrong guesses we would expect 3 correct guesses
Allowing too much time for a test can be an equally on other questions. The subject’s corrected score—the one
egregious error. For example, consider the impact of receiv- actually reported and compared to existing norms—would
ing extra time on the Miller Analogies Test (MAT), a high- then be 32; that is, 35 minus 3. In other words, she probably
level reasoning test once required by many universities for knew 32 answers but by guessing on 12 others she boosted
graduate school application. Since the MAT is a speeded her score another 3 points.
test that requires quick analogical thinking, extra time The scoring correction outlined in the preceding para-
would allow most examinees to solve several extra prob- graph pertains only to wild, uneducated guesses. The
lems. This kind of testing error would likely lower the effect of such a correction is to eliminate the advantage oth-
validity of the MAT results as a predictor of graduate erwise bestowed on unabashed risk takers. However, not
school performance. all guesses are wild and uneducated. In some instances, an
A second source of error in group test administration examinee can eliminate one or two of the alternatives,
is lack of clarity in the directions to the examinees. Examin- thereby increasing the odds of a correct guess among the
ers must read the instructions slowly in a clear, loud voice remaining choices. In this situation, it may be wise for the
that commands the attention of the subjects. Instructions examinee to guess.
must not be paraphrased. Where allowed by the manual, Whether an educated guess is really to the advantage
examiners must stop and clarify points with individual of the examinee depends partly on the diabolical skill of
examinees who are confused. the item writer. Traxler (1951) notes:
Noise is another factor that must be controlled in
In effect, the item writer attempts to make each wrong
group testing. It has been known for some time that noise response so plausible that every examinee who does not
causes a decrease in performance, especially for tasks of possess the desired skill or ability will select a wrong
high complexity (e.g., Boggs & Simon, 1968). Surprisingly, response. In other words, the item writer’s aim is to make
there is little research on the effects of noise on psychologi- all or nearly all considered guesses wrong guesses.
cal tests. However, it seems almost certain that loud noise,
A skilled item writer can fashion questions so that the
especially if intermittent and unpredictable, will cause test
correct alternative is completely counterintuitive and the
scores to decline substantially. Elementary schoolchildren
wrong alternatives are persuasively appealing. For these
should not be expected to perform well while a construc-
items, an educated guess is almost always wrong.
tion worker jackhammers a cement wall in the next room.
Nonetheless, many test developers now advise sub-
In fairness to the examinees, there are times when the test
jects to make educated guesses but warn against wild
administrator should reschedule the test.
Another source of error in the administration of a group test is failure to explain whether and when examinees should guess. Perhaps more frequently than any other question, examiners are asked, “Is there a penalty if I guess wrong?” In most instances, test developers anticipate this issue and provide explicit guidance to subjects as to the advantages and/or pitfalls of guessing. Examiners should not give supplementary advice on guessing; this would constitute a serious deviation from standardized procedure.

Most test developers incorporate a correction for guessing based on established principles of probability. Consider a multiple-choice test that has four alternatives per item. On those items for which the subject makes a wild, uneducated guess, the chance of being correct is 1 out of 4, while the chance of being wrong is 3 out of 4. Thus, for every three wrong guesses, there will be one correct guess that reflects luck rather than knowledge. Suppose a young girl answers correctly on 35 questions from a 50-item test but answers erroneously on 9 questions. In all, she has answered 44 questions, leaving 6 blank. The fact that she selected the wrong alternative on 9 questions suggests that she also gained 3 correct answers due to luck rather than knowledge. Remember, on wild guesses we expect there to

guesses. For example, a recent edition of the test preparation manual Taking the SAT advises:

Because of the way the test is scored, haphazard or random guessing for questions you know nothing about is unlikely to change your score. When you know that one or more choices can be eliminated, guessing from among the remaining choices should be to your advantage.

Whether or not a group test uses a scoring correction, the important point to emphasize in this context is that the administrator should follow standardized procedure and never offer supplementary advice about guessing. In group testing, deviations from the instruction manual are simply unacceptable.

1.1.9: Influence of the Examiner

The Importance of Rapport

Test publishers urge examiners to establish rapport: a comfortable, warm atmosphere that serves to motivate examinees and elicit cooperation. Initiating a cordial testing milieu is a crucial aspect of valid testing. A tester who fails to establish rapport may cause a subject to react with anxiety, passive-aggressive noncooperation, or open hostility. Failure to
establish rapport distorts test findings: Ability is underestimated and personality is misjudged.

Rapport is especially important in individual testing and particularly so when evaluating children. Wechsler (1974) has noted that establishing rapport places great demands on the clinical skills of the tester:

To put the child at ease in his surroundings, the examiner might engage him in some informal conversation before getting down to the more serious business of giving the test. Talking to him about his hobbies or interests is often a good way of breaking the ice, although it may be better to encourage a shy child to talk about something concrete in the environment: a picture on the wall, an animal in his classroom, or a book or toy (not a test material) in the examining room. In general, this introductory period need not take more than 5 to 10 minutes, although the testing should not start until the child seems relaxed enough to give his maximum effort.

Testers may differ in their abilities to establish rapport. Cold testers will likely obtain less cooperation from their subjects, resulting in reduced performance on ability tests or distorted, defensive results on personality tests. Overly solicitous testers may err in the opposite direction, giving subtle (and occasionally blatant) cues to correct answers. Both extremes should be avoided.

Examiner Sex, Experience, and Race

A wide body of research has sought to determine whether certain characteristics of the examiner cause examinee scores to be raised or lowered on ability tests. For example, does it matter whether the examiner is male or female? Experienced or novice? Same or different race from the examinee? We will contain the urge to review these studies, with a few exceptions, for one simple reason: The results are contradictory and, therefore, inconclusive. Most studies find that sex, experience, and race of the examiner make little, if any, difference. Furthermore, the few studies that report a large effect in one direction (e.g., female examiners elicit higher IQ scores) are contradicted by other studies showing the opposite trend. The interested reader can consult Sattler (1988) for a discussion and extensive listing of references.

Yet, it would be unwise to conclude that sex, experience, or race of the examiner never affects test scores. In isolated instances, a particular examiner characteristic might very well have a large effect on examinee test scores. For example, Terrell, Terrell, and Taylor (1981) ingeniously demonstrated that the race of the examiner interacts potently with the trust level of African American examinees in IQ testing. These researchers identified African American college students with high and low levels of mistrust of whites; half of each group was then administered the WAIS by a white examiner, the other half by an African American examiner. The high-mistrust group with an African American examiner scored significantly higher than the high-mistrust group with a white examiner (average IQs of 96 versus 86, respectively). In addition, the low-mistrust group with a white examiner scored slightly higher than the low-mistrust group with an African American examiner (average IQs of 97 versus 92, respectively). In sum, the authors concluded that mistrustful African Americans do poorly when tested by white examiners. Data bearing on this type of racial effect are meager, and there is certainly room for additional research.

1.1.10: Background and Motivation of the Examinee

Examinees differ not only in the characteristics that examiners desire to assess but also in other extraneous ways that might confound the test results. For example, a bright subject might perform poorly on a speeded ability test because of test anxiety; a sane murderer might seek to appear mentally ill on a personality inventory to avoid prosecution; a student of average ability might undergo coaching to perform better on an aptitude test. Some subjects utterly lack motivation and don’t care if they do well on psychological tests. In all of these instances, the test results may be inaccurate because of the filtering and distorting effects of certain examinee characteristics such as anxiety, malingering, coaching, or cultural background.

Test Anxiety

Test anxiety refers to those phenomenological, physiological, and behavioral responses that accompany concern about possible failure on a test. There is no doubt that subjects experience different levels of test anxiety ranging from a carefree outlook to incapacitating dread at the prospect of being tested.

Several true-false questionnaires have been developed to assess individual differences in test anxiety (e.g., Lowe, Lee, Witteborg, & others, 2008; Spielberger, Gonzalez, Taylor, & others, 1980; Spielberger & Vagg, 1995). Below, we list characteristic items and their direction of keying (T for True, F for False):

(T) When taking an important examination, I sweat a great deal.
(T) I freeze up when I take intelligence tests or school exams.
(F) I really don’t understand why some people get so upset about tests.
(T) I dread courses in which the instructor likes to give “pop” quizzes.

An extensive body of research has confirmed the commonsense notion that test anxiety is negatively correlated with school achievement, aptitude test scores, and measures of intelligence (e.g., Chapell, Blanding, & Silverstein, 2005; Naveh-Benjamin, McKeachie, & Lin, 1987; Ortner & Caspers, 2011). However, the interpretation of these correlational findings is not straightforward. One possibility is that students develop test anxiety
because of a history of performing poorly on tests. That is, the decrements in performance may precede and cause the test anxiety. In support of this viewpoint, Paulman and Kennelly (1984) found that, independent of their anxiety, many test-anxious students also display ineffective test taking in academic settings. Such students would do poorly on tests whether or not they were anxious. Moreover, Naveh-Benjamin et al. (1987) determined that a large proportion of test-anxious college students have poor study habits that predispose them to poor test performance. The test anxiety of these subjects is partly a by-product of lifelong frustration over mediocre test results.

Other lines of research indicate that test anxiety has a directly detrimental effect on test performance. That is, test anxiety is likely both cause and effect in the equation linking it with poor test performance. Consider the seminal study on this topic by Sarason (1961), who tested high- and low-anxious subjects under neutral or anxiety-inducing instructions. The subjects were college students required to memorize two-syllable words low in meaningfulness, a difficult task. Half of the subjects performed under neutral instructions: they were simply told to memorize the lists. The remaining subjects were told to memorize the lists and told that the task was an intelligence test. They were urged to perform as well as possible. The two groups did not differ significantly in performance when the instructions were neutral and nonthreatening. However, when the instructions aroused anxiety, performance levels for the high-anxious subjects dropped markedly, leaving them at a huge disadvantage compared to low-anxious subjects. This indicates that test-anxious subjects show significant decrements in performance when they perceive the situation as a test. In contrast, low-anxious subjects are relatively unaffected by such a simple redefinition of the context.

Tests with narrow time limits pose a special problem to persons with high levels of test anxiety. Time pressure seems to exacerbate the degree of personal threat, causing significant reductions in the performance of test-anxious persons. Siegman (1956) demonstrated this point many years ago by comparing performance levels of high- and low-anxious medical/psychiatric patients on timed and untimed subtests from the WAIS. The WAIS consists of eleven subtests, including six subtests for which the examiner uses a stopwatch to enforce strict time limits, and five subtests for which the subject has unlimited time to respond. Interestingly, the high- and low-anxious subjects were of equal overall ability on the WAIS. However, each group excelled on different kinds of subtests in predictable directions. In particular, the low-anxious subjects surpassed the high-anxious subjects on timed subtests, whereas the reverse pattern was observed on untimed subtests (Figure 1.4).

Figure 1.4 Influence of Timing and Anxiety Level on WAIS Subtest Results
[Line graph: subtest score (vertical axis, ticks at 10, 11, and 12) for low-anxious and high-anxious subjects on untimed versus timed subtests (horizontal axis).]
Source: Based on data from Siegman, A. W. (1956). The effect of manifest anxiety on a concept formation task, a nondirected learning task, and on timed and untimed intelligence tests. Journal of Consulting Psychology, 20, 176–178.

Motivation to Deceive

Test results also may be inaccurate if the examinee has reasons to perform in an inadequate or unrepresentative manner. Overt faking of test results is rare, but it does happen. A small fraction of persons seeking benefits from rehabilitation or social agencies will consciously fake bad on personality and ability tests.

1.2: Ethical and Social Implications of Testing

1.2 Illustrate the need for ethical and professional standards in testing and the responsibilities of test publishers

The general theme of this book is that psychological testing is a beneficial influence in modern society. When used ethically and responsibly, testing provides a basis for arriving at sensible inferences about individuals and groups. After all, the intention of the enterprise is to promote proper guidance, effective treatment, accurate evaluation, and fair decision making, whether in one-on-one clinic testing or institutional group testing. Who could possibly complain about these goals?

Thankfully, tests generally are applied in an ethical and responsible manner by psychologists, educators, administrators, and others. But there are exceptions. Almost everyone has heard the horrific anecdotes: the minority grade schooler casually labeled as having mental retardation on the basis of a single IQ score; the college student implausibly diagnosed as schizophrenic from a projective test; the job applicant wrongfully screened from employment based on an irrelevant measure; the aspiring teacher given unfair advantage when a competency test is mysteriously leaked beforehand; or the minority child penalized in testing because English is not her first language. Exceptions such as these illustrate the need for ethical and professional standards in testing.
A major purpose of this topic is to introduce the reader to the ethical and professional standards that inform the practice of psychological testing. We also pursue the related theme of special considerations in the testing of cultural and linguistic minorities. The two topics share substantial overlap: When an examinee is not from the majority Anglo-American culture (predominantly Caucasian, English-speaking, individualistic, future-oriented), ethical and professional concerns in testing rise to the forefront. Finally, we examine a troubling and under-reported implication of widespread testing, namely, to the extent that society uses test results to make important decisions, the motivation for stakeholders to cheat is intensified. As a result, cheating has emerged as a dark, unintended consequence of high-stakes testing, especially in the school systems of our nation.

1.2.1: The Rationale for Professional Testing Standards

Testing is generally applied in a responsible manner, but as previously noted, there are exceptions. On rare occasions, testing is irresponsible by design rather than by accident. Consider, with shuddering amazement, the advertisement for Mind Prober featured in a pop psychology magazine:

Read Any Good Minds Lately? With the Mind Prober you can. In just minutes you can have a scientifically accurate personality profile of anyone. This new expert systems software lets you discover the things most people are afraid to tell you. The strengths, weaknesses, sexual interests and more.
(Eyde & Primoff, 1992)

In this case the irresponsibility is so blatant that discussion of ethical and professional guidelines is almost superfluous.

However, testing practices do not always present in sharply contrasting shades, responsible or irresponsible. The real challenge of competent assessment is to determine the boundaries of ethical and professional practice. As usual, it is the borderline cases that provide pause for thought. The reader is encouraged to read the quandaries of testing described in Case Exhibit 1.2 and form an opinion about each. These examples are based on firsthand reports to the author. At the close of this chapter, we will return to these problematic vignettes.

Case Exhibit 1.2
Ethical and Professional Quandaries in Testing

1. A consulting psychologist agrees to perform preemployment screening for psychopathology in police officer candidates. At the beginning of each consultation, the psychologist asks the candidate to read and sign a detailed consent form that openly and honestly describes the evaluation process. However, the consent form explains that specific feedback about the test results will not be provided to job candidates. Question: Is it ethical for the psychologist to deny such feedback to the candidates?

2. A competent counselor who has received extensive training in the interpretation of the MMPI continues to use this instrument even though it has been superseded by the MMPI-2. His rationale is simply that there is a huge body of research on the MMPI, and he feels secure about the meaning of elevated MMPI test profiles, whereas he knows very little about the MMPI-2. He intends to switch over to the MMPI-2 at some undetermined future date, but finds no compelling reason to do so immediately. Question: Is the counselor’s refusal to use the MMPI-2 a breach of professional standards?

3. A consulting psychologist is asked to evaluate a 9-year-old boy of Puerto Rican descent for possible learning disability. The child’s primary language is Spanish and his secondary language is English. The psychologist intends to use the Wechsler Intelligence Scale for Children-IV (WISC-IV) and other tests. Because he knows almost no Spanish, the psychologist asks the child’s after-school babysitter to act as translator when this is required to communicate test directions, specific questions, or the child’s responses. Question: Is it an appropriate practice to use a translator when administering an individual test such as the WISC-IV?

4. In the midst of taking a test battery for learning disability, a distraught 20-year-old female college student confides a terrifying secret to the psychologist. The client has just discovered that her 25-year-old brother, who died three months ago, was most likely a pedophile. She shows the psychologist photographs of naked children posing in the brother’s bedroom. To complicate matters, the brother lived with his mother, who is still unaware of his well-concealed sexual deviancy. Question: Is the psychologist obligated to report this case to law enforcement?

The dilemmas of psychological testing do not always have simple, obvious answers. Even thoughtful and experienced psychologists may disagree as to what is ethical or professional in a given instance. Nonetheless, the scope of ethical and professional practice is not a matter of individual taste or personal judgment. Responsible test use is defined by written guidelines published by professional associations such as the American Psychological Association, the American Counseling Association, the National Association of School Psychologists, and other groups. Whether they know it or not, all practitioners owe
allegiance to these guidelines, which we review in the following sections.

In general, the evolution of professional and ethical standards has been almost uniformly restrictive, providing an ever-narrowing demarcation of where, when, and how psychological tests may be used. Partly in response to the modern climate of litigation, organizations concerned with psychological testing have published guidelines that collectively define the ethical and professional standards relevant to the practice of assessment.

These standards also pertain to corporations and individuals who publish tests. We begin with a survey of guidelines for test publishers before examining the responsibilities of test users. The chapter closes with a review of special concerns in the testing of cultural and linguistic minorities.

1.2.2: Responsibilities of Test Publishers

The responsibilities of publishers pertain to the publication, marketing, and distribution of their tests. In particular, it is expected that publishers will release tests of high quality, market their product in a responsible manner, and restrict distribution of tests only to persons with proper qualifications. We consider each of these points in turn.

Publication and Marketing Issues

Regarding the publication of new or revised instruments, the most important guideline is to guard against premature release of a test. Testing is a noble enterprise but it is also big business driven by the profit motive, which provides an inherent pressure toward early release of new or revised materials. Perhaps this is why the American Psychological Association and other organizations have published standards that relate to test publication (AERA/APA/NCME, 1999). These standards pertain especially to the technical manuals and user guides that typically accompany a test. These sources must be sufficiently complete so that a qualified user or reviewer can evaluate the appropriateness and technical adequacy of the test. This means that manuals and guides will report detailed statistics on reliability analyses, validity studies, normative samples, and other technical aspects.

Marketing tests in a responsible manner refers not only to advertising (which should be accurate and dignified) but also to the way in which information is portrayed in manuals and guides. In particular, test authors should strive for a balanced presentation of their instruments and refrain from a one-sided presentation of information. For example, if some preliminary studies reflect poorly on a test, these should be given fair weight in the manual alongside positive findings. Likewise, if a potential misuse or inappropriate use of a test can be anticipated, the test author needs to discuss this matter as well.

Competence of Test Purchasers

Test publishers recognize the broad responsibility that only qualified users should be able to purchase their products. By way of brief review, the reasons for restricted access include the potential for harm if tests fall into the wrong hands (e.g., an undergraduate psychology major administers the MMPI-2 to his friends and then makes frightful pronouncements about the results) and the obvious fact that many tests are no longer valid if potential examinees have previewed them (e.g., a teacher memorizes the correct answers to a certification exam).

These examples illustrate that access to psychological tests needs to be limited. But limited to whom? The answer, it turns out, depends on the complexity of the specific test under consideration. Guidelines proposed many years ago by the American Psychological Association (APA, 1953) are still relevant today, even though they are not enforced by all publishers. The APA proposed that tests fall into three levels of complexity (Levels A, B, and C) that require different degrees of expertise from the examiner. Level A comprises simple paper-and-pencil tests that require minimal training. These can be used by responsible nonpsychologists such as educational administrators. Examples include group educational tests and vocational proficiency scales. Level B tests require training in statistics and knowledge of test construction. Some graduate training is needed. This group includes aptitude tests and personality inventories relevant to normal populations. Level C includes the most complex instruments. Minimum training required is a master’s degree in psychology or a related field. Instruments include projective personality tests, individual tests of intelligence, and neuropsychological test batteries.

In general, test publishers try to screen out inappropriate requests by requiring that purchasers have the necessary credentials. For example, the Psychological Corporation, one of the major suppliers of test materials in the United States, requires prospective customers to fill out a registration form detailing their training and experience with tests. Buyers who do not hold an advanced degree in psychology must list details of courses in the administration and interpretation of tests and in statistics. References are required, too.

Most test publishers also specify that individuals or groups who provide testing and counseling by mail are not allowed to purchase materials. On a related note, ethical standards now discourage practitioners from giving “take-home” tests to clients. Until recent years, this was an occasional practice with lengthy personality tests such as the MMPI. The ethics committee endorsed the following point:

Nonmonitored administration of the MMPI generally does not represent sound testing practice and may result in invalid assessment for a variety of reasons (e.g., influence from other people or completion of the test while intoxicated).
In general, users are advised to refrain from giving take-home tests and publishers are counseled to deny access to practitioners or groups who promote this practice.

Even though publishers attempt to filter out unqualified purchasers, there may still be instances in which sensitive tests are sold to unscrupulous individuals. Oles and Davis (1977) discovered that graduate students in psychology could purchase the WISC-R, MMPI, TAT, Stanford-Binet, and 16PF if they typed their orders on college stationery, placed the letters Ph.D. after their names, enclosed payment, and used a post office box return address. Although illicit test orders are few in number, they do occur.

1.2.3: Responsibilities of Test Users

The psychological assessment of personality, interests, brain functioning, aptitude, or intelligence is a sensitive professional action that should be completed with utmost concern for the well-being of the examinee, his or her family, employers, and the wider network of social institutions that might be affected by the results of that particular clinical assessment (Matarazzo, 1990). Over the years, the profession of psychology has proposed, clarified, and sharpened a series of thorough and thoughtful standards to provide guidance for the individual practitioner. Professional organizations publish formal ethical principles that bear upon test use, including the American Psychological Association (APA, 2002), the American Association for Counseling and Development (AACD, 1988), the American Speech-Language-Hearing Association (ASHA, 1991), and the National Association of School Psychologists (NASP, 2010).

In addition to ethical principles, several testing organizations have published practice guidelines to help define the scope of responsible test use. Sources of test use guidelines include teaching groups (AFT, NCME, NEA, 1990), the American Psychological Association (APA, 1992b), the Educational Testing Service (ETS, 1989), the Joint Committee on Testing Practices (JCTP, 1988), the Society for Industrial and Organizational Psychology (SIOP, 1987), and professional alliances (AERA, APA, NCME, 1999). Finally, we should mention that the principles of responsible test use have been distilled in an illuminating casebook published jointly by several testing groups (Eyde, Robertson, & Krug, 2009).

The dozens of guidelines relevant to testing are quite specific, for example:

Standard 5.9: When test score information is released to students, parents, legal representatives, teachers, clients, or the media, those responsible for testing programs should provide appropriate interpretations. The interpretations should describe in simple language what the test covers, what scores mean, the precision of the scores, common misinterpretations of test scores, and how scores will be used.

Because of their specificity, a detailed analysis of relevant ethical and professional standards is beyond the scope of this text. What follows is a summary of the general provisions that pertain to the responsible practice of psychological testing and clinical psychological assessment.

These principles apply to psychologists, students of psychology, and others who work under the supervision of a psychologist. We restrict our discussion to those principles that are directly pertinent to the practice of psychological testing. Proper adherence to these principles would eliminate most, but not all, legal challenges to testing.

Best Interests of the Client

Several ethical principles recognize that all psychological services, including assessment, are provided within the context of a professional relationship. Psychologists are, therefore, enjoined to accept the responsibility implicit in this relationship. In general, the practitioner is guided by one overriding question: What is in the best interests of the client? The functional implication of this guideline is that assessment should serve a constructive purpose for the individual examinee. If it does not, the practitioner is probably violating one or more specific ethical principles. For example, Standard 11.15 in the Standards manual (AERA, APA, NCME, 1999) warns testers to avoid actions that have unintended negative consequences. Allowing a client to attach unsupported surplus meanings to test results would not be in the best interests of the client and would, therefore, constitute an unethical testing practice. In fact, with certain worry-prone and self-doubting clients, a psychologist may choose not to use an appropriate test, since these clients are almost certain to engage in self-destructive misinterpretation of virtually any test findings.

Confidentiality and the Duty to Warn

Practitioners have a primary obligation to safeguard the confidentiality of information, including test results, that they obtain from clients in the course of consultations (Principle 5; APA, 1992a). Such information can be ethically released to others only after the client or a legal representative gives unambiguous consent, usually in written form. The only exceptions to confidentiality involve those unusual circumstances in which the withholding of information would present a clear danger to the client or other persons. For example, most states have passed laws that mandate that health care practitioners report all cases of suspected abuse of children and vulnerable elderly persons. In most states, a psychologist who learns in the course of testing that the client has physically or sexually abused a child is obligated to report that information to law enforcement.

Psychologists also have a duty to warn that stems from the 1976 decision in the Tarasoff case (Wrightsman, Nietzel, Fortune, & Greene, 2002). Tanya Tarasoff was a young college student in California who was murdered by
Prosenjit Poddar, a student from India. What makes the case relevant to the practice of psychology is that Poddar had made death threats regarding Tarasoff to his campus-based therapist. Although the therapist warned the police that Poddar had made death threats, he did not warn Tarasoff. Two months later, Poddar stabbed Tarasoff to death at her home. The parents of Tanya Tarasoff sued, and the California Supreme Court later agreed that therapists have a duty to use “reasonable care” to protect potential victims from their clients. Although the Tarasoff ruling has been modified by legislation in many states, the thrust of the case still stands: Clinicians must communicate any serious threat to the potential victim, law enforcement agencies, or both.

Finally, the clinician should consider the client’s welfare in deciding whether to release information, especially when the client is a minor who is unable to give voluntary, informed consent. When appropriate, practitioners are advised to inform their clients of the legal limits of confidentiality.

Expertise of the Test User

A number of principles acknowledge that the test user must accept ultimate responsibility for the proper application of tests. From a practical standpoint, this means that the test user must be well trained in assessment and measurement theory. The user must possess the expertise needed to evaluate psychological tests for proper standardization, reliability, validity, interpretive accuracy, and other psychometric characteristics. This guideline has special significance in areas such as job screening, special education, testing of persons with disabilities, or other situations in which potential impact is strong.

Psychologists who are poorly trained in their chosen instruments can make serious errors of test interpretation that harm examinees. Furthermore, inept test usage may expose the examiner to professional sanctions and civil lawsuits. A common error observed among inexperienced test users is the overzealous, pathologized interpretation of personality test results (Case Exhibit 1.3).

Case Exhibit 1.3
Overzealous Interpretation of the MMPI

An inexperienced consulting psychologist routinely used the MMPI for preemployment screening of law enforcement candidates. One candidate subsequently filed a lawsuit, alleging that she had been harmed by the psychologist’s report. The plaintiff, a young woman with extensive training and background in law enforcement, was denied a position as police officer because of a supposedly “defensive” MMPI profile. Her profile was entirely within normal limits, although she did obtain a T score of 72 on the K scale. The K scale is usually considered a good index of defensive test-taking attitudes, especially for mental health evaluations with clinic or hospital referrals. By way of quick review, MMPI T scores of approximately 50 are average, whereas elevations of 70 or higher are considered noteworthy. The consulting psychologist noticed the candidate’s elevated score on the K scale, surmised hastily that the candidate was unduly defensive, and cautioned the police chief not to hire her.

What the psychologist did not know is that elevated K-scale scores are extremely common among law enforcement job applicants. For example, Hiatt and Hargrave (1988) found that about 25 percent of a sample of peace officers produced MMPI profiles with K scales at or above a T score of 70. In fact, successful police officers tend to have higher K-scale scores than “problem” peace officers! In this case the test user did not possess sufficient expertise to use the MMPI for job screening. His ignorance on this point constituted a breach of professional ethics. Incidentally, the case was settled out of court for a substantial sum of money, showing that trespasses of responsible test use can have serious legal consequences.

The expertise of the psychologist is particularly relevant when test scoring and interpretation services are used. The Ethical Principles of the American Psychological Association leave no room for doubt:

Psychologists retain appropriate responsibility for the appropriate application, interpretation, and use of assessment instruments, whether they score and interpret such tests themselves or use automated or other services.
(APA, 1992a)

Informed Consent

Before testing commences, the test user needs to obtain informed consent from test takers or their legal representatives. Exceptions to informed consent can be made in certain instances, for example, legally mandated statewide testing programs, school-based group testing, and when consent is clearly implied (e.g., college admissions testing). The principle of informed consent is so important that the Standards manual devotes a separate standard to it:

Informed consent implies that the test takers or representatives are made aware, in language that they can understand, of the reasons for testing, the type of tests to be used, the intended use and the range of material consequences of the intended use. If written, video, or audio records are made of the testing session, or other records are kept, test takers are entitled to know what testing information will be released and to whom.
(AERA et al., 1999)

Even young children or test takers with limited intelligence deserve an explanation of the reasons for
assessment. For example, the examiner might explain, “I’m going to ask you some questions and have you work on some puzzles so I can see what you can do and find out what things you need more help with.”

From a legal standpoint, the three elements of informed consent include disclosure, competency, and voluntariness (Melton, Petrila, Poythress, & Slobogin, 1998). The heart of disclosure is that the client receive sufficient information (e.g., about risks, benefits, release of reports) to make a thoughtful decision about continued participation in the testing. Competency refers to the mental capacity of the examinee to provide consent. In general, there is a presumption of competency unless the examinee is a child, very elderly, or has mental disabilities (e.g., has mental retardation). In these cases, a guardian will need to provide legal consent. Finally, the standard of voluntariness implies that the choice to undergo an assessment battery is given freely and not based on subtle coercion (e.g., inmates are promised release time if they participate in research testing). In most cases, the examiner uses a written informed consent form such as that found in Figure 1.5.

Figure 1.5 Abbreviated Example of Informed Consent for Psychological Assessment
Note: This form is illustrative only. Practitioners should consult legal counsel in regard to the details of an informed consent form.

INFORMED CONSENT FOR PSYCHOLOGICAL ASSESSMENT
This is an agreement between [Client’s Name] and Dr. [Practitioner’s Name], a licensed psychologist in the state of Illinois. You are encouraged to ask questions at any time about my training and background, and about the process of testing.
1. General Information: The purpose of this assessment is to provide you (and possibly others) with information about your psychological functioning that could prove helpful. The assessment will involve a brief interview and psychological testing. The entire process will take about three to four hours.
2. Specific Procedures: In addition to interview, the following tests will be administered: [List of tests and brief descriptions], e.g., MMPI-2, a 567-item true-false inventory of psychological functioning. WAIS-IV, a general test of adult intelligence in varied areas.
3. Test Report: The relevant information from the interview and the test results will be summarized in a written report. The results and the report will be reviewed with you in approximately one week. I will keep a copy of this report in a locked file for at least seven years.
4. Confidentiality: The report will not be released to any other source unless you sign a formal request. A few (remote) exceptions to the confidentiality guideline include situations of potential harm to self or others, abuse of children or elderly, or a court order to release the test results.
5. Cost: An hourly rate of $____ is used in determining the total fee. I will bill your insurance company, but you are responsible for the cost. The estimated total cost for your assessment is $____.
6. Side Effects: While most people find these tests and procedures to be interesting, some people experience anxiety when tested. Yet, it is unlikely that you will experience any long-term adverse effects from this assessment. You are encouraged to talk about the experience as we proceed.
7. Refusal of Assessment: Most people find the process of psychological assessment to be beneficial. However, you are not required to undergo this assessment. You can withdraw consent and discontinue at any time. On request, I will discuss referral options with you.
Client’s Signature ___________________________________  Date _________________

Obsolete Tests and the Standard of Care  Standard of care is a loose concept that often arises in the professional or legal review of specific health practices, including psychological testing. The prevailing standard of care is one that is “usual, customary or reasonable” (Rinas & Clyne-Jackson, 1988). To cite an extreme example, in medicine the standard of care for a fever might include the administration of aspirin—but would not include the antiquated practice of bleeding the patient.

Practitioners of psychological testing must be wary of obsolete tests, because their use might violate the prevailing standard of care. A case in point is the MMPI versus the MMPI-2. Even though the MMPI-2 is a relatively conservative revision of the highly esteemed MMPI, the improvements in norming and scale construction are substantial. The MMPI-2 is now the standard of care in MMPI-based assessment of psychopathology. Practitioners who continue to rely on the original MMPI could be liable for malpractice suits, especially if the test interpretation resulted in misleading interpretive statements or an incorrect diagnosis.

Another concern relevant to the standard of care is reliance on test results that are outdated for the current purpose. After all, individual characteristics and traits show valid change over time. A student who meets the criteria for learning disability (LD) in the fourth grade might show large gains in academic achievement, such that the LD diagnosis is no longer accurate in the fifth grade. Personality test results are especially prone to quixotic change. A short-term personal crisis might cause an MMPI-2 profile to look like a range of mountains. A week later, the test profile could be completely normal. It is difficult to provide comprehensive guidelines as to the “shelf life” of psychological test results. For example, GRE test scores that are years old still might be validly predictive of performance in graduate school, whereas Beck Depression Inventory test results from yesterday could mislead a therapist as to the current level of depression. Practitioners must evaluate the need for retesting on an individual basis.

Responsible Report Writing  Except for group testing, the practice of psychological testing invariably culminates in a written report that constitutes a semipermanent record of test findings and examiner recommendations. Effective report writing is an important skill because of the potential lasting impact of the written document. It is beyond the scope of this text to illuminate the qualities of effective report writing, although we can refer the reader to a few sources (Gregory, 1999; Tallent, 1993).

Responsible reports typically use simple and direct writing that steers clear of jargon and technical terms. The proper goal of a report is to provide helpful perspectives on the client, not to impress the referral source that the examiner is a learned person! When Tallent (1993) surveyed more than one thousand health practitioners who
made referrals for testing, one respondent declared his disdain toward psychologists who “reflect their needs to shine as a psychoanalytic beacon in revealing the dark, deep secrets they have observed.” On a related note, effective reports stay within the bounds of expertise of the examiner. For example:

It is never appropriate for a psychologist to recommend that a client undergo a specific medical procedure (such as a CT scan for an apparent brain tumor) or receive a particular drug (such as Prozac for depression). Even when the need for a special procedure seems obvious (e.g., the symptoms strongly attest to the rapid onset of a brain disease), the best way to meet the needs of the client is to recommend immediate consultation with the appropriate medical profession (e.g., neurology or psychiatry).
(Gregory, 1999)

Additional advice on effective report writing can be found in Ownby (1991) and Sattler (2001).

Communication of Test Results  Individuals who take psychological tests anticipate that the results will be shared with them. Yet practitioners often do not include one-to-one feedback as part of the assessment. A major reason for reluctance is a lack of training in how to provide feedback, especially when the test results appear to be negative. For example, how does a clinician tell a college student that her IQ is 93 when most students in that milieu score 115 or higher?

Providing effective and constructive feedback to clients about their test results is a challenging skill to learn. Pope (1992) emphasizes the responsibility of the clinician to determine that the client has understood adequately and accurately the information that the clinician was attempting to convey. Furthermore, it is the responsibility of the clinician to check for adverse reactions:

Is the client exceptionally depressed by the findings? Is the client inferring from findings suggesting a learning disorder that the client—as the client has always suspected—is “stupid”? Using scrupulous care to conduct this assessment of the client’s understanding of and reactions to the feedback is no less important than using adequate care in administering standardized psychological tests; test administration and feedback are equally important, fundamental aspects of the assessment process. (p. 271)

Proper and effective feedback involves give-and-take dialogue in which the clinician ascertains how the client has perceived the information and seeks to correct potentially harmful interpretations.

Destructive feedback often arises when the clinician fails to challenge a client’s incorrect perceptions about the meaning of test results. Consider IQ tests in particular—a case in which many persons deify test scores and consider them an index of personal worth. Prior to providing test results, a clinician is advised to investigate the client’s understanding of what IQ scores mean. After all, IQ is a limited slice of intellectual functioning: It does not evaluate drive or character of any kind, it is accurate only to about ±5 points, it may change over time, and it does not assess many important attributes such as creativity, social intelligence, musical ability, or athletic skill. But a client may have an unrealistic perspective about IQ and, hence, might jump to erroneous conclusions when hearing that her score is “only” 93. The careful practitioner will elicit the client’s views and challenge them when needed before proceeding. Further thoughts on feedback can be found in Pope (1992).

Going beyond the general pronouncement to avoid harm when providing test feedback, Finn and Tonsager (1997) present the intriguing view that information about test results should be directly and immediately therapeutic to individuals experiencing psychological problems. In other words, they propose that psychological assessment is a form of short-term intervention, not just a basis for gathering information that is later used for therapeutic purposes. In one study (Finn & Tonsager, 1992), they examined the effects of a brief psychological assessment on clients at a university counseling center. Thirty-two students took part in an initial interview, completed the MMPI-2, and then received a one-hour feedback session conducted according to a method developed by Finn (1996). A comparison group of 29 students was interviewed and received an equal amount of supportive, nondirective psychotherapy instead of the test feedback. The clients in the MMPI-2 assessment group showed a greater decline in symptomatic distress and a greater increase in self-esteem, immediately following their feedback session and also two weeks later, than the clients in the comparison group. The feedback group also felt more hopeful about their problems after the brief assessment. These findings illustrate the importance of providing thoughtful and constructive test feedback instead of rushing through a perfunctory review of the results.

Consideration of Individual Differences  Knowledge of and respect for individual differences is highlighted by all professional organizations that deal with psychological testing. The American Psychological Association lists this as one of six guiding principles:

Principle D: Respect for People’s Rights and Dignity … Psychologists are aware of cultural, individual, and role differences, including those due to age, gender, race, ethnicity, national origin, religion, sexual orientation, disability, language, and socio-economic status. Psychologists try to eliminate the effect on their work of biases based on those factors, and they do not knowingly participate in or condone unfair discriminatory practices.
(APA, 1992a)

The relevance of this principle to psychological testing is that practitioners are expected to know when a test or
interpretation may not be applicable because of factors such as age, gender, race, ethnicity, national origin, religion, sexual orientation, disability, language, and socioeconomic status. We can illustrate this point with a case study reported in Eyde et al. (1993). A psychologist evaluated a 75-year-old man at the request of his wife, who had noticed memory problems. The psychologist administered a mental status examination and a prominent intelligence test. Performance on the mental status examination was normal, but standard scores on the intelligence test revealed a large discrepancy between verbal subtests and subtests measuring spatial ability and processing speed. The psychologist interpreted this pattern as indicating a deterioration of intellectual functioning in the husband. Unfortunately, this interpretation was based on faulty use of non-age-corrected standard scores. Also, the psychologist did not assess for depression, which is known to cause visuospatial performance to drop sharply (Wolff & Gregory, 1992). In fact, a series of further evaluations revealed that the husband was a perfectly healthy 75-year-old man. The psychologist failed to consider the relevance of the gentleman’s age and emotional status when interpreting the intelligence test. This was a costly oversight that caused the client and his wife substantial unnecessary worry.

1.2.4: Testing of Cultural and Linguistic Minorities

Background and Historical Notes  Persons of ethnic minority descent (non-European origin) currently constitute about a third of the U.S. population, and it is estimated that they will comprise more than 50 percent within several decades. Yet the enterprise of testing is based almost entirely on the efforts of white psychologists who bring an Anglo-American viewpoint to their work. The suitability of existing tests for the evaluation of diverse populations cannot be taken for granted. The assessment of ethnic minority individuals raises important questions, especially when test results translate to placement decisions or other sensitive outcomes, as is commonly the case within educational institutions.

Unfortunately, the early pioneers in the testing movement largely ignored the impact of cultural background on test results. For example, in the 1920s Henry Goddard concluded that the intelligence of the average immigrant was alarmingly low, “perhaps of moron grade.” Yet he downplayed the likelihood that language and cultural differences could explain the low test scores of immigrants. Goddard’s role in the history of testing is discussed in the next chapter.

Perhaps as a rebound against these early methods, beginning in the 1930s psychologists displayed an increased sensitivity to cultural variables in the practice of testing. A shining example in this regard was Stanley Porteus, who undertook a wide-ranging investigation of the temperament and intelligence of Australian aboriginal peoples. Porteus (1931) used many traditional instruments (block designs, mazes, digit span), but to his credit he also devised an ecologically valid measure of intelligence for this group, namely, footprint recognition. Whereas the aboriginal examinees performed poorly on the Eurocentric tests, their ability to recognize photographed footprints was on a par with other racial groups studied. Even so, Porteus displayed an acute awareness that his procedures still might have handicapped the aboriginals:

The photograph of a footprint is not the same as the footprint itself, and quite probably a number of cues that are made use of by the aboriginal tracker are absent from a photograph. The varying depths of parts of the foot impression are not visible in the photograph, and the individual peculiarities other than general shape and size of the footprint may not be brought out clearly. Hence we must expect that the aboriginal subjects would be under some disadvantage in matching these photographs of footprints, as against recognition of the footprints themselves. (pp. 399–400)

In a similar vein, DuBois (1939) found that Pueblo Indian children displayed superior ability on his specially devised horse drawing test of mental ability, whereas they performed less well on the mainstream Goodenough (1926) Draw-A-Man test. From these early studies onward, psychologists have maintained a keen interest in the impact of language and culture on the meaning of test results.

The Impact of Cultural Background on Test Results  Practitioners need to appreciate that the cultural background of examinees will impact the entire process of assessment. For this reason, Sattler (1988) advises assessment psychologists to approach their task from a pluralistic standpoint:

Cultural groups may vary with respect to cultural values (stemming in part from cultural shock, discontinuity, or conflict); language and nuances in language style; views of life and death; roles of family members; problem-solving strategies; attitudes toward education, mental health, and mental illness; and stage of acculturation (the group may follow traditional values, accept the dominant group’s values, or be at some point between the two). You should adopt a frame of reference that will enable you to understand how particular behaviors make sense within each culture. (p. 505)

For example, it is often noted that Native Americans display a distinctive conception of time, emphasizing present-time as opposed to the future-time orientation that is so powerfully formative in white, middle-class America (Panigua, 1994). A possible implication of this cultural difference is that time limits might not mean the same thing for a Native American child as for a child from the mainstream
culture. Perhaps the minority child will disregard the subtest instructions and work at a careful, measured pace rather than seeking quick solutions. Of course, this child would then obtain a misleadingly low score on that measure.

While acknowledging the impact of cultural differences on testing, it is also important to avoid stereotypical overgeneralization. Culture is not monolithic. Every person is unique. Some Native Americans will exhibit a distinctive orientation to time but perhaps most will not. The challenge for the practitioner is to observe the clinical details of performance and to identify the culture-based nuances of behavior that help determine the test results.

An ingenious study by Moore (1986) powerfully illustrates the relevance of cultural background for understanding the test performance of ethnic minority examinees. She compared not only the intelligence test scores but also the qualitative manner of responding to test demands in two groups of adopted African American children. One group of 23 children had been transracially adopted into middle-class white families. The other group of 23 children had been intraracially adopted into middle-class African American families. All children were adopted prior to age 2 and the backgrounds of the adoptive families were similar in terms of education and social class. Thus, group difference in test scores and test behaviors could be attributed mainly to differences in cultural background arising from the fact that one group was adopted into African American families, the other adopted into white families. Testing and observations were completed by two female African American examiners who were “blind” to the purposes of the study. Tested at 7 to 10 years of age, the transracially adopted children scored an average IQ of 117 on the WISC compared to an average IQ of 104 for the traditionally adopted children. These IQ results were not remarkable, insofar as Scarr and Weinberg reported similar findings years before.

The surprising and informative outcome of the study was that the two groups of children showed very different qualitative behaviors during testing. As a group, the children with lower IQ scores (those adopted by African American families) were less likely to spontaneously elaborate on their work responses and more likely simply to refuse to respond when presented with a test demand. Moore (1986) offers the following interpretations:

Children’s tendency to spontaneously elaborate on their work responses may be a very important index of their level of involvement in task performance, strategies for problem solving, level of motivation to generate a correct response, and level of adjustment to the standardized test situation…. Although the terminal not-work response is treated as an incorrect response, it does not actually provide any empirical documentation of what the child does or does not know or of what the child can and cannot do. The only information available is that the child did not respond to the demand. (p. 322)

The essential lesson of this study is that culturally based differences in response style may function to conceal the underlying competence of some examinees. Cautious interpretation of test results is always advisable, but this is especially important for examinees from culturally or linguistically diverse backgrounds.

The influence of cultural factors is not limited to the test performance of children but extends to adults as well. Terrell, Terrell, and Taylor (1981) investigated the effects of racial trust/mistrust on the intelligence test scores of African American college students. They identified African American students with high and low levels of mistrust of whites. Using a 2 × 2 design, half of each group was then administered an individual intelligence test by a white examiner, the other half by an African American examiner. As predicted, the analysis of variance revealed no differences for the main effects of race of examiner (white versus African American) or level of mistrust (high versus low) (Figure 1.6). But a substantial interaction was revealed; namely, the high-mistrust group with an African American examiner scored much better than the high-mistrust group with a white examiner (average IQs of 96 versus 86, respectively). Put simply, cultural mistrust among African Americans was associated with significantly lower IQ scores, but only when the examiner was white.

Further illustrating cultural influences, Steele (1997) has proposed a theory that societal stereotypes about groups influence the immediate intellectual performance and also the long-term identity development of individual group members. He has applied this theory both to women—when stereotypes affect their achievement in math and sciences—and to African Americans—when stereotypes apparently depress their performance on standardized tests. Here we discuss his research on stereotype threat with African American college students (Steele & Aronson, 1995).

Figure 1.6 Mean IQ Scores of African American Students as a Function of Race of Examiner and Cultural Mistrust
Source: Based on data in Terrell, F., Terrell, S., & Taylor, J. (1981). Effects of race of examiner and cultural mistrust on the WAIS performance of Black students. Journal of Consulting and Clinical Psychology, 49, 750–751.
[Graph: Mean IQ Score (vertical axis, 80 to 100) by Race of Examiner (horizontal axis: African American, White), plotted separately for the Low Mistrust (o) and High Mistrust (x) groups.]

The idea of stereotype threat is essentially a sophisticated version of a self-fulfilling prophecy. The researchers define stereotype threat as the threat of confirming, as
self-characteristic, a negative stereotype about one’s group. For example, based on published data and media coverage about race and IQ scores, African Americans are stereotyped as possessing less intellectual ability than others. As a consequence, whenever they encounter tests of intelligence or academic achievement, individuals from this group may perceive a risk that they will confirm the stereotype. In the short run, stereotype threat is hypothesized to depress test performance through heightened anxiety and other mechanisms. In the long run, it may have the further impact of pressuring African American students to “protectively disidentify” with achievement in school and related intellectual domains.

Steele and Aronson (1995) conducted a series of four studies to evaluate the hypothesis of stereotype threat. All the investigations supported the hypothesis. We focus here on the first study, in which African American and white college students were given a 30-minute test composed of challenging items from the verbal Graduate Record Examination. Students from both racial groups were randomly assigned to one of three test conditions: stereotype-threat, in which the test was described as diagnostic of individual verbal ability; control, in which the test was described as a research tool only; and control-challenge, in which the test was described as a research tool only but participants were exhorted to “take this challenge seriously.” Scores on the verbal test were adjusted (covariate analysis) on the basis of prior achievement scores so as to eliminate the effects of preexisting differences between groups.

Race differences were small and nonsignificant in the control and control-challenge conditions, whereas African Americans scored much lower than whites in the stereotype-threat condition (Figure 1.7). In other studies, Steele and Aronson (1995) investigated the mechanism of mediation by which stereotype threat caused African Americans to score lower on standardized tests. The details are beyond the scope of this text, but the overall conclusion is not:

Our best assessment is that stereotype threat caused an inefficiency of processing much like that caused by other evaluative pressures. Stereotype-threatened participants spent more time doing fewer items more inaccurately—probably as a result of alternating their attention between trying to answer the items and trying to assess the self-significance of their frustration.
(Steele & Aronson, 1995, p. 809)

Figure 1.7 Average Verbal Items Correct for Whites and African Americans under Three Conditions
Source: Based on data in Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797–811.
[Graph: Average Test Performance (vertical axis, 5 to 15) across three conditions (horizontal axis: Stereotype-Threat, Control Only, Control-Challenge), plotted separately for Whites (x) and African Americans (o).]

In sum, the authors propose a social-psychological perspective on the meaning of lower test scores in African Americans and perhaps other stereotype-threatened groups as well. Their viewpoint emphasizes that test results do not reside within individuals. Test scores occur within a complex social-psychological field that is potentially influenced by national history, predicaments of race, and many other subtle factors.

1.2.5: Unintended Effects of High-Stakes Testing

The prevailing view in the general public is that cheating rarely or never occurs in nationally administered testing programs. We tend to think that the risks are too high and the opportunities too limited for cheaters to prevail. Therefore, we rest assured that test fraud must be a rare event. Unfortunately, this view is probably naive. After all, a growing number of people must pass a test to gain college entry, get a job, or obtain a promotion. Furthermore, school officials increasingly are evaluated on the basis of average test scores in their district. Precisely because the stakes are so high, unscrupulous individuals will try to beat the system.

Widespread cheating in public school systems is sporadically reported in many large cities across the United States. In most cases, the cheating is motivated by the desire of teachers and principals to further their own careers by creating the illusion of educational excellence. For example, in 1999, dozens of teachers and two principals in the New York City public school system were charged with helping students cheat on the standardized reading and math tests used to rank schools and determine whether students move on to the next grade (New York Times, December 12, 1999). The cheating scheme was described as “one of the largest in the recent history of American public schools.” In 2000, an entire eighth-grade class in a Chicago elementary school was required to retake the Iowa Tests of Basic Skills (ITBS) because a school administrator allegedly filled in incomplete tests and changed incorrect answers to correct ones (Chicago Tribune, June 2, 2000). Officials were tipped off to the fraud because the test scores were simply too good to be true—the average score for the class was two years above their standing. In 2005, the Dallas Morning News reported strong evidence of
“organized, educator-led cheating” in dozens of schools on the statewide achievement test and found suspicious scores in hundreds more (www.dallasnews.com, March 21, 2005). Disturbingly, one assessment expert noted, “You’re catching the dumb cheaters. The smart cheaters you’re not going to be able to detect.” We only read about the cases of cheating that are detected. The number of undetected cases is simply unknown, although probably larger than the public would like to believe.
Cheating in public school systems is not a thing of the past. It continues unabated, year after year. In 2011, a decade-long cheating scandal was revealed in the Atlanta, Georgia, public school system (Atlanta Journal-Constitution, July 6, 2011). Teachers and principals routinely changed students’ answer sheets to produce higher scores. The school system scores soared dramatically, bringing national acclaim to the district and the superintendent. But it was all based on fraud perpetrated by 178 educators, including 38 principals. Cheating was confirmed in 44 of 56 schools examined. In 2011, six charter schools in Los Angeles were threatened with closure when it was discovered that the founding director had ordered principals to open the state standardized tests and train students on actual test questions (Los Angeles Times, June 22, 2011). Suspiciously, scores for the schools had vaulted upward in recent years. The director and the six principals were terminated.
An especially flagrant instance of cheating on national tests was uncovered in Louisiana in 1997. This case involved wholesale circulation of the Educational Testing Service (ETS) exam administered to teachers who want to be school principals. As reported in the New York Times (September 28, 1997), copies of the 145-item test, along with correct answers, had circulated among teachers throughout southern Louisiana, most likely for several years. In a state ranked at or near the bottom on nearly every educational index, it appears that many potentially unqualified persons cheated their way into running the schools. ETS handled this case quietly by asking more than 200 teachers to retake the test so as to “confirm” their initial scores. Unfortunately, the Louisiana case was not an isolated instance. In another case, ETS allegedly failed to monitor its handling of the federal government’s test for immigrants who want to become citizens, with the likely result that test supervisors accepted bribes. English-proficiency tests for foreign students also were vulnerable to cheating. In 1994, ETS canceled the scores of 30,000 students from China after discovering a ring that was selling the examinations abroad. Cizek (1999) catalogues literally dozens of ingenious ways that students have developed for cheating on tests: writing information on the floor, in tissues, on the back of a bottled water label; using an ultraviolet pen to write information on “blank” paper; and using a video transmitter (e.g., hidden in an eyeglass case) to send pictures of the test to an outside accomplice who then coaches the student by means of an audio receiver (e.g., hidden in the ear).
Stories about miniature transmitters are not fanciful. Consider the following story reported from a monolithic culture where test results literally make or break a child’s future. In China, 10 million 18-year-olds take a two-day exam each year that determines whether they will be allowed to attend public universities. Success or failure drastically impacts their lives and those of their families who might depend on their future income. In 2009, eight parents were jailed for up to three years after it was determined that they were transmitting stolen test answers to their children through miniature earpieces. The subterfuge was discovered when police detected unusual radio signals near the school (www.guardian.co.uk, April 3, 2009).
In 2012, cheating was brought to light on the board certification test for radiology (CNN, Prescription for Cheating, January 13, 2012). For years, doctors around the country have helped one another cheat by each memorizing one or two test questions verbatim, writing down the questions after taking the test, and circulating the ever-expanding list of questions (dubbed “recalls”) to cooperating programs. The practice is so widespread and considered so egregious that the American Board of Radiology released a sternly worded video condemning the use of recalls as unethical. CNN found at least 15 years’ worth of test questions (with answers) on a website for residents in radiology.
Recently, efforts to circumvent exam security have become even more brazen, with some test preparation companies encouraging students to steal copies of college entrance exams such as the Scholastic Assessment Tests (SAT) (Los Angeles Times, October 12, 2005). Fortunately, the publisher of the SAT was granted a restraining order in federal court, prohibiting individuals or companies from soliciting stolen copies of the test. Even so, this episode illustrates once again that high-stakes testing has had a corrupting influence on the testing process.
Dishonest and inappropriate practices by school officials are implicated in the recent inflation of scores on nationally normed group tests of achievement. By definition, for a norm-referenced test, 50 percent of the examinees should score above the 50th percentile, 50 percent below. If the same test is used in a large sample of typical and representative school systems, average scores for the school systems should be split evenly: about half above the nationally normed 50th percentile, half below.
According to a survey reported in the news media (Foster, 1990), virtually all states of the union claim that average achievement scores for their school systems exceed the 50th percentile. The resulting overly optimistic picture of student achievement is labeled the Lake Wobegon Effect, in reference to humorist Garrison Keillor’s mythical Minnesota town where “all the children are above average.”
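The arithmetic behind the Lake Wobegon Effect can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not an analysis of any real testing program: the function names, the number of school systems, and the standardized score scale (mean 0, standard deviation 1) are all hypothetical. It assumes every school system is "typical and representative," meaning that its students are drawn from the same national norm and no scores are inflated.

```python
import random
import statistics


def simulate_system_means(n_systems=500, students_per_system=100, seed=0):
    """Return the average score of each simulated school system.

    Assumption (hypothetical): every student's score comes from the same
    standardized national norm (mean 0, SD 1), so no system is inflated.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_systems):
        scores = [rng.gauss(0.0, 1.0) for _ in range(students_per_system)]
        means.append(statistics.fmean(scores))
    return means


def share_above_norm(means, national_median=0.0):
    """Fraction of systems whose average exceeds the national 50th percentile."""
    return sum(m > national_median for m in means) / len(means)


if __name__ == "__main__":
    share = share_above_norm(simulate_system_means())
    print(f"Share of systems above the 50th percentile: {share:.2f}")
```

Under these assumptions the share of systems above the national median hovers near one half. A report in which virtually every system exceeds the 50th percentile is therefore inconsistent with honest, representative norming.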

How does inflation of achievement test scores arise? According to Cannell (1988), the major cause is educational administrators who are desperate to demonstrate the excellence of their school systems. Precisely because our society attaches so much importance to achievement test results, some educators apparently help students cheat on standardized tests. The alleged cheating includes the following:

• Teachers and principals coach students on test answers.
• Examiners give more than the allotted time to take tests.
• Administrators alter answer sheets.
• Teachers teach directly to the specific test items.
• Teachers make copies of the tests to give to their students.

In sum, the importance that our society attaches to achievement test scores has caused a number of unappealing side effects that undermine the very foundations of nationally normed group-testing programs.
Moore (1994) reports on a special case in educational testing, namely, the districtwide consequences of court-ordered achievement testing. He surveyed 79 teachers from third- through fifth-grade level in a midwestern town in which the court required the use of a standardized test to determine the effectiveness of a desegregation effort. The test in question, the Iowa Tests of Basic Skills (ITBS), is a well-respected group achievement test that requires strict adherence to instructions and time limits for obtaining valid results. Yet the teachers found little value in the testing program, complaining that its benefits did not offset the time and costs involved. As a consequence of their devaluing the effort, nonstandard testing was practically the rule rather than the exception. The teachers engaged in several nonstandard practices, most of which tended to inflate the test scores. Inappropriate testing practices included praising students who answered a question correctly during the test (67 percent), using last year’s test questions for practice (44 percent), recoding a student’s answer sheet because he or she just “miscoded” the answer (26 percent), giving students as much time as they needed (24 percent), giving students items that were directly off the test (24 percent), and giving hints or clues during the test (23 percent). In general, Moore (1994) notes that teachers modified their instructional efforts and curriculum in anticipation of having their students take the test. More than 90 percent of the teachers added test-related lessons to the curriculum, and more than 70 percent eliminated topics so that they could spend more time on test-related skills.
What this study demonstrates is that mandated educational testing can have the unanticipated consequence of polluting the validity of a worthy test, especially when crucial stakeholders have no voice in the process.
Further, in teaching to the tests, educators may emphasize bits and pieces of factual knowledge rather than imparting a general ability to think clearly and solve problems. In conclusion, it appears that an excessive emphasis on nationally normed achievement tests for selection and evaluation promotes inappropriate behavior, including outright fraud and cheating on the part of students and school officials. Just how widespread is the problem? Although we live with the optimistic assumption that fraud in nationally normed testing programs is rare, the disturbing truth is that we really don’t know how often this occurs.

1.2.6: Reprise: Responsible Test Use
We return now to the real-life quandaries of testing mentioned at the beginning of the topic. The reader will recall that the first quandary had to do with whether a consulting psychologist responsibly could refuse to provide feedback to police officer candidates referred for preemployment screening. Surprisingly, the answer to this query is “Yes.” Under normal circumstances, a practitioner must explain assessment results to the client. But there are exceptions, as explained by Principle 9.10 of the APA Ethical Code:

Psychologists take reasonable steps to ensure that explanations of results are given to the individual or designated representative unless the nature of the relationship precludes provision of an explanation of results (such as in some organizational consulting, preemployment or security screenings, and forensic evaluations), and this fact has been clearly explained to the person being assessed in advance.

The second quandary concerned a counselor who continued to use the MMPI even though the MMPI-2 has been available for several years. Is the counselor’s refusal to use the MMPI-2 a breach of professional standards? The answer to this query is probably “Yes.” The MMPI-2 is well validated and constitutes a significant improvement upon the MMPI. As mentioned previously, the MMPI-2 is now the standard of care in MMPI-based assessment of psychopathology. The counselor who continued to rely on the original MMPI could be liable for malpractice suits, especially if his test interpretations resulted in misleading interpretive statements or a false diagnosis.
The third predicament involved the use of a neighborhood friend as translator in the administration of the WISC-IV to a 9-year-old boy whose first language was Spanish. This is usually a mistake as it sacrifices strict control of the testing material. The examiner was not bilingual and, therefore, he would have no way of knowing whether the translator was remaining faithful to the original text or was possibly supplying additional cues. In an ideal world, the

proper procedure would be to enlist a Spanish-speaking examiner who would use a test formally translated and also standardized with Hispanic examinees. For example, the Escala de Inteligencia Wechsler Para Niños-Revisada de Puerto Rico (EIWN-R PR) would be a good choice.
The final quandary concerned the client who informed a psychologist that her recently deceased brother was most likely a pedophile. Is the psychologist obligated to report this case to law enforcement? The answer to this query is probably “Yes,” but it may depend on the jurisdiction of the psychologist and the wording of the relevant statutes. In fact, the psychologist did report the case to authorities with unexpected consequences. Police obtained a search warrant, went to the home of the client’s mother (where the brother had lived), and ransacked the brother’s bedroom. The mother was traumatized by the unexpected visit from the police and blamed the fiasco on her daughter. A bitter estrangement followed, and the client then sued the psychologist for violation of confidentiality!

Chapter Quiz: Applications and Consequences of Psychological Testing
Chapter 2
The History of
Psychological Testing
Learning Objectives
2.1 Discuss the origins of psychological testing over the ages
2.2 Review the history of testing from the early 1900s until now

2.1: The Origins of Psychological Testing

2.1 Discuss the origins of psychological testing over the ages

The history of psychological testing is a fascinating story and has abundant relevance to present-day practices. After all, contemporary tests did not spring from a vacuum; they evolved slowly from a host of precursors introduced over the last 100 years. Accordingly, Chapter 2 features a review of the historical roots of present-day psychological tests. In Module 2.1, The Origins of Psychological Testing, we focus largely on the efforts of European psychologists to measure intelligence during the late nineteenth century and pre–World War I era. These early intelligence tests and their successors often exerted powerful effects on the examinees who took them, so the first topic also documents the historical impact of psychological test results. Module 2.2, Testing from the Early 1900s to the Present, catalogues the profusion of tests developed by American psychologists in the first half of the twentieth century.
Psychological testing in its modern form originated little more than 100 years ago in laboratory studies of sensory discrimination, motor skills, and reaction time. The British genius Francis Galton (1822–1911) invented the first battery of tests, a peculiar assortment of sensory and motor measures, which we review below. The American psychologist James McKeen Cattell (1860–1944) studied with Galton and then, in 1890, proclaimed the modern testing agenda in his classic paper entitled “Mental Tests and Measurements.” He was tentative and modest when describing the purposes and applications of his instruments:

Psychology cannot attain the certainty and exactness of the physical sciences, unless it rests on a foundation of experiment and measurement. A step in this direction could be made by applying a series of mental tests and measurements to a large number of individuals. The results would be of considerable scientific value in discovering the constancy of mental processes, their interdependence, and their variation under different circumstances. Individuals, besides, would find their tests interesting, and, perhaps, useful in regard to training, mode of life or indication of disease. The scientific and practical value of such tests would be much increased should a uniform system be adopted, so that determinations made at different times and places could be compared and combined. (Cattell, 1890)

Cattell’s conjecture that “perhaps” tests would be useful in “training, mode of life or indication of disease” must certainly rank as one of the prophetic understatements of all time. Anyone reared in the Western world knows that psychological testing has emerged from its timid beginnings to become a big business and a cultural institution that permeates modern society.
As we shall see, the importance of testing is evident from historical review. Students of psychology generally regard historical issues as dull, dry, and pedantic, and sometimes these prejudices are well deserved. After all, many textbooks fail to explain the relevance of historical matters and provide only vague sketches of early developments in mental testing. As a result, students of psychology often conclude incorrectly that historical issues are boring and irrelevant.
In reality, the history of psychological testing is a captivating story that has substantial relevance to


present-day practices. In later chapters, we examine the principles of psychological testing, investigate applications in specific fields (e.g., personality, intelligence, neuropsychology), and reflect on the social and legal consequences of testing. However, the reader will find these topics more comprehensible when viewed in historical context. So, for now, we begin at the beginning by reviewing rudimentary forms of testing that existed over 4,000 years ago in imperial China.

2.1.1: Rudimentary Forms of Testing in China in 2200 b.c.e.
Although the widespread use of psychological testing is largely a phenomenon of the twentieth century, historians note that rudimentary forms of testing date back to at least 2200 b.c. when the Chinese emperor had his officials examined every third year to determine their fitness for office (Bowman, 1989; Chaffee, 1985; Franke, 1963; Teng, 1942–43). Such testing was modified and refined over the centuries until written exams were introduced in the Han dynasty (202 b.c.–a.d. 220). Five topics were tested: civil law, military affairs, agriculture, revenue, and geography.
The Chinese examination system took its final form around 1370 when proficiency in the Confucian classics was emphasized. In the preliminary examination, candidates were required to spend a day and a night in a small isolated booth, composing essays on assigned topics and writing a poem. The 1 to 7 percent who passed moved up to the district examinations, which required three separate sessions of three days and three nights.
The district examinations were obviously grueling and rigorous, but this was not the final level. The 1 to 10 percent who passed were allowed the privilege of going to Peking for the final round of examinations. Perhaps 3 percent of this final group passed and became mandarins, eligible for public office.
Although the Chinese developed the external trappings of a comprehensive civil service examination program, the similarities between their traditions and current testing practices are, in the main, superficial. Not only were their testing practices unnecessarily grueling, but the Chinese also failed to validate their selection procedures. Nonetheless, it does appear that the examination program incorporated relevant selection criteria. For example, in the written exams beauty of penmanship was weighted very heavily. Given the highly stylistic features of Chinese written forms, good penmanship was no doubt essential for clear, exact communication. Thus, penmanship was probably a relevant predictor of suitability for civil service employment. In response to widespread discontent, the examination system was abolished by royal decree in 1906 (Franke, 1963).

2.1.2: Physiognomy, Phrenology, and The Psychograph
Physiognomy is based on the notion that we can judge the inner character of people from their outward appearance, especially the face. Albeit misguided and now largely discredited, physiognomy represents an early form of psychological testing. Hence, we provide a primer on the topic, including its more recent cousin, phrenology.
Interest in physiognomy can be dated to the fourth century b.c., when the Greek philosopher Aristotle (384–322 b.c.) published a short treatise based on the premise that the soul and the body “sympathize” with each other. Essentially, Aristotle argued that changes in a person’s soul (inner character) could impact the appearance of the body, and vice versa. The relationship between the two allowed the astute observer to infer personality characteristics from individual appearance. Aristotle catalogued a vast array of traits that could be discerned from features of hair, forehead, eyebrows, eyes, nose, lips, and so on. Here are some examples:

Hair that hangs down without curling, if it be of a fair complexion, thin, and soft withal, signifies a man to be naturally fainthearted, and of a weak body but of a quiet and harmless disposition. Hair that is big, and thick, and short withal, denotes a man to be of a strong constitution, secure, and deceitful, and for the most part unquiet, and vain, lusting after beauty, and more foolish than wise, though fortune may favor him. (Aristotle, Of Physiognomy, www.exclassics.com/arist/arist63.htm)

Many classical Latin authors also wrote about physiognomy, including Juvenal, Suetonius, and Pliny the Elder. But it was not until centuries later that physiognomy began to flourish when a Swiss theologian penned a popular bestseller on the topic.
Johann Lavater (1741–1801) published his Essays on Physiognomy in Germany in the late eighteenth century. English and French translations followed shortly and sales exploded in Western Europe and the United States. Eventually, more than 150 editions of the text were published (Graham, 1961). Lavater’s book contained hundreds of meticulous drawings depicting his principles of physiognomy by which character could be judged from details of facial appearance. Lukasik (2004) describes the allure of this approach:

Since Lavaterian physiognomy read moral character from unalterable and involuntary facial features, it created a visual system for discerning a person’s permanent moral character despite his or her social masks. Readers of the 1817 Pocket Lavater, for instance, learned how to look at the features of various white male faces in order to discriminate “the physiognomy of … a man of business” from that of “a rogue.” (p. 1)

Physiognomy remained popular for centuries and laid the foundation for the more specialized form of quackery known as phrenology, the reading of “bumps” on the head.
The founding of phrenology is usually attributed to the German physician Franz Joseph Gall (1758–1828). His “science” actually was based on a veneer of plausibility. In his major work, The Anatomy and Physiology of the Nervous System in General, and of the Brain in Particular (1810), Gall argued that the brain is the organ of sentiments and faculties and that these capacities are localized. Furthermore, he reasoned, to the extent that a specific faculty was well developed, the corresponding component of the brain would be enlarged. In turn, because the skull conforms to the shape of the brain, a cranial “bump” would signify an enlargement of the underlying faculty. These plausible (but incorrect) assumptions allowed Gall and his followers to decide if an individual was amorous, secretive, hopeful, combative, benevolent, self-confident, happy, imitative; in all, dozens of traits were discerned from cranial bumps.
Johann Spurzheim (1776–1832), a disciple of Gall, popularized phrenology and disseminated it to the United States and Great Britain, where it became enormously popular. In fact, a few entrepreneurs developed automated devices to measure the bumps with precision. In 1931, after decades of tinkering, Henry C. Lavery, a self-proclaimed genius and ardent believer in phrenology, spent a small fortune developing his machine known as the psychograph (McCoy, 2000). It consisted of hundreds of moving parts assembled in a large helmet-like device fitted over the examinee’s head. Each of 32 mental faculties was rated 1 through 5 (“deficient” to “very superior”) based on the way that probes made contact with the head. A belt-driven motor stamped out statements for each of the 32 faculties, providing one of the first automated personality descriptions. Initially, the psychograph was a spectacular success, and its promoters earned small fortunes. But by the mid-1930s public skepticism held sway, and the company that manufactured the instrument went out of business (McCoy, 2000).

2.1.3: The Brass Instruments Era of Testing
Experimental psychology flourished in the late 1800s in continental Europe and Great Britain. For the first time in history, psychologists departed from the wholly subjective and introspective methods that had been so fruitlessly pursued in the preceding centuries. Human abilities were instead tested in laboratories. Researchers used objective procedures that were capable of replication. Gone were the days when rival laboratories would have raging arguments about “imageless thought,” one group saying it existed, another group saying that such a mental event was impossible.
Even though the new emphasis on objective methods and measurable quantities was a vast improvement over the largely sterile mentalism that preceded it, the new experimental psychology was itself a dead end, at least as far as psychological testing was concerned. The problem was that the early experimental psychologists mistook simple sensory processes for intelligence. They used assorted brass instruments to measure sensory thresholds and reaction times, thinking that such abilities were at the heart of intelligence. Hence, this period is sometimes referred to as the Brass Instruments era of psychological testing.
In spite of the false start made by early experimentalists, at least they provided psychology with an appropriate methodology. Such pioneers as Wundt, Galton, Cattell, and Wissler showed that it was possible to expose the mind to scientific scrutiny and measurement. This was a fateful change in the axiomatic assumptions of psychology, a change that has stayed with us to the current day.
Most sources credit Wilhelm Wundt (1832–1920) with founding the first psychological laboratory in 1879 in Leipzig, Germany. It is less well recognized that he was measuring mental processes years before, at least as early as 1862, when he experimented with his thought meter (Diamond, 1980). This device was a calibrated pendulum with needles sticking out from each side. The pendulum would swing back and forth, striking bells with the needles. The observer’s task was to take note of the position of the pendulum when the bells sounded. Of course, Wundt could adjust the needles beforehand and thereby know the precise position of the pendulum when each bell was struck. Wundt thought that the difference between the observed pendulum position and the actual position would provide a means of determining the swiftness of thought of the observer.
Wundt’s analysis was relevant to a long-standing problem in astronomy. The problem was that two or more astronomers simultaneously using the same telescope (with multiple eyepieces) would report different crossing times as the stars moved across a grid line on the telescope. Even in Wundt’s time, it was a well-known event in the history of science that Kinnebrook, an assistant at the Royal Observatory in England, had been dismissed in 1796 because his stellar crossing times were nearly a full second too slow (Boring, 1950). Wundt’s analysis offered another explanation that did not assume incompetence on the part of anyone. Put simply, Wundt believed that the speed of thought might differ from one person to the next:

For each person there must be a certain speed of thinking, which he can never exceed with his given mental constitution. But just as one steam engine can go faster than another, so this speed of thought will probably not be the same in all persons. (Wundt, 1862, as translated in Rieber, 1980)

This analysis of telescope reporting times seems simplistic by present-day standards and overlooks the possible contribution of such factors as attention, motivation, and self-correcting feedback from prior trials. On the positive side, this was at least an empirical analysis that sought to explain individual differences instead of trying to explain them away. And that is the relevance to current practices in psychological testing. However crudely, Wundt measured mental processes and begrudgingly acknowledged individual differences. This emphasis on individual differences was rare for Wundt. He is more renowned for proposing common laws of thought for the average adult mind.

Galton and the First Battery of Mental Tests
Sir Francis Galton (1822–1911) pioneered the new experimental psychology in nineteenth-century Great Britain. Galton was obsessed with measurement, and his intellectual career seems to have been dominated by a belief that virtually anything was measurable. His attempts to measure intellect by means of reaction time and sensory discrimination tasks are well known. Yet, to appreciate his wide-ranging interests, the reader should be apprised that Galton also devised techniques for measuring beauty, personality, the boringness of lectures, and the efficacy of prayer, to name but a few of the endeavors that his biographer has catalogued in elaborate detail (Pearson 1914, 1924, 1930a,b).
Galton was a genius who was more interested in the problems of human evolution than in psychology per se (Boring, 1950). His two most influential works were Hereditary Genius (1869), an empirical analysis purporting to prove that genetic factors were overwhelmingly important for the attainment of eminence, and Inquiries into Human Faculty and Its Development (1883), a disparate series of essays that emphasized individual differences in mental faculties.
Boring (1950) regards Inquiries as the beginning of the mental test movement and the advent of the scientific psychology of individual differences. The book is a curious mixture of empirical research and speculative essays on topics as diverse as “just perceptible differences” in lifted weight and diminished fertility among inbred animals. There is, nonetheless, a common theme uniting these diverse essays; Galton demonstrates time and again that individual differences not only exist but also are objectively measurable.
Galton borrowed the time-consuming psychophysical procedures practiced by Wundt and others on the European continent and adapted them to a series of simple and quick sensorimotor measures. Thus, he continued the tradition of brass instruments mental testing but with an important difference: his procedures were much more amenable to the timely collection of data from hundreds if not thousands of subjects. Because of his efforts in devising practicable measures of individual differences, historians of psychological testing usually regard Galton as the father of mental testing (Goodenough, 1949; Boring, 1950).
To further his study of individual differences, Galton set up a psychometric laboratory in London at the International Health Exhibition in 1884. It was later transferred to the London Museum, where it was maintained for six years. Various anthropometric and psychometric measures were arranged on a long table at one side of a narrow room. Subjects were admitted at one end for threepence and given successive tests as they moved down the table. At least 17,000 individuals were tested during the 1880s and 1890s. About 7,500 of the individual data records have survived to the present day (Johnson et al., 1985).
The tests and measures involved both the physical and behavioral domains. Physical characteristics assessed were height, weight, head length, head breadth, arm span, length of middle finger, and length of lower arm, among others. The behavioral tests included strength of hand squeeze determined by dynamometer, vital capacity of the lungs measured by spirometer, visual acuity, highest audible tone, speed of blow, and reaction time (RT) to both visual and auditory stimuli.
Ultimately, Galton’s simplistic attempts to gauge intellect with measures of reaction time and sensory discrimination proved fruitless. Nonetheless, he did provide a tremendous impetus to the testing movement by demonstrating that objective tests could be devised and that meaningful scores could be obtained through standardized procedures.

Cattell Imports Brass Instruments to the United States
James McKeen Cattell (1860–1944) studied the new experimental psychology with both Wundt and Galton before settling at Columbia University where, for 26 years, he was the undisputed dean of American psychology. With Wundt, he did a series of painstakingly elaborate RT studies (1880–1882), measuring with great precision the fractions of a second presumably required for different mental reactions. He also noted, almost in passing, that he and another colleague had small but consistent differences in RT. Cattell proposed to Wundt that such individual differences ought to be studied systematically. Although Wundt acknowledged individual differences, he was philosophically more inclined to study general features of the mind, and he offered no support for Cattell’s proposal (Fancher, 1985).
But Cattell received enthusiastic support for his study of individual differences from Galton, who had just opened his psychometric laboratory in London. After corresponding with Galton for a few years, Cattell arranged for a two-year fellowship at Cambridge so that he could continue the study of individual differences. Cattell opened his own

research laboratory and developed a series of tests that were mainly extensions and additions to Galton’s battery.

Cattell (1890) invented the term mental test in his famous paper entitled “Mental Tests and Measurements.” This paper described his research program, detailing 10 mental tests he proposed for use with the general public. These tests were clearly a reworking and embellishment of the Galtonian tradition:

Strength of hand squeeze seems a curious addition to a battery of mental tests, a point that Cattell (1890) addressed directly in his paper. He was of the opinion that it was impossible to separate bodily energy from mental energy. Thus, in Cattell’s view, an ostensibly physiological measure such as dynamometer pressure was an index of one’s mental power as well. Clearly, the physiological and sensory bias of the entire test battery reflects its strongly Galtonian heritage (Fancher, 1985).

In 1891, Cattell accepted a position at Columbia University, at that time the largest university in the United States. His subsequent influence on American psychology was far in excess of his individual scientific output and was expressed in large part through his numerous and influential students (Boring, 1950). Among his many famous doctoral students and the years of their degrees were E. L. Thorndike (1898), who made monumental contributions to learning theory and educational psychology; R. S. Woodworth (1899), who was to author the very popular and influential Experimental Psychology (1938); and E. K. Strong (1911), whose Vocational Interest Blank (since revised) is still in wide use. But among Cattell’s students, it was probably Clark Wissler (1901) who had the greatest influence on the early history of psychological testing.

Wissler obtained both mental test scores and academic grades from more than 300 students at Columbia University and Barnard College. His goal was to demonstrate that the test results could predict academic performance. With our early twenty-first-century perspective on research and testing, it seems amazing that the early experimentalists waited so long to do such basic validational research. Wissler’s (1901) results showed virtually no tendency for the mental test scores to correlate with academic achievement. For example, class standing correlated .16 with memory for number lists, −.08 with dynamometer strength, .02 with color naming, and −.02 with reaction time. The highest correlation (.16) was statistically significant because of the large sample size. However, so humble a correlation carries with it very little predictive utility.1

1 By way of quick preview, correlations can range from −1.0 to +1.0. Values near zero indicate a weak, negligible linear relationship between the two variables. For example, correlations between −.20 and +.20 are generally of minimal value for purposes of individual prediction. Note also that negative correlations indicate an inverse relationship.

Also damaging to the brass instruments testing movement were the very modest correlations between the mental tests themselves. For example, color naming and hand movement speed correlated only .19, while RT and color naming correlated −.15. Several physical measures such as head size (a holdover measure from the Galton era) were, not surprisingly, also uncorrelated with the various sensory and RT measures.

With the publication of Wissler’s (1901) discouraging results, experimental psychologists largely abandoned the use of RT and sensory discrimination as measures of intelligence. This turning away from the brass instruments approach was a desirable development in the history of psychological testing. The way was thereby paved for immediate acceptance of Alfred Binet’s more sensible and useful measures of higher mental processes.

A common reaction among psychologists in the early 1900s was to begrudgingly conclude that Galton had been wrong in attempting to infer complex abilities from simple ones. Goodenough (1949) has likened Galton’s approach to “inferring the nature of genius from the nature of stupidity or the qualities of water from those of the hydrogen and oxygen of which it is composed.” The academic psychologists apparently agreed with her, and American attempts to develop intelligence tests virtually ceased at the turn of the twentieth century. For his own part, Wissler was apparently so discouraged by his results that he immediately switched to anthropology, where he became a strong environmentalist in explaining differences between ethnic groups.

The void created by the abandonment of the Galtonian tradition did not last for long. In Europe, Alfred Binet was on the verge of a major breakthrough in intelligence

testing. Binet introduced his scale of intelligence in 1905, and shortly thereafter H. H. Goddard imported it to the United States, where it was applied in a manner that Gould (1981) has described as “the dismantling of Binet’s intentions in America.” Whether early twentieth-century American psychologists subverted Binet’s intentions is an important question that we review in the next topic. First we turn to a more general topic, the rise of rating scales in the history of psychology.

2.1.4: Rating Scales and Their Origins

Rating scales are widely used in psychology as a means of quantifying subjective psychological variables of many kinds. An example of a simple rating scale might be the 11-point scale used by doctors when they ask patients in the emergency room “On a scale from 0 to 10, where 0 is no pain at all, and 10 is the worst pain you have ever felt, how bad is your pain right now?” Albeit crude, this is a form of psychological measurement. Psychometricians have developed a rich literature on the qualities and applications of rating scales of this type (Guilford, 1954; Nunnally, 1967; Nunnally & Bernstein, 1994).

Historians of psychology used to think that numerical rating scales originated in the “brass instruments” era of Francis Galton (McReynolds & Ludwig, 1987). However, it now appears that a crude form of rating scale can be traced to Galen, the second-century Greco-Roman physician. Galen believed in the prevailing humor theory of health and disease, in which the harmony or disharmony among four bodily fluids or “humors” determined one’s health. The four humors were yellow bile, black bile, phlegm, and blood. The humorology of the time also featured the dichotomies of hot–cold and wet–dry as elements of health or illness. With respect to the hot–cold dimension, Galen recognized the need for something more sophisticated than a simple dichotomy:

This standard, or neutral value, he suggested should be the temperature, as reflected in direct sense–perception, of a mixture of equal quantities of boiling water and ice (Taylor, 1942). Further, Galen proposed a convention of four degrees of heat and four degrees of cold, on either side of that standard, that could be induced in patients by various drugs.
(McReynolds & Ludwig, 1987, p. 281)

Although he did not say so explicitly, Galen was in effect proposing a nine-point rating scale consisting of four points above and four points below a neutral point. Whether the successive increases of heat or cold were equal in the hot–cold scale (what we would now refer to as the underlying scale of measurement) was an issue left to others, including the ninth-century Islamic philosopher, Al-Kindi (Taylor, 1942). Al-Kindi was an Arab polymath considered by many the father of Islamic philosophy. He questioned whether the successive degrees of heat and cold could be equal but did not propose a means for answering the inquiry. Al-Kindi made important contributions in many fields, including astronomy, chemistry, and medicine (www.muslimphilosophy.com/kindi).

According to McReynolds and Ludwig (1984), the first person to devise and apply rating scales for psychological variables was Christian Thomasius (1655–1728). Thomasius was a German jurist and philosopher whose career spanned numerous fields of inquiry. He developed a theory of personality that posited four major dimensions: sensuousness, acquisitiveness, social ambition, and rational love. He employed judges to assess individuals on all four inclinations on a 12-point scale (5, 10, 15, 20, all the way up to 60). In 1692, he published numerical data, including reliability data, on five individuals as rated by himself and other judges. This was a landmark accomplishment: “This work appears to constitute the first systematic collection and analysis of quantitative empirical data in the entire history of psychology” (McReynolds & Ludwig, 1984, p. 282).

Rating scales slowly caught on in the years after their first serious use by Thomasius. Among those applying these new devices were phrenologists, including the renowned practitioner Orson Fowler. Phrenology is described in an earlier section of this chapter. Fowler depicted the application of seven-point rating scales in his Practical Phrenology (1851). The bulges in different areas of the skull were rated as 1, VERY SMALL; 2, SMALL; 3, MODERATE; 4, AVERAGE; 5, FULL; 6, LARGE; 7, VERY LARGE. From these ratings, the relative strengths of specific moral and intellectual qualities were presumed to be quantified.2

2 The common idiom “You should have your head examined” probably alludes to the (now discredited) practice of phrenology (Ammer, 2003).

The use of rating scales may have provided Fowler’s practice of phrenology a facade of respectability. Even so, this did not prevent his arrest in 1886 for practicing medicine without a license (New York Times, January 17, 1886). According to the Times article:

The phrenologist denies that he practices medicine and asserts that he has violated no law, that he is simply a phrenologist, and does not give remedies to persons who apply to him to have their craniums examined. There was quite a crowd of patrons in the Professor’s anteroom at the hotel when the detective served the warrant. Prof. Fowler was held to await action by the Grand Jury, and released on his own recognizance.

Phrenology surrounded itself with the trappings of science, including models of the head and brain, authoritative pronouncements, and, yes, even rating scales. It flourished into the early 1900s but eventually faded into disrepute.

2.1.5: Changing Conceptions of Mental Retardation in the 1800s

Many great inventions have been developed in response to the practical needs created by changes in societal values. Such is the case with intelligence tests. To be specific, the first such tests were developed by Binet in the early 1900s to help identify children in the Paris school system who were unlikely to profit from ordinary instruction. Prior to this time, there was little interest in the educational needs of children with mental retardation. A new humanism toward those with mental retardation thus created the practical problem (identifying those with special needs) that Binet’s tests were to solve.

The Western world of the late 1800s was just emerging from centuries of indifference and hostility toward the psychiatrically and mentally impaired. Medical practitioners were just beginning to acknowledge a distinction between individuals with emotional disabilities and those with mental retardation. For centuries, all such social outcasts were given similar treatment. In the Middle Ages, they were occasionally “diagnosed” as witches and put to death by burning. Later on, they were alternately ignored, persecuted, or tortured. In his comprehensive history of psychotherapy and psychoanalysis, Bromberg (1959) has an especially graphic chapter on the various forms of maltreatment toward those with mental and emotional disabilities, from which only one example will be provided here. In 1698, a prominent physician wrote a gruesome book, Flagellum Salutis, in which beatings were advocated as treatment “in melancholia; in frenzy; in paralysis; in epilepsy; in facial expression of feebleminded” (Bromberg, 1959).

By the early 1800s, saner minds began to prevail. Medical practitioners realized that some of those with psychiatric impairment had reversible illnesses that did not necessarily imply diminished intellect, whereas other exceptional persons, those with mental retardation, showed a greater developmental continuity and invariably had impaired intellect. In addition, a newfound humanism began to influence social practices toward individuals with psychological and mental disabilities. With this humanism there arose a greater interest in the diagnosis and remediation of mental retardation. At the forefront of these developments were two French physicians, J. E. D. Esquirol and O. E. Seguin, each of whom revolutionized thinking about those with mental retardation, thereby helping to create the necessity for Binet’s tests.

Esquirol and Diagnosis in Mental Retardation

Around the beginning of the nineteenth century, many physicians had begun to perceive the difference between mental retardation (then called idiocy) and mental illness (often referred to as dementia). J. E. D. Esquirol (1772–1840) was the first to formalize the difference in writing. His diagnostic breakthrough was noting that mental retardation was a lifelong developmental phenomenon, whereas mental illness usually had a more abrupt onset in adulthood. He thought that mental retardation was incurable, whereas mental illness might show improvement (Esquirol, 1845/1838).

Esquirol placed great emphasis on language skills in the diagnosis of mental retardation. This may offer a partial explanation as to why Binet’s later tests and the modern-day descendants from them are so heavily loaded on linguistic abilities. After all, the original use of the Binet scales was, in the main, to identify children with mental retardation who would not likely profit from ordinary schooling.

Esquirol also proposed the first classification system in mental retardation, and it should be no surprise that language skills were the main diagnostic criteria. He recognized three levels of mental retardation: (1) those using short phrases, (2) those using only monosyllables, and (3) those with cries only, no speech. Apparently, Esquirol did not recognize what we would now call mild mental retardation, instead providing criteria for the equivalents of the modern-day classifications of moderate, severe, and profound mental retardation.

Seguin and Education of Individuals with Mental Retardation

Perhaps more than any other pioneer in the field of mental retardation, O. Edouard Seguin (1812–1880) helped establish a new humanism toward those with mental retardation in the late 1800s. He had been a student of Esquirol and had also studied with J. M. G. Itard (1774–1838), who is well known for his five-year attempt to train the Wild Boy of Aveyron, a feral child who had lived in the woods for his first 11 or 12 years (Itard, 1932/1801).

Seguin borrowed from techniques used by Itard and devoted his life to developing educational programs for persons with mental retardation. As early as 1838, he had established an experimental class for such individuals. His treatment efforts earned him international acclaim, and he eventually came to the United States to continue his work. In 1866, he published Idiocy, and Its Treatment by the Physiological Method, the first major textbook on the treatment of mental retardation. This book advocated a surprisingly modern approach to education of individuals with mental retardation and even touched on what would now be called behavior modification.

Such was the social and historical background that allowed intelligence tests to flourish. We turn now to the invention of the modern-day intelligence test by Alfred Binet. We begin with a discussion of the early influences that shaped his famous test.

2.1.6: Influence of Binet’s Early Research on His Test

As almost every student of psychology knows, Alfred Binet (1857–1911) invented the first modern intelligence test in 1905. What is less well known, but equally important for those who seek an understanding of his contributions to modern psychology, is that Binet was a prolific researcher and author long before he turned his attentions to intelligence testing. The character of his early research had a material bearing on the subsequent form of his well-known intelligence test. For those who seek a full understanding of his pathbreaking influence, brief mention of Binet’s early career is mandatory. For more details the reader can consult Fancher (1985), Goodenough (1949), Gould (1981), and Wolf (1973).

Binet began his career in medicine but was forced to drop out because of a complete emotional breakdown. He switched to psychology, where he studied the two-point threshold and dabbled in the associationist psychology of John Stuart Mill (1806–1873). Later, he selected an apprenticeship with the neurologist J. M. Charcot (1825–1893) at the famous Salpetriere Hospital. Thus, for a brief time Binet’s professional path paralleled that of Sigmund Freud, who also studied hysteria under Charcot. At the Salpetriere Hospital, Binet coauthored (with C. Fere) four studies supposedly demonstrating that reversing the polarity of a magnet could induce complete mood changes (e.g., from happy to sad) or transfer of hysterical paralysis (e.g., from left to right side) in a single hypnotized subject. In response to public criticism from other psychologists, Binet later published a recantation of his findings. This was a painful episode for Binet, and it sent his career into a temporary detour. Nonetheless, he learned two things through his embarrassment. First, he never again used sloppy experimental procedures that allowed for unintentional suggestion to influence his results. Second, he became skeptical of the zeitgeist (spirit of the times) in experimental psychology. Both of these lessons were applied when he later developed his intelligence scales.

In 1891, Binet went to work at the Sorbonne as an unpaid assistant and began a series of studies and publications that were to define his new “individual psychology” and ultimately to culminate in his intelligence tests. Binet was an ardent experimentalist, often using his two daughters to try out existing and new tests of intelligence. Binet’s experiments with his children greatly influenced his views on proper testing procedures:

The experimenter is obliged, to a point, to adjust his method to the subject he is addressing. There are certain rules to follow when one experiments on a child, just as there are certain rules for adults, for hysterics, and for the insane. These rules are not written down anywhere; each one learns them for himself and is repaid in great measure. By making an error and later accounting for the cause, one learns not to make the mistake a second time. In regard to children, it is necessary to be suspicious of two principal causes of error: suggestion and failure of attention. This is not the time to speak on the first point. As for the second, failure of attention, it is so important that it is always necessary to suspect it when one obtains a negative result. One must then suspend the experiments and take them up at a more favorable moment, restarting them 10 times, 20 times, with great patience. Children, in fact, are often little disposed to pay attention to experiments which are not entertaining, and it is useless to hope that one can make them more attentive by threatening them with punishment. By particular tricks, however, one can sometimes give the experiment a certain appeal.
(Binet, 1895, quoted in Pollack, 1971)

It is interesting to contrast modern-day testing practices, which go so far as to specify the exact wording the examiner should use, with Binet’s advice to exercise nearly endless patience and use entertaining tricks when testing children.

2.1.7: Binet and Testing for Higher Mental Processes

In 1896, Binet and his Sorbonne assistant, Victor Henri, published a pivotal review of German and American work on individual differences. In this historically important paper, they argued that intelligence could be better measured by means of the higher psychological processes rather than the elementary sensory processes such as reaction time. After several false starts, Binet and Simon eventually settled on the straightforward format of their 1905 scales, discussed subsequently.

The character of the 1905 scale owed much to a prior test developed by Dr. Blin (1902) and his pupil, M. Damaye. They had attempted to improve the diagnosis of mental retardation by using a battery of assessments in 20 areas such as spoken language; knowledge of parts of the body; obedience to simple commands; naming common objects; and ability to read, write, and do simple arithmetic. Binet criticized the scale for being too subjective, for having items reflecting formal education, and for using a “yes or no” format on many questions (DuBois, 1970). But he was much impressed with the idea of using a battery of tests, a feature that he adopted in his 1905 scales.

In 1904, the Minister of Public Instruction in Paris appointed a commission to decide on the educational measures that should be undertaken with those children who could not profit from regular instruction. The commission concluded that medical and educational examinations should be used to identify those children who could not learn by the ordinary methods. Furthermore, it was determined that these children should be removed from their regular classes and given special instruction suitable

to their more limited intellectual prowess. This was the beginning of the special education classroom.

It was evident that a means of selecting children for such special placement was needed, and Binet and his colleague Simon were called on to develop a practical tool for just this purpose. Thus arose the first formal scale for assessing the intelligence of children.

The 30 tests on the 1905 scale ranged from utterly simple sensory tests to quite complex verbal abstractions. Thus, the scale was appropriate for assessing the entire gamut of intelligence, from severe mental retardation to high levels of giftedness. The entire scale is outlined in Table 2.1.

Table 2.1 The 1905 Binet-Simon Scale

1. Follows a moving object with the eyes.
2. Grasps a small object which is touched.
3. Grasps a small object which is seen.
4. Distinguishes between a square of chocolate and a square of wood.
5. Finds and eats a square of chocolate wrapped in paper.
6. Executes simple commands and imitates simple gestures.
7. Points to familiar named objects, e.g., “Where is your head?”
8. Points to objects shown in pictures, e.g., “Put your finger on the window.”
9. Names objects in pictures, e.g., “What is this?” [examiner points to a picture of a dog].
10. Compares two lines of markedly unequal length.
11. Repeats three spoken digits.
12. Compares two weights.
13. Shows susceptibility to suggestion.
14. Defines common words by function.
15. Repeats a sentence of 15 words.
16. Tells how two common objects are different, e.g., “paper and cardboard.”
17. Names from memory objects displayed on a board for 30 seconds. [Later dropped]
18. Reproduces from memory two designs shown for 10 seconds.
19. Repeats a longer series of digits than in item 11 to test immediate memory.
20. Tells how two common objects are alike, e.g., “butterfly and flea.”
21. Compares two lines of slightly unequal length.
22. Compares five blocks to put them in order of weight.
23. Indicates which of the previous five weights the examiner has removed.
24. Produces rhymes, e.g., “What rhymes with ‘school’?”
25. A word completion test based on those proposed by Ebbinghaus.
26. Puts three nouns, e.g., “Paris, river, fortune” in a sentence.
27. Responds to 25 abstract (comprehension) questions.
28. Reverses the hands of a clock.
29. After paper folding and cutting, draws the form of the resulting holes.
30. Defines abstract words by designating the difference between, e.g., “boredom and weariness.”

Source: Based on Kite, E. (1916), The development of intelligence in children, Vineland, NJ: Vineland Training School.

Except for the very simplest tests, which were designed for the classification of very low-grade idiots (an unfortunate diagnostic term that has since been dropped), the tests were heavily weighted toward verbal skills, reflecting Binet’s departure from the Galtonian tradition.

An interesting point that is often overlooked by contemporary students of psychology is that Binet and Simon did not offer a precise method for arriving at a total score on their 1905 scale. It is well to remember that their purpose was classification, not measurement, and that their motivation was entirely humanitarian, namely, to identify those children who needed special educational placement. By contemporary standards, it is difficult to accept the fuzziness inherent in such an approach, but that may reflect a modern penchant for quantification more than a weakness in the 1905 scale. In fact, their scale was popular among educators in Paris. And, even with the absence of precise quantification, the approach was successful in selecting candidates for special classes.

2.1.8: The Revised Scales and the Advent of IQ

In 1908, Binet and Simon published a revision of the 1905 scale. In the earlier scale, more than half the items had been designed for the very retarded, yet the major diagnostic decisions involved older children and those with borderline intellect. To remedy this imbalance, most of the very simple items were dropped and new items were added at the higher end of the scale. The 1908 scale had 58 problems or tests, almost double the number from 1905. Several new tests were added, many of which are still used today: reconstructing scrambled sentences, copying a diamond, and executing a sequence of three commands. Some of the items were absurdities that the children had to detect and explain. One such item was amusing to French children: “The body of an unfortunate girl was found, cut into 18 pieces. It is thought that she killed herself.” However, this item was very upsetting to some American subjects, demonstrating the importance of cultural factors in intelligence (Fancher, 1985).

The major innovation of the 1908 scale was the introduction of the concept of mental level. The tests had been standardized on about 300 normal children between the ages of 3 and 13 years. This allowed Binet and Simon to order the tests according to the age level at which they were typically passed. Whichever items were passed by 80 to 90 percent of the 3-year-olds were placed in the 3-year level, and similarly on up to age 13. Binet and Simon also devised a rough scoring system whereby a basal age was first determined from the age level at which not more than one test was failed. For each five tests that were passed at levels above the basal, a full year of mental level was granted. Insofar as partial years of mental level were not credited and the various age levels had anywhere from three to eight tests, the method left much to be desired.

In 1911, a third revision of the Binet-Simon scales appeared. Each age level now had exactly five tests. The scale was also extended into the adult range. And with

some reluctance, Binet introduced new scoring methods that allowed for one-fifth of a year for each subtest passed beyond the basal level. In his writings, Binet emphasized strongly that the child’s exact mental level should not be taken too seriously as an absolute measure of intelligence. Nonetheless, the idea of deriving a mental level was a monumental development that was to influence the character of intelligence testing throughout the twentieth century. Within months, what Binet called mental level was being translated as mental age. And testers everywhere, including Binet himself, were comparing a child’s mental age with the child’s chronological age. Thus, a 9-year-old who was functioning at the mental level (or mental age) of a 6-year-old was retarded by three years. Very shortly, Stern (1912) pointed out that being retarded by three years had different meanings at different ages. A 5-year-old functioning at the 2-year-old level was more impaired than a 13-year-old functioning at the 10-year-old level. Stern suggested that an intelligence quotient computed from the mental age divided by the chronological age would give a better measure of the relative functioning of a subject compared to his or her same-aged peers.

In 1916, Terman and his associates at Stanford revised the Binet-Simon scales, producing the Stanford-Binet, a successful test that is discussed in a later chapter. Terman suggested multiplying the intelligence quotient by 100 to remove fractions; he was also the first person to use the abbreviation IQ. Thus was born one of the most popular and controversial concepts in the history of psychology. Binet died in 1911 before the IQ swept American testing, so we will never know what he would have thought of this new development based on his scales. However, Simon, his collaborator, later called the concept of IQ a “betrayal” of their scale’s original objectives (Fancher, 1985, p. 104), and we can assume from Binet’s humanistic concern that he might have held a similar opinion.

2.2: Testing from the Early 1900s to the Present

2.2 Review the history of testing from the early 1900s until now

The Binet-Simon scales helped solve a practical social quandary, namely, how to identify children who needed special schooling. With this successful application of a mental test, psychologists realized that their inventions could have pragmatic significance for many different segments of society. Almost immediately, psychologists in the United States adopted a utilitarian focus. Intelligence testing was embraced by many as a reliable and objective response to perceived social problems such as the identification of immigrants with mental retardation and the quick, accurate classification of Army recruits (Boake, 2002).

Whether these early tests really solved social dilemmas, or merely exacerbated them, is a fiercely debated issue reviewed in the following sections. One thing is certain: The profusion of tests developed early in the twentieth century helped shape the character of contemporary tests. A review of these historical trends will aid in the comprehension of the nature of modern tests and a better appreciation of the social issues raised by them.

2.2.1: Early Uses and Abuses of Tests in the United States

First Translation of the Binet-Simon Scale

In 1906, Henry H. Goddard was hired by the Vineland Training School in New Jersey to do research on the classification and education of “feebleminded” children. He soon realized that a diagnostic instrument would be required and was, therefore, pleased to read of the 1908 Binet-Simon scale. He quickly set about translating the scale, making minor changes so that it would be applicable to American children (Goddard, 1910a).

Goddard (1910b) tested 378 residents of the Vineland facility and categorized them by diagnosis and mental age. He classified 73 residents as idiots because their mental age was 2 years or lower; 205 residents were termed imbeciles with mental age of 3 to 7 years; and 100 residents were deemed feebleminded with mental age of 8 to 12 years. It is instructive to note that originally neutral and descriptive terms for portraying levels of mental retardation (idiot, imbecile, and feebleminded) have made their way into the everyday lexicon of pejorative labels. In fact, Goddard made his own contribution by coining the diagnostic term moron (from the Greek moronia, meaning “foolish”).

Goddard (1911) also tested 1,547 normal children with his translation of the Binet-Simon scales. He considered children whose mental age was four or more years behind their chronological age to be feebleminded; these constituted 3 percent of his sample. Considering that all of these children were found outside of institutions for the retarded, 3 percent is rather an alarming rate of mental deficiency. Goddard (1911) was of the opinion that these children should be segregated so that they would be prevented from “contaminating society.” These early studies piqued Goddard’s curiosity about “feebleminded” citizenry and the societal burdens they imposed. He also gained a reputation as one of the leading experts on the use of intelligence tests to identify persons with impaired intellect. His talents were soon in heavy demand.

The Binet-Simon and Immigration

In 1910, Goddard was invited to Ellis Island by the commissioner of immigration to help make the examination of immigrants
more accurate. A dark and foreboding folklore had grown up around mental deficiency and immigration in the early 1900s:

It was believed that the feebleminded were degenerate beings responsible for many if not most social problems; that they reproduced at an alarming rate and menaced the nation’s overall biological fitness; and that their numbers were being incremented by undesirable “new” immigrants from southern and eastern European countries who had largely supplanted the “old” immigrants from northern and western Europe.
(Gelb, 1986)

Initially, Goddard was unconcerned about the supposed threat of feeblemindedness posed by the immigrants. He wrote that adequate statistics did not exist and that the prevalent opinions about undue percentages of mentally defective immigrants were “grossly overestimated” (Goddard, 1912). However, with repeated visits to Ellis Island, Goddard became convinced that the rates of feeblemindedness were much higher than estimated by the physicians who staffed the immigration service. Within a year, he reversed his opinions entirely and called for congressional funding so that Ellis Island could be staffed with experts trained in the use of intelligence tests. In the following decade, Goddard became an apostle for the use of intelligence tests to identify feebleminded immigrants. Although he wrote that the rates of mentally deficient immigrants were “alarming,” he did not join the popular call for immigration restriction (Gelb, 1986).

The story of Goddard and his concern for the “menace of feeblemindedness,” as Gould (1981) has satirically put it, is often ignored or downplayed in books on psychological testing. The majority of textbooks on testing do not mention or refer to Goddard at all. The few books that do mention him usually state that Goddard “used the tests in institutions for the retarded,” which is surely an understatement. In his influential History of Psychological Testing, DuBois (1970) has a portrait of Goddard but devotes less than one line of text to him.

The fact is that Goddard was one of the most influential American psychologists of the early 1900s. Any thoughtful person must, therefore, wonder why so many contemporary authors have ignored or slighted the person who first translated and applied Binet’s tests in the United States. We will attempt an answer here, based in part on Goddard’s original writing, but also relying on Gould’s (1981) critique of Goddard’s voluminous writings on mental deficiency and intelligence testing. We refer to Gelb’s (1986) more sympathetic portrayal of Goddard as well.

Perhaps Goddard has been ignored in the textbooks because he was a strict hereditarian who conceived of intelligence in simpleminded Mendelian terms. No doubt his call for colonization of “morons” so as to restrict their breeding has won him contemporary disfavor as well. And his insistence that much undesirable behavior (crime, alcoholism, prostitution) was due to inherited mental deficiency also does not sit well with the modern environmentalist position.

However, the most likely reason that modern authors have ignored Goddard is that he exemplified a large number of early prominent psychologists who engaged in the blatant misuse of intelligence testing. In his efforts to demonstrate that high rates of immigrants with mental retardation were entering the United States each day, Goddard sent his assistants to Ellis Island to administer his English translation of the Binet-Simon tests to newly arrived immigrants. The tests were administered through a translator, not long after the immigrants walked ashore. We can guess that many of the immigrants were frightened, confused, and disoriented. Thus, a test devised in French, then translated to English, was in turn retranslated to Yiddish, Hungarian, Italian, or Russian; administered to bewildered farmers and laborers who had just endured an Atlantic crossing; and interpreted according to the original French norms.

What did Goddard find and what did he make of his results?

There is much, much more to the Goddard era of early intelligence testing, and the interested reader is urged to consult Gould (1981) and Gelb (1986). The most important point that we wish to stress here is that, like those of many other early psychologists, Goddard’s scholarly views were influenced by the social ideologies of his time. Finally, Goddard was a complex scholar who refined and contradicted his professional opinions on numerous occasions. One ironic example: After the damage was done and his writings had helped restrict immigration, Goddard (1928) recanted, concluding that feeblemindedness was not incurable and that the feebleminded did not need to be segregated in institutions.
The Goddard chapter in the history of testing serves as a reminder that even well-meaning persons operating within generally accepted social norms can misuse psychological tests. We need be ever mindful that disinterested “science” can be harnessed to the goals of a pernicious social ideology.

Testing for Giftedness: Leta Stetter Hollingworth
One of the earliest uses of IQ tests like the Stanford-Binet was testing for giftedness. A pioneer in this application was Leta Stetter Hollingworth (1886–1939) who spent her short career (she died of cancer at the age of 53) focusing on the psychology of genius. In one study, Hollingworth (1928) demonstrated that children of high genius (Stanford-Binet IQs hovering around 165) showed significantly greater school achievement than those of mere ordinary genius (IQs clustering around 146). In another study, she dispelled the belief, common at the time, that gifted children should not be moved ahead in school because they would lag behind older children in penmanship and other motor skills (Hollingworth & Monahan, 1926). In yet another study, she found that highly gifted adolescents were judged by total strangers to be significantly better looking than matched controls of the same age (Hollingworth, 1935).

Hollingworth was a prolific researcher who advanced the science of IQ testing. Being an idealist, she was ahead of her time. She proposed a revolving fund from which gifted children could draw for their development, with the moral (but not legal) obligation to pay the money back in 20 years. She surmised that such a fund would grow exponentially over the decades and benefit the nation in unforeseeable ways (H. Hollingworth, 1943). Unfortunately, this remarkable plan never came to fruition.

Hollingworth also was a feminist who attributed gender differences in eminence and achievement to social and cultural impacts:

It is undesirable to seek for the cause of sex differences in eminence in ultimate and obscure affective and intellectual differences until we have exhausted as a cause the known, obvious, and inescapable fact that women bear and rear the children, and that this has had as an inevitable sequel the occupation of housekeeping, a field where eminence is not possible. As a corollary it may be added that … It is desirable, for both the enrichment of society and the peace of individuals, that women may find a way to vary from their mode as men do, and yet procreate. Such a course is at present hindered by individual prejudice, poverty, and the enactment of legal measures. But public expectation will slowly change, as the conditions that generated that expectation have already changed, and in another century the solution to this problem will have been found.
(Hollingworth, 1914, p. 529)

It is now a century, more or less, since Hollingworth’s proclamation. Gender differences in eminence and achievement still exist, but they have been greatly reduced.

The Stanford-Binet: The Early Mainstay of IQ
Although it was Goddard who first translated the Binet scales in the United States, it was Stanford professor Lewis M. Terman (1877–1956) who popularized IQ testing with his revision of the Binet scales in 1916. The new Stanford-Binet, as it was called, was a substantial revision, not just an extension, of the earlier Binet scales. Among the many changes that led to the unquestioned prestige of the Stanford-Binet was the use of the now familiar IQ for expressing test results. The number of items was increased to 90, and the new scale was suitable for those with mental retardation, children, and both normal and “superior” adults. In addition, the Stanford-Binet had clear and well-organized instructions for administration and scoring. Great care had been taken in securing a representative sample of subjects for use in the standardization of the test. As Goodenough (1949) notes: “The publication of the Stanford Revision marked the end of the initial period of experimentation and uncertainty. Once and for all, intelligence testing had been put on a firm basis.”

The Stanford-Binet was the standard of intelligence testing for decades. New tests were always validated in terms of their correlations with this measure. It continued its preeminence through revisions in 1937 and 1960, by which time the Wechsler scales (Wechsler, 1949, 1955) had begun to compete with it. The latest revision of the Stanford-Binet was completed in 2003. This test and the Wechsler scales are discussed in detail in a later chapter. It is worth mentioning here that the Wechsler scales became a quite popular alternative to the Stanford-Binet mainly because they provided more than just an IQ score. In addition to Full Scale IQ, the Wechsler scales provided 10 to 12 subtest scores and a Verbal and Performance IQ. By contrast, the earlier versions of the Stanford-Binet supplied only a single overall summary score, the global IQ.

2.2.2: Group Tests and The Classification of WWI Army Recruits
Given the American penchant for efficiency, it was only natural that researchers would seek group mental tests to supplement the relatively time-consuming individual intelligence tests imported from France. Among the first to develop group tests was Pyle (1913), who published schoolchildren norms for a battery consisting of such well-worn measures as memory span, digit-symbol substitution, and oral word association (quickly writing down words in response to a stimulus word). Pintner (1917) revised and expanded Pyle’s battery, adding to it a timed
cancellation test in which the child crossed out the letter a wherever it appeared in a body of text.

But group tests were slow to catch on, partly because the early versions still had to be scored laboriously by hand. The idea of a completely objective test with a simple scoring key was inconsistent with tests such as logical memory for which the judgment of the examiner was required in scoring. Most amazing of all, at least to anyone who has spent any time as a student in American schools, the multiple-choice question was not yet in general use.

The slow pace of developments in group testing picked up dramatically as the United States entered World War I in 1917. It was then that Robert M. Yerkes, a well-known psychology professor at Harvard, convinced the U.S. government and the Army that all of its 1.75 million recruits should be given intelligence tests for purposes of classification and assignment (Yerkes, 1919). Immediately upon being commissioned into the Army as a colonel, Yerkes assembled a Committee on the Examination of Recruits, which met at the Vineland school in New Jersey to develop the new group tests for the assessment of Army recruits. Yerkes chaired the committee; other famous members included Goddard and Terman.

Two group tests emerged from this collaboration: the Army Alpha and the Army Beta. It would be difficult to overestimate the influence of the Alpha and Beta on subsequent intelligence tests. The format and content of these tests inspired developments in group and individual testing for decades to come. We discuss these tests in some detail so that the reader can appreciate their influence on modern intelligence tests.

The Army Alpha and Beta Examinations
The Alpha was based on the then unpublished work of Otis (1918) and consisted of eight verbally loaded tests for average and high-functioning recruits. The eight tests were (1) following oral directions, (2) arithmetical reasoning, (3) practical judgment, (4) synonym–antonym pairs, (5) disarranged sentences, (6) number series completion, (7) analogies, and (8) information. Figure 2.1 lists some typical items from the Army Alpha examination.

Figure 2.1 Sample Items from the Army Alpha Examination
Source: Reprinted from Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, Volume 15. With permission from the National Academy of Sciences, Washington, DC.
Note: Examinees received verbal instructions for each subtest.

FOLLOWING ORAL DIRECTIONS
Mark a cross in the first and also the third circle:

ARITHMETICAL REASONING
Solve each problem:
How many men are 5 men and 10 men? Answer ( )
If 3 1/2 tons of coal cost $21, what will 5 1/2 tons cost? Answer ( )

PRACTICAL JUDGMENT
Why are high mountains covered with snow? Because
they are near the clouds
the sun shines seldom on them
the air is cold there

SYNONYM–ANTONYM PAIRS
Are these words the same or opposite?
largess–donation: same? or opposite?
accumulate–dissipate: same? or opposite?

DISARRANGED SENTENCES
Can these words be rearranged to form a sentence?
envy bad malice traits are and: true? or false?

NUMBER SERIES COMPLETION
Complete the series: 3 6 8 16 18 36 ... ...

ANALOGIES
Which choice completes the analogy?
tears–sorrow :: laughter–____ (joy, smile, girls, grin)
granary–wheat :: library–____ (desk, books, paper, librarian)

INFORMATION
Choose the best alternative:
The pancreas is in the: abdomen, head, shoulder, neck
The Battle of Gettysburg was fought in: 1863, 1813, 1778, 1812

The Army Beta was a nonverbal group test designed for use with illiterates and recruits whose first language was not English. It consisted of various visual-perceptual and motor tests such as tracing a path through mazes and visualizing the correct number of blocks depicted in a three-dimensional drawing. Figure 2.2 depicts the blackboard demonstrations for all eight parts of the Beta examination.

In order to accommodate illiterate subjects and recent immigrants who did not comprehend English, Yerkes instructed the examiners to use largely pictorial and gestural methods for explaining the tests to the prospective Army recruits. The examiner and an assistant stood atop a platform at the front of the class and engaged in pantomime to explain each of the eight tests.

The Army testing was intended to help segregate and eliminate the mentally incompetent, to classify men according to their mental ability, and to assist in the placement of competent men in responsible positions (Yerkes, 1921). However, it is not really clear whether the Army made much use of the masses of data supplied by Yerkes and his eager assistants. A careful reading of his memoirs reveals that Yerkes did little more than produce favorable testimonials from high-ranking officers. In the main, his memoirs say that the Army could have saved millions of dollars and increased its efficiency if the testing data had been used.

To some extent, the mountains of test data had little practical impact on the efficiency of the Army because of the resistance of the military mind to scientific innovation. However, it is also true that the Army brass had good reason to doubt the validity of the test results. For example, an internal memorandum described the use of pantomime in the instructions to the nonverbal Beta examination:

For the sake of making results from the various camps comparable, the examiners were ordered to follow a certain detailed and specific series of ballet antics, which had not only the merit of being perfectly incomprehensible and unrelated to mental testing, but also lent a highly confusing and distracting mystical atmosphere to the
whole performance, effectually preventing all approach to the attitude in which a subject should be while having his soul tested.
(cited in Samelson, 1977)

In addition, the testing conditions left much to be desired, with wave upon wave of recruits ushered in one door, tested, and virtually shoved out the other side. Tens of thousands of recruits received a literal zero for many subtests, not because they were retarded but because they couldn’t fathom the instructions to these enigmatic new instruments. Many recruits fell asleep while the testers gave esoteric and mysterious pantomime instructions.

On the positive side, the Army testing provided psychologists with a tremendous amount of experience in the psychometrics of test construction. Thousands of correlation coefficients were computed, including the prominent use of multiple correlations in the analysis of test data. Test construction graduated from an art to a science in a few short years.

2.2.3: Early Educational Testing
For good or for ill, Yerkes’s grand scheme for testing Army recruits helped to usher in the era of group tests. After World War I, inquiries rushed in from industry, public schools, and colleges about the potential applications of these straightforward tests that almost anyone could administer and score (Yerkes, 1921). The psychologists who had worked with Yerkes soon left the service and carried with them to industry and education their newfound notion of paper-and-pencil tests of intelligence.

Figure 2.2 The Blackboard Demonstrations for All Eight Parts of the Beta Examination
Source: Reprinted from Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, Volume 15. With permission from the National Academy of Sciences, Washington, DC.

The Army Alpha and Beta were also released for general use. These tests quickly became the prototypes for a large family of group tests and influenced the character of intelligence tests, college entrance examinations, scholastic achievement tests, and aptitude tests. To cite just one specific consequence of the Army testing, the National Research Council, a government organization of scientists, devised the National Intelligence Test, which was eventually given to 7 million children in the United States during the 1920s. Thus, such well-known tests as the Wechsler scales, the Scholastic Aptitude Tests, and the Graduate Record Exam actually have roots that reach back to Yerkes, Otis, and the mass testing of Army recruits during World War I.

The College Entrance Examination Board (CEEB) was established at the turn of the twentieth century to help avoid duplication in the testing of applicants to U.S. colleges. The early exams had been of the short answer essay format, but this was to change quickly when C. C. Brigham, a disciple of Yerkes, became CEEB secretary after World War I. In 1925, the College Board decided to construct a scholastic aptitude test for use in college admissions (Goslin, 1963). The new tests reflected the now familiar objective format of unscrambling sentences, completing analogies, and filling in the next number in a sequence. Machine scoring was introduced in the 1930s, making objective group tests even more efficient than before. These tests then evolved into the present College Board tests, in particular, the Scholastic Aptitude Tests, now known as the Scholastic Assessment Tests.

The functions of the CEEB were later subsumed under the nonprofit Educational Testing Service (ETS). The ETS directed the development, standardization, and validation of such well-known tests as the Graduate Record Examination, the Law School Admission Test, and the Peace Corps Entrance Tests.

Meanwhile, Terman and his associates at Stanford were busy developing standardized achievement tests. The Stanford Achievement Test (SAchT) was first published in 1923; a modern version of it is still in wide use today. From the very beginning, the SAchT incorporated such modern psychometric principles as norming the subtests so that within-subject variability could be assessed and selecting a very large and representative standardization sample.

2.2.4: The Development of Aptitude Tests
Aptitude tests measure more specific and delimited abilities than intelligence tests. Traditionally, intelligence tests assess a more global construct such as general intelligence, although there are exceptions to this trend that will be discussed later. By contrast, a single aptitude test will measure
just one ability domain, and a multiple aptitude test battery will provide scores in several distinctive ability areas.

The development of aptitude tests lagged behind that of intelligence tests for two reasons, one statistical, the other social. The statistical problem was that a new technique, factor analysis, was often needed to discern which aptitudes were primary and, therefore, distinct from each other. Research on this question had been started quite early by Spearman (1904) but was not refined until the 1930s (Spearman, 1927; Kelley, 1928; Thurstone, 1938). This new family of techniques, factor analysis, allowed Thurstone to conclude that there were specific factors of primary mental ability such as verbal comprehension, word fluency, number facility, spatial ability, associative memory, perceptual speed, and general reasoning (Thurstone, 1938; Thurstone & Thurstone, 1941). More will be said about this in the later chapters on intelligence and ability testing. The important point here is that Thurstone and his followers thought that global measures of intelligence did not, so to speak, “cut nature at its joints.” As a result, it was felt that such measures as the Stanford-Binet were not as useful as multiple aptitude test batteries in determining a person’s intellectual strengths and weaknesses.

The second reason for the slow growth of aptitude batteries was the absence of a practical application for such refined instruments. It was not until World War II that a pressing need arose to select candidates who were highly qualified for very difficult and specialized tasks. The job requirements of pilots, flight engineers, and navigators were very specific and demanding. A general estimate of intellectual ability, such as provided by the group intelligence tests used in World War I, was not sufficient to choose good candidates for flight school. The armed forces solved this problem by developing a specialized aptitude battery of 20 tests that was administered to men who passed preliminary screening tests. These measures proved invaluable in selecting pilots, navigators, and bombardiers, as reflected in the much lower washout rates of men selected by test battery instead of the old methods (Goslin, 1963). Such tests are still used widely in the armed services.

2.2.5: Personality and Vocational Testing After WWI
Although such rudimentary assessment methods as the free association technique had been used before the turn of the twentieth century by Galton, Kraepelin, and others, it was not until World War I that personality tests emerged in a form resembling their contemporary appearance. As has happened so often in the history of testing, it was once again a practical need that served as the impetus for this new development. Modern personality testing began when Woodworth attempted to develop an instrument for detecting Army recruits who were susceptible to psychoneurosis. Virtually all the modern personality inventories, schedules, and questionnaires owe a debt to Woodworth’s Personal Data Sheet (1919).

The Personal Data Sheet consisted of 116 questions that the subject was to answer by underlining Yes or No. The questions were exclusively of the “face obvious” variety and, for the most part, involved fairly serious symptomatology. Representative items included:

• Do ideas run through your head so that you cannot sleep?
• Were you considered a bad boy?
• Are you bothered by a feeling that things are not real?
• Do you have a strong desire to commit suicide?

Readers familiar with the Minnesota Multiphasic Personality Inventory (MMPI) must surely recognize the debt that this more recent inventory has to Woodworth’s instrument.

The next major development was an inventory of neurosis, the Thurstone Personality Schedule (Thurstone & Thurstone, 1930). After first culling hundreds of items answerable in the yes-no-? manner from Woodworth’s inventory and other sources, Thurstone rationally keyed items in terms of how the neurotic would typically answer them. Reflecting Thurstone’s penchant for statistical finesse, this inventory was one of the first to use the method of internal consistency whereby each prospective item was correlated with the total score on the tentatively identified scale to determine whether it belonged on the scale.

From the Thurstone test sprang the Bernreuter Personality Inventory (Bernreuter, 1931). It was a little more refined than its Thurstone predecessor, measuring four personality dimensions: neurotic tendency, self-sufficiency, introversion-extroversion, and dominance-submission. A major innovation in test construction was that a single test item could contribute to more than one scale.

Any chronology of self-report inventories must surely include the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1940). This test and its revision, the MMPI-2, are discussed in detail later. It will suffice for now to point out that the scales of the MMPI were constructed by the method that Woodworth pioneered, contrasting the responses of normal and psychiatrically disturbed subjects. In addition, the MMPI introduced the use of validity scales to determine fake bad, fake good, and random response patterns.

2.2.6: The Origins of Projective Testing
The projective approach originated with the word association method pioneered by Francis Galton in the late 1800s.
Galton gave himself four seconds to come up with as many associations as possible to a stimulus word and then categorized his associations as parrotlike, image-mediated, or histrionic representations. This latter category convinced him that mental operations “sunk wholly below the level of consciousness” were at play. Some historians have even speculated that Freud’s application of free association as a therapeutic tool in psychoanalysis sprang from Galton’s paper published in Brain in 1879 (Forrest, 1974).

Galton’s work was continued in Germany by Wundt and Kraepelin and finally brought to fruition by Jung (1910). Jung’s test consisted of 100 stimulus words. For each word, the subject was to reply as quickly as possible with the first word coming to mind. Kent and Rosanoff (1910) gave the association method a distinctively American flavor by tabulating the reactions of 1,000 normal subjects to a list of 100 stimulus words. These tables were designed to provide a basis for comparing the reactions of normal and “insane” subjects.

While the Americans were pursuing the empirical approach to objective personality testing, a young Swiss psychiatrist, Hermann Rorschach (1884–1922), was developing a completely different vehicle for studying personality. Rorschach was strongly influenced by Jungian and psychoanalytic thinking, so it was natural that his new approach focused on the tendency of patients to reveal their innermost conflicts unconsciously when responding to ambiguous stimuli. The Rorschach and other projective tests discussed subsequently were predicated on the projective hypothesis: When responding to ambiguous or unstructured stimuli, we inadvertently disclose our innermost needs, fantasies, and conflicts.

Rorschach was convinced that people revealed important personality dimensions in their responses to inkblots. He spent years developing just the right set of 10 inkblots and systematically analyzed the responses of personal friends and different patient groups (Rorschach, 1921). Unfortunately, he died only a year after his monograph was published, and it was up to others to complete his work. Developments in the Rorschach are reviewed later in the text.

Whereas Rorschach’s test was originally developed to reveal the innermost workings of the abnormal subject, the TAT, or Thematic Apperception Test (Morgan & Murray, 1935), was developed as an instrument to study normal personality. Of course, both have since been expanded for testing with the entire continuum of human behavior.

The TAT consists of a series of pictures that largely depict one or more persons engaged in an ambiguous interaction. The subject is shown one picture at a time and told to make up a story about it. He or she is instructed to be as dramatic as possible, to discuss thoughts and feelings, and to describe the past, present, and future of what is depicted in the picture.

Murray (1938) believed that underlying personality needs, such as the need for achievement, would be revealed by the contents of the stories. Although numerous scoring systems were developed, clinicians in the main have relied on an impressionistic analysis to make sense of TAT protocols. Modern applications of the TAT are discussed in a later chapter.

The sentence completion technique was also begun during this era with the work of Payne (1928). There have been numerous extensions and variations on the technique, which consists of giving subjects a stem such as “I am bored when ______,” and asking them to complete the sentence. Some modern applications are discussed later, but it can be mentioned now that the problem of scoring and interpretation, which vexed early sentence completion test developers, is still with us today.

An entirely new approach to projective testing was taken by Goodenough (1926), who tried to determine not just intellectual level but also the interests and personality traits of children by analyzing their drawings. Buck’s (1948) test, the House-Tree-Person, was a little more standardized and structured and required the subject to draw a house, a tree, and a person. Machover’s (1949) Personality Projection in the Drawing of the Human Figure was the logical extension of the earlier work. Figure drawing as a projective approach to understanding personality is still used today, and a later chapter discusses modern developments in this practice.

Meanwhile, projective testing in Europe was dominated by the Szondi Test, a wacky instrument based on wholly faulty premises. Lipot Szondi was a Hungarian-born Swiss psychiatrist who believed that major psychiatric disorders were caused by recessive genes. His test consisted of 48 photographs of psychiatric patients divided into six sets of the following eight types: homosexual, epileptic, sadistic, hysteric, catatonic, paranoiac, manic, and depressive (Deri, 1949). From each set of eight pictures, the subject was instructed to select the two pictures he or she liked best and the two disliked most. A person who consistently preferred one kind of picture in the six sets was presumed to have some recessive genes that made him or her have sympathy for the pictured person. Thus, projective preferences were presumed to reveal recessive genes predisposing the individual to specific psychiatric disturbances.

Deri (1949) imported the test to the United States and changed the rationale. She did not argue for a recessive genetic explanation of picture choice but explained such preferences on the basis of unconscious identification with the characteristics of the photographed patients. This was a more palatable theoretical basis for the test than the dubious genetic theories of Szondi. Nonetheless, empirical research cast doubt on the validity of the Szondi Test, and it shortly faded into oblivion.

2.2.7: The Development of Interest Inventories
While the clinicians were developing measures for analyzing personality and unconscious conflicts, other psychologists were devising measures for guidance and counseling of the masses of more normal persons. Chief among such measures was the interest inventory, which has roots going back to Thorndike’s (1912) study of developmental trends in the interests of 100 college students. In 1919–1920, Yoakum developed a pool of 1,000 items relating to interests from childhood through early maturity (DuBois, 1970). Many of these items were incorporated in the Carnegie Interest Inventory. Cowdery (1926–1927) improved and refined previous work on the Carnegie instrument by increasing the number of items, comparing responses of three criterion groups (doctors, engineers, and lawyers) with control groups of nonprofessionals, and developing a weighting formula for items. He was also the first psychometrician to realize the importance of cross validation. He tested his new scales on additional groups of doctors, engineers, and lawyers to ensure that the discriminations found in the original studies were reliable group differences rather than capitalizations on error variance.

Edward K. Strong (1884–1963) revised Cowdery’s test and devoted 36 years to the development of empirical keys for the modified instrument known as the Strong Vocational Interest Blank (SVIB). Persons taking the test could be scored on separate keys for several dozen occupations, providing a series of scores of immeasurable value in vocational guidance. The SVIB became one of the most widely used tests of all time (Strong, 1927). Its modern version, the Strong Interest Inventory, is still widely used by guidance counselors.

For decades the only serious competitor to the SVIB was the Kuder Preference Record (Kuder, 1934). The Kuder differed from the Strong by forcing choices within triads of items. The Kuder was an ipsative test; that is, it compared the relative strength of interests within the individual, rather than comparing his or her responses to various professional groups. More recent revisions of the Kuder Preference Record include the Kuder General Interest Survey and the Kuder Occupational Interest Survey (Kuder, 1966; Kuder & Diamond, 1979).

2.2.8: The Emergence of Structured Personality Tests
Beginning in the 1940s, personality tests began to flourish as useful tools for clinical evaluation and also for assessment of the normal spectrum of functioning. The most respected and highly researched device of this genre is the MMPI, initially conceived to facilitate psychiatric diagnosis (Hathaway & McKinley, 1940, 1942, 1943). Subsequently, applications of this empirically based true-false inventory have expanded to include assessment of personal and social adjustment, pre-employment screening of individuals in high-risk law enforcement positions, testing of patients in medical and substance abuse settings, evaluation of persons in forensic or courtroom proceedings, and appraisal of college students for career counseling (Butcher, 2005). Many other useful tests followed alongside this pathbreaking measure, now in its second edition (MMPI-2). Some widely used alternative tests include the Sixteen Personality Factor Questionnaire (16PF), a test derived from factor analysis, useful in the evaluation of normal and abnormal personality; the California Psychological Inventory (CPI; Gough, 1987), a spinoff from the MMPI that measures folk concepts like responsibility, dominance, tolerance, and flexibility; and the Myers-Briggs Type Indicator (MBTI; Myers & McCaulley, 1985), a self-report inventory based on Carl Jung’s theory of personality types. The MBTI is widely used in corporate settings.

More recently, some personality tests demonstrate allegiance to a theory known as the “big 5” model, which is commonly viewed as the consensus model of personality (Goldberg, 1990). According to this approach, five factors of personality are sufficient to capture the important domains of individual functioning. These factors are neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness. Well-respected tests loyal to this approach include the NEO-Personality Inventory-Revised (Costa & McCrae, 1992), the Five-Factor Personality Inventory (FFPI; Hendriks, Hofstee, & De Raad, 1999), and the NEO-Personality Inventory-3 (Costa, McCrae, & Martin, 2005).

Tens of millions of individuals undergo personality testing each year. According to its publisher, the Myers-Briggs Type Indicator is given to more than 1.5 million individuals annually, including employees of most Fortune 500 companies. Worldwide, an estimated 15 million persons take the MMPI in its different versions each year (MMPI-A for adolescents, MMPI-2 for adults) (Paul, 2004). The test has been translated with wild profusion into dozens of languages (Butcher, 2000). Another widely translated test is the 16 Personality Factor (16PF), which has been adapted into 35 languages. In each setting, the test is interpreted according to local norms (Cattell & Mead, 2008). Although exact figures are not available, beyond a doubt the 16PF is taken by millions of individuals annually.

2.2.9: The Expansion and Proliferation of Testing
In the twenty-first century, the reach of testing continues to increase, both in one-to-one clinical uses and in group

testing for societal applications. Regarding one-to-one assessment, clinical psychology has spawned several new specialties, each requiring innovative approaches to testing. For example, once merely an area of focus within psychological practice, clinical neuropsychology is now a well-defined domain of expertise with specialized tests used mainly by those with proper credentials. In a massive tome that runs to 1,240 pages, Strauss, Sherman, and Spreen (2006) provide norms and commentary for nearly 100 neuropsychological tests and scales. Health psychology is another emerging specialty that has generated many new tests, as evidenced by the twin volumes Measuring Health and Measuring Disease (Bowling, 1997, 2001). These books detail hundreds of measures of health status and illness, including tests of well-being, quality of life measures, and disease impact scales. Additional specialties, each with a panoply of new tests, include child clinical psychology, forensic psychology, and industrial-organizational psychology. The number of available tests for individual clinical assessment surely must number in the many thousands.

Group testing for broad social purposes such as educational assessment, entry to college and graduate school, and certification in the professions also continues to expand. Testing is probably more widely used and more important now than at any time in history. Consider just one arena for group testing, the millions of standardized tests administered every year in public school systems. According to FairTest, a national advocacy group for fair and open testing, more than 100 million standardized tests (including achievement, IQ, screening, and readiness tests) were given in America’s public schools in 2007 (www.fairtest.org/testing-explosion-0). Regarding group testing for college and graduate school admissions, based on relevant websites, more than 3 million students take the Scholastic Assessment Test (SAT) or the American College Test (ACT) each year, and more than 600,000 students complete the Graduate Record Exam each year. Many tens of thousands of applicants also take specialized tests for professional training like the MCAT (Medical College Admissions Test), the LSAT (Law School Admissions Test), and the GMAT (Graduate Management Admissions Test).

2.2.10: Evidence-Based Practice and Outcomes Assessment
Evidence-based practice is an important trend in health care, education, and other fields. This recent movement will greatly boost the need for assessment with tests and outcome measures. According to the Institute of Medicine (IOM, 2001), evidence-based practice is “the integration of best research evidence with clinical expertise and patient values” (p. 147). The advance of evidence-based practice is part of a worldwide trend to require proof that treatments and interventions yield measurable positive outcomes. Of course, whenever measurement is needed, psychological tests often are the best alternative. In education, for example, recent federal legislation such as the No Child Left Behind (NCLB) Act (2001), which promotes standards-based educational change, absolutely requires regular academic achievement testing with validated instruments. In 2012, a revised version of NCLB was reauthorized. This law likely will remain a driving force behind increased educational assessment for years to come.

In psychology, the evidence-based movement has led to evidence-based psychological practice (EBPP), which mandates the practice of empirically supported interventions (APA Task Force, 2006). EBPP also involves the use of outcomes assessment with psychotherapy patients. Increasingly, insurance companies require periodic assessments with short, simple outcome measures as a condition for ongoing reimbursement. EBPP is here to stay. It will promote increased testing with brief measures such as the Outcome Rating Scale (ORS; Miller & Duncan, 2000), an index of a patient’s current functioning. The ORS is a visual analogue scale consisting of four 10-centimeter lines, each representing a bipolar dimension of well-being (individual, interpersonal, social, and general). The patient merely places a hash mark on each line. The distance from the starting point in centimeters is the score for each dimension. These scores are summed to obtain the total score, which can range from 0 to 40. The scale takes less than a minute to complete, and provides a surprisingly reliable and valid index of current functioning (Miller, Duncan, Brown, Sparks, & Claud, 2003).

Chapter Quiz: The History of Psychological Testing
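The ORS scoring rule described in this topic is simple arithmetic, and a minimal sketch may make it concrete. Only the four dimensions, the 10-centimeter line length, and the 0-to-40 total come from the description in the text; the function name `score_ors`, the dictionary layout, the input checks, and the example patient are illustrative assumptions and not part of the published scale.

```python
# Illustrative sketch of ORS scoring as described in the text.
# DIMENSIONS, LINE_LENGTH_CM, and score_ors are hypothetical names.

DIMENSIONS = ("individual", "interpersonal", "social", "general")
LINE_LENGTH_CM = 10.0  # each visual analogue line is 10 centimeters long


def score_ors(marks_cm):
    """Return the total ORS score from four hash-mark distances (in cm).

    marks_cm maps each dimension name to the distance of the patient's
    mark from the starting point of that line.
    """
    missing = [d for d in DIMENSIONS if d not in marks_cm]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total = 0.0
    for dimension in DIMENSIONS:
        distance = float(marks_cm[dimension])
        if not 0.0 <= distance <= LINE_LENGTH_CM:
            raise ValueError(f"{dimension}: mark must lie on the 10 cm line")
        total += distance  # the distance itself is the dimension score
    return total  # can range from 0 to 40


# Hypothetical patient whose marks were measured with a ruler.
example = {"individual": 6.5, "interpersonal": 7.0, "social": 4.5, "general": 6.0}
print(score_ors(example))  # prints 24.0
```

Keeping the per-dimension distances in a dictionary, rather than passing only the total, would let a clinician track each dimension across sessions as well as the summed score.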
Chapter 5

Theories and Individual Tests of Intelligence and Achievement

Learning Objectives
5.1a Analyze various meanings of the term intelligence
5.1b Discuss the influence that definitions and theories have on the structure and content of intelligence tests
5.2 Analyze the individual intelligence tests

5.1: Theories of Intelligence and Factor Analysis

5.1a Analyze various meanings of the term intelligence
5.1b Discuss the influence that definitions and theories have on the structure and content of intelligence tests

This chapter opens an extended discussion of intelligence and achievement testing, a topic so important and immense that we devote the next two chapters to it as well. In order to understand contemporary cognitive testing, the reader will need to assimilate certain definitions, theories, and mainstream assessment practices. The goal of Module 5.1, Theories of Intelligence and Factor Analysis, is to investigate the various meanings given to the term intelligence and to discuss how definitions and theories have influenced the structure and content of intelligence tests. An important justification for this topic is that an understanding of theories of intelligence is crucial for establishing the construct validity of IQ measures. Furthermore, because the statistical tools of factor analysis are so vital to many theories of intelligence, we provide a primer of the topic here. In Module 5.2, Individual Tests of Intelligence and Achievement, we summarize a number of noteworthy approaches to individual assessment and focus on one important application, the evaluation of learning disabilities. We begin with a foundational question: How is intelligence defined?

Intelligence is one of the most highly researched topics in psychology. Thousands of research articles are published each year on the nature and measurement of intelligence. New journals such as Intelligence and The Journal of Psychoeducational Assessment have flourished in response to the scholarly interest in this topic. Despite this burgeoning research literature, the definition of intelligence remains elusive, wrapped in controversy and mystery. In fact, the discussion that follows will illustrate a major paradox of modern testing: Psychometricians are better at measuring intelligence than conceptualizing it!

Even though defining intelligence has proved to be a frustrating endeavor, there is much to be gained by reviewing historical and contemporary efforts to clarify its meaning. After all, intelligence tests did not materialize out of thin air. Most tests are grounded in a specific theory of intelligence and most test developers offer a definition of the construct as a starting point for their endeavors. For these reasons, we can better understand and evaluate the multifaceted character of contemporary tests if we first review prominent definitions and theories of intelligence.

5.1.1: Definitions of Intelligence
Before we discuss definitions of intelligence, we need to clarify the nature of definition itself. Sternberg (1986) makes a distinction between operational and “real” definitions that is important in this context. An operational definition defines a concept in terms of the way it is measured. Boring (1923) carried this viewpoint to its extreme when he defined


intelligence as “what the tests test.” Believe it or not, this was a serious proposal, designed largely to short-circuit rampant and divisive disagreements about the definition of intelligence.

Operational definitions of intelligence suffer from two dangerous shortcomings (Sternberg, 1986). First, they are circular. Intelligence tests were invented to measure intelligence, not to define it. The test designers never intended for their instruments to define intelligence. Second, operational definitions block further progress in understanding the nature of intelligence, because they foreclose discussion on the adequacy of theories of intelligence.

This second problem, the potentially stultifying effects of relying on operational definitions of intelligence, casts doubt on the common practice of affirming the concurrent validity of new tests by correlating them with old tests. If established tests serve as the principal criterion against which new tests are assessed, then the new tests will be viewed as valid only to the extent that they correlate with the old ones. Such a conservative practice drastically curtails innovation. The operational definition of intelligence does not allow for the possibility that new tests or conceptions of intelligence may be superior to the existing ones.

We must conclude, then, that operational definitions of intelligence leave much to be desired. In contrast, a real definition is one that seeks to tell us the true nature of the thing being defined (Robinson, 1950; Sternberg, 1986). Perhaps the most common way (but by no means the only way) of producing real definitions of intelligence is to ask experts in the field to define it.

Expert Definitions of Intelligence  Intelligence has been given many real definitions by prominent researchers in the field. In the following, we list several examples, paraphrased slightly for editorial consistency. The reader will note that many of these definitions appeared in an early but still influential symposium, “Intelligence and Its Measurement,” published in the Journal of Educational Psychology (Thorndike, 1921). Other definitions stem from a modern update of this early symposium, What Is Intelligence?, edited by Sternberg and Detterman (1986). Intelligence has been defined as the following:

[Figure: Definition of Intelligence provided by different experts.]

The preceding list of definitions is representative although definitely not exhaustive. For one thing, the list is exclusively Western and omits several cross-cultural conceptions of intelligence. Eastern conceptions of intelligence, for example, emphasize benevolence, humility, freedom from conventional standards of judgment, and doing what is right as essential to intelligence. Many African conceptions of intelligence place heavy emphasis on social aspects of intelligence such as maintaining harmonious and stable intergroup relations (Sternberg & Kaufman, 1998). The reader can consult Bracken and Fagan (1990), Sternberg (1994), and Sternberg and Detterman (1986) for additional ideas. Certainly, this sampling of views is sufficient to demonstrate that there appear to be as many definitions of intelligence as there are experts willing to define it!

In spite of this diversity of viewpoints, two themes recur again and again in expert definitions of intelligence. Broadly speaking, the experts tend to agree that intelligence is (1) the capacity to learn from experience and (2) the capacity to adapt to one’s environment. That learning and adaptation are both crucial to intelligence stands out with poignancy in certain cases of mental disability in which persons fail to possess one or the other capacity in sufficient degree (Case Exhibit 5.1).

Case Exhibit 5.1
Learning and Adaptation as Core Functions of Intelligence

Persons with mental disability often demonstrate the importance of experiential learning and environmental adaptation as key ingredients of intelligence. Consider the case history of a 61-year-old newspaper vendor with moderate mental retardation well known to local mental health specialists. He was an interesting if not eccentric gentleman who stored canned goods in his freezer and cursed at welfare workers who stopped by to see how he was doing. In spite of his need for financial support from a state agency, he was fiercely independent and managed his own household with minimal supervision from case workers. Thus, in some respects he maintained a tenuous adaptation to his

environment. To earn much-needed extra income, he sold a local 25-cent newspaper from a streetside newsstand. He recognized that a quarter was proper payment and had learned to give three quarters in change for a dollar bill. He refused all other forms of payment, an arrangement that his customers could accept. But one day the price of the newspaper was increased to 35 cents, and the newspaper vendor was forced to deal with nickels and dimes as well as quarters and dollar bills. The amount of learning required by this slight shift in environmental demands exceeded his intellectual abilities, and, sadly, he was soon out of business. His failed efforts highlight the essential ingredients of intelligence: learning from experience and adaptation to the environment.

How well do intelligence tests capture the experts’ view that intelligence consists of learning from experience and adaptation to the environment? The reader should keep this question in mind as we proceed to review major intelligence tests in the topics that follow. Certainly, there is cause for concern: Very few contemporary intelligence tests appear to require the examinee to learn something new or to adapt to a new situation as part and parcel of the examination process. At best, prominent modern tests provide indirect measures of the capacities to learn and adapt. How well they capture these dimensions is an empirical question that must be demonstrated through validational research.

Layperson and Expert Conceptions of Intelligence  Another approach to understanding a construct is to study its popular meaning. This method is more scientific than it may appear. Words have a common meaning to the extent that they help provide an effective portrayal of everyday transactions. If laypersons can agree on its meaning, a construct such as intelligence is in some sense “real” and, therefore, potentially useful. Thus, asking persons on the street, “What does intelligence mean to you?” has much to recommend it.

Sternberg, Conway, Ketron, and Bernstein (1981) conducted a series of studies to investigate conceptions of intelligence held by American adults. In the first study, people in a train station, entering a supermarket, and studying in a college library were asked to list behaviors characteristic of different kinds of intelligence. In a second study, the only one discussed here, both laypersons and experts (mainly academic psychologists) rated the importance of these behaviors to their concept of an “ideally intelligent” person.

The behaviors central to expert and lay conceptions of intelligence turned out to be very similar, although not identical. In order of importance, experts saw verbal intelligence, problem-solving ability, and practical intelligence as crucial to intelligence. Laypersons regarded practical problem-solving ability, verbal ability, and social competence to be the key ingredients in intelligence. Of course, opinions were not unanimous; these conceptions represent the consensus view of each group. In their conception of intelligence, experts place more emphasis on verbal ability than problem solving, whereas laypersons reverse these priorities. Nonetheless, experts and laypersons alike consider verbal ability and problem solving to be essential aspects of intelligence. As the reader will see, most intelligence tests also accent these two competencies. Prototypical examples would be vocabulary (verbal ability) and block design (problem solving) from the Wechsler scales, discussed later. We see then that everyday conceptions of intelligence are, in part, mirrored quite faithfully by the content of modern intelligence tests.

Some disagreement between experts and laypersons is also evident. Experts consider practical intelligence (sizing up situations, determining how to achieve goals, awareness and interest in the world) an essential constituent of intelligence, whereas laypersons identify social competence (accepting others for what they are, admitting mistakes, punctuality, and interest in the world) as a third component. Yet, these two nominations do share one property in common: Contemporary tests generally make no attempt to measure either practical intelligence or social competence. Partly, this reflects the psychometric difficulties encountered in devising test items relevant to these content areas. However, the more influential reason intelligence tests do not measure practical intelligence or social competence is inertia: Test developers have blindly accepted historically incomplete conceptions of intelligence. Until recently, the development of intelligence testing has been a conservative affair, little changed since the days of Binet and the Army Alpha and Beta tests for World War I recruits. There are some signs that testing practices may soon evolve, however, with the development of innovative instruments. For example, Sternberg and colleagues have proposed innovative tests based on his model of intelligence. Another interesting instrument based on a new model of intelligence is the Everyday Problem Solving Inventory (Cornelius & Caspi, 1987). In this test, examinees must indicate their typical response to everyday problems such as failing to bring money, checkbook, or credit card when taking a friend to lunch.

Many theorists in the field of intelligence have relied on factor analysis for the derivation or validation of their theories. In fact, it is not an overstatement to say that perhaps the majority of the theories in this area have been impacted by the statistical tools of factor analysis, which provide ways to portion intelligence into its subcomponents. One of the most compelling theories of intelligence, the Cattell-Horn-Carroll theory reviewed later, would not exist without factor analysis. Thus, before summarizing theories, we provide a brief review of this essential statistical tool.

5.1.2: A Primer of Factor Analysis
Broadly speaking, there are two forms of factor analysis: confirmatory and exploratory. In confirmatory factor analysis, the purpose is to confirm that test scores and variables fit a certain pattern predicted by a theory. For example, if the theory underlying a certain intelligence test prescribed that the subtests belong to three factors (e.g., verbal, performance, and attention factors), then a confirmatory factor analysis could be undertaken to evaluate the accuracy of this prediction. Confirmatory factor analysis is essential to the validation of many ability tests.

The central purpose of exploratory factor analysis is to summarize the interrelationships among a large number of variables in a concise and accurate manner as an aid in conceptualization (Gorsuch, 1983). For instance, factor analysis may help a researcher discover that a battery of 20 tests represents only four underlying variables, called factors. The smaller set of derived factors can be used to represent the essential constructs that underlie the complete group of variables.

Perhaps a simple analogy will clarify the nature of factors and their relationship to the variables or tests from which they are derived. Consider the track-and-field decathlon, a mixture of 10 diverse events including sprints, hurdles, pole vault, shot put, and distance races, among others. In conceptualizing the capability of the individual decathlete, we do not think exclusively in terms of the participant’s skill in specific events. Instead, we think in terms of more basic attributes such as speed, strength, coordination, and endurance, each of which is reflected to a different extent in the individual events. For example, the pole vault requires speed and coordination, while hurdle events demand coordination and endurance. These inferred attributes are analogous to the underlying factors of factor analysis. Just as the results from the 10 events of a decathlon may boil down to a small number of underlying factors (e.g., speed, strength, coordination, and endurance), so too may the results from a battery of 10 or 20 ability tests reflect the operation of a small number of basic cognitive attributes (e.g., verbal skill, visualization, calculation, and attention, to cite a hypothetical list). This example illustrates the goal of factor analysis: to help produce a parsimonious description of large, complex data sets.

We will illustrate the essential concepts of factor analysis by pursuing a classic example concerned with the number and kind of factors that best describe student abilities. Holzinger and Swineford (1939) gave 24 ability-related psychological tests to 145 junior high school students from Forest Park, Illinois. The factor analysis described later was based on methods outlined in Kinnear and Gray (1997).

It should be intuitively obvious to the reader that any large battery of ability tests will reflect a smaller number of basic, underlying abilities (factors). Consider the 24 tests depicted in Table 5.1. Surely some of these tests measure common underlying abilities. For example, we would expect Sentence Completion, Word Classification, and Word Meaning (variables 7, 8, and 9) to assess a factor of general language ability of some kind. In like manner, other groups of tests seem likely to measure common underlying abilities, but how many abilities or factors? And what is the nature of these underlying abilities? Factor analysis is the ideal tool for answering these questions. We follow the factor analysis of the Holzinger and Swineford (1939) data from beginning to end.

Table 5.1 The 24 Ability Tests Used by Holzinger and Swineford (1939)
1. Visual Perception
2. Cubes
3. Paper Form Board
4. Flags
5. General Information
6. Paragraph Comprehension
7. Sentence Completion
8. Word Classification
9. Word Meaning
10. Add Digits
11. Code (Perceptual Speed)
12. Count Groups of Dots
13. Straight and Curved Capitals
14. Word Recognition
15. Number Recognition
16. Figure Recognition
17. Object-Number
18. Number-Figure
19. Figure-Word
20. Deduction
21. Numerical Puzzles
22. Problem Reasoning
23. Series Completion
24. Arithmetic Problems

The Correlation Matrix  The beginning point for every factor analysis is the correlation matrix, a complete table of intercorrelations among all the variables.¹ The correlations between the 24 ability variables discussed here can be found in Table 5.2. The reader will notice that variables 7, 8, and 9 do, indeed, intercorrelate quite strongly (correlations of .62, .69, and .53), as we suspected earlier. This pattern of intercorrelations is presumptive evidence that these variables measure something in common; that is, it appears that these tests reflect a common underlying factor.

¹ In this example, the variables are tests that produce more or less continuous scores. But the variables in a factor analysis can take other forms, so long as they can be expressed as continuous scores. For example, all of the following could be variables in a factor analysis: height, weight, income, social class, and rating-scale results.
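The informal reasoning in this passage, that a cluster of strongly intercorrelated tests hints at a shared factor, can be checked with a little arithmetic. The sketch below uses the coefficients for variables 7 through 12 as they are printed in Table 5.2. Treating variables 10, 11, and 12 as a second cluster is an assumption made here purely for contrast, and the helper names (`r`, `mean_within`, `mean_between`) are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Correlations among variables 7-12 as printed in Table 5.2
# (keys are pairs of variable numbers, smaller number first).
R = {
    (7, 8): 0.62, (7, 9): 0.69, (8, 9): 0.53,
    (7, 10): 0.25, (8, 10): 0.29, (9, 10): 0.17,
    (7, 11): 0.23, (8, 11): 0.30, (9, 11): 0.28, (10, 11): 0.48,
    (7, 12): 0.18, (8, 12): 0.27, (9, 12): 0.11, (10, 12): 0.59, (11, 12): 0.43,
}

VERBAL = (7, 8, 9)     # Sentence Completion, Word Classification, Word Meaning
SECOND = (10, 11, 12)  # Add Digits, Code, Count Groups of Dots (assumed cluster)


def r(i, j):
    """Look up a correlation regardless of the order of the two variables."""
    return R[(min(i, j), max(i, j))]


def mean_within(cluster):
    """Average correlation among all pairs of tests inside one cluster."""
    return mean(r(i, j) for i, j in combinations(cluster, 2))


def mean_between(cluster_a, cluster_b):
    """Average correlation between tests drawn from two different clusters."""
    return mean(r(i, j) for i in cluster_a for j in cluster_b)


print(round(mean_within(VERBAL), 2))           # 0.61
print(round(mean_within(SECOND), 2))           # 0.5
print(round(mean_between(VERBAL, SECOND), 2))  # 0.23
```

Tests inside each cluster correlate far more strongly with one another than with tests in the other cluster, which is the pattern one would expect if each cluster taps its own underlying ability. This is only a hand check on six variables; it does not replace a formal factor analysis of all 24.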

Table 5.2 The Correlation Matrix for 24 Ability Variables


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
2 32
3 40 32
4 47 23 31
5 32 29 25 23
6 34 23 27 33 62
7 30 16 22 34 66 72
8 33 17 38 39 58 53 62
9 33 20 18 33 72 71 69 53
10 12 06 08 10 31 20 25 29 17
11 31 15 09 11 34 35 23 30 28 48
12 31 15 14 16 22 10 18 27 11 59 43
13 49 24 32 33 34 31 35 40 28 41 54 51
14 13 10 18 07 28 29 24 25 26 17 35 13 20
15 24 13 07 13 23 25 17 18 25 15 24 17 14 37
16 41 27 26 32 19 29 18 30 24 12 31 12 28 41 33
17 18 01 18 19 21 27 23 26 27 29 36 28 19 34 35 32
18 37 26 21 25 26 17 16 25 21 32 35 35 32 21 33 34 45
19 27 11 31 14 19 25 23 27 27 19 29 11 26 21 19 26 32 36
20 37 29 30 34 40 44 45 43 45 17 20 25 24 30 27 39 26 30 17
21 37 31 17 35 32 26 31 36 27 41 40 36 43 18 23 35 17 36 33 41
22 41 23 25 38 44 39 40 36 48 16 30 19 28 24 25 28 27 32 34 46 37
23 47 35 38 34 44 43 41 50 50 26 25 35 38 24 26 36 29 27 30 51 45 50
24 28 21 20 25 42 43 44 39 42 53 41 41 36 30 17 26 33 41 37 37 45 38 43

Note: Decimals omitted.


Source: Reprinted with permission from Holzinger, K., & Harman, H. (1941). Factor analysis: A synthesis of factorial methods. Chicago: University of Chicago Press. Copyright © 1941
The University of Chicago Press.

However, this kind of intuitive factor analysis based on a visual inspection of the correlation matrix is hopelessly limited; there are just too many intercorrelations for the viewer to discern the underlying patterns for all the variables. Here is where factor analysis can be helpful. Although we cannot elucidate the mechanics of the procedure, factor analysis relies on modern high-speed computers to search the correlation matrix according to objective statistical rules and determine the smallest number of factors needed to account for the observed pattern of intercorrelations. The analysis also produces the factor matrix, a table showing the extent to which each test loads on (correlates with) each of the derived factors, as discussed in the following section.

The Factor Matrix and Factor Loadings The factor matrix consists of a table of correlations called factor loadings. The factor loadings (which can take on values from −1.00 to +1.00) indicate the weighting of each variable on each factor. For example, the factor matrix in Table 5.3 shows that five factors (labeled I, II, III, IV, and V) were derived from the analysis. Note that the first variable, Series Completion, has a strong positive loading of .71 on factor I, indicating that this test is a reasonably good index of factor I. Note also that Series Completion has a modest negative loading of −.11 on factor II, indicating that, to a slight extent, it measures the opposite of this factor; that is, high scores on Series Completion tend to signify low scores on factor II, and vice versa.

The factors may seem quite mysterious, but in reality they are conceptually quite simple. A factor is nothing more than a weighted linear sum of the variables; that is, each factor is a precise statistical combination of the tests used in the analysis. In a sense, a factor is produced by “adding in” carefully determined portions of some tests and perhaps “subtracting out” fractions of other tests. What makes the factors special is the elegant analytical methods used to derive them. Several different methods exist. These methods differ in subtle ways beyond the scope of this text; the reader can gather a sense of the differences by examining names of procedures: principal components factors, principal axis factors, method of unweighted least squares, maximum-likelihood method, image factoring, and alpha factoring (Tabachnick & Fidell, 1989). Most of the methods yield highly similar results.

The factor loadings depicted in Table 5.3 are nothing more than correlation coefficients between variables and factors. These correlations can be interpreted as showing the weight or loading of each factor on each variable. For example, variable 9, the test of Word Meaning, has a very strong loading (.69) on factor I, modest negative loadings (−.45 and −.29) on factors II and III, and negligible loadings (.08 and .00) on factors IV and V.

Table 5.3 The Principal Axes Factor Analysis for 24 Variables

                                               Factors
                                       I    II   III    IV     V
23. Series Completion                .71  –.11   .14   .11   .07
8. Word Classification               .70  –.24  –.15  –.11  –.13
5. General Information               .70  –.32  –.34  –.04   .08
9. Word Meaning                      .69  –.45  –.29   .08   .00
6. Paragraph Comprehension           .69  –.42  –.26   .08  –.01
7. Sentence Completion               .68  –.42  –.36  –.05  –.05
24. Arithmetic Problems              .67   .20  –.23  –.04  –.11
20. Deduction                        .64  –.19   .13   .06   .28
22. Problem Reasoning                .64  –.15   .11   .05  –.04
21. Numerical Puzzles                .62   .24   .10  –.21   .16
13. Straight and Curved Capitals     .62   .28   .02  –.36  –.07
1. Visual Perception                 .62  –.01   .42  –.21  –.01
11. Code (Perceptual Speed)          .57   .44  –.20   .04   .01
18. Number-Figure                    .55   .39   .20   .15  –.11
16. Figure Recognition               .53   .08   .40   .31   .19
4. Flags                             .51  –.18   .32  –.23  –.02
17. Object-Number                    .49   .27  –.03   .47  –.24
2. Cubes                             .40  –.08   .39  –.23   .34
12. Count Groups of Dots             .48   .55  –.14  –.33   .11
10. Add Digits                       .47   .55  –.45  –.19   .07
3. Paper Form Board                  .44  –.19   .48  –.12  –.36
14. Word Recognition                 .45   .09  –.03   .55   .16
15. Number Recognition               .42   .14   .10   .52   .31
19. Figure-Word                      .47   .14   .13   .20  –.61

Geometric Representation of Factor Loadings It is customary to represent the first two or three factors as reference axes in two- or three-dimensional space.2 Within this framework the factor loadings for each variable can be plotted for examination. In our example, five factors were discovered, too many for simple visualization. Nonetheless, we can illustrate the value of geometric representation by oversimplifying somewhat and depicting just the first two factors (Figure 5.1). In this graph, each of the 24 tests has been plotted against the two factors that correspond to axes I and II. The reader will notice that the factor loadings on the first factor (I) are uniformly positive, whereas the factor loadings on the second factor (II) consist of a mixture of positive and negative.

Figure 5.1 Geometric Representation of the First Two Factors from 24 Ability Tests

2
Technically, it is possible to represent all the factors as reference axes in n-dimensional space, where n is the number of factors. However, when working with more than two or three reference axes, visual representation is no longer feasible.

The Rotated Factor Matrix An important point in this context is that the position of the reference axes is arbitrary. There is nothing to prevent the researcher from rotating the axes so that they produce a more sensible fit with the factor loadings. For example, the reader will notice in Figure 5.1 that tests 6, 7, and 9 (all language tests) cluster together. It would certainly clarify the interpretation of factor I if it were to be redirected near the center of this cluster (Figure 5.2). This manipulation would also bring factor II alongside interpretable tests 10, 11, and 12 (all number tests).

Although rotation can be conducted manually by visual inspection, it is more typical for researchers to rely on one or more objective statistical criteria to produce the final rotated factor matrix. Thurstone’s (1947) criteria of positive manifold and simple structure are commonly applied. In a rotation to positive manifold, the computer program seeks to eliminate as many of the negative factor loadings as possible. Negative factor loadings make little sense in ability testing, because they imply that high scores on a factor are correlated with poor test performance. In a rotation to simple structure, the computer program seeks to simplify the factor loadings so that each test has significant loadings on as few factors as possible.

The goal of both criteria is to produce a rotated factor matrix that is as straightforward and unambiguous as possible.
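
The effect of rotating the reference axes can be sketched with a few lines of Python. The example below is only an illustration: it rotates the first two axes by hand so that factor I passes through Word Meaning (test 9), using the factor I and II loadings from Table 5.3. This is a simple two-dimensional rotation chosen for clarity, not the objective criterion a statistical program would apply, and the function name is hypothetical.

```python
import math


def rotate_loadings(loading_i, loading_ii, angle_rad):
    """Loadings of one test on axes I and II after rotating both axes
    counterclockwise by angle_rad (the axes stay at right angles)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (loading_i * c + loading_ii * s,
            -loading_i * s + loading_ii * c)


# Unrotated loadings on factors I and II, copied from Table 5.3.
TESTS = {
    "9. Word Meaning": (0.69, -0.45),
    "10. Add Digits": (0.47, 0.55),
}

# Illustrative choice: aim the rotated factor I directly at Word Meaning.
angle = math.atan2(*reversed(TESTS["9. Word Meaning"]))

for name, (a, b) in TESTS.items():
    new_a, new_b = rotate_loadings(a, b, angle)
    print(f"{name}: I = {new_a:.2f}, II = {abs(new_b):.2f}")
```

After this rotation, Word Meaning loads about .82 on factor I and essentially zero on factor II, while Add Digits loads about .09 on factor I and about .72 on factor II. The negative loading has disappeared and each test loads mainly on one factor, which is the spirit of positive manifold and simple structure. The sum of squared loadings for each test is unchanged by the rotation.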

Figure 5.2 Geometric Representation of the First Two Rotated Factors from 24 Ability Tests

The rotated factor matrix for this problem is shown in Table 5.4. The particular method of rotation used here is called varimax rotation. Varimax should not be used if the theoretical expectation suggests that a general factor may occur. Should we expect a general factor in the analysis of ability tests? The answer is as much a matter of faith as of science. One researcher may conclude that a general factor is likely and, therefore, pursue a different type of rotation. A second researcher may be comfortable with a Thurstonian viewpoint and seek multiple ability factors using a varimax rotation. We will explore this issue in more detail later, but it is worth pointing out here that a researcher encounters many choice points in the process of conducting a factor analysis. It is not surprising, then, that different researchers may reach different conclusions from factor analysis, even when they are analyzing the same data set.

The Interpretation of Factors Table 5.4 indicates that five factors underlie the intercorrelations of the 24 ability tests. But what shall we call these factors? The reader may find the answer to this question disquieting, because at this juncture we leave the realm of cold, objective statistics and enter the arena of judgment, insight, and presumption. In order to interpret or name a factor, the researcher must make a reasoned judgment about the common processes and abilities shared by the tests with strong loadings on that factor. For example, in Table 5.4 it appears that factor I is verbal ability, because the variables with high loadings stress verbal skill (e.g., Sentence Completion loads .86, Word Meaning loads .84, and Paragraph Comprehension loads .81). The variables with low loadings also help sharpen the meaning of factor I. For example, factor I is not related to numerical skill (Numerical Puzzles loads .18) or spatial skill (Paper Form Board loads .16). Using a similar form of inference, it appears that factor II is mainly numerical ability (Add Digits loads .85, Count Groups of Dots loads .80). Factor III is less certain but appears to be a visual-perceptual capacity, and factor IV appears to be a measure of recognition. We would need to analyze the single test on factor V (Figure-Word) to surmise the meaning of this factor.

Table 5.4 The Rotated Varimax Factor Matrix for 24 Ability Variables

                                               Factors
                                       I    II   III    IV     V
7. Sentence Completion               .86   .15   .13   .03   .07
9. Word Meaning                      .84   .06   .15   .18   .08
6. Paragraph Comprehension           .81   .07   .16   .18   .10
5. General Information               .79   .22   .16   .12  –.02
8. Word Classification               .65   .22   .28   .03   .21
22. Problem Reasoning                .43   .12   .38   .23   .22
10. Add Digits                       .18   .85  –.10   .09  –.01
12. Count Groups of Dots             .02   .80   .20   .03   .00
11. Code (Perceptual Speed)          .18   .64   .05   .30   .17
13. Straight and Curved Capitals     .19   .60   .40  –.05   .18
24. Arithmetic Problems              .41   .54   .12   .16   .24
21. Numerical Puzzles                .18   .52   .45   .16   .02
18. Number-Figure                    .00   .40   .28   .38   .36
1. Visual Perception                 .17   .21   .69   .10   .20
2. Cubes                             .09   .09   .65   .12  –.18
4. Flags                             .26   .07   .60  –.01   .15
3. Paper Form Board                  .16  –.09   .57  –.05   .49
23. Series Completion                .42   .24   .52   .18   .11
20. Deduction                        .43   .11   .47   .35  –.07
15. Number Recognition               .11   .09   .12   .74  –.02
14. Word Recognition                 .23   .10   .00   .69   .10
16. Figure Recognition               .07   .07   .46   .59   .14
17. Object-Number                    .15   .25  –.06   .52   .49
19. Figure-Word                      .16   .16   .11   .14   .77

Note: Boldfaced entries signify subtests loading strongly on each factor.

These results illustrate a major use of factor analysis, namely, the identification of a small number of marker tests from a large test battery. Rather than using a cumbersome battery of 24 tests, a researcher could gain nearly the same information by carefully selecting several tests with strong loadings on the five factors. For example, the first factor is well represented by test 7, Sentence Completion (.86), and test 9, Word Meaning (.84); the second factor is reflected in test 10, Add Digits (.85), while the third factor is best illustrated by test 1, Visual Perception (.69). The fourth factor is captured by test 15, Number Recognition (.74),
and Word Recognition (.69). Of course, the last factor loads well on only test 19, Figure-Word (.77).

Issues in Factor Analysis Unfortunately, factor analysis is frequently misunderstood and often misused. Some researchers appear to use factor analysis as a kind of divining rod, hoping to find gold hidden underneath tons of dirt. But there is nothing magical about the technique. No amount of statistical analysis can rescue data based on trivial, irrelevant, or haphazard measures. If there is no gold to be found, then none will be found; factor analysis is not alchemy. Factor analysis will yield meaningful results only when the research was meaningful to begin with.

An important point is that a particular kind of factor can emerge from factor analysis only if the tests and measures contain that factor in the first place. For example, a short-term memory factor cannot possibly emerge from a battery of ability tests if none of the tests requires short-term memory. In general, the quality of the output depends upon the quality of the input. We can restate this point as the acronym GIGO, or “garbage in, garbage out.”

Sample size is crucial to a stable factor analysis. Comrey (1973) offers the following rough guide:

Sample Size    Rating
50             Very poor
100            Poor
200            Fair
300            Good
500            Very good
1,000          Excellent

In general, it is comforting to have at least five subjects for each test or variable (Tabachnick & Fidell, 1989).

Finally, we cannot overemphasize the extent to which factor analysis is guided by subjective choices and theoretical prejudices. A crucial question in this regard is the choice between orthogonal axes and oblique axes. With orthogonal axes, the factors are at right angles to one another, which means that they are uncorrelated (Figures 5.1 and 5.2 both depict orthogonal axes). In many cases the clusters of factor loadings are situated such that oblique axes provide a better fit. With oblique axes, the factors are correlated among themselves. Some researchers contend that oblique axes should always be used, whereas others take a more experimental approach. Tabachnick and Fidell (1989) recommend an exploratory strategy based on repeated factor analyses. Their approach is unabashedly opportunistic:

During the next few runs, researchers experiment with different numbers of factors, different extraction techniques, and both orthogonal and oblique rotations. Some number of factors with some combination of extraction and rotation produces the solution with the greatest scientific utility, consistency, and meaning; this is the solution that is interpreted.

With oblique rotations it is also possible to factor analyze the factors themselves. Such a procedure may yield one or more second-order factors. Second-order factors can provide support for the hierarchical organization of traits and may offer a rapprochement between ability theorists who posit a single general factor (e.g., Spearman) and those who promote several group factors (e.g., Thurstone). Perhaps both camps are correct, with the group factors sitting underneath the second-order general factor.

We turn now to a review of major theories of intelligence. A reminder: The justification for reviewing theories is to illustrate how they have influenced the structure and content of intelligence tests. In addition, the construct validity of IQ tests depends on the extent to which they embody specific theories of intelligence, so a review of theories is pertinent to test validation as well.

5.1.3: Galton and Sensory Keenness
The first theories of intelligence were derived in the Brass Instruments era of psychology at the turn of the twentieth century. Sir Francis Galton and his disciple J. McKeen Cattell thought that intelligence was underwritten by keen sensory abilities. This incomplete and misleading assumption was based on a plausible premise:

The only information that reaches us concerning outward events appears to pass through the avenues of our senses; and the more perceptive the senses are of difference, the larger is the field upon which our judgment and intelligence can act.
(Galton, 1883)

The sensory keenness theory of intelligence promoted by Galton and Cattell proved to be largely a psychometric dead end. However, we do see vestiges of this approach in modern chronometric analyses of intelligence such as the Reaction Time–Movement Time (RT-MT) apparatus, an experimental method favored by Jensen (1980) for the culture-reduced study of intelligence (Figure 5.3). In RT-MT studies, the subject is instructed to place the index finger of the preferred hand on the home button; then an auditory warning signal is sounded, followed (in 1 to 4 seconds) by one of the eight green lights going on, which the subject must turn off as quickly as possible by touching the microswitch button directly below it. RT is the time the subject takes to remove his or her finger from the home button after a green light goes on. MT is the interval between removing the finger from the home button and touching the button that turns off the green light. Jensen (1980) reported that indices of RT and MT correlated as high as .50 with traditional psychometric tests of intelligence.3

3
Actually, the raw correlation coefficient is negative because faster reaction times (lower numerical scores) are associated with higher intelligence scores.
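
The two timing measures defined above are simple differences between event timestamps. The sketch below assumes a hypothetical trial record with three timestamps in milliseconds (light onset, release of the home button, and press of the target button); the record layout, the names, and the sample values are assumptions for illustration, not a description of any actual apparatus.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one RT-MT trial; all times in milliseconds."""
    light_on: float      # a green light goes on
    home_release: float  # the finger leaves the home button
    target_press: float  # the finger touches the button below the light


def reaction_time(trial: Trial) -> float:
    """RT: time from light onset to release of the home button."""
    return trial.home_release - trial.light_on


def movement_time(trial: Trial) -> float:
    """MT: time from release of the home button to the target press."""
    return trial.target_press - trial.home_release


# Invented values, used only to show the arithmetic.
trial = Trial(light_on=0.0, home_release=310.0, target_press=540.0)
print(f"RT = {reaction_time(trial):.0f} ms, MT = {movement_time(trial):.0f} ms")
```

Keeping the home-button release as its own timestamp is what allows RT and MT to be reported separately; a device that records only light onset and target press can report nothing but their sum.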

P. A. Vernon has also reported substantial relationships (as high as .70 for multiple correlations) between speed-of-processing RT-type measures and traditional measures of intelligence (Vernon, 1994). These findings suggest that speed-of-processing measures such as RT might be a useful addition to standardized intelligence test batteries. In general, test developers have resisted the implications of this line of research.

Figure 5.3 Schematic Diagram of a Reaction Time–Movement Time Apparatus
Note: The square box

One reason for the lack of ongoing progress in mental chronometry is the absence of standardization in measurement and data analysis. Not all devices for measuring reaction time are the same; consequently, the data from one laboratory cannot be compared to results from another setting. Making matters worse, many “reaction time” devices lump together RT (the time needed to lift the finger off the home button) and MT (the time in transit to the target button), which drastically obscures the relationship between chronometric data and intelligence (Jensen, 2006). The problem with combining the two is that RT is related to IQ, whereas MT is a motor measure uncorrelated with IQ. In addressing these issues, Jensen (2011) has commissioned a leading electronics company to create a standard apparatus for administering and recording reaction time and other indices of mental chronometry. Use of a single standard instrument would provide a vital foundation for progress in this area of assessment.

5.1.4: Spearman and the g Factor
Based on extensive study of the patterns of correlations between various tests of intellectual and sensory ability, Charles Spearman (1904, 1923, 1927) proposed that intelligence consisted of two kinds of factors: a single general factor g and numerous specific factors s1, s2, s3, and so on. As a necessary adjunct to his theory, Spearman helped invent factor analysis to aid his investigation of the nature of intelligence. Spearman used this statistical technique to discern the number of separate underlying factors that must exist to account for the observed correlations between a large number of tests.

In Spearman’s view, an examinee’s performance on any homogeneous test or subtest of intellectual ability was determined mainly by two influences: g, the pervasive general factor, and s, a factor specific to that test or subtest. (An error factor e could also sway scores, but Spearman sought to minimize this influence by using highly reliable instruments.) Because the specific factor s was different for each intellectual test or subtest and was usually less influential than g in determining performance level, Spearman expressed less interest in studying it. He concentrated mainly on defining the nature of g, which he likened to an “energy” or “power” that serves in common the whole cortex. In contrast, Spearman considered s, the specific factor, to have a physiological substrate localized in the group of neurons serving the particular kind of mental operation demanded by a test or subtest. Spearman (1923) wrote, “These neural groups would thus function as alternative ‘engines’ into which the common supply of ‘energy’ could be alternatively distributed.”

Spearman reasoned that some tests were heavily loaded with the g factor, whereas other tests (especially purely sensory measures) were representative mainly of a specific factor. Two tests each heavily loaded with g should correlate quite strongly. In contrast, psychological tests not saturated with g should show minimal correlation with one another. Much of Spearman’s research was aimed at demonstrating the truth of these basic propositions derived from his theory. We have illustrated these points graphically in Figure 5.4. In this figure, each circle represents an intelligence test, and the degree of overlap between circles indicates the strength of correlation. Notice that tests A and B, each heavily loaded on g, correlate quite strongly. Tests C and D have weak loadings on g and consequently do not correlate well.

Figure 5.4 Spearman’s Two-Factor Theory of Intelligence
Note: Tests A and B correlate strongly, whereas C and D correlate weakly. See text.

Spearman (1923) believed that individual differences in g were most directly reflected in the ability to use three
principles of cognition: apprehension of experience, eduction of relations, and eduction of correlates. Incidentally, the little-used term eduction refers to the process of figuring things out. These three principles can be explained by examining how we solve analogies of the form A:B::C:? (that is, A is to B as C is to ?). A simple example might be HAMMER:NAIL::SCREWDRIVER:? To solve this analogy, we must first perceive and understand each term based on past experience; that is, we must have apprehension of experience. If we have no idea what a hammer, nail, and screwdriver are, there is little chance we can complete the analogy correctly. Next, we must infer the relation between the first two analogy terms, in this case, HAMMER and NAIL. Using a somewhat stilted phrase, Spearman referred to the ability to infer the relation between two concepts as eduction of relations. The final step, eduction of correlates, refers to the ability to apply the inferred principle to the new domain, in this case, applying the rule inferred to produce the correct response, namely, SCREWDRIVER:SCREW.

Although Spearman’s physiological speculations have been largely dismissed, the idea of a general factor has been a central topic in research on intelligence and is still very much alive today (Jensen, 1979). The correctness of the g factor viewpoint is more than an academic issue. If it is true that a single, pervasive general factor is the essential wellspring of intelligence, then psychometric efforts to produce factorially pure subtests (e.g., measuring verbal comprehension, perceptual organization, short-term memory, and so on) are largely misguided. To the extent that Spearman is correct, test developers should forgo subtest derivation and concentrate on producing a test that best captures the general factor.

The most difficult issue faced by Spearman’s two-factor theory is the existence of group factors. As early as 1906, Spearman and his contemporaries noted that relatively dissimilar tests could have correlations higher than the values predicted from their respective g loadings (Brody & Brody, 1976). This finding raised the possibility that a group of diverse measures might share in common a unitary ability other than g. For example, several tests might share a common unitary memorization factor that was halfway between the g factor and the various s factors unique to each test. Of course, the existence of group factors is incompatible with Spearman’s meticulous two-factor theory.

5.1.5: Thurstone and the Primary Mental Abilities
Thurstone (1931) developed factor-analysis procedures capable of searching correlation matrices for the existence of group factors. His methods permitted a researcher to discover empirically the number of factors present in a matrix and to define each factor in terms of the tests that loaded on it. In his analysis of how scores on different kinds of intellectual tests correlated with each other, Thurstone concluded that several broad group factors, and not a single general factor, could best explain empirical results. At various points in his research career, he proposed approximately a dozen different factors. Only seven of these factors have been frequently corroborated (Thurstone, 1938; Thurstone & Thurstone, 1941) and they have been designated primary mental abilities (PMAs).

Thurstone (1938) published the Primary Mental Abilities Test consisting of separate subtests, each designed to measure one PMA. However, he later acknowledged that his primary mental abilities correlated moderately with each other, implying the existence of one or more second-order factors. Ultimately, Thurstone acknowledged the existence of g as a higher-order factor. By this time, Spearman had admitted the existence of group factors representing special abilities, and it became apparent that the differences between Spearman and Thurstone were largely a matter of emphasis (Brody & Brody, 1976). Spearman continued to believe that g was the major determinant of correlations between test scores and assigned a minor role to group factors. Thurstone reversed these priorities.

P. E. Vernon (1950) provided a rapprochement between these two viewpoints by proposing a hierarchical group factor theory. In his view, g was the single factor at the top of a hierarchy that included two major group factors labeled verbal-educational (V:ed) and practical-mechanical-spatial-physical (k:m). Underneath these two major group factors were several minor group factors resembling the PMAs of Thurstone; specific factors occupied the bottom of the hierarchy.

Thurstone’s analysis of PMAs continues to influence test development even today. Schaie (1985) has revised and
modified the Primary Mental Abilities Test and used these measures in an enormously influential longitudinal study of adult intelligence. If intelligence were mainly a matter of g, then the group factors should change at about the same rate with aging. In support of the group factor approach to intellectual testing, Schaie (1985) reports that some PMAs show little age-related decrement (Verbal Comprehension, Word Fluency, Inductive Reasoning), whereas other PMAs decline more rapidly in old age (Space, Number). Thus, there may be practical real-world reasons for reporting group factors and not condensing all of intelligence into a single general factor.

5.1.6: Cattell-Horn-Carroll (CHC) Theory
Raymond Cattell (1941, 1971) proposed an influential theory of the structure of intelligence that has been revised and extended by John Horn (1968, 1994) and John Carroll (1993). Based on the reanalysis of 461 data sets from hundreds of independent studies published by other researchers, Carroll’s contributions to the theory are especially vital. The ensuing theory, known as Cattell-Horn-Carroll (CHC) theory, is a taxonomic tour de force that synthesizes the findings from almost a century of factor-analytic research on intelligence. Many psychometricians consider CHC theory to possess the strongest empirical foundation of any theory of intelligence and also to provide the most far-reaching implications for psychological testing (McGrew, 1997). Although the “big picture” of CHC theory is well established, researchers continue to refine the details. Under the direction of Kevin McGrew, the Institute for Applied Psychometrics manages an informative website dedicated to the advancement of CHC theory and applications (www.iapsych.com).

According to CHC theory, intelligence consists of pervasive, broad, and narrow abilities that are hierarchically organized. These are known as Stratum III, II, and I, respectively (Figure 5.5). At the highest and most pervasive level, called Stratum III, a single general factor known as little g oversees all cognitive activities. Stratum II capacities, which reside beneath general intelligence, include several prominent and well-established abilities. In Figure 5.5, we have depicted eight abilities originally identified by Carroll (1993), but other researchers have proposed a slightly larger list that includes additional tentative entries such as psychomotor, olfactory, and kinesthetic abilities. The precise name given to each broad factor, as well as its scale abbreviation, differs slightly from one theorist to another. Even so, there is strong consensus for the essential list. These broad factors include “basic constitutional and long-standing characteristics of individuals that can govern or influence a great variety of behaviors in a given domain” (Carroll, 1993, p. 634). The narrow abilities at Stratum I include approximately 70 abilities identified by Carroll (1993) in his comprehensive review of factor-analytic studies of intelligence. As might be expected, the list of narrow abilities is continually revised and expanded with ongoing research. These narrow abilities “represent greater specializations of abilities, often in quite specific ways that reflect the effects of experience and learning, or the adoption of particular strategies of performance” (Carroll, 1993, p. 634).

Figure 5.5 Outline of the CHC Three-Stratum Theory of Cognitive Abilities
Source: Based on Carroll, J. B. (1993). Cognitive abilities: A survey of factor analytic studies. New York: Cambridge University Press, and table 3 from www.iapsych.com.

Stratum III                Stratum II                                   Stratum I
General Intelligence, g    Fluid Intelligence/Reasoning (Gf)            5 narrow abilities
                           Crystallized Intelligence/Knowledge (Gc)     10 narrow abilities
                           Domain-Specific Knowledge (Gkn)              7 narrow abilities
                           Visual-Spatial Abilities (Gv)                11 narrow abilities
                           Auditory Processing (Ga)                     13 narrow abilities
                           Broad Retrieval [Memory] (Gr)                13 narrow abilities
                           Cognitive Processing Speed (Gs)              7 narrow abilities
                           Decision/Reaction Time or Speed (Gt)         5 narrow abilities

Definitions of CHC Broad Ability Factors As noted, the broad factors of CHC are more firmly established than the narrow abilities, which continue to undergo revision and extension. We provide brief definitions of the broad factors, based on Carroll (1993), McGrew (1997), and www.iapsych.com.

• Fluid Intelligence/Reasoning (Gf): Fluid intelligence encompasses high-level reasoning and is used for novel tasks that cannot be performed automatically. The mental operations of fluid intelligence may involve drawing inferences, forming concepts, generating and testing hypotheses, understanding implications, inductive reasoning, and deductive reasoning. The classic example of fluid intelligence is found in matrix reasoning tasks such as Raven’s Progressive Matrices (Raven, 2000).
• The abilities that make up fluid intelligence are largely nonverbal and not heavily dependent on exposure to a specific culture. For these reasons, Cattell (1940) believed that measures of fluid intelligence were culture-free. Based on this assumption, he devised the Culture Fair Intelligence Test (CFIT) in an attempt to eliminate cultural bias in testing. Of course, calling a test culture fair does not make it necessarily so. In fact, the goal of a completely culture-free intelligence test has proved elusive.
• Crystallized Intelligence/Knowledge (Gc): This form of intelligence is typically defined as an individual’s breadth and depth of acquired cultural knowledge: knowledge of the language, information, and concepts
of a person’s culture. The quintessential example is the extent of vocabulary that an individual understands. But crystallized intelligence also includes the application of verbal and cultural knowledge (e.g., oral production, verbal fluency, and communication ability). Because crystallized intelligence arises when fluid intelligence is applied to cultural products, we would expect these two kinds of cognitive ability to possess a strong correlation. In fact, it is commonly found that measures of crystallized and fluid intelligence possess a healthy relationship (r = .5).
• Domain-Specific Knowledge (Gkn): Domain-specific knowledge represents a person’s acquired knowledge in one or more specialized domains that do not represent the typical experiences of individuals in the culture. This might include, for example, knowledge of biology, skill in lip reading, or knowledge of how to use computers.
• Visual-Spatial Abilities (Gv): This ability has to do with imagining, retaining, and transforming mental representations of visual images. For example, visual-spatial ability involves the capacity to predict how a shape will appear when it is rotated, or to identify quickly a known object from a vague, incomplete picture, or to find an object hidden in a picture. This capacity includes visual memory.
• Auditory Processing (Ga): This is the ability to perceive auditory information accurately, which involves the capacity to analyze, comprehend, and synthesize patterns or groups of sounds. Auditory processing involves the ability to discriminate speech sounds and to judge and discriminate tonal patterns in music. A key characteristic of Ga abilities is the cognitive talent needed to control the perception of auditory information (i.e., to filter signal from noise).
• Broad Retrieval [Memory] (Gr): Broad retrieval includes the ability to consolidate and store new information in long-term memory and then to retrieve the information later through association. Included in broad retrieval are such narrow abilities as associative memory (e.g., when provided the first part, recalling the second part of a previously learned but unrelated pair of items), ideational fluency (e.g., ability to call up ideas), and naming facility (e.g., rapidly providing the names of familiar faces). Some researchers further divide the broad memory factor into additional subtypes. In addition, some theorists propose a separate broad factor for short-term memory (Gsm), the ability to retain awareness of events that have occurred in the last minute or less (Horn & Masunaga, 2000).
• Cognitive Processing Speed (Gs): This ability refers to the speed of executing overlearned or automatized cognitive processes, especially when high levels of attention and focused concentration are required. For example, the ability to perform simple arithmetic calculations with lightning speed would indicate a high level of Gs ability.
• Decision/Reaction Time or Speed (Gt): This is the ability to make decisions quickly in response to simple stimuli, typically measured by reaction time. For example, the capacity to quickly press the space bar whenever the letter X appears on a computer screen would involve the use of Gt ability.

Utility of CHC Theory

CHC theory is unusual in its detail, which permits robust theory testing. A number of lines of evidence support its validity. For example, the structure of intelligence as posited by CHC theory has been shown to be invariant across a number of key variables, including age, ethnicity, and gender (Bickley, Keith, & Wolfe, 1995; Keith, 1999; Carroll, 1993). In empirical studies, the broad CHC abilities also reveal theory-confirming relationships with numerous academic and occupational variables (McGrew & Flanagan, 1998). In one study, for example, measures of CHC broad and narrow cognitive abilities were selectively and appropriately related to mathematics achievement in a representative sample of children and adolescents (Floyd, Evans, & McGrew, 2003). In general, practitioners praise the CHC approach to partitioning intelligence because the broad and narrow abilities are empirically verified and possess meaningful real-world implications (Fiorello & Primerano, 2005).

5.1.7: Guilford and the Structure-of-Intellect Model

After World War II, J. P. Guilford (1967, 1985) continued the search for the factors of intelligence that had been initiated by Thurstone. Guilford soon concluded that the number of discernible mental abilities was far in excess of the seven proposed by Thurstone. For one thing, Thurstone had ignored the category of creative thinking entirely, an unwarranted oversight in Guilford’s view. Guilford also found that if innovative types of tests were included in the large batteries of tests he administered to his subjects, then the pattern of correlations between these tests indicated the existence of literally dozens of new factors of intellect. Furthermore, Guilford noticed that some of these new factors had recurring similarities with respect to the kinds of mental processes involved, the kinds of information featured, or the form that the items of information took. As a result of these recurring similarities in the newly discovered factors of intellect, he became convinced that these multitudinous factors could be grouped along a small number of main dimensions. Guilford (1967) proposed an elegant structure-of-intellect (SOI) model to summarize his findings. Visually conceived, Guilford’s SOI model classifies
112 Chapter 5

intellectual abilities along three dimensions called operations, contents, and products.

By operations, Guilford has in mind the kind of intellectual operation required by the test. Most test items emphasize just one of the operations listed here:

Cognition: Discovering, knowing, or comprehending
Memory: Committing items of information to memory, such as a series of numbers
Divergent production: Retrieving from memory items of a specific class, such as naming objects that are both hard and edible
Convergent production: Retrieving from memory a correct item, such as a crossword puzzle word
Evaluation: Determining how well a certain item of information satisfies specific logical requirements

Contents refers to the nature of the materials or information presented to the examinee. The five content categories are as follows:

Visual: Images presented to the eyes
Auditory: Sounds presented to the ears
Symbolic: Such as mathematical symbols that stand for something
Semantic: Meanings, usually of word symbols
Behavioral: The ability to comprehend the mental state and behavior of other persons

The third dimension in Guilford’s model, products, refers to the different kinds of mental structures that the brain must produce to derive a correct answer. The six kinds of products are as follows:

Unit: A single entity having a unique combination of properties or attributes
Class: What it is that similar units have in common, such as a set of triangles or high-pitched tones
Relation: An observed connection between two items, such as two tones an octave apart
System: Three or more items forming a recognizable whole, such as a melody or a plan for a sequence of actions
Transformation: A change in an item of information, such as a correction of a misspelling
Implication: What an individual item implies, such as to expect thunder following lightning

In total, then, Guilford (1985) identified five types of operations, five types of content, and six types of products, for a total of 5 × 5 × 6 or 150 factors of intellect. Each combination of an operation (e.g., memory), a content (e.g., symbolic), and a product (e.g., units) represents a different factor of intellect. Guilford claims to have verified over 100 of these factors in his research.

The SOI model is often lauded on the grounds that it captures the complexities of intelligence. However, this is also a potential Achilles’ heel for the theory. Consider one factor of intellect, memory for symbolic units. A test that requires the examinee to recall a series of spoken digits (e.g., Digit Span on the WAIS-III) might capture this factor of intellect quite well. But so might a visual digit span test and perhaps even an analogous test with tactile presentation of symbols, such as vibrating rods applied to the skin. Perhaps we need a separate cube for hearing, vision, and touch; such an expanded model would incorporate 450 factors of intellect, surely an unwieldy number.

Although it seems doubtful that intelligence could involve such a large number of unique abilities, Guilford’s atomistic view of intellect nonetheless has caused test developers to rethink and widen their understanding of intelligence. Prior to Guilford’s contributions, most tests of intelligence required mainly convergent production, the construction of a single correct answer to a stimulus situation. Guilford raised the intriguing possibility that divergent production, the creation of numerous appropriate responses to a single stimulus situation, is also an essential element of intelligent behavior. Thus, a question such as “List as many consequences as possible if clouds had strings hanging down from them” (divergent production) might assess an aspect of intelligence not measured by traditional tests.

5.1.8: Planning, Attention, Simultaneous, and Successive (PASS) Theory

Some modern conceptions of intelligence owe a debt to the neuropsychological investigations of the Russian psychologist Aleksandr Luria (1902–1977). Luria (1966) relied primarily on individual case studies and clinical observations of brain-injured soldiers to arrive at a general theory of cognitive processing. The heart of his theory is as follows:

    Analysis shows that there is strong evidence for distinguishing two basic forms of integrative activity of the cerebral cortex by which different aspects of the outside world may be reflected. . . . The first of these forms is the integration of the individual stimuli arriving in the brain into simultaneous, and primarily spatial groups, and the second is the integration of individual stimuli arriving consecutively in the brain into temporally organized, successive series. (Luria, 1966)

Since this approach focuses upon the mechanics by which information is processed, it is often called an information processing theory.

Luria (1970) proposed three functional units in the brain. Processing of information proceeds from lower units to higher units. The first unit is found in subcortical areas including the brain stem, midbrain, and thalamus. Attentional processes originate here, including selective attention and resistance to distraction. The second unit consists of the rearward sensory portions of the cerebral cortex (parietal, temporal, and occipital lobes). This large unit
subserves the simultaneous and successive processes discussed later in this chapter. These processes are to some extent lateralized, with simultaneous processing engaged more with the right hemisphere, and successive processing connected more with the left hemisphere. However, lateralization is relative, not absolute (Springer & Deutsch, 1997). The third unit is located in the frontal lobes. This is primarily where planning occurs and also where motor output initiates.

Naglieri and Das (1990, 2005) have developed the Planning, Attention, Simultaneous, Successive (PASS) theory of intelligence as a modern extension of Luria’s work. Planning involves the selection, usage, and monitoring of effective solutions to problems. Anticipation of consequences and use of feedback are essential. Planning also entails impulse control. As noted, the frontal lobes are heavily engaged in this process. Even though it is listed first in the PASS acronym, Planning is actually the last stage of information processing. The first process is Attention, which requires selectively attending to some stimuli while ignoring others. In some cases, attention also entails vigilance over a period of time. Difficulties with this process underlie attention deficit/hyperactivity disorder. As noted, the brain stem and other midline subcortical structures are vital to attentional processes.

Simultaneous processing of information is characterized by the execution of several different mental operations simultaneously. Forms of thinking and perception that require spatial analysis, such as drawing a cube, require simultaneous information processing. In drawing, the examinee must simultaneously apprehend the overall shape and guide hand and fingers in the execution of the shape. A sequential approach to drawing a cube (if one were even possible) would be horrifically complex. In effect, the examinee would have to draw individual lines of highly specific lengths and angular orientations, and just hope that everything would line up. In the absence of a simultaneous mental gestalt to guide the drawing, a distorted production is almost guaranteed. Successive processing of information is needed for mental activities in which a proper sequence of operations must be followed. This is in sharp contrast to simultaneous processing (such as drawing), for which sequence is unimportant. Successive processing is needed in remembering a series of digits, repeating a string of words (e.g., shoe, ball, egg), and imitating a series of hand movements (fist, palm, fist, fist, palm). Most forms of information processing require an interplay of simultaneous and successive mechanisms. Das (1994) cites the example of reading an unfamiliar word such as taciturn:

    The single letters are to be recognized, and that involves simultaneous coding. The reader matches the visual shape of the letter with a mental dictionary and comes up with a name for it. The letter sequences, then, have to be formed (successive coding) and blended together as a syllable (simultaneous). Then the string of syllables has to be made into a word (successive), the word is recognized (simultaneous), and a pronunciation program is then assembled (successive), leading to oral reading (successive and simultaneous).

Das admits that this may be a simplified view of what occurs when a reader is confronted with a word. The essential point is that higher-level information processing relies upon an interplay of specific, anatomically localizable forms of information processing.

The challenge of a simultaneous-successive approach to the assessment of intelligence is to design tasks that tap relatively pure forms of each approach to information processing. Tests that use this strategy are the Kaufman Assessment Battery for Children II (K-ABC-II), discussed in the next topic, and the Das-Naglieri Cognitive Assessment System (Das & Naglieri, 2012). The Das-Naglieri battery includes successive tasks that involve rapid articulation (such as, “Say can, ball, hot as fast as you can 10 times”) and simultaneous measures of both verbal and nonverbal tasks. The battery also assesses planning and attention, so as to embody the PASS theory (Naglieri & Das, 2005).

5.1.9: Information Processing Theories of Intelligence

Information processing conceptions of intelligence propose models of how individuals mentally represent and process information. Borrowing from Campione and Brown (1978), Borkowski (1985) has put forward a comprehensive theory that bears a loose analogy to the functioning of a computer. The architectural system (hardware) refers to biologically based properties necessary for information processing, such as memory span and speed of encoding/decoding information. Properties of the architectural system include capacity (e.g., number of slots in short-term memory, capacity of long-term memory), durability (rate of information loss), and efficiency of operation (e.g., rate of memory search). The architectural system is considered to be relatively “hardwired” and impervious to change by the environment.

In addition to the structural component of intelligence, there are various functional components (software). The executive system, which refers to environmentally learned components that steer problem solving, provides overall guidance to the functional components. Elements of the executive system include the knowledge base (retrieval of knowledge from long-term memory), schemes (rules of thinking), control processes (rules and strategies such as self-checking and rehearsal), and metacognition (self-awareness of one’s own thought processes). Metacognition is the process of thinking about thinking. Flavell
(1976), who pioneered research on this topic, explained it as follows:

    Metacognition refers to one’s knowledge concerning one’s own cognitive processes or anything related to them, e.g., the learning-relevant properties of information or data. For example, I am engaging in metacognition if I notice that I am having more trouble learning A than B; if it strikes me that I should double check C before accepting it as fact. (p. 232)

The information processing approach to intelligence has generated a large body of research, especially on the concept of metacognition. A consistent finding in this literature is that individuals who use metacognitive strategies perform at much higher levels than those who do not (Montague & Bos, 1990). For example, in a study of 32 Israeli kindergarten children who were taught metacognition related to mathematics, metacognitive skills explained more of the variance in mathematics performance than general ability (Mevarech, 1995). Metacognition is essential to intelligence and is one of the primary influences on student learning (Wang, Haertel, & Walberg, 1990).

5.1.10: Gardner and the Theory of Multiple Intelligences

Howard Gardner (1983, 1993) has proposed a theory of multiple intelligences based loosely on the study of brain–behavior relationships. He argues for the existence of several relatively independent human intelligences, although he admits that the exact nature, extent, and number of the intelligences have not yet been definitively established. Gardner (1983)

Based on these criteria, Gardner (1983, 1993) proposes that the following seven natural intelligences have been substantially confirmed. The seven intelligences are linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal. Three of these seven types of intelligence are well known: linguistic (i.e., verbal) intelligence, logical-mathematical intelligence, and spatial intelligence. Numerous formal tests have been devised to measure them, so we will not discuss them further here. The other four variations of intelligence are somewhat novel and, therefore, require more detailed presentation.

Bodily-kinesthetic intelligence includes the types of skills used by athletes, dancers, mime artists, typists, or “primitive” hunters. Although Western cultures are generally loath to consider the body as a form of intelligence, this is not the case in much of the rest of the world, nor was it true in our evolutionary history. Indeed, persons who could skillfully avoid predators, climb trees, hunt animals, and prepare tools were more likely to survive and pass on their genes to succeeding generations.

The personal intelligences include the capacity to have access to one’s own feeling life (intrapersonal) as well as the ability to notice and make distinctions about the moods, temperaments, motivations, and intentions of others (interpersonal). Thus, personal intelligence encompasses both an intrapersonal and an interpersonal version. The former is found in great novelists who can write introspectively about their feelings, while the latter is often seen in religious and political leaders (e.g., Mahatma Gandhi or Lyndon Johnson) who can fathom the intentions and desires of others and use this information to influence them and form useful alliances.

Musical intelligence is perhaps the least understood of Gardner’s intelligences. Persons with good musical intelligence easily learn to play an instrument or to write their own compositions. Although knowledge of the structural aspects of melody, rhythm, and timbre is important to musical intelligence, Gardner notes that many experts place the affective or feeling aspects of music at its core. He believes that when the neurological underpinnings of music are finally unraveled, we will have “an explanation of how emotional and motivational factors are intertwined with purely perceptual ones” (Gardner, 1983).

The savant phenomenon provides strong support for the existence of separate intelligences, including musical intelligence. (Footnote 4: Historically, savants have also been called idiot savants, which refers, literally, to a person who is both profoundly retarded and yet “wise” at the same time. For obvious reasons, the prefix has been dropped.) A savant is a mentally deficient individual who has a highly developed talent in a single area such as art, rapid calculation, memory, or music. An example is the
extraordinary case of Leslie Lemke, who was born blind and with mental retardation and cerebral palsy. He was not supposed to live. His adoptive mother had to coax him to suck milk from a bottle. Later, she strapped him to her back to help him learn to walk. In spite of his severe disabilities, Leslie became enamored of the piano and showed incredible precocity at picking out melodies on it. Within a few years, at the age of 18, he could listen to a piece of classical piano music a single time and then play it back flawlessly (Patton, Payne, & Beirne-Smith, 1986). The reader can find additional savant case studies in Miller (1989) and Treffert (1989).

Recently, Gardner (1998) has added three tentative candidates to his list of intelligences. These are naturalistic, spiritual, and existential intelligences. Naturalistic intelligence is the kind shown by people who are able to discern patterns in nature. Charles Darwin would be a prime example of such a person. Gardner believes that the evidence for this kind of intelligence is relatively strong. In contrast, spiritual intelligence (a concern with cosmic and spiritual issues in one’s development) and existential intelligence (a concern with ultimate issues, including the meaning of life) are less well proved as independent intelligences. In general, the theory of multiple intelligences is compelling in its simplicity, but there is little empirical investigation of its validity.

5.1.11: Sternberg and the Triarchic Theory of Successful Intelligence

Sternberg (1985b, 1986, 1996) takes a much wider view on the nature of intelligence than most previous theorists. In addition to proposing that certain mental mechanisms are required for intelligent behavior, he also emphasizes that intelligence involves adaptation to the real-world environment. His theory emphasizes what he calls successful intelligence or “the ability to adapt to, shape, and select environments to accomplish one’s goals and those of one’s society and culture” (Sternberg & Kaufman, 1998, p. 494). Sternberg’s theory is called triarchic (ruled by three) because it deals with three aspects of intelligence: componential intelligence, experiential intelligence, and contextual intelligence. Each of these types of intelligence has two or more subcomponents. The entire theory is outlined in Table 5.5.

Componential intelligence, also known as analytical intelligence, consists of the internal mental mechanisms that are responsible for intelligent behavior. The components of intelligence serve three different functions. Metacomponents are the executive processes that direct the activities of all the other components of intelligence. They are responsible for determining the nature of an intellectual problem, selecting a strategy for solving it, and making sure that the task is completed. The metacomponents receive constant feedback as to how things are going in problem solving. Persons who are strong on the metacomponential aspect of intelligence are very good at allocating their intellectual resources.

Table 5.5 An Outline of Sternberg’s Triarchic Theory of Intelligence

Componential (Analytical) Intelligence
    Metacomponents or executive processes (e.g., planning)
    Performance components (e.g., syllogistic reasoning)
    Knowledge acquisition components (e.g., ability to acquire vocabulary words)
Experiential (Creative) Intelligence
    Ability to deal with novelty
    Ability to automatize information processing
Contextual (Practical) Intelligence
    Adaptation to real-world environment
    Selection of a suitable environment
    Shaping of the environment

Source: Summarized from Sternberg, R. J. (1986). Intelligence applied: Understanding and increasing your intellectual skills. San Diego, CA: Harcourt Brace Jovanovich.

In a problem-solving study using novel forms of analogies, Sternberg (1981) found that higher intelligence is associated with spending relatively more time on global or higher-order planning, and relatively less time on local or lower-order planning. For example, consider this analogy problem:

    Man : Skin :: (Dog, Tree) : (Bark, Cat)

The examinee must choose the two correct terms on the right that will complete the analogy. (The correct choices are Tree and Bark.) Using reaction time measures for a series of such novel or nonentrenched problems, Sternberg (1981) found that persons of higher intelligence spent more time in global planning (forming a macrostrategy that applies to this and similar problems) than did persons of lower intelligence. Thus, a crucial aspect of intelligence is knowing when to step back and allocate intellectual effort instead of obtusely attacking a difficult problem.

Performance components are the well-entrenched mental processes that might be used to perform a task or solve a problem. These aspects of intelligence are the ones that are probably measured the best by existing intelligence tests. Examples of performance components include short-term memory and syllogistic reasoning.

Knowledge acquisition components are the processes used in learning. Sternberg has emphasized that in order to understand what makes some people more skilled than others, we must understand their increased capacity to acquire those skills in the first place. A case in point is vocabulary knowledge, which is learned mainly in context rather than through direct instruction. More-intelligent persons are better able to use surrounding contexts to figure out what a word means; that is, they have greater knowledge-acquisition skills. Their increased vocabulary
results, in large measure, from their increased ability to “soak up” the meanings of words they see and hear in their environment. Thus, vocabulary is an excellent measure of intelligence because it reflects people’s ability to acquire information in context.

The second aspect of Sternberg’s theory involves experiential intelligence. According to the theory, a person with good experiential intelligence is able to deal effectively with novel tasks. Experiential intelligence is also known as creative intelligence. This aspect of his theory explains why Sternberg is so critical of most intelligence tests. For the most part, the existing tests measure things already learned by presenting tasks that the subject has already encountered. According to Sternberg, intelligence also involves the capacity to learn and think within new conceptual systems, not just to deal with tasks already encountered. A second aspect of experiential intelligence is the ability to automatize or “make routine” tasks that are encountered repeatedly. An example of automatizing that applies to most of us is reading, which is carried out largely without conscious thought. But any task or mental skill can be automatized, if it is practiced enough. Playing music is an example of an extremely high-level skill that can become automatized with enough practice.

The third aspect of Sternberg’s theory involves contextual intelligence. Contextual intelligence, also known as practical intelligence, is defined as “mental activity involved in purposive adaptation to, shaping of, and selection of real-world environments relevant to one’s life” (Sternberg, 1986, p. 33). This aspect of Sternberg’s theory appears to acknowledge that human behavior has been shaped by selective pressures during our evolutionary history. Contextual intelligence has three parts: adaptation, selection, and shaping.

Adaptation refers to developing skills required by one’s particular environment. Successful adaptation will differ from one culture to the next. In the pygmy cultures of Africa, adaptation might involve the ability to track elephants and kill them with poison-tipped spears. In the Western industrial nations, adaptation might involve presenting oneself favorably in a job interview.

Selection might be called niche finding. This aspect of contextual intelligence involves the ability to leave the environment we are in and to select a different environment more suitable to our talents and needs. Feldman (1982) has illustrated how selection can operate in the career choices of gifted children, thereby determining whether they are highly accomplished as adults. She followed up on the Quiz Kids who were featured in radio and television shows of the 1950s. These were extremely bright children by conventional standards, most with IQs of 140 and higher. A few became highly successful as adults. However, most of them led rather ordinary lives, devoid of the spectacular accomplishments that might have been predicted from their childhood precocity. Those who were most successful had found occupations highly suited to their abilities and interests. In sum, they had selected environmental niches that fitted them well. Sternberg would argue that the ability to select such environments is an important aspect of intelligence.

Shaping is another way to improve the fit between oneself and the environment, especially when selection of a new environment is not practical. In this application of contextual intelligence, we shape the environment itself so that it better fits our needs. An employee who convinces the boss to do things differently has used shaping to make the work environment more suited to his or her talents.

Sternberg (1993) has developed a research instrument based on his theory and has used the test to examine the validity of the triarchic approach. The Sternberg Triarchic Abilities Test (STAT) is unique in going beyond the typical questions that invoke analytical intelligence; the test includes creative and practical questions as well. For example, in one subtest examinees are presented with a map of an area, such as an entertainment park, and then must answer questions about navigating effectively through the area shown in the map (practical intelligence). In another subtest examinees are presented with verbal analogies preceded by incorrect, counterfactual premises (e.g., money falls off trees). Examinees must solve the analogies as though the counterfactual premises were true (creative intelligence). In factor-analytic studies of American, Finnish, and Spanish samples, the triarchic model was a better fit to the data than the usual outcome of finding a single factor of general intelligence (Sternberg, Castejon, Prieto, Hautamaki, & Grigorenko, 2001).

Although Sternberg’s triarchic theory is the most comprehensive and ambitious model yet proposed, not all psychometric researchers have rushed to embrace it. Detterman (1984) cautions that we should investigate the basic cognitive components of intelligence before introducing higher-order constructs that may be unnecessary. Rogoff (1984) questions whether the three subtheories (componential, experiential, contextual) are sufficiently linked. Other comments on the triarchic theory can be found in Behavioral and Brain Sciences (1984, pp. 287–304).

Whatever the final verdict on the triarchic theory of intelligence, Sternberg’s insistence that intelligence has several components not measured by traditional tests rings true to anyone who has studied or administered these tests. He cites the case of a colleague who was asked to test a number of residents at an institution for those with mental retardation. These residents had just planned and successfully executed an escape from the security-conscious school, a feat requiring high levels of practical intelligence. Yet, when administered the Porteus Maze Test (Porteus, 1965), a standardized test reputed to involve planning ability, they could not solve even the simplest

maze correctly. Sternberg (1986) has made it clear that intelligence just has too many components to be measured by any single test.

5.2: Individual Tests of Intelligence and Achievement

5.2 Analyze the individual intelligence tests

Individual intelligence testing is one of the major achievements of psychology since the founding of the discipline. In response to the success of the Binet-Simon scales in the early 1900s, psychologists developed and refined dozens of individual tests of intelligence patterned after this pathbreaking instrument. The explosive growth in group tests of intelligence, fostered by the enthusiastic acceptance of the Army Alpha and Beta tests during and after World War I, also provided impetus to the individual testing movement. Many contemporary individual tests of intelligence owe their lineage to Binet, Simon, and the Army testing programs.

The successful application of intelligence tests inspired educators and psychologists to look for ways to appraise the academic progress of students with school-based achievement tests. In turn, this led to the puzzling discovery that many children of normal or even superior intelligence lagged far behind in school achievement. From this discovery, the concept of learning disability gradually developed, and a whole new field of assessment was born.

The purpose of this topic is to provide an overview of noteworthy approaches to the testing of individual intelligence and achievement, and to introduce the reader to the essentials of learning disability assessment. However, an exhaustive survey of individual cognitive tests is simply beyond the scope of this or any other basic reference. New and revised tests appear practically every month, and thousands of new research findings are published every year. We have chosen to review tests that are widely used or that illustrate interesting developments in theory or method. Readers can find information on additional tests in the Mental Measurements Yearbook series, now published every two or three years by the Buros Institute.

5.2.1: Orientation to Individual Intelligence Tests

The individual intelligence tests reviewed in this topic include the following:

Wechsler Adult Intelligence Scale-IV (WAIS-IV)
Wechsler Intelligence Scale for Children-IV (WISC-IV)
Stanford-Binet: Fifth Edition (SB5)
Detroit Tests of Learning Aptitude-4 (DTLA-4)
Cognitive Assessment System-II (CAS-II)
Kaufman Brief Intelligence Test-2 (KBIT-2)

Collectively, these instruments probably account for 95 percent of the intellectual assessments conducted in the United States.

The Wechsler scales have dominated intelligence testing in recent years, but they are by no means the only viable choices for individual assessment. Many other instruments measure general intelligence just as well (some would say better). Consider the implications of a now familiar observation: For large, heterogeneous samples, scores on any two mainstream instruments (e.g., Wechsler, Stanford-Binet, McCarthy, Kaufman scales) typically correlate .80 to .90. Often the correlation between two mainstream instruments is nearly as high as the test-retest correlation for either instrument alone. For purposes of producing a global score, it would appear that any well-normed mainstream intelligence test will suffice.

But producing an overall score is not the only goal of assessment. In addition, the examiner usually desires to gain an understanding of the subject’s intellectual functioning. For this purpose, the overall IQ is important, but there are instances in which the global score may be irrelevant or even misleading. To understand a referral’s intellectual functioning, the examiner should also inspect the subtest scores in search of hypotheses that might explain the unique functioning of that individual. Of course, examiners need to undertake subtest analysis cautiously, armed with research-based findings on the nature and meaning of subtest scatter for the test in use (Gregory, 1994b).

If the examiner’s goal is to understand intellectual functioning and not merely to determine an overall score, the differences between tests become quite real. Every instrument approaches the measurement of intelligence from a different perspective and yields a distinctive set of subtest scores. Furthermore, a test well suited for one referral issue might perform abysmally in another context. For example, the WAIS-IV performs admirably in the testing of mild mental retardation but contains too few simple items for the effective assessment of persons with moderate or severe developmental disability.

A central axiom of assessment is that the choice of a testing instrument should be based on knowledge of its strengths and weaknesses as they pertain to the referral question. Put simply, the skilled examiner does not blindly rely on a single test for every referral! Instead, the skilled examiner flexibly chooses one or more instruments in light of the perceived assessment needs of the examinee. Each of the tests discussed in this topic has its special merits and also its particular shortcomings. The test user must know these strong and weak facets in order to choose the instruments best suited for each unique referral.
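
The observation made earlier in this section, that two mainstream instruments typically correlate .80 to .90, can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: both instruments are assumed to report IQ with a standard deviation of 15, and the function names are hypothetical rather than taken from any test manual. It computes the proportion of variance the two tests share (the squared correlation) and the standard deviation of the difference between a person's two scores, which shows why global scores are largely interchangeable even though an individual's two IQs may still differ by several points.

```python
import math

IQ_SD = 15.0  # assumption: both instruments scale IQ to a standard deviation of 15


def shared_variance(r: float) -> float:
    """Proportion of score variance the two tests have in common (r squared)."""
    return r ** 2


def sd_of_difference(r: float, sd: float = IQ_SD) -> float:
    """Standard deviation of the difference between scores on two tests.

    Uses Var(X - Y) = 2 * sd**2 * (1 - r), which holds when both tests
    have the same standard deviation.
    """
    return sd * math.sqrt(2.0 * (1.0 - r))


# The two correlations quoted in the text for mainstream instruments.
for r in (0.80, 0.90):
    print(f"r = {r:.2f}: shared variance = {shared_variance(r):.2f}, "
          f"SD of score difference = {sd_of_difference(r):.1f} IQ points")
```

Under these assumptions, a correlation of .80 corresponds to 64 percent shared variance and a difference-score standard deviation of about 9.5 points; a correlation of .90 corresponds to 81 percent and about 6.7 points.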

5.2.2: The Wechsler Scales of Intelligence

Beginning in the 1930s, David Wechsler, a psychologist at Bellevue Hospital in New York City, conceived a series of elegantly simple instruments that virtually defined intelligence testing in the mid- to late twentieth century. His influence on intelligence testing is exceeded only by the pathbreaking contributions of Binet and Simon. It is fitting that we begin the survey of individual tests with a historical summary of the Wechsler tradition, followed by a discussion of individual instruments.

Origins of the Wechsler Tests

Wechsler began work on his first test in 1932, seeking to devise an instrument suitable for testing the diverse patients referred to the psychiatric section of Bellevue Hospital in New York (Wechsler, 1932). In describing the development of his first test, he later wrote, “Our aim was not to produce a set of brand new tests but to select, from whatever source available, such a combination of them as would meet the requirements of an effective adult scale” (Wechsler, 1939). In fact, the content of his scales was largely inspired by earlier efforts such as the Binet scales and the Army Alpha and Beta tests (Frank, 1983). Readers who peruse Psychological Examining in the United States Army, a volume edited by Yerkes (1921) just after World War I, might be astonished to discover that Wechsler purloined dozens of test items from this source, many of which have survived to the present day in contemporary revisions of the Wechsler tests. Wechsler was not so much a creative talent as a pragmatist who fashioned a new and useful instrument from the spare parts of earlier, discontinued attempts at intelligence testing.

The first of the Wechsler tests, named the Wechsler-Bellevue Intelligence Scales, was published in 1939. In discussing the rationale for his new test, Wechsler (1941) explained that existing instruments such as the Stanford-Binet were woefully inadequate for assessing adult intelligence. The Wechsler-Bellevue was designed to rectify several flaws noted in previous tests:

• The test items possessed no appeal for adults.
• Too many questions emphasized mere manipulation of words.
• The instructions emphasized speed at the expense of accuracy.
• The reliance on mental age was irrelevant to adult testing.

To correct these shortcomings, Wechsler designed his test specifically for adults, added performance items to balance verbal questions, reduced the emphasis upon speeded questions, and invented a new method for obtaining the IQ. Specifically, he replaced the usual formula

$$\mathrm{IQ} = \frac{\text{Mental Age}}{\text{Chronological Age}}$$

with a new age-relative formula

$$\mathrm{IQ} = \frac{\text{Attained or Actual Score}}{\text{Expected Mean Score for Age}}$$

This new formula was based on the interesting presumption, stated in the form of an axiom, that IQ remains constant with normal aging, even though raw intellectual ability might shift or even decline. The assumption of IQ constancy is basic to the Wechsler scales. As Wechsler (1941) put it:

The constancy of the I.Q. is the basic assumption of all scales where relative degrees of intelligence are defined in terms of it. It is not only basic, but absolutely necessary that I.Q.’s be independent of the age at which they are calculated, because unless the assumption holds, no permanent scheme of intelligence classification is possible.

Although Wechsler’s view has been largely accepted by contemporary test developers, it is important to stress that the assumption of IQ invariance with age is really a statement of values, a philosophical choice, and not necessarily an inherent characteristic of human nature.

Wechsler also hoped to use his test as an aid in psychiatric diagnosis. In pursuit of this goal, he divided his scale into separate verbal and performance sections. This division allowed the examiner to compare an examinee’s facility in using words and symbols (verbal subtests) versus the ability to manipulate objects and perceive visual patterns (performance subtests). Large differences between verbal ability (V) and performance ability (P) were thought to be of diagnostic significance. Specifically, Wechsler believed that organic brain disease, psychoses, and emotional disorders gave rise to a marked V > P pattern, whereas adolescent psychopaths and persons with mild mental retardation yielded a strong P > V pattern. Subsequent research demonstrated many exceptions to these simple diagnostic rules, and also helped refine the nature of these two major elements of intelligence. For example, verbal intelligence is now better known as verbal comprehension, and performance intelligence is more commonly recognized as perceptual reasoning. Nonetheless, the distinction between verbal and performance skills has proved useful for many purposes, such as studying brain-behavior relationships and examining age effects on intelligence. Wechsler’s armchair division of subtests into verbal and performance sections, even though refined and extended by others, continues to endure as a major contribution to contemporary intelligence testing (Kaufman, Lichtenberger, & McLean, 2001).

General Features of the Wechsler Tests

Including revisions, David Wechsler and his followers have produced more than a dozen intelligence tests in a

span of about 70 years. A major reason for the continued success of these instruments has been the faithful adherence to the familiar content and format first introduced in the Wechsler-Bellevue. By sticking with a single successful formula, Wechsler and company ensured that examiners could switch from one Wechsler test to another with minimal retraining. This was not only good psychometrics but also shrewd marketing insofar as it guaranteed several generations of faithful test users.

The latest editions of the Wechsler intelligence tests (the WPPSI-IV, WISC-IV, and WAIS-IV) possess the following common features:

• Thirteen to fifteen subtests. The multisubtest approach allows the examiner to analyze intra-individual strengths and weaknesses rather than just to compute a single global score. In addition, it is possible to combine subtest scores in theoretically meaningful ways that provide useful information on the broad factors of intelligence. As the reader will learn subsequently, the pattern of subtest and factor scores may convey useful information that is hidden in the overall level of performance.
• An empirically based breakdown into composite scores and a full scale IQ. Whereas the original Wechsler intelligence scales provided only two composite scores (Verbal IQ and Performance IQ), the revisions have been moving toward a more sophisticated partitioning into composites confirmed from factor-analytic research. The WISC-IV and WAIS-IV now yield composite or index scores in the same four areas:
    Verbal Comprehension
    Perceptual Reasoning
    Working Memory
    Processing Speed
  The WPPSI-IV provides five index scores similar to the above (for ages 4:0 to 7:7) but also includes a Fluid Reasoning composite.
• A common metric for IQ and Index scores. The mean for IQ and Index scores is 100 and the standard deviation is 15 for all tests and all age groups. In addition, the scaled scores on each subtest have a mean of 10 and a standard deviation of approximately 3, which permits the examiner to analyze the subtest scores of the examinee for relative strengths and weaknesses.
• Common subtests for the different test versions. For example, the preschool, child, and adult Wechsler tests (WPPSI-IV, WISC-IV, and WAIS-IV) all share a common core of the same six subtests (Table 5.6). An examiner who masters the administration of a core subtest on any of the Wechsler tests (such as the Information subtest on the WAIS-IV) easily can transfer this skill within the Wechsler family of intellectual measures.

Table 5.6 Subtest Composition of the Wechsler Intelligence Tests

Subtest                 WPPSI-IV   WISC-IV   WAIS-IV
Similarities               ×          ×         ×
Vocabulary                 ×          ×         ×
Comprehension              ×          ×         ×
Information                ×          ×         ×
Word Reasoning                        ×
Receptive Vocabulary       ×
Picture Naming             ×
Block Design               ×          ×         ×
Picture Concepts           ×          ×
Matrix Reasoning           ×          ×         ×
Picture Completion                    ×         ×
Visual Puzzles                                  ×
Figure Weights                                  ×
Object Assembly            ×
L-N Sequencing (a)                    ×         ×
Arithmetic                            ×         ×
Digit Span                            ×         ×
Coding                                ×         ×
Symbol Search                         ×         ×
Cancellation                          ×         ×
Picture Memory             ×
Bug Search                 ×
Zoo Memory                 ×

(a) Letter-Number Sequencing.
Note: The six subtests common to all Wechsler intelligence tests are those marked in all three columns. Some subtests are optional or used as substitutions. See text for details.

5.2.3: The Wechsler Subtests: Description and Analysis

Wechsler (1939) defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment.” He also believed that we can only know intelligence by what it enables a person to do. In designing his tests, then, Wechsler selected components to represent a wide array of underlying abilities so as to estimate the global capacity of intelligence. Furthermore, he asked his subjects to do things, not merely to answer questions. The Wechsler subtests are quite diverse and often rely on what Wechsler referred to as “mental productions.”

Information

The Information subtest is found on all three Wechsler intelligence tests. Factual knowledge of persons, places, and common phenomena is tested here. Questions for children are like the following:

“How many eyes do you have?”
“Who invented the telephone?”

“What causes a solar eclipse?”
“Which is the largest planet?”

Questions for adults are similar but progress to higher levels of difficulty. Difficult questions on the adult Information subtest resemble:

“Which is the most common element in air?”
“What is the population of the world?”
“How does fruit juice get converted to wine?”
“Who wrote Madame Bovary?”

Information items test general knowledge normally available to most persons raised in the cultural institutions and educational systems of Western industrialized nations. Indirectly, this subtest measures learning and memory skills insofar as subjects must retain knowledge gained from formal and informal educational opportunities in order to answer the Information items.

Information is usually regarded as one of the best measures of general ability among the Wechsler subtests (Kaufman, McLean, & Reynolds, 1988). For example, the WAIS-IV manual reveals that Information typically has the second or third highest correlation with Full Scale IQ across the 13 age groups (Wechsler, Coalson, & Raiford, 2008). Information consistently loads strongly on the first factor identified in factor analyses of the WAIS-IV subtest correlations (see the following). The first factor is labeled Verbal Comprehension. However, Information tends to reflect formal education and motivation for academic achievement and may therefore yield spuriously high ability estimates for perpetual students and avid readers.

Digit Span

Digit Span consists of two separate sections, Digits Forward and Digits Backward. In Digits Forward, the examiner reads a series of digits at one per second, then asks the subject to repeat them. If the subject answers correctly on two consecutive trials of the same length, the examiner proceeds to the next series, which is one digit longer, up to a maximum length of nine digits. For Digits Backward, a similar procedure is used, except the examinee must repeat the digits in reverse order, up to a maximum length of eight digits. For example, the examiner reads:

“6-1-3-4-2-8-5”

and the subject tries to repeat the numbers in the reverse order:

“5-8-2-4-3-1-6.”

On the WAIS-IV only, the Digit Span subtest also includes a third section called Digit Sequencing. For this part, the examinee is asked to sort the series of digits into ascending numerical order. For example, if the examiner says:

“1-7-4-9-2”

the examinee should respond:

“1-2-4-7-9.”

Digit Span is a measure of immediate auditory recall for numbers. Facility with numbers, good attention, and freedom from distractibility are required. Performance on this subtest may be affected by anxiety or fatigue, and many clinicians have noted that patients hospitalized for medical or psychiatric reasons frequently perform poorly on Digit Span.

Digits Forward and Digits Backward may assess fundamentally different abilities. Digits Forward seems to require the examinee to access an auditory code in sequential fashion. In contrast, to perform Digits Backward, the examinee must form an internal visual memory trace from the orally presented numerical sequences and then visually scan from end to beginning. Digits Backward is clearly the more complex test; not surprisingly, it loads higher on general intelligence than does Digits Forward (Jensen & Osborne, 1979). Gardner (1981) argues that examiners should supplement standard reporting procedures and list separate subscores for Digit Span. He presents separate means, standard deviations, and percentile ranks on Digits Forward and Backward for children ages 5 to 15.

Vocabulary

The Vocabulary subtest is found on all three Wechsler intelligence tests. The examinee is asked to define up to several dozen words of increasing difficulty while the examiner writes down each response verbatim. For example, on an easy item the examiner might ask, “What is a cup?” and the examinee would get partial credit for answering, “You drink with it” and full credit for answering, “It has a handle, holds liquids, and you drink from it.” For adults and bright children, the advanced items on the Wechsler Vocabulary subtests can be very challenging, on a par with tincture, obstreperous, and egregious.

Vocabulary is learned largely in context from reading books and listening to others. It is a rare individual who picks up vocabulary by reading the dictionary or memorizing word lists from the “Building Your Wordpower” section of popular magazines. In the main, a person’s vocabulary is a measure of sensitivity to new information and the ability to decipher meanings based on the context in which words are encountered. Precisely because the acquisition of word meaning depends on contextual inference, the Vocabulary subtest turns out to be the single best measure of overall intelligence on the Wechsler scales (Gregory, 1999). This is a surprise to many laypersons who regard vocabulary as merely synonymous with educational exposure and, therefore, a mediocre index of general intelligence. However, there is simply no denying the empirical evidence: Vocabulary has among the highest subtest correlations with Full Scale IQ on both the WISC-IV and the WAIS-IV.

Arithmetic

Except for the very easiest items for young people or persons who have mental retardation, the Arithmetic subtest consists of orally presented mathematics

problems. The examinee must solve the problems without paper or pencil within a time limit (usually 30 to 60 seconds). The simple items stress fundamental operations of addition or subtraction, for example:

“If you have fifteen apples and give seven away, how many are left?”

The more difficult items require proper conceptualization of the problem and the application of two arithmetic operations, for example:

“John bought a stereo that was marked down 15 percent from the original sales price of $600. How much did John pay for the stereo?”

Although the mathematical requirements of the Arithmetic items are not excessively demanding, the necessity of solving the problems mentally within a time limit makes this subtest quite challenging for most examinees. In addition to rudimentary arithmetic skills, successful performance on Arithmetic requires high levels of concentration and the ability to maintain intermediate calculations in short-term memory. In factor analyses of the WISC-IV and WAIS-IV, Arithmetic often loads on a third factor interpreted as Working Memory.

Comprehension

Found on all three Wechsler intelligence tests, the Comprehension subtest is an eclectic collection of items that require explanation rather than mere factual knowledge. The easy questions stress common sense, whereas the more difficult questions require an understanding of social and cultural conventions. On the WAIS-IV, several of the most difficult questions require the examinee to interpret proverbs.

An easy item on Comprehension is of the form “Why do people wear clothes?” Difficult items resemble the following:

“What does this saying mean: ‘A bird in the hand is worth two in the bush.’”
“Why are Supreme Court Judges appointed for life?”

Comprehension would appear to be, in part, a measure of “social intelligence” in that many items tap the examinee’s understanding of social and cultural conventions. Sipps, Berry, and Lynch (1987) found that Comprehension scores were moderately related to measures of social intelligence on the California Psychological Inventory. Of course, a high score signifies only that the examinee is knowledgeable about social and cultural conventions; choosing the right action may or may not flow from this knowledge. However, studies by Campbell and McCord (1996) and Lipsitz, Dworkin, and Erlenmeyer-Kimling (1993) provide no support for the commonly accepted clinical lore that Comprehension scores are sensitive to social functioning.

Similarities

In this subtest, the examinee is asked questions of the type, “In what way are shirts and socks alike?” The Similarities subtest evaluates the examinee’s ability to distinguish important from unimportant resemblances in objects, facts, and ideas. Indirectly, these questions assess the assimilation of the concept of likeness. The examinee must also possess the ability to judge when a likeness is important rather than trivial. For example, “shirts” and “socks” are alike in that both begin with the letter s, but this is not the essential similarity between these two items. The important similarity is that shirts and socks are both exemplars of a concept, namely, “clothes.” As this example illustrates, Similarities can be thought of as a test of verbal concept formation and is found on all three Wechsler intelligence tests.

Letter-Number Sequencing

The examiner orally presents a series of letters and numbers that are in random order. The examinee must reorder and repeat the list by saying the numbers in ascending order and then the letters in alphabetical order. For example, if the examiner says “R-3-B-5-Z-1-C,” the examinee should respond “1-3-5-B-C-R-Z.” This test measures attention, concentration, and freedom from distractibility. Together with Arithmetic and Digit Span, this subtest contributes to the Working Memory Index score on the WAIS-IV (see the following). Donders, Tulsky, and Zhu (2001) found the Letter-Number Sequencing subtest to be highly sensitive to the effects of moderate and severe traumatic brain injury.

Figure 5.6 Picture Completion Item Similar to Those Found on the WAIS-IV

Picture Completion

For this subtest, the examiner asks the examinee to identify the “important part” that is missing from a picture. For example, a simple item might be of this type: a picture of a table with one leg missing. The items get harder and harder; testing continues until the examinee misses several in a row. Figure 5.6 depicts an item similar to those found on the WAIS-IV. The Picture

Completion subtest presupposes that the examinee has been exposed to the object or situation represented. For this reason, Picture Completion may be inappropriate for culturally disadvantaged persons.

Picture Concepts

This subtest is found on the WPPSI-IV and the WISC-IV. For each item, the child is shown a card with two or three rows of pictures and instructed to choose one picture from each row to form a group with a common characteristic. This is a recent subtest designed to measure abstract, categorical reasoning. The 28 items reflect increasingly more difficult levels of abstraction. For example, for an easy item the commonality might be that a fruit is found in each row, whereas for a more difficult item the commonality might be that a device used for signaling (bell, flashlight, flags) is found in each row.

Block Design

On the Block Design subtest, the examinee must reproduce two-dimensional geometric designs by proper rotation and placement of three-dimensional colored blocks. For all of the Wechsler scales, the first few Block Design items can be solved through trial and error. However, the more difficult items require the analysis of spatial relations, visual-motor coordination, and the rigid application of logic. Block Design demands much more problem-solving and reasoning ability than most of the Performance subtests in which memory and prior experience are more heavily weighted.

Block Design is a strongly speeded test. Consider the WAIS-IV version, which consists of 14 designs of increasing difficulty. To obtain a high score on this subtest, adults must not only reproduce each of the designs correctly, but they must also earn bonus points on the last six designs by completing them quickly. An examinee who solves all the designs within the time limit but who fails to garner any bonus points will test out at just slightly above average on this subtest. Block Design scores may be misleading for examinees who do not value speeded performance.

Matrix Reasoning

Matrix Reasoning is included on all of the Wechsler intelligence tests. The subtest consists of figural reasoning problems arranged in increasing order of difficulty (Figure 5.7). Finding the correct answer requires the examinee to identify a recurring pattern or relationship between figural stimuli drawn along a straight line (simple items) or in a 3 × 3 grid (hard items) in which the last item is missing. Based on nonverbal reasoning about the patterns and relationships, the examinee must infer the missing stimulus and select it from five choices provided at the bottom of the card.

Matrix Reasoning was designed to be a measure of fluid intelligence, which is the capacity to perform mental operations such as manipulation of abstract symbols. The items tap pattern completion, reasoning by analogy, and serial reasoning. Overall, the subtest is an excellent measure of inductive reasoning based on figural stimuli. Matrix Reasoning is not timed. Interestingly, Donders et al. (2001) report that the Matrix Reasoning subtest is relatively unaffected by moderate and severe traumatic brain injury.

Figure 5.7 Matrix Reasoning Item Similar to Those Found on the WAIS-IV

Object Assembly

This subtest is found only on the WPPSI-III. For each item, the examinee must assemble the pieces of a jigsaw puzzle to form a common object (Figure 5.8). The examiner does not identify the items, so the examinee must first discern the identity of each item from its disarranged parts. Success on this subtest requires high levels of perceptual organization; that is, the examinee must grasp a larger pattern or gestalt based on perception of the relationships among the individual parts.

Figure 5.8 Object Assembly Item Similar to Those Found on the WPPSI-III

Object Assembly is one of the least reliable of the Wechsler subtests. The modest reliability of Object Assembly may reflect, in part, the small number of items as well as the role of chance factors in solving jigsaw puzzles.

Coding

The WISC-IV version consists of two separate and distinct parts, one for examinees under age 8 (Coding A)

and another for those 8 years of age and over (Coding B). In Coding A, the child must draw the correct symbol inside a series of randomly sequenced shapes. The task utilizes five shapes (star, circle, triangle, cross, and square), and each shape is assigned a unique symbol (vertical line, two horizontal lines, single horizontal line, circle, and two vertical lines, respectively). After a brief practice session, the child is told to draw the correct symbol inside 43 of the randomly sequenced shapes. However, since there is a two-minute time limit, high scores require rapid performance.

Coding B on the WISC-IV and Coding on the WAIS-IV are identical in format (Figure 5.9). For both subtests, the examinee must associate one symbol with each of the digits 0 through 9 and quickly draw the appropriate symbol underneath a long series of random digits. The time limit for both versions is two minutes. Very few examinees manage to code all the stimuli in this amount of time.

Estes (1974) analyzed the Coding subtest from the standpoint of learning theory and concluded that efficient performance requires the ability to quickly produce distinctive verbal codes to represent each of the symbols in memory. For example, in Figure 5.9, the examinee might code the symbol underneath the number 2 as an “inverted T.” Verbal coding mediates quick performance by simplifying a difficult task. Efficient performance also demands immediate learning of the digit-symbol pairings so that the examinee need not look from each digit to the reference table to determine the correct response. In this regard, Coding is unique: It is the only Wechsler subtest that necessitates on-the-spot learning of an unfamiliar task.

Figure 5.9 Digit Symbol Items Similar to Those Found on the WAIS-IV

Symbol Search

This is a highly speeded subtest in which the examinee looks at a target group of symbols, then quickly examines a search group of symbols, and finally marks a “YES” or “NO” box to indicate whether one or more of the symbols in the target group occurred within the search group. A Symbol Search item is depicted in Figure 5.10. This subtest would appear to be a measure of processing speed. Symbol Search is highly sensitive to the impact of traumatic brain injury (Donders et al., 2001).

Figure 5.10 Symbol Search Item Similar to Those Found on the WISC-IV
Note: The examinee’s task is to determine whether either shape at the left occurs among the five shapes to the right.

Cancellation

On the WISC-IV, this is a timed subtest in which the child is instructed to draw a line through or “cancel” drawings of animals placed randomly among drawings of inanimate objects (e.g., umbrella, car, hydrant, lightbulb). For example, on a standard-sized sheet of paper, about 160 stimuli are pictured, including 30 animals (horse, bear, seal, fish, chicken). Cancellation consists of two trials: one with a random arrangement of visual stimuli, and one with clearly structured rows and columns of stimuli. In addition to a total subtest score, separate process scores for the random and the structured trials are available for comparison. This subtest is similar to existing cancellation tasks designed to measure processing speed, vigilance, and visual attention. It is well established that examinees with neuropsychological impairments perform poorly, especially on the random trial (e.g., Bate, Mathias, & Crawford, 2001; Geldmacher, 1996). On the WAIS-IV, Cancellation is somewhat more complex, involving two target stimuli consisting of geometric shapes. The examinee is told, for example, to cancel “red squares and yellow triangles” among an array of red and yellow squares and red and yellow triangles. A second trial involves stars and circles in orange and blue. This timed task (45 seconds per trial) is much more difficult than it seems.

Coding scores show a steep decrement with advancing Visual Puzzles Visual Puzzles is found only on the
age. In cross-sectional studies, raw scores on Coding WAIS-IV. The examinee is shown a picture of a completed
decline by as much as 50 percent from age 20 to age 70 shape such as a rectangle, and asked to select from six
(Wechsler, 1981). The decrement is approximately linear smaller shapes the three that could be used to assemble the
and not easily explained by superficial references to moti- larger completed shape. Successful performance requires
vational differences or motor slowing. Of course, cross-sec- visual-spatial analysis and the mental rotation of shapes.
tional results are not necessarily synonymous with According to the WAIS-IV Technical Manual, this subtest
longitudinal trends. However, the age decrement on Cod- taps for “visual perception, broad visual intelligence, fluid
ing is so steep that it must indicate, in part, a real age intelligence, simultaneous processing, spatial visualization
change in the speed of basic information processing skills. and manipulation, and the ability to anticipate relation-
Coding is one of the most sensitive subtests to the effects of ships among parts” (Wechsler, 2008b, p. 14). The 26 items
organic impairment (Donders et al., 2001; Lezak, 1995). have strict time limits of 20 seconds for the initial easy

items and 30 seconds for the remaining items. Visual Puzzles is a core subtest that contributes to the
Perceptual Reasoning Index of the WAIS-IV.

Figure Weights Figure Weights is found only on the WAIS-IV. It is a supplemental subtest that
contributes to the Perceptual Reasoning Index. The examinee is shown a picture of an old-fashioned
fulcrum scale that is missing weight(s) on one side. The task is to select from six options the
response that would bring the scale into balance. This subtest is a measure of quantitative and
analogical reasoning; inductive and deductive logic are essential for success. Easy items provide a
time limit of 20 seconds; hard items allow 40 seconds.

5.2.4: Wechsler Adult Intelligence Scale-IV
The WAIS-IV is a significant revision of the WAIS-III, even though many of the previous items were
retained (Wechsler, 2008). The most significant changes include the addition of two subtests, a
simplified test structure, and an emphasis on index scores that provide a sharper demarcation of
discrete domains of cognitive functions. In addition, the WAIS-IV abandons the familiar (but
psychometrically indefensible) bifurcation of intelligence into Verbal IQ and Performance IQ,
preferring instead the fourfold breakdown discussed below. In addition to traditional approaches to
scoring the WAIS-IV subtests, the new edition also provides neuropsychologically relevant process
scores for four of the subtests. These scores are useful mainly for advanced forms of test
interpretation in the context of a comprehensive test battery. We do not discuss process scores in
this section. Because of improvements in the WAIS-IV protocol forms (e.g., prominent display of
discontinue rules), this test is somewhat easier to administer than its predecessor. Lichtenberger
and Kaufman (2009) provide an outstanding overview of the WAIS-IV in clinical practice.

The WAIS-IV comprises 15 subtests, but only 10 of the subtests, known as core subtests, are needed
to obtain the traditional IQ score and the component index scores. The other five subtests are
deemed supplemental. These are often used to provide additional clinical information; in specific
instances, supplemental subtests may be used as acceptable substitutes for core subtests.

In addition to the traditional Full Scale IQ score, normed to a mean of 100 and standard deviation
of 15, the WAIS-IV is scored for four index scores, each based on 2 or 3 of the 10 core subtests.
These are derived from factor analysis of the subtests, which revealed four domains: Verbal
Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed. The index scores are also
based on the familiar mean of 100 and standard deviation of 15.

The breakdown of subtests for the four index scores is as follows:

Verbal Comprehension Index
• Similarities
• Vocabulary
• Information

Perceptual Reasoning Index
• Block Design
• Matrix Reasoning
• Visual Puzzles

Working Memory Index
• Digit Span
• Arithmetic

Processing Speed Index
• Symbol Search
• Coding

The Verbal Comprehension Index (VCI) is similar to the outdated notion (used on the WAIS-III) of
Verbal IQ or VIQ. However, from a psychometric standpoint, VCI is a cleaner and more direct measure
of verbal comprehension than VIQ, hence it is now the preferred index. Likewise, the Perceptual
Reasoning Index (PRI) is similar to the former notion (from the WAIS-III) of Performance IQ or PIQ.
Yet, as a more refined measure of perceptual reasoning, PRI is therefore the preferred index. Put
simply, VCI and PRI fit the factor analytic data better. Long-held conventions tend to persist, but
it is time to let the outdated notions of Verbal IQ and Performance IQ fade into oblivion.

The Working Memory Index (WMI) comprises subtests sensitive to attention and immediate memory
(Digit Span and Arithmetic). A relatively low score on this index may signify that the examinee has
an attentional or memory problem, especially with orally presented materials. The Processing Speed
Index (PSI) comprises subtests that require highly speeded processing of visual information (Symbol
Search and Coding). The PSI is sensitive to a wide variety of neurological and neuropsychological
conditions (Tulsky, Zhu, & Ledbetter, 1997).

WAIS-IV Standardization The standardization of the WAIS-IV was undertaken with great care and
based on data gathered by the U.S. Bureau of the Census in 2005. The total sample of 2,200 adults
(ages 16 to 91) was carefully stratified on these variables: gender, race/ethnicity, education level,
and geographic region. Census figures from 2005 were used as the target values for the
stratification variables. For example, of persons in the 55- to 64-year-old range, the Census Bureau
found that 3.35 percent are African Americans with high school education. In like manner, 3.00
percent of the standardization participants were African Americans with high school education.
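
The stratification example can be checked with simple arithmetic. The sketch below assumes that the 55- to 64-year-old band contains 200 participants (the size reported for all but the four oldest age bands); the function name and its parameters are illustrative and do not come from any WAIS-IV material. With only 200 places in the band, a census target of 3.35 percent corresponds to 6.7 people, so the nearest whole-number cells are 6 or 7 participants. This may be why the sample figure (3.00 percent) differs slightly from the census figure.

```python
import math


def achievable_cells(census_pct: float, band_size: int):
    """Return the exact target head count and the whole-number counts
    that bracket it, each paired with its percentage of the band."""
    exact = census_pct / 100 * band_size
    lower, upper = math.floor(exact), math.ceil(exact)
    # A set removes the duplicate when the exact target is already whole.
    options = [(n, 100 * n / band_size) for n in sorted({lower, upper})]
    return exact, options


# Census target from the text: 3.35 percent; band size of 200 is an assumption.
exact, options = achievable_cells(census_pct=3.35, band_size=200)
print(f"exact target: {exact:.2f} participants")
for count, pct in options:
    print(f"{count} participants -> {pct:.2f} percent of the band")
```

Under this assumption, 6 participants yield exactly 3.00 percent of the band, which matches the reported sample value.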

The standardization sample was divided into 13 age In contrast to the strong reliabilities found for IQ and Index
bands: 16−17, 18−19, 20−24, 25−29, 30−34, 35−44, 45−54, scores, the reliabilities of the 15 individual subtests are gen-
55−64, 65−69, 70−74, 75−79, 80−84, 85−90. Except for the erally much weaker. The only subtests with stability coeffi-
four oldest age groups, each sample included 200 partici- cients in excess of .90 are Information (.90) and Vocabulary
pants carefully stratified on the demographic variables (.91). For the remaining subtests, reliability values range
noted earlier; the last four age groups included 100 par- from the low .70s to the mid .80s. The most important impli-
ticipants each. The resulting sample bears a very close cation of these weaker reliability findings is that examiners
correspondence to the U.S. Census proportions. How- should approach subtest profile analysis with extreme cau-
ever, persons suspected of even mild cognitive impair- tion. Subtest scores that appear discrepantly high (or low)
ment were excluded, so that the standardization sample for an individual examinee might be a consequence of the
likely is healthier than its census counterparts. Specifi- generally weak reliability of certain subtests rather than
cally, several exclusionary criteria were used in the indicating true cognitive strengths or weaknesses. Some
standardization sample, including: uncorrected visual or reviewers conclude that profile analysis (the identification
hearing impairment, current hospitalization, evidence of of specific cognitive strengths and weaknesses based on
drug/alcohol problems, upper extremity impairment, analysis of peaks and valleys in the subtest scores) is not
use of certain prescription drugs such as anticonvulsants, justified by the evidence.
and a variety of potentially brain-impairing conditions
(e.g., head injury, stroke, epilepsy, dementia, and mood Validity The developers of the WAIS-IV provide a
disorder). Uncooperative participants and those for number of different lines of evidence to support the valid-
whom English was a second language also were ity of this instrument (Wechsler, 2008b). Good content
excluded. In sum, the standardization sample was validity was built in from the beginning through compre-
restricted to cooperative, reasonably healthy, English- hensive literature review and consultation with experts to
speaking individuals who did not manifest significant assure that items and subtests tap the relevant range of
brain-impairing conditions. cognitive processes. Good criterion-related validity was
Although the WAIS-IV is similar to the WAIS-III and demonstrated in several studies correlating the WAIS-IV
has a substantial item overlap, the two tests do not yield with mainstream intelligence tests and other measures. For
analogous IQs. In counterbalanced studies comparing example, WAIS-IV Full Scale IQ correlates strongly with
scores of 240 adults on the two tests, WAIS-IV IQ scores are global scores on other mainstream measures: .94 with the
lower by 3 points. In sum, the WAIS-IV is a harder test than WAIS-III, .91 with the WISC-IV (for 16-year-olds in the
the WAIS-III. There is a troubling enigma here: Why does overlapping age group), and .88 with the Wechsler Indi-
the normative sample for the WAIS-IV appear to be smarter vidual Achievement Test-II. The WAIS-IV also reveals
than the normative sample for the WAIS-III? appropriate convergent and discriminant validity in the
patternings of strong and weak correlations with a wide
Reliability The reliability of the WAIS-IV is exception- variety of other instruments, including measures of atten-
ally good. Composite split-half reliabilities averaged across tion deficit disorder, executive functions, and memory. As
all age groups for the Index scores and IQ are: VCI .96, PRI a generalization, correlations are appropriately strong
.95, WMI .94, PSI .90, and Full Scale IQ .98. Further support- among similar subtests and constructs from the WAIS-IV
ing the reliability of the WAIS-IV, reliability estimates for and other tests, and appropriately weak among dissimilar
subtest scores of special groups (e.g., persons with intel- subtests and constructs.
lectual disability, probable Alzheimer’s disease, traumatic Studies with special groups also provide theory-con-
brain injury, major depression, autism) are equal to or firming results that speak to the validity of the WAIS-IV.
higher than reliability estimates found in the general popu- The multiplicity of these studies is such that we can only
lation (Wechsler, 2008b). This suggests that the WAIS-IV is provide a few examples here. Specifically, when 41 young
a reliable tool not just with the general population but also adults with diagnosed Mathematics Disorder were com-
with the special populations who are more likely to be the pared to matched controls on WAIS-IV subtests, the most
focus of assessment. substantial difference by far was found on the Arithmetic
For Full Scale IQ, the standard error of measurement is subtest, where the clinical group averaged 6.6 compared to
2.6 points for the youngest examinees (ages 16 and 17), but 8.8 for the matched controls (a subtest score of 10 is aver-
even smaller at 2.1 points for all other age groups. Consider age in the general population). This corroborates the sensi-
what this means: 95 percent of the time, an examinee’s true tivity of the instrument to the elements of one specific
Full Scale IQ will be within ±4 points (2 standard errors of learning disability. In like manner, when 22 individuals
measure) of the obtained value. In common parlance, psy- with a history of moderate or severe brain injury were
chometrists would say that WAIS-IV IQ has an 8-point band compared to matched controls, the largest difference
of error, that is, IQ scores are accurate within about ±4 points. among the four index scores was found on the Processing
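
The standard error figures quoted for Full Scale IQ can be reproduced with a short calculation. The sketch below relies on the classical test theory relation SEM = SD × sqrt(1 − reliability), which the text does not state and which is brought in here as an assumption. The obtained score of 110 is a hypothetical examinee, and the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def error_band(obtained: float, sem: float, n_sem: float = 2.0):
    """Band of n_sem standard errors on either side of an obtained score."""
    half_width = n_sem * sem
    return obtained - half_width, obtained + half_width


# SD of 15 and Full Scale IQ reliability of .98 are taken from the text.
sem = standard_error_of_measurement(sd=15.0, reliability=0.98)
print(f"SEM implied by SD = 15 and r = .98: {sem:.2f}")

# Hypothetical obtained IQ of 110, using the reported SEM of 2.1 points.
low, high = error_band(obtained=110.0, sem=2.1)
print(f"Band of 2 SEM around 110: {low:.1f} to {high:.1f}")
```

The implied SEM of about 2.1 agrees with the value reported for examinees older than 17, and two standard errors give the band of roughly ±4 points described above.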

Speed Index (mean of 80.5 versus mean of 97.6), whereas the smallest difference among the four index
scores was found on the Verbal Comprehension Index (mean of 92.1 versus mean of 100.8). These
findings are exactly what would be predicted from a wide body of research on the impact of traumatic
brain injury (e.g., Lezak, Howieson, & Loring, 2004).

The construct validity of the WAIS-IV is also supported by confirmatory factor analyses of the
subtest scores from the standardization sample, as detailed in the technical manual (Wechsler,
2008b). These complex analyses were designed to determine if the relations among observed subtest
scores support the existence of the hypothesized factors of intelligence measured by the four index
scores of VCI, PRI, WMI, and PSI. The goodness-of-fit of the four-factor hierarchical model of
intelligence (Full Scale IQ at the top, sitting above the four index scores, each sitting above two
or three constituent subtest scores) turns out to be exceptionally strong, although difficult to
summarize in visual form. A simple way to depict the strong confirmatory fit is through a 4 × 10
table that shows the correlations among the four index scores and the 10 core subtest scores
(Table 5.7). Where appropriate, these correlations are corrected for overlap between the subtest
scores and the index scores. For example, Similarities is a component of VCI, so the simple
correlation between these two variables is artificially inflated. The values shown in Table 5.7 are
corrected for this kind of overlap. The reader will notice that with only a single exception, the
subtests that compose each index score reveal their highest correlations with that index score. The
only exception is the Arithmetic subtest, which is factorially more complex than other subtests,
showing an almost identical relationship with VCI, PRI, and WMI.

Finally, the validity of the WAIS-IV is also buttressed by its strong overlap with the previous
three editions of the test, for which there is an impressive array of validity data. For a full
review of these findings the reader can consult Matarazzo (1972) and Kaufman (1990).

Table 5.7 Correlations Among WAIS-IV Subtests and Index Scores

                                  VCI   PRI   WMI   PSI
Verbal Comprehension Subtests
  Similarities                     74    57    57    42
  Vocabulary                       81    55    60    41
  Information                      63    54    56    37
Perceptual Reasoning Subtests
  Block Design                     51    67    53    45
  Matrix Reasoning                 56    59    55    46
  Visual Puzzles                   48    66    49    41
Working Memory Subtests
  Digit Span                       53    52    60    47
  Arithmetic                       63    59    60    44
Processing Speed Subtests
  Symbol Search                    38    47    43    65
  Coding                           43    48    49    65

Source: Based on data in Wechsler, D. (2008). WAIS-IV technical and interpretive manual.
San Antonio, TX: Pearson.
Note: Decimals have been omitted. Where appropriate, these correlations are corrected for overlap.
For example, because Similarities is a component of VCI, the simple uncorrected correlation between
these two variables would be artificially inflated. The values above are corrected for any
componential overlap between subtests and index scores.

5.2.5: Wechsler Intelligence Scale for Children-IV
The Wechsler Intelligence Scale for Children (WISC) was published in 1949 as a downward extension
of the original Wechsler-Bellevue. Although the test was used widely in the next two decades,
psychometricians perceived a number of flaws in the WISC: absence of nonwhites in the
standardization sample, ambiguities of scoring, inappropriate items for children (e.g., reference to
“cigars”), and absence of females and African Americans in the pictorial content of items. The
WISC-R, WISC-III, and WISC-IV corrected these flaws.

The WISC-IV consists of 15 subtests, 10 of which are designated as core subtests used in the
computation of composite scores and Full Scale IQ, and five of which are designated as supplemental:

Although the supplemental subtests are not required for the computation of Full Scale IQ and
composite scores (discussed later), careful examiners nonetheless may choose to administer them
because of the important diagnostic information they often provide. For example, the

Cancellation subtest is supplemental but affords important information about vigilance and visual
attention; hence, many examiners use it. The Arithmetic subtest also is supplemental but often
chosen by examiners because it is helpful in the assessment of auditory attention (the questions are
presented orally).

Another function of the supplementary subtests is to serve as suitable substitutes for core
subtests. In well-defined circumstances, an examiner may elect to give a supplemental subtest in
place of a core subtest. For example, when testing a child with fine motor problems (such as might
be observed in a child with cerebral palsy), an examiner would be well advised to use Cancellation
in place of Coding, and Picture Completion in place of Block Design. Both of these supplementary
tests (Cancellation and Picture Completion) are relatively unaffected by fine motor difficulties. In
contrast, the core subtests (Coding and Block Design) would be severely impacted by fine motor
difficulties and, therefore, could yield unfair assessments of cognitive functioning. Substitutions
also are allowed when a core subtest accidentally is invalidated. However, an examiner may not elect
to substitute a supplemental subtest merely because a child has performed poorly on a core subtest.

The standardization of the WISC-IV is first class, based on 100 boys and 100 girls at each year of
age from 6½ through 16½ (total N = 2,200). These cases were carefully selected and stratified on the
basis of the 2000 U.S. Census with respect to gender, race/ethnicity (white, African American,
Hispanic, and Asian), geographic region, and parent educational level. A desirable feature of the
standardization sample is that 5.7 percent of the sample consisted of children with defined
characteristics such as giftedness, learning disability, expressive language disorder, head injury,
autism, and motor impairment. The purpose of adding these children was to ensure that the normative
sample accurately represented the population of children attending school. The correspondence
between the standardization sample and the U.S. Census data on essential stratification variables
was nearly perfect (Wechsler, 2003, p. 40).

The reliability of the WISC-IV is strong and comparable to previous editions of the test. For
example, the IQ and composite scores show split-half and test-retest reliabilities in the .90s,
whereas the individual subtests possess somewhat lower reliability coefficients, ranging from .79
(Cancellation and Symbol Search) to .90 (Letter-Number Sequencing). Most reliabilities are in the
high .80s, for example, Block Design and Similarities at .86, and Vocabulary and Matrix Reasoning at
.89. Test-retest reliabilities tend to be slightly lower.

The validity of the WISC-IV rests, in part, on its overlap with the WISC-III, for which dozens of
supportive studies could be cited. We do not want to overwhelm with excessive detail, so we refer
the interested reader to Sattler (2001) for a good review of earlier studies. The WISC-IV manual
cites an impressive array of validity studies, which we summarize here. First, we discuss
correlations of WISC-IV test scores with its predecessor and with other Wechsler intelligence tests.
The preliminary findings indicate strong correlations with comparable WISC-III subtests, most in the
high .70s or low .80s. The correlation for Full Scale IQ is much higher, r = .89. Likewise,
correlations with the WPPSI-III are strong for comparable subtests, and, again, exceptionally strong
for Full Scale IQ, r = .89. A similar pattern is found with 16-year-old examinees, who can be tested
legitimately with both the WISC-IV and the WAIS-III. In a sample of 198 children tested in
counterbalanced order over a period of about three weeks, correlations were strong for comparable
subtests and exceptionally strong for composite and Full Scale IQ scores (r = .89). Overall, these
are remarkable correlations, nearly as strong as the reliabilities of the respective scales would
allow. An interesting finding is that WISC-IV IQs are an average of 2.5 points lower than WISC-III
IQs and 3 points lower than WAIS-III IQs. This is a consistent finding in the history of individual
intelligence tests; namely, newer tests almost invariably yield lower IQ scores in comparison to
older tests. We discuss this intriguing result, called the Flynn effect, in the next chapter.

Factor-analytic studies of the standardization sample provided additional evidence for the utility
of the WISC-IV in the diagnostic assessment of children. The results of numerous factor analyses,
including separate analyses for four age groups (6-7, 8-10, 11-13, 14-16), strongly confirmed a
four-factor solution that was used to define the composite scores, called Index scores, for the test
(Wechsler, 2003). The factors and the core subtests assigned to them were as follows:

Factors and the core subtests assigned to the four Index scores
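
The remark that the cross-test correlations for the WISC-IV are "nearly as strong as the reliabilities of the respective scales would allow" reflects a classical test theory result that the text does not spell out: the observed correlation between two measures is bounded by the square root of the product of their reliabilities. The sketch below is a minimal illustration under stated assumptions. The reliabilities of .95 are hypothetical (the text says only that the IQ reliabilities fall in the .90s), and the function names are illustrative.

```python
import math


def max_observable_correlation(rel_x: float, rel_y: float) -> float:
    """Upper bound on the observed correlation between two measures."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimated correlation after correcting for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# r = .89 is the Full Scale IQ correlation reported in the text;
# the two reliabilities of .95 are assumed for illustration only.
ceiling = max_observable_correlation(0.95, 0.95)
corrected = disattenuated_correlation(0.89, 0.95, 0.95)
print(f"ceiling on observed r: {ceiling:.2f}")
print(f"r = .89 corrected for attenuation: {corrected:.2f}")
```

With these assumed reliabilities the ceiling is .95, so an observed correlation of .89 sits close to the maximum that measurement error would permit.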

The four Index scores are based on the familiar mean of 100 and standard deviation of 15. Thus, the
WISC-IV provides substantial detail about the nuances of intellectual functioning: up to 15 subtest
scores, four Index scores, and the Full Scale IQ. The robust findings of the four-factor solution to
the WISC-IV provided the rationale for abandoning Wechsler’s original two-factor division of Verbal
IQ and Performance IQ. In fact, there is no longer any method on the WISC-IV to obtain a Verbal IQ
or a Performance IQ, precisely because these partitions no longer fit with the emerging consensus
about the nature of intelligence.

The WISC-IV also revealed theory-confirming correlations with a variety of cognitive, ability, and
achievement tests (Wechsler, 2003). In general, correlations with other measures were appropriately
high for similar constructs and predictably low for dissimilar constructs; these are the
prerequisites for convergent validity and discriminant validity, respectively. For example, in a
sample of 550 children aged 6-16, reading achievement subtest scores from the Wechsler Individual
Achievement Test-II correlated more strongly with Verbal Comprehension Index scores from the WISC-IV
than with the other Index scores. Likewise, in a sample of 126 children aged 6-16, the
Attention/Concentration subtest from the Children’s Memory Scale (Cohen, 1997) correlated
substantially (r = .74) with Working Memory Index scores from the WISC-IV but less robustly with the
other Index scores. These and other findings indicate general support for the convergent validity of
the WISC-IV Index scores. Discriminant validity was confirmed by the negligible relationships among
WISC-IV Index scores and measures of emotional intelligence from the BarOn Emotional Quotient
Inventory (BarOn EQI; Bar-On & Parker, 2000). For the most part, research has shown that emotional
intelligence is independent of cognitive intelligence. Thus, relationships among Index scores from
the WISC-IV and subtest scores from the BarOn EQI should bear out as insignificant. In fact, the
correlations were negligible, in the range of .06 to .20. The only exceptions were sensible ones.
For example, scores on the Adaptability subscale from the BarOn EQI correlated .34 with WISC-IV Full
Scale IQ. Certainly, it is plausible that adaptability as measured by the BarOn EQI is rooted, in
part, in a foundation of cognitive skills, as mirrored in IQ, thus illuminating the modest
correlation between these two measures.

5.2.6: Stanford-Binet Intelligence Scales: Fifth Edition
With a lineage that goes back to the Binet-Simon scale of 1905, the Stanford-Binet: Fifth Edition
(SB5) has the oldest and perhaps the most prestigious pedigree of any individual intelligence test.
In Table 5.8, we outline some important milestones in the development of the SB5 and its
predecessors. Released in 2003, the SB5 is a very new test (Roid, 2002, 2003). For this reason,
evaluation of this instrument is based, in part, on its resemblance in content and subtests to the
SB4, for which a large body of independent research literature has been amassed.

Table 5.8 Milestones in the Development of the Stanford-Binet and Predecessor Tests

The SB5 Model of Intelligence In early editions of the Stanford-Binet, the examiner obtained only
a composite IQ. Although the pattern of right and wrong answers could be analyzed qualitatively, the
earlier Stanford-Binet tests (prior to the fourth edition) did not provide a basis for quantitative
analysis of the subcomponents of the entire scale. The fourth and fifth editions corrected this
shortcoming.

The organization of the SB5 was guided by the principle that each of five factors of intelligence
can be assessed in two distinct domains, nonverbal and verbal. The five factors, derived from modern
cognitive theories such as Carroll (1993) and Baddeley (1986), are fluid reasoning, knowledge,
quantitative reasoning, visual-spatial processing, and working memory. When these five factors of
intelligence are “crossed” with the two domains (nonverbal and verbal), the result is an instrument
with 10 subtests (Figure 5.11). Thus, the SB5 provides a number of different perspectives on the
cognitive functioning of an examinee: 10 subtest scores (mean of 10, SD of 3), three IQ scores (the
familiar Full Scale IQ, Verbal IQ, and Nonverbal IQ), as well as five factor scores (Fluid
Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, and Working Memory). The IQ
and factor scores are normed to a mean of 100 and SD of 15.

Figure 5.11 Structure of the Stanford-Binet: Fifth Edition

                                                  DOMAINS
FACTORS                      Nonverbal                              Verbal
Fluid Reasoning              Nonverbal Fluid Reasoning              Verbal Fluid Reasoning
Knowledge                    Nonverbal Knowledge                    Verbal Knowledge
Quantitative Reasoning       Nonverbal Quantitative Reasoning       Verbal Quantitative Reasoning
Visual-Spatial Processing    Nonverbal Visual-Spatial Processing    Verbal Visual-Spatial Processing
Working Memory               Nonverbal Working Memory               Verbal Working Memory
                             Nonverbal IQ                           Verbal IQ
                                               FULL SCALE IQ

Routing Procedure and Tailored Testing The SB5 maintains the historical tradition of this
instrument by using a routing procedure to estimate the general cognitive ability of the examinee
before proceeding to the remainder of the test. The purpose of the routing procedure is to identify
the appropriate starting points for subsequent subtests. The routing items are both nonverbal
(object series and matrices) and verbal (vocabulary). These items also provide the Abbreviated IQ,
sometimes used for screening purposes. Roid (2002) describes the advantages of using a routing
procedure:

    This tailored approach to assessment provides greater richness of factor measurement within a
    shorter, efficient test administration. The use of modern item response theory in the design of
    SB5 allows for greater precision of measurement due to the adaption of the test to the
    functional level of the examinee in an efficient time frame.

Thus, the purpose of the routing procedure is not just to reduce the number of items administered
(and, therefore, save time), but to do so without loss of measurement precision. This is possible
because the SB5 was constructed according to the principles of item response theory (Embretson,
1996). When a test is constructed within the framework of item response theory, item difficulty
levels and other parameters are precisely calibrated during the development phase.

Special Features of the SB5 In addition to providing a more familiar partition of intelligence
into Full Scale IQ, Verbal IQ, and Nonverbal IQ, the SB5 also features a number of other
improvements over its predecessor, the SB4. The test now includes extensive high-end items, designed
to assess the highest level of gifted performance. Many of these items are updates from very early
editions of the Stanford-Binet, when the instrument was renowned for its very high ceiling. At the
other extreme, improved low-end items provide better assessment for very young children (as young as
age 2) and adults with mental retardation. In addition, the items and subtests that contribute to
the Nonverbal IQ do not require expressive language, which makes this part of the test ideal for
assessing individuals with limited English, deafness, or communication disorders. The developers of
the SB5 also screened test items for fairness based on religious as well as traditional concerns.
Expert panels examined the entire test on fairness issues related to the standard variables (gender,
race, ethnicity, and disability) and religious tradition (Christian, Jewish, Muslim, Hindu, and
Buddhist backgrounds). This is the first time in the history of intelligence testing that religious
tradition has been considered in test development. Finally, the Working Memory factor, consisting of
both verbal and nonverbal subtests, shows promise in helping to assess and understand children with
attention-deficit/hyperactivity disorder.

Standardization and Psychometric Properties of the SB5 The SB5 is suitable for children age 2
through adults age 85 and older, and the standardization sample consists of 4,800 individuals
stratified by gender, ethnicity, region, and educational level in the United States, based on the
year 2000 census. In part because item selection was determined by modern item response theory, the
reliability of subtests, indices, and IQ scores is very strong and comparable to other mainstream
individual intelligence tests. For example, the Verbal IQ, Nonverbal IQ, and Full Scale IQ each have
reliabilities in the .90s, and the individual subtests are in the range of .70 to .85 (Roid, 2002).

As is typical in the release of a new test, the manual for the SB5 (Roid, 2003) reports on numerous
affirming correlational studies (e.g., with the Wechsler scales, the SB4, the UNIT) that provide
strong support for criterion-related validity. The validity of the test as a measure of general
intelligence is also supported by its resemblance to the SB4, about which a large body of research
can be cited. For example, Lamp and Krohn (2001) studied the longitudinal predictive validity of the
SB4 in a sample of 89 Head Start children (39 African American and 50 white) from impoverished
backgrounds who ranged in age from about 4 to 6½. These children were retested several times over an
eight-year period on both the SB4 and the Metropolitan Achievement Test. The correlations between
the initial SB4 score and the subsequent achievement scores were very strong (mainly in the .50s),
and the test was equally good at predicting outcome for African American and white children. In
another study (Atkinson, Bevc, Dickens, & Blackwell, 1992), the concurrent validity of the SB4 was
tested against the Leiter International Performance Scale and the Vineland Adaptive Behavior Scales
in a sample of 24 children with
130 Chapter 5

developmental delays. The correlations were very robust The 16 composite scores are based on the familiar
(.78 and .70, respectively). These and many other studies mean of 100 and standard deviation of 15. The 10 subtests
strongly support the validity of the SB4 as a measure of are normed for a mean of 10 and standard deviation of 3.
general intelligence. As new research is reported on the The composites were designed to offer contrasting
SB5, it is likely that this recent edition also will prove to assessments such that a difference between scores may be
be highly valid and even more useful than its predecessor of diagnostic significance. For example, an examinee who
as a measure of intelligence. scored well on Attention-Reduced aptitude but poorly on
In summary, the SB5 is a very promising new test that Attention-Enhanced aptitude (in the Attentional domain)
is especially useful at both ends of the cognitive spec- presumably experiences difficulty with immediate recall,
trum—the very young or those with developmental delays, short-term memory, or focused concentration.
and very gifted persons. Based on the care with which the The DTLA-4 was standardized on 1,350 students
instrument was constructed, the test is likely to become a whose backgrounds closely matched census data for sex,
mainstay of individual intelligence testing in a wide vari- race, urban/rural residence, family income, educational
ety of settings. attainment of parents, and geographic area. The reliability
of this instrument is similar to other individual tests of
intelligence, with internal consistency coefficients gener-
5.2.7: Detroit Tests of Learning ally exceeding .80 for the subtests and .90 for the compos-
Aptitude-4 ites, and test-retest coefficients for the subtests and the
The Detroit Tests of Learning Aptitude-4 (DTLA-4; Hammill, composites in the .80s and .90s. Criterion-related validity is
1999) is a recent revision of an instrument first published in well established through correlational studies with other
1935. The test is individually administered and designed mainstream instruments such as the WISC-III, K-ABC, and
for schoolchildren from 6 through 17 years of age. The Woodcock-Johnson.
DTLA-4 consists of 10 subtests that form the basis for com-
puting 16 composites, including general intelligence, opti- Table 5.9 Brief Description of the DTLA-4 Subtests
mal level, and 14 ability areas. The subtests are largely
within the Binet-Wechsler tradition, although there are a
few surprises such as the inclusion of Story Construction, a
measure of storytelling ability (Table 5.9).
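The two score metrics reported for the DTLA-4 (composites with a mean of 100 and standard deviation of 15, subtests with a mean of 10 and standard deviation of 3) express the same relative standing on different scales. The short sketch below illustrates that equivalence through z scores. It is only an illustration of the metrics: the function names are hypothetical, and the test manual derives composites from norm tables, not from this formula.

```python
def to_z(score, mean, sd):
    """Express a score as a z score (distance from the mean in SD units)."""
    return (score - mean) / sd


def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Re-express a score from one standard-score metric on another."""
    return to_mean + to_z(score, from_mean, from_sd) * to_sd


# A subtest score of 13 (mean 10, SD 3) is one SD above the mean,
# the same relative standing as 115 on the composite metric.
subtest_on_composite_metric = rescale(13, 10, 3, 100, 15)

# A composite of 85 (mean 100, SD 15) is one SD below the mean,
# the same relative standing as 7 on the subtest metric.
composite_on_subtest_metric = rescale(85, 100, 15, 10, 3)
```

The same conversion applies to any pair of tests in this chapter that share these metrics, which is why scores from different instruments can be compared directly.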
The General Mental Ability composite is formed by
combining standard scores for all 10 subtests in the bat-
tery. The Optimal Level composite is based on the highest
four standard scores earned by the examinee and is
thought to represent how well the examinee might per-
form under optimal circumstances. Each of the remaining
14 composite scores is derived from a combination of sev-
eral subtests thought to measure a common attribute. For
example, subtests that involve knowledge of words and
their use are combined to form the Verbal Composite,
whereas subtests that do not involve reading, writing, or
speech comprise the Nonverbal Composite. Several of the
composite scores are designed to represent major con-
structs within contemporary theories of intelligence. In
addition to the General Mental Ability composite and the
Optimal Level composite, the remaining 14 DTLA-4 com-
posite scores are as follows:

Verbal                Nonverbal            (Linguistic)
Attention-enhanced    Attention-reduced    (Attentional)
Motor-enhanced        Motor-reduced        (Motoric)
Fluid                 Crystallized         (Horn & Cattell)
Simultaneous          Successive           (Das)
Associative           Cognitive            (Jensen)
Verbal                Performance          (Wechsler)

A concern with the DTLA-4 is that the conceptual
breakdown into composites is not sufficiently supported
by empirical evidence. For example, while it may be true
Theories and Individual Tests of Intelligence and Achievement 131

that the Simultaneous composite does measure the simul- numbers and is instructed to underline the two numbers in
taneous cognitive processes proposed by Das, Kirby, and each row that are identical. The numbers increase in length
Jarman (1979), there is scant empirical support to buttress from one digit to seven digits. The subtest score is based on
this claim. Another problem with this instrument is that a combination of time to completion and number correct.
there are more composites than there are subtests! Inevita- In the Planned Codes subtest, the task is to learn a code
bly, the composites will be highly intercorrelated, because depicted at the top of the page (such as A goes with X-O, B
each subtest occurs in several composites. In sum, DTLA-4 goes with O-O, C goes with X-X, D goes with O-X) and
may be a good measure of general intelligence, but the use then fill in missing codes in the remainder of the page (for
of composite scores for purposes of psychoeducational example, A _ _, C_ _, B_ _, A_ _, D_ _, etc.). In the Planned
planning requires additional empirical study. Smith (2001) Connections subtest (a variation of the Trail Making Test,
provides a thorough review of the DTLA-4. part B, Reitan & Wolfson, 1993), the child draws a pencil
line to connect randomly placed numbers and letters in
5.2.8: The Cognitive Assessment sequential order, alternating between numbers and letters
(1-A-2-B-3-C, etc). The Planning subtests involve cognitive
System-II control and self-regulation.
The Cognitive Assessment System-II (CAS-II) is an individ- The Attention Scale is a measure of the mental pro-
ually administered test of cognitive abilities designed for cesses involved in resistance to distraction and focused
children and adolescents ages 5 through 17 (Naglieri, Das, attention over time. For example, in the Expressive Attention
& Goldstein, 2012). The CAS-II was explicitly constructed subtest, a variation of the Stroop procedure (Stroop, 1935),
to embody the Planning, Attention, Simultaneous, and Suc- the child first reads a long list of color words (Blue, Yellow,
cessive (PASS) theory of intelligence discussed at the begin- Red, Green) repeated in random order, then quickly names
ning of the chapter (Das, Kirby, & Jarman, 1979; Das, blocks of color printed in these four colors. These tasks are
Naglieri, & Kirby, 1994). The Standard Battery consists of 12 preamble to the final task, the only part that is scored. In the
subtests and takes about 60 minutes to complete (Figure final section of the Expressive Attention subtest, a lengthy
5.12). A shorter version of eight subtests is available, but list of the color words (Blue, Yellow, Red, Green) is pre-
most practitioners recommend the full battery because it sented, each word printed in a competing color (e.g., the word
provides a better picture for diagnosis and intervention. Blue printed in red ink), with instructions to name the colors,
not read the words. The raw score is the ratio of the total
number correct to the time needed for completion of the last
Figure 5.12 Cognitive Assessment System-II Scales and
Subtests section. In the Number Detection subtest, the child is required
to underline specific digits in particular fonts, for example,
the task might be to detect the numbers 1, 2, and 3 among
random digits, but only when printed in bold font. In the
Receptive Attention subtest, the child first underlines letter
pairs that are physically the same (e.g., TT but not Tt) and
then underlines letter pairs that are the same name (e.g., Bb
but not Ba). The score is based on accuracy and total time.
The Simultaneous Scale is a measure of the ability to
organize information into coherent wholes. Both nonver-
bal and verbal processes are utilized to analyze and syn-
thesize spatial and verbal relationships. Nonverbal Matrices
is a variation on the familiar matrix reasoning task first
employed in the Raven Progressive Matrices (Raven, 1938)
and found in many intelligence tests. A 3 × 3 matrix of
geometric shapes is shown, with a missing shape in the
lower right-hand corner. Below the matrix are six shapes,
one of which completes the rules of progression in the
The CAS-II provides a standard score (mean of 100, SD matrix from left to right and top to bottom. Based on infer-
of 15) for each of the four process scales (Planning, Atten- ence, the task is to choose the correct shape. In the Verbal
tion, Simultaneous, and Successive), as well as a Full Scale Spatial Relations subtest, the child views six drawings, each
standard score. The 12 subtests are normed to a mean of 10 depicting a particular spatial relationship between shapes,
and SD of 3. The Planning Scale is a measure of the ability and then encounters a series of printed questions such as
to develop strategies for task completion. For example, in Show me the square to the right of the circle. The task is to
the Matched Numbers subtest, the child views rows of six choose the one drawing among six that depicts the

relationship. In the Figure Memory subtest, the child views colleagues found that subtest and process scores were the-
a two- or three-dimensional drawing for five seconds, and oretically consistent with current understandings of
then must correctly locate the original drawing embedded ADHD. Specifically, average scores on the four process
within a larger, more complex drawing. The Simultaneous scales were: Planning 89.1, Attention 92.3, Simultaneous
subtests involve the perception of stimuli as a whole, in 101.2, and Successive 101.7 (Naglieri & Paolitto, n.d.).
contrast to what is needed in successive processing. These findings fit well with the hypothesis that children
The Successive Scale involves mental processes needed with ADHD manifest problems with goal-directed plan-
to remember and complete a task in a specific order or ning and show difficulties with attention due to distracti-
sequence. In Word Series, the task is to recall in correct order bility (Barkley, 1996).
a series of two to nine words orally presented at one word An intriguing result with the CAS is that differences
per second. This task is similar to measures of digit span, between Black and White children on the Full Scale score
except words are used instead of digits. The same nine are minimal when key demographic variables such as soci-
words (one-syllable, high-frequency words such as Car, oeconomic status are controlled. Naglieri, Rojahn, Matto,
Dog, Shoe) are used. In the Sentence Repetition subtest, the and Aquilino (2005) found an estimated CAS Full Scale
child reads 20 sentences aloud, one by one. After each sen- mean score difference of 4.8 points between Black (N = 298)
tence is read, the child is asked to repeat it exactly, word for and White (N = 1,691) children, smaller than typically
word, after the sentence is withdrawn from view. Color reported with traditional IQ tests. The relationships
words are used so as to minimize meaning (e.g., The green is between CAS scores and school achievement were strongly
yellowing). The sentences are of varying lengths. The raw positive and highly similar for both groups as well. Over-
score is the number of words correctly recalled. For younger all, these results indicate that the CAS is useful for assess-
children (ages 5 to 7), the child repeats a specific three-word ment in special education. On a similar note, Naglieri and
combination (like cat-book-ball) 10 times in quick succes- Rojahn (2001) found that CAS scores classified a smaller
sion. The raw score is the total time required. In the Sentence proportion of Blacks as having intellectual disability than
Questions subtest (ages 8 to 17), the child answers questions did WISC-III scores. They argued that the problem of dis-
about orally presented sentences similar to those used in proportionate representation of Blacks in special education
Sentence Repetition (e.g., The green is yellowing. Who is yellow- classes might be mitigated if the CAS were used for this
ing?). For younger children (ages 5 to 7), Speech Rates is assessment purpose. The CAS-II is a promising test that
administered instead. This subtest requires the repetition of deserves to see wider use in assessment and research.
a one-syllable and two-syllable word combination 10 times
as quickly as possible. The raw score is the total time needed 5.2.9: Kaufman Brief Intelligence
to complete the repetitions. Correct sequencing of stimuli or
activities is essential to the Successive subtests. Test-2 (KBIT-2)
In addition to 12 subtest scores and 4 process scores, The individual intelligence tests previously discussed in
the CAS also yields a Full Scale score based on the familiar this and the preceding topic are excellent measures of
mean of 100 and SD of 15. Psychometric properties of the intellectual ability, but they are not without their draw-
test are excellent. The average internal consistency reliabil- backs. One problem is the time required to administer
ities are: Planning (.88), Attention (.88), Simultaneous (.93), them. Testing sessions with the Wechsler scales, Kaufman
Successive (.93), and Full Scale (.96). The standardization Assessment Battery for Children, and the Stanford-Binet
sample consisted of 2,200 children and adolescents, strati- easily can last one hour, and two hours is not unusual if
fied on demographic variables to closely match the U.S. the examinee is bright and highly verbal. A second disad-
population (Naglieri, Das, & Goldstein, 2012). The validity vantage to these mainstream tests is the amount of train-
of the CAS-II rests in large measure on its similarity to the ing required to administer them. Proper administration of
first edition, the CAS, which stands up well in factor ana- most individual intelligence tests is based upon the
lytic studies and yields meaningful results for special assumption that the examiner has an advanced degree in
groups. For example, using multigroup confirmatory fac- psychology or a related field and has received extensive
tor analysis, Naglieri, Taddei, and Williams (2012) found supervised experience with the instruments in question.
that the factorial structure of the CAS was highly similar in Alan Kaufman responded to the need for a brief, easily
two cross-cultural samples, one comprised of 1,174 U.S. administered screening measure of intelligence by devel-
children and the other consisting of 809 Italian children. oping the Kaufman Brief Intelligence Test (K-BIT), recently
Further, results for both samples were broadly supportive released in a second edition, the KBIT-2 (Kaufman &
of the four factors of the PASS theory embodied in the CAS. Kaufman, 2004). The KBIT-2 consists of a Verbal or Crys-
In a study of 60 children meeting the criteria for Atten- tallized scale that includes two types of items (Verbal
tion-Deficit Hyperactivity Disorder (ADHD), Naglieri and Knowledge and Riddles) and a Nonverbal or Fluid Scale

that consists of Matrices items (2 × 2 and 3 × 3 figural (2001) also found that the K-BIT overestimated WISC-III
analogies). IQs by 1.2 to 5.0 points, on average. However, their study
The KBIT-2 is normed for examinees ages 4 to 90 and also showed that, in individual cases, K-BIT scores can
can be administered in approximately 20 minutes. The test underestimate or overestimate WISC-III scores by as much
yields standard scores with means of 100 and standard as 25 points, reaffirming that the K-BIT is not appropriate
deviation of 15 for Verbal, Nonverbal, and combined for placement and diagnostic purposes. Canivez (1995)
scores. In spite of the comparability of these scoring dimen- found comparable scores between the K-BIT and the WISC-III
sions with well-known intelligence tests, the KBIT-2 for 137 elementary and middle school children and also
authors make it clear that their instrument is not intended reported very strong correlations between the two tests, espe-
as a substitute for traditional approaches (e.g., WPPSI-III, cially for overall scores (r = .87). Eisenstein and Engelhart
KABC-2, WISC-IV, or SB5). The KBIT-2 is mainly a screen- (1997) found that the K-BIT performed well in estimating
ing test useful in signaling the need for more extensive IQs in adult neuropsychology referrals, but Donders (1995)
assessment. The brevity of this test makes it a natural recommends caution when using the test with brain-
choice for research on intelligence. injured children. The reason for caution is that K-BIT scores
show a negligible relationship with length of coma; that is,
the test is not a good index of neuropsychological status in
children. In spite of these cautions about its predecessor,
the KBIT-2 is an outstanding screening measure of general
intelligence for use in research or in those situations listed
earlier in which time constraints preclude use of a longer
instrument.
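It may seem puzzling that two tests can correlate strongly and still disagree by 20 points or more for individual examinees. The sketch below works through the arithmetic under stated assumptions: both tests are taken to have a standard deviation of 15, scores are treated as roughly bivariate normal, and the correlation (.78) and mean difference (5 points) are borrowed from the Prewett (1995) figures cited above purely for illustration. The function names are hypothetical, and referred samples may not meet these assumptions.

```python
import math


def sd_of_difference(sd_x, sd_y, r):
    """Standard deviation of the difference X - Y for two correlated scores."""
    return math.sqrt(sd_x**2 + sd_y**2 - 2 * r * sd_x * sd_y)


def difference_band(mean_diff, sd_diff, z=1.96):
    """Approximate range covering about 95% of individual differences."""
    return (mean_diff - z * sd_diff, mean_diff + z * sd_diff)


# Illustrative values: SD of 15 on both tests, r = .78, mean difference of 5.
sd_diff = sd_of_difference(15, 15, 0.78)
low, high = difference_band(5, sd_diff)
print(round(sd_diff, 1), round(low, 1), round(high, 1))
```

Under these assumptions the difference scores have a standard deviation of about 10 points, so individual discrepancies ranging from roughly -15 to +25 points are unsurprising even with a correlation near .80. This is consistent with the caution that a brief screening test should not be used for placement or diagnosis.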

5.2.10: Individual Tests of Achievement
Whereas intelligence tests are designed to measure the
broad mental abilities of the individual, achievement tests
The KBIT-2 manual reports highly supportive validity are intended to appraise what a person has learned in
data from numerous correlational studies. However, the school or some other course of study. Group achievement
most compelling evidence for the validity of the instru- tests are paper-and-pencil measures given to dozens of stu-
ment is its strong resemblance to the K-BIT, for which a dents at a time. Our focus here is on individual achievement
substantial body of research has been published. For exam- tests administered one-on-one and, therefore, better suited
ple, Naugle, Chelune, and Tucker (1993) compared K-BIT for the appraisal of learning problems.
results and WAIS-R scores for 200 referrals to a neuropsy- Of course, scores on intelligence and achievement tests
chological assessment center. should bear a strong relationship to one another—brighter
The patient sample included persons with seizure dis- children likely are capable of higher achievement. In fact,
orders, head injuries, substance abuse, psychiatric distur- as we shall see, the notion that intelligence and achieve-
bance, stroke, dementia, and other neurological conditions. ment typically parallel one another is at the very heart of
The heterogeneity of the referral sample guaranteed a wide the concept of learning disability—which commonly
range of functional ability, a desirable feature in a valida- involves a discrepancy between the two. We introduce the
tion study. Although the K-BIT scores tended to be about 5 reader here to the makeup of individual achievement tests
points higher than their WAIS-R counterparts, the correla- as a backdrop to the final topic in this chapter, the assess-
tions between these two instruments were extremely high ment of learning disabilities.
and theory-confirming. Vocabulary IQ (K-BIT) and Verbal More than a dozen individually administered intelli-
IQ (WAIS-R) correlated .83; Matrices IQ (K-BIT) and Per- gence tests exist, but only a few are widely used in clinical
formance IQ (WAIS-R) correlated .77; and overall IQs from and educational assessment. A number of prominent indi-
the two instruments correlated an amazing .88. In a study vidual achievement tests are summarized in Table 5.10.
comparing the K-BIT and the WISC-III scores for 50 Owing to limitations of space, we have selected one test,
referred students, Prewett (1995) also reported strong cor- the Kaufman Test of Educational Achievement-II (KTEA-II),
relations (r = .78 for overall scores) and also discovered for more detailed presentation (Kaufman & Kaufman,
that the K-BIT scores tended to be about 5 points higher 2004b). Readers who seek further information are encour-
than their WISC-III counterparts. In a sample of 65 children aged to consult Sattler (2001) or the Mental Measurements
with reading disability, Chin, Ledesma, Cirino, and others Yearbook series.

Table 5.10 Survey of Widely Used Individual Achievement Tests

Kaufman Test of Educational Achievement-II (KTEA-II) The KTEA-II is an
untimed test of educational achievement for children ages 4½ through 25.
A brief, three-subtest version exists and extends the age range to 90+,
but for diagnostic assessment of learning difficulties
the Comprehensive Form is preferred. The core of the
KTEA-II Comprehensive Form consists of eight subtests in
four areas:

KTEA-II Comprehensive Form Core

In addition to yielding scores on each subtest, the battery
provides three composite scores (Reading, Mathematics, and
Written Language) and a Total Battery Composite.
For diagnostic purposes, a number of supplemental sub-
tests designed to evaluate reading skills (e.g., Phonological
Awareness) are also available. For older children, the test
takes about 80 minutes to administer; for younger children
about 30 minutes are needed. The KTEA-II is co-normed
with the KABC-II.
Brief examples of KTEA-II-like items are shown in
Table 5.11. These examples would be at the upper end of
the subtests, suitable for high school students. The KTEA-
II utilizes entry and exit rules for each subtest to ensure
that students only encounter items of appropriate diffi-
culty. Scoring is objective and highly reliable. Raw scores

are converted to standard scores (mean of 100, SD of 15) for
each subtest, the composite scores, and the Total Battery
Composite.
In addition to formal scoring, the KTEA-II provides a
systematic method for evaluating the qualitative nature of
subtest errors. For example, on the Spelling subtest, errors
can be classified according to whether they involve prefixes,
suffixes, vowel digraphs (such as ue in blue) and
diphthongs, consonant clusters (such as scr in unscrupulous),
r-controlled patterns (such as er in inferior), and several
other patterns.

Table 5.11 Examples of Characteristic KTEA-II Items Applicable to Older Children

Letter and Word Recognition
The examiner points to each word in turn and says, “What word is this?”
duodecagon
obstreperous
correlative
indolence
perspicacity

Reading Comprehension
The examiner says, “Do what this says.”
Utter a fallacious response to the question, “How many eyes does a cyclops have?”

Math Concepts and Applications
The examiner says, “The Missoula Muggers played 80 ball games last year. They won 16 games. What percentage of the games did they win?”

Mathematics Computation
The examiner says “Now I want you to work these problems.”
(X - 7)(X - 9) =
   5 lb  5 oz
 - 2 lb 14 oz

Written Expression
The examiner shows a picture depicting people interacting and asks the student to write a story about the picture.

Spelling
The examiner explains the rules for a traditional spelling test concluding with, “I want you to write the word on this sheet.”
“Paramour. One’s lover is called a paramour.”

Listening Comprehension
The examiner plays an audio CD track of a story. Then the examiner asks questions about the story designed to assess comprehension.

Oral Expression
The student is shown a full-color picture and then asked to tell a story about it. Due to similar format, results can be compared to Written Expression.

Kaufman and Kaufman (2004b) stress that the error
analysis provides the diagnostician with a source of information
from which instructional objectives can be developed.
For example, a weakness in vowel digraphs and
diphthongs on the Spelling subtest translates directly to
classroom objectives: practice in the spelling and reading
of these elements in isolation, progressing to spelling and
pronouncing words containing digraphs and diphthongs,
and ending in writing and reading sentences containing
words with vowel digraphs and diphthongs. The KTEA-II
manual contains many useful clinical insights with educational
ramifications.
The content validity of the KTEA-II appears to be very
strong, but this point may vary from one school system to
another. After all, individual school systems may choose to
emphasize different domains of achievement. Salvia and
Ysseldyke (1991) warn that users must be sensitive to the
correspondence of test content with the students’ curriculum.
As with any achievement test, the user should verify
that the content of the KTEA-II is appropriate within the
curricular setting. Nonetheless, Kaufman and Kaufman
(2004b) offer sufficient evidence for the validity of the test
to make a case for general adequacy.

5.2.11: Nature and Assessment of Learning Disabilities
Because individual intelligence and achievement tests are
foundational to the assessment of learning disabilities, we
close this chapter with a brief review of this topic. The learning
disability (LD) field is one of the fastest growing areas
within assessment. Paradoxically, it is also one of the most
controversial and perplexing domains of psychological
testing. Some background is needed to understand the role
of intelligence and achievement tests in the evaluation of
learning disabilities. We begin by asking a seemingly simple
question that turns out to have a complicated answer:
What is a learning disability?
Definitions of learning disability have gone through at
least three phases in the last several decades. Early views
were influenced heavily by federal legislation and relied on
a discrepancy between intelligence and achievement as the
defining characteristic. These ideas were followed by a
model that featured intra-individual weakness in one or
more core psychological processes as the essential attribute.
Most recently, responsiveness to intervention has been featured
as the prevailing quality. We turn now to a survey of
these shifting paradigms in the history of LD assessment.

The Federal Definition of Learning Disabilities For decades
the essential nature of learning disabilities
was understood in terms of a definition embedded in
federal law. In 1975, Congress passed Public Law 94-142,
the Education for All Handicapped Children Act. One of
the provisions of this act was a definition of learning disabilities
as follows:

The term “specific learning disability” means a disorder
in one or more of the basic psychological processes
involved in understanding or in using language, spoken
or written, which may manifest itself in imperfect ability
to listen, speak, read, write, spell, or to do mathematical
calculations. The term includes such conditions as
perceptual handicaps, brain injury, minimal brain dysfunction, dyslexia, and developmental aphasia. The term
The National Joint Committee on Learning Disabilities Definition After a lengthy period of
does not include children who have learning disabilities confusion and struggle over the definition of learning dis-
which are primarily the result of visual, hearing, or motor abilities, specialists and educators began to rally around a
handicaps, of mental retardation, or emotional distur- consensus view in the early 1990s. The new definition was
bance, or of environmental, cultural, or economic disad-
proposed by the National Joint Committee on Learning
vantage. (USDE, 1977, p. 65083)
Disabilities (NJCLD), a group of representatives from eight
The commitment to a federally mandated definition national organizations with a special interest in learning
was reaffirmed in 1990 by passage of Public Law 101-476, disabilities. Although similar to the federal definition, the
the Individuals with Disabilities Education Act (IDEA). new approach contains important contrasts:
The federal definition embodied in IDEA also stipu-
Learning disabilities is a general term that refers to a heter-
lated an operational approach to the identification of chil-
ogeneous group of disorders manifested by significant
dren with learning disabilities. Specifically, candidates for
difficulties in the acquisition and use of listening, speak-
an LD diagnosis had to demonstrate a severe discrepancy
ing, reading, writing, reasoning, or mathematical abili-
between general ability (intelligence) and specific achieve- ties. These disorders are intrinsic to the individual,
ment in one or more of these seven areas: presumed to be due to central nervous system dysfunc-
tion, and may occur across the life span. Problems in self-
Oral expression
regulatory behaviors, social perception and social
Listening comprehension interaction may exist with learning disabilities but do not
Written expression by themselves constitute a learning disability. Although
Basic reading skill learning disabilities may occur concomitantly with other
handicapping conditions (for example, sensory impair-
Reading comprehension
ment, mental retardation [MR], serious emotional distur-
Mathematics calculation bance [ED]) or with extrinsic influences (such as cultural
Mathematics reasoning differences, insufficient or inappropriate instruction),
they are not the result of those conditions or influences.
The discrepancy model for the identification of LD
(NJCLD, 1988, p. 1)
children functioned as a directive for school psycholo-
gists. In effect, the model mandated that psychologists should administer an individual intelligence test (general ability measure) and an individual achievement test (specific achievement measure) and then look for a discrepancy between Full Scale IQ and one or more areas of school achievement (e.g., reading, mathematics, written expression).

In practical terms, a severe discrepancy was defined as a difference of one standard deviation or more between general intelligence and specific achievement. A common practice in identification of LD children was to compare Full Scale IQ on an individual intelligence test such as the WISC-III with specific achievement scores on an individual achievement test such as the WIAT (Wechsler Individual Achievement Test) or similar instrument that has subtests normed with a mean of 100 and a standard deviation of 15. A difference of 15 points or more between Full Scale IQ and specific achievement in any of the previously listed areas would then raise the suspicion of learning disability.

Unfortunately, the federal definition did not serve its intended purposes, and, increasingly, school psychologists and other professionals looked to other approaches for understanding and assessing learning disabilities in children. The fundamental problem was that many, many children who exhibit serious learning problems in school and who would benefit from services for LD simply did not meet the psychometric criteria of a severe discrepancy.

The new definition avoided vague reference to “basic psychological processes,” specifies that the disorder is intrinsic to the individual, identifies central nervous system dysfunction as the origin of LD problems, and states explicitly that learning disabilities may extend into adulthood. Perhaps most important of all, the NJCLD approach abandoned the excessive reliance upon discrepancy between ability and achievement as the hallmark of LD. Instead, the new model specified that the necessary (but not sufficient) condition of LD was that the individual (child or adult) exhibit an intraindividual weakness in one or more of the core areas of academic functioning (listening, speaking, reading, writing, reasoning, or mathematical abilities). Shaw et al. (1995) described how the NJCLD model might look in practice. In this approach, the first task is to identify one or more intraindividual weaknesses in the core areas. These are always relative to strengths in several other core areas. In other words, persons who are slow learners in all areas do not meet the criteria of LD. The second step is to trace the learning difficulties to central nervous system dysfunction, which may manifest as problems with information processing. For example, a young adult with a severe weakness in listening (as judged by her inability to learn from the traditional lecture approach to teaching) might exhibit a deficit on a test of verbal memory—confirming that an information processing problem was at the heart of her disability. The purpose of the third step (examining
psychosocial skills, physical and sensory abilities) is to specify additional problems that may need to be addressed for program-planning purposes. Finally, in the fourth step the examiner rules out non-LD explanations for the learning difficulties (since these explanations would mandate a different strategy for remediation).

The New Face of Learning Disabilities: Response to Intervention  In 2004, Congress reauthorized the Individuals with Disabilities Education Act (IDEA), which is the ongoing legislation governing special services, including the assessment of LD, in school systems that receive federal funding. IDEA 2004 changed the law about how to identify children with specific learning disabilities by moving away from the discrepancy model that had reigned supreme since the 1970s. Instead, the new law recommended response to intervention (RTI) as the preferred method for identifying children with learning disabilities. In particular, IDEA 2004 says that a school “may use a process that determines if the child responds to scientific, research-based intervention as part of the evaluation procedures . . . ” in its evaluation for LD.

RTI is a broader concept than LD and refers both to (1) methods for increasing the capacity of school systems to respond effectively to the diverse academic needs of students and (2) approaches for identifying LD children who need special education services. The RTI approach specifically deemphasizes cognitive discrepancies in the diagnostic process, focusing instead on low age-based achievement levels and failure to respond to evidence-based instructional approaches (Fletcher & Vaughn, 2009; Torgerson, 2009).

The implementation of RTI is complicated and multifaceted. The process involves multiple feedback loops and decision points. Yet, proponents of RTI view it as an improvement because it provides for early, preventive intervention in contrast to the “wait to fail” approach of the discrepancy model. Fuchs and Fuchs (2005) describe a systematic approach to using RTI in a school system. The first step is school-wide screening in the first weeks of the school year to identify children “at risk” for school failure. Those scoring below a certain prescribed cut-off (perhaps the 25th percentile in reading or math) would be noted. Teachers would then implement empirically validated curricular interventions for these children, who would be monitored for progress after eight weeks. Those who do not respond would receive another interval of supplementary instruction for an additional eight weeks. Those who still do not respond would receive a comprehensive, individualized evaluation to rule out sources of underachievement such as intellectual disability, visual problems, or emotional disturbance. Finally, with the involvement of parents, the child would receive a designation of LD and become eligible for special education placement.

In sum, RTI is a shift in perspective that focuses on early results and outcomes with at-risk children instead of later spending excessive time and resources on questions of discrepancy-based eligibility after children are already failing because of their LD. The hope is that the RTI perspective will catch at-risk children earlier and thereby reduce the number of children needing special education services.

Essential Features of Learning Disabilities  Even though the definition of LD remains a point of contention, we can cite several features of these disorders that are less controversial. As the reader will discover, the features discussed in the following dictate, to some extent, the nature of testing practices in the assessment of learning disabilities. There is general agreement—with occasional dissenting votes—on five features of learning disabilities.

First, a learning disability involves an intraindividual discrepancy in cognitive functioning. The child (or adult) with LD reveals a relative weakness in one area compared to strengths in most other areas. According to the federal definition followed within many school systems, the discrepancy is between general ability (intelligence) and specific achievement. We have described previously some of the pitfalls of this definition and prefer the NJCLD approach in which the discrepancy is not rigidly tied to a difference between IQ and achievement test scores.

Second, an exclusionary clause is included in most definitions of learning disability. If the academic difficulties are primarily caused by other disabling conditions (mental retardation, emotional disturbance, visual or hearing impairment, cultural or social disadvantage), then a diagnosis of learning disability is typically ruled out. This clause is often misinterpreted. A person can be both learning disabled and impaired in other ways (e.g., have mental retardation). The important point is that the coexisting condition must not be the primary cause of the learning difficulties.

Third, learning disabilities are heterogeneous; that is, there are many different varieties. Research on the identification of subtypes is still in its infancy, but most researchers express optimism that meaningful subgroups of persons with learning disabilities can be identified. Pending further research and refinement, only two broad categories of learning disability are recognized currently. These two types are dyslexia or verbal learning disability, and right hemisphere or nonverbal learning disability. Our coverage here is based on Forster (1994). The primary manifestation of dyslexia is an unexpected difficulty in learning to read or spell. The fundamental deficiency is thought to be a problem with phonological coding, which is the ability to automatically associate sounds with specific letter combinations. Verbal learning disability constitutes about 90 percent of all LD cases, and is much more common in boys
than girls. In contrast, right hemisphere or nonverbal learning disability manifests as poor skills in mathematics, handwriting, and, often, social cognition. The fundamental problem is thought to be a problem in spatial cognition, which is the visuospatial perception of relationships. The problem likely originates in right cerebral hemisphere dysfunction, and constitutes about 10 percent of all LD cases. Boys and girls are equally affected.

Fourth, a learning disability is a developmental phenomenon that is usually evident in early childhood that may persist into adulthood. Even though remediation efforts should be based upon optimism—so as to avoid self-fulfilling prophecies—a dose of realism is needed, too. Longitudinal studies of children with severe learning disabilities suggest that marked improvement in academic achievement is the exception, not the rule, even when these subjects receive intensive educational intervention. For example, Frauenheim and Heckerl (1983) re-tested 11 adults diagnosed as having learning disabilities in childhood. All the participants had received special help for reading; nine had graduated from high school, and two completed the 10th grade. Full Scale IQs were typically in the low 90s, with Verbal IQ below average (mean of 85) and Performance IQ above average (mean of 104). In spite of the remedial intervention, when retested as adults on exactly the same achievement test (Wide Range Achievement Test), these examinees were scarcely improved from their elementary school results. These findings are corroborated by several other follow-up studies (see Kolb & Whishaw, 1990, chap. 29, for a review). Such results indicate that specialists who work with children with learning disabilities should not become fixated solely on academic concerns. Social and emotional problems—which may be more amenable to intervention—also cry out for notice.

Fifth, individuals with learning disabilities frequently experience social and emotional difficulties that are as pervasive and consequential as the deficits in academic achievement. These problems may persist into adolescence and adulthood. In fact, the socioemotional sequelae often become the primary presenting complaint, which can complicate the testing process and obscure the diagnosis. For example, in a needs assessment study of 381 adults with learning disabilities, Hoffman, Sheldon, Minskoff, and others (1987) identified several crucial nonacademic areas meriting intervention by service providers. These adults self-endorsed several social and emotional problems with high frequency: feeling frustrated (40 percent), talking or acting before thinking (33 percent), being shy (31 percent), no self-confidence (28 percent), controlling emotions and temper (28 percent), and dating (27 percent). Many other problems were also endorsed, but by less than 25 percent of the sample. These findings indicate that learning disability assessments should incorporate measures of social and emotional functioning. Vaughn and Haager (1994) provide an excellent overview on the measurement of social skills in persons with learning disability.

Causes and Correlates of Learning Disabilities  Approximately 4 to 5 percent of all school-aged children receive a diagnosis of LD, so this is not a rare problem (Lyon, 1996). The most common form of LD is dyslexia, and boys outnumber girls by about 3:1 or 4:1 (Forster, 1994). In a minority of cases, the etiology is clear and can be attributed to a specific cause such as a known brain injury. Left hemisphere impairment is especially likely to result in verbal difficulties, whereas right hemisphere impairment may lead to problems with spatial thinking or other nonverbal skills. Thus, head injury or other neurological problems can be the proximate cause of a child receiving an LD diagnosis.

However, in the majority of cases the direct etiology of LD problems is unclear. A number of possibilities have been proposed and these may explain some but not all cases of LD. For example, pathological neurodevelopmental processes have been identified in some persons with severe dyslexia (Culbertson & Edmonds, 1996). Individuals with this disorder appear to have alterations in brain structures such as the planum temporale (the flat surface on the top of the temporal lobes) known to be important for language processing. Whereas in normal individuals the planum temporale is much larger in the left temporal lobe than in the right, persons with severe dyslexia do not show this pattern of asymmetry (tending toward symmetry instead). Moreover, researchers have identified microscopic cortical malformations called polymicrogyria (numerous small convolutions) that parallel these structural differences. Several postmortem studies of persons with severe dyslexia have revealed these deviations at the cellular level. Spreen (2001) provides an outstanding review of the possible neurological substrates of learning disabilities. Dyslexia also appears to show a significant genetic component for some persons such that the idea of familial dyslexia needs to be taken seriously. However, what must be emphasized is that for most individuals the etiology of LD (whether dyslexia or other forms) remains a mystery.

Achievement Tests in LD Assessment: A Final Word  Learning disabilities manifest primarily as academic problems; that is, a child with LD is typically unable to master skills important for school success such as reading, mathematics, or written communication. Because school-based accomplishment is at the heart of the problem, an evaluation for LD must include relevant measures of academic achievement. Furthermore, the evaluation of school achievement—one small part of an LD assessment—must be based on an individual test of achievement. Even though a group achievement test might raise the suspicion
of a learning disability, practitioners must rely on individual achievement tests for definitive assessment.

Individual achievement tests typically are administered one-on-one with the examiner sitting across from the respondent and posing structured questions and problems. Of course, any well-standardized achievement test will yield normative data about the functioning of a schoolchild. But the special virtue of individual achievement tests is that the examiner can observe the clinical details of deficient (or superior) performance and form hypotheses about the cognitive capacities of the examinee.

Consider the problem of poor spelling, widely observed in children and adults with verbal LD. Any good spelling achievement test will document the disability; however, little insight is gained from mere scores. What the examiner should seek to know is the qualitative nature of the problem, not just its quantitative dimensions. Individual achievement tests are invaluable in this regard. By observing the details of deficient performance, an astute examiner can form hypotheses about the origin of an achievement problem. For example, a child whose spelling is phonetically correct is at least hearing the words correctly, whereas a child with nonphonetic spelling might very well display a problem with auditory processing of speech sounds.

Chapter Quiz: Theories and Individual Test of Intelligence and Achievement
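As a brief recap of the arithmetic behind the discrepancy model discussed in this chapter, the following Python sketch flags achievement areas that fall one standard deviation (15 points on a scale with mean 100 and SD 15) or more below Full Scale IQ. The function name, the parameter names, and the example scores are hypothetical and serve only as illustration. The sketch also assumes that the discrepancy of interest is achievement falling below IQ, and it is not a diagnostic procedure.

```python
def severe_discrepancies(full_scale_iq, achievement, sd=15.0, threshold_sd=1.0):
    """Return the achievement areas whose standard score falls at least
    `threshold_sd` standard deviations below Full Scale IQ.

    Both IQ and achievement are assumed to be standard scores
    (mean 100, SD 15). Hypothetical helper for illustration only.
    """
    cutoff = sd * threshold_sd  # 15 points when SD = 15 and threshold = 1 SD
    gaps = {}
    for area, score in achievement.items():
        gap = full_scale_iq - score  # positive when achievement lags IQ
        if gap >= cutoff:
            gaps[area] = gap
    return gaps


# Hypothetical examinee: Full Scale IQ of 102 with three achievement scores.
example = severe_discrepancies(
    102, {"reading": 84, "mathematics": 98, "written expression": 90}
)
print(example)  # only reading (gap of 18 points) meets the 15-point cutoff
```

In this made-up case, written expression lags IQ by 12 points and so would not meet the criterion, which illustrates the problem noted in the chapter: a child can show a real learning problem without reaching the psychometric cutoff.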
Chapter 7
Testing Special Populations
Learning Objectives
7.1 Explain how the responses to ambiguous stimuli reveal the innermost, unconscious mental processes of the examinee
7.2 Discuss a case based on assessing the intelligence of persons with disabilities

7.1: Infant and Preschool Assessment

7.1 Review the nature and application of infant and early childhood assessment devices

The individual and group tests reviewed in previous chapters are suitable for persons with normal or near-normal capacities in speech, hearing, vision, movement, and general intellectual ability. However, not every examinee falls within the ordinary spectrum of physical and mental abilities. By reason of immature age, physical disability, language weakness, or diminished intellect, a large proportion of the population falls outside the reach of traditional tests and procedures.

Infants and very young children certainly require exceptional approaches to assessment because of their limited capacities for communication. In Module 7.1, Infant and Preschool Assessment, we review the nature and application of infant and early childhood assessment devices and then investigate a fundamental question pertaining to these tests: What is the practical utility of testing children early in life? In particular, is there any predictive validity for test results obtained from infants or toddlers? If instruments for very young examinees do not predict important outcomes later in life, then using them would appear to be pointless and perhaps even misleading. We examine this quandary in some detail. Finally, we conclude the topic with a discussion of an important application of preschool testing—screening for school readiness. In Module 7.2, Testing Persons with Disabilities, we scrutinize a variety of tests needed for the assessment of individuals with special needs. These special needs cover a wide spectrum, including language, hearing, and visual impairments. Of course, persons with developmental disabilities also require special approaches to assessment, and we provide coverage of this field as well. By one estimate, as many as 7.5 million U.S. citizens manifest intellectual disabilities, and 1 in 10 families are directly affected by this functional impairment (Grossman, Richards, Anglin, & Hutson, 2000).

7.1.1: Assessment of Infant Capacities

The infant and preschool period extends from birth to roughly 6 years of age. The changes that occur during this period obviously are profound. The infant develops basic reflexes, masters developmental milestones (grasping, crawling, sitting, standing, and so forth), learns a language, and establishes the capacity for symbolic thought. For most children, the pattern and pace of development is visibly within normal limits.

However, parents and professionals trained in the assessment of infants and preschoolers occasionally encounter children whose development seems to be slow, delayed, or even overtly impaired. These children elicit a flurry of anxious questions: How delayed is this child? What are the prospects for normal functioning in school? Will this child achieve personal independence in the adult years?

Another area of concern for many parents is the emotional development of their infants and children. Even normal children display trials and challenges that would test the saints. Visit any busy shopping mall and you will encounter scenes of hysterical, screaming children with frazzled parents attempting to cope. Listen to any honest parent with a toddler and you will hear a story or two of food smeared on walls, puppies tormented, obstinate refusal to stay in bed, or similar unpleasant actions. At what point do difficult and problematic behaviors portend a life of emotional troubles, when not promptly treated?

At the opposite extreme are those precocious children who achieve developmental milestones months or years
ahead of the normative schedule. In these cases, the proud parents have a different set of concerns: How advanced is my child? What are the strongest and weakest areas of intellectual functioning? Will this child be a gifted adult?

Infant and preschool assessment tools can help answer questions about the intellectual and emotional development of children, whether they are developmentally delayed, intellectually gifted, at-risk for emotional disorder, or within the normal spectrum. In this topic, we review the nature and application of representative infant and preschool measures. These tools include individual tests, developmental schedules, and rating scales. We begin with a description of several prominent instruments and then investigate the fundamental question of purpose or utility. What is the use of these measures? What is the meaning of a score on a developmental schedule or preschool intelligence test? To what extent do these procedures allow us to prognosticate adult functioning or, for that matter, help us to predict early school performance? These questions will be more meaningful if we first review the relevant instruments.

We divide the review into two parts: infant measures for children from birth to age 2½, and preschool tests for children from age 2½ to age 6. The division is somewhat arbitrary, but not entirely so. Infant tests tend to be multidimensional and to load significantly on sensory and motor development. Beginning at age 2½, standardized measures such as the Stanford-Binet: Fifth Edition, Kaufman Assessment Battery for Children-2, and Differential Ability Scales-II are typically used in the assessment of preschool children. These tests load heavily on cognitive skills such as verbal comprehension and spatial thinking. Thus, infant scales and preschool tests measure somewhat different components of intellectual ability.

Neonatal Behavioral Assessment Scale (NBAS)  The Neonatal Behavioral Assessment Scale (NBAS) is unique because of its theoretical basis, which emphasizes the need to document the contributions of the newborn to the parent–infant system. The pediatrician T. Berry Brazelton (Brazelton & Nugent, 1995) developed this instrument to identify and understand the “deviant” infant and to explore the baby’s reciprocal impact on parents:

My goal in developing the NBAS was to assess the baby’s contributions to the failures that resulted, when parents were presented with a difficult or deviant infant. If we could understand the reasons behind the infant’s deviant behavior, perhaps we could in turn lead parents to a better understanding of their role. This then could lead to a more optimal outcome.
(Brazelton & Nugent, 1995)

The NBAS is suitable for infants up to two months of age but is most commonly administered in the first week of life. The scale assesses the infant’s behavioral repertoire on 28 behavior items, each scored on a 9-point scale. Examples of the behavior items include the following:

• Response decrement to light
• Orientation to inanimate visual stimulus
• Cuddliness
• Consolability

In addition, the infant’s neurological status is evaluated on 18 reflex items, each scored on a 4-point scale. Examples include the following:

• Plantar grasp
• Babinski reflex
• Rooting reflex
• Sucking reflex

Finally, seven supplementary items can be used to summarize the qualities of responsiveness of frail, high-risk infants, including these:

• Quality of alertness
• General irritability
• Examiner’s emotional response to infant

Brazelton and Nugent (1995) do not provide an integrative scoring system; that is, there are no summary scores for the entire battery or its subcomponents. Instead, the “scoring” of the NBAS consists of a summary sheet with ratings on each specific item. In clinical work, the instrument is used to provide feedback to parents. Specifically, Brazelton recommends that health care professionals demonstrate the NBAS in order to sensitize parents to their baby’s uniqueness and to promote a positive parent–infant relationship. Hawthorne (2009) describes the clinical application of the instrument for promoting successful caregiving strategies. Regarding clinical use of the test, Fowles (1999) compared mothers who received a demonstration of the NBAS with a matched control group and showed that the intervention group subsequently rated their infants as significantly more predictable. Thus, the NBAS was found to be useful in helping mothers anticipate their infants’ responses to environmental stimuli. However, based on a comprehensive review of published studies, Britt and Myers (1994) provide a less optimistic review of the effects of the NBAS intervention, noting inconsistent findings in areas such as parent–infant interaction, infant development, temperament, and parental attitudes and satisfaction.

For research on newborn outcomes, various investigators have developed scoring systems for the NBAS, including a popular seven-cluster scoring method proposed by Lester (1984). This method provides summary scores for identified clusters (habituation, orientation, motor performance, arousal/lability, regulation, autonomic stability, and reflexes). Using a quantitative scoring approach,
researchers have linked prenatal cocaine exposure to inferior performance on the NBAS (Morrow et al., 2001; Schuler, 1999). In addition, the NBAS is also sensitive to the detrimental effects of polychlorinated biphenyls (PCBs) on babies born to women who consumed contaminated Lake Ontario fish (Stewart, Reihman, Lonky, Darvill, & Pagano, 1999). The NBAS also shows sensitivity to the impact of major depression in mothers by revealing greater arousal and less attentiveness to face/voice stimuli in their newborn babies (Hernandez-Reif, Field, Diego, & Ruddock, 2006). Further, the instrument is sensitive to changes in feeding behavior of premature infants (Medoff-Cooper & Ratcliffe, 2005). In general, these studies demonstrate the value of the NBAS in a wide variety of research endeavors with infants.

In spite of the proven utility of the NBAS as a clinical and research tool, reviewers have been somewhat skeptical about the psychometric properties of the instrument. For example, Majnemer and Mazer (1998) point to very low test–retest reliability coefficients (r = -0.15 to +0.32 for the individual items) and weak interrater agreement. One likely explanation is that in newborn infants, individual traits may fluctuate rapidly over short periods of time, which would produce an underestimate of true reliability when the NBAS is given twice over a period of days or weeks. For this reason, deviant scores from a single administration of the NBAS should not be overinterpreted.

Bayley-III  Originally released in 1969, the Bayley test is now in its third edition (Bayley, 2006). Suitable for children 1 month to 42 months of age, this instrument is an important mainstay for the evaluation of developmental delay in infants and toddlers. Known formally as the Bayley Scales of Infant and Toddler Development-III and informally as the Bayley-III, the most recent version represents a vast extension and revision of the earlier editions. For example, the first edition of the test evaluated only the cognitive and motor capacities of infants, whereas the latest edition provides for the assessment of five domains. The domains and representative capacities tested are listed here.

The Domains of Bayley-III Test

The five major clusters listed above each yield a composite score reported as a standard score (M = 100, SD = 15). Note that the Bayley-III does not yield an overall score akin to an IQ score on a traditional test. Such a score could be misleading in light of the broad range of diverse skills now assessed in the third edition of the test. Instead, the instrument seeks to yield a profile of scores useful in infant assessment and diagnosis. To this end, all scores on the instrument (including the many subscales listed above) can be reported as scaled scores (mean = 10, SD = 3) for purposes of intra-individual comparison. This yields a useful chart that helps pinpoint areas of needed intervention. For example, the child depicted in Table 7.1, a 37-month-old boy referred for assessment, appears to present with mild intellectual disability characterized by problems with expressive communication, fine motor skills, communication, functional pre-academics, and self-direction.

Table 7.1 Bayley-III Scaled Score Results for a 37-Month-Old Infant

Domain     Cog   Language   Motor     SE
Subscale   Cog   RC   EC    FM   GM   SE
Score      6     7    4     3    8    4

Adaptive Behavior
Subscale   Com   CU   FA   HL   HS   LS   SC   SD   Soc   MO
Score      4     7    4    8    7    7    5    4    6     6

Cog = Cognitive, RC = Receptive Communication, EC = Expressive Communication, FM = Fine Motor, GM = Gross Motor, SE = Social-Emotional, Com = Communication, CU = Community Use, FA = Functional Pre-Academics, HL = Home Living, HS = Health and Safety, LS = Leisure, SC = Self-Care, SD = Self-Direction, Soc = Social, MO = Motor.
NOTE: An average score in the general population is 10, and scores between 8 and 12 typically are considered normal. Scores of 4 or below, indicated in bold, are areas of potential concern.

The technical quality and excellent standardization of the Bayley-III mark this test as the psychometric pinnacle of its field. The normative sample of 1,700 children was stratified according to age and essential demographic variables, and the test developers also collected extensive data on children with high-incidence clinical diagnoses such as autism and intellectual disability. Internal consistency reliability of the five composite scores appears to be strong, with average reliability coefficients as high as .93 (Language) and .91 (Cognitive). Test–retest reliability over a short period (average of 6 days) is predictably lower, with coefficients ranging from .67 (Fine Motor) to .80 (Expressive Communication). Average stability coefficient across all ages for the major composites was .80, which is decent given that infants and toddlers are notoriously distractible.
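The intra-individual comparison illustrated by Table 7.1 can be sketched in a few lines of Python. The scores below are copied from the table, with subscale abbreviations taken from the table legend. The function names and the z-score conversion are illustrative assumptions rather than part of the Bayley-III scoring materials. The cutoff of 4 comes from the table note.

```python
SCALED_MEAN = 10  # scaled scores have a mean of 10
SCALED_SD = 3     # and a standard deviation of 3

# Scaled scores for the 37-month-old child in Table 7.1.
profile = {
    "Cog": 6, "RC": 7, "EC": 4, "FM": 3, "GM": 8, "SE": 4,
    "Com": 4, "CU": 7, "FA": 4, "HL": 8, "HS": 7,
    "LS": 7, "SC": 5, "SD": 4, "Soc": 6, "MO": 6,
}


def areas_of_concern(scores, cutoff=4):
    """List the subscales at or below the cutoff named in the table note."""
    return [name for name, score in scores.items() if score <= cutoff]


def z_score(scaled):
    """Express a scaled score as a distance from the mean in SD units."""
    return (scaled - SCALED_MEAN) / SCALED_SD


print(areas_of_concern(profile))
print(z_score(profile["EC"]))  # a scaled score of 4 lies 2 SDs below the mean
```

Run on this profile, the cutoff picks out the five areas named in the text (expressive communication, fine motor, communication, functional pre-academics, and self-direction) and also the Social-Emotional score of 4.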
Validity evidence for the Bayley-III is scant at this time, but wholly supportive. For example, confirmatory factor analysis of the subtests of the Cognitive, Language, and Motor scales supported the three-factor model across all age groups of the standardization sample, except for the youngest age group (Bayley, 2006). Concurrent validity coefficients with other instruments are strong as well. For example, the WPPSI-III Full Scale IQ scores correlated .72 to .79 with Bayley-III Cognitive composites. Correlations of the Motor and Adaptive Behavior composites with suitable instruments also were appropriately strong, on the order of .50 to .70. We agree with reviewers who assert that the Bayley-III continues to set the standard for early childhood assessment, and will maintain its status as the most frequently used measure of infant and toddler development (Albers & Grieve, 2007).

Devereux Early Childhood Assessment-Clinical Form (DECA-C)  The Devereux Early Childhood Assessment-Clinical Form (DECA-C) is a refreshing addition to the assessment field. The scale is designed for the assessment of preschoolers aged 2:0 through 5:11 with social and emotional troubles or significant behavioral concerns (LeBuffe & Naglieri, 1999a,b, 2003). What makes the instrument unique is the noteworthy focus on protective factors that can buffer the impact of social, emotional, or behavior difficulties. DECA-C consists of three protective factor scales (Initiative, Self-control, and Attachment), as well as four problem scales (Attention Problems, Aggression, Withdrawal/Depression, and Emotional Control Problems). The measure can be completed by both parents and teachers. The response options for the 62 items require that the parent or teacher rate the frequency of various behaviors on a 5-point scale (never, rarely, occasionally, frequently, very frequently).

When combined, the three protective factor scale scores provide a Total Protective Factors score that indicates possible sources of resilience for the child. These scales include:

The Factor Scale of the Devereux Early Childhood Assessment-Clinical Form (DECA-C)

The DECA-C is based, in part, on resilience theory, as proposed by Werner (1990) and described by others (e.g., Masten, Best, & Garmezy, 1990). Resilience theory is a strengths-based approach that concentrates on protective factors at three levels: environmental (high-quality childcare and schools), family (nurturing parents and extended family), and within-child (adaptive personality traits). LeBuffe and Naglieri (1999b) summarize the essentials:

Children whose behavior reflects these protective factors tend to have positive outcomes despite stress and are often characterized as resilient. Children lacking or with underdeveloped protective factors are more likely to develop emotional and behavioral problems under similar risk conditions and are described as vulnerable (p. 75).

The purpose of appraising protective factors is so that interventions can build upon the child’s strengths. The focus on resilience provides a hopeful supplement to the usual, customary appraisal of problem areas.

In addition to protective factors, the DECA-C also provides a well-conceived analysis of behavioral concerns. When combined, the four problem scales yield a Total Behavioral Concerns score that indicates the vulnerability of the child to social and emotional difficulties. These scales include:

ATTENTION PROBLEMS: Assesses the child’s ability to focus on a task and ignore distracting environmental stimuli. Items resemble: “Loses focus on the task at hand.”

AGGRESSION: Measures aggressive or destructive acts directed at other persons or things. Items resemble: “Destroys personal property of others.”

WITHDRAWAL/DEPRESSION: Assesses self-absorption and emotional/social withdrawal. Items resemble: “Appears wrapped up in his/her own world.”

EMOTIONAL CONTROL PROBLEMS: Measures difficulties in controlling negative emotions that interfere with goal directed behavior. Items resemble: “Loses temper when things don’t go his/her way.”

Standardization of the DECA-C is exemplary, based on 1,108 preschool-aged children rated by parents or teachers. The sample approximated national data for preschoolers with respect to race, ethnicity, geographic region, and family income. Internal consistency reliability with these samples was good. For the parents, coefficient alphas for the subscales were typically in the high .70s (median .78), whereas the values for teachers were higher, typically in the high .80s (median .88). Discriminant analysis with the Total Behavior Concerns scale scores revealed a 74 percent accuracy in classifying clinical versus community cases, suggesting good criterion validity (LeBuffe & Naglieri, 1999b).

Several recent studies support the validity and utility of the DECA-C. Ogg et al. (2010) conducted a confirmatory factor analysis of scores for 1,344 children on the protective factors scales, and determined that the factor structure
proposed by the original authors was adequate, with minor modifications in wording. Specifically, a few items revealed differential item functioning for boys versus girls, suggesting that minor adjustments to item wordings would strengthen their respective subscales. Jaberg, Dixon, and Weis (2009) replicated the original factor structure as well and found adequate internal consistency for the protective factors scales in a sample of 780 kindergarten children. Lien and Carlson (2009) favorably describe use of the instrument with Head Start populations.

Additional Measures of Infant Capacity  As we have learned, the assessment of infants can be vital and yet is so tricky. Infants ordinarily do not follow directions and they may not be able to verbalize what they know. Assessment is a huge challenge. Nonetheless, dozens of test developers have risen to the summons. Even a brief review of alternative instruments would be chapter-length. We refer the reader to the remarkable 400-page review provided by Berry, Bridges, and Zaslow (2004), which is available online at www.childtrends.org. This compendium provides thoughtful reviews of dozens of scales for learning, cognition, language, literacy, math, social-emotional, and Head Start outcomes.

7.1.2: Assessment of Preschool Intelligence

Preschool children exhibit wide variability in emotional maturity and responsiveness to adults. One child may warm up to the examiner and strive for optimal performance on all questions. Another child may stare mutely at the floor rather than attempt a simple block design task. For the first child, we can rest assured that the test results are an appropriate index of cognitive functioning. But for the second child, uncertainty prevails. Does the nonresponsiveness signal a lack of skill or a lack of cooperation? With preschool children, a large measure of humility is required of the examiner. Scarr (1981) has expressed this sentiment as follows:

    Whenever one measures a child’s cognitive functioning, one is also measuring cooperation, attention, persistence, ability to sit still, and social responsiveness to an assessment situation.

The special danger in preschool assessment is that the examiner may infer that a low score indicates low cognitive functioning when, in truth, the child is merely unable to sit still, attend, cooperate, and so forth. Preschool assessment needs to be approached with unusual caution to avoid negative consequences of labeling and overdiagnosis of disabling conditions.

There are several individually administered intelligence tests suitable for preschool children. The most commonly used instruments include:

• Kaufman Assessment Battery for Children-2 (KABC-2)
• Differential Ability Scales-II (DAS-II)
• Wechsler Preschool and Primary Scale of Intelligence-IV (WPPSI-IV)
• Stanford-Binet Intelligence Scales for Early Childhood, Fifth Edition (Early SB5)

The KABC-2 was described in the previous chapter. We will focus here on the Differential Ability Scales-II, the WPPSI-IV, and the Early SB5.

Differential Ability Scales-II  The Differential Ability Scales-II (DAS-II) is the latest edition of a highly respected test initially published in 1990 (Elliott, 1990, 2007). The test consists of three batteries: the Early Years Battery (lower-level) for ages 2-6 to 3-5, the Early Years Battery (upper-level) for ages 3-6 to 6-11, and the School-Age Battery for ages 7-0 to 17-11. We focus here on the battery used with preschool children aged 3-6 to 6-11.

The DAS-II includes 10 core subtests and 10 diagnostic subtests; however, rarely is a child administered all 20 subtests. The core subtests are the primary measures of cognitive abilities, whereas the diagnostic subtests provide supplementary information about school readiness and information processing. The particular combination of subtests administered depends on the child’s age, ability level, and the purposes of assessment. For preschool children age 3½ and above, a comprehensive test battery would include six core subtests and seven diagnostic subtests, which are described in Table 7.2.

The core subtests are heavily saturated with the g factor and are used to derive three core cluster scores (Verbal, Nonverbal Reasoning, and Spatial) and an overall composite score known as General Conceptual Ability (GCA). An optional cluster score known as the Special Nonverbal Composite (SNC) can be computed from four nonverbal subtests as well. In developing the DAS and its revision, Elliott (2007) steered away from concepts of intelligence and IQ, using the more neutral designation of GCA instead. Even so, most experts in the field would consider GCA to be essentially the same as IQ.

The diagnostic subtests measure early number concepts, phonological processing, short-term memory, and processing speed. These subtests and the diagnostic composites derived from them are used for clinical analysis only. The diagnostic subtests are less dependent on the g factor and therefore do not figure in the GCA or any core composites. The diagnostic subtests contribute to three diagnostic cluster scores (School Readiness, Working Memory, and Processing Speed). These subtests provide information useful in assessing learning problems and school readiness, thereby complementing the core subtests. The DAS-II is normed to standard scores (M = 100, SD = 15) for the GCA and cluster scores, whereas the individual

subtests are based on T scores (M = 50, SD = 10). The DAS-II was normed on 3,480 U.S. children, with careful stratification (2002 census data) on age, gender, race/ethnicity, parental education, and geographic region.

Table 7.2 DAS-II Subtests on the Early-Years Battery, Upper Level

Subtest | Abilities Measured | Contribution to Composite(s)
Core Subtests
Verbal Comprehension | Receptive language, understanding of oral instructions | GCA, Verbal Ability
Naming Vocabulary | Expressive language, knowledge of names and objects | GCA, Verbal Ability
Picture Similarities | Nonverbal reasoning, matching pictures with common themes | GCA, Nonverbal Reasoning Ability
Matrices | Abstract reasoning, deducing the missing pattern in a matrix | GCA, Nonverbal Reasoning Ability
Pattern Construction | Nonverbal, spatial visualization with colored blocks and squares | GCA, Spatial Ability
Copying | Design copying, fine-motor coordination, visual-spatial matching | GCA, Spatial Ability
Diagnostic Subtests
Early Number Concepts | Knowledge of numerical concepts: number, order, addition, subtraction | School Readiness
Matching Letter-Like Forms | Seeing spatial relationships, visually discriminating similar forms | School Readiness
Phonological Processing | Ability to process syllables, sounds, and phonemes, e.g., rhyming, blending | School Readiness
Recall of Sequential Order | Visualization and recall, e.g., order of body parts (belly, hair, toe, chin) | Working Memory
Recall of Digits Backward | Short-term auditory recall for sequences, mental manipulation | Working Memory
Speed of Information Processing | Rapid visual scanning and simple decision-making | Processing Speed
Rapid Naming | Naming colors and pictures as quickly as possible | Processing Speed

NOTE: GCA = General Conceptual Ability. Also, a Special Nonverbal Composite (SNC) can be computed from the four nonverbal core subtests.

The reliability of DAS-II scores is commendable for an instrument used at the preschool level. Typically, preschool children are easily distracted and plainly influenced by situational factors, which tends to lower the reliability of test scores. The DAS-II seems relatively immune to these influences. For preschoolers, GCA internal consistency reliability is reported to be .95. The cluster scores also show excellent reliability with values ranging from .89 to .95. Internal consistency reliability of the subtests is predictably lower, although still laudable, ranging from .81 to .91. As is often found in reliability studies, test–retest reliability figures were significantly lower, based on retesting of 369 children after a period ranging from 7 to 63 days. These coefficients ranged from .51 to .92, with most values in the .70s and .80s.

The validity of the DAS-II looks promising from several perspectives. First, the measure reveals very strong correlations with other tests of preschool cognitive functioning and achievement. For example, DAS-II GCA scores correlate strongly with mainstream intelligence tests, for example, r = .87 with WPPSI-III IQ, and r = .84 with WISC-IV IQ. Likewise, strong correlations are observed with major achievement tests, for example, r = .82 with WIAT-II total achievement, and r = .81 with KTEA-II total achievement. Another line of validity evidence for the DAS-II consists of test data for 12 special groups, including children with giftedness, mental retardation, reading disorder, ADHD and learning disorder, and limited English proficiency. In general, these groups reveal theory-consistent patterning of scores, for example, those with reading disorders score relatively low on the Verbal Ability cluster, those with ADHD and learning disorder score relatively low on the School Readiness cluster, those known to be gifted earn average GCA scores of 125, and so forth.

Confirmatory factor analyses reported in the technical manual leave a confusing picture as to the underlying structure of the DAS-II. The number of factors providing the best fit to the test data differs by age group, ranging from a 2-factor solution for the youngest age group (age 2-6 to 3-5) to a 7-factor solution for children ages 6-0 to 12-11, with 5- and 6-factor models for other age groups. On the other hand, the DAS-II is not predicated on any particular model of intelligence, so the pertinence of confirmatory factor analyses is questionable.

Even though the DAS-II has been available for a few years, there is almost no published research using the test. One study found the instrument valuable in the evaluation of specific learning disability (SLD). In particular, regression equations using the cluster scores were helpful in identifying children with SLD in mathematics (Hale, Fiorello, Dumont, and others, 2008). Beran (2007) reviews the test favorably, with this understatement: “The test is complex.” In fact, the summary page of the record form for hand scoring proves so difficult to follow that computer scoring is nearly mandatory. Sattler (2008) provides an especially thorough overview of the DAS-II.

Wechsler Preschool and Primary Scale of Intelligence-IV (WPPSI-IV)  The WPPSI-IV is a significant revision of its predecessor, the WPPSI-III, and continues a long tradition of excellence in the assessment of preschool and primary school children (Wechsler, 2012). The test is suitable for children ages 2½ to 7 years and 7 months, although a slightly different mix of subtests is used for younger children (ages 2-6 to 3-11) than for older children (ages 4:0 to 7:7). We discuss only the version for older children here.
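The WPPSI-IV age bands just quoted can be expressed as a small lookup. The following Python sketch is illustrative only and is not part of the WPPSI-IV materials: the names `parse_age` and `wppsi_iv_form` are hypothetical, and the band limits (2-6 to 3-11 and 4:0 to 7:7) are simply those stated in the text. Ages are read as years and months, with either separator used in this chapter.

```python
import re

# Age bands as quoted in the text, stored as (years, months) tuples.
# These constants are an assumption drawn from the passage, not from a manual.
YOUNGER_BAND = ((2, 6), (3, 11))
OLDER_BAND = ((4, 0), (7, 7))


def parse_age(text):
    """Parse an age such as '4:0' or '3-11' into a (years, months) tuple."""
    match = re.fullmatch(r"\s*(\d+)\s*[:-]\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized age format: {text!r}")
    years, months = int(match.group(1)), int(match.group(2))
    if months > 11:
        raise ValueError("months must be between 0 and 11")
    return years, months


def wppsi_iv_form(age_text):
    """Return 'younger' or 'older' for the applicable subtest mix, else None."""
    age = parse_age(age_text)
    # Tuples compare element by element, so (3, 11) < (4, 0) as intended.
    if YOUNGER_BAND[0] <= age <= YOUNGER_BAND[1]:
        return "younger"
    if OLDER_BAND[0] <= age <= OLDER_BAND[1]:
        return "older"
    return None
```

Storing ages as (years, months) tuples avoids converting to decimal years, which would blur boundaries such as 3-11 versus 4:0.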

The full battery includes up to 13 subtests, but only 6 are needed to obtain a Full Scale IQ (FSIQ), although this is rarely the solitary goal of assessment. In most situations, examiners will find it indispensable to compare and contrast the various subcomponents of general intelligence, not just to get an FSIQ. For this more useful assessment, an additional 4 subtests are needed, for a total of 10 subtests, which is the most common WPPSI-IV battery. The final 3 subtests (for a total of 13 subtests) are needed only for special ancillary index scales discussed later. We begin our discussion in reference to the standard 10 subtests normally administered.

Based on factor analytic studies, clinical considerations, and a comprehensive review of the latest research on cognitive abilities, the developers of the WPPSI-IV concluded that five Primary Index Scales, each based on two subtests, are needed to capture the complexity of cognitive abilities in older children. The structure of the WPPSI-IV is outlined in Table 7.3.

Table 7.3 Primary Index Structure of the WPPSI-IV at Ages 4:0 to 7:7

Primary Index | Subtests Used
Verbal Comprehension | Information, Similarities
Visual Spatial | Block Design, Object Assembly
Fluid Reasoning | Matrix Reasoning, Picture Concepts
Working Memory | Picture Memory, Zoo Locations
Processing Speed | Bug Search, Cancellation

NOTE: The six subtests in boldface are used in the computation of Full Scale IQ.

One desirable feature of the new edition is the use of child-friendly and developmentally appropriate stimulus materials. For example, in the new subtest Zoo Locations, one part of the working memory composite, the child views one or more animal cards placed on a large zoo layout for a predetermined time, then works with an “empty” zoo to place each card in the correct location. Another example of adapting test materials to the needs of children is the use of an ink dauber (essentially a large felt-tip pen) rather than a pencil to indicate responses on processing speed subtests. This reduces the confounding of the subtest (a measure of processing speed) with fine motor demands (a measure of motor prowess).

The WPPSI-IV is a recent revision, so there is little independent research on its psychometric properties or clinical utility. However, the similarities of this instrument with other Wechsler tests suggest that it will be a mainstay of preschool and primary school assessment. In closing, we should mention that the test allows for the computation of four Ancillary Index Scales:

Vocabulary Acquisition: 2 subtests, Receptive Vocabulary and Picture Naming.
Nonverbal: 9 subtests with minimal verbal demand, including Block Design and Matrix Reasoning.
General Ability: 8 subtests, mainly untimed, including Information, Similarities, and Matrix Reasoning.
Cognitive Proficiency: 5 subtests, including Picture Memory, Cancellation, and Animal Coding.

These index scales can be useful in special circumstances such as the assessment of deaf children (Nonverbal battery), evaluation of bright children with slower processing (General Ability battery), and assessment of mental proficiency (Cognitive Proficiency battery). The Cognitive Proficiency battery includes measures of memory and speeded visual search.

Stanford-Binet Intelligence Scales for Early Childhood  Known informally as the Early SB5, the Stanford-Binet Intelligence Scales for Early Childhood (Roid, 2005) combine the subtests from the Stanford-Binet Intelligence Scales, Fifth Edition (SB5) with a new Test Observation Checklist and a software-generated Parent Report. The subtests of the SB5 were described in the previous chapter. We focus here on the Test Observation Checklist (TOC), which summarizes essential information about child test-taking behaviors, in particular, behaviors that may have a stunning impact on test scores.

The Early SB5 was developed for children ages 2 years to 7 years and 3 months. This is precisely the age range in which a child’s true level of functioning can be radically underestimated due to behavior problems such as distractibility, low frustration tolerance, or noncompliance. For example, many preschool children simply stop responding when subtest items become difficult: they may look down, or look away, or offer a comment on an unrelated topic. Noncompliant behavior of this nature is common; in fact, occasional refusals are reported in 41 percent of young children (Aylward & Carson, 2005). But a refusal can mean many things. Perhaps the child really doesn’t know the answer; or perhaps the child knows the answer but is bored with testing, or afraid to hazard a guess, or simply distracted. The examiner will never know for sure, but there is a good chance that the true cognitive abilities of a noncompliant child will be underestimated. The purpose of the TOC is to provide a qualitative but highly structured format for describing a wide range of behaviors, including noncompliance, known to affect test performance.

The test-taking behaviors listed on the TOC are divided into two groups: (1) Characteristics and (2) Specific Behaviors. The former are general traits most likely found in many situations, whereas the latter are specific behaviors actually observed during the testing session. The focus of the TOC is behaviors that negatively impact test performance. Many of the characteristics and behaviors are rated on a continuum, whereas others are categorical.

The characteristics rated include (Aylward & Carson, 2005):

• Motor Skills: includes gross motor skills such as clumsiness and fine motor skills such as pencil dexterity.
• Activity Level: includes both excessive restlessness as well as underactivity in relation to child’s age.
• Attention/Distractibility: refers to age-inappropriate inattention, a need for redirection.
• Impulsivity: indicates the examiner saw fit to intervene, slow the child down.
• Language: includes articulation, receptive language, and expressive language.

The specific behaviors rated include (Aylward & Carson, 2005):

• Consistency in Performance: may indicate a haphazard approach to the test.
• Mood: includes specific behavioral indicators such as negative mood, tantrums, or crying.
• Frustration Tolerance: includes aggressiveness, refusal to participate.
• Change in Mental Set: includes noted tendencies toward rigidity of approach or perseveration.
• Motivation: includes disinterest or boredom and related behaviors.
• Fear of Failure: is qualitatively judged through inference and can be corroborated through parental report.
• Degree of Cooperativeness/Refusals: a crucial category because numerous refusals can lead to underestimating cognitive ability.
• Anxiety: includes excessive fearfulness, shyness, or need for parental presence.
• Need for Redirection: is noted when the child cannot stay on task and constantly needs reminders.
• Parental Behaviors: includes items such as parental reassurance, tacit approval for misbehavior, or giving verbal cues.
• Representativeness of Test Behaviors: is based on brief interview with parent(s), if present during testing.

The TOC helps the examiner identify problematic behaviors that may affect the validity of the test results. But this is not the only purpose of this instrument. In addition, the documentation of these behavior problems may prove helpful in the early detection of developmental difficulties such as learning disabilities, behavior problems, attentional difficulties, borderline cognitive function, and neuropsychological deficits (Aylward & Carson, 2005).

7.1.3: Practical Utility of Infant and Preschool Assessment

The history of child assessment has shown time and again that, in general, test scores earned in the first year or two of life show minimal predictive validity. For example, in her review of infant intelligence testing, Goodman (1990) concludes:

    If the successful prediction of adolescent and adult intelligence from early childhood scores is one of the great accomplishments of applied psychology, then the failure to predict intelligence from infancy to early childhood ranks as one of its greatest failures.

Given this dismal record of repeated failures of predictive validity, we must ask a difficult question: What is the purpose and practical utility of infant assessment? In fact, infant tests do have an important but limited role to play. We return to that issue after a review of predictive studies.

Predictive Validity of Infant and Preschool Tests  With heterogeneous samples of normal children, the general finding is that infant test scores correlate positively but unimpressively with childhood test scores (Goodman, 1990; McCall, 1979). A few studies are more optimistic in tone (e.g., Wilson, 1983), but most researchers agree with McCall’s (1976) conclusion:

    Generally speaking, there is essentially no correlation between performance during the first six months of life with IQ score after age 5; the correlations are predominantly in the 0.20s for assessments made between 7 and 18 months of life when one is predicting IQ at 5–18 years; and it is not until 19–30 months that the infant test predicts later IQ in the range of 0.40–0.55.

McCall (1979) reconfirmed his original conclusion in a later review, finding that the correlations between infant and school-age test scores do not exceed .40 until the subjects are at least 19 months of age for the initial testing.

The findings with preschool tests are somewhat more positive in tone. The correlation between preschool test results and later IQ is typically strong, significant, and meaningful. The simplest way to investigate this question is to measure the stability of IQ results in longitudinal studies. In Table 7.4, we have summarized the age-to-age stability of children’s IQ scores on the Stanford-Binet from the Fels Longitudinal Study, an early, classic follow-up investigation of children’s intellectual and emotional development (Sontag, Baker, & Nelson, 1958). The lowest correlation in this table is .43, and that is between IQ tested at age 4 and again at age 12. What stands out in the table is the robustness of the link between IQ in preschool and later childhood. The older the child at initial testing, the stronger the relationship with later IQ. In fact, the results suggest that IQ becomes reasonably stable, on average, by 8 years of age.
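The claims made about Table 7.4 can be checked directly against its entries. The Python sketch below is only an illustration: the correlations are transcribed from Table 7.4, while the names `ROWS`, `build_table`, `lowest`, and `weakest_by_initial_age` are hypothetical helpers introduced here, not anything from the original study.

```python
# Correlations transcribed from Table 7.4 (Sontag, Baker, & Nelson, 1958).
# Each row lists the retest correlations for the consecutive ages that
# follow the age at initial testing (row 3 covers retests at ages 4 to 12).
ROWS = {
    3: [.83, .72, .73, .64, .60, .63, .54, .51, .46],
    4: [.80, .85, .70, .63, .66, .55, .50, .43],
    5: [.87, .83, .79, .80, .70, .63, .62],
    6: [.83, .79, .81, .72, .67, .67],
    7: [.91, .83, .82, .76, .73],
    8: [.92, .90, .84, .83],
    9: [.90, .82, .81],
    10: [.90, .88],
    11: [.90],
}


def build_table(rows):
    """Map (initial_age, retest_age) pairs to their correlation."""
    table = {}
    for initial_age, values in rows.items():
        for offset, r in enumerate(values, start=1):
            table[(initial_age, initial_age + offset)] = r
    return table


def lowest(table):
    """Return the age pair with the smallest correlation, and that value."""
    pair = min(table, key=table.get)
    return pair, table[pair]


def weakest_by_initial_age(table):
    """For each initial testing age, find its weakest retest correlation."""
    weakest = {}
    for (initial_age, _retest_age), r in table.items():
        weakest[initial_age] = min(r, weakest.get(initial_age, 1.0))
    return weakest


TABLE = build_table(ROWS)
print(lowest(TABLE))
print(weakest_by_initial_age(TABLE))
```

With this transcription, the smallest entry is the age 4 to age 12 pair, and every correlation for an initial test at age 8 or later exceeds .80, whereas the weakest value for an initial test at age 7 is .73. That pattern matches the statement that IQ becomes reasonably stable by about 8 years of age.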

Table 7.4 Stability of IQ from 3 to 12 Years of Age

Age at
Initial                       Age at Retesting
Testing        4     5     6     7     8     9    10    11    12
3            .83   .72   .73   .64   .60   .63   .54   .51   .46
4                  .80   .85   .70   .63   .66   .55   .50   .43
5                        .87   .83   .79   .80   .70   .63   .62
6                              .83   .79   .81   .72   .67   .67
7                                    .91   .83   .82   .76   .73
8                                          .92   .90   .84   .83
9                                                .90   .82   .81
10                                                     .90   .88
11                                                           .90

Source: Adapted with permission from Sontag, L. W., Baker, C., & Nelson, V. (1958). Mental growth and personality development: A longitudinal study. Monographs of the Society for Research in Child Development, 23 (Whole No. 68). Copyright © by The Society for Research in Child Development, Inc.

Collectively, these findings confirm that infant tests generally have poor prognostic value, whereas preschool tests are moderately predictive of later intelligence. This brings us back to the question posed at the beginning of this section: What is the purpose and practical utility of infant assessment?

Practical Utility of Infant Scales  The most important and sound use of infant tests is in screening for developmental disabilities. Early detection of children at risk for mental retardation is vital because it provides for early intervention and, consequently, allows for improved outcomes later in life. Although existing infant tests are poor predictors of childhood and adult intelligence, an exception to this rule is encountered for infants who obtain very low scores on the Bayley test and other screening tests. For example, infants who score two or more standard deviations below the mean on the original Bayley (1969) and the Bayley-II (Bayley, 1993), particularly on the Mental Scale, reveal a high probability of meeting the criteria for mental retardation later in childhood (Goodman, Malizia, Durieux-Smith, MacMurray, & Bernard, 1990). There is no longitudinal research with the very recent Bayley-III (Bayley, 2005), but this test likely possesses good predictive validity for low scores as well.

With at-risk children, the correlation between infant test scores and later childhood IQ is much stronger than for samples of normal children. The most consistent finding is that a very low score on an infant test (two or more standard deviations below the mean) accurately prognosticates mental retardation in childhood. For example, studies with the Denver Developmental Screening Test-Revised (since revised and published as the Denver-II) revealed a false-positive rate of only 5 to 11 percent, meaning that infants and preschoolers identified as at risk for mental retardation rarely achieve normal range cognitive functioning in childhood (Frankenburg, 1985). Most studies with the Bayley test also conform to this pattern. For example, VanderVeer and Schweid (1974) found that 23 young children with mild, moderate, and severe mental retardation confirmed by the Bayley at ages 18 to 30 months continued to merit a diagnosis of mental retardation one to three years later. Although some of the children with moderate and severe mental retardation were functioning at a higher level (mild retardation), none of the children with initial mental retardation was normal at follow-up. In an ostensibly contradictory finding, Hack, Taylor, Drotar, and others (2005) reported that very low scores on the Bayley-II for low-birth-weight infants tested at 20 months of age did not strongly predict low scores on the K-ABC at age 8. These findings are cautionary, but not definitive, insofar as the K-ABC is not a good criterion for mental retardation.

Fagan Test of Infant Intelligence (FTII)  The infant tests discussed in this chapter could be described as traditional, in the sense that their methods are a natural outgrowth of the long sweep of individual intelligence tests reaching back to the early 1900s. But perhaps new approaches are needed with infants. Lewis has argued that traditional infant tests overlook early information processing behaviors, such as recognition memory and attentiveness to the environment, that might better predict childhood cognitive function (Lewis & Sullivan, 1985). In one study, simple visual habituation to a novel stimulus (measured by the duration of fixation) assessed at 3 months of age correlated .61 with the Bayley Mental score at 24 months of age (Lewis & Brooks-Gunn, 1981). Fagan and McGrath (1981) reported similar findings. In their study, infants first observed a picture of a baby’s face for a short period of time and were then shown the same picture alongside an unfamiliar picture (e.g., picture of a bald-headed man). The investigators kept careful track of which picture the infants looked at more. The logic of the procedure is simple: Staring mainly at the new picture signifies that an infant recognizes the old picture; that is, an infant with good recognition memory prefers to look at something new. Preference for novelty, as measured by visual fixation time on the new picture, thus becomes an index of early recognition memory. Years later, the investigators administered the Peabody Picture Vocabulary Test (PPVT) to gauge early childhood intelligence. Infant recognition memory scores and early child PPVT scores correlated .37 at 4 years of age and .57 at 7 years of age. Infant cognitive measures would appear to be promising predictors of childhood intelligence (Fagan & Haiken-Vasen, 1997).

Using the paradigm described previously, Fagan (1984) developed a new approach to infant assessment

known as the Fagan Test of Infant Intelligence (FTII). The FTII assesses visual recognition memory using a 10-trial habituation format (Fagan & Shepherd, 1986). In each trial, a photograph of a face is shown to the infant, followed by paired presentation of the original face with either (1) a photograph of a similar but new face or (2) a photograph of the original face in a different orientation. The amount of time spent looking at the new photograph is presumed to indicate the degree to which the infant has noticed that it is different from the original picture. The examiner observes the infant’s corneal reflections to determine a percent Novelty Preference, averaged across the 10 trials. The procedure shows very high interrater agreement (O’Neill, Jacobson, & Jacobson, 1994). A score of less than 53 percent for novelty preference identifies children who are at risk for later mental retardation.

Validation studies of the FTII as a predictor of childhood intelligence and as a screening tool for mental retardation are mixed in outcome. With regard to the prediction of intelligence, FTII scores obtained at 7 to 9 months of age correlated only .32 with Stanford-Binet IQ at age 3 for a sample of 200 infants (DiLalla, Thompson, Plomin, and others, 1990). In another study, overall correlations between FTII scores obtained at 7 to 9 months of age and WPPSI-R IQ at age 5 were very low, about .2, for two Norwegian samples of healthy children (Andersson, 1996). Tasbihsazan, Nettelbeck, and Kirby (2003) have identified a likely reason that FTII scores correlate weakly with later IQ, namely, the test may possess poor reliability. In particular, for healthy, not at-risk infants, the test–retest stability coefficients for percent Novelty Preference were .29 for 12 infants tested at 27 and 29 weeks, −.07 for 12 infants tested at 29 and 39 weeks, and −.17 for 13 infants tested at 39 and 52 weeks. These stability coefficients are not just low; they are indistinguishable from zero, which raises doubts as to the soundness of the FTII instrument.

The FTII may perform better as a screening test than as a general predictor of childhood intelligence. With regard to screening infants at risk for developmental disability, Fagan, Singer, Montie, and Shepherd (1986) reported very positive findings in a study of 62 infants who experienced adverse factors such as premature birth or maternal diabetes. When evaluated at 3 years of age, eight children revealed cognitive delay (IQ ≤ 70), whereas 54 were considered normal. The FTII, previously administered between 3 and 7 months of age, correctly detected 6 of the 8 children with delay (75 percent sensitivity) and suitably identified 49 of 54 normal children (91 percent specificity). However, not all FTII screening studies of at-risk infants are positive in tone. For example, McGrath, Wypij, Rappaport, Newburger, and Bellinger (2004) used FTII scores from 1 year of age to predict low IQ at age 8 in 100 at-risk infants and found poor sensitivity of 32 percent in detecting cognitive delay (IQ ≤ 85) but fair specificity of 80 percent. Yuan (2002) published Chinese norms for the FTII and found a strong concurrent validity coefficient of .72 for 73 infants tested with the Bayley-II. Further research is needed before we abandon traditional infant measures in favor of the Fagan test and similar measures.

7.1.4: Screening for School Readiness

Screening for school readiness is a controversial practice. One concern expressed by some parents is that results from screening tests might be used to delay entry into the school system, or to hold a child back a year. These are fateful decisions with the potential for long-term impact, either good or bad. Another concern is that children might be permanently labeled as slow learners or cognitively delayed. Underlying the entire controversy is the confounding complexity of definition. What is school readiness? Implicitly or explicitly, experts work from at least five different models when defining school readiness. Each model dictates a distinctive approach to assessment and intervention. Community Research Partners (2007) provide an excellent summary of the five approaches, which we paraphrase below:

In this section, we will survey a variety of screening tests, keeping in
mind the complexity of the issues involved in preschool screening.

Children with low intelligence are substantially at risk for school
failure, which explains why individual intelligence tests play an
important role in the evaluation of preschool children. But individual
intelligence tests require a substantial commitment of time (up to two
hours) and must be administered by carefully trained practitioners. For
practical reasons, then, individual intelligence tests are not suitable
as screening instruments.

The ideal screening instrument is a short test that can be administered
by teachers, school nurses, and other individuals who have received
limited training in assessment. In addition, a sensible screening test
is one that provides a cutoff score that is accurate in classifying
children as normal or at risk. In the context of screening tests, two
kinds of errors can occur. Normal children who fail the test would be
referred to as false-positive cases (because they are falsely classified
as positive for potential disability). At-risk children who pass the
test would be referred to as false-negative cases (because they are
falsely classified as negative for potential disability). The reader
must keep in mind that the purpose of screening is merely to identify
children in need of additional evaluation, which means that
false-positive cases will receive further evaluation. Hence, a
false-positive misclassification rarely leads to undesirable
consequences. However, false-negative cases typically do not receive
further evaluation, so this kind of misclassification is potentially
more serious, because a needy child is deemed to be normal. Glascoe
(1991) recommends that a useful instrument should yield a false-negative
rate of less than 20 percent (meaning that 80 percent of truly at-risk
children are flagged by the test) and an even lower false-positive rate
of less than 10 percent (meaning that 90 percent of normal children pass
the test).

Glascoe and Shapiro (2005) outline five common pitfalls of developmental
and behavioral screening in infancy and early childhood.

These pitfalls lead to two adverse outcomes: underdetection of
developmental problems and delayed discovery of disabilities. In both
cases, needy infants and children do not receive the services they need.

Qualities of a Good Preschool Screening Instrument  What are the
qualities of a good preschool screening instrument? School readiness
involves a number of broad areas, including motor, language, cognitive,
social, and emotional functioning. Success in early schooling requires
that children function at or near age-appropriate levels in all these
areas. Thus, a useful screening tool must address at least a few of
these prerequisite domains. In addition to appropriate coverage, other
qualities are needed in a suitable preschool screening tool as well. For
example, the Minnesota Interagency Developmental Screening Task Force, a
leading advocacy group in preschool screening, has published extensive
standards by which it recommends and approves screening instruments
(www.health.state.mn.us). The following list of criteria is modeled
loosely on their recommendations:
• The primary purpose is screening rather than assessment, diagnosis, or
prediction of academic success.
• Screening is provided in most or all of these areas: motor, language,
cognitive, social, and emotional functioning.
• Overall test–retest reliability coefficient is a minimum of .70,
preferably higher.
• Concurrent validity against a comprehensive assessment is a minimum of
.70, preferably higher.
• Sensitivity and specificity of “at risk” and “not at risk”
classifications, respectively, are both at least .70.
• Practicality and ease of administration are built in, with testing
time of 30 minutes or less.
• Cultural, ethnic, and linguistic sensitivity is evident, that is, the
test accurately screens children from diverse cultures.
• Minimum expertise is required for administration, that is, the test is
suitable for paraprofessionals to administer.
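
To make the error-rate language concrete, the following minimal Python sketch computes sensitivity, specificity, and the two error rates from the four cells of a screening outcome. It then checks them against the Glascoe (1991) targets and the .70 floor listed in the criteria above. The class and function names are illustrative assumptions, not part of any published scoring procedure. The counts are the Fagan et al. (1986) FTII results reported earlier in this chapter (6 of 8 delayed children detected, 49 of 54 normal children correctly passed).

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    """Counts from comparing a screening result with a later criterion evaluation."""
    true_positives: int   # at-risk children who failed the screen (correctly flagged)
    false_negatives: int  # at-risk children who passed the screen (missed)
    true_negatives: int   # normal children who passed the screen
    false_positives: int  # normal children who failed the screen (over-referred)

    @property
    def sensitivity(self) -> float:
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def specificity(self) -> float:
        return self.true_negatives / (self.true_negatives + self.false_positives)

    @property
    def false_negative_rate(self) -> float:
        return self.false_negatives / (self.true_positives + self.false_negatives)

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / (self.true_negatives + self.false_positives)


def meets_glascoe_targets(outcome: ScreeningOutcome) -> bool:
    """False-negative rate under 20 percent and false-positive rate under 10 percent."""
    return outcome.false_negative_rate < 0.20 and outcome.false_positive_rate < 0.10


def meets_task_force_floor(outcome: ScreeningOutcome) -> bool:
    """Sensitivity and specificity both at least .70."""
    return outcome.sensitivity >= 0.70 and outcome.specificity >= 0.70


# FTII counts from Fagan et al. (1986): 8 delayed children, 54 normal children.
ftii = ScreeningOutcome(true_positives=6, false_negatives=2,
                        true_negatives=49, false_positives=5)

print(f"sensitivity = {ftii.sensitivity:.2f}, specificity = {ftii.specificity:.2f}")
print("meets Glascoe targets:", meets_glascoe_targets(ftii))
print("meets .70 floor:", meets_task_force_floor(ftii))
```

In this example the same results clear the .70 floor but miss the stricter Glascoe target, because 2 of the 8 delayed children (25 percent) went undetected. The comparison illustrates how much the verdict on a screening test depends on which standard is applied.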

The Interagency Task Force further notes that social-


emotional domains embedded within current screening

instruments do not demonstrate sufficient reliability and validity to
determine if a child needs further assessment. Thus, separate
instruments may be required to determine if children are “at risk” for
school failure due to social-emotional difficulties.

Instruments for Preschool Screening  As noted by Meisels and
Atkins-Burnett (2005), dozens of instruments have been produced to
screen for developmental delays, but only a few have withstood the test
of time. In Table 7.5 we summarize a few recommended tools (Glascoe,
2005; Meisels & Atkins-Burnett, 2005). An interesting feature of these
evaluations is that nearly all of them are available in multiple
languages, including Spanish, French, Korean, Vietnamese, Laotian,
Cambodian, Hmong (the language of the ethnic group from mountainous
regions of Southeast Asia), and Tagalog (the language of the
Philippines). These tools reflect the increasing diversity of American
culture and the desire to provide adequate school-based services to
recent immigrants.

Table 7.5 A Sample of School Readiness Screening Tests

Ages and Stages Questionnaire (Brookes Publishing Company) Birth to 60
months; parent report of language, cognition, personal-social, and motor
skills; available in English, Spanish, French, and Korean; takes 10 to
20 minutes; clerical or paraprofessional tester.

Brigance Screens (Curriculum Associates) Birth to 60 months; observation
of social-emotional skills, speech-language, motor, readiness, and
general knowledge; available in English, Spanish, Laotian, Vietnamese,
Cambodian, and Tagalog; takes 15 to 20 minutes; consult online training
module before scoring.

Early Screening Inventory-Revised (Pearson Assessments) 36 to 60 months;
observation of visual motor/adaptive, language and cognition, and gross
motor skills; available in English and Spanish; takes 15 to 20 minutes;
screeners and scorers can be trained with a manual and video.

FirstSTEP Preschool Screening Tool (Pearson Assessments) 33 to 62
months; observation of cognitive, communication, and motor domains and
classifications of: within acceptable limits, caution, or at-risk;
available in English only; takes 15 to 20 minutes; screeners and scorers
can be trained with a manual and video.

Minneapolis Preschool Screening Instrument-Revised (Minneapolis Public
Schools) 36 to 60 months; 64 dichotomous items pertaining to cognitive,
language, literacy, motor, and perceptual development; available in
English, Spanish, Somali, Hmong; takes 12 to 15 minutes to administer,
2 to 5 minutes to score; easy to learn, suitable for paraprofessionals.

Parents’ Evaluation of Developmental Status (Ellsworth & Vandermeer
Press) Birth to 96 months; parental response in 10 areas such as
cognitive, expressive language, fine motor, social-emotional; available
in English, Spanish, and Vietnamese; takes 5 minutes to administer,
2 minutes to score; suitable for paraprofessionals and clinic office
staff.

We limit our discussion here to just three tests: the DIAL-4
(Developmental Indicators for the Assessment of Learning-4), the Denver
II (a revision of the Denver Developmental Screening Test-Revised), and
the HOME (Home Observation for Measurement of the Environment). The
first two tests use conventional approaches for the identification of
developmental delay, whereas the third instrument, the HOME, embodies a
radical departure from traditional procedures.

7.1.5: DIAL-4

The Developmental Indicators for the Assessment of Learning-4 is an
individually administered test designed for the quick and efficient
screening of developmental problems in preschool children ages 2:6
through 5:11, that is, 2 years 6 months through 5 years 11 months
(Mardell & Goldenberg, 2011). The test screens for difficulties in five
areas, including direct behavioral assessment of three major
developmental domains: motor, concepts, and language. Items in these
domains are administered directly to the child by the examiner. Two
additional domains (self-help and social-emotional) are appraised by
means of questionnaires filled out by a parent (or both parents jointly)
and a teacher. For children who have not yet entered kindergarten, the
teacher form is filled out by a preschool teacher. If the child has not
been to preschool, test results still are beneficial. Examples of items
within the five domains include the following:

The Five Domains of the Developmental Indicators for the Assessment of
Learning-4 (DIAL-4) Test

The DIAL-4 is available in both English and Spanish, although
standardization is now based on the combined normative sample, that is,
separate norms are not provided. The decision to develop unified norms
was carefully considered during test development, and was based on
recognized requirements of school districts that serve substantial
proportions of Spanish-speaking/bilingual children of Hispanic origin.
The large norm sample was obtained nationwide, roughly stratified by key
demographics such as race and parental education. Because children are
changing so quickly in preschool and early school years, norms are
provided at two-month intervals.

Scoring for some items is discrete and objective, whereas for other
questions the scoring criteria in the manual leave room for subjective
interpretation, which detracts from the reliability of the instrument. A
total score of direct academic relevance is obtained by summing the
first three area scores (motor, concepts, language). The test yields a
total of eight scaled scores (mean of 100, SD of 15). Table 7.6 depicts
a 4-year-old boy with language delay and problems with social
development. An interesting feature of

this case is that the teacher perceives the boy as further behind than
the parents do for both self-help and social development. This disparity
might facilitate useful discussion in planning for academic
intervention.

Table 7.6 DIAL-4 Scaled Score Results for a 4-Year-Old Boy with Language
Delay and Social-Emotional Problems

Respondent    Performance Area     Standard Score
Child         Motor                110
              Concepts              95
              Language              63
              Total                 89
Questionnaire Results
Parent        Self-Help            104
              Social-emotional      77
Teacher       Self-Help             88
              Social-emotional      65

In addition to the eight standard scores depicted here, the DIAL-4
provides a wealth of additional information such as raw scores, cutoff
scores, and percentile ranks. A key feature of the test is that for each
of the eight areas shown, the manual provides cutoff scores for
assigning the child to one of two outcome groups labeled “potential
delay” and “okay.” A finding of “potential delay” in one or more areas
is a starting point for further discussion, not a mandate for any
high-stakes decision-making. The publisher offers computer scoring and
generation of reports by means of a secure internet service known as
Q-global. This yields a printout of results and a Report to Parents,
which can be helpful in discussion of the child’s progress among
parents, caregivers, school psychologists, and teachers. A short version
of the test, cleverly called Speed Dial Screener, is available, which
cuts the testing time of about 40 minutes in half. However, the
trade-off of reducing testing time by decreasing the number of test
items (which unavoidably diminishes scale reliability) may not be a
prudent exchange.

Independent research on the DIAL-4 is scant at this time. A search of
PsycINFO for articles with DIAL-4 in the title did not yield a single
hit. Even so, the latest release is only a minor departure from its
predecessor; hence, reliability and validity evidence for the DIAL-3
buttresses the standing of the new edition.

Reliability of the DIAL-3 is fair, given that it is a brief test for
screening purposes. Internal consistency coefficients range from .66 for
Motor to .84 for Concepts, with a total scale reliability of .87.
Test–retest data are similar, which is to say, not up to the suggested
minimum reliability of .90 for tests used to make individual decisions
(Nunnally & Bernstein, 1994). Validity of the instrument has been
evaluated along the familiar lines of content, construct, and
criterion-related validity. Content validity is judged to be high
insofar as a panel of experts provided content reviews and helped
eliminate inappropriate and biased items. Criterion-related validity is
strong, as judged by correlations with similar instruments such as the
Early Screening Profiles, Differential Ability Scales, and Peabody
Picture Vocabulary Test-IV.

A recent study favorably evaluates the construct validity of the DIAL-3
through confirmatory factor analysis (Assel & Anthony, 2009). As noted,
the instrument was designed to screen for developmental delays in three
domains: motor abilities, conceptual knowledge, and language competence.
An essential feature of the test is that separate scores are reported
for each domain. These domains and the 21 subtests comprising them were
rationally preconceived by the test authors. An important question is
whether the 21 subtests “hang together” statistically in a manner that
supports the rational grouping into the three domains provided by the
test developers. In other words, do the three domains possess a latent
reality, or are they merely figments of the imaginations of the test
developers? Using test results for 1,560 children ages 3 to 6, Assel and
Anthony (2009) found an excellent fit between the three domains
traditionally reported on the DIAL-3 and three empirically derived
domains found through factor analysis, which supports the construct
validity of the test. However, these authors did note that the
Articulation subtest was a poor index of language competence, and the
Catching subtest was a poor index of motor abilities. Further, the
authors found that Name Writing, Rapid Color Naming, and Letters/Sounds
demonstrated floor effects, that is, even the easiest items on these
subtests were failed by young, low-socioeconomic-status, and minority
children. These findings indicate the need for adding simpler items on
these subtests for future revisions of the test. The DIAL-3 also comes
in a Spanish version that is separately validated on a sample of 588
Spanish-speaking Head Start children (Anthony & Assel, 2007).

It is with regard to practical utility that the DIAL-4 and its previous
editions have raised the greatest skepticism. The value of a screening
test is best judged by the extent to which it accurately identifies
children in need of further developmental assessment, and accurately
identifies children who are normal as normal. One useful statistic is
sensitivity, which is the proportion of confirmed problem cases
accurately “flagged” as problem cases by a test (i.e., children with
delay who are accurately classified as “potential delay”).
Unfortunately, brief screening tests such as the DIAL-4 do not reveal
strong sensitivity when the recommended cutoffs are used to identify
children as showing “potential delay.” For example, sensitivity of the
DIAL-4 is reported to be in the range of .73 to .82, depending on the
target group being researched (Mardell & Goldenberg,

2011). Put another way, 18 to 27 percent of at-risk children will be
missed.

Another useful statistic is specificity, which is the proportion of
normal cases accurately identified as normal. For the DIAL-4,
specificity is reported to be in the range of .82 to .86, depending on
the scales and the comparison groups used (Mardell & Goldenberg, 2011).
Stated in the converse, what these data mean is that 14 to 18 percent of
the (sizable) samples of normal children initially will be flagged as
“potential delay.” These false-positive identifications will cause
anxiety for the parents and likely trigger the need for additional
consultation and testing.

The only way to achieve higher sensitivity is to liberalize the cutoff
scores, that is, classify a larger proportion as showing “potential
delay.” But for any single test at one point in time, sensitivity and
specificity are inversely related. As one goes up, the other must go
down. There is simply no way around this psychometric reality except to
design a better, longer, and much more comprehensive test. But then the
test becomes the gold standard for the thing being evaluated, and is no
longer a screening test. In sum, increasing sensitivity inevitably will
reduce specificity (percentage of normal children correctly identified
as normal). This will cause many over-referrals (children identified as
“potential delay” who actually are normal).

Denver II  The Denver II (Frankenburg, Dodds, Archer, and others, 1990)
is an updated version of the highly popular Denver Developmental
Screening Test-Revised (Frankenburg, 1985; Frankenburg & Dodds, 1967).
The Denver test is probably the most widely known and researched
pediatric screening tool in the United States. The instrument is popular
worldwide; it has been translated into 44 different languages. Suitable
for infants and children aged 1 month to 6 years, the test consists of
125 items in four areas: personal-social, fine motor-adaptive, language,
and gross motor. The items are a mix of parent report, direct
elicitation, and observation. Each item is arranged chronologically on
the test by age of the child and marked pass/fail. Testing begins at an
age-appropriate level and continues until the child fails three items.
Total time for evaluation is 20 minutes or less.

Unlike other screening tests, the Denver II does not produce a
developmental quotient or score. Instead, results on about 30
age-appropriate items provide an outcome that can be interpreted as
normal, questionable, or abnormal in reference to age-based norms. A
category of “untestable” also is included. The standardization sample
consisted of 2,096 children, all from the state of Colorado, stratified
by age, race, and socioeconomic status. Reliability of the Denver II is
reported to be outstanding for a brief screening test. Interrater
reliability among trained raters averaged an outstanding .99.
Test–retest reliability for total score over a 7- to 10-day interval
averaged .90.

The Denver possesses excellent content validity insofar as the behaviors
tested are recognized by authorities in child development as important
markers of development. However, the test interpretation categories
(normal, questionable, abnormal) were based on clinical judgment and
therefore await additional study for validation. A few initial studies
raise significant concerns. Glascoe and Byrne (1993) evaluated 89
children in day care settings who were 7 to 70 months of age. Based on
extensive independent evaluation, 18 of these 89 children were confirmed
to have developmental delays according to federal definitions of
disabling conditions (e.g., language delays, mental retardation, and
autism). While the Denver II functioned well in correctly identifying 15
of the 18 at-risk children, the instrument performed poorly with the
normal children. In fact, 38 of the 71 normal children failed the test
and were classified as questionable or abnormal. Overall, almost four in
six children taking the test (53 of 89) would be referred for additional
assessment, and of the four, only about one (15 of 53) would have a true
disability. The researchers recommend further validation study with
recalibration and possible discarding of some test items before the test
receives widespread use. Other reviewers are even more skeptical. For
example, a blue-ribbon review panel of the Minnesota Interagency
Developmental Screening Task Force flatly concluded that the Denver II
is not suitable for developmental and social-emotional screening of
preschool children (www.health.state.mn.us).

HOME  The Home Observation for Measurement of the Environment (HOME),
popularly known as the HOME Inventory, is probably the most widely used
index of children’s environment. Based on in-home observation and an
interview with the primary caretaker, the instrument provides a measure
of children’s physical and social environments. The HOME Inventory comes
in three forms: Infant and Toddler, Early Childhood, and Middle
Childhood. The latest editions of the instrument, dated 1984, emerged
after 15 years of methodical revision and refinement (Caldwell &
Richmond, 1967; Caldwell & Bradley, 1984, 1994).

Background and Description  Prior to the development of the HOME
Inventory, the measurement of children’s environments was based largely
upon demographic data such as parental education, occupation, income,
and location of residence. Often these indices were combined into a
cumulative measure referred to as social class or socioeconomic status.
For example, Hollingshead and Redlich (1958) developed a continuum of
social class derived from residence, occupation, and education of the
head of the household. The SES score for a family whose household head
worked at a clerical job, was a high school graduate, and lived in a
middle-rank residential area would be computed as follows (Hollingshead &
Redlich, 1958):

Factor        Scale Value  ×  Factor Weight  =  Partial Score
Residence          3               6                 18
Occupation         4               9                 36
Education          4               5                 20
Index of Socioeconomic Status                     =  74

For research purposes, social scientists may categorize families into a
fivefold hierarchy of social classes (classes I through V) based on the
total score. The reader will notice that the Hollingshead and Redlich
measure was derived entirely from status indices. The unstated
assumption is that these indices reflect, indirectly, meaningful
environmental variation. Put bluntly, proponents of SES as an
environmental measure believe that, on average, children from a higher
social class will experience a richer and more nurturant environment
than children from a lower social class.

In contrast to the SES approach, the HOME Inventory was developed to
provide a direct process measure of children’s environments. The guiding
philosophy of this instrument is that direct assessment of children’s
experiences is a better index of the home environment than such indirect
measures as parental occupation and education. Although it is true that
social class, as embodied in occupation, education, and residence,
provides an oblique measure of environmental richness, the authors of
the HOME Inventory would argue that direct assessment of children’s
experiences provides a more accurate index of variations in the home
environment. Thus, assessment with the HOME involves, in part, direct
observation of children’s home environments to determine whether certain
types of crucial interactions and experiences are present or absent. For
example, during an hour-long visit, the examiner observes whether the
parent spontaneously communicates with the child at least five times,
determines whether the child has at least 10 children’s books or story
records, and assesses whether the neighborhood is esthetically pleasing
according to detailed standards, to cite just a few examples.

The purpose of the HOME Inventory is to measure the quality and quantity
of stimulation and support for cognitive, social, and emotional
development available to the child in the home. The scales and items of
the HOME were derived from a list of environmental processes identified
from existing research and theory as important for optimal childhood
development (Caldwell & Bradley, 1984). These growth-promoting processes
include basic need gratification; frequent contact with a relatively
small number of adults; a positive emotional climate that fosters trust
of self and others; appropriate, varied, and patterned sensory input;
consistency in the physical, verbal, and emotional responses of others;
a minimum of social restrictions on exploratory and motor behavior;
structure and order in the daily environment; provision and adult
interpretation of varied cultural experiences; appropriate play
materials and environment; contact with adults who value achievement;
and the cumulative programming of experiences to match the child’s
developmental level (Caldwell & Bradley, 1984). In brief, then, the
purpose of the HOME is to measure specific, designated patterns of
nurturance and stimulation available to children in the home.

In order to complete the HOME Inventory, the examiner must observe the
child and caregiver (usually the mother) interacting in the home
environment. Ratings for a few inventory items are derived from
observation of the physical environment. In addition, completion of some
items is based upon self-report of the caregiver. Items are
dichotomously scored, 1 for present, 0 for absent. For example, one item
asks whether the child is included in grocery store shopping at least
once a week. The manual for the inventory encourages a relaxed,
semistructured approach to observation and interview (Caldwell &
Bradley, 1984). Completion of the inventory takes about an hour.

The three forms of the HOME are Infant and Toddler (ages 0 to 3 years),
Early Childhood (ages 3 to 6 years), and Middle Childhood (ages 6 to 10
years). The Infant and Toddler form consists of 45 items organized into
the following six subscales:
1. Emotional and Verbal Responsivity of Parent
2. Acceptance of the Child’s Behavior
3. Organization of the Environment
4. Provision of Appropriate Play Materials
5. Parent Involvement with Child
6. Variety of Stimulation

The Early Childhood version consists of 55 items organized into eight
subscales, whereas the Middle Childhood version consists of 59 items
organized into eight subscales.

Technical Features  Relevant norms for the HOME Inventory are available
from several sources. For the Infant and Toddler version, Caldwell and
Bradley (1984) report subscale means and standard deviations for 174
families from Little Rock, Arkansas. Compared to the general population,
this sample appears to overrepresent lower-SES families. For example, 34
percent of the families were on welfare and 29 percent were
single-parent households. For the Early Childhood version,
standardization data were available from 232 families in Little Rock,
with lower-SES families similarly overrepresented. For the Middle
Childhood version, Bradley and Rock (1985) report subscale means and
standard deviations for 141 families from Little Rock. Approximately
half of these families were African American, the remainder Caucasian;
boys and girls were

sampled equally. These families were thought to be representative of all
families rearing elementary-aged children in Little Rock, Arkansas.
However, for all three versions it is clear that the standardization
samples provide only local norms. These data may be useful as points of
reference but should not be equated with a stratified, random, national
sample.

The reliability of the HOME Inventory has been demonstrated in a variety
of ways, particularly for the Infant and Toddler version, which we
discuss here. The authors note that short-term test–retest studies are
inappropriate, since a respondent is quite likely to remember a specific
answer given to a question, which would artificially inflate test–retest
correlations (Bradley & Caldwell, 1984). Methods used for the assessment
of reliability included interobserver agreement, internal consistency,
and long-range test–retest stability coefficients for 91 families from
the standardization sample. By definition, interobserver agreement for
the subscale items is reported to be 90 percent or higher, since this is
the training criterion for new raters. Internal consistency estimates
using Kuder-Richardson formula 20 ranged from .67 to .89 for all
subscales except Variety of Stimulation, which yielded a coefficient of
only .44. This rather low reliability coefficient was due to the small
number of items in the subscale (five). Test–retest data were available
from 91 families tested when their infant/toddler was 6, 12, and 24
months of age. The coefficients indicated a moderate to high degree of
stability for the subscales, with most correlations in the .50s, .60s,
and .70s. The correlation between total score for testings at 12 and 24
months of age was a highly respectable .77.

The validity of the HOME Inventory has been bolstered by research
findings that show modest correlations with SES indices. Because the
inventory was proposed as a more meaningful, sensitive index of
environment than social class, HOME scores should be significantly but
not highly related to SES indices. For the Infant and Toddler version,
HOME Inventory subscale correlations with SES are mainly in the .30s and
.40s, while the total score–SES correlation is .45 (Bradley, Rock,
Caldwell, & Brisby, 1989). HOME scores also revealed a strong
relationship with poverty status in Caucasian and minority samples
(Bradley, Corwyn, Pipes McAdoo, & Garcia Coll, 2001). Furthermore,
higher HOME scores predicted that children would exhibit fewer behavior
problems and better preschool ability in a study of 93 single African
American mothers (Jackson, Brooks-Gunn, Huang, & Glassman, 2000).

HOME scores also show strong, theory-confirming relationships with
appropriate external criteria, including language and cognitive
development, school failure, therapeutic intervention, and mental
retardation (Caldwell & Bradley, 1984). The correlations between HOME
scores and intellectual measures such as the Stanford-Binet are
particularly informative. In one study of 174 families, the total score
on the HOME at 12 months of age correlated a robust r = .58 with
Stanford-Binet IQ at 36 months of age. Factor-analytic studies of the
HOME also support the construct validity of this instrument (Bradley,
Mundfrom, Whiteside, and others, 1994). In sum, the HOME Inventory shows
promise not only in research but also as a practical adjunct to
intervention.

7.2: Testing Persons with Disabilities

7.2 Discuss a case based on assessing the intelligence of persons with
disabilities

In this topic we discuss instruments designed for exceptional and
difficult consultations, such as persons with sensory/motor impairment,
recent immigrants from non-English-speaking countries, and individuals
with significant intellectual deficiencies. According to the U.S. Census
Bureau, about 32 million Americans over the age of 5 (one in eight) have
a sensory, physical, mental, or self-care disability (www.census.gov,
2000). This estimate does not include persons living in institutions. In
these extraordinary circumstances (evaluating persons with sensory,
motor, language, or intellectual disability), specialized tests are
needed for valid assessment. However, before introducing specific
instruments, we examine a background issue: How did these instruments
arise?

7.2.1: Origins of Tests for Special Populations

Beginning in the 1950s, a renewed commitment to the needs and rights of
physically and mentally disabled persons arose in the United States
(Maloney & Ward, 1979; Patton, Payne, & Beirne-Smith, 1986). Societal
attitudes toward those with special needs shifted from outright disdain
to a more supportive stance that favored new programs and initiatives on
behalf of the disabled. Progress has been slow, but we are no longer
surprised to see bathroom facilities with wheelchair access for persons
with physical disability, large-print books for persons with visual
impairments, or closed-captioned television programs for persons with
hearing disabilities. Furthermore, the special needs of citizens with
mental retardation are increasingly served by small community care
facilities instead of massive, impersonal institutions.

In the early 1970s, the renewed concern for the needs of disabled
persons was translated into federal legislation. In 1973, Public Law
93-112 was passed, serving as a “Bill of
Rights” for individuals with disabilities. This legislation outlawed discrimination on the basis of disability. Two years later, the landmark Education for All Handicapped Children Act (Public Law 94-142) was enacted. This legislation mandated that disabled schoolchildren receive appropriate assessment and educational opportunities. In particular, psychologists were directed to assess children in all areas of possible disability—mental, behavioral, and physical—and to use instruments validated for those express purposes. We turn now to a review of tests that can be used for the assessment of persons with sensory, motor, or mental disabilities.

7.2.2: Nonlanguage Tests

Nonlanguage tests require little or no written or spoken language from examiner or examinee. Thus, they are particularly suited for assessment of non-English-speaking persons, referrals with speech impairments, and examinees with weak language skills. These instruments can also be used as supplementary tests for examinees who have no disabilities.

Leiter International Performance Scale-Revised

The Leiter International Performance Scale-Revised (LIPS-R, Roid & Miller, 1997) is a revision of a classic and highly praised test of nonverbal intelligence and cognitive abilities (Leiter, 1948, 1979). Leiter devised an experimental edition of the test in 1929 to assess the intelligence of those with hearing or speech impairment, those who were bilingual, or non-English-speaking examinees. The scale was field-tested with several ethnic groups in Hawaii, including children of Japanese and Chinese descent. The first edition was based on test results for American children, high school students, and World War II Army recruits. Although highly praised and widely used after its initial release, this test received strong criticism in recent years because of poor illustrations and outdated norms. The revised Leiter answers all criticisms handily, and the LIPS-R deserves wide use as a culture-reduced measure of nonverbal intelligence.

A remarkable feature of the Leiter is the complete elimination of verbal instructions. The Leiter-R does not require a single spoken word from the examiner or the examinee. With an age range of 2 years to 20 years and 11 months, the Leiter-R is particularly suitable for children and adolescents whose English language skills are weak. This includes children with any of these features: non-English-speaking, autism, traumatic brain injury, speech impairment, hearing problems, or an impoverished environment. The test is also useful in the assessment of attentional problems, as described in the following.

Testing is performed by the child or adolescent matching small laminated cards underneath corresponding illustrations on an easel display (Figure 7.1).

Figure 7.1 A Characteristic Item from the Leiter International Performance Scale-Revised

The test is untimed. Because the initial items are transparently obvious, most examinees catch on quickly without need of pantomime demonstration. The Leiter-R contains 20 subtests organized into two batteries: Visualization and Reasoning, and Memory and Attention. The 10 subtests of the Visualization and Reasoning Battery are described in Table 7.7.

Table 7.7 Visualization and Reasoning Subtests of the Leiter-R

1. Figure Ground: Identification of designs or figures embedded within a stimulus. (All ages)
2. Design Analogies: Like the matrix analogies subtests found on many cognitive tests. (Ages 6 to 20)
3. Form Completion: Ability to recognize objects from fragmented line drawings. (All ages)
4. Matching: Matching and discrimination of simple visual stimuli. (Ages 2 to 10)
5. Sequential Order: Logical progression of pictorial or figural items. (All ages)
6. Repeated Patterns: Identify the missing part of a repeated pattern of figural items. (All ages)
7. Picture Context: Using visual cues to identify a pictured object that has been removed. (Ages 2 to 5)
8. Classification: Categorization of objects or geometric designs. (Ages 2 to 5)
9. Paper Folding: Ability to mentally “fold” an item shown in unfolded two-dimensional form. (Ages 6 to 20)
10. Figure Rotation: Capacity to mentally rotate a two- or three-dimensional object. (Ages 11 to 20)

Not all subtests are administered to every child. For example, the figure rotation subtest is too difficult for 2-year-olds and the immediate recognition subtest is too easy for adolescent examinees. The four Reasoning subtests include classification and design analogies. The six Visualization subtests include matching, figure-ground, paper folding, and figure rotation. The eight Memory subtests include memory span, spatial memory, associative memory, and delayed recognition memory. The two Attention subtests consist of an underlining test (e.g., marking
all squares printed on a page full of geometric shapes) and a measure of divided attention (e.g., observing a moving display and simultaneously sorting cards correctly).

The Leiter-R yields a composite IQ with the familiar mean of 100 and standard deviation of 15. The test also produces subtest scaled scores with a mean of 10 and standard deviation of 3, as well as a variety of composite scores useful in clinical diagnosis. The test was normed on over 2,000 children and adolescents, from 2 to 21 years of age. Using 1993 census statistics, these subjects were carefully stratified according to race, age, gender, social class, and geographic region. Internal consistency reliability for subtests, domain scores, and IQ scores is excellent. Typical coefficient alphas are in the high .80s for subtests and the low .90s for domain scores and IQ scores. Extensive studies of item bias reveal that the items appear to function similarly in separate racial groups (white, African American, and Hispanic samples); that is, there is no evidence of bias (defined as differential item functioning). Coupled with the fact that the test is completely nonverbal, the absence of test bias indicates that the Leiter-R is a good choice for culture-reduced testing of minority children. But the test is useful in a wide range of other situations as well. For example, Hanzel (2003) recommends the Leiter-R for the evaluation of children with autistic disorder, a syndrome discussed later in the chapter.

Empirical research with the Leiter-R is largely supportive at this time. The test has been shown to have utility in the assessment of medically fragile children (Hooper, Hatton, Baranek, Roberts, & Bailey, 2000), the assessment of low-functioning children with autism (Tsatsanis, Dartnall, Cicchetti, and others, 2003), and the evaluation of children classified as language impaired (Farrell & Phelps, 2000). In this latter study, the Leiter-R also demonstrated a validity-confirming correlation of r = .80 with another nonverbal measure of intelligence. Further, in testing with ethnic minorities, the Leiter-R appears to avoid the confounding of intellectual assessment with English language proficiency that is common with other tests. For example, one study of 47 Spanish-speaking and 47 English-speaking children reported average WISC-III IQs of 94 versus 88, respectively, whereas average Leiter-R IQs were nearly identical, 98 versus 99 (Cathers-Schiffman & Thompson, 2007).

The Leiter-R is a welcome revision of an obsolete test. In the hands of a careful clinician, the test is helpful in the intellectual assessment of children with weak skills in English. Other uses for the revised test include the assessment of attention-deficit/hyperactivity disorder (comparisons of the Attention subtests with the other domains are crucial here) and the evaluation of giftedness in young children (the extremely high ceiling of the test proves invaluable for this application). Whereas reviewers warned against using the original Leiter for placement or decision-making purposes (Sattler, 1988; Salvia & Ysseldyke, 1991), the revised Leiter is a huge improvement with regard to psychometric quality and standardization excellence. Thorough reviews of the Leiter-R and other nonverbal assessment instruments are provided by McCallum, Bracken, and Wasserman (2001).

Human Figure Drawing Tests

Most children enjoy drawing human figures and do so routinely and spontaneously. Since the early 1900s, psychologists have tried to tap into this almost instinctive behavior as a basis for measuring intellectual development. The first person to use human figure drawing (HFD) as a standardized intelligence test was Florence Goodenough (1926). Her test, known as the Draw-A-Man test, was revised by Harris (1963) and renamed the Goodenough-Harris Drawing Test. More recently, the HFD technique has been adapted by Naglieri (1988). We should also mention that human figure drawings are widely used as measures of emotional adjustment, but we do not discuss that application here.

The Goodenough-Harris Drawing Test is a brief, nonverbal test of intelligence that can be administered individually or in a group. Goodenough (1926) published the first edition of this test, while Harris (1963) provided important refinements in scoring and standardization, including the use of a deviation IQ. Strictly speaking, the Goodenough-Harris test doesn’t fit the criteria for nonlanguage tests insofar as the examiner must convey certain instructions in English or through a translator. However, the instructions are brief and basic (“I want you to draw a picture of a man [or woman]; make the very best picture you can”). The Goodenough-Harris test is, for all practical purposes, a nonlanguage test.

The purpose of the Goodenough-Harris Drawing Test is to measure intellectual maturity, not artistic skill. Thus, the scoring guide emphasizes accuracy of observation and the development of conceptual thinking. The child receives credit for including body parts and details, as well as for providing perspective, realistic proportion, and implied freedom of movement.

The 73 scorable items are transformed to a scaled score with the familiar mean of 100 and standard deviation of 15. Of course, these norms, developed in the 1960s, are now thoroughly outdated. Even so, a large body of research confirmed that the test captured something important. For example, Frederickson (1985) reported correlations between Goodenough-Harris Drawing Test scores and WPPSI Full Scale IQ in the range of .72 to .80. In several other studies, correlations with individual IQ tests are more variable, but the majority are over .50 (Abell, Briesen, & Watz, 1996; Anastasi, 1975).

In response to criticisms of the Goodenough-Harris Drawing Test, Naglieri (1988) developed a quantitative scoring system and renormed the human figure drawing
procedure. His scoring system, The Draw A Person: A Quantitative Scoring System (DAP), was normed on a sample of 2,622 individuals ages 5 through 17 years who were representative of the 1980 U.S. Census data on age, sex, race, geographic region, ethnic group, social class, and community size. The DAP yields standard scores with the familiar mean of 100 and standard deviation of 15. In a study of 61 subjects ages 6 to 16 years, the DAP correlated .51 with WISC-R IQ and produced similar overall scores, with a mean IQ of 100 versus mean DAP score of 95 (Wisniewski & Naglieri, 1989). Lassiter and Bardos (1995) found that the DAP score underestimated IQ scores obtained from the WPPSI-R and the K-BIT in a sample of 50 kindergartners and first graders.

Reviewers praise the DAP for its clear scoring system, strong reliability, and careful standardization (Cosden, 1992). However, results of validity studies are more cautionary. Harrison and Schock (1994) note that the accumulated evidence with HFD tests indicates low to moderate predictive validity. In spite of their popularity and appeal, HFD tests do not effectively identify children with learning difficulties or developmental disabilities, and they may not be valid for use even as screening measures.

Hiskey-Nebraska Test of Learning Aptitude

The Hiskey-Nebraska Test of Learning Aptitude (H-NTLA) is a nonlanguage performance scale for use with children aged 3 to 17 years (Hiskey, 1966). This test can be administered entirely through pantomime and requires no verbal response from the examinee. However, verbal instructions can be used with children who have normal hearing or mild hearing impairment. The H-NTLA consists of 12 subtests:

Bead Patterns            Block Patterns
Memory for Color         Completion of Drawings
Picture Identification   Memory for Digits
Picture Association      Puzzle Blocks
Paper Folding            Picture Analogies
Visual Attention Span    Spatial Reasoning

Raw scores on the subtests are converted into a Deviation Learning Quotient (LQ) with mean of 100 and standard deviation of 16. For a sample of 43 hearing-impaired children, the test–retest stability of the LQ scores was reported to be .79, .85, and .62 after intervals of about 1 year, 3 years, and 5 years, respectively, which is similar to data for normal children (Watson, 1983). Even so, more than one third of the sample showed a 15-point or greater change in scores over the 5-year time span, which demonstrates the importance of basing important decisions on more than a single measure.

H-NTLA scores correlate quite robustly with achievement scales for grades 2 through 12 (median r = .49) and also with WISC-R Performance IQ (r = .85). Although the LQ yields average scores that are remarkably close to WISC-R Performance IQ for samples of children with hearing impairment and those who are deaf, the H-NTLA scores are substantially more variable (Phelps & Ensor, 1986). Thus, use of the H-NTLA may increase the risk of false-positive misclassification—labeling children as gifted when they are only bright or as having mental retardation when they are merely borderline.

The H-NTLA is useful with children who are deaf, have speech or language impairments or mental retardation, or those who are bilingual. An interesting feature of this test is the development of parallel norms: The H-NTLA was standardized on 1,079 children who were deaf and 1,074 normal-hearing children aged 2½ to 17½. However, the chief weakness of the instrument is the inadequacy of these norms. For example, the representativeness of the sample of those who were deaf—picked on an opportunistic basis from schools for those who are deaf—is largely unknown. Standardization of the normal-hearing sample was based on occupational level of parents according to the 1960 U.S. Census. A contemporary and more detailed restandardization of the test would be quite helpful. Qu (1997) reports favorably on the reliability and validity of the test with huge samples of Chinese deaf children.

Test of Nonverbal Intelligence-4 (TONI-4)

The Test of Nonverbal Intelligence-4 (TONI-4) is a language-free measure of cognitive ability designed for disabled and language-impaired populations (Brown, Sherbenou, & Johnsen, 2010). By adding new items, the fourth edition realized a higher ceiling and a lower floor than the previous version. This is a pragmatic, brief, and simple measure that can be administered in 15 to 20 minutes. Because the response format can include any simple gesture such as nodding or pointing, the TONI-4 is well suited for persons who are deaf, language impaired, or physically limited. The authors recommend the test for assessing persons with aphasia, non-English speakers, and persons who have experienced a variety of severe neurological traumas. The test instructions are pantomimed by the examiner and the examinee answers by pointing to one of six possible responses. For motorically impaired patients, the examiner can point to the alternatives, one by one, while awaiting a choice from the examinee (e.g., nod of the head, or even an eye blink from a paralyzed patient).

The TONI-4 comes in two equivalent forms (A and B). Each form consists of 60 abstract or figural items that do not include pictures or cultural symbols. Except for a few simple-matching items, the TONI-4 items require the examinee to solve problems by identifying relationships among the abstract figures. Many of the items are similar in format to those found on Raven’s Progressive Matrices. The test yields three kinds of scores: age equivalents (for younger examinees), percentile ranks, and TONI-4 quotients (mean
of 100 and standard deviation of 15). The test is suitable for persons aged 6:0 through 89:11; the standardization sample consisted of 2,272 people from 33 states stratified according to gender, race and ethnicity, parental education, and socioeconomic status. Reliability data are satisfactory, with internal consistency coefficients typically exceeding .90 and alternate-forms reliability in the range of .80 to .95.

Independent validity studies of the TONI-4 are scant, but investigation of prior editions (which are highly similar in content) is supportive of this test as a culture-reduced index of general intelligence. Overall, the TONI-4 is highly regarded as a brief nonlanguage screening tool for persons with impaired language abilities (e.g., aphasic, deaf, non-English-speaking, intellectually disabled). The test is more carefully standardized than most and possesses excellent reliability. A useful feature is that the untimed administration of TONI-4 rarely exceeds 20 minutes. Instructions are available in seven major foreign languages. For a review, see Ritter, Kilinc, Navruz, and Bae (2011).

7.2.3: Nonreading and Motor-Reduced Tests

Nonreading tests are designed for illiterate examinees who can, nonetheless, understand spoken English well enough to follow oral instructions. Nonreading tests of intelligence are well suited to young children, illiterate examinees, and persons with speech or expressive-language impairments. These tests need not be specialized or esoteric: The performance subtests of most mainstream instruments qualify as nonreading tests. For example, examiners may use the WISC-III performance subtests to estimate the intelligence of examinees with language disabilities.

However, clients with cerebral palsy or other orthopedically impairing conditions will score very poorly on nonreading tests that require manipulatory responses. Obtaining valid test results from such persons can present an enormous challenge (Case Exhibit 7.1). The motor deficits, increased tendency to fatigue, and inexactness of purposive movements common to persons with cerebral palsy will negatively affect their performance on cognitive assessment tools. Orthopedically impaired clients need tests that are both nonreading and motor reduced. In particular, tests that permit a simple pointing response are well suited to the assessment of children and adults with cerebral palsy or other motor-impairing conditions.

Case Exhibit 7.1

The Challenge of Assessment in Cerebral Palsy

The challenges inherent to special consultations are well typified by a client with cerebral palsy recently tested by a consulting psychologist. The young examinee was totally confined to a battery-powered wheelchair, except when a live-in attendant would transfer him to a bed or chair. Even a dispassionate observer would have to agree that the client didn’t look very capable, sitting hunched over in his chair, unable to control his drooling, one arm arched out at an awkward angle. Yet, in spite of his disability, he had achieved a fair degree of personal independence. Using a simple joystick control device, he could guide his wheelchair to the grocery store, library, and community center where he would complete simple transactions by pointing to appropriate words and phrases in a plastic-bound spiral notebook. Because of his poor motor control, interactions with this client took quite a long time. Nonetheless, he was very efficient with short communications. Here is a typical exchange, with the client’s notebook-designated responses shown in capital letters:

“I understand you have a new synthesized-voice communication box, how do you like it?” YOU ASKED TWO QUESTIONS. “You’re right. I’ll bet that happens a lot. Do you have a communication box?” YES. “What do you think of it?” IT’S NOT EASY. “Now that we are done testing, should I find your driver?” NO, I’LL WAIT. HE IS COMING BACK.

How intelligent is this client? What is his level of verbal comprehension? How well does he understand abstract concepts? For example, is he capable of understanding the essentials of microcomputer usage such as data entry, file storage, and directory commands? Could he learn to program a microcomputer? These are precisely the referral questions asked by a vocational rehabilitation counselor who was contemplating huge expenditures—thousands of dollars—to purchase a computer system for this disabled client.

Certainly, it would be easy to underestimate the potential of this young man with severe motor and language disabilities because—in a quite literal sense—his intelligence was hidden away, trapped inside his incapacitated body. The task of the examiner was to find the able mind inside the disabled body, a formidable challenge indeed. Using the Test of Nonverbal Intelligence and the Peabody Picture Vocabulary Test, the examiner determined that the young client possessed at least average intelligence and could likely learn the fundamentals of data processing with microcomputers.

Peabody Picture Vocabulary Test-IV

The Peabody Picture Vocabulary Test-IV (PPVT-4) is the best known and most widely used of the nonreading, motor-reduced tests (Dunn & Dunn, 1998). The PPVT-4 is used to obtain a rapid measure of listening vocabulary with persons who are deaf or who have neurological or speech
impairments. Although the PPVT-4 is useful with any examinee who cannot verbalize well, the test is especially useful with examinees who also manifest motor-impairing conditions such as cerebral palsy or stroke.

The PPVT-4 comes in two parallel versions, each consisting of 4 practice plates and 228 testing plates. Each plate contains four line drawings of objects or everyday scenes. The examiner presents a plate, states the stimulus word orally, and asks the examinee to point to the one picture that best depicts the stated word. The test items are precisely ordered according to difficulty level, arranged in 19 sets of 12 items each for efficient identification of basal and ceiling levels. The entry level is determined by age, and examinees continue until they reach their ceiling level. Although the test is untimed, administration seldom exceeds 15 minutes. Raw scores are converted to age equivalents or standard scores (mean of 100, standard deviation of 15).

The PPVT-4 was standardized on a representative national sample of 3,540 individuals ranging from 2½ to 90 or more years of age. Reliability data for the new edition are exceptionally strong, with typical internal consistency coefficients of .94, alternate-forms reliabilities of .89, and test–retest correlations of .93. Concurrent validity studies are also highly supportive, demonstrating robust correlations with verbal measures. For example, the test developers report a correlation of .7 with scores on the latest edition of the Clinical Evaluation of Language Fundamentals (CELF-4).

The test developers of the PPVT-4 took great care to minimize and balance cultural influences in the test items. Independent consultants representing the perspectives of African Americans, Asians, Hispanics, Native Americans, and women reviewed the content and artwork of the test during development, and adjustments were made following these reviews. The test items demonstrate attractive artwork that is balanced for racial and gender differences, including persons with physical disabilities. However, based on research with prior editions, the evidence is mixed as to whether the Peabody is a culturally fair instrument that serves as a valid measure with minority children. For example, Washington and Craig (1999) found that 59 African American preschoolers at risk for academic failure averaged 91 on the test (SD of 11), which was seen as commensurate with their environmental disadvantages. These authors laud the test as “culturally fair.” However, Campbell, Bell, and Keith (2001) reported an average score of 82 (SD of 12) for 416 African American children of low socioeconomic status, which was 8 points lower than their overall score on the K-ABC. These researchers concluded: “Despite the attempts to reduce racial differences, the PPVT-III appears to perform similarly to prior editions of the Peabody scales. On average, the PPVT-III tends to underestimate both intellectual ability and scholastic achievement, as measured by the K-ABC, in low SES, African American children” (p. 91). Further research will be needed to clarify the utility of this test with minority children.

Several lines of evidence support the validity of the Peabody test, but only as a narrow measure of vocabulary, not as a general measure of intelligence (Altepeter & Johnson, 1989). Dunn and Dunn (1981) sought to ensure content validity by searching Webster’s New Collegiate Dictionary for all words whose meanings could be represented by a picture. Thus, the authors had a specific content universe in mind, and the items from the Peabody appear to be a fair sampling from this domain. In addition, the authors used sophisticated item-selection techniques based on the Rasch-Wright latent-trait model to help build construct validity into the test. This model enables researchers to construct a growth curve for the latent trait being measured (hearing vocabulary) and to select items that best fit the curve. Using tryout and calibration data, the curve was drawn repeatedly on a computer. If an item did not fit the Rasch-Wright latent-trait model (too flat or too steep an item-characteristic curve) it was discarded from consideration.

Concurrent and predictive validity data for the Peabody are somewhat limited but promising. Several investigators have correlated the PPVT-R with achievement measures, where modest relationships (r’s from .30 to .60) are common (Naglieri, 1981; Naglieri & Pfeiffer, 1983). Correlations with reading achievement tend to be higher than with spelling and arithmetic achievement, suggesting that the PPVT-R has appropriate discriminant validity (Vance, Kitson, & Singer, 1985).

Several investigators have correlated earlier versions of the Peabody with intelligence measures, particularly the WISC-R and WAIS-R, and healthy correlations (near .70) are the rule (e.g., Naglieri & Yazzie, 1983). As might be expected, correlations tend to be higher with Verbal IQ than Performance IQ.

In a very important and ingenious study, Maxwell and Wise (1984) investigated the vocabulary loading of the Peabody in a sample of 84 inpatients from psychiatry and psychology wards. Their study utilized the PPVT, but this earlier edition is similar to the PPVT-IV, so that the conclusions are pertinent here. The researchers investigated the hypothesis that the PPVT assesses more than vocabulary in adults. In addition to the PPVT, the researchers collected data on the following: WAIS-R, Wechsler Memory Scale, name-writing speed, and years of education. Name-writing speed is simply the number of seconds required for the examinee to write his or her full name. Even though all variables had significant correlations with PPVT IQ, WAIS-R Vocabulary had by far the strongest correlation (r = .88). More important, when the variance accounted for by Vocabulary was removed, none of the remaining variables had any predictive relationship with the PPVT. In short,
the Peabody is a good measure of vocabulary (hearing vocabulary, in particular) but could be misleading if used as a global measure of intellect.

The PPVT-4 is a recent revision, so independent research with the test is limited. One caution with the previous edition, the PPVT-III, is that standard scores may be substantially lower than Wechsler IQs, particularly with persons with mental retardation and minority examinees. In a sample of 21 adults with mild mental retardation, Prout and Schwartz (1984) found the PPVT-R standard scores (mean of 56) to be an average of 9 points lower than the WAIS-R IQ (mean of 65). Naglieri and Yazzie (1983) found a huge 26-point difference with a sample of Navajo Indian children, who averaged a standard score of 61 on the PPVT-R in contrast to WISC-R IQ of 87. On a similar note, with the PPVT-III, Bell, Lassiter, Matthews, and Hutchinson (2001) found that the instrument tended to underestimate WAIS-III IQ scores of bright college students by about 10 points.

Overall, we may conclude that the Peabody is a well-normed measure of hearing vocabulary that is useful with nonreading and motor-impaired examinees. However, the instrument is not a substitute for a general intelligence test and PPVT-4 scores may underestimate intellectual functioning in some groups (e.g., minority children, high-functioning adults).

acceptable split-half reliability and shows high correlations with verbal scales of the WISC-R (Teare & Thompson, 1982). The developers of the Perkins-Binet have acknowledged that visual problems exist on a continuum by developing separate norms for children with usable vision (Form U) and no usable vision (Form N).

Test developers have also succeeded in modifying the Wechsler Performance scales for use with individuals with visual impairments. The Haptic Intelligence Scale for the Adult Blind (HISAB) consists of six subtests, four of which resemble the Digit Symbol, Block Design, Object Assembly, and Picture Completion tests of the WAIS Performance scale (Shurrager, 1961; Shurrager & Shurrager, 1964). The remaining two subtests consist of Bead Arithmetic, which involves the use of an abacus to solve arithmetic problems, and a Pattern Board, which requires the examinee to reproduce the pattern felt on a board that has rows of holes with pegs in them. The reliability of the HISAB is excellent and the authors provide normative data on a sample of adults with visual impairment. Most encouraging of all, HISAB scores correlate .65 with the WAIS Verbal IQ (Shurrager & Shurrager, 1964). Although the HISAB is still manufactured and sold by Stoelting Company, unfortunately, the test has never been investigated empirically. A search of PsycINFO for research with this instrument did not locate a single article.
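
The Peabody and Wechsler comparisons reported in this section can be put on a common footing by expressing each gap in standard deviation units. The short Python sketch below does this for the two studies that report both means. It assumes that Peabody standard scores and Wechsler IQs share a mean-100, SD-15 metric; the class and variable names are illustrative only and come from no test manual.

```python
from dataclasses import dataclass

# ASSUMPTION: Peabody standard scores and Wechsler IQs are both scaled to
# a standard deviation of 15, so a point gap can be divided by 15.
SCALE_SD = 15.0


@dataclass(frozen=True)
class Comparison:
    """Group means reported by one study (hypothetical container)."""
    study: str
    peabody_mean: float
    wechsler_mean: float

    @property
    def gap_points(self) -> float:
        """Wechsler mean minus Peabody mean, in score points."""
        return self.wechsler_mean - self.peabody_mean

    @property
    def gap_sd_units(self) -> float:
        """The same gap expressed in standard deviation units."""
        return self.gap_points / SCALE_SD


# Means as quoted in the text above.
comparisons = [
    Comparison("Prout & Schwartz (1984)", peabody_mean=56, wechsler_mean=65),
    Comparison("Naglieri & Yazzie (1983)", peabody_mean=61, wechsler_mean=87),
]

for c in comparisons:
    print(f"{c.study}: {c.gap_points:.0f} points = {c.gap_sd_units:.2f} SD")
```

Under that assumption the 9-point gap is 0.60 SD and the 26-point gap is about 1.73 SD, which gives a sense of how far a Peabody score can depart from a Wechsler IQ in these groups.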

7.2.4: Testing Persons with Visual Impairments

Many millions of American adults have some degree of visual impairment, including more than 1 million individuals who are legally blind—a term used in determining eligibility for government benefits. This term applies to individuals with central visual acuity of 20/200 or less in the better eye (with correction) or to those with significant reduction in their visual field to a diameter of 20 degrees or less (Bradley-Johnson & Ekstrom, 1998). The number of children with visual impairment is substantially smaller, with only 0.4 percent of students between the ages of 6 and 21 years receiving special education services because of a vision problem (U.S. Department of Education, 1992). In addition to special arrangements in testing, individuals with visual impairment may require unique instruments for valid assessment.

In assessing the intellectual functioning of the visually impaired, examiners have historically relied on adaptations of the Stanford-Binet. The Hayes-Binet revision for testing those with visual impairment was based on the 1916 Stanford-Binet; this instrument has since undergone

Another interesting instrument is the Blind Learning Aptitude Test (BLAT), a tactile test for children from 6 to 16 years of age who are blind (Newland, 1971). The BLAT items are in bas-relief form, consisting of dots and lines similar to Braille. The items consist of six different types: recognition of differences, recognition of similarities, identification of progressions, identification of the missing element in a 2 × 2 matrix, completion of a figure, and identification of the missing element in a 3 × 3 matrix. Most of the items were adapted from Raven’s Progressive Matrices and the Cattell Culture Fair Intelligence Test. The BLAT was standardized on 961 functionally blind children 6 to 17½ years of age, in residential and day-care settings (Newland, 1990). The sample is said to be socioeconomically and racially representative of the U.S. population. The BLAT reveals excellent reliability, with internal consistency (Kuder-Richardson) of .93, and test–retest reliability over a 7-month period of .87 and .92 (two studies). The test correlates very well with the Hayes-Binet (r = .74) and the WISC Verbal scale (r = .71). The BLAT also shows strong correlations with Braille oral reading speed and comprehension (Baker, Koenig, & Sowell, 1995). In conjunction with a verbal test, the BLAT is a promising instrument for testing the intelligence of children with visual disabilities. However,
several revisions. The most recent adaptation is the Per- the test would profit substantially from minor revisions,
kins-Binet (Davis, 1980). The Perkins-Binet retains most of updated norms, and a more thorough test manual.
the verbal items from the Stanford-Binet but also adapts Dekker (1993) has developed a promising instrument
other items to a tactual mode. The Perkins-Binet possesses for visually impaired children: the Intelligence Test for
Visually Impaired Children (ITVIC). This test includes a number of haptic subtests (those relying only on the sense of touch), which are intended to replace traditional performance subtests like Block Design that require intact vision. Boter and Hoekstra-Vrolijk (1994) provide the compelling rationale for using haptic subtests with visually impaired children:
Although the necessity for an IQ test with haptic subtests for visually impaired children is evident in practice, the intelligence of visually impaired children is usually still measured only through the use of the verbal subtests of the WISC-R. The risk of this is that an incomplete and one-sided picture is obtained. Children with little education, with a disadvantaged background or missing a good command of the language may be underestimated. (p. 135)
Designed for children 6 to 15 years of age, the test has separate norms for partially sighted and totally blind examinees. The instrument includes five verbal subtests adapted from existing instruments such as the Wechsler scales and seven new nonverbal subtests that rely on tactile perception:
Verbal              Nonverbal/Haptic
Vocabulary          Perception of Objects
Digit Span          Perception of Figures
Verbal Fluency      Block Design
Verbal Analogies    Rectangle Puzzles
Learning Names      Map and Plan Tests
                    Exclusion of Figures
                    Figural Analogies
The full battery takes about three hours to administer. Currently, the test is published in Dutch, German, and English but has received limited use in the United States. This may be due, in part, to the size and weight of the test kit. The ITVIC comes in a large “hold-all” that cannot be easily carried from one location to another. Information about this specialized instrument can be found at www.bartimeus.nl.
7.2.5: Testing Individuals Who are Deaf or Hard of Hearing
More than 1 million Americans are deaf or sufficiently hard of hearing that they rely on American Sign Language (ASL) as their primary means of communication (Brauer, Braden, Pollard, & Hardy-Braz, 1998). Given the typically limited mastery of English among persons who are deaf and, conversely, the typical psychologist’s limited (or nonexistent) skill in ASL, the proper and valid assessment of individuals who are deaf poses a profound cross-cultural challenge.
More is involved than just picking a test developed for, and normed upon, individuals who are deaf or hard of hearing and who use sign language. One problem is that sign language “can now be characterized on a multidimensional continuum encompassing numerous styles, lexical variants, syntactic structures, dialects, and approximations to or departures from English word ordering” (Brauer et al., 1998, p. 299). Thus, a test developed in standard ASL is not equally fair to all persons who are deaf. In general, the proper and valid assessment of persons who are deaf requires that interested psychologists immerse themselves in the Deaf culture and also seek relevant educational and training experiences:
One especially needs a thorough understanding of the implications of deafness and the use of sign language for making diagnoses for people who are deaf. Few hearing psychologists have these skills. The push is for specialized training programs in deafness and psychology, a need that has been recognized for decades.
(Brauer et al., 1998, p. 303)
If a consulting psychologist does not possess these skills, then the assessment of persons who are deaf should be referred to a person or agency with the requisite talents and expertise.
The use of a sign language interpreter in the testing of persons who are deaf is a complicated and controversial matter. One concern is that the interpreter may inadvertently alter the content of the test, therefore affecting the validity of the findings. Certainly, it is unwise for parents or teachers to serve as interpreters. However, it is also true that persons who are deaf and who use sign language achieve higher IQs when the directions are signed than when they are delivered in the traditional manner (Braden, 1992). The preferred resolution is for the examiner to be fluent in sign language, so that any necessary translations stay within the bounds of standardized procedure.
For the intellectual assessment of persons who are deaf or hard of hearing, the Wechsler Performance subtests remain the tools of choice (Braden & Hannah, 1998). The impact of English language facility is minimized on these subtests, so it is thought that they provide a more accurate measure of cognitive skill than the Verbal subtests. Other tests sometimes used with persons who are deaf include Raven’s Progressive Matrices (Raven, Court, & Raven, 1992) and the Hiskey-Nebraska Test of Learning Aptitude, discussed previously. The WAIS-III is now available in a formal ASL translation (demonstrated on videotape), endorsed and disseminated by the test publisher (Kostrubala & Braden, 1998).
7.2.6: Assessment of Adaptive Behavior in Intellectual Disability
The term intellectual disability is the currently preferred designation for the disability historically referred to as mental retardation. In fact, the authoritative 130-year-old agency
that has promoted the interests of affected individuals, the American Association on Mental Retardation (AAMR), recently changed its name to the American Association on Intellectual and Developmental Disabilities (AAIDD). The latest edition of its authoritative manual (Schalock, Borthwick-Duffy, Buntinx, and others, 2010) eliminated all references to the term mental retardation. The reasons for the change have to do with providing a more hopeful and optimistic outlook for persons with intellectual disability:
The construct of intellectual disability belongs within the general construct of disability. Intellectual disability has evolved to emphasize an ecological perspective that focuses on the person-environment interaction and recognizes that the systematic application of individualized supports can enhance human functioning.
(Schalock, Luckasson, Shogren, and others, 2007)
In contrast, the outdated concept of mental retardation gradually has taken on excess meanings that tend to isolate the problem within the individual rather than recognizing an ecological perspective.
Intellectual disability represents a continuum from very mild to substantially disabling. For this reason, previous terminology recognized four levels of disability: mild, moderate, severe, and profound. However, current AAIDD designations represent a departure from this terminology. Instead of focusing on the shortcomings of the person, the manual introduces a hierarchy of “Intensities of Needed Supports,” which redirects attention to the rehabilitation needs of the client. The four levels of needed supports are intermittent, limited, extensive, and pervasive. However, the previous terminology referring to levels of disability will likely prevail for quite some time, so we have chosen to blend the old and the new approach in Table 7.8. The reader will notice a zone of uncertainty between levels of disability, which signifies that clinical judgment about all sources of information is required in diagnosis. Furthermore, even though these levels are calibrated by IQ ranges, we remind the reader that the examinee must also show corresponding deficits in adaptive skill. Under no circumstances is an IQ test a sufficient basis for diagnosing intellectual disability.
Table 7.8 Four Levels of Intellectual Disability
The assessment of intellectual disability is a complex
and multifaceted concern that rightfully deserves a chapter
or book of its own. Owing to space limitations, our cover-
age is necessarily abridged; interested readers are referred
to Schalock et al. (2010) and Jackson, Mulick, and Rojahn
(2007). Here we briefly summarize the diagnostic criteria
for intellectual disability and then review several intrigu-
ing assessment instruments in modest detail.
The most authoritative source for the definition of
intellectual disability is the American Association on Intel-
lectual and Developmental Disabilities. That organization
defines intellectual disability as follows:
Intellectual disability is characterized by significant limi-
tations both in intellectual functioning and in adaptive
behavior as expressed in conceptual, social, and practical
adaptive skills. This disability originates before age 18.
(Schalock et al., 2007, p. 118)

The AAIDD further stipulates that significantly subaverage
intellectual functioning is an IQ of 70 to 75 or below
on scales with a mean of 100 and a standard deviation of
15. The agency explicitly affirms the importance of profes-
sional judgment in individual cases.
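The definition quoted above combines three conditions, which can be laid out as a short illustration. This is a minimal sketch in Python under stated assumptions, not a diagnostic procedure: the names `Assessment` and `check_aaidd_criteria`, the single yes/no flag for adaptive limitations, and the decision to treat the 70-to-75 band as a separate "use clinical judgment" outcome are all choices made here for illustration. Only the IQ band, the requirement of limitations in adaptive behavior, and the onset before age 18 come from the text.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    iq: int                     # score on a scale with mean 100 and SD 15
    adaptive_limitations: bool  # significant limitations in adaptive behavior
    age_of_onset: float         # age in years when the limitations first appeared


def check_aaidd_criteria(a: Assessment) -> str:
    """Return a coarse label only; professional judgment decides every real case."""
    # A low IQ alone is never sufficient: adaptive limitations and an
    # onset before age 18 are both required.
    if not a.adaptive_limitations or a.age_of_onset >= 18:
        return "criteria not met"
    if a.iq <= 70:
        return "consistent with criteria"
    if a.iq <= 75:
        # Illustrative handling of the "70 to 75" band (an assumption).
        return "IQ in the 70-75 band: weigh with clinical judgment"
    return "criteria not met"


print(check_aaidd_criteria(Assessment(iq=65, adaptive_limitations=True, age_of_onset=4)))
print(check_aaidd_criteria(Assessment(iq=73, adaptive_limitations=True, age_of_onset=4)))
print(check_aaidd_criteria(Assessment(iq=65, adaptive_limitations=False, age_of_onset=4)))
```

The third call shows the point the text stresses: an IQ of 65 without documented adaptive limitations does not satisfy the definition.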
A low IQ by itself is an insufficient foundation for the
diagnosis of intellectual disability. As noted, the definition
also specifies a second criterion—limitations in adaptive
behavior as expressed in conceptual, social, and practical
adaptive skills. A diagnosis of intellectual disability is warranted only when an individual displays a sufficiently low IQ and limitations in one or more of the broad areas of adaptive functioning. Furthermore, these deficits in intellect and adaptive functioning must have arisen during the developmental period, defined as between birth and the eighteenth birthday.
Source for Table 7.8: Based on Schalock et al. (2010) and Beirne-Smith, Ittenbach, and Patton (2002).
Limitations in adaptive skill are more difficult to confirm than a low IQ. Fortunately, the AAIDD stipulates
specific skills within the three areas of adaptive functioning, namely:
• Conceptual skills: language and literacy; money, time, and number concepts; and self-direction.
• Social skills: interpersonal skills, social responsibility, self-esteem, gullibility, naïveté (i.e., wariness), social problem solving, and the ability to follow rules/obey laws and to avoid being victimized.
• Practical skills: activities of daily living (personal care), occupational skills, health care, travel/transportation, schedules/routines, safety, use of money, use of the telephone (www.aamr.org).
In regard to the assessment of these limitations, the agency proposes that well-normed measures of adaptive skills are desirable, but the final determination is always a matter of clinical judgment.
The first standardized instrument for assessing adaptive behavior was the Vineland Social Maturity Scale (Doll, 1935). Somewhat simplistic and coarse-grained by modern standards, the original Vineland scale consisted of 117 discrete items arranged in a year-scale format. An informant familiar with the examinee would check off applicable items. From these results the examiner would calculate an equivalent social age, helpful in the diagnosis of mental retardation. Still a respected instrument, the Vineland has undergone several revisions and is now known as the Vineland Adaptive Behavior Scales, Second Edition (Sparrow, Cicchetti, & Balla, 2005).
Since the release of the original Vineland scale, over 100 scales of adaptive behavior have been published (Matson, 2007; Reschly, Myers, & Hartel, 2002). These instruments vary greatly in structure, intended purpose, and targeted population. Broadly speaking, we can distinguish two types of instruments designed for two different purposes. One group of mainly norm-referenced scales is used largely to assist in diagnosis and classification. Another group of mainly criterion-referenced scales is used largely to assist in training and rehabilitation. We have chosen a few representative instruments for more detailed analysis.
Scales of Independent Behavior-Revised
The Scales of Independent Behavior-Revised (SIB-R; Bruininks, Woodcock, Weatherman, & Hill, 1996) is an ambitious, multidimensional measure of adaptive behavior that is highly useful in the assessment of intellectual disability. The instrument consists of 259 adaptive behavior items organized into 14 subscales. The scale is completed with the help of a parent, caregiver, or teacher well acquainted with the examinee’s daily behaviors. For each subscale, the examiner reads a series of items and for each item records a score from 0 (never or rarely does task) to 3 (does task very well). A useful feature of the SIB-R is that examiners need a minimum of training and experience. Of course, a much higher level of competence is required to evaluate results and make decisions about placement or treatment.
Table 7.9 The Subscales and Clusters of the Scales of Independent Behavior-Revised
The 14 subscales of the SIB-R are arranged into four clusters, as outlined in Table 7.9. In turn, these four clusters
constitute the Broad Independence Scale. Each subscale consists of a small number of discrete, developmentally ordered items. For example, the subscale on Eating and Meal Preparation has 19 graded items, including spearing food with a fork, eating soup with a spoon, taking appropriate-sized portions, and preparing snacks that do not require cooking. For each subscale, items are administered until a predetermined ceiling is reached (e.g., 3 of 5 consecutive items scored 0).
Raw scores for a subtest are added to obtain a part score. The part scores for each cluster are then added to obtain the cluster score. The score for the Broad Independence Scale is derived from the four cluster scores. The subtest scores, cluster scores, and the Broad Independence score can then be converted to a variety of normative scores to permit comparison of the examinee’s performance with the performance of the national norming sample. The normative scales include age scores, percentile ranks, standard scores, stanines, and normal curve equivalents.
A separate, unique part of the SIB-R also assesses maladaptive behavior by measuring the frequency and severity of problem behaviors. The Problem Behaviors Scale includes eight major categories of personal and social maladjustment that could affect adaptive behavior: Hurtful to Self, Hurtful to Others, Destructive to Property, Disruptive Behavior, Unusual or Repetitive Habits, Socially Offensive Behavior, Withdrawal or Inattentive Behavior, and Uncooperative Behavior. Examples of problem behaviors are listed, and the respondent must indicate the behaviors displayed by the examinee. In addition, the respondent describes the one most serious behavior in each category and rates it according to frequency of occurrence, severity, and typical management.
The standardization of the SIB-R was well conceived and executed. The norm group consisted of 2,182 persons sampled to reflect the 1990 census characteristics. The normative data cover persons from age 3 months to adults over age 80. An additional sample of persons with mental retardation, learning or hearing disabilities, and behavior disorders was also tested. The value of the SIB-R was further strengthened by anchoring it to the norms for the Woodcock-Johnson Psycho-Educational Battery-Revised. The SIB-R is one component of this larger test battery, but can be used on its own.
The reliability of the SIB-R is generally respectable, but somewhat variable from subscale to subscale and from one age group to another. The individual subscales tend to show split-half reliabilities in the vicinity of .80; the four clusters have median composite reliabilities around .90; the Broad Independence Scale has a very robust reliability in the high .90s (Bruininks, Woodcock, Weatherman, & Hill, 1996).
Validity data for the SIB-R are very promising. For example, the mean scores of various samples of disabled and nondisabled subjects show confirmatory relationships: SIB-R scores are lowest among those persons known to be most severely impaired in learning and adjustment. For disabled examinees, SIB-R scores correlate very strongly with intelligence scores (in the .80s), whereas with nondisabled examinees, the relationship is minimal (Bruininks et al., 1996). The SIB-R also possesses excellent convergent validity: the Broad Independence Score correlated .83 with the composite score from a similar instrument, the Vineland Adaptive Behavior Scales (Middleton, Keene, & Brown, 1990). Tan, Hultsch, Hunter, and Strauss (2010) reported that a slightly modified version of the SIB-R was helpful in the evaluation of elderly clients with mild cognitive impairment.
In sum, the SIB-R is an excellent tool for providing insights into an examinee’s current level of functioning in real-life situations in the home, school, and community settings. Although this instrument does not have a precise correspondence with the areas of adaptive skill listed in the definition of intellectual disability, there is substantial similarity. For example, the following areas of adaptive skills are well covered by subscales or clusters of the SIB-R: communication, self-care, home living, social skills, community use, health and safety, and work. The SIB-R or a similar instrument ranks as a mandatory supplement to individual intelligence testing in the diagnosis and assessment of mental retardation.
Inventory for Client and Agency Planning (ICAP)
The Inventory for Client and Agency Planning (Hill, 2005) is one of the most widely used tests in the field of developmental disabilities. This test is suitable for children and adults with mental retardation, individuals who become disabled as adults through illness or accident, and elderly persons who have slowly lost their independence and, therefore, need special assistance. The focus of the instrument is on determining the need for special services such as personal care, remedial education, vocational training, or sheltered work environment.
The test is a 16-page booklet that evaluates adaptive behavior, maladaptive behavior, and the need for assistance and supports. Amazingly, it can be completed in about 15 minutes by a parent, teacher, or caregiver who is well acquainted with the client. The scales and subscales of the ICAP are depicted in Table 7.10. As with the SIB-R, adaptive behaviors are rated on a scale from 0 to 3, with 0 indicating never or rarely does a behavior well (even if asked), 1 indicating does the task but not well, 2 indicating does the task fairly well, and 3 indicating does the task well without being asked. The maladaptive behaviors are assessed in a more complex manner using open-ended questions and follow-up queries as to frequency, severity, and consequences of the maladaptive behaviors. This technique provides for a maladaptive behavior subscale with
enhanced reliability (r = .80) in comparison to similar subscales from other instruments that reveal low reliability (r = .60). From a psychometric standpoint, the ICAP meets the highest standards.
Table 7.10 Scales and Subscales of the Inventory for Client and Agency Planning
Scale                              Number of Items   Subscales or Domains Measured
Descriptive                        10                Data on age, height, weight, legal status
Primary and Additional Diagnoses   14                All relevant medical and psychological diagnoses
Special Needs                      10                Special needs in vision, hearing, mobility, health care, medications
Residential Supports               2                 Residential supports now and in future
School/Vocational Supports         2                 School and vocational supports now and in future
Other Support Services             26                Survey of all support services needed, now and in future
Social/Leisure Activities          16                Survey of social and leisure activities
Adaptive Behavior                  77                Level of functioning in motor skills, social and communication skills, personal living skills, and community living skills
Maladaptive Behavior               24                Self-injury, stereotyped, withdrawn, offensive, uncooperative, disruptive, destructive, hurts others
NOTE: The ICAP also yields a Service Score based on Adaptive Behavior and Maladaptive Behavior.
One of the most useful and appealing aspects of the ICAP is that it provides an overall Service Score based on both adaptive and maladaptive behavior. The Service Score, which ranges from 0 to 100, indicates the likely level of attention, supervision, and training needed by the client. The lower the score, the greater the need for oversight. For example, a child with severe disabilities and many maladaptive behaviors might earn a score of 5, indicating the need for intensive supervision virtually 24 hours a day. At the other extreme, a normal young adult with no behavior problems might earn a score of 95, indicating almost complete self-sufficiency.
By intention, the Service Score was designed to predict not only the service intensity needed but also the costs associated with delivering the assistance. For this reason, state and regional users often collate their ICAP data in a computer database provided by the test publishers.
In many states in the United States, the human services departments have linked their disability services with results from the ICAP. For example, in Colorado, the ICAP is used by the Division of Services for People with Disabilities to determine eligibility and to allocate funds for individuals receiving residential services and day care services (www.cdhs.state.co.us). Resources are allocated for other reasons as well, but the ICAP is foundational to the entire system of disabilities services. Certainly, this is an example of consequential testing: The fate of an entire group of individuals is linked to the soundness of the ICAP for purposes of determining services.
Additional Measures of Adaptive Behavior
We remind the reader that measures of adaptive behavior vary greatly. Some scales are designed mainly for diagnosis, others for remediation. Some scales are useful with persons with severe and profound mental retardation who will never be employed, others with individuals with mild mental retardation seeking vocational training. Some scales are useful exclusively with children, others with adults. These instruments are not interchangeable, and the potential user must study their strengths and limitations carefully.
The Vineland Adaptive Behavior Scales-II (VABS-II; Sparrow, Cicchetti, & Balla, 2005) is the most widely used measure of adaptive behavior in existence. The instrument is the outcome of a major revision and restandardization of the Vineland Social Maturity Scale, originally published in 1935 by Edgar A. Doll. Based on a semistructured interview with a caregiver or parent, the VABS provides an evaluation in the following domains and subdomains: Communication (receptive, expressive, written), Daily Living Skills (personal, domestic, community), Socialization (interpersonal relationships, play and leisure time, coping skills), Motor Skills (gross, fine).
The VABS-II is a widely respected instrument with good concurrent validity, including correlations in the range of .50 to .80 with the Wechsler scales and Stanford-Binet. However, some of the interview items require knowledge that the informants may not possess (e.g., whether a child says 100 recognizable words). Silverstein (1986) faults the normative data, noting discontinuous jumps in standard scores from one age group to another. Even so, the Vineland continues to be a highly popular test in clinical practice and research. A promising development in research is the increasing use of this instrument in other countries. For example, de Bildt, Kraijer, Sytema, and Minderaa (2005) report favorably on the validity of the VABS in a sample of 826 Dutch children with mental retardation, and Balboni, Pedrabissi, Molteni, and Villa (2001) established that the instrument accurately identifies mentally retarded individuals with and without communication impairment, social behavior problems, and motor disabilities.
The American Association on Intellectual and Developmental Disabilities (AAIDD) has developed several scales useful in the assessment of persons with cognitive limitations. We mention here just one of its products, the AAMR Adaptive Behavior Scales: Second Edition (Nihira, Leland,
& Lambert, 1993). The residential and community version of this test, suitable for persons 18 to 80 years of age, is a psychometric tour de force that borders on overkill. The normative sample includes more than 4,000 persons with developmental disabilities from 43 states residing in the community or in residential settings. In addition to assessing the appropriate behavioral domains (e.g., independent functioning, domestic activity, self-direction, responsibility), a noteworthy feature of the instrument is the careful attention to maladaptive behaviors, which are evaluated in eight domains:
• Violent and antisocial behavior
• Rebellious behavior
• Eccentric and self-abusive behavior
• Untrustworthy behavior
• Withdrawal
• Stereotyped and hyperactive behavior
• Inappropriate body exposure
• Disturbed behavior
This scale has been extensively validated and clearly distinguishes persons independently classified at different adaptive behavior levels.
7.2.7: Assessment of Autism Spectrum Disorders
Autism is not a single disorder, but a range of closely related disorders evident in the first years of life. Autism spectrum disorders (ASDs) include diagnostic categories such as autistic disorder, Asperger’s syndrome, childhood disintegrative disorder, and pervasive developmental disorder, among others (American Psychiatric Association, 2000). Although the level of disability and specific symptoms vary from child to child, what all children with ASDs share in common is a core of difficulties with reciprocal social skills, communication abilities, and flexible behavior. Often, empathy is absent. Affected children may display stereotypic activities, interests, and behaviors. A characteristic vignette of a child with ASD might read as follows:
Martin is a cute 2-year-old boy who is perplexing and worrisome to his parents. He will only eat crunchy foods and refuses to use utensils. He rarely makes eye contact. When watching TV, he rocks back and forth and flaps his hands. He seldom speaks, although he does verbalize “music” when he wants to hear a favorite CD of children’s songs. He becomes enraged if his parents play a different CD. He appears self-absorbed and does not respond affectionately to his parents. For Martin, taking turns is a foreign concept. He has a very short attention span. Even so, bright metal objects fascinate him.
According to the Centers for Disease Control and Prevention, about 1 in 88 children manifests an ASD, and these disorders are 5 times more common among boys than girls (Morbidity and Mortality Weekly Report, March 30, 2012). Early diagnosis and intervention are vital because of the improved prognosis (Hollander, Kolevzon, & Coyle, 2011).
The assessment of children for ASDs is a complex endeavor that includes screening tests, behavioral observations, and diagnostic evaluation by specialists in pediatrics, neurology, and psychology. Excessive reliance on checklists or tests is unwise. Even so, appropriate scales can be a useful starting point. We survey a few good measures here.
The Modified Checklist for Autism in Toddlers (M-CHAT; Robins, Fein, & Barton, 1999) is an appealing 23-item checklist that enjoys strong content validity. The M-CHAT is a screening test used with toddlers between 16 and 30 months of age to identify children at risk for ASDs. The authors openly acknowledge that the instrument yields a high false-positive rate. Thus, M-CHAT should be used only in conjunction with further diagnostic evaluation, in the event of a “failing” score. Items on the checklist resemble the following:
Does your child play with other children?    Yes  No
Does your child smile when you smile?        Yes  No
Does your child engage in pretend play?      Yes  No
Does your child enjoy peek-a-boo?            Yes  No
Does your child respond to his/her name?     Yes  No
Does your child sustain eye contact?         Yes  No
Children who fail three or more items (or two or more critical items) should be referred for further evaluation by specialists. The M-CHAT has been translated into more than 30 languages.
Robins (2008) reported a large-scale study of 4,797 children evaluated with M-CHAT during toddler checkups. From this sample, 466 screened positive on the M-CHAT, including 362 families who completed a follow-up interview. From this group, 21 children eventually were diagnosed with ASDs. Remarkably, only four of these 21 children were flagged by their pediatrician. In sum, the M-CHAT yields a high false-positive rate, but this is an acceptable price to pay for identifying at-risk children who might otherwise go undetected for additional months or years. In fact, the “cost” of the false-positive identifications usually consisted of a telephone follow-up call or brief in-person interview to determine that further assessment was not warranted.
Another widely used autism checklist is the Baby and Infant Screen for Children with Autism Traits-Part 1, referred to as BISCUIT-Part 1 by the authors (Matson, Boisjoli, & Wilkins, 2007). The instrument consists of 71 items
that assess the core symptoms of autism in toddlers 17 to supporting the construct validity of the scale (Matson,
37 months of age. The items are completed by a parent or Boisjoli, Hess, & Wilkins, 2010). The BISCUIT-Part 1 also
caretaker on a 3-point scale that includes 0 (not different, demonstrated good convergent validity with the M-CHAT,
no impairment), 1 (somewhat different, mild impairment), and appropriate divergent validity with measures of adap-
and 2 (very different, severe impairment). Items are brief tive and motor behaviors in a sample of 1,007 toddlers
and resemble the following: communicates verbally, takes (Matson, Wilkins, & Fodstad, 2011). Over 80 studies have
turns, sustains eye contact, responds to name. An exploratory been published on the scale. For a recent review, see Mat-
factor analysis of results for 1,287 children enrolled in an son and Tureck (2012).
early intervention program yielded a three-factor solution
consistent with symptom clusters found in ASD children, Chapter Quiz: Testing Special Populations
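The M-CHAT follow-up figures reported earlier in this section can be turned into rough proportions. The sketch below assumes, for simplicity, that the 21 diagnosed children all came from the 362 screen-positive families who completed the follow-up interview, as the passage implies. The variable and function names are illustrative only.

```python
# Counts taken from the M-CHAT follow-up study described in the text.
screened_positive = 466      # screened positive on the M-CHAT
completed_follow_up = 362    # families who completed the follow-up interview
diagnosed_asd = 21           # children eventually diagnosed with an ASD
flagged_by_pediatrician = 4  # of the 21 diagnosed children


def proportion(part, whole):
    """Return part/whole as a fraction, guarding against a non-positive base."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


follow_up_rate = proportion(completed_follow_up, screened_positive)
# Assumption: all diagnoses arose among families who completed follow-up.
confirmed_rate = proportion(diagnosed_asd, completed_follow_up)
not_diagnosed_share = 1 - confirmed_rate
pediatrician_detection = proportion(flagged_by_pediatrician, diagnosed_asd)

print(f"Completed follow-up: {follow_up_rate:.1%}")
print(f"Screen positives later diagnosed: {confirmed_rate:.1%}")
print(f"Screen positives not diagnosed: {not_diagnosed_share:.1%}")
print(f"Diagnosed children flagged by pediatrician: {pediatrician_detection:.1%}")
```

Under these assumptions, roughly 6% of the followed-up screen positives received a diagnosis, which is what a high false-positive rate looks like in practice, while only about 19% of the diagnosed children had been flagged by their pediatrician.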
Chapter 8
Origins of Personality Testing
Learning Objectives
8.1 Explain how the responses to ambiguous stimuli reveal the innermost, unconscious mental processes of the examinee
8.2 Review structured tests and procedures, self-report inventories, and behavioral assessment approaches of psychopathy

8.1: Theories of Personality and Projective Techniques

8.1 Explain how the responses to ambiguous stimuli reveal the innermost, unconscious mental processes of the examinee

In psychological testing a fundamental distinction often is drawn between ability tests and personality tests. Defined in the broadest sense, ability tests include a plethora of instruments for measuring intelligence, achievement, and aptitude. In the preceding seven chapters we have explored the nature, construction, application, reliability, and validity of ability tests. In the next two chapters we shift the emphasis to personality tests and related matters. Personality tests seek to measure one or more of the following: personality traits, dynamic motivation, symptoms of distress, personal strengths, and attitudinal characteristics. Measures of spirituality, creativity, and emotional intelligence also fall within this realm.

Theories of personality provide an underpinning for the multiplicity of instruments available in the field. For this reason, we begin this chapter with a survey of prominent personality theories. The many ways in which theorists conceptualize personality clearly have impacted the design of personality tests and assessments. This is especially evident with projective techniques such as the Rorschach inkblot method, which emanated from psychoanalytic conceptions of personality. Thus, in Topic 8A, Theories of Personality and Projective Techniques, in addition to the survey of personality theories, we have included an introduction to several instruments based on the turn-of-the-twentieth-century psychoanalytic hypothesis that responses to ambiguous stimuli reveal the innermost, unconscious mental processes of the examinee. The coverage of personality assessment continues in Topic 8B, Self-Report and Behavioral Assessment of Psychopathology, which includes a review of structured tests and procedures, including self-report inventories and behavioral assessment approaches.

8.1.1: Personality: an Overview

Although personality is difficult to define, we can distinguish two fundamental features of this vague construct. First, each person is consistent to some extent; we have coherent traits and action patterns that arise repeatedly. Second, each person is distinctive to some extent; behavioral differences exist between individuals. Consider the reactions of three graduate students when their midterm examinations were handed back. Although all three students received nearly identical grades (solid B’s), personal reactions were quite diverse. The first student walked off sullenly and was later overheard to say that a complaint to the departmental administrator was in order. The second student was pleased, stating out loud that a B was, after all, a respectable grade. The third student was disappointed but stoical. He blamed himself for not studying harder.

How are we to understand the different reactions of these three persons, each of whom was responding to an identical stimulus? Psychologists and laypersons alike invoke the concept of personality to make sense out of the behavior and expressed feelings of others. The notion of personality is used to explain behavioral differences between persons (for example, why one complains and another is stoical) and to understand the behavioral consistency within each individual (for example, why the complaining student noted previously was generally sour and dissatisfied).

Why people differ is just one of many key issues in the study of personality. Mayer (2007–8) provides a thoughtful discussion of the big questions in personality psychology, which he defines as “those questions that are simple,
important, and central to many people’s lives.” He identifies 20 big questions, only a few of which can be addressed through testing and assessment. These questions involve existential matters such as the purpose of life, the nature of personhood, and the difficulties encountered in seeking self-knowledge. His captivating article is a reminder that some vital issues can be approached through the empiricism of psychological research and testing, whereas other crucial matters remain elusive and are amenable mainly to philosophical and phenomenological inquiry.

In addition to understanding personality, psychologists also seek to measure it. Literally hundreds of personality tests are available for this purpose; we will review historically prominent instruments and also discuss some promising new approaches. However, in order that the reader can better comprehend the diversity of instruments and approaches, we begin with a more fundamental question: How is personality best conceptualized? As the reader will discover, in order to measure personality we must first envision what it is we seek to measure. The reader will better appreciate the multiplicity of tests and procedures if we also briefly describe the personality theories that comprise the underpinnings for these instruments.

8.1.2: Psychoanalytic Theories of Personality

Psychoanalysis was the original creation of Sigmund Freud (1856–1939). While it is true that many others have revised and adapted his theories, the changes have been slight in comparison to the substantial foundations that can be traced to this singular genius of the Victorian and early-twentieth-century era. Freud was enormously prolific in his writing and theorizing. We restrict our discussion to just those aspects of psychoanalysis that have influenced psychological testing. In particular, the Rorschach, the Thematic Apperception Test, and most of the projective techniques critiqued in the next topic dictate a psychoanalytic framework for interpretation. Readers who wish a more thorough review of Freud’s contributions can start with the New Introductory Lectures on Psychoanalysis (Freud, 1933). Reviews and interpretations of Freud’s theories can be found in Stafford-Clark (1971) and Fisher and Greenberg (1984).

Origins of Psychoanalytic Theory
Freud began his professional career as a neurologist but was soon specializing in the treatment of hysteria, an emotional disorder characterized by histrionic behavior and physical symptoms of psychic origin such as paralysis, blindness, and loss of sensation. With his colleague Joseph Breuer, Freud postulated that the root cause of hysteria was buried memories of traumatic experiences such as childhood sexual molestation. If these memories could be brought forth under hypnosis, a release of emotion called abreaction would take place and the hysterical symptoms would disappear, at least briefly (Studies on Hysteria, Breuer & Freud, 1893–1895).

From these early studies Freud developed a general theory of psychological functioning with the concept of the unconscious as its foundation. He believed that the unconscious was the reservoir of instinctual drives and a storehouse of thoughts and wishes that would be unacceptable to our conscious self. Thus, Freud argued that our most significant personal motivations are largely beyond conscious awareness. The concept of the unconscious was discussed in elaborate detail in his first book (The Interpretation of Dreams, Freud, 1900). Freud believed that dreams portray our unconscious motives in a disguised form. Even a seemingly innocuous dream might actually have a hidden sexual or aggressive meaning, if it is interpreted correctly.

Freud’s concept of the unconscious penetrated the very underpinnings of psychological testing early in the twentieth century. An entire family of projective techniques emerged, including inkblot tests, word association approaches, sentence completion techniques, and storytelling (apperception) techniques (Frank, 1939, 1948). Each of these methods was predicated on the assumption that unconscious motives could be divined from an examinee’s responses to ambiguous and unstructured stimuli. In fact, Rorschach (1921) likened his inkblot test to an X ray of the unconscious mind. Although he patently overstated the power of projective techniques, it is evident from Rorschach’s view that the psychoanalytic conception of the unconscious had a strong influence on testing practices.

The Structure of the Mind
Freud divided the mind into three structures: the id, the ego, and the superego. The id is the obscure and inaccessible part of our personality that Freud likened to “a chaos, a cauldron of seething excitement.” Because the id is entirely unconscious, we must infer its characteristics indirectly by analyzing dreams and symptoms such as anxiety. From such an analysis, Freud concluded that the id is the seat of all instinctual needs such as for food, water, sexual gratification, and avoidance of pain. The id has only one purpose, to obtain immediate satisfaction for these needs in accordance with the pleasure principle. The pleasure principle is the impulsion toward immediate satisfaction without regard for values, good or evil, or morality. The id is also incapable of logic and possesses no concept of time. The chaotic mental processes of the id are, therefore, unaltered by the passage of time, and impressions that have been pushed down into the id “are virtually immortal and are preserved for whole decades as though they had only recently occurred” (Freud, 1933).
If our personality consisted only of an id striving to gratify its instincts without regard for reality, we would soon be annihilated by outside forces. Fortunately, soon after birth, part of the id develops into the ego or conscious self. The purpose of the ego is to mediate between the id and reality. The ego is part of the id and servant to it, but the ego “interpolates between desire and action the procrastinating factor of thought” (Freud, 1933). Thus, the ego is largely conscious and obeys the reality principle; it seeks realistic and safe ways of discharging the instinctual tensions that are constantly pushing forth from the id.

The ego must also contend with the superego, the ethical component of personality that starts to emerge in the first five years of life. The superego is roughly synonymous with conscience and comprises the societal standards of right and wrong that are conveyed to us by our parents. The superego is partly conscious, but a large part of it is unconscious, that is, we are not always aware of its existence or operation. The function of the superego is to restrict the attempts of the id and ego to obtain gratification. Its main weapon is guilt, which it uses to punish the wrongdoings of the ego and id. Thus, it is not enough for the ego to find a safe and realistic way for the gratification of id strivings. The ego must also choose a morally acceptable outlet, or it will suffer punishment from its overseer, the superego. This explains why we may feel guilty for immoral behavior such as theft even when getting caught is impossible. Another part of the superego is the ego ideal, which consists of our aims and aspirations. The ego measures itself against the ego ideal and strives to fulfill its demands for perfection. If the ego falls too far short of meeting the standards of the ego ideal, a feeling of guilt may result. We commonly interpret this feeling as a sense of inferiority (Freud, 1933).

The Role of Defense Mechanisms
The ego certainly has a difficult task, acting as mediator and servant to three tyrants: id, superego, and external reality. It may seem to the reader that the task would be essentially impossible and that the individual would, therefore, be in a constant state of anxiety. Fortunately, the ego has a set of tools at its disposal to help carry out its work, namely, mental strategies collectively labeled defense mechanisms.

Defense mechanisms come in many varieties, but they all share three characteristics in common. First, their exclusive purpose is to help the ego reduce anxiety created by the conflicting demands of id, superego, and external reality. In fact, Freud felt that anxiety was a signal telling the ego to invoke one or more defense mechanisms in its own behalf. Defense mechanisms and anxiety are, therefore, complementary concepts in psychoanalytic theory, one existing as a counterforce to the other. The second common feature of defense mechanisms is that they operate unconsciously. Thus, even though defense mechanisms are controlled by the ego, we are not aware of their operation. The third characteristic of defense mechanisms is that they distort inner or outer reality. This property is what makes them capable of reducing anxiety. By allowing the ego to view a challenge from the id, superego, or external reality in a less-threatening manner, defense mechanisms help the ego avoid crippling levels of anxiety. Of course, because they distort reality, the rigid, excessive application of defense mechanisms may create more problems than it solves.

Assessment of Defense Mechanisms and Ego Functions
Although Freud introduced the concept of defense mechanisms, it was left to his followers to elucidate these unconscious mental strategies in more detail (Paulhus, Fridhandler, & Hayes, 1997). Vaillant (1971) developed a hierarchy of ego defense mechanisms based on the assumption that some mechanisms are healthier or more adaptive than others. He suggested four broad types, listed here in ascending level of maturity: psychotic, immature, neurotic, mature. Each type includes specific defense mechanisms such as denial, projection, repression, and altruism, described below. Perry and Henry (2004) proposed a similar hierarchy of adaptation in defense mechanisms. They also developed a sophisticated rating scale, which, as we will see, is of value in clinical practice. A hierarchy of types of defense mechanisms (least mature to most mature) is provided in Table 8.1.

Table 8.1 A Hierarchy of Types of Defense Mechanisms (Least Mature to Most Mature)
Source: Based on Perry and Henry (2004) and Vaillant (1977).
Psychotic defense mechanisms are the least healthy because they distort reality to an extreme degree. One example includes gross denial of external reality such as the refusal to acknowledge the death of a loved one. Another example is delusional projection, which consists of frank delusions about external reality, usually of a persecutory nature. The second grouping, Acting Out, comprises several forms of maladaptive action such as passive-aggressive behavior (e.g., intentional lateness to aggravate a partner), impulsive behavior designed to reduce tension, and complaining while simultaneously rejecting help.

Borderline defense mechanisms include patterns of behavior often found in persons with a diagnosis of Borderline Personality Disorder (American Psychiatric Association, 2000). The specific mechanisms include splitting, in which the images of others (or self) alternate rapidly from all good to all bad, and projective identification, which is the projection of an unwanted, unrecognized trait (like anger) onto others. Neurotic defense mechanisms, the fourth group, are found to some degree in most persons and include repression (inexplicable memory lapses or failure to acknowledge information, such as “forgetting” a dental appointment) and displacement, which comprises the transfer of feelings from the real object onto someone or something else, such as kicking the dog when angry with the boss.

Obsessive defense mechanisms also are very common and consist of mental patterns like isolation of affect or intellectualization. Isolation of affect involves the superficial acknowledgement of a feeling in the absence of a full emotional experience. In intellectualization, threatening matters are acknowledged but explored in bland terms that are relatively devoid of feelings. For example, Vaillant (1971) describes a physician whose mother had died recently of cancer. The doctor talked at length about the medical characteristics of her illness, thereby easing his sense of loss.

Mature defense mechanisms appear to the beholder as convenient virtues. An example is certain forms of humor that do not distort reality but that can ease the burden of matters “too terrible to be borne” (Vaillant, 1977). Specific kinds of mature mechanisms include:

Types of Mature Defense Mechanisms

An example of humor as a mature defense mechanism would be former president Ronald Reagan’s quip to doctors in 1981 as he entered surgery for a bullet wound from his attempted assassination. He is reported to have said, “I hope you’re all Republicans.”

Perry and colleagues developed the Defense Mechanism Rating Scales (DMRS) as a basis for assessing the level, type, and severity of defense mechanisms encountered in psychotherapy patients (Perry, 1990; Perry & Harris, 2004). The DMRS was devised for rating the presence of 30 discrete defense mechanisms (e.g., acting out, splitting, denial, projection, repression, intellectualization, altruism, etc.) in a 50-minute dynamically oriented interview. In the original scale, a 3-point qualitative rating of absent, probably present, or definitely present was obtained for each defense mechanism identified in a review of a videotaped session.

Subsequently, the test developers adopted a simple quantitative scoring approach in which defense mechanisms were isolated and identified in short, meaningful segments of the taped interview. They found that a typical therapy session includes anywhere from 15 to 75 illustrations of the various defense mechanisms. Based on prior research, each defense mechanism receives a score from 1 (highly immature and maladaptive) to 7 (highly mature and adaptive). Although the scale offers a number of scoring options, the most useful score is the Overall Defensive Functioning (ODF) score, which is the simple average of the ratings of the observed defense mechanisms. The theoretical range of scores is 1.0 to 7.0, although scores of 3.0 and below are rare. Scores below 5.0 indicate significant personality disorder or severe depression. Scores of 6.0 and higher indicate normal or healthy functioning. Interrater reliabilities from six studies were mostly in the mid- to high-.80s for the ODF scores. The stability coefficient for a small sample of patients over a one-month interval was a respectable .75 (Perry & Harris, 2004).

The ODF scores tend to improve over the course of dynamically oriented therapy, which supports the validity of the construct being measured, maturity of defense mechanisms. In four studies involving one-month to one-year follow-up with small samples, the within-group effect sizes for gains in ODF scores ranged from .02 to 1.05, with most in the range of .41 to .82 (Perry & Harris, 2004, Table 9.5). Effect sizes of this magnitude are considered moderate to large, that is, meaningful gains are being accomplished, as registered by the increased maturity of the defense mechanisms emerging in the therapy sessions.
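As a rough illustration of the arithmetic behind these figures, the sketch below computes an ODF score as the simple average of 1-to-7 maturity ratings, and a within-group effect size for gains from intake to follow-up. All ratings and patient scores are invented for illustration. The effect-size formula used here (mean gain divided by the standard deviation of the gains) is only one common convention and is an assumption; the text does not say which formula Perry and colleagues used.

```python
from statistics import mean, stdev


def odf_score(ratings):
    """Overall Defensive Functioning: simple average of the 1-7 maturity
    ratings given to each defense mechanism observed in a session."""
    if not ratings:
        raise ValueError("at least one rated defense mechanism is required")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("each rating must lie between 1 and 7")
    return mean(ratings)


def within_group_effect_size(pre, post):
    """Mean gain divided by the standard deviation of the gains.
    Assumption: one of several conventions for within-group effect sizes."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores for at least two patients")
    gains = [after - before for before, after in zip(pre, post)]
    return mean(gains) / stdev(gains)


# Hypothetical session in which seven defense mechanisms were rated.
session_ratings = [5, 6, 4, 5, 7, 5, 6]
print(f"ODF score: {odf_score(session_ratings):.2f}")

# Hypothetical ODF scores for five patients at intake and at follow-up.
intake = [4.6, 5.0, 4.2, 5.3, 4.9]
follow_up = [5.5, 4.8, 4.7, 5.3, 5.2]
print(f"Within-group effect size: {within_group_effect_size(intake, follow_up):.2f}")
```

A session score computed this way can be read against the cutoffs quoted above, where scores below 5.0 suggest significant pathology and scores of 6.0 and higher suggest healthy functioning.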
The authors observe:
Defenses can be viewed as both process phenomena (psy-
chological mechanisms in action) and as a measure of
adaptive outcome, when aggregated across sessions and
time. This gives the study of defenses great potential clin-
ical relevance. To develop and test predictive hypotheses
about treatment will make the study of defense very relevant to daily clinical work, and both scientifically promising and exciting
(Perry & Harris, 2004, p. 190).

The meaningful assessment of defense mechanisms largely has eluded clinical researchers, but instruments like the DMRS show promise of making key elements of psychoanalytic theory accessible to empirical validation (Perry, Beck, Constantinides, & Foley, 2009). However, this approach does have two drawbacks: The practitioner needs specialized training to identify defense mechanisms, and the process of collecting relevant information from patients is very time-consuming.

8.1.3: Type Theories of Personality

The earliest personality theories attempted to sort individuals into discrete categories or types. For example, the Greek physician Hippocrates (ca. 460–377 B.C.) proposed a humoral theory with four personality types (sanguine, choleric, melancholic, and phlegmatic) that was too simplistic to be useful. In the 1940s, Sheldon and Stevens (1942) proposed a type theory based on the relationship between body build and temperament. Their approach stimulated a flurry of research and then faded into obscurity. Nonetheless, typological theories have continued to capture intermittent interest among personality researchers. We will illustrate type theories by reviewing contemporary research on coronary-prone personality types.

Type A Coronary-Prone Behavior Pattern
Friedman and Rosenman (1974) investigated the psychological variables that put individuals at higher risk of coronary heart disease. They were the first to identify a Type A coronary-prone behavior pattern, which they described as “an action–emotion complex that can be observed in any person who is aggressively involved in a chronic, incessant struggle to achieve more and more in less and less time, and if required to do so, against the opposing efforts of other things or persons” (Friedman & Rosenman, 1974). At the opposite extreme is the Type B behavior pattern, characterized by an easygoing, noncompetitive, relaxed lifestyle. Of course, people vary along a continuum from “pure” Type A to “pure” Type B.

Friedman and Ulmer (1984) have provided a detailed description of the full-fledged Type A behavior pattern, and it is not an appealing picture. These individuals display a deep insecurity, regardless of their achievements. They desire to dominate others, and typically are indifferent to the feelings of competitors. They exhibit a free-floating hostility, and easily find things that irritate them. They also suffer from a sense of urgency about getting things done. Type A persons often engage in multitasking, such as reviewing correspondence while making a phone call. Almost beyond belief, one patient confessed to using two electric shavers, one for each hand (Friedman & Ulmer, 1984).

In other studies, researchers have found only a weak relationship, or no relationship at all, between Type A behavior and CHD (e.g., Eaker & Castelli, 1988; Smedslund & Rundmo, 1999). In the most comprehensive review of its kind, Myrtek (2007) conducted a meta-analysis of 25 prospective studies of Type A behavior and CHD and concluded flatly that “Type A behavior is not an independent risk factor for CHD.” Effect sizes in this review were not just small; they were effectively zero, on the order of .003. It did not matter whether structured interviews or questionnaires were used to assess Type A behavior. Myrtek (2007) also warns that the existence of the concept itself can be dangerous because it provides patients an “external causal attribution” and relieves them of the responsibility for behavior change. The Type A concept also gives false benefit to physicians when they work with CHD patients who lack the usual risk factors (smoking, poor diet, lack of exercise). Blaming Type A behavior is easier than admitting that the causes of CHD sometimes are unknown.

Other researchers have found that CHD is linked not so much with the full-blown Type A behavior pattern as with specific components such as being anger-prone (Dembroski, MacDougall, Williams, & Haney, 1985) or possessing time urgency (Wright, 1988). Wielgosz and Nolan (2000) identified hostility, cynicism, and suppression of anger, as well as stress, depression, and social isolation as significant risk factors in Type A behavior. Certainly there continues to be a need to sort out the specific risk factors in this area of investigation. What we do know with certainty is that the simple equation that Type A behavior causes CHD no longer is convincing.

Type A behavior can be diagnosed from a short interview consisting of questions about habits of working, talking, eating, reading, and thinking (Friedman, 1996). The more flagrant cases of Type A behavior can also be detected by paper-and-pencil tests (Jackson & Gray, 1987). However, the questionnaire approach is limited because it cannot reveal the facial, vocal, and psychomotor indices of hostility and time urgency that are usually evident in interview (Friedman & Ulmer, 1984).

Early studies indicated that persons who exhibited the Type A behavior pattern were at greatly increased risk of coronary disease and heart attack. In one 9-year study of more than 3,000 healthy men, persons with the Type A behavior pattern were 2½ times more likely to suffer heart attacks than those with Type B behavior pattern (Friedman & Ulmer, 1984). In fact, not one of the “pure” Type B’s (the extremely relaxed, easygoing, and noncompetitive members of the study) had suffered a heart attack. In the famous Framingham longitudinal study, Type A men ages 55 to 64 were about twice as likely at 10-year follow-up to
develop coronary heart disease as Type B men (Haynes, Feinleib, & Eaker, 1983). In this study, the link between Type A behavior and coronary heart disease (CHD) was especially strong for white-collar workers.

8.1.4: Phenomenological Theories of Personality

Phenomenological theories of personality emphasize the importance of immediate, personal, subjective experience as a determinant of behavior. Some of the theoretical positions subsumed under this title have been given other labels also, such as humanistic theories, existential theories, construct theories, self-theories, and fulfillment theories (Maddi, 2000). Nonetheless, these approaches share a common focus on the person’s subjective experience, personal world view, and self-concept as the major wellsprings of behavior.

Origins of the Phenomenological Approach
The orientation briefly reviewed in this section has numerous sources that reach back to turn-of-the-twentieth-century European philosophy and literature. Nonetheless, two persons, one a philosopher and the other a writer, stand out as seminal contributors to the modern phenomenological viewpoint. The German philosopher Edmund Husserl (1859–1938) invented a complex philosophy of phenomenology that was concerned with the description of pure mental phenomena. Husserl’s approach was heavily introspective and nearly inscrutable. More approachable was the Danish writer Søren Kierkegaard (1813–1855), well known for his contributions to existentialism. Existentialism is the literary and philosophical movement concerned with the meaning of life and an individual’s freedom to choose personal goals. The phenomenology of Husserl and the existentialism of Kierkegaard influenced dozens of prominent philosophers and psychologists. Vestiges of these early viewpoints are evident in virtually every contemporary phenomenological personality theory (Maddi, 2000).

Carl Rogers, Self-Theory, and the Q-Technique
The most influential phenomenological theorist was Carl Rogers (1902–1987). His contributions to personality theory, known as self-theory, are extensive and generally well appreciated by students of psychology (Rogers, 1951, 1961, 1980). But it is also true, albeit little recognized, that Rogers helped shape a small part of psychological testing by popularizing the Q-technique.

The Q-technique is a procedure for studying changes in the self-concept, a key element in Rogers’s self-theory. The technique was developed by Stephenson (1953) but a series of studies by Rogers and his colleagues served to popularize this measurement approach (Rogers & Dymond, 1954). Also known as a Q-sort, the Q-technique is a generalized procedure that is especially useful for studying changes in self-concept.[1] The Q-sort consists of a large number of cards, each containing a printed statement such as the following:

I am poised
I put on a false front
I make strong demands on myself
I am a submissive person
I am likeable

The examinee is asked to sort a hundred or so statements into nine piles, putting a prescribed number of cards into each, thus forcing a near-normal distribution. The instructions specify that the examinee put the cards most descriptive of him or her at one end, those least descriptive at the opposite end, and those about which he or she is indifferent or undecided around the middle of the distribution. The required distribution might look like this:

               Least Like Me                      Most Like Me
Pile No.         1    2    3    4    5    6    7    8    9
No. of cards     1    4   11   21   26   21   11    4    1

The nature of the items is determined by the needs of the researcher or practitioner. Rogers used a set of items devised by Butler and Haigh (Rogers & Dymond, 1954, chap. 4) to tap the self-concept. These statements were taken at random from available therapeutic protocols; their Q-sort items represented actual client statements, reworded for clarity. But a special virtue of the Q-technique is that other researchers or practitioners are free to craft their own items. For example, Marks and Seeman (1963) used a psychodynamic perspective in devising items for the therapist description of patient groups. Examples of their items include the following:

Utilizes acting out as a defense mechanism
Tends to be flippant in both word and gesture
Genotype has paranoid features
Appears to be poised, self-assured, socially at ease
Exhibits depression (manifest sad mood)

[1] The Q-technique has additional applications as well. Marks and Seeman (1963) employed Q-sorts by therapists to describe patients with specific MMPI profiles. Bem and Funder (1978) recommend a Q-sort to derive a profile of characteristics associated with successful performance of a specific task. Persons whose self-descriptions match the derived profile can be predicted to succeed at the selected task.

Scoring a Q-sort is usually a matter of comparing or correlating the distribution of items against an established norm. For example, well-adjusted persons might be asked to sort the items so as to derive an average pile placement
number (ranging from 1 to 9) for each item. An individual examinee would be considered more- or less-adjusted according to the resemblance between his or her sortings and the average sorting for adjusted persons. We will refer the reader to Block (1961, 2008) for details.

make cautious reference to cognitions in explaining what it is, specifically, that a person learns. A social learning theorist might argue that we learn expectations or rules about the environment, not just stimulus and response connections.

Another way to use the Q-sort is to compare an exami-
nee’s self-sort with his or her ideal sort. Rogers used the
discrepancy between these two sortings as an index of
adjustment. His subjects were required to sort the items
twice, according to the following instructions:

1. SELF-SORT. Sort these cards to describe yourself as
you see yourself today, from those that are least like
you to those that are most like you.
2. IDEAL SORT. Now sort these cards to describe your
ideal person—the person you would most like within
yourself to be (Rogers & Dymond, 1954).
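
The discrepancy between a self-sort and an ideal sort is commonly quantified as a Pearson correlation over the pile numbers assigned to each item. The sketch below is a minimal illustration, not Rogers's actual procedure: the function name and the eight-item example sorts are hypothetical, and a real Q-sort uses many more cards and a prescribed distribution of cards across the nine piles.

```python
from math import sqrt
from statistics import mean


def q_sort_correlation(self_sort, ideal_sort):
    """Pearson correlation between two sorts, each a list of pile numbers (1 to 9) per item."""
    if len(self_sort) != len(ideal_sort):
        raise ValueError("both sorts must cover the same items")
    mean_self, mean_ideal = mean(self_sort), mean(ideal_sort)
    # Sum of cross-products and sums of squares around each sort's mean.
    cov = sum((s - mean_self) * (i - mean_ideal) for s, i in zip(self_sort, ideal_sort))
    ss_self = sum((s - mean_self) ** 2 for s in self_sort)
    ss_ideal = sum((i - mean_ideal) ** 2 for i in ideal_sort)
    return cov / sqrt(ss_self * ss_ideal)


# Hypothetical pile placements for eight cards (illustrative data only).
self_sort = [1, 3, 5, 5, 7, 9, 4, 6]
ideal_sort = [2, 3, 5, 4, 8, 9, 5, 6]
r = q_sort_correlation(self_sort, ideal_sort)
```

Identical sorts give a correlation of 1.0 and exactly reversed sorts give –1.0, so values near 1.0 indicate close agreement between the self-description and the ideal.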

Using the item pile numbers, Rogers then correlated
the two sorts for each subject separately. Consider what
these data mean: If the self-sort and the ideal sort are highly
similar, the correlation of Q-sort data will approach 1.0; if
the two sorts are opposite one another, the correlation will
approach –1.0. Of course, most sorts will be somewhere in
between but typically on the positive side. Butler and
Haigh found that psychotherapy clients increased their congruence between self and ideal (Rogers & Dymond, 1954, chap. 4). Even so, adjusted control subjects possessed a greater congruence.

8.1.5: Behavioral and Social Learning Theories

Behavioral and social learning theories have their origins in laboratory studies on operant learning and classical conditioning. A fundamental assumption of all behavioral theorists is that many of the behaviors that make up personality are learned. To understand personality, then, we must know about the learning history of the individual. Behavioral theorists also believe that the environment is of supreme importance in shaping and maintaining behavior. Behavioral inquiry, therefore, seeks to identify the specific components of the current environment that are controlling a person’s behavior. The behavioral approach to personality has produced a variety of direct assessment methods, which we discuss in the next chapter.

Behavioral theorists disagree mainly on the role that cognitions play in determining behavior. Cognitions are inferred mental processes such as problem solving, judging, or reasoning. Radical behaviorists believe that resorting to mentalistic explanations of any kind is futile: “When what a person does is attributed to what is going on inside him, investigation is brought to an end” (Skinner, 1974). By contrast, social learning theorists

Based on his social learning views, Rotter (1966) developed the Internal-External (I-E) Scale, an interesting measure of internal versus external locus of control. The construct of locus of control refers to the perceptions that individuals have about the source of things that happen to them. In particular, the I-E Scale seeks to assess the examinee’s generalized expectancies for internal versus external control of reinforcement. The purpose of the I-E Scale is to determine the extent to which the examinee believes that reinforcement is contingent upon his or her behavior (internal locus of control) as opposed to the outside world (external locus of control). The instrument is a forced-choice self-report inventory. For each item, the examinee chooses the single statement (from a pair) with which he or she more strongly concurs. Items resemble the following:

In general, most people get the respect they deserve.
OR
In reality, a person’s worth often passes unrecognized.

For the preceding item, the first alternative indicates an internal locus of control, whereas the second alternative signifies an external locus of control. The balance of internal to external responses determines the overall score on the scale. The I-E Scale is a reliable and valid instrument that has stimulated a huge body of research on the nature and meaning of locus of control and related variables. Research indicates that locus of control has a strong relationship to occupational success, physical health, academic achievement, and numerous other variables. As the reader

might suspect, an internal locus of control generally predicts a more positive outcome than an external locus of control. The interested reader can consult Lefcourt (1991) for further details.

Important contributions to social learning theory have also been made by Albert Bandura. In his early studies, Bandura examined the role of observational learning and vicarious reinforcement in the development of behavior (Bandura, 1965, 1971; Bandura & Walters, 1963). More recently, he has proposed that perceived self-efficacy is a central mechanism in human action (Bandura, 1982; Bandura, Taylor, Ewart, Miller, & DeBusk, 1985). Self-efficacy is a personal judgment of “how well one can execute courses of action required to deal with prospective situations” (Bandura, 1982). The concept of self-efficacy is useful in explaining why correct knowledge does not necessarily predict efficient action. For example, two boys may be equally convinced that a garden snake in the bathtub presents no hazard, but one will pick it up while the other runs out the door. These differences in behavior illustrate the role of self-referential thought as a mediator between knowledge and action. The boy who ran out the door did not believe he could deal with the situation effectively. He had little perceived self-efficacy for snake handling. Bandura would argue that the primary determinant of the boy’s behavior is a self-judgment about personal capabilities. Cognitions are, therefore, assumed to be a major determinant of behavior.

Bandura (1997, 2006) has developed an appealing approach to the assessment of self-efficacy expectations outlined below. But he warns against the idea that there can be one all-purpose measure of perceived self-efficacy:

One cannot be all things, which would require mastery of every realm of human life. People differ in the areas in which they cultivate their efficacy and in the levels to which they develop it even within their given pursuits. For example, a business executive may have a high sense of organizational efficacy but low parenting efficacy. Thus, the efficacy belief system is not a global trait but a differentiated set of self-beliefs linked to distinct realms of functioning
(Bandura, 2006, p. 307).

As a consequence, scales of self-efficacy need to be adapted to the particular domain of functioning of interest to the practitioner or researcher.

Fortunately, Bandura (2006) has outlined a strategy for developing self-efficacy scales. The starting point is a simple rating format, which resembles the following hypothetical example of a scale that school administrators might use with teachers to gauge classroom self-efficacy:

Classroom Questionnaire: We are interested in the areas of challenge that teachers face in the classroom. Your answers will not be disclosed to others. Please rate your degree of confidence for doing the things below using this scale:

0     10     20     30     40     50     60     70     80     90     100
Can’t Do          Mildly Uncertain          Moderately Certain          Completely Certain

                                                        Confidence (0 to 100)
Maintain control of the classroom when lecturing        ________
Keep students on track during hard assignments          ________
Deal with individuals who keep talking out of turn      ________
Teach students who don’t want to be in class            ________
Teach students who have no parental support             ________
Motivate students who resist doing homework             ________
Keep the brightest students interested in class         ________

This is only a preliminary and generic example. A complete scale would be longer and would undergo a few iterative cycles of revision before the final draft. In a recent and helpful chapter, Bandura (2006) also gives advice on how to construct the best self-efficacy scales, starting with issues of content validity, response bias, item analysis, and ending with strategies for validation of scales. Yet, regardless of their psychometric excellence, self-efficacy scales need to be practical. They should be judged by the extent to which, ultimately, they enable people to fulfill desired personal and social transformations (Bandura, 2006).

8.1.6: Trait Conceptions of Personality

A trait is any “relatively enduring way in which one individual differs from another” (Guilford, 1959). Psychologists developed the concept of trait from the ways people describe other people in everyday life. As language evolved, people found words to portray the consistencies and differences they encountered in their daily interactions with others. Thus, when we say one person is sociable and another is shy we are using trait names to describe consistencies within individuals and also differences between them (Goldberg, 1981a; Fiske, 1986).

Trait conceptions of personality have been enormously popular throughout the history of psychological testing, so the coverage here is necessarily selective. We will review two prominent and influential positions from the dozens

of trait theories that have been proposed. These approaches differ primarily in terms of whether traits are split off into finely discriminable variants or grouped together into a small number of broad dimensions:

1. Cattell’s factor-analytic viewpoint identifies 16 to 20 bipolar trait dimensions.
2. Eysenck’s trait-dimensional approach coalesces dozens of traits into two overriding dimensions.
3. Goldberg and others have sought a modern synthesis of all trait approaches by proposing a five-factor model of personality.

For readers who desire a more detailed discussion of this topic, Pervin (1993) and Wiggins (1997) provide an excellent review of trait approaches to personality theory.

Cattell’s Factor-Analytic Trait Theory  Cattell (1950, 1973) refined existing methods of factor analysis to help reveal the basic traits of personality. He referred to the more obvious aspects of personality as surface traits. These would typically emerge in the first stages of factor analysis when individual test items were correlated with each other. For example, true–false items such as “I enjoy a good prize fight,” “Getting stuck behind a slow driver really bothers me,” and “It’s important to let people know who is in charge” might be answered similarly by subjects, revealing a surface trait of aggressiveness.

But surface traits themselves tended to come in clusters, as revealed by Cattell’s more sophisticated application of factor analysis. For Cattell, this was evidence of the existence of source traits, the stable and constant sources of behavior. Source traits are, therefore, less visible than surface traits but are more important in accounting for behavior.

Cattell (1950) was unrivaled in his use of factor analysis to discover how traits were organized and how they were related to each other. One approach was to have persons rate others they knew well by checking various adjectives such as aggressive, thoughtful, and dominating from a list of 171 choices. When the results from 208 subjects were subsequently factor analyzed, about 20 underlying personality factors or traits were tentatively identified. Another approach was to have thousands of persons answer questions about themselves and then factor-analyze their responses. Sixteen of the original 20 personality traits were independently confirmed by this second approach (Cattell, 1973). These 16 source traits have been incorporated into the Sixteen Personality Factor Questionnaire (16PF), a trait-based paper-and-pencil test of personality that is discussed in the next chapter.

The Five-Factor Model of Personality  The five-factor model of personality has its origins in a review chapter by Goldberg (1981b). In his analysis of factor-analytic trait research, Goldberg identified several consistencies, which he referred to as the “Big Five” dimensions. Although researchers have used slightly different terms for these factors, the most common labels are:

• Neuroticism
• Extraversion
• Openness to Experience
• Agreeableness
• Conscientiousness

Rearranging the factors yields a simple acronym: OCEAN. The five-factor model is rapidly becoming the consensus model of personality. Support for the five-factor approach comes from several sources, including factor analysis of trait terms in language and the analysis of personality from an evolutionary perspective. We discuss these perspectives in the following.

The use of trait terms in the analysis of personality is based upon the fundamental lexical hypothesis. The essential point of this hypothesis is that trait terms have survived in language because they convey important information about our dealings with others:

The variety of individual differences is nearly boundless, yet most of these differences are insignificant in people’s daily interactions with others and have remained largely unnoticed. Sir Francis Galton may have been among the first scientists to recognize explicitly the fundamental lexical hypothesis—namely that the most important individual differences in human transactions will come to be encoded as single terms in some or all of the world’s languages.
(Goldberg, 1990)

When trait terms in English are distilled down to a reasonably distinct and nonoverlapping set of adjectives, a few hundred characteristics typically emerge (Allport, 1937). For decades, researchers have been asking individuals to rate themselves or others on these or similar traits. When these ratings are subjected to factor analysis, the “Big Five” dimensions previously listed usually appear in one guise or another. In sum, a mounting body of research indicates that the five-factor model captures a valid and useful representation of the structure of human traits.

The five-factor approach also possesses evolutionary plausibility. Specifically, the five factors of personality previously listed capture individual differences that relate to such basic evolutionary functions as survival and reproductive success (Buss, 1997; Pervin, 1993). Goldberg (1981b) has theorized that people implicitly ask the following questions in their interactions with others:

1. Is X active and dominant or passive and submissive? (Can I bully X or will X try to bully me?)
2. Is X agreeable (warm and pleasant) or disagreeable (cold and distant)?

3. Can I count on X? (Is X responsible and conscientious or undependable and negligent?)
4. Is X crazy (unpredictable) or sane (stable)?
5. Is X smart or dumb? (How easy will it be for me to teach X?)

Directly or indirectly, each of these evaluations has a bearing on survival and reproductive success. For example, point 3 (conscientiousness) involves a trait that might ensure group survival in a hostile world. A person low on this trait (undependable) would be a poor choice for guarding the food supply. The ability to discern conscientiousness in others therefore has adaptive value. Not surprisingly, the five points previously listed correspond to the five-factor personality model.

The five-factor model of personality has inspired several personality scales and other systems for assessment (deRaad & Perugini, 2002). For example, Costa and McCrae have developed two personality tests based on the five-factor model (Costa, 1991; McCrae & Costa, 1987). The Revised NEO Personality Inventory (NEO-PI-R) contains 240 items rated on a five-point scale. In addition to the five major domains of personality, the inventory measures six specific traits (called facets) within each domain. A shortened 60-item version known as the NEO Five-Factor Inventory (NEO-FFI) also is available. Trull, Widiger, Useda, and others (1998) have published a semistructured interview for the assessment of the five-factor model of personality. These tests are discussed in the next chapter.

Comment on the Trait Concept  All trait approaches to personality share certain problems in common. First, there is disagreement whether traits cause behavior or merely describe behavior (Fiske, 1986). It can be persuasively argued that invoking traits as causes is an empty form of circular reasoning. For example, a person with extremely high standards might be said to possess the trait of perfectionism. But when asked to explain what is meant by perfectionism, we invariably end up referring to a pattern of extremely high standards. Thus, when we assert that someone is perfectionistic, are we really doing anything more than providing a short-hand description of their past behavior? Miller (1991) has voiced this criticism of the five-factor approach, noting that the model merely describes psychopathology but does not explain it.

A second problem with traits is their apparently low predictive validity. Mischel (1968) is credited with the first effective disparagement of the trait concept in his influential book Personality and Assessment. He stated that “while trait theory predicts behavioral consistency, it is behavior inconsistency that is typically observed” (Mischel, 1968). In a wide-ranging review of existing research, Mischel noted that trait scales produced validity coefficients with an upper limit of r = .30. He coined the term personality coefficient to describe these low correlations. Undoubtedly significant for large samples of subjects, correlations of r = .30 are of minimal value in the prediction of individual behavior.

Trait researchers responded to Mischel’s attack by refining and limiting the trait concept. Researchers sought to identify subgroups of persons whose behavior could be accurately predicted on the basis of trait scores and also attempted to distinguish the kinds of situations in which behavior is largely determined by traits (e.g., Mischel, Shoda, & Mendoza-Denton, 2002; Wasylkiw & Fekken, 2002). These efforts met with modest success, raising the validity of some trait questionnaires (in some contexts with some persons) substantially beyond the ominous r = .30 barrier posited by Mischel (1968). But gone forever are the days of simplistic, generalized assertions such as “trait X predicts behavior Y.”

8.1.7: The Projective Hypothesis

Frank (1939, 1948) introduced the term projective method to describe a category of tests for studying personality with unstructured stimuli. In a projective test the examinee encounters vague, ambiguous stimuli and responds with his or her own constructions. Disciples of projective testing are heavily vested in psychoanalytic theory and its postulation of unconscious aspects of personality. These examiners believe that unstructured, vague, ambiguous stimuli provide the ideal circumstance for revelations about inner aspects of personality. The central assumption of projective testing is that responses to the test represent projections from the innermost unconscious mental processes of the examinee. We introduce this topic with some preliminary concepts and distinctions relevant to projective testing.

The assumption that personal interpretations of ambiguous stimuli must necessarily reflect the unconscious needs, motives, and conflicts of the examinee is known as the projective hypothesis. Frank (1939) is generally credited with popularizing the projective hypothesis:

When we scrutinize the actual procedures that may be called projective methods we find a wide variety of techniques and materials being employed for the same general purpose, to obtain from the subject, “what he cannot or will not say,” frequently because he does not know himself and is not aware what he is revealing about himself through his projections.

The challenge of projective testing is to decipher underlying personality processes (needs, motives, and conflicts) based on the individualized, unique, subjective responses of each examinee. In the sections that follow we will examine how well projective tests have met this portentous assignment.

A Classification of Projective Techniques  Lindzey (1959) has offered a classification of projective techniques that we will follow here. Based on the response required, he divided projectives into five categories:

• Association to inkblots or words
• Construction of stories or sequences
• Completions of sentences or stories
• Arrangement/selection of pictures or verbal choices
• Expression with drawings or play

Association techniques include the widely used Rorschach inkblot test and its psychometrically superior cousin the Holtzman Inkblot Technique, as well as word association tests. Construction techniques include the Thematic Apperception Test and the many variations upon this early instrument. Completion techniques consist mainly of sentence completion tests, discussed later. Arrangement/selection procedures such as the Szondi test (discussed in the first chapter) are currently seldom used. Finally, expression techniques such as the Draw-A-Person or House-Tree-Person test are very popular among clinicians in spite of dubious validity data.

We will review prominent techniques within each category except the antiquated arrangement/selection approaches, which are almost never used. However, the literature on major projective techniques is simply overwhelming, running to perhaps tens of thousands of articles on the Rorschach alone. We can suggest major trends in the research, but the reader will need to consult other sources for comprehensive reviews.

8.1.8: Association Techniques

The Rorschach  The Rorschach consists of 10 inkblots devised by Hermann Rorschach (1884–1922) in the early 1900s. He formed the inkblots by dribbling ink on a sheet of paper and folding the paper in half, producing relatively symmetrical bilateral designs. Five of the inkblots are black or shades of gray, while five contain color; each is displayed on a white background. An inkblot of the type employed by Rorschach is shown in Figure 8.1. The Rorschach is suited to persons age 5 and up but is most commonly used with adults.

Figure 8.1 An Inkblot Similar to Those Found on the Rorschach

Regrettably, Rorschach died before he could complete his scoring methods, so the systematization of Rorschach scoring was left to his followers. Five American psychologists produced overlapping but independent approaches to the test: Samuel Beck, Marguerite Hertz, Bruno Klopfer, Zygmunt Piotrowski, and David Rapaport (Erdberg, 1985). Predictably, the nuances of scoring varied from one scoring method to another. Beginning in the 1970s, John Exner and his colleagues began to codify and synthesize the scoring approaches into a single method known as the Rorschach Comprehensive System (Exner, 1991, 1993; Exner & Weiner, 1994). The Comprehensive System (CS) supplanted all previous methods and became the preferred scoring system because it was more clearly grounded in empirical research. Even so, reservations about the Rorschach in general and the CS in particular persisted in the trade (Lilienfeld, Wood, & Garb, 2000, 2001).

Beginning in about 2010, a new system for administration, scoring, and interpretation of the Rorschach emerged as the clear choice for practitioners. The Rorschach Performance Assessment System (R-PAS) represents an extension and improvement of the CS (Meyer, Viglione, Mihura, Erard, & Erdberg, 2011). Erard (2012) provides a succinct summary of its appeal:

Despite its recent formal introduction to the professional assessment community, R-PAS takes advantage of decades of research in peer reviewed publications (including the insights of Rorschach critics) and builds on established validity and general acceptance for most of its procedures and features (p. 122).

In using the R-PAS, the examiner first establishes rapport and then sits to the side of the client or patient to minimize body language communication. For each card, the examiner asks the respondent to look at the stimulus and to answer “What might this be?” Before the test, the examiner asks for “two, maybe three responses” per card. During the test, if only one reply is given, the examiner prompts for additional response(s), and pulls the card after four responses are provided. This is called response optimization, which elicits a typical range of 18 to 28 responses. This technique greatly reduces short and long records (protocols with upwards of 100 responses have been encountered), which affords a better fit with norms.
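
The prompt and pull rules of response optimization can be read as a small decision procedure. The sketch below is one hedged reading of that description, not the R-PAS manual: the function name and the returned labels are hypothetical, and treating a card with no response at all like the one-reply case is an assumption, since only the one-reply case is described.

```python
def next_step(responses_given, respondent_done):
    """Return the examiner's next move for the current card (illustrative labels)."""
    if responses_given >= 4:
        # The card is withdrawn once four responses have been provided.
        return "pull card"
    if respondent_done and responses_given <= 1:
        # Only one reply (or, by assumption, none): ask for more.
        return "prompt"
    if respondent_done:
        # Two or three responses were requested before the test; move on.
        return "next card"
    # The respondent is still looking at the card.
    return "wait"
```

Capping each card at four responses and prompting after a single reply is what trims both very long and very short records.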

The R-PAS incorporates several laudable improvements (www.r-pas.org):

Once the test is administered and the responses recorded, scoring begins. This is an intricate process that requires significant training. We can only refer to highlights here. Responses are scored for a number of variables such as location, content, form quality, thought processes, and determinants. Determinants are different aspects of the blot such as color, shading, and form, which appear to have influenced examinee responses (Table 8.2).

Table 8.2 Summary of Major Rorschach Scoring Criteria

Location: Where on the blot was the percept located?
  W     Whole                    Entire inkblot used
  D     Common detail            Well-defined part used
  Dd    Unusual detail           Unusual part used
White Space: Was white space used in the response?
  SR    Space reversal           White space as the figure
  SI    Space integration        White space integrated in percept
Content: What is seen, and is it synthesized or vague?
  H     Human                    Percept of a whole human form
  Hd    Human detail             Human form incomplete in any way
  Ex    Explosion                An actual explosion
  Sy    Synthesis                Objects are meaningfully related
  Vg    Vagueness                Objects in the percept are vague
  2     Pair                     Two identical, mirror-image percepts
Form Quality: How well does the percept fit the blot?
  o     Ordinary                 Obvious and easily seen
  u     Unusual                  Unusual but still a good fit
  −     Minus                    Distorted and unrealistic percept
  P     Popular                  Designated high frequency percept
Determinants: What feature of the blot determined the response?
  M     Movement                 Movement seen or implied in percept
  C     Color                    Color helped determine the response
  F     Form                     Form a major determinant of percept
  T     Texture                  Shading involved in the response
Thought Processes: Are there issues with thought processes or themes?
  DV1   Deviant Verbalization-1  Odd or unusual verbalization
  DV2   Deviant Verbalization-2  Clearly bizarre verbalization
  MOR   Morbid                   Response has a clearly dysphoric tone

NOTE: This list is incomplete and illustrative only. The full scoring system is complex and allows for blends. For example, the determinant FC means that both form and color were used to determine the percept, but form was more important than color.
Source: Based on Exner (1993) and Meyer et al. (2011).

Interrater reliability of R-PAS scores is excellent. Using a diverse sample of 50 Rorschach records randomly selected from ongoing research, the median intraclass correlation coefficient (an index of agreement between raters) for 60 variables was .92 (Viglione, Blume-Marcovici, Miller, Giromini, & Meyer, 2012). Another useful feature of this new approach to Rorschach scoring is the availability of an international reference sample for standardization of scoring variables. This sample of 1,396 protocols was obtained from 15 nations, including Australia, Brazil, Japan, Israel, and Spain, just to give a sense of the global distribution.

The validity of the Rorschach as scored with the R-PAS (or any other scoring system) is difficult to summarize in any simple manner. Individual studies indicate good validity for some purposes, but limited validity for other applications. For example, with the R-PAS, Complexity scores were correlated with functional capacity (r = .30) and social skills capacity (r = .34) in a sample of 72 middle-aged and older outpatients with schizophrenia (Moore, Viglione, Rosenfarb, Patterson, & Mausbach, 2012). Psychological complexity, as measured by the Complexity score, assesses the mental effort, intricacy, and integration evident in responses, with higher scores indicating better coping skills. Thus, it makes theoretical and empirical sense that psychological complexity would show positive correlations with functional and social capacities. These results support the validity of this Rorschach variable.

Once the entire protocol has been coded, the examiner computes a number of summary scores that form the primary basis for hypothesizing about the personality of the examinee. For example, the F+ percent is the proportion of the total responses that uses pure form as a determinant. A voluminous literature exists on the meaning of this index, but it seems safe to hypothesize that when the F+ percentage falls below 70 percent, the examiner should consider the possibility of severe psychopathology, brain impairment, or intellectual deficit in the examinee (Exner, 1993). The F+ percent is also considered to be an index of ego strength, with higher scores indicating a greater capacity to deal effectively with stress. Meyer and Eblin (2012) discuss R-PAS variables and composites.

Frank (1990) has emphasized that formal scoring of the Rorschach is insufficient for some purposes such as the

diagnosis of schizophrenia. He stresses that an analysis of of individuals with schizophrenia or other serious mental
the patient’s thinking for the presence of highly personal, illness.
illogical, and bizarre associations to the blots is essential
for psychodiagnosis. In his approach, the Rorschach is
really an adjunct to the interview, and not a test per se.
Bornstein and Masling (2005) have reminded us that
neither the CS nor the R-PAS should be confused with
being “the Rorschach.” After all, there are many other
helpful and validated approaches to scoring the test. Their
book, Scoring the Rorschach: Seven Validated Systems (2005),
is a wonderful compendium of alternative scoring systems
that can be used to answer specialized assessment ques-
tions. A case in point is the Rorschach Prognostic Rating
Scale (RPRS; Handler & Clemence, 2005), a promising and
validated system for predicting who will be successful in
psychotherapy and who will not. Scoring the RPRS is com-
plex and consists of assigning or subtracting points for
various categories of clearly defined responses. For exam-
ple, a positive score is given if a response depicts a human
as dancing, running, talking, or pointing, whereas a score
of zero is coded if humans are seen as sleeping, lying down,
sitting, or balancing. The meaningful use of color in the
response also contributes to a positive score, whereas using
color to depict explosions or diseases results in points
being subtracted. Several categories are scored, yielding a
total score that ranges from –12 to +17. The following
interpretations are then assigned to different ranges of the
RPRS score:

The TDI is calculated by scoring each response for the
severity level of thought disorder from none to extreme,
with possible scores of 0, .25, .50, .75, and 1.0. Then the
average score is computed across all responses. This num-
ber is multiplied by 100 to yield the final score on a range
from 0 to 100. Thus, an overall score of 0 would mean that
not one response revealed any thought disorder, whereas a
score of 100 would signify that, without exception, every
response was highly bizarre and disorganized.
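
The TDI arithmetic just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published TDI scoring procedure: the function name `tdi_score` and the example ratings are hypothetical, and assigning a severity level to each response, which is the difficult clinical judgment, is taken as given.

```python
# Severity levels named in the text: none (0) through extreme (1.0).
ALLOWED_LEVELS = {0.0, 0.25, 0.5, 0.75, 1.0}


def tdi_score(severities):
    """Average the per-response severity levels and multiply by 100 (range 0 to 100)."""
    if not severities:
        raise ValueError("at least one scored response is required")
    for level in severities:
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"unexpected severity level: {level}")
    return 100 * sum(severities) / len(severities)


# Hypothetical record of eight responses (illustrative data only).
ratings = [0, 0, 0.25, 0, 0.5, 0, 0, 0.25]
score = tdi_score(ratings)  # 12.5
```

A record in which every response is rated 0 scores 0, and a record in which every response is rated 1.0 scores 100, matching the two endpoints described in the text.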
The reliability of the TDI is reasonably good, with split-half correlations around .80 and interrater reliability coefficients of .90 and higher. Validity has been supported from a number of directions, such as huge improvements in scores when patients with schizophrenia are tested before and after comprehensive interventions including drug therapies (Holtzman et al., 2005). Mastering the TDI scoring criteria is far easier than learning the Comprehensive System. Insofar as the TDI provides valuable information about the extent of thought disorder, one of the foremost reasons that practitioners use the Rorschach, we can expect to see increased reliance on this approach to test scoring.

17 to 13:   The person is almost able to help himself. A very promising case that just needs a little help.
12 to 7:    Not quite so capable as the previous case to work out his problems himself but with some help is likely to do pretty well.
6 to 2:     Better than 50–50 chance; any treatment will be of some help.
1 to −2:    50–50 chance.
−3 to −6:   A difficult case that may be helped somewhat but is generally a poor treatment prospect.
−7 to −12:  A hopeless case. (Handler & Clemence, 2005, p. 54)

Meyer and Handler (1997) used meta-analysis to synthesize the results of 18 validity studies of the RPRS, comprising a total sample of 752 participants. Their results translated to a 78 percent success rate in psychotherapy for clients with high scores on the RPRS, but only a 22 percent success rate for clients with low scores on the scale. The RPRS is a promising scale that should receive wider use in clinical practice.

Space does not permit us to summarize validated scoring systems. These scales are derived largely from psychoanalytic theory and include an index of object relations, a measure of oral dependency, barrier and penetration indices based on body image, a measure of primary process
thinking, and a scale that assesses primitive psychological
Another useful scoring system for the Rorschach is the
defenses (Bornstein & Masling, 2005).
Thought Disorder Index (TDI), which assesses formal
thought disorder (Holtzman, Levy, & Johnston, 2005). Comment on the Rorschach The Rorschach has
Thought disorder exists on a continuum from mild slippage provoked more controversy in the field of assessment than
to bizarre disorganization and is especially characteristic of any other personality test or instrument. Opinions tend to
patients with schizophrenia. Thus, the assessment of be polarized, and both proponents and detractors cite
thought disorder is pivotal in the diagnosis and treatment studies and analyses to support their case. For example,
critics of the test refer to a fascinating study by Albert, Fox, and Kahn (1980) on the susceptibility of the Rorschach to faking. We remind the reader that literally thousands of Rorschach research studies have been published. In fact, a search of PsycINFO using the key title word Rorschach yielded 5,324 articles dating back to 1925 (the test was published in 1921). The majority of these studies are positive in tone. But the skeptical results reported by Albert, Fox, and Kahn (1980) are not isolated. They submitted the Rorschach protocols of 24 persons to a panel of experts, asking for psychiatric diagnoses of each examinee. The 24 Rorschach protocols consisted of results from four groups of six persons each:

• Mental hospital patients with a diagnosis of paranoid schizophrenia
• Uninformed fakers given instructions to fake the responses of a paranoid schizophrenic
• Informed fakers who listened to a detailed audiotape about paranoid schizophrenia
• Normal controls who took the test under standard instructions

The uninformed fakers, informed fakers, and normal controls were students who had passed an MMPI screening and were judged reasonably normal during interview. Each protocol was rated by six to nine judges, all fellows of the Society for Personality Assessment. The judges were told to provide a psychiatric diagnosis as well as other information not reported here. The judges were not informed as to the purpose of the study but were told to assess whether any profiles appeared to be malingered.

The informed fakers must have done an excellent job, for they were more likely to be diagnosed psychotic than the real patients themselves (72 percent versus 48 percent, respectively). The uninformed fakers were fairly convincing, too, with a 46 percent rate of diagnosed psychosis. The normal controls were diagnosed as psychotic 24 percent of the time. Granted that the diagnostic challenge in this study was immense, it is still disturbing to find that the expert judges rated 24 percent of the normal protocols as psychotic, while correctly identifying psychosis in only 48 percent of the actual psychotic protocols. A more recent study by Netter and Viglione (1994) also concluded that the Rorschach was susceptible to the faking of psychosis.

In general, critics portray the test as possessing low reliability and a general lack of predictive validity (Carlson, Kula, & St. Laurent, 1997; Wood, Nezworski, & Stejskal, 1996; Lilienfeld, Wood, & Garb, 2000). In their meta-analytic review, Garb, Florio, and Grove (1998) concluded that the Rorschach explained a dismal 8 to 13 percent of the variance in client characteristics, as compared to the MMPI, which explained 23 to 30 percent of the variance.

Supporters of the test cite improvements in scoring offered by the R-PAS approach and are more optimistic in their outlook (Meyer & Eblin, 2012). A recent study by McGrath, Pogge, Stokes, and others (2005) found that the Rorschach could be scored with respectable reliability, even in the less controlled conditions typical of real-world testing. This was an important finding because virtually all prior studies of reliability have been conducted in research settings. In response to the ongoing controversy, the prestigious Society for Personality Assessment requisitioned external reviews by an independent panel of “blue ribbon” experts, who concluded that the Rorschach possesses reliability and validity similar to other accepted tests like the MMPI-2. The trustees of the society assert that the continued use of the Rorschach, therefore, is appropriate and justified (Board of Trustees for the Society for Personality Assessment, 2005).

The controversy over the Rorschach probably will subside for a while, but it is not likely to disappear entirely. Even if the test continues to prevail because of studies supporting the reliability of scoring and the validity of inferences, there are other concerns seldom mentioned by skeptics. One liability is that learning the scoring system is an arduous and time-consuming task that requires dozens of hours of practice and years of supervised experience. Some doctoral programs offer an entire course (or two) on the Rorschach, and this is just the beginning of the training needed. A second problem is that administering and scoring the Rorschach requires a few hours of professional time from a licensed psychologist. This time is a precious and expensive commodity. Someone has to pay for it. These practical issues are daunting. In regard to learning the test in the first place, and devoting the time to administer and score it in the second place, many clinical training directors and practitioners (and not a few insurance companies) are asking “Is it worth it?”

8.1.9: Completion Techniques
Sentence Completion Tests  In a sentence completion test, the respondent is presented with a series of stems consisting of the first few words of a sentence, and the task is to provide an ending. As with any projective technique, the examiner assumes that the completed sentences reflect the underlying motivations, attitudes, conflicts, and fears of the respondent. Usually, sentence completion tests can be interpreted in two different ways: subjective-intuitive analysis of the underlying motivations projected in the subject’s responses, or objective analysis by means of scores assigned to each completed sentence.

An example of a sentence completion test is shown in Figure 8.2. This test is quite similar to existing instruments in that the stems are very short and restricted to a small number of basic themes. The reader will notice that three
topics recur in this short test (the respondent’s self-concept, mother, and father). In this manner the examinee has multiple opportunities to reveal underlying motivations about each topic. Of course, most sentence completion tests are much longer (anywhere from 40 to 100 stems) and contain more themes (anywhere from 4 to 15 topics).

Figure 8.2 Example of a Short Sentence Completion Test

Directions: Finish these sentences to indicate how you feel.
1. My best characteristic is ________
2. My mother ________
3. My father ________
4. My greatest fear is ________
5. The best thing about my mother was ________
6. The best thing about my father was ________
7. I am proudest about ________
8. I only wish my mother had ________
9. I only wish my father had ________

Dozens of sentence completion tests have been developed; most are unpublished and unstandardized instruments produced to meet a specific clinical need. Some representative sentence completion tests in current use are outlined in Table 8.3. Of these instruments, Loevinger’s Washington University Sentence Completion Test is the most sophisticated and theory-bound (e.g., Weiss, Zilberg, & Genevro, 1989). However, the Rotter Incomplete Sentences Blank has the strongest empirical underpinnings and is the most widely used in clinical settings. We examine this instrument in more detail.

Table 8.3 Brief Outline of Representative Sentence Completion Tests

Rotter Incomplete Sentences Blank  The Rotter Incomplete Sentences Blank (RISB) consists of three similar forms (high school, college, and adult), each containing 40 sentence stems written mostly in the first person (Rotter & Rafferty, 1950; Rotter, Lah, & Rafferty, 1992). Although the test can be subjectively interpreted in the usual manner through qualitative analysis of needs projected in the subject’s responses, it is the objective and quantitative scoring of the RISB that has drawn the most attention.

In the objective scoring system each completed sentence receives an adjustment score from 0 (good adjustment) to 6 (very poor adjustment). These scores are based initially on the categorizing of each response as follows:

Different Response Categories, Their Types and Examples

Conflict responses are scored 4, 5, or 6, from lowest to highest degree of the conflict expressed. Positive responses are scored 2, 1, or 0, from least to most positive response. Neutral responses and omissions receive no score. The manual gives examples of each scoring category. The overall adjustment score is obtained by adding the weighted ratings in the conflict and positive categories. The adjustment score can vary from 0 to 240, with higher scores indicating greater maladjustment.
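
The arithmetic behind the overall adjustment score can be sketched in a few lines of Python. This is only an illustration of the rule as summarized in this passage, not the procedure in the RISB manual: the function name risb_adjustment_score, the use of None to mark neutral responses and omissions, and the example protocol are all assumptions made for the sketch.

```python
from typing import Optional, Sequence

# Ratings allowed by the scheme summarized in the text:
# positive responses 0-2, conflict responses 4-6.
VALID_RATINGS = {0, 1, 2, 4, 5, 6}
N_STEMS = 40  # each RISB form contains 40 sentence stems


def risb_adjustment_score(ratings: Sequence[Optional[int]]) -> int:
    """Sum the item ratings for one protocol.

    None marks a neutral response or an omission (an assumed convention);
    such items add nothing to the total.
    """
    if len(ratings) != N_STEMS:
        raise ValueError(f"expected {N_STEMS} ratings, got {len(ratings)}")
    total = 0
    for rating in ratings:
        if rating is None:
            continue  # unscored item
        if rating not in VALID_RATINGS:
            raise ValueError(f"invalid rating: {rating!r}")
        total += rating
    return total


# Hypothetical protocol: 10 strongly conflicted items, 20 mildly positive
# items, and 10 unscored items.
example = [6] * 10 + [2] * 20 + [None] * 10
print(risb_adjustment_score(example))  # 100
```

A protocol in which all 40 items receive the maximum conflict rating of 6 yields 240, the upper bound cited above.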
Figure 8.3 A Picture Similar to Those on the Thematic Apperception Test

The reliability of the adjustment score is exceptionally good, even when derived by assistants with minimal psychological expertise. Typically, interscorer reliabilities are in the .90s and split-half coefficients are in the .80s (Rotter et al., 1992; Rotter, Rafferty, & Schachtitz, 1965). The validity of this index has been investigated in numerous studies using the RISB as a screening device with a “maladjustment” cutoff score. For example, a cutoff score of 135 has been found to correctly screen delinquent youths 60 percent of the time while identifying nondelinquent youths correctly 73 percent of the time (Fuller, Parmelee, & Carroll, 1982). The same cutoff identifies heavy drug users 80 to 100 percent of the time (Gardner, 1967). These and similar findings support the construct validity of the adjustment index but also indicate that classification rates are much lower than needed for individual decision making or effective screening. It also appears that the norms for the adjustment index are outdated. Lah and Rotter (1981) found that student scores differ significantly from those obtained in the original study by Rotter and Rafferty (1950). Lah (1989) and Rotter et al. (1992) provide new normative, scoring, and validity data for the RISB.
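
A short calculation suggests why hit rates of 60 and 73 percent fall short for screening. The Python sketch below treats the 60 percent figure as sensitivity and the 73 percent figure as specificity, and combines them with assumed base rates of delinquency; the base rates and the function name screening_outcomes are hypothetical and do not come from the studies cited.

```python
def screening_outcomes(sensitivity: float, specificity: float,
                       base_rate: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rates reported for the cutoff of 135; the base rates are illustrative only.
for base_rate in (0.10, 0.30, 0.50):
    ppv, npv = screening_outcomes(0.60, 0.73, base_rate)
    print(f"base rate {base_rate:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under the assumed 10 percent base rate, only about one in five youths scoring above the cutoff would actually belong to the delinquent group, which is consistent with the caution expressed above.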
As discussed by P. Goldberg (1965), the simplicity of the single adjustment score is both the test’s strength and weakness. True, the test provides a quick and efficient method for obtaining an overall index of how respondents are functioning on a day-to-day basis. However, a single score cannot possibly capture any nuances of personality functioning. In addition, the RISB is subject to the same types of bias as other self-report measures, namely, the information will reflect mainly what the respondent wants the examiner to know.

8.1.10: Construction Techniques
The Thematic Apperception Test (TAT)  The TAT consists of 30 pictures that portray a variety of subject matters and themes in black-and-white drawings and photographs; one card is blank. Most of the cards depict one or more persons engaged in ambiguous activities. Some cards are used for adult males (M), adult females (F), boys (B), or girls (G), or some combination (e.g., BM). As a consequence, exactly 20 cards are appropriate for every examinee.

A picture similar to those on the TAT is shown in Figure 8.3. In administering the TAT, the examiner requests the examinee to make up a dramatic story for each picture, telling what led up to the current scene, what is happening at the moment, how the characters are thinking and feeling, and what the outcome will be. The examiner writes down the story verbatim for later scoring and analysis.

The TAT was developed by Henry Murray and his colleagues at the Harvard Psychological Clinic (Morgan & Murray, 1935; Murray, 1938). The test was originally designed to assess constructs such as needs and press, elements central to Murray’s personality theory. According to Murray, needs organize perception, thought, and action and energize behavior in the direction of their satisfaction. Examples of needs include the needs for achievement, affiliation, and dominance. In contrast, press refers to the power of environmental events to influence a person. Alpha press refers to objective or “real” external forces, whereas beta press concerns the subjective or perceived components of external forces. Murray (1938, 1943) developed an elaborate TAT scoring system for measuring 36 different needs and various aspects of press, as revealed by the examinee’s stories.

Almost as soon as Murray released the TAT, other clinicians began to develop alternative scoring systems (e.g., Dana, 1959; Tomkins, 1947). Literature on the administration, scoring, and interpretation of the TAT burgeoned, as documented by reviews (Aiken, 1989, chap. 12; Groth-Marnat, 1997; Weiner & Kuehnle, 1998). By the 1950s, there was no single preferred mode of administration, no single preferred system of scoring, and no single preferred method of interpretation, a predicament that endures today. Clinicians even vary the wording of the instructions and commonly select an individualized subset of TAT cards for each client. Indeed, the absence of standardized procedures is such that we should rightly regard the TAT as a method, not a test.

It is worth mentioning that Murray’s instructions included a statement that the TAT was “a test of imagination, one form of intelligence” and further stipulated:

I am going to show you some pictures, one at a time; and your task will be to make up as dramatic a story as you can for each. Tell what has led up to the event shown in the picture, describe what is happening at the moment, what the characters are feeling and thinking; and then give the outcome. Speak your thoughts as they come to your mind. Do you understand? Since you have fifty minutes for ten pictures, you can devote about five minutes to each story. Here is the first picture.
(Murray, 1943)

Currently, clinicians downplay the emphasis on imagination and intelligence when giving instructions. Surely, this omission must influence the quality of the stories produced.
Even though more than a dozen scoring systems have been proposed, interpretation of the TAT is usually based on a clinical-qualitative analysis of the story productions. A central consideration harks back to Murray’s “hero” assumption. According to this viewpoint, the hero is the protagonist of the examinee’s story. It is assumed that the examinee clearly identifies with this character and projects his or her own needs, strivings, and feelings onto the hero. Conversely, thoughts, feelings, or actions avoided by the hero may represent areas of conflict for the examinee. A specific example will help clarify these points. Consider the response to Card 3BM given by a depressed examinee²:

Looks like … I can’t tell if it’s a girl or boy. Could be either. I guess it doesn’t matter. This person just had a hard physical workout. I guess it’s a her. She’s just tired. No trauma happened or anything. She was sitting around a table with friends and she got real tired. She’s not in a health danger or anything. These are her keys. Her friends drag her back to her room and put her to bed. She’s O.K. the next day. No trauma. She’s tired physically, not mentally.
(Ryan, 1987)

² Card 3BM depicts one person (arguably male or female) kneeling or slumped over on a couch with head bowed on one arm. In the corner is a vaguely drawn object interpreted by some examinees to be a handgun or other weapon.

What stands out in this response is the repetitive denial of danger or trauma. But later in the testing, the denial of trauma is no longer maintained. Read how the examinee responded to the blank card, relating a story of a young man, traumatized at school, who takes his car down to the river:

He sees the bridge, he’s really down. He remembers that he’s heard stories about people jumping off and killing themselves. He could never understand why they did that. Now he understands, he jumps and dies … he should have waited ’cause things always get better sometime. But he didn’t wait, he died.
(Ryan, 1987)

Most clinicians would conclude that the examinee who produced these stories had been traumatized and was defending against self-destructive impulses. Correspondingly, the clinician would be well advised to explore these issues in psychotherapy.

The psychometric adequacy of the TAT is difficult to evaluate because of the abundance of scoring and interpretation methods. Clinicians defend the test on an anecdotal basis, pointing out remarkable and confirmatory findings such as illustrated here. However, data-minded researchers are more cautious. One problem is that formally scored TAT protocols possess very low test–retest reliability, with a reported median value of r = .28 (Winter & Stewart, 1977). Furthermore, an astonishing 97 percent of test users employ subjective and “personalized” procedures for interpreting the TAT; that is, only a tiny fraction of clinical practitioners rely on a standardized scoring system (Lilienfeld, Wood, & Garb, 2001). This is troubling because a consistent theme in research on projective testing is that intuitive interpretations are likely to overdiagnose psychological disturbance.

In addition to clinical applications, the TAT has received considerable use for research purposes. For example, Turk, Brown, Symington, and Paul (2010) examined the content of TAT stories from 22 persons with agenesis of the corpus callosum (ACC), a congenital brain disorder in which the pathways connecting the two cerebral hemispheres are partially or completely absent. They used the linguistic inquiry software of James Pennebaker (Tausczik & Pennebaker, 2010) to count words in psychologically meaningful categories. Compared to age- and IQ-matched controls, the ACC individuals used fewer words pertaining to emotionality, cognitive processes, and social processes, indicating that they experienced greater difficulty imagining and inferring the mental and emotional states of others. In this research application, the TAT proved helpful for enhancing our understanding of the unique qualities of persons with ACC.

The Picture Projective Test  The Picture Projective Test (PPT) is an attempt to construct a general-purpose instrument with improved psychometric qualities (Ritzler, Sharkey, & Chudy, 1980; Sharkey & Ritzler, 1985). The developers of the PPT note that the majority of the TAT pictures exert a strong negative stimulus “pull” on storytelling. The TAT cards are cast in dark, shaded tones and most scenes portray persons in low-key or gloomy situations. It is not surprising, then, that projective responses to the TAT are strongly channeled toward negative, melancholic stories (Goldfried & Zax, 1965).

In contrast, the PPT uses a set of pictures taken from the Family of Man photo essay published by the Museum of Modern Art (1955). The following criteria were used in selecting 30 pictures:

• The pictures had to show promise of eliciting meaningful projective material.
• Most but not all of the pictures had to include more than one human character.
• About half of the pictures had to depict humans showing positive affective expression (e.g., smiling, embracing, dancing).
• About half of the pictures had to depict humans in active poses, not simply standing, sitting, or lying down.

In an initial pilot study, the authors compared TAT and PPT story productions of eight undergraduates on several variables such as length of stories, emotional tone, and
activity level (Ritzler, Sharkey, & Chudy, 1980). Compared to the TAT productions, the PPT stories were of comparable length but were much more positive in thematic content and emotional tone. The PPT stories were also much more active, meaning that the central character had an active, self-determined effect on the situation in the story. Furthermore, the PPT stories placed greater emphasis on interpersonal rather than intrapersonal themes. In other words, the PPT stories placed more emphasis on “healthy,” adaptive aspects of personality adjustment than did the TAT productions.

The PPT developers also compared their instrument against the TAT in a diagnostic validity study (Sharkey & Ritzler, 1985). PPT and TAT story productions of 50 subjects were compared: normals, nonhospitalized depressives, hospitalized depressives, hospitalized psychotics with good premorbid histories, and hospitalized psychotics with poor premorbid histories (10 subjects in each group). Although the TAT and PPT were essentially equal in their capacity to discriminate normal from depressed subjects, the PPT was superior in differentiating psychotics from normals and depressives. On the PPT, depressives told stories with gloomier emotional tone and psychotics made more perceptual distortions and thematic/interpretive deviations. The PPT appears to be a very promising instrument, although it is obvious that further research is needed on its psychometric qualities. One noteworthy feature is that anyone can purchase the PPT stimuli at their local bookstore. The requisite materials are found in the Family of Man photo collection (Museum of Modern Art, 1955).

Children’s Apperception Test  Designed as a direct extension of the TAT, the Children’s Apperception Test (CAT) consists of 10 pictures and is suitable for children 3 to 10 years of age. The preferred version for younger children (CAT-A) depicts animals in unmistakably human social settings (Bellak & Bellak, 1991). The test developers used animal drawings on the assumption that young children would identify better with animals than humans. A human figure version (CAT-H) is available for older children (Bellak & Bellak, 1994). No formal scoring system exists for the CAT and no statistical information is provided on reliability or validity. Instead, the examiner prepares a diagnosis or personality description based on a synthesis of 10 variables recorded for each story: (1) main theme; (2) main hero; (3) main needs and drives of hero; (4) conception of environment (or world); (5) perception of parental, contemporary, and junior figures; (6) conflicts; (7) anxieties; (8) defenses; (9) adequacy of superego; (10) integration of ego (including originality of story and nature of outcome) (Bellak, 1992). The lack of attention to psychometric issues of scoring, reliability, and validity of the CAT is troublesome to most testing specialists.

Other Variations on the TAT  The TAT has inspired a number of similar tests designed for children and older adults (Table 8.4). In addition, modifications and variations of the TAT have been developed for ethnic, racial, and linguistic minorities. One of the first was the Thompson TAT (T-TAT) in which 21 of the original TAT pictures were redrawn with African American figures (Thompson, 1949). This TAT modification incorporated certain unintended changes, for example, in facial expressions and the situations portrayed. As a result, the T-TAT should be considered a new test and not just a TAT translation suited to African American individuals (Aiken, 1989).

Table 8.4 Thematic Apperception Tests for Specific Populations
Another specialized TAT-like test is the TEMAS, which consists of 23 colorful drawings that depict Hispanic persons interacting in contemporary, inner-city settings (Aiken, 1989; Constantino, Malgady, & Rogler, 1988). TEMAS is Spanish for themes and an acronym for “tell me a story.” The thematic content of TEMAS stories is scored for 18 cognitive functions, 9 personality (ego) functions, and 7 affective functions. The test can also be scored for various objective indices such as reaction time, fluency, unanswered inquiries, and stimulus transformations (e.g., a letter is transformed into a bomb). Hispanic children respond well to the TEMAS, even though they may be inarticulate in response to traditional projective tests.

The inconsistent reliability of the TEMAS is a source of concern, because reliability constrains validity. The manual reports that Cronbach’s alpha for the 34 scoring functions ranged from .31 to .98 with half below .70. Test–retest reliabilities were even lower; the highest correlation was r = .53 and for 26 of the 34 functions the correlations were near zero! In spite of the questionable reliability of the instrument, several studies provide support for its concurrent and predictive validity. For example, in a clinical sample of 210 Puerto Rican children, TEMAS scale scores predicted independent criteria of ego development, trait anxiety, and adaptive behavior reasonably well, with correlations ranging from .27 to .51 (Malgady, Constantino, & Rogler, 1984). A steady stream of research has continued to bolster the utility of this instrument, as surveyed by Constantino and Malgady (1996). Flanagan and di Guiseppe (1999) provide a critical review of the TEMAS; Constantino and Malgady (2000) describe recent developments with the test.

8.1.11: Expression Techniques
The Draw-A-Person Test  As the reader will recall from an earlier chapter, Goodenough (1926) used the Draw-A-Man task as a basis for estimating intelligence. Subsequently, psychodynamically minded psychologists adapted the procedure to the projective assessment of personality. Karen Machover (1949, 1951) was the pioneer in this new field. Her procedure became known as the Draw-A-Person Test (DAP). Her test enjoyed early popularity and is still widely used as a clinical assessment tool. Watkins, Campbell, Nieberding, and Hallmark (1995) report that projective drawings such as the DAP rank eighth in popularity among clinicians in the United States.

The DAP is administered by presenting the examinee with a blank sheet of paper and a pencil with eraser, then asking the examinee to “draw a person.” When the drawing is completed the examinee usually is directed to draw another person of the sex opposite that of the first figure. Finally, the examinee is asked to “make up a story about this person as if he [or she] were a character in a novel or a play” (Machover, 1949).

Interpretation of the DAP proceeds in an entirely clinical-intuitive manner, guided by a number of tentative psychodynamically based hypotheses (Machover, 1949, 1951). For example, Machover maintained that examinees were likely to project acceptable impulses onto the same-sex figure and unacceptable impulses onto the opposite-sex figure. She also believed that the relative sizes of the male and female figures revealed clues about the sexual identification of the examinee. For example, drawing a man with large eyes and lashes was thought to indicate a homosexually inclined male.

These interpretive premises are colorful, interesting, and plausible. However, they are based entirely on psychodynamic theory and anecdotal observations. Machover made little effort to validate the interpretations. The empirical support for her hypotheses is somewhere between meager and nonexistent (Swensen, 1968). In favor of the DAP, the overall quality of drawings does weakly predict psychological adjustment (Lewinsohn, 1965; Yama, 1990). However, judged by contemporary standards of evidence, the sweeping and cavalier assessments of personality so often derived from the DAP are embarrassing. Some reviewers have concluded that the DAP is an unworthy test that should no longer be used (Gresham, 1993; Motta, Little, & Tobin, 1993).

Rather than using the DAP to infer nuances of personality, a more appropriate application of this test is in the screening of children suspected of behavior disorder and emotional disturbance. For this purpose, Naglieri, McNeish, and Bardos (1991) developed the Draw A Person: Screening Procedure for Emotional Disturbance (DAP:SPED). In one study, diagnostic accuracy of problem children was significantly improved by application of the DAP:SPED scoring approach (Naglieri & Pfeiffer, 1992).

The House-Tree-Person Test (H-T-P)  The H-T-P is a projective test that uses freehand drawings of a house, tree, and person (Buck, 1948, 1981). The examinee is given almost complete freedom in sketching the three objects; separate pencil and crayon drawings are requested. Although the examiner can improvise an H-T-P Test with mere blank pieces of paper, Buck (1981) recommends the use of a four-page drawing form with identification information on the first page. Pages two, three, and four are titled House, Tree, and Person. Two drawing forms are needed for each examinee, one for pencil drawings and the other for crayon drawings. Buck (1981) also provides a separate four-page form for a postdrawing interrogation phase, which consists of 60 questions designed to elicit the examinee’s opinions about elements of the drawings. Many practitioners feel the postdrawing interrogation phase is not worth the extended effort. Also, the value of separate crayon drawings is questioned (Killian, 1987).
The House-Tree-Person Test has much the same familial lineage as the Draw-A-Person Test. Like the DAP Test, the H-T-P Test was originally conceived as a measure of intelligence, complete with a quantitative scoring system to appraise an approximate level of ability (Buck, 1948). However, clinicians soon abandoned the use of the H-T-P as a measure of intelligence, and it is now used almost exclusively as a projective measure of personality.

Although we will not delve into any details here, the interpretation of the H-T-P rests on three general assumptions: the House drawing mirrors the examinee’s home life and intrafamilial relationships; the Tree drawing reflects the manner in which the examinee experiences the environment; and the Person drawing echoes the examinee’s interpersonal relationships. Buck (1981) provides numerous interpretive hypotheses for both quantitative and qualitative aspects of the three drawings.

The H-T-P is an alluring test that has fascinated clinicians for more than 40 years. Unfortunately, Buck (1948, 1981) has never provided any evidence to support the reliability or validity of this instrument. Indeed, he is perhaps his own worst critic. At one point in his test manual, he even asserts that validational research is not possible with the H-T-P (Buck, 1981, p. 164).

In general, attempts to validate the H-T-P as a personality measure have failed miserably (for reviews see Krugman, 1970; Killian, 1987). Thoughtful reviewers have repeatedly recommended the abandonment of the H-T-P and similar figure-drawing approaches to personality assessment. The popularity of the H-T-P has dropped off in recent years. A search of PsycINFO revealed only nine articles on the test since 2000, including four dissertations.

Many clinicians do not use projective methods as tests at all but as auxiliary approaches to the clinical interview. These practitioners use projective techniques as clinical tools to derive tentative hypotheses about the examinee. Most of these hypotheses will turn out to be false when examined more closely. However, the few that are confirmed may have important implications for the clinical management of the examinee. Furthermore, we suspect that these fruitful hypotheses might not emerge, or might emerge more slowly, if the practitioner relied entirely on the interview or used only formal tests with established reliability and validity (Case Exhibit 8.1). However, this assertion is difficult to test empirically.

Case Exhibit 8.1
Projective Tests as Ancillary to the Interview

A specific example may help to clarify the role of projective techniques as ancillary to the clinical interview. During the Vietnam War, a Veterans Administration psychologist tested a young soldier who had accidentally shot himself in the leg with a 45-caliber pistol while practicing quick draw in the jungle. Surgeons found it necessary to amputate the soldier’s leg from the knee down. He was quite depressed, and everyone assumed that he suffered from grief and guilt over his great personal tragedy. He was virtually mute and nearly untestable. However, he was persuaded to complete a series of figure drawings. In one drawing he depicted himself as a helicopter gunner, spraying bullets indiscriminantly into the jungle below. When questioned about this drawing, he became quite animated and confessed that he relished combat. Guided by the possible implications of the morbid drawing, the psychologist sought to learn more about the veteran’s attitudes toward combat. In the course of several interviews, the veteran revealed that he particularly enjoyed firing on moving objects (animals, soldiers, civilians); it made no difference to him. Gradually, it became clear that the young veteran was an incipient war criminal who was depressed because his injury would prevent him from returning to the front lines. Needless to say, this information had quite an impact on the tenor of the psychological report.

8.2: Self-Report and Behavioral Assessment of Psychopathology

8.2 Review structured tests and procedures, self-report inventories, and behavioral assessment approaches of psychopathy

Although there are many methods for the assessment of personality and related qualities, broadly speaking two approaches have dominated the field: unstructured and structured. Unstructured methods such as the Rorschach, TAT, and sentence completion blanks permit broad latitude in the responses of the examinee. These approaches dominated personality testing in the early twentieth century but then slowly faded in standing. In contrast, structured approaches such as self-report inventories and behavior rating scales gained prominence in the mid-twentieth century and have continued to expand in popularity to the present time. Whereas only a handful of unstructured techniques has ever risen to distinction, the number of structured instruments for assessment has grown almost exponentially.

In the previous topic we introduced the reader to the many varieties of unstructured tests such as inkblots, stimulus cards, and sentence completion blanks. These methods are resplendent in the richness of the hypotheses they yield; however, projective techniques largely lack the
approval of psychometrically oriented clinicians. In this topic, we focus on the more structured, objective methods for personality assessment favored by measurement-minded psychologists. We review a wide variety of true–false, rating scale, and forced-choice instruments for assessing personality and other qualities. This review takes in a variety of personality tests, including the Minnesota Multiphasic Personality Inventory-2, arguably the most famous personality test ever published. We also examine contemporary approaches that rely upon structured interview, behavioral observation, and ratings.

The self-report approaches to testing discussed in the following sections are steeped in the details of psychometric methodology. These tests feature prominent references to reliability indices, criterion keying, factor analysis, construct validation, and other forms of technical craftsmanship. For this reason, the approaches discussed here often are considered objective, as contrasted with projective. However, whether they are objective in any meaningful sense is really an empirical question that must be answered on the basis of research. Perhaps it is more accurate to call these methods structured. They are structured in the sense that highly specific rules are followed in the administration, scoring, interpretation, and narrative reporting of results. In fact, some of the approaches are so completely structured that an examinee can answer questions presented on a computer screen and observe a computer-generated narrative report spewed forth from the printer, literally seconds later.3

We begin our discussion of structured assessment by reviewing several prominent personality tests. Contemporary psychometricians have relied mainly upon three tactics for personality test development: theory-bounded approaches, factor-analytic approaches, and criterion-key methods. We will organize the discussion of personality inventories around these three categories. Of course, the boundaries are somewhat artificial and many test developers use a combination of methods.

8.2.1: Theory-Guided Inventories

The construction of several self-report inventories was guided closely by formal or informal theories of personality. In these cases, the test developer designed the instrument around a preexisting theory. Theory-guided inventories stand in contrast to factor-analytic approaches that often produce a retrospective theory based upon initial test findings. Theory-guided inventories also differ from the stark atheoretical empiricism found in criterion-key instruments such as the MMPI and MMPI-2. An example of a theory-guided inventory is the Personality Research Form (PRF), based on Murray’s (1938) need-press theory of personality. Some theory-guided inventories such as the State-Trait Anxiety Inventory (STAI) attempt to measure very specific components of personality. We review these tests in more detail in the following.

Personality Research Form  The Personality Research Form (Jackson, 1999) is a true–false inventory based loosely on Murray’s (1938) theory of manifest needs. The reader will recall from an earlier discussion that Murray posited 15 needs and developed a projective test, the Thematic Apperception Test, to tap those needs. Based on factor-analytic approaches, Jackson expanded the number of needs and produced several forms for assessment. The forms differ in the number of scales and number of items per scale. In addition to parallel short tests (forms A and B), the Personality Research Form (PRF) also exists as parallel long forms (forms AA and BB). These forms, used primarily with college students, consist of 440 true–false items. The long forms yield 20 personality-scale scores and two validity scores, Infrequency and Desirability (Table 8.5). The most popular version of the PRF is form E, which consists of all 22 scales in a modified 352-item test.

Table 8.5 Personality Research Form Scales

Scale                 Interpretation of High Score
Abasement             Self-effacing, humble, blame-accepting
Achievement           Goal striving, competitive
Affiliation           Friendly, accepting, sociable
Aggression            Argues, combative, easily annoyed
Autonomy              Independent, avoids restrictions
Change                Avoids routine, seeks change
Cognitive Structure   Prefers certainty, dislikes ambiguity
Defendence            On guard, takes offense easily
Dominance             Influential, enjoys leading
Endurance             Persevering, hard-working
Exhibition            Dramatic, enjoys attention
Harm Avoidance        Avoids risk and excitement
Impulsivity           Impulsive, speaks freely
Nurturance            Caring, sympathetic, comforting
Order                 Organized, dislikes confusion
Play                  Playful, light-hearted, enjoys jokes
Sentience             Notices, remembers sensations
Social Recognition    Concern for reputation and approval
Succorance            Insecure, seeks reassurance
Understanding         Values logical thought
Desirability          Validity Scale: favorable presentation
Infrequency           Validity Scale: infrequent responses

Source: Based on Personality Research Form Scales and Descriptions from Jackson, D. N. (1989). Personality research form manual (3rd ed.). Port Huron, MI: Sigma Assessment Systems, Inc., Research Psychologists Press division. (800) 265-1285.

3 Computerized narrative reports may not be altogether a positive development. We discuss the benefits and pitfalls of computer-generated reports in the next chapter.
In constructing the PRF form E, Jackson first formulated rigorous and theoretically based definitions of the traits to be measured, following Murray’s (1938) system for personality description. Next, for each scale over 100 items were written to tap the traits underlying the hypothesized needs. After editorial review, these items were administered to large samples of college students. Item selection was based on simplicity of wording, high biserial correlations with total scale scores, low correlations with other scales (maximizing scale independence), and low correlations with the Desirability scale (minimizing social desirability bias). Convergent and discriminant validity was considered throughout. For the original long forms AA and BB, 20 items were selected for each scale, resulting in 20 × 22 or 440 items. For the PRF form E, about four items were dropped from each scale, yielding a 352-item test.

Unlike many other personality inventories, the PRF scales have no item overlap. As a result, the scales are unusually independent, with most intercorrelation coefficients in the vicinity of ±.30 (Gynther & Gynther, 1976). Furthermore, the rigorous scale construction procedures employed by Jackson (1970) yielded scales with good internal consistency, with a median coefficient alpha of .70. Test–retest reliabilities are exceptionally strong, ranging from .80 to .96 for a two-week interval, with a median of .91 (Jackson, 1999). Norms are based on thousands of college students from North America, and also include subgroup norms for psychiatric inpatients and criminal offenders. A desirable feature of the PRF is its readability: The test requires only a fifth- or sixth-grade reading level (Reddon & Jackson, 1989).

The validity of the PRF rests upon a substantial body of research over many decades. A lengthy bibliography citing more than 300 articles about the test can be found at www.sigmaassessmentsystems.com. For example, correlations between self and roommate ratings on the PRF constructs are reported to range from .27 to .74, with a median of .53. The construct validity of the PRF rests especially upon confirmatory factor analyses corroborating the grouping of the items into 20 scales (Jackson, 1970, 1984b). In addition, research indicates positive correlations with comparable scales on other inventories (Mungas, Trontel, & Weingardner, 1981). For example, Edwards and Abbott (1973) found exceptionally strong and confirmatory correlations between similar scales on the PRF and the Edwards Personality Inventory (EPI; Edwards, 1967). The EPI is a respected but little-used test consisting of 1,200(!) true–false questions. Some of the confirmatory correlations between PRF and EPI scales for 218 male and female college students are reported as follows:

Achievement (PRF) × Is a Hard Worker (EPI)     .74
Change (PRF) × Likes a Set Routine (EPI)      −.54
Nurturance (PRF) × Helps Others (EPI)          .64
Succorance (PRF) × Dependent (EPI)             .73

Because these instruments were developed independently according to different test construction philosophies, the findings bolster the validity of both tests. Several recent empirical comparisons also support the validity and utility of the PRF. For example, Goffin, Rothstein, and Johnston (2000) found that the PRF outperformed the more widely used Sixteen Personality Factor Questionnaire (16PF, discussed later in this section) in predicting the job performance of 487 candidates for managerial positions. Vernon (2000) also reports favorably on the validity of the PRF in his review of recent studies.

State-Trait Anxiety Inventory  The State-Trait Anxiety Inventory (STAI) is a popular self-report measure of anxiety, used in research and clinical settings (Spielberger, 1983, 1989). The current version is called Form Y, a minor revision of the original Form X (Spielberger, Gorsuch, & Lushene, 1970). A similar scale for children also is available (Spielberger, 1973). The test has been translated into more than 40 languages. We limit our discussion here to the adult version.

The purpose of the STAI is to differentiate between the temporary condition of state anxiety and the more long-standing quality of trait anxiety. State anxiety is defined as a “transitory emotional state or condition characterized by subjective feelings of tension and apprehension, and by activation of the autonomic nervous system.” Trait anxiety refers to “relatively stable individual differences in anxiety proneness” (Gaudry, Vagg, & Spielberger, 1975, p. 331).

The state scale (A-State scale) consists of 20 items that evaluate how the respondent feels “right now, at this moment.” Items are similar to “I feel at peace” and “I am distressed.” Responses are on a 4-point scale (Not At All, Somewhat, Moderately So, and Very Much So). The trait scale (A-Trait scale) consists of 20 items that assess how the respondent feels “generally.” Items are similar to “I am a stable person” and “I lack confidence.” Responses are on a 4-point scale (Almost Never, Sometimes, Often, and Almost Always). Of course, scoring is reversed for positively stated items. The range of scores for each scale is 20 to 80, with higher scores indicating greater anxiety. Extensive normative data are available, stratified by age and subdivided by setting (employed adults, college students, high school students, military recruits). The STAI has received extensive service in research, and also is used in health-related clinical applications such as gauging anxiety in pregnant women (Gunning, Denison, Stockley, and others, 2010), monitoring improvement in psychotherapy patients (Vautier & Pohl, 2009), and detecting mental disorder in elderly patients (Kvaal, Ulstein, Nordhus, & Engedal, 2005).

State anxiety fluctuates in response to environmental circumstances and may change even from hour to hour.
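The scoring rules given for the STAI (20 items per scale, responses on a 4-point scale, reversed scoring for positively stated items, totals from 20 to 80) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: coding the four response options as 1 to 4 is inferred from the reported 20-to-80 range, and the function name and the set of reversed item numbers are hypothetical, since the text does not say which items are positively stated.

```python
from typing import Mapping, Set

MIN_RATING, MAX_RATING = 1, 4   # assumed numeric coding of the 4-point scale
ITEMS_PER_SCALE = 20            # each STAI scale has 20 items


def score_stai_scale(ratings: Mapping[int, int], reversed_items: Set[int]) -> int:
    """Total one 20-item scale, reverse-scoring the positively stated items."""
    if len(ratings) != ITEMS_PER_SCALE:
        raise ValueError(f"expected {ITEMS_PER_SCALE} ratings, got {len(ratings)}")
    total = 0
    for item, rating in ratings.items():
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"item {item}: rating {rating} is outside 1-4")
        if item in reversed_items:
            # Positively stated item: 1 <-> 4 and 2 <-> 3.
            rating = MIN_RATING + MAX_RATING - rating
        total += rating
    return total


# Hypothetical keying, for illustration only: the text does not list
# which item numbers are positively stated.
HYPOTHETICAL_REVERSED = {1, 2, 5, 8}

# A respondent who strongly endorses every calm item and denies every anxious one.
calm_respondent = {
    item: (4 if item in HYPOTHETICAL_REVERSED else 1)
    for item in range(1, ITEMS_PER_SCALE + 1)
}
print(score_stai_scale(calm_respondent, HYPOTHETICAL_REVERSED))  # 20, the scale minimum
```

Reversing the keyed items before summing is what keeps the direction of the total consistent, so that a higher score always indicates greater anxiety regardless of how an individual item is worded.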
Therefore, we can expect that test–retest reliability will be lower for state anxiety than for trait anxiety. This is precisely what researchers find, with short-range reliability in the .40s and .50s for the A-State scale and in the high .80s for the A-Trait scale (Rule & Traver, 1983; Spielberger et al., 1970). Internal consistency of the scale is excellent, with Cronbach’s alpha of .86 for the total score in a sample of medical patients (Quek, Low, Razack, Loh, & Chua, 2004). Individual alpha values for A-State and A-Trait are robust as well, with results of .95 and .93, respectively, in a sample of 567 patients treated at an anxiety disorders clinic (Grös, Antony, Simms, & McCabe, 2007).

The validity of the STAI is well established from dozens of studies demonstrating content validity, convergent/discriminant validity, and construct validity (Spielberger, 1989). In a factor-analytic study of scores for 205 patients with panic disorder, Oei, Evans, and Crook (1990) found that a two-factor oblique solution was the best fit, accounting for 41 percent of the variance. Notably, 18 of the A-State items revealed salient loadings on factor 1 (state anxiety) and all 20 of the A-Trait items showed prominent loadings on factor 2 (trait anxiety). In sum, the STAI is a brief, reliable, and valid measure of state and trait anxiety. The measure is a mainstay for clinicians and researchers.

8.2.2: Factor-Analytically Derived Inventories

Eysenck Personality Questionnaire  The Eysenck Personality Questionnaire (EPQ) was designed to measure the major dimensions of normal and abnormal personality (Eysenck & Eysenck, 1975). Based on a lifelong program of factor-analytic questionnaire research and laboratory experiments on learning and conditioning, Eysenck isolated three major dimensions of personality: Psychoticism (P), Extraversion (E), and Neuroticism (N). The EPQ consists of scales to measure these dimensions and also incorporates a Lie (L) scale to assess the validity of an examinee’s responses. The EPQ contains 90 statements answered “yes” or “no” and is designed for persons aged 16 and older. A Junior EPQ containing 81 statements is suitable for children ages 7 to 15.

Items on the P scale resemble the following:

• Do you often break the rules? (T)
• Would you worry if you were in debt? (F)
• Do you take risks just for fun? (T)

High scores on the P scale indicate aggressive and hostile traits, impulsivity, a preference for odd or unusual things, and empathy defects. Antisocial and schizoid patients often obtain high scores on this dimension. In contrast, low scores on P foretell more desirable characteristics such as empathy and interpersonal sensitivity. Items on the E scale resemble the following:

• Do you like to meet new people? (T)
• Are you quiet when with others? (F)
• Do you like lots of excitement? (T)

High scores on the E scale indicate a loud, gregarious, outgoing, fun-loving person. Low scores on the E scale indicate introverted traits such as a preference for solitude and quiet activities. Items on the N scale resemble the following:

• Are you a moody person? (T)
• Do you feel that life is dull? (T)
• Are your feelings easily hurt? (T)

The N scale reflects a dimension of emotionality that ranges from nervous, maladjusted, and overemotional (high scores) to stable and confident (low scores).

The reliability of the EPQ is excellent. For example, the one-month test–retest correlations were .78 (P), .89 (E), .86 (N), and .84 (L). Internal consistencies were in the .70s for P and the .80s for the other three scales. The construct validity of the EPQ is also well established through dozens of studies using behavioral, emotional, learning, attentional, and therapeutic criteria (reviewed in Eysenck & Eysenck, 1985). Friedman (1987) provides a short but thorough introduction to other sources on the EPQ.

A major focus of research with the EPQ has been on the empirical correlates of extraversion and its polar opposite, introversion. Eysenck and Eysenck (1975) describe the typical extravert as follows:

    The typical extravert is sociable, likes parties, has many friends, needs to have people to talk to, and does not like reading or studying by himself. He craves excitement, takes chances, often sticks his neck out, acts on the spur of the moment, and is generally an impulsive individual.

They describe the typical introvert as follows:

    The typical introvert is a quiet, retiring sort of person, introspective, fond of books rather than people; he is reserved and distant except to intimate friends. He tends to plan ahead, “looks before he leaps,” and mistrusts the impulse of the moment.

Eysenck and his followers have linked a number of perceptual and physiological factors to the extraversion/introversion dimension. Because of space limitations, we can only list representative findings here:

• Introverts are more vigilant in watchkeeping.
• Introverts do better at signal-detection tasks.
• Introverts are less tolerant of pain but more tolerant of sensory deprivation.
• Extraverts are more easily conditioned to stimuli associated with sexual arousal.
• Extraverts have a greater need for external stimulation.

Aiken (1989) summarizes additional research on the real-world correlates of the EPQ extraversion/introversion dimension.

In general, the technical characteristics of the EPQ are very strong, certainly stronger than found in most self-report inventories. The practical utility of the instrument is supported by voluminous research literature. Nonetheless, the EPQ has never caught on among American psychologists, who seem enamored of multiphasic instruments that produce 10, 20, or 30 scores, not a simple trio of basic dimensions.

Comrey Personality Scales  For practitioners who desire a short self-report inventory suitable for college students and other adults, the Comrey Personality Scales (Comrey, 1970, 1980, 2008) would be a good choice. As a protégé of Guilford, Comrey pursued a factor-analytic strategy in developing his 180-item test. Comrey relied exclusively upon college students in the development and standardization of his test, so the CPS is well suited to assessment of personality in this subpopulation.

A special virtue of the CPS is its brevity. Consisting of 180 statements, the test is only one-third as long as competing instruments such as the MMPI-2. The eight CPS personality scales consist of 20 items each, divided equally between positively and negatively worded statements. Another 20 items are devoted to a validity check and the assessment of social desirability response bias.

The following description of CPS scales is based upon Merenda (1985) and Comrey (1995, 2008):

(V) Validity Check. A score of 8 is the expected raw score. Any score on the V scale that gives a T-score equivalent below 70 is still within the normal range, however. Higher scores are suggestive of an invalid record.
(R) Response Bias. High scores indicate a tendency to answer questions in a socially desirable way, making the respondent look like a “nice” person.
(T) Trust versus Defensiveness. High scores indicate a belief in the basic honesty, trustworthiness, and good intentions of other people.
(O) Orderliness versus Lack of Compulsion. High scores are characteristic of careful, meticulous, orderly, and highly organized individuals.
(C) Social Conformity versus Rebelliousness. Individuals with high scores accept society as it is, resent nonconformity in others, seek the approval of society, and respect the law.
(A) Activity versus Lack of Energy. High-scoring individuals have a great deal of energy and endurance, work hard, and strive to excel.
(S) Emotional Stability versus Neuroticism. High-scoring persons are free from depression, optimistic, relaxed, stable in mood, and confident.
(E) Extraversion versus Introversion. High-scoring individuals meet people easily, seek new friends, feel comfortable with strangers, and do not suffer from stage fright.
(M) Mental Toughness versus Sensitivity. High-scoring individuals tend to be rather tough-minded people who are not bothered by blood, crawling creatures, vulgarity, and who do not cry easily or show much interest in love stories.
(P) Empathy versus Egocentrism. High-scoring individuals describe themselves as helpful, generous, sympathetic people who are interested in devoting their lives to the service of others.

Reflecting its careful factor-analytic derivation, the CPS scales possess exceptional internal consistencies, which range from .91 to .96. These findings indicate that the CPS is most likely a reliable test, but traditional test–retest data are scant. Cross-cultural studies with the CPS are highly supportive of its validity. Brief and Comrey (1993) report that the eight-factor solution to CPS item responses is found in factor analyses with Russian, U.S., Brazilian, Israeli, Italian, and New Zealand samples. Other validational studies with the CPS are not straightforward in their interpretation. On the one hand, the correlations between CPS scale scores and personality-relevant biographical data are very small (Comrey & Backer, 1970; Comrey & Schiebel, 1983). On the other hand, extreme scores on the CPS scales are strongly associated with psychological disturbance (Comrey & Schiebel, 1985). This is particularly true for low scores on Trust versus Defensiveness, Activity versus Lack of Energy, Emotional Stability versus Neuroticism, Extraversion versus Introversion, and high scores on Orderliness versus Lack of Compulsion. Shen and Comrey (1997) describe the utility of the CPS with medical students, showing that the test is a reasonable predictor of clinical performance and personal suitability. In general, reviewers conclude that the CPS is a promising test that needs updated standardization and additional documentation on its technical qualities. Comrey (1995) summarizes validity studies of his test.

8.2.3: Criterion-Keyed Inventories

The final self-report inventories that we will review embody a criterion-keyed test development strategy. In a criterion-keyed approach, test items are assigned to a particular scale if, and only if, they discriminate between a
well-defined criterion group and a relevant control group. For example, in devising a self-report scale for depression, items endorsed by depressed persons significantly more (or less) frequently than by normal controls would be assigned to the depression scale, keyed in the appropriate direction. A similar approach might be used to develop scales for other constructs of interest to clinicians such as schizophrenia, anxiety reaction, and the like. Notice that the test developer does not consult any theory of schizophrenia, depression, or anxiety reaction to determine which items belong on the respective scales. The essence of the criterion-keyed procedure is, so to speak, to let the items fall where they may.4

Minnesota Multiphasic Personality Inventory-2 (MMPI-2)  First published in 1943, the MMPI was a 566-item true–false personality inventory designed originally as an aid in psychiatric diagnosis (Hathaway & McKinley, 1940, 1943; McKinley & Hathaway, 1940, 1944; McKinley, Hathaway, & Meehl, 1948). The test authors followed a strict empirical keying approach in the construction of the MMPI scales. The clinical scales were developed by contrasting item responses of carefully defined psychiatric patient groups (average N of about 50) with item responses of 724 control subjects. The result was a remarkable test useful both in psychiatric assessment and the description of normal personality. Within a few years, the MMPI became the most widely used personality test in the United States.

At first the MMPI aged gracefully; what appeared to be minor flaws were tolerated by practitioners. But as the MMPI reached middle age, the need for rejuvenation became increasingly obvious. The most serious problem was the original control group, which consisted primarily of relatives and visitors of medical patients at the University of Minnesota Hospital. The narrow choice of control subjects, tested mainly in the 1930s, proved to be a persistent source of criticism for the MMPI. All of the control subjects were white, and most were young (average age about 35), married, and from a small town or rural area. This was a sample of convenience that was significantly unrepresentative of the population at large.

The item content of the MMPI also raised concerns (Graham, 1993). Several items used archaic and obsolete terminology, referring to “drop the handkerchief” (a parlor game from the 1930s), sleeping powders (sleeping pills), and streetcars (electric-powered buses). Other items used sexist language. Examinees found some items objectionable, especially those dealing with Christian religious beliefs. These items were the source of occasional lawsuits alleging invasion of privacy. Finally, a few items dealing with bowel functions and sexual behavior were just downright offensive.

From the standpoint of measurement, a more serious problem with item content was that of omission. The MMPI item pool was not broad enough to assess many important characteristics, including suicidal tendencies, drug abuse, and treatment-related behaviors. An additional motive for MMPI revision was to extend the range of item coverage.

The MMPI-2 was released in 1989 after nearly a decade of revision and restandardization. The new, improved MMPI-2 incorporates a contemporary normative sample of 2,600 individuals who are loosely representative of the general population on major demographic variables (geographic location, race, age, occupational level, and income). Although higher educational levels are overrepresented, the MMPI-2 normative sample is still a vast improvement over the MMPI normative sample. The item pool has been significantly improved by revision of obsolete items, deletion of offensive items, and addition of new items to extend content coverage.

The MMPI-2 is a significant improvement upon the MMPI, but maintains substantial continuity with its esteemed predecessor. The test developers retained the same titles and measurement objectives for the traditional validity and clinical scales. The restandardization provides a better calibration for scale elevations, a much-needed improvement (Tellegen & Ben-Porath, 1992). Although dozens of items were rewritten, most of these revisions are cosmetic and do not affect the psychometric characteristics of the test (Ben-Porath & Butcher, 1989). In fact, when large samples of subjects complete the MMPI and the MMPI-2, scores on the individual validity and clinical scales typically correlate near .99.

The MMPI-2 consists of 567 items carefully designed to assess a wide range of concerns. The examinee is asked to mark “true” or “false” for each statement as it applies to himself or herself. Most of the items are self-referential. The items encompass a wide variety of mainly pathological themes (Dahlstrom, Welsh, & Dahlstrom, 1972; Graham, 1993).

The MMPI requires a sixth-grade reading level and is completed by most persons in 1 to 1½ hours.

The original MMPI scales were developed by contrasting item responses of carefully defined psychiatric patient groups (average N of about 50) with item responses of about 700 controls. The psychiatric patient groups included the following diagnostic categories: hypochondriasis, depression, hysteria, psychopathy, male homosexuality, paranoia, psychasthenia,5 schizophrenia, and the early

4 We are glossing over certain complexities here. Some items reflecting general psychopathology might discriminate all the contrast groups from the control group. The test developer might discard these in favor of items that are differentially discriminating for just one contrast group but not the others.

5 This outdated diagnostic term is quite similar to what would now be labeled obsessive-compulsive disorder.
phase of mania (hypomania). In addition, samples of socially introverted and socially extraverted college students were used to construct a scale for social introversion.

The MMPI-2 retains the basic clinical scales with only minor item deletions and revisions. Ben-Porath and Butcher (1989) investigated the characteristics of the rewritten items on the MMPI-2 and discovered that they are psychometrically equivalent to the original items.

The MMPI-2 can be scored for four validity scales, 10 standard clinical scales, and dozens of supplementary scales. In practice, clinicians place the greatest emphasis upon the validity and standard clinical scales. The supplementary scales are just that: supplementary. They provide information helpful in fine-tuning the interpretation of the traditional validity and clinical scales. MMPI-2 scale raw scores are converted to T scores, with a mean of 50 and a standard deviation of 10. Scores that exceed T of 65 merit special consideration. These elevated scores are statistically uncommon in the general population and may signify the presence of psychiatric symptomatology. We will concentrate upon the traditional scales here, beginning with a review of the four validity scales, known as Cannot Say (or ?), L, F, and K.

The Cannot Say score is simply the total number of items omitted or double-marked in completion of the answer sheet. The instructions for the test encourage examinees to mark all items, but omissions or double-marked items will occur. However, this is rare: the modal number of items omitted is zero (Tamkin & Scherer, 1957). Omission of up to 10 items appears to have little effect on the overall test results, one of the benefits of having a huge pool of statements in the MMPI-2. A very high score on this scale may indicate a reading problem, opposition to authority, defensiveness, or indecisiveness caused by depression.

The L Scale is composed of 15 items all scored in the false direction. By answering “false” to L Scale items, the examinee asserts that he or she possesses a degree of personal virtue that is rarely observed in our culture (e.g., never gets angry, likes everyone, never lies, reads every newspaper editorial, and would rather lose than win). The L Scale was designed to identify a general, deliberate, evasive test-taking attitude. A high score on the L Scale indicates that the examinee is not only defensive, but naively so. Persons with any degree of psychological sophistication can adopt a defensive test-taking attitude and still score in the normal range on the L Scale.

The F Scale consists of 60 items answered by normal subjects in the scored direction no more than 10 percent of the time. These items reflect a broad spectrum of serious maladjustment, including peculiar thoughts, apathy, and social alienation. Even though F Scale items seem to indicate psychiatric pathology, they are seldom endorsed by patients. Fewer than 50 percent of these items appear on the clinical scales. Many persons with significant psychiatric disturbance do produce elevated scores in the range of T = 70 or 80 on the F Scale. On the other hand, exceptionally high scores suggest additional hypotheses: insufficient reading ability, random or uncooperative responding, a motivated attempt to “fake bad” on the test, or an exaggerated “cry for help” in a distressed client.

The K Scale was designed to help detect a subtle form of defensiveness. The 30-item scale is composed, in part, of 22 items that differentiated normal profiles produced by defensive hospitalized psychiatric patients from those produced by normal controls. Additionally, eight items that improved discrimination of depressive and schizophrenic symptoms were added (McKinley, Hathaway & Meehl, 1948). An elevated score on the K Scale may indicate a defensive test-taking attitude. Normal range elevations on the K Scale suggest good ego strength: the presence of useful psychological defenses that allow the person to function well in spite of internal conflict.

The combined use of F and K may be useful in the detection of MMPI-2 profiles that have been faked or malingered. In one study, 81 percent of fake-good profiles were identified by a simple decision rule (using raw scores) of F–K < −12, whereas 87 percent of fake-bad profiles were identified by a simple decision rule (using raw scores) of F–K > 7 (Bagby, Rogers, Buis, & Kalemba, 1994).

Several clinical scales are “K-corrected” to improve their discriminatory power. The rationale for this practice is that elevations on K betoken an artificial reduction of scores on these clinical scales. Portions of the raw score on K are thus added to these clinical scale scores prior to computation of the T scores. The K-corrected scales, discussed later, include Hypochondriasis, Psychopathic Deviate, Psychasthenia, Schizophrenia, and Hypomania. Whether K correction actually improves the MMPI-2 is debatable, but the test publishers continued the tradition from the MMPI for the sake of continuity. Separate norms for non-K-corrected scale score transformations are also available.

In addition to the validity scales, the MMPI-2 is always scored for 10 clinical scales. With the exception of Social Introversion, these clinical scales were constructed in the usual criterion-keyed manner by contrasting responses of clinical subjects and normal controls. As noted previously, Social Introversion was developed by contrasting the responses of college students high and low in social introversion. The 10 clinical scales and common interpretations of elevated scores are outlined in Table 8.6.

Dozens of supplementary scales can also be scored on the MMPI-2. Some of the supplementary scales are based upon rational identification of symptom clusters and subsequent scale purification by empirical means. Fifteen useful MMPI-2 Content Scales were developed in this manner (Butcher, Graham, Williams, & Ben-Porath, 1990). Many of the supplementary scales were developed by independent
investigators; these scales vary widely in quality. In practice, only about 30 of the additional scales are routinely scored. Examples of the supplementary scales include Anxiety, Repression, Ego Strength, and the MacAndrew Alcoholism Scale-Revised. Anxiety (A) and Repression (R) are the first two major factors that always emerge from factor analysis of MMPI-2 responses. An interesting supplementary scale is Barron’s (1953) Ego Strength (Es) Scale, which purports to predict positive response to psychotherapy. However, not all studies confirm this use of the scale (Graham, 1987). The MacAndrew Alcoholism Scale-Revised (MAC-R; MacAndrew, 1965) is a useful index of alcohol or other substance abuse. The MAC-R is not only useful in assessment of alcoholism but is also helpful in the identification of heavy drinkers and drug-dependent individuals (Wolf, Schubert, Patterson, Grande, & Pendleton, 1990). We cannot possibly review all the useful supplementary scales here. The interested reader should consult Butcher and Williams (1992) and Graham (1993).

Table 8.6 The 10 Clinical Scales from the Minnesota Multiphasic Personality Inventory-2

Scale No. and                             K            Typical Interpretation
Abbreviation    Scale Name                Correction   of Elevation
1 Hs            Hypochondriasis           .5K          Excessive physical preoccupation
2 D             Depression                             Sad feelings, hopelessness
3 Hy            Hysteria                               Immaturity, use of repression, denial
4 Pd            Psychopathic deviate      .4K          Authority conflict, impulsivity
5 Mf            Masculinity-femininity                 Masculine interests [women], feminine interests [men]
6 Pa            Paranoia                               Suspiciousness, hostility
7 Pt            Psychasthenia             1K           Anxiety and obsessive thinking
8 Sc            Schizophrenia             1K           Alienation, unusual thought processes
9 Ma            Hypomania                 .2K          High energy, possible agitation
0 Si            Social introversion                    Shyness and introversion

MMPI-2 Interpretation  The interpretation of an MMPI-2 profile can proceed along two different paths: scale by scale or configural. In the simplest possible approach, scale by scale, the examiner determines the validity of the test, as discussed previously, by inspecting the four validity scales. If the test appears reasonably valid by these criteria, the examiner consults a relevant resource book and proceeds scale by scale to produce a series of hypotheses. For example, Lachar (1974) has distilled the meaning of various elevations on the Pa or Paranoia scale as follows:

T = 27–44    examinee may be stubborn, touchy, or difficult
T = 45–59    no undue sensitivity and adequate regard for others
T = 60–69    increasing probability of rigidity and oversensitivity
T = 70–79    rigid, touchy, projects blame and hostility
T = 79–100   frankly delusional paranoid features may be present

The configural approach to MMPI-2 interpretation is somewhat more complicated and consists of classifying the profile as belonging to one or another loosely defined code type that has been studied extensively. Code types are usually defined by a combination of elevation (two or more clinical scales elevated beyond a certain criterion) and definition (two or more clinical scales clearly standing out from the others). For example, in its full-blown manifestation, the 4–9 code type can be defined by a valid profile in which scale 4 (Psychopathic Deviate) and scale 9 (Hypomania) are the high-point elevations, both exceed T of 65 (elevation), and both exceed the next highest clinical scale by at least 5 T-score points (definition). Here is how Graham (1993) describes persons who fit this code type:

The most salient characteristics of 49/94 individuals is a marked disregard for social standards and values. They frequently get in trouble with the authorities because of antisocial behavior. They have a poorly developed conscience, easy morals, and fluctuating ethical values. Alcoholism, fighting, marital problems, sexual acting out, and a wide array of delinquent acts are among the difficulties in which they may be involved. This is a common code type among persons who abuse alcohol and other substances.

The most likely diagnosis for such individuals is antisocial personality disorder.

We should mention briefly that several computerized interpretation systems are available for the MMPI and the MMPI-2 (Fowler, 1985; Butcher, 1987). The Minnesota Report™ (Butcher, 1993) is the best. This system generates a very cautious and methodical 16-page report that includes discussion of profile validity, symptomatic patterns, interpersonal relations, diagnostic considerations, and treatment considerations. The Minnesota Report™ also provides a variety of figures and tables to illustrate test results.

The adequacy of computerized MMPI-2 narrative reports is generally good, but the reader should realize that computer programs are written by fallible human beings. There is a danger that computer-generated test reports will be erroneous. Furthermore, some less-reputable interpretive
systems can be purchased on microcomputer diskette for a few hundred dollars. This increases the risk that computer-based test interpretations will be misused by unqualified persons. We discuss the pitfalls of computerized test interpretation in the final chapter of the book.

Technical Properties of the MMPI-2  From the standpoint of traditional psychometric criteria, the MMPI-2 presents a mixed picture. Reliability data are generally positive, with median internal consistency coefficients (alpha) typically in the .70s and .80s, but as low as the .30s for some scales in some samples. One-week test–retest coefficients range from the high .50s to the low .90s, with a median in the .80s (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). These are good figures considering that some attributes, such as those measured by the Depression scale, change so quickly that the test–retest methodology is of questionable suitability.

A shortcoming of the MMPI-2 is that intercorrelations among the clinical scales are extremely high. For example, in the case of scales 7 and 8, the Psychasthenia and Schizophrenia scales, the correlation is commonly in the .70s. In part, this reflects the item overlap between MMPI scales: scales 7 and 8 share 17 items in common. But it is also true that the criterion-keyed approach is not well suited to the development of independent measures. A high intercorrelation of basic scales is one price to be paid for using this test development strategy.

The validity of the MMPI-2 is difficult to summarize, owing to the sheer volume of research on this instrument and its predecessor, the MMPI. As of 1975, over 6,000 studies employing the MMPI had been completed (Dahlstrom, Welsh, & Dahlstrom, 1975). Of course, thousands of additional studies have been published since then. Graham (1993) provides a brief but excellent review of validity studies on the MMPI/MMPI-2. He notes that the average validity coefficient for MMPI studies conducted between 1970 and 1981 was a healthy .46. He also points out the confirming pattern of extratest correlates in dozens of studies of identified patient groups. Research also indicates that the MMPI-2 is highly comparable to the MMPI, for which a substantial body of validity data has been compiled (Hargrave, Hiatt, Ogard, & Karr, 1994). Finally, bias studies comparing MMPI-2 results for Caucasian and African American clients indicate that slight racial differences do exist in average profiles. However, these differences validly reflect emotional functioning; that is, the MMPI-2 is not racially biased (McNulty, Graham, Ben-Porath, & Stein, 1997). The MMPI-2 likely will maintain its status as the premier instrument for assessment of psychopathology in adulthood for many years to come.

In 2008, a new version of the MMPI-2 with reduced length and restructured scales was released (Ben-Porath & Tellegen, 2008; Tellegen & Ben-Porath, 2008). Because it embodies a restructured format (RF), the recent entry is called the MMPI-2-RF. This innovative test comprises 338 items carefully selected from the original 567 items of the MMPI-2, using modern psychometric methods for scale construction. Certainly the reduced length is a potential advantage. Patients often tire when completing the MMPI-2, and some find the experience tedious and onerous. Even so, the MMPI-2-RF constitutes a dramatic departure from the parent instrument and is therefore really a new test (Butcher, 2011). The utility of the MMPI-2-RF will rest upon accumulated research in the coming years.

Millon Clinical Multiaxial Inventory-III (MCMI-III)  The MCMI-III is a personality inventory designed for the same purposes as the MMPI-2, namely, to provide useful information for psychiatric diagnosis (Millon, 1983, 1987, 1994). The MCMI-III has two advantages over the MMPI-2. First, it is much shorter (175 true–false items) and, therefore, more palatable to clinical referrals; second, it is planned and organized to identify clinical patterns in a manner that is compatible with the Diagnostic and Statistical Manual (DSM-IV) of the American Psychiatric Association.

The MCMI-III is a highly theory-driven test, incorporating Millon’s elaborate theoretical formulations on the nature of psychopathology and personality disorder (Millon, 1969, 1981, 1986; Millon & Davis, 1996). The test includes 27 scales, listed in Table 8.7. The first 11 scales measure personality styles or traits such as narcissism and antisocial tendencies; the next three assess more severe personality pathology (schizotypal, borderline, and paranoid disorders); the following seven scales assess clinical syndromes such as anxiety and depression; the next three scales assess severe clinical syndromes such as thought disorder; the last three scales are validity (response style) indices. Scores on these scales (Disclosure, Desirability, and Debasement) are used to adjust the other scale scores upward or downward, based on defensiveness or exaggeration of symptoms, respectively.

Scale development for the MCMI-III and its precursors was careful and methodical. We can only portray the broad outline here, in which 3,500 initial items were culled to 175 statements in three stages of test development: a theoretical-substantive stage (theory-guided item writing), an internal-structural stage (item-scale correlations), and an external-criterion stage (contrast of diagnostic groups with the reference group). A special feature of the last stage was Millon’s use of general psychiatric patients instead of normal controls as the reference group. The purpose of this strategy was to enhance the capacity of MCMI scales to differentiate specific diagnostic groups from one another. Unfortunately, one side effect of this particular criterion-keyed approach was a rather substantial degree of item overlap for the clinical scales. Millon planned for and
expected the item overlap but probably did not anticipate that some pairs of scales on the MCMI would share the majority of their items in common. Some of this overlap was eliminated with the further refinement of the test for the second and third editions. The revised instrument also incorporates an item-weighting procedure. In this approach, individual questions are weighted 2 or 1 to reflect their importance in discriminating the prototype for each scale. The item-weighting approach has been criticized as unnecessary and unwieldy (Streiner, Goldberg, & Miller, 1993).

Table 8.7 Scales of the Millon Clinical Multiaxial Inventory-III

Clinical Personality Patterns            Clinical Syndromes
1   Schizoid                             A   Anxiety
2A  Avoidant                             H   Somatoform
2B  Depressive                           N   Bipolar: Manic
3   Dependent                            D   Dysthymia
4   Histrionic                           B   Alcohol Dependence
5   Narcissistic                         R   Post-Traumatic Stress Disorder
6A  Antisocial
6B  Aggressive (Sadistic)                Severe Syndromes
7   Compulsive                           SS  Thought Disorder
8A  Passive-Aggressive (Negativistic)    CC  Major Depression
8B  Self-Defeating                       PP  Delusional Disorder

Severe Personality Pathology             Validity (Modifying) Indices
S   Schizotypal                          X   Disclosure
C   Borderline                           Y   Desirability
P   Paranoid                             Z   Debasement

The normative sample for the MCMI-III consisted of about a thousand men and women patients from across the United States. This is an unusual and controversial approach to the collection of a normative sample. More typically, population-proportionate sampling of reasonably normal individuals is used. Millon offers the arguable justification that a patient sample is adequate for the normative sample because the base rates (in the general population) for specific personality and clinical disorders were consulted to calibrate the cutting points on the individual scales (Millon & Davis, 1996). But this approach is complex, experimental, and difficult to understand. The reliability of the individual scales is good: Internal consistency coefficients average .82 to .90, and test–retest coefficients for one week range from .81 to .87. Support for the validity of the MCMI-III is mixed (Haladyna, 1992; Piersma & Boes, 1997). Craig (1993) has assembled a series of articles that are largely supportive of the MCMI. Jankowski (2002) provides a beginner’s guide to the test.

Personality Inventory for Children-2 (PIC-2)  The PIC-2 (Lachar & Gruber, 2001) is a substantial revision of the PIC-R, a popular instrument that dates back to the late 1950s (Wirt & Broen, 1958; Wirt, Lachar, Klinedinst, & Seat, 1984). The current version, suitable for children 5 through 19 years of age, consists of 275 true–false statements that are completed by a parent or parental surrogate. The PIC-2 is one corner of a triad of instruments developed by David Lachar and colleagues to provide a comprehensive, multiview perspective on children’s emotional and behavioral adjustment in the home, school, and community. The complementary instruments are the Personality Inventory for Youth (PIY), which is filled out by the child, and the Student Behavior Survey (SBS), which is filled out by the teacher. We discuss only the PIC-2 here. Items on the PIC-2 resemble the following:

My child finds it difficult to fall asleep.
My child is a finicky eater.
My child has threatened to kill himself (herself).
Sometimes my child swears at other adults.
Our marriage has been full of turmoil.

The instrument also provides a shorter 96-item version known as the Behavioral Summary, suitable for screening and research purposes.

The test developers of the PIC-2 followed a complex multistage methodology to assign individual items to scales and subscales. The goal was to minimize content overlap between scales and subscales by examining preliminary item × subscale correlations and then retaining only those items for each specific subscale that showed high correlations. As a consequence of this test development strategy, each subscale possesses homogeneous content and the individual statements correlate substantially with one another. The resulting instrument consists of three response validity scales (Inconsistency, Dissimulation, Defensiveness) and nine adjustment scales. Each of the adjustment scales includes two or three subscales (Table 8.8).

Scale raw scores are converted to T scores with a mean of 50 and standard deviation of 10. Higher T scores indicate increased probability of psychopathology or deficit. Norms for children 5 through 19 years of age are based on a nationally representative sample of 2,306 parents of boys and girls in kindergarten through 12th grade.

With the possible exception of the three validity scales (Inconsistency, Dissimulation, and Defensiveness), the PIC-2 scale and subscale names are self-explanatory. The validity scales are (1) Inconsistency, which includes 35 similar pairs of items to determine consistency of responding; (2) Dissimulation, a 35-item scale designed to identify deliberate exaggeration (fake bad) about symptoms or random responding; and (3) Defensiveness, a 24-item scale consisting of improbable virtues (e.g., “my child never has any problems”) and therefore an index of naive defensiveness.
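
The conversion of raw scores to T scores with a mean of 50 and a standard deviation of 10, mentioned above for the PIC-2 and earlier for the MMPI-2, can be illustrated with a short sketch. It assumes a simple linear transformation and uses made-up normative values. The function name and the constants are hypothetical and are not taken from any test manual. Published instruments derive T scores from their own normative tables and may not use a purely linear transformation.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Convert a raw scale score to a linear T score (mean 50, SD 10).

    norm_mean and norm_sd are the raw-score mean and standard deviation
    of the normative sample for the scale in question.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical normative values for a single scale (illustration only).
NORM_MEAN = 12.0
NORM_SD = 4.0

# A raw score 1.5 standard deviations above the normative mean.
t = linear_t_score(18, NORM_MEAN, NORM_SD)
print(t)  # 65.0
```

In this illustration a raw score that sits 1.5 standard deviations above the normative mean maps to T = 65, and a raw score equal to the normative mean maps to T = 50.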
Table 8.8 Adjustment Scales and Subscales of the Personality Inventory for Children-2

Adjustment Scales                  Subscales
Cognitive Impairment               Inadequate Abilities
                                   Poor Achievement
                                   Developmental Delay
Impulsivity and Distractibility    Disruptive Behavior
                                   Fearlessness
Delinquency                        Antisocial Behavior
                                   Dyscontrol
                                   Noncompliance
Family Dysfunction                 Conflict among Members
                                   Parent Maladjustment
Reality Distortion                 Developmental Deviation
                                   Hallucinations and Delusions
Somatic Concern                    Psychosomatic Preoccupation
                                   Muscular Tension and Anxiety
Psychological Discomfort           Fear and Worry
                                   Depression
                                   Sleep Disturbance/Death Preoccupation
Social Withdrawal                  Social Introversion
                                   Isolation
Social Skills Deficits             Limited Peer Status
                                   Conflict with Peers

The reliability of PIC-2 scales and subscales is good, with test–retest values in the range of .82 to .92 and internal consistency coefficients in the range of .81 to .92. The test manual (Lachar & Gruber, 2001) summarizes a huge body of criterion-related validity studies such as correlations with independent ratings from clinicians. These correlations are very strong for similar behavioral dimensions (and weak for dissimilar behavioral dimensions), thus supporting the validity of individual scales and subscales. In like manner, PIC-2 subscale scores show theory-consistent relationships with the DSM-IV diagnostic categories of clinic-referred children. For example, 63 children independently diagnosed with Oppositional Defiant Disorder showed highly elevated scores (average T scores of 75 to 80) on the following PIC-2 subscales: Disruptive Behavior, Fearlessness, Dyscontrol, and Noncompliance. This is a perfect match to the major clinical features of this DSM-IV diagnostic category. Overall, the test developers have cited an impressive body of research that supports the reliability and validity of their instrument. Although independent studies of this test are yet to be published, it seems clear that the PIC-2 will earn wide usage in the behavioral and emotional assessment of school-aged children.

8.2.4: Behavioral Assessment

Behavioral assessment concentrates on behavior itself rather than on underlying traits, hypothetical causes, or presumed dimensions of personality. The many methods of behavioral assessment offer a practical alternative to projective tests, self-report inventories, and other unwieldy techniques aimed at global personality assessment.

Typically, behavioral assessment is designed to meet the needs of therapists and their clients in a quick and uncomplicated manner. But behavioral assessment differs from traditional assessment in more than its simplicity. The basic assumptions, practical aspects, and essential goals of behavioral and traditional approaches are as different as night and day. Traditional assessment strategies tend to be complex, indirect, psychodynamic, and often extraneous to treatment. In contrast, behavioral assessment strategies tend to be simple, direct, behavior-analytic, and continuous with treatment.

Behavior therapists use a wide range of modalities to evaluate their clients, patients, and subjects. The methods of behavioral assessment include, but are not limited to, behavioral observations, self-reports, parent ratings, staff ratings, sibling ratings, judges’ ratings, teacher ratings, therapist ratings, nurses’ ratings, physiological assessment, biochemical assessment, biological assessment, structured interviews, semistructured interviews, and analogue tests. In their Dictionary of Behavioral Assessment Techniques, Hersen and Bellack (1988) list 286 behavioral tests used in widely diverse problems and disorders in children, adolescents, adults, and the geriatric population. Dozens more are referenced in a more recent compendium (Hersen & Bellack, 1998). So that the reader can appreciate the diversity of techniques available, we provide a sampling of these tests in Table 8.9.

In recent years, a new form of behavioral assessment known as ecological momentary assessment has become increasingly popular. In ecological momentary assessment, the client carries a wireless handheld device similar to a personal digital assistant and responds in real time to preplanned inquiries from the researcher. This approach is designed to circumvent a number of limitations of traditional self-report techniques. We discuss ecological momentary assessment in more detail at the end of this chapter.

Behavioral assessment is often, but not always, an integral part of behavior therapy designed to change the duration, frequency, or intensity of a well-defined target behavior. For example, one therapy goal for a shy college student might be that she initiate a minimum of five conversations lasting two minutes or more each day. The therapist might recommend that she approach this goal incrementally, beginning with a few brief social exchanges before proceeding to lengthier conversations with strangers. In this example, behavioral assessment might take the form of self-monitoring in which the student uses a wristwatch for timing and a diary for keeping track of conversations.

As noted, behavioral assessment often exists in service of behavior therapy. In many cases, the nature of behavioral assessment is dictated by the procedures and goals of
behavior therapy. For this reason, the reader will better appreciate behavioral assessment tools if we interweave this topic with a discussion of behavior therapy methods.

Table 8.9 A Sampling of Behavioral Assessment Tests and Techniques

Abnormal Involuntary Movement Scale
Alcohol Dependence Scale
Assertiveness Self-Statement Test
Automatic Thoughts Scale
Behavioral Assessment of Satiety
Behavioral Pain Scale
Blood Alcohol Level
Body Sensation Questionnaire
Compulsive Activity Checklist
Conversational Skills Rating Scale
Current Dieting Questionnaire
Dementia Behavioral Assessment Test
Drinking Context Scale
Gifted Behaviors Rating Scale
Goal Attainment Scaling
Health Risk Attitude Scale
Irrational Beliefs Inventory
McGill Pain Questionnaire
Physical Activity and TV Viewing
Physical Fighting - Youth Risk Survey
Pittsburgh Insomnia Rating Scale
Prosocial Behaviors of Children
Rape Trauma Symptom Rating Scale
Scale for the Assessment and Rating of Ataxia
Scale of Sexual Experience
Six Minute Walk Test
Sleep Assessment Scale
Victimization in Dating Relationships

Behavior therapy, also called behavior modification, is the application of the methods and findings of experimental psychology to the modification of maladaptive behavior (Plaud & Eifert, 1998). The roots of behavior therapy can be traced to Skinner’s (1953) seminal book, Science and Human Behavior, which detailed the application of operant conditioning to the problems of human behavior. Skinner shunned any reference to private, nonobservable events such as thoughts or feelings; he emphasized the importance of identifying observable behaviors and methodically altering the environmental consequences of those behaviors.

Research by Wolpe (1958) on the systematic behavioral treatment of phobias also was influential in founding the methods of behavior therapy. Wolpe’s clinical procedures were derived from his laboratory work on the conditioning and counterconditioning of fear in cats. Like Skinner, Wolpe deemphasized the significance of thoughts and beliefs. He viewed fear as a learned phenomenon that could be unlearned by following a strict protocol of graduated exposure to the feared object or situation.

After Skinner, Bandura (1977), Mahoney and Arnkoff (1978), and Meichenbaum (1977) reintroduced cognitive factors into the ever-changing behavioral framework. For example, Bandura (1977) demonstrated that persons are perfectly capable of cognitively based learning. In particular, he showed that individuals can learn from mere observation of the response contingencies experienced by models. Since this learning occurs in the absence of personal consequences, it must be cognitively mediated. As a consequence of this paradigm shift, practically all modern-day behavior therapists concern themselves, at least to some extent, with the thoughts and beliefs of their clients. This new emphasis is reflected in a family of very popular treatment procedures known collectively as cognitive behavior therapy (Hofmann & Reinecke, 2010).

8.2.5: Behavior Therapy and Behavioral Assessment

At present, the specific techniques of behavior therapy can be classified into four overlapping categories (Johnston, 1986): exposure-based methods, cognitive behavior therapies, self-control procedures, and social skills training. Behavioral assessment is used in all of these approaches, as reviewed in the following sections. However, there are relatively few behaviorally based tools for the evaluation of social skills, so this category is not discussed. Readers who desire limited coverage of instruments for the behavioral evaluation of social skills training (including assertiveness) should consult Meier and Hope (1998).

Exposure-Based Methods  Exposure-based methods of behavioral therapy are well suited to the treatment of phobias, which include intense and unreasonable fears (e.g., of spiders, blood, public speaking). One approach to phobic avoidance is systematic exposure of the client to the feared situation or object. Wolpe (1973) favored gradual exposure with minimal anxiety in a procedure known as systematic desensitization. In this therapeutic approach, the client first learns total relaxation and then proceeds from imagined exposure to actual or in vivo exposure to the feared stimulus. Another exposure-based method is flooding or implosion in which the client is immediately and totally immersed in the anxiety-inducing situation.

The therapist needs some type of behavioral assessment to gauge the continuing progress of a client undergoing an exposure-based treatment for a phobia. In the simplest possible assessment approach, known as a behavioral avoidance test (BAT), the therapist measures how long the client can tolerate the anxiety-inducing stimulus.
Table 8.10 Example of a Fear Survey Schedule

Please check the column that best describes your current response to these situations or objects.

                             Degree to which you would be disturbed
                             Not at    Just a    Moderate    Very    Extremely
                             All       Little    Amount      Much    Bothered
Being in a strange place
Speaking in public
Walking into a party
Getting an injection
People watching me work
Large open spaces
Being fat
Spider on the wall
Cat in the room
Reprimand from the boss

NOTE: Most fear survey schedules consist of several dozen items.

Here is one classic example of a standardized BAT used to evaluate patients with agoraphobia, a disabling fear of open spaces often accompanied by panic attacks:

The standardized Behavioral Avoidance Test (BAT) was conducted a week after intake. All anxiolytics, antidepressants, or other psychotropic medication had been taken away at least 4 days before the test. The test was administered by the first author, who was blind to the patients’ diagnoses [and] not involved in the treatment. The patients were asked to walk alone as far as they could from the hospital along a mildly trafficated road that was 2 km long. The route was divided into eight intervals of equal length, and the patients rated their anxiety level on a 0–10 scale at the end of each interval. Uncompleted intervals were given a score of 10. An avoidance-anxiety score was computed by summing the anxiety scores for all intervals.
(Hoffart, Friis, Strand, & Olsen, 1994)

The researchers discovered that the avoidance-anxiety score from the BAT technique was strongly related to self-reports of catastrophic thoughts (e.g., choking to death, having a heart attack, acting foolish, becoming helpless).

This finding illustrates that behavioral assessment


approaches often encompass a cognitive component as Klieger and Franklin (1993) have raised a number of
well. Notice, too, the direct relationship between the goal cautions about the use of fear survey schedules in clinical
of therapy and the behavioral avoidance test. In agorapho- research. These authors note that reliability data for fear
bia, the primary treatment goal is to reduce patients’ anxi- surveys are almost nonexistent. A more serious problem
ety about walking alone in open spaces—which is exactly has to do with the validity of these instruments. Using the
what the BAT measures. Wolpe and Lang (1977) Fear Survey Schedule-III (FSS-III),
The BAT approach is predicated on the reasonable a highly respected and widely used schedule, Klieger and
assumption that the client’s fear is the main determinant of Franklin (1993) found no relationship between reported
behavior in the testing situation. Unfortunately, demand fears on the FSS-III and BAT measures of the same fears.
characteristics for desirable behavior may exert a strong For example, subjects who reported a high fear of blood on
influence on the client’s behavior. The client’s tolerance of the FSS-III were just as likely to approach a bloody white
the anxiety-inducing stimulus will bear some relationship towel and touch it as were subjects who reported no fear of
to experienced fear but also has much to do with the situa- blood. Similar results were found for subjects who feared
tional context of assessment (McGlynn & Rose, 1998). The snakes, spiders, and fire. The researchers concluded that
results of BAT assessments may not generalize, and the the FSS-III and similar instruments are a poor choice for
therapist must be wary of foreclosing treatment too soon. identifying experimental groups and a poor basis for meas-
A fear survey schedule is another type of behavioral uring the outcome of therapeutic interventions. The essen-
assessment useful in the identification and quantification tial downfall seems to be that fear survey schedules possess
of fears. Fear survey schedules are face valid devices that such “obvious” validity that few researchers have both-
require respondents to indicate the presence and intensity ered to evaluate the traditional psychometric characteris-
of their fears in relation to various stimuli, typically on a tics of reliability and validity. Fear survey schedules should
5- or 7-point Likert scale. Dozens of these instruments have be used with caution.
been published, including versions by Wolpe (1973), Ollen-
dick (1983), and Cautela (1977). Tasto, Hickson, and Rubin Cognitive Behavior Therapies  The one factor
(1971) used factor analysis to develop a 40-item survey that common to all cognitive behavior therapies is an empha-
yields a profile of fear scores in five categories. A generic sis on changing the belief structure of the client. The three
fear survey schedule is shown in Table 8.10. Fear survey best-known variants of cognitive behavior therapy are
schedules are often used in research projects to screen large Ellis’s (1962) rational emotive therapy (RET), Meichen-
samples of persons in search of subjects who share a com- baum’s (1977) self-instructional training, and Beck’s
mon fear. Another use of these schedules is to monitor (1976) cognitive therapy. Ellis postulates that most dis-
changes in fears, including those that have been targeted turbed behavior is caused by irrational beliefs, such as the
for clinical intervention. widespread belief that one must have the love and
Origins of Personality Testing 241

approval of all significant persons at all times. Ellis


attempts to alter such core irrational beliefs, primarily by Table 8.11 Questionnaire Measures of Cognitive Distortion
logical argument and forceful exhortation. Meichen-
baum’s self-instructional technique consists of teaching
the client to use coping self-statements to combat stressful
situations. For example, a college student suffering from
intense test-taking anxiety might be taught to use the fol-
lowing self-talk during examinations: “You have a strat-
egy this time…. Take a deep breath and relax…. Just
answer one question at a time….” Beck’s cognitive ther-
apy concentrates mainly on the role of cognitive distor-
tions in the maintenance of depression and other
emotional disturbances. Beck (1983) regards depression
as primarily a cognitive disorder characterized by the
negative cognitive triad: a pessimistic view of the world,
a pessimistic self-concept, and a pessimistic view of the
future. In therapy, he uses a gentle form of cognitive
restructuring to help the client perceive his or her prob-
lems in alternative, solvable terms.
Cognitive behavior therapists need not use formal
assessment tools in their clinical practice. Typically, these
therapists monitor the belief structure of their clients on
an informal session-to-session basis. Irrational and dis-
torted thoughts are challenged as they arise during ther-
apy. In the end, the client’s self-report of improvement
may constitute the main index of therapeutic success.
Nonetheless, several straightforward measures of cogni-
tive distortion are available. We have outlined a few
prominent instruments in Table 8.11. These instruments
are mainly research questionnaires suitable to the testing
of group differences, but not sufficiently validated for
individual assessment. Clark (1988) faults the developers
of cognitive distortion questionnaires for premature
release of their instruments. In particular, he notes the
absence of research on the concurrent and discriminant
validity of most self-statement measures. Another prob-
lem is that existing questionnaires were designed to vali-
date constructs in research and consequently do not work
well in clinical practice.
An exceptional and well-validated measure not
listed in Table 8.11 is the Beck Depression Inventory
(BDI). The BDI is a short, simple, self-report question-
naire that focuses, in part, on the cognitive distortions
that underlie depression (Beck & Steer, 1987; Beck, Ward,
Mendelson, Mock, & Erbaugh, 1961). One reason for its
popularity is that most patients can complete the 21
items on the BDI in 10 minutes or less. The test has been
widely used: More than 1,900 articles using the BDI have
been published (Conoley, 1992). A second edition of the
inventory was released in 1996 (Beck, Steer, & Brown,
1996). On the BDI-II, several items were revised so as to
bring the inventory into closer conformity with prevail-
ing diagnostic criteria for depression. The 21 items are of
the following form:
    Check the statement from this group that you feel is most true about you:

    0  I am upbeat about the future.
    1  I feel slightly discouraged about the future.
    2  I feel the future has little to offer for me.
    3  I feel that the future is utterly hopeless.

Thirteen items cover cognitive and affective components of depression such as pessimism, guilt, crying, indecision, and self-accusations; eight items assess somatic and performance variables such as sleep problems, body image, work difficulties, and loss of interest in sex. The examinee receives a score of 0 to 3 for each item; the total raw score is the sum of the endorsements for the 21 items; the highest possible score is 63.

In a meta-analysis of BDI research studies, the internal consistency of the scale (coefficient alpha) ranged from .73 to .95, with a mean of .86 in nine psychiatric populations (Beck, Steer, & Garbin, 1988). The BDI-II possesses excellent internal consistency with a coefficient alpha of .92 (Beck, Steer, & Brown, 1996). Test–retest reliability of the BDI is modest, with a range of .60 to .83 in nonpsychiatric samples and .48 to .86 in psychiatric samples. However, the test–retest methodology is not well suited to phenomena such as depression that are naturally unstable. Subjective depression fluctuates dramatically from week to week, day to day, even hour to hour. A lackluster value for test–retest reliability might signify valid change in the construct being measured rather than unwanted measurement error.

A variety of normative results are available, with BDI data for samples of patients with major depression, dysthymia, alcoholism, heroin addiction, and mixed problems. The manual also provides guidelines for degree of depression based upon BDI score (0 to 9, normal; 10 to 19, mild to moderate; 20 to 29, moderate to severe; 30 and above, extremely severe). These ratings are based upon clinical evaluations of patients.

The BDI has been extensively validated against other measures of depression and independent criteria of depression. For example, correlations with clinical ratings and with scales of depression, such as those from the MMPI, are typically in the range of .60 to .76 (Conoley, 1992). Sex differences are minimal, although there may be slight differences in the expression of depression between men and women (Steer, Beck, & Brown, 1989). In large college student samples of Whites (N = 838) and Blacks (N = 139), the BDI-II was found to be free of racial bias (Sashidharan, Pawlow, & Pettibone, 2012). Yet, in a comparison of 218 older patients (M = 69.4 years of age) versus 613 younger patients (M = 37.9 years of age), Kim, Pilkonis, Frank, Thase, and Reynolds (2002) found strong evidence of differential item functioning. Specifically, older patients tended to report fewer cognitive symptoms, especially for low to average levels of depression, and tended to report more somatic symptoms, especially for high levels of depression. The authors propose revised cut-off scores for the various levels of depression (mild, moderate, and severe) in older patients.

The BDI-II is particularly useful in primary care medical settings, where the presence of significant depression can be overlooked. Many patients are not aware of their illness, and some physicians may not be trained to examine for it. In a sample of 340 medical outpatients, Arnau, Meagher, Norris, and Bramson (2001) found that 23 percent of the group scored in the range indicative of mild, moderate, or severe depression on the test. The instrument proved helpful in identifying patients with depression who might otherwise be overlooked. Overall, the BDI-II was 92 percent accurate in identifying patients meeting the formal criteria for Major Depressive Disorder.

The only shortcoming of the BDI-II is its transparency. Patients who wish to hide their despair or exaggerate their depression can do so easily. However, for patients who are motivated to accurately report their cognitive and emotional status, the BDI-II ranks among the best instruments for indexing the presence and degree of depression. Some clinicians ask patients to complete the BDI-II after each therapy session; they use the BDI much as a physician might use a thermometer.

Self-Monitoring Procedures  A common misconception about behavior therapy is that it consists of authoritarian therapists applying powerful rewards and punishments to passive clients. Although this stereotypical model may be true for some impaired clients with limited behavioral repertoires, for the most part behavior therapy consists of humane practitioners teaching their clients methods of self-control. An emphasis upon self-monitoring is fundamental to all forms of behavior therapy. In self-monitoring, the client chooses the goals and actively participates in supervising, charting, and recording progress toward the end point(s) of therapy. According to this model, the therapist is relegated to the status of expert consultant.

Self-monitoring procedures are especially useful in the treatment of depression, a prevalent behavior disorder consisting of sad mood, low activity level, feelings of worthlessness, concentration problems, and physical symptoms (sleep loss, appetite disturbance, reduced interest in sex). Several self-monitoring programs for depression have been reported (Lewinsohn & Talkington, 1979; Rehm, Kornblith, O’Hara, & others, 1981). In order to illustrate the self-monitoring approach to the control of depression, we will summarize one small corner of the program advocated by Lewinsohn and his colleagues (Lewinsohn, Munoz, Youngren, & Zeiss, 1986).

Lewinsohn observed that depression goes hand in hand with a marked reduction in the experiencing of
pleasant events. Depressed persons retreat from engaging in pleasant activities; the behavioral withdrawal only contributes further to their depression, inciting a continuous downward spiral. Fortunately, it is possible to replace the downward spiral with an upward one. To help reverse the downward spiral of depression, Lewinsohn and his colleagues devised the Pleasant Events Schedule (PES; MacPhillamy & Lewinsohn, 1982). The purpose of the PES is twofold. First, in the baseline assessment phase, the PES is used to self-monitor the frequency (F) and pleasantness (P) of 320 largely ordinary, everyday events. Examples of the kinds of events listed on the PES include the following:

    reading magazines
    going for a walk
    being with pets
    playing a musical instrument
    making food for charity
    listening to the radio
    reading poetry
    attending a church service
    watching a sports event
    playing catch with a friend
    working on my job

The frequency and pleasantness of these everyday events are both rated 0 to 2.[6] The mean rate of pleasant activities is then calculated from the sum of the F × P scores; that is, mean rate = Σ(F × P)/320. Normative findings for mean F, mean P, and mean F × P are reported in Lewinsohn, Munoz, Youngren, and Zeiss (1986) and serve as a basis for treatment planning. Participants in the Lewinsohn program also monitor their daily mood on a simple 1 (worst) to 9 (best) basis.

The second use of the PES is to self-monitor therapeutic progress. Based on the initial PES results, clients identify 100 or so potentially pleasant events and strive to increase the frequency of these events, monitoring daily mood along the way. Clients who increase the frequency of pleasant events generally show an improvement in mood and other depressive symptoms.

The PES is a highly useful tool for clinicians who wish to implement a self-monitoring approach to the assessment and treatment of depression. MacPhillamy and Lewinsohn (1982) report favorably on the technical qualities of the PES and discuss a variety of rational, factorial, and empirical subscales, which we cannot review here. The instrument has fair to good test–retest reliability (one-month correlations in the range of .69 to .86), excellent concurrent validity with trained observers, and promising construct validity. In general, the subscales behave as one would predict on the basis of the constructs they purport to measure; we refer the reader to MacPhillamy and Lewinsohn (1982) for details.

8.2.6: Structured Interview Schedules

An important responsibility for many mental health practitioners is to determine a proper psychiatric diagnosis for their patients, within prevailing guidelines. Almost without exception, practitioners utilize the Diagnostic and Statistical Manual of Mental Disorders, now in its fourth edition (DSM-IV; APA, 2000). The latest version includes a “Text Revision” and for this reason is known technically as DSM-IV-TR. Here we use the less cumbersome acronym DSM-IV. DSM-V is scheduled for release in 2013.

Five axes are included in the DSM-IV classification. Axis I concerns clinical disorders such as Alcohol Use Disorder, Panic Disorder, Major Depressive Disorder, or Schizophrenia. Axis II pertains to personality disorders such as Borderline Personality Disorder, Avoidant Personality Disorder, or Dependent Personality Disorder. Axis III is employed to identify general medical conditions (e.g., hypothyroidism, heart disease) that may bear upon psychological adjustment. Axis IV is for reporting psychosocial and environmental problems (e.g., loss of friends, unemployment, litigation, no health insurance) that may impact personal functioning. Axis V consists of an anchored rating scale, the Global Assessment of Functioning (GAF) Scale, used to assign a summary score of functioning from 1 (e.g., immobilized, suicidal) to 100 (e.g., thriving, sought out). Of course, intermediate scores are available and clearly operationalized. For example, a GAF score of 70 indicates some mild symptoms but generally good psychological functioning.

Diagnosis is construed by some people as a form of pointless, overconfident pigeonholing. In truth, it serves a number of indispensable functions. As outlined by Andreasen and Black (1995), these key purposes include:

• Reducing the complexity of clinical phenomena
• Facilitating communication between clinicians
• Predicting the outcome of the disorder
• Deciding on an appropriate treatment
• Assisting in the search for etiology
• Determining the prevalence of diseases worldwide
• Making decisions about insurance coverage

[6] The Frequency Scale is calibrated as follows:
    0: This has not happened in the past 30 days.
    1: This has happened a few times (1 to 6 times) in the past 30 days.
    2: This has happened often (7 times or more) in the past 30 days.
The Pleasantness Scale is calibrated as follows:
    0: This was not pleasant.
    1: This was somewhat pleasant.
    2: This was very pleasant.
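
As a rough illustration of the PES scoring rule (frequency and pleasantness each rated 0 to 2, with the mean rate taken as the sum of the F × P products divided by the 320 listed events), the arithmetic can be sketched in Python. The names `PesRating` and `mean_rate` are hypothetical and are not part of the published instrument, and the sketch assumes that any event left unrated contributes zero to the sum.

```python
from dataclasses import dataclass

PES_ITEM_COUNT = 320  # number of events listed on the PES, per the text


@dataclass
class PesRating:
    """One self-monitored event (hypothetical record layout)."""
    event: str
    frequency: int     # 0, 1, or 2 on the Frequency Scale
    pleasantness: int  # 0, 1, or 2 on the Pleasantness Scale


def mean_rate(ratings, item_count=PES_ITEM_COUNT):
    """Return the sum of F x P cross-products divided by item_count."""
    for r in ratings:
        if r.frequency not in (0, 1, 2) or r.pleasantness not in (0, 1, 2):
            raise ValueError(f"ratings must be 0, 1, or 2: {r}")
    return sum(r.frequency * r.pleasantness for r in ratings) / item_count


# Three made-up ratings; all other events are assumed to be rated zero.
sample = [
    PesRating("going for a walk", 2, 2),
    PesRating("reading poetry", 1, 2),
    PesRating("listening to the radio", 0, 1),
]
print(mean_rate(sample))                # cross-products 4 + 2 + 0, over 320
print(mean_rate(sample, item_count=3))  # same sum averaged over 3 items only
```

Dividing by the full item count keeps scores comparable across clients regardless of how many events each one actually experienced; the optional `item_count` argument is included only to make the arithmetic easy to check by hand.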
Yet, for all of its advantages, there are also problems with DSM-IV. One problem is the sheer amount of time it can take to determine a multiaxial diagnosis. A second and related difficulty is that, although the DSM-IV textbook describes the diagnostic categories and alternatives with great precision, it does not specify a coherent method for arriving at the diagnosis. A third problem flows from the previous two, namely, psychiatric diagnosis is mixed in its reliability (Andreasen & Black, 1995). Interrater agreement for some diagnoses is very high (e.g., Alcohol Use Disorder) but for other diagnoses it is only moderate to low (e.g., Borderline Personality Disorder).

Several interview schedules have been developed to reduce the time needed for diagnosis and also to improve the reliability of the enterprise by standardizing the procedures. Broadly speaking, these instruments are of two types: semistructured approaches that allow for some clinician leeway in follow-up questioning, and structured approaches that mandate a completely scripted approach. Here we will describe two prominent schedules to illustrate this important form of psychological assessment.

The Schedule for Affective Disorders and Schizophrenia (SADS; Spitzer & Endicott, 1978) is a highly respected diagnostic interview for evaluating Axis I mood and psychotic disorders. The SADS is a semistructured inquiry that includes standard questions asked of all patients and optional probes used to clarify patient responses (Rogers, Jackson, & Cashel, 2004). Additional unstructured questions can be asked to augment the optional probes. Part I of the SADS methodically examines Axis I symptoms for the current episode, including the worst period and the current week, whereas Part II provides a survey of past episodes. Through a progression of questions and criteria, the interviewer solicits sufficient information to assess the severity of disturbance and also to elucidate the diagnosis. For example, one item on the SADS addresses prominent signs of depression: pessimism and hopelessness. A standard inquiry for this item might be: “Have you felt discouraged?” An affirmative answer would trigger optional probes such as “How do you see things working out?”

Rogers (2001) has reviewed the voluminous research on reliability and validity of the SADS and offers an encouraging endorsement of the instrument. For example, the consensus from over 21 studies is that the interrater reliability for specific diagnoses is typically strong, with median kappa coefficients of greater than .85. Kappa is the index of interrater agreement, corrected for chance (Cohen, 1960). Validity for the SADS also is robust with moderate predictive validity (e.g., results moderately predict the course and outcome of mood disorders) and strong concurrent validity (e.g., results correlate with other similar schedules). A child’s version of the schedule, known as the “kiddie” SADS or K-SADS, also is available (Ambrosini, 2000).

Table 8.12 Average SCID Interrater Agreement for Psychiatric Diagnosis

Axis I Diagnoses                        Weighted Kappa
Major Depressive Disorder               79
Dysthymic Disorder                      63
Bipolar Disorder                        77
Schizophrenia                           80
Alcohol Dependence/Abuse                90
Other Substance Dependence/Abuse        86
Panic Disorder                          75
Social Phobia                           63
Obsessive Compulsive Disorder           53
Generalized Anxiety Disorder            66
Post-Traumatic Stress Disorder          89
Somatoform Disorder                     41
Eating Disorder                         71

Axis II Personality Disorders           Weighted Kappa
Avoidant                                64
Dependent                               66
Obsessive Compulsive                    56
Passive-Aggressive                      67
Self-Defeating                          62
Depressive                              65
Paranoid                                68
Schizotypal                             70
Schizoid                                76
Histrionic                              64
Narcissistic                            74
Borderline                              62
Antisocial                              72

NOTE: Decimals omitted.
Source: Average results for multiple studies reported on the SCID website (www.scid4.org).

Finally, we would be remiss not to mention a family of instruments known as SCID, the Structured Clinical Interview for DSM-IV (First & Gibbon, 2004). SCID comes in numerous editions and variations, including SCID-I for Axis I diagnoses, SCID-II for Axis II diagnoses, SCID-P for determining the differential diagnosis of psychotic symptoms, and SCID-NP for nonpatient settings in which a current psychiatric disorder is unlikely. All of the forms follow the same format in which the interviewer reads the SCID questions to the client in sequence, the objective being to elicit sufficient information to determine whether individual DSM-IV criteria are met. The interviewer has the leeway to ask for specific examples of affirmative answers. Thus, SCID is a semistructured interview. A logical flow sheet is followed to determine the appropriate diagnosis. The SCID reveals generally good interrater agreement for DSM-IV diagnosis, but this is variable from one diagnosis to the other. In Table 8.12, we have summarized the average
kappas from multiple studies of SCID reliability. Kappa values above .70 are considered good agreement, values from .50 to .69 are deemed fair, and values below .50 indicate poor agreement.

8.2.7: Assessment by Systematic Direct Observation

Although not a prominent approach with adults, systematic and direct observation is widely used in the evaluation of children, especially by psychologists who work in school systems. In fact, Wilson and Reschly (1996) determined that systematic observation is the single most commonly used assessment method among school-based practitioners, who reported an average of more than 15 student behavioral observations per month.

It is essential to distinguish systematic, direct observation from more casual approaches such as naturalistic observation. Anyone can engage in the informal and anecdotal methods that characterize naturalistic observation, and most people do so every day. These methods typically culminate in formless conclusions such as “Johnny seems to be out of his seat a lot during the school day.” In contrast, systematic and direct observation is highly structured and set apart by five characteristics (Hintze, Volpe, & Shapiro, 2002; Salvia & Ysseldyke, 2001):

1. The goal of observation is to measure specific behaviors.
2. The target behaviors have been operationally defined beforehand.
3. Observations are conducted under objective, standardized procedures.
4. The times and places for observation are carefully specified.
5. Scoring is standardized and does not vary from one observer to another.

This form of assessment is appealing because of its direct link to intervention. In fact, it is common to employ observational assessment before, during, and after an intervention to determine the impact on the individual student.

Commonly, systematic and direct observation is executed by means of an objective, structured coding system. Many different styles of coding systems have been proposed; we have space here only to illustrate a few popular methods. Sattler (2002) provides an extensive review, devoting two chapters to this topic. One straightforward approach is simple frequency counting of target behaviors. Typically, the target behaviors are undesirable behaviors such as a student leaving his or her seat, calling out, or being off task. Of course, the characteristics of these behaviors would be carefully specified in advance. Then an observer sits off to the side and unobtrusively records the frequency of each behavior within discrete time periods. The purpose of this kind of assessment is to objectify the extent of troublesome actions. This information serves as a baseline for later comparison to determine the effectiveness of any interventions. See Figure 8.4 for an example. In this hypothetical example, it is evident that the student “Sammy” is more out of control in the afternoon than the morning, which may be valuable information when it comes to remediation planning.

Figure 8.4 Example of a Frequency Recording Sheet

Date: November 10, 2005    Observer: Judy Jones
Student: Sammy Smith    Age: 8-5    Grade: 3

                 Target Behaviors
Time Period      Calling Out    Leaving Seat    Off Task
9:00–9:15        ××××           ××              ××××
9:15–9:30        ×××            ×××             ××
9:30–9:45        ×××            ×××             ××
9:45–10:00       ×              ××              ××
2:00–2:15        ×××××          ×××××           ××
2:15–2:30        ××××××         ××××            ×××××××
2:30–2:45        ×××××          ×××             ×××××××
2:45–3:00        ××××           ××××            ×××××××

Calling Out: Specific episodes of interrupting teacher, calling to classmates, making noise, yelling
Leaving Seat: Separate event such as standing without permission, leaving the seat, knees on seat
Off Task: Not doing assigned work (e.g., daydreaming, playing with objects, doing other work)

Another approach to systematic, direct observation is to record the duration of target behaviors. Typically, target behaviors are undesirable actions such as temper tantrums, social isolation, or aggressive outbursts, but the focus of assessment also may include desirable behaviors such as staying on task during a designated reading period or vigilantly working on a homework assignment (Hintze, Volpe, & Shapiro, 2002). For some behaviors, duration may be more important than frequency. Consider out-of-seat behavior as an example. A third grader who is out of his seat in a morning for six brief episodes of a few seconds each is far, far less problematic, both to self and others, than a student who leaves his seat once for 10 minutes. See Figure 8.5 for an example of a duration recording sheet. In this hypothetical example, it is evident that “Susan” exhibits a high level of undesirable behavior. The goal of intervention might be to reduce both the frequency and the average duration of her tantrum behaviors.
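
The summary figures on frequency and duration recording sheets reduce to simple tallies, which the following Python sketch illustrates. It uses the hypothetical values from Figures 8.4 and 8.5; the function names and the data layout are assumptions made for this illustration, not part of any published observation system, and the tally counts are transcribed from the marks on the frequency sheet.

```python
def parse_duration(text):
    """Convert a string such as '2 min 30 s' into a number of seconds."""
    parts = text.split()
    return int(parts[0]) * 60 + int(parts[2])


def format_duration(total_seconds):
    """Format a number of seconds as 'M min SS s' (rounded to whole seconds)."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes} min {seconds:02d} s"


def summarize_durations(episodes):
    """Return the episode count, total time, and average episode length."""
    seconds = [parse_duration(e) for e in episodes]
    total = sum(seconds)
    return {
        "episodes": len(seconds),
        "total": format_duration(total),
        "average": format_duration(total / len(seconds)),
    }


def total_incidents(rows):
    """Sum all tallies across time periods and target behaviors."""
    return sum(sum(row) for row in rows)


# Tantrum episodes for "Susan" (Figure 8.5).
susan = ["3 min 00 s", "2 min 30 s", "1 min 15 s", "4 min 30 s", "2 min 45 s"]
print(summarize_durations(susan))

# Tallies for "Sammy" (Figure 8.4), one row per 15-minute period:
# (calling out, leaving seat, off task).
morning = [(4, 2, 4), (3, 3, 2), (3, 3, 2), (1, 2, 2)]
afternoon = [(5, 5, 2), (6, 4, 7), (5, 3, 7), (4, 4, 7)]
print(total_incidents(morning), total_incidents(afternoon))
```

For the duration sheet, the computed total and average agree with the "Total" and "Average Episode" lines on the form. For the frequency sheet, the afternoon total is nearly double the morning total, consistent with the observation that the student is more out of control in the afternoon.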
Figure 8.5 Example of a Duration Recording Sheet

Date: November 10, 2005    Observer: Judy Jones
Student: Susan Brown    Age: 8-5    Grade: 3
Time Start: 9:00    Time Stop: 12:00

Tantrum Behavior        Elapsed Time in
Separate Incidents      Minutes and Seconds
1                       3 min 00 s
2                       2 min 30 s
3                       1 min 15 s
4                       4 min 30 s
5                       2 min 45 s
Total:                  14 min 00 s
Average Episode         2 min 48 s

In addition to the individualized forms of direct observation that we have illustrated here, dozens of published forms also are available (e.g., Sattler, 2002). For these instruments, the categories of observation and the operational definitions are prespecified, which saves time for the practitioner. For example, Shapiro (1996) has issued the Behavior Observation of Students in Schools (BOSS), a straightforward form that consists of six categories of classroom behavior, five designed for students and one for the teacher. The BOSS classifies behaviors as active engagement, passive engagement, off-task motor, off-task verbal, and off-task passive. Of course, these categories are thoroughly defined in operational terms. Direct instruction by the teacher also is recorded. The BOSS is rated in 15-second intervals over a 15-minute period. The instrument also allows for the collection of behavioral norms for classmates to determine normative patterns in each category.

Although direct observations offer the utmost simplicity in format, it is important to recognize a number of threats to reliability and validity for this genre of assessment (Baer, Harrison, Fradenburg, Petersen, & Milla, 2005). Sattler (2002) has catalogued the sources of unreliability, which include personal qualities of the observer, poor design of instruments, and problems in obtaining a representative sample of behavior. For example, observer drift occurs when an observer becomes fatigued and less vigilant over time, thus failing to notice target behaviors when they occur. Expectations also can influence ratings, such as when the observer has been told that a child is aggressive and then records questionably aggressive acts as aggressive. The primary antidote to observer inaccuracy is careful training and cross-checking of one observer against another to demonstrate a high level of interrater agreement. With regard to poor design of instruments, the most common error is coding complexity, in which there are too many categories or ill-defined categories. Attention to design of rating scales and pretesting of instruments will avert this problem. Problems also can arise in the suitable sampling of behavior. For example, if a child’s attentional difficulties mainly arise in the afternoon, clearly it is pointless to collect data only in the morning. Ratings should be collected throughout the day or, if this is not possible, during the most salient time periods.

8.2.8: Analogue Behavioral Assessment

The methods of analogue behavioral assessment are closely related to the methods of systematic, direct observation. The main difference has to do with the settings in which the observations occur. In systematic, direct observation, the assessment of clients takes place in a natural setting such as a classroom. In analogue behavioral assessment, clients are observed in a contrived but plausible setting and also are instructed to engage in relevant tasks designed to elicit behaviors of interest (Haynes, 2001). The goal is to create a state of affairs analogous to pivotal situations in real life; hence, the use of the word analogue in describing this form of observational assessment.

Perhaps some examples will help clarify the nature and scope of this approach. One application of analogue behavioral assessment is the evaluation of children referred for assessment of behavior or school problems (Mori & Armendariz, 2001). A specialist who works with these children could dedicate a separate room in his or her clinic to analogue behavioral assessment. The room might resemble a small classroom, complete with blackboard, a few student desks, and bookcases. The referred child would be given a realistic homework assignment and told to work on it for 30 minutes while waiting for the interview. The psychologist then observes through a one-way window and records relevant behaviors using a suitable rating scale.

Analogue behavioral assessment also can be used to evaluate parent–child interactions. For example, in evaluating a 3-year-old referred for behavior problems, the clinician might place the parent and child in a room full of toys with instructions to play for 10 minutes. The psychologist then instructs the parent to tell the child, “Okay, it’s time to go. You have to pick up the toys just like you do at home.” The clinician observes through a one-way window and codes both the parental management style and the nature and degree of child compliance.

In like manner, analogue behavioral assessment has been used in the assessment of adult couples, including husbands and wives seeking marital therapy (Heyman, 2001). In a standard paradigm, the clinician asks the couple to discuss two conflict areas for 5 to 7 minutes each. The
clinician sits to the side observing the interactions and recording communication patterns with a standard form such as the Rapid Couples Interaction Scoring System (RCISS; Krokoff, Gottman, & Hass, 1989). The RCISS consists of 22 codes that address speaker and listener behaviors, both verbal and nonverbal, in such categories as criticism, disagreement, compromise, positive solution, questioning, humor, and smiling. Instruments of this genre typically do not reveal strong interrater agreement for specific constructs (e.g., put-downs), but the more inclusive constructs such as positive affect versus negative affect fare better and provide information that is helpful in characterizing communication patterns (Heyman, 2001). There are little or no data on the test–retest reliability of the RCISS or similar instruments, and some researchers advise caution in their use. For example, King (2001) faults the RCISS because it does not deal adequately with issues of subtext or “reading between the lines” in couples’ communication.

8.2.9: Ecological Momentary Assessment
Recent advances in wireless connectivity have spawned an entirely new approach to assessment known as ecological momentary assessment (EMA). Ecological momentary assessment is defined as the “real-time measurement of patient experience in the real world, at the point of experience” (Shiffman, Hufford, & Paty, 2001). Consider the research problem of determining whether a new drug treatment is effective in ameliorating the severe pain of migraine headaches. Whereas previous research methods relied upon retrospective questionnaire reports of patients receiving a new drug treatment, an EMA approach instead would consist of patients reporting their instantaneous experiences on a handheld device, with responses immediately transmitted (via the same wireless technology used by cell phones) to a central computer for ultimate analysis with sophisticated software. For example, the handheld device might “beep” to signal that the patient should immediately respond (on a touch-sensitive screen) to a series of rating scales for pain, mood, fatigue, and other relevant dimensions. The entire self-rating procedure might take less than a minute. The ratings would be requested several times a day on a randomized schedule.

Because EMA responses of clients are immediate and based on a schedule determined by the researcher, several biases of human recall are avoided. For example, consider the biasing effects of saliency, in which emotionally charged events dominate recall. For instance, a very brief episode of severe migraine pain may be recalled as lasting much longer than the actual experience because of the emotional valence of the incident. Whereas a retrospective questionnaire report of this pain would be affected by the salience of the event, an EMA analysis, with periodic real-time sampling of the actual pain experiences, would provide a more accurate portrayal of the episode. Recency is another recall bias that is circumvented by EMA. The recency bias refers to the fact that people are more likely to recall recent events than remote events. Potentially, this could lead to underestimation of the therapeutic effects of a drug if retrospective recall coincided with the onset of symptoms. In contrast, with an EMA analysis, client reporting consists of periodic and instantaneous time samples; the results are relatively unaffected by the recency bias.

In general, EMA provides a more accurate and reliable approach to the assessment of patient experience than traditional approaches such as retrospective questionnaires. One advantage is that compliance cannot be faked (as when patients fill out a week’s worth of daily questionnaires minutes before handing them in to the researcher). In fact, because EMA approaches are highly user-friendly, researchers report an astonishing overall compliance of 93 to 99 percent averaged across many studies (Shiffman et al., 2001). EMA has been used in research into treatments for acute pain, alcoholism, arthritis, asthma, depression, eating disorders, headaches, hypertension, gastrointestinal disorders, schizophrenia, smoking, and urinary incontinence (Shiffman & Hufford, 2001; Shiffman, Hufford, Hickcox, and others, 1997; Smyth, Wonderlich, Crosby, and others, 2001). As EMA technology becomes streamlined and more affordable, we can expect this new technique to become commonplace in psychological outcome studies with clients.

In addition to practical applications in health care research, the EMA methodology also can be used to test psychological theories, as illustrated by a recent study of emotions. Tong, Bishop, Enkelmann, and others (2005) enlisted the cooperation of 118 police officers in Singapore to wear an ambulatory blood pressure monitor during their work day. This device also beeped at random about every 30 minutes, a signal that the officer should fill out a simple 12-item questionnaire in a palmtop as soon as possible. The items, rated on 5-point scales, included topics such as:

• How pleasant is this event?
• To what extent are you getting what you expected?
• How much personal effort is needed to deal with it?
• How much control do you have over the event?

With practice, it would take less than a minute to fill out a questionnaire of this nature. Of course, the added advantage of the EMA approach is that data are collected in naturalistic settings in real time, and, therefore, not prone to biases in recall.

In some cases, EMA provides for insights that would be difficult to achieve with any other research methodology.
Consider the common belief that binge eating is maintained because it reduces negative affect, which is known as the affect regulation model (Polivy & Herman, 1993). Put simply, this is the view that people binge on food because they feel bad, and bingeing helps them feel better, at least in the short run. Because retrospective reports are notoriously untrustworthy, researchers prefer more immediate access to personal experiences in real time. Fortunately, when EMA is used with large samples of binge eaters, it is inevitable that some of the randomly requested mood reports will occur just before and just after episodes of binge eating. In a meta-analysis of 36 EMA studies including 968 participants, Haedt-Matt and Keel (2011) found that negative affect increased prior to episodes of binge eating. But they also discovered that negative affect continues to increase afterward, which fails to support a key prediction of the affect regulation model.

Chapter Quiz: Origins of Personality Testing
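
The randomized prompting schedule described for EMA in Section 8.2.9 (ratings requested several times a day at unpredictable moments) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ema_prompt_times`, the 9:00 to 21:00 waking window, and the six prompts per day are hypothetical choices, not details taken from any of the cited studies.

```python
import random


def ema_prompt_times(n_prompts, start_min, end_min, rng):
    """Return n_prompts distinct, sorted prompt times in minutes after midnight.

    Times are drawn uniformly at random (without replacement) from the
    half-open window [start_min, end_min), so a respondent cannot
    anticipate when the next "beep" will arrive.
    """
    if n_prompts > end_min - start_min:
        raise ValueError("window too short for the requested number of prompts")
    return sorted(rng.sample(range(start_min, end_min), n_prompts))


# Hypothetical settings: six prompts between 09:00 and 21:00.
rng = random.Random(0)  # seeded only so the sketch is reproducible
times = ema_prompt_times(6, 9 * 60, 21 * 60, rng)
for t in times:
    print(f"{t // 60:02d}:{t % 60:02d}")
```

A real study would likely add constraints this sketch omits, such as a minimum gap between prompts or separate schedules per participant.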
Chapter 9
Assessment of Normality
and Human Strengths
Learning Objectives
9.1 Review the qualities of several tests and discuss their strengths and weaknesses
9.2 Explain positive psychological assessment

9.1: Assessment Within the Normal Spectrum

9.1 Review the qualities of several tests and discuss their strengths and weaknesses

In the previous chapter we surveyed tests used by psychologists to evaluate clients for a range of symptoms and life difficulties. These instruments included the mainstays of the profession such as the MMPI-2, MCMI-III, Rorschach, and TAT. Such tests might be referred to as “clinical” in nature, because they are well suited to the needs of clinical practice. But what are practitioners to do if they want to evaluate someone who is reasonably normal? In other words, assessment does not always entail delving into symptoms, distress level, defense mechanisms, diagnosis, and the like. One example might be a young executive who wants to know about “growth edges” in regard to leadership positions. Another example might be a college student who desires self-knowledge as part of vocational explorations.

Even though clinical tests such as those surveyed in the previous chapter can be employed within the normal spectrum, they do not excel in this application. In fact, the evaluation of normal personality was not the original purpose of tests such as the MMPI or the Rorschach. For example, the initial objective of the MMPI-2 was the diagnosis of psychopathology, which remains the most dominant and effective application of the instrument. Historically, the purpose of the Rorschach has been described by Frank (1939) and others as providing an “X-ray of the mind” to identify themes hidden away from ordinary observation. Currently, the most common application of the test is with clients who display complex psychological symptoms that do not fit neatly into the categories of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV).

When a practitioner wants to assess personality within the normal spectrum, tests designed expressly for that purpose typically provide a more helpful perspective than instruments developed from the standpoint of psychopathology. Instead of measuring concepts such as depression, paranoia, anxiety, narcissism, or suicide potential, the focus in these alternative instruments is on qualities pertinent to the normal range of human functioning. We are referring here to features like responsibility, social presence, intuition, locus of control, attachment style, or faith maturity. This chapter investigates an assortment of instruments suitable for assessment within the normal continuum and beyond.

Normality differs from abnormality by shades of gray rather than revealing a sharp demarcation (Offer & Sabshin, 1966). Understanding the various definitions of normality would involve a lengthy detour; we do not pursue the topic here. In their comprehensive textbook of psychiatry, Sadock and Sadock (2004) provide an excellent overview. Our goal here is to focus on useful tests and measures, including some that have been neglected because of the emphasis on psychopathology within the field of clinical psychology.

In Module 9.1, Assessment Within the Normal Spectrum, we explore the qualities of several tests and discuss their strengths and weaknesses. We feature a few widely used scales in this topic, including the venerable Myers-Briggs Type Indicator (Myers & McCaulley, 1985), one of the most widely employed personality tests of all time, and the California Psychological Inventory (Gough & Bradley, 1996), a measure with strong empirical roots.

In addition to their value in the assessment of client personality, tests also contribute to our understanding of

both typical and atypical trajectories of personality across the life span. For this reason, we follow a key research issue in personality psychology, namely, whether personality remains stable or tends to shift in specific directions with age. We close the topic with an evaluation of tools for assessing spiritual and religious constructs.

Other forms of assessment pertinent to the normal spectrum of adult functioning also are covered in Module 9.1. We are referring here to the evaluation of spiritual, religious, and moral constructs. These specialized forms of assessment have received an increasing amount of attention in recent years.

In Module 9.2, Positive Psychological Assessment, we examine a number of relatively new scales that have emerged in response to a reawakening of interest in human potential, an interest that has remained largely dormant in psychology since the early 1900s (Seligman & Csikszentmihalyi, 2000). A special focus in this topic is the assessment of creativity.

9.1.1: Broad Band Tests of Normal Personality
A broad band test is one that measures the full range of functioning, as opposed to limited aspects. Beginning in the 1940s, researchers sought to capture the nuances of normal personality by developing broad-band self-report instruments. The sheer variety of approaches to this task is a testament to the complexity of human functioning. An enduring question, related to the previous topic on theories of personality, is how best to conceptualize the multifaceted notion of personality. For example, is personality best construed as a limited number of types, with most people resembling one type or another with reasonable precision? Or, is personality best interpreted as several dimensions, with each unique individual revealing a specific level of each dimension? If a dimensional approach is preferred, how many dimensions are needed to describe the array of human responses: 5, 16, 20, or more?

There are no definitive answers to these questions, although dimensional approaches generally have prevailed over typological methods in the history of test development. Even so, useful and popular typological approaches do exist. In fact, we begin the discussion of broad-band tests with an instrument that flexibly permits both a typological and a dimensional approach to the understanding of normal personality.

9.1.2: Myers-Briggs Type Indicator (MBTI)
Originally published in 1962, the MBTI is a forced-choice, self-report inventory that attempts to classify persons according to an adaptation of Carl Jung’s theory of personality types (Myers & McCaulley, 1985; Tzeng, Ware, & Chen, 1989). As discussed below, recent adaptations of the test also provide dimensional scores in addition to the well-known four-letter typological codes.

According to the publisher, the MBTI is the most widely used individual test in history, taken by approximately 2 million people a year. Proponents of the instrument deem it valuable in vocational guidance and organizational consulting. It comes in a number of versions, including Form M, a 93-item test which can be purchased by qualified psychologists in a self-scoring paper-and-pencil format, or administered on-line. Other forms such as the 126-item Form G and the 144-item Form Q are available on-line and must be authorized by a psychologist who has agreed to a licensing arrangement with the publisher, Consulting Psychologists Press (www.cpp.com).

Regardless of the version employed, the MBTI is scored on four theoretically independent polarities: Extraversion–Introversion, Sensing–iNtuition, Thinking–Feeling, and Judging–Perceiving. The test-taker is categorized on one side or the other of each polarity, which results in a four-letter code such as ENTJ (Extraversion, iNtuition, Thinking, Judging). Because there are two poles to each of the four dimensions, this allows for 2^4, or 16, different personality types. Each of the 16 types has been studied extensively over the years.

The four polarities (E-I, S-N, T-F, J-P) do not necessarily correspond to common understandings of the anchor terms and hence require some explanation. It is also important to note that the concepts are intended to be value-neutral and merely descriptive. Thus, it is neither better nor worse to manifest Extraversion or Introversion. Likewise, Thinking and Feeling are simply different modalities and one is not better than the other, and so forth. The opposite ends of each polarity are simply different modes of being that may have a variety of implications for relationships, vocation, leadership, and personal functioning. Possessing the qualities of one polarity or the other may be advantageous (or not) in different situations.

Extraversion–Introversion is probably the easiest to describe. An extravert (E) directs energy outward to people and conversations, whereas an introvert (I) directs energy inward to his or her inner world. A note of clarification: The MBTI retains the original spelling of Extraversion, preferred by Jung, instead of using the synonymous concept of Extroversion, preferred by contemporary psychologists. Sensing–iNtuition involves two opposite ways of perceiving. Those who prefer sensing (S) rely on the immediate senses, whereas those who prefer intuition (N) rely upon “relationships and/or possibilities that have been worked out beyond the reach of the conscious mind” (Myers & McCaulley, 1985). Of course, the letter N is used to designate intuition because the letter I already is taken to label Introversion. Thinking–Feeling refers to basing
conclusions on thinking (T), that is, logic and objectivity, as opposed to feeling (F), which involves a reliance on personal values and social harmony. Finally, Judging–Perceiving indicates a preference for decisiveness and closure (J) or an open-ended flexibility and spontaneity (P). Whereas in common parlance the notion of “judging” often has a negative connotation, this is not the case when the term is applied to this polarity of the MBTI.

The 16 possible four-letter types are not equally represented in the general population, and some types are more common in specific occupational groups. For example, in a sample of 231 education graduate students from a Midwestern university, the ENFP type was by far the most common (N = 43), followed by ENFJ (N = 28) in frequency. Codes beginning with the letter E (Extraversion) constituted nearly two-thirds of this sample, which highlights the importance of Extraversion in the field of education. Paraphrasing from Myers and McCaulley (1985, p. 78), the work expectations for someone who embodies the ENFP type are as follows:

• prefers to work interactively with a succession of people away from the desk
• likes to work with a succession of new problems to be solved
• prefers to provide service that is appreciated
• likes to work in changing situations that require adaptation

These qualities align well with the role expectations for people heading into the field of education.

Standardization data for the MBTI are extensive and based on large samples collected over many decades (Myers & McCaulley, 1985). One particularly useful table is a list of occupations empirically attractive to the sixteen types. For example, 18 percent of attorneys are INTJ in type, whereas only 2 percent of elementary school teachers fit this code. This is useful information for clients who take the test in search of personal or career guidance. Split-half reliabilities for the four scales are in the .80s for the combined subject pool of nearly 56,000 participants. Test–retest reliabilities for the four scales are somewhat lower and depend on the interval between tests. When the interval is short, on the order of a few weeks, results are strong, with coefficients mainly in the .70s and higher. Yet, when the interval is longer, on the order of several years, the coefficients are predictably lower, in the .40s and .50s. With regard to reliability, an important question with the MBTI is the stability of the four-letter code from test to retest. The test manual reports on a dozen studies of code type stability, with retest intervals ranging from 5 weeks to 5 years (most intervals a year or two). On average, about 41 percent of examinees retained their identical code type, that is, all four letters of the code remained the same from test to retest. About 38 percent of examinees remained stable on three of the four letters, that is, one letter changed for them. About 17 percent of examinees retained two of their four letters, but switched on the other two. And, 3 percent retained only one letter, switching on the other three. Overall, these are impressive results as to the long-term stability of the MBTI code types.

In a review of 17 studies reporting reliability coefficients, Capraro and Capraro (2002) found respectably strong reliability coefficients of .84 (E-I), .84 (S-N), .67 (T-F), and .82 (J-P). Salter, Forney, and Evans (2005) conducted an especially rigorous evaluation of MBTI reliability, looking at the stability of MBTI categories across three administrations with 231 graduate students in education. The three administrations were at the beginning of the first year, beginning of the second year, and end of the second year. Their report included extensive analyses, but of interest here is the percentage of respondents who received the same classification (e.g., Extraversion or Introversion) on all three occasions. The percentage who displayed complete consistency for each dimension was as follows:

• E-I 67%
• S-N 66%
• T-F 69%
• J-P 71%

Given the stringency of the reliability approach (agreement across three administrations), these are respectable findings.

More than 400 references citing the MBTI were found in PsycINFO from 2000 to 2009, many pertaining to the validity of the instrument. For example, in a study of 177 managers, Higgs (2001) reported a significant relationship between emotional intelligence and the dominant MBTI function of iNtuition. Emotional intelligence is monitoring emotions of self and others and using this information to guide thinking and actions (Mayer & Salovey, 1993). A positive relationship with MBTI iNtuition is strong support for the validity of this dimension.

Another recent study also provides support for the validity of the polarities assessed by the MBTI. Furnham, Moutafi, and Crump (2003) tested 900 adults with two instruments: the MBTI and the Revised NEO-Personality Inventory (NEO-PI-R, Costa & McCrae, 1992). The NEO-PI-R is a well validated measure of personality that evaluates five factors of personality known as the “big five.” These factors are Neuroticism, Extraversion, Openness (to experience), Agreeableness, and Conscientiousness. As predicted by the authors, the MBTI dimensions revealed healthy and appropriate correlations with corresponding factors from the NEO-PI-R. Specifically, the following averaged concurrent validity correlations were found between the MBTI dimensions and the NEO-PI-R scales: E-I correlated .71 with Extraversion; S-N correlated –.65 with Openness; T-F
correlated –.35 with Agreeableness; and, J-P correlated .46 with Conscientiousness. The negative correlations indicate an inverse relationship, that is, those categorized as S (Sensing) on the MBTI obtained low scores on Openness, whereas those categorized as N (iNtuition) obtained high scores on Openness. In like manner a T or Thinking type tended to obtain low scores on Agreeableness whereas an F or Feeling type tended to obtain high scores. All of these correlations are consistent with theoretical understandings of the MBTI and hence buttress the validity of the instrument.

As mentioned, recent versions of the MBTI yield additional information beyond the four-letter typological classification. For example, the 144-item form Q, available on-line, provides a highly detailed and sophisticated summary report that partitions each of the four polarities into five facet scores. Hence the report includes a total of 20 facet scores in addition to the four-letter code. For example, the Thinking-Feeling dimension includes bipolar facets such as Logical-Empathetic, Reasonable-Compassionate, and Tough-Tender. The dimensions and facets of this version of the MBTI are displayed in Table 9.1. The report includes not only the typological classifications (e.g., T or F) but also a rating for each bipolar facet on an 11-point continuum. This kind of nuanced dimensional information appeals to many users.

Table 9.1 Dimensions and Facets of the MBTI, Form Q

Extraversion (E)      (I) Introversion
Initiating            Receiving
Expressive            Contained
Gregarious            Intimate
Active                Reflective
Enthusiastic          Quiet

Sensing (S)           (N) Intuition
Concrete              Abstract
Realistic             Imaginative
Practical             Conceptual
Experiential          Theoretical
Traditional           Original

Thinking (T)          (F) Feeling
Logical               Empathetic
Reasonable            Compassionate
Questioning           Accommodating
Critical              Accepting
Tough                 Tender

Judging (J)           (P) Perceiving
Systematic            Casual
Planful               Open-Ended
Early Starting        Pressure-Prompted
Scheduled             Spontaneous
Methodical            Emergent

One concern about the MBTI is that the increasing cost of administering the instrument (in the range of $10 to $30 per individual) provides a disincentive for outside researchers who want to conduct reliability or validity studies. This is an issue not only for the MBTI but also for the most widely used contemporary tests. Understandably, test publishers want to profit from their massive and expensive efforts at test development. But the downside is that scholarly researchers need substantial funding if they desire to administer newer versions of the MBTI to large samples of examinees. Partly in reaction to the paucity of independent research on newer versions of this test, reviewers continue to suggest caution in its use, especially when making simplistic inferences from the four-letter type formulas (Pittenger, 2005).

9.1.3: California Psychological Inventory (CPI)
Originally published in 1957, the CPI is a true–false test designed expressly to measure the dimensions of normal personality (Gough & Bradley, 1996; McAllister, 1988). The instrument is available in two forms, the CPI-434 (Gough, 1995) and the CPI-260 (www.skillsone.com), which is available only online. The component scales and the interpretive strategies are nearly identical for the two versions, which differ mainly in the number of items: 434 versus 260. Psychometric properties of both versions are similar and strong. Because of its ease of administration and the immediacy with which the practitioner receives an extensive computer-generated report, the CPI-260 rapidly is gaining favor among psychological practitioners.

The CPI-260 is scored for 20 folk measures of personality, 7 work-related scales, and 3 broad vectors. The purpose of the test is to provide a clear picture of the examinee by using descriptors based on the ordinary language of everyday life (Gough & Bradley, 1996). Three of the basic personality scales also provide information on test-taking attitudes and therefore function as validity scales. These scales are Good Impression (Gi), which assesses the extent to which the individual presents a favorable image to others; Communality (Cm), which measures unusual responses that might arise from carelessness or faking bad; and Well-being (Wb), which gauges the portrayal of serious emotional problems.

The 20 folk measures and 7 work-related scales are listed and briefly described in Table 9.2. These scales are reported as T-scores normed to a mean of 50 and a standard deviation of 10 in the general population. The test developers used an empirical methodology of criterion-keying to develop the majority of the scales. Specifically, extreme groups of participants (mainly college students) were formed on such scale-relevant criteria as school grades, sociability, and participation in curricular activities. Item-endorsement frequencies were then contrasted to ferret out the best statements for each scale. For example, the Sociability (Sy) scale
was constructed by contrasting item-endorsement rates for persons reporting a large number of social activities versus those reporting few or no social activities. In constructing four of the folk scales, the authors used a rational basis backed up by indices of internal consistency.

Reflecting the care with which the scales were constructed, reliability data for the CPI are respectable. Most alpha coefficients are in the .70s and .80s, with a median value of .76. The test–retest reliability coefficients tend to be somewhat lower, with a median retest correlation of .68. The authors provide a wealth of normative data, including average test scores for 52 samples of males and 42 samples of females, subdivided by education, occupation, college major, gender, and other variables. The basic normative sample consists of 3,000 males and 3,000 females of varying age, social class, and geographic region (Gough & Bradley, 1996).

In addition to the wealth of information provided by the individual scale scores, the CPI also is scored on three broad dimensions or vectors derived from decades of factor-analytic studies with the instrument. The three vectors include two basic orientations and a third theme reflecting ego integration. The first basic orientation called vector 1 or v.1 has two polarities: toward people or toward one’s inner life. This vector is similar to the extraversion–introversion dimension found in nearly every personality theory ever proposed. The second basic orientation or v.2 also has two polarities: rule-favoring or rule-questioning. This vector reflects a conventional–unconventional dimension also found in many studies. These first two bipolar orientations, v.1 and v.2, provide a 2 × 2 typology of four lifestyles termed the Implementer, Supporter, Innovator, and Visualizer lifestyles, described below. The third vector or v.3 assesses a 7-point continuum variously referred to as self-realization, psychological competence, or ego integration. In the client feedback report provided by the publisher, v.3 is referred to as Level of Satisfaction and scored 1 (low) to 7 (high). This vector acts as a moderator for each of the lifestyles, with high scores on v.3 leading to a positive expression and low scores leading to a negative expression.

Results from several correlational studies confirm distinctive psychological portraits for the four lifestyles mentioned above (Gough & Bradley, 1996).

Table 9.2 Brief Description of Standard and Work-Related CPI-260 Scales

Standard Scales                       Common Interpretation of High Score
Do   Dominance                        dominant, persistent, good leadership ability
Cs   Capacity for Status              personal qualities that underlie and lead to status
Sy   Sociability                      outgoing, sociable, participative temperament
Sp   Social Presence                  poise, spontaneity, and self-confidence in social situations
Sa   Self-acceptance                  self-acceptance and sense of personal worth
In   Independence                     high sense of personal independence, not easily influenced
Em   Empathy                          good capacity to empathize with other persons
Re   Responsibility                   conscientious, responsible, and dependable
So   Social Conformity                strong social maturity and high integrity
Sc   Self-control                     good self-control, freedom from impulsivity and self-centeredness
Gi   Good Impression                  concerned about creating a good impression
Cm   Communality                      valid and thoughtful response pattern
Wb   Sense of Well-being              not worrying or complaining, free from self doubt
To   Tolerance                        permissive, accepting, and nonjudgmental social beliefs
Ac   Achievement via Conformance      achieves well in settings where conformance is necessary
Ai   Achievement via Independence     achieves well in settings where independence is necessary
Cf   Conceptual fluency               high degree of personal and intellectual efficiency
Is   Insightfulness                   interested in and responsive to the inner needs, motives, and experiences of others
Fx   Flexibility                      flexible and adaptable in thought and social behavior
Sn   Sensitivity                      sensitive to others’ feelings, personally vulnerable

Work-Related Scales                   Common Interpretation of High Score
Mp   Managerial Potential             good judgment, effective at dealing with people
Wo   Work Orientation                 strong work ethic, rarely complains about work
Ct   Creative Temperament             creative thinker who prefers what is new or different
Lp   Leadership                       strong leadership skills, deals well with stress
Ami  Amicability                      collegial and cooperative, a good team player
Leo  Law Enforcement Orientation      practical, well suited to work in law enforcement

SOURCE: Based on Gough, H. G. and Bradley, P. (1996). CPI manual (3rd ed.). Mountain View, CA: Consulting Psychologists Press. Also, Megargee, E. (1972). The California Psychological Inventory handbook. San Francisco: Jossey-Bass; and McAllister, L. (1988). A practical guide to CPI interpretation. Palo Alto, CA: Consulting Psychologists Press.

The CPI Manual provides a wealth of information about each lifestyle, including adjective correlates obtained from spouses, peers, and professional evaluators. From these empirical sources, a clear portrait of each lifestyle emerges. For example, the summary statement for Innovators is as follows:

Gammas attend to and seek the monetary, prestige, and other rewards offered by society, but are often at odds with the culture concerning the criteria by which these rewards are apportioned. Their values are personal and individual, not traditional or conventional. Gammas [Innovators] are the doubters, the skeptics, those who see and resist the arbitrary and unjustified features of the status quo. At their best, they are innovative and insightful creators of new ideas, new products, and new social forms. At their worst, they are rebellious, intolerant, self-indulgent, and disruptive; and at low levels on the v.3 scale, they often behave in wayward, rule-violating, and narcissistic ways.
(Gough & Bradley, 1996, p. 50)

The reader will notice that the third vector, v.3, moderates the expression of the Innovator lifestyle, for better or for worse. When v.3 is high, the Innovator is innovative and insightful. When v.3 is low, the Innovator is wayward and narcissistic. A similar pattern holds true for the other three lifestyles: each can have a positive or negative expression, depending on the level of personal integration reflected on the v.3 scale.

The CPI is heir to a long history of empirical research that substantiates a number of real-world correlates for distinctive test profiles. Due to space limitations, we can only list several prominent areas in which the value of the test has been empirically confirmed.

The CPI is particularly effective at identifying adolescents or adults who follow a delinquent or criminal lifestyle. For example, Gough and Bradley (1992) studied a sample of 672 delinquent or criminal men and women, contrasting their CPI scale scores with a large sample of controls. Of the 27 scales evaluated, they found significant mean differences on 25 for men and 26 for women. The most discriminating scale was Social Conformity (So), which revealed healthy point-biserial correlations of .54 for men and .58 for women. They also found that low scores on v.3 (a measure of ego integration) were associated with greater incidence of delinquency. The reader can find further details on the real-world empirical correlates of CPI profiles in Groth-Marnat (2003) and Hargrave and Hiatt (1989).

9.1.4: NEO Personality Inventory-Revised (NEO PI-R)

The NEO Personality Inventory-Revised (NEO PI-R) embodies decades of factor-analytic research with clinical and normal adult populations (Costa & McCrae, 1992). The test is based upon the five-factor model of personality described in the previous chapter. It is available in two parallel forms consisting of 240 items rated on a five-point dimension. An additional three items are used to check validity. A shorter version, the NEO Five-Factor Inventory (NEO-FFI), is also available (Costa & McCrae, 1989). We limit our discussion to the NEO PI-R. Form S is for self-reports whereas Form R is for outside observers (e.g., the spouse of a client). The item format consists of five-point ratings: strongly disagree, disagree, neutral, agree, strongly agree. The items assess emotional, interpersonal, experiential, attitudinal, and motivational variables.

The five domain scales of the NEO PI-R are each based upon six facet (trait) scales (Table 9.3). The internal consistency of the scales is superb: .86 to .95 for the domain scales, and .56 to .90 for the facet scales. Stability coefficients range from .51 to .83 in three- to seven-year longitudinal studies. Validity evidence for the NEO PI-R is substantial, based on the correspondence of ratings between self and spouse, correlations with other tests and checklists, and the construct validity of the five-factor model itself (Costa & McCrae, 1992; Piedmont & Weinstein, 1993; Trull, Useda, Costa, & McCrae, 1995).

Table 9.3 Domain and Facet (Trait) Scales of the NEO PI-R
Domains                   Facets
Neuroticism               Anxiety, Angry Hostility, Depression, Self-Consciousness, Impulsiveness, Vulnerability
Extraversion              Warmth, Gregariousness, Assertiveness, Activity, Excitement Seeking, Positive Emotions
Openness to Experience    Fantasy, Aesthetics, Feelings, Actions, Ideas, Values
Agreeableness             Trust, Straightforwardness, Altruism, Compliance, Modesty, Tender-Mindedness
Conscientiousness         Competence, Order, Dutifulness, Achievement Striving, Self-Discipline, Deliberation
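The organization of Table 9.3 can be illustrated with a short Python sketch. It stores the five domains and their six facets as a dictionary and checks the arithmetic implied by the text: 5 domains × 6 facets gives 30 facet scales, and 240 items spread evenly would give 8 items per facet. The even split of items and the rule that a domain raw score is the sum of its facet raw scores are assumptions made for illustration, not details taken from this passage. The function name and the example scores are hypothetical.

```python
# Domain -> facet scales, transcribed from Table 9.3.
NEO_PI_R_FACETS = {
    "Neuroticism": [
        "Anxiety", "Angry Hostility", "Depression",
        "Self-Consciousness", "Impulsiveness", "Vulnerability",
    ],
    "Extraversion": [
        "Warmth", "Gregariousness", "Assertiveness",
        "Activity", "Excitement Seeking", "Positive Emotions",
    ],
    "Openness to Experience": [
        "Fantasy", "Aesthetics", "Feelings", "Actions", "Ideas", "Values",
    ],
    "Agreeableness": [
        "Trust", "Straightforwardness", "Altruism",
        "Compliance", "Modesty", "Tender-Mindedness",
    ],
    "Conscientiousness": [
        "Competence", "Order", "Dutifulness",
        "Achievement Striving", "Self-Discipline", "Deliberation",
    ],
}
TOTAL_ITEMS = 240  # item count stated in the text


def domain_raw_score(facet_scores, domain):
    """Sum the six facet raw scores of one domain (assumed scoring rule)."""
    facets = NEO_PI_R_FACETS[domain]
    missing = [name for name in facets if name not in facet_scores]
    if missing:
        raise ValueError(f"missing facet scores: {missing}")
    return sum(facet_scores[name] for name in facets)


n_facets = sum(len(facets) for facets in NEO_PI_R_FACETS.values())
items_per_facet = TOTAL_ITEMS // n_facets  # assumes an even split of items

# Hypothetical facet raw scores for one respondent (illustration only).
example_scores = {
    "Anxiety": 14, "Angry Hostility": 10, "Depression": 12,
    "Self-Consciousness": 15, "Impulsiveness": 16, "Vulnerability": 9,
}
neuroticism_total = domain_raw_score(example_scores, "Neuroticism")
print(n_facets, items_per_facet, neuroticism_total)
```

Keeping the structure in one dictionary means that a missing or misspelled facet name is caught when a domain score is requested rather than silently ignored.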

The NEO PI-R is an excellent measure of personality that is especially useful in research. Rubenzer, Faschingbauer, and Ones (2000) describe a particularly fascinating research project with the test in which all U.S. presidents were evaluated by 115 highly informed, expert presidential biographers who filled out the NEO PI-R on behalf of the presidents, from George Washington through George H. W. Bush. The authors developed a typology of presidents from the data and related facets of the test to presidential success (i.e., historical greatness). They also published individual presidential profiles, such as the following results for George Washington (50 is average in the general population):

Neuroticism          47
Extraversion         44
Openness             39
Agreeableness        40
Conscientiousness    72

The portrait that emerges is of a leader who is well-adjusted, slightly introverted, not particularly open to experience, markedly disagreeable, and extremely conscientious. After reviewing the specific facet scores (see Table 9.3), the authors concluded that Washington “falls quite short of the modern political commodities of warmth, empathy, and open-mindedness.”

The test also shows promise as a measure of clinical psychopathology. For example, Clarkin, Hull, Cantor, and Sanderson (1993) found that patients diagnosed with borderline personality disorder scored very high on Neuroticism and very low on Agreeableness, which resonates strongly with every clinician’s response to these challenging patients. Ranseen, Campbell, and Baer (1998) determined that 25 adults with attention deficit disorder scored significantly higher than controls in the Neuroticism domain and significantly lower in the Conscientiousness domain, demonstrating the usefulness of the NEO PI-R in understanding attention deficit disorders in adulthood.

One minor concern about the instrument is that it lacks substantial validity scales: only three items assess validity. The administration of the NEO PI-R assumes that subjects are cooperative and reasonably honest. This is usually a safe assumption in research settings but may not hold true in forensic, personnel, or psychiatric settings.

For purposes of education and research, several psychometricians have constructed websites where it is possible to self-administer an equivalent version of the NEO PI-R. Although not identical to the commercial version of the test (Costa & McCrae, 1992), these parallel adaptations do provide estimates of examinee standing on the five broad domains and 30 subdomains of personality tested by the NEO PI-R and also provide useful narrative reports. One such site can be found at www.personalitytest.com. Another useful site is available at http://ipip.ori.org. This location hosts the International Personality Item Pool (IPIP), advertised as a “scientific collaboratory for the development of advanced measures of personality and other individual differences.” The term collaboratory was coined by Finholt and Olson (1997) to describe Internet-based arrangements that facilitate the collaboration of test specialists, regardless of geographical location. For example, the specific mission of IPIP is to bring test development into the public domain and serve as a forum for the dissemination of research findings and psychometric developments.

Recently, the developers of the NEO-PI-R produced a new version that is more readable and therefore better suited to students as young as 12 years of age. The NEO-PI-3 is a careful and modest revision of the original instrument that addresses a number of problematic items difficult for adolescents and young adults to comprehend (McCrae, Costa, & Martin, 2005). As noted above, the NEO-PI-R consists of 240 items rated on a 5-point Likert scale from Strongly Agree to Strongly Disagree. The authors identified 30 items using words on a par with laissez-faire, fastidious, and adhere that even adults might find challenging. The authors rewrote these items for transparency and carefully tested them for equivalence in a new sample of 500 respondents. Three illustrations of old items, each followed by its replacement item, are shown below. These are representative only, not the actual items and revisions:

1. Old: I feel angst about the future.
   New: I feel nervous about the future.
2. Old: I think of myself as laissez-faire.
   New: I think of myself as easy-going.
3. Old: I enjoy situations of raucous hilarity.
   New: I like to laugh.

An additional 18 items were rewritten because they revealed low item-total correlations with the facet (trait) scale to which they belonged. The resulting instrument, the NEO-PI-3, retained the original five-factor structure and revealed better internal consistency and readability than the previous version. In sum, the authors improved their test, especially for applications with adolescent and college-aged clients (Costa, McCrae, & Martin, 2008).

9.1.5: Stability and Change in Personality

Most of us have heard adages like “People don’t change” or “Personality traits become exaggerated with age” or “You have to hit bottom before change is possible.” Opinions abound on the stability or malleability of personality. What the lay public seldom recognizes, however, is that issues of stability and change in personality can be approached with

empiricism through psychological assessment. As we will see, a few tests figure prominently in lifespan developmental research, especially instruments that embody the five-factor approach (Costa & McCrae, 1992).

One question central to the field of personality psychology is whether personality remains stable throughout life, or reveals predictable shifts in certain qualities as we age. On the surface this question appears amenable to straightforward longitudinal research. Simply administer a suitable instrument to a large sample of the general population, and retest every five years or so. Then, chart the trends in dimensions of personality over the life span. But this is not as simple as it seems. One problem is selective attrition, in which less healthy individuals tend to drop out, disappear, or discontinue the project for reasons known and unknown (Barry, 2005). Although there are methodological adjustments for minimizing the impact, selective attrition nonetheless may skew results toward an unrealistically optimistic picture of trends in aging. Another problem with longitudinal research is that decades of time are needed to follow individuals over the life span. Long-term developmental research is difficult and expensive.

An alternative strategy is cross-sectional research in which a large sample of individuals of all ages (from teenagers to persons in their 90s) is tested at one point in time, allowing for immediate age comparisons in personality characteristics. This is an appealing technique but also fraught with methodological concerns. In particular, the cross-sectional strategy is vulnerable to a research problem known as cohort effects (Schaie, 2011). A cohort is a group of individuals born at roughly the same time who therefore share particular life experiences and historical influences. A cohort effect occurs when differences between age groups (cohorts) stem from disparities in the nature and quality of early developmental or historical experiences rather than from the impact of aging. A hypothetical example will serve to illustrate. Suppose we observe in a cross-sectional study of neuroticism (anxiety-proneness) that persons in their 70s score higher than those in their 50s. We might be tempted to attribute the apparent increase in neuroticism to the impact of aging and its attendant concerns. But that inference overlooks the possibility that the older participants in our study were always higher in neuroticism than the younger members, perhaps because their early formative years occurred during the frantic uncertainty of World War II, or for other unknown reasons. In this hypothetical example, the higher level of neuroticism would not be a general trend or result of traversing into old age, but a specific quirk of the older cohort. Again, this is a hypothetical example. Real age trends in neuroticism are reviewed below.

Yet, the proposal that historical forces can shape the personality of an entire cohort is accurate. Elder (1974) has documented historical impacts on personality in a path-breaking longitudinal study of children raised during the Great Depression (1929–1941). Among other findings, these children grew into adults who responded with habits of greater frugality than preceding or subsequent cohorts.

In studying age trends in personality, a certain degree of tentativeness is warranted, because no single study or method is conclusive. Some researchers combine longitudinal and cross-sectional methods in what is known as the cross-sequential approach (Nestor & Schutt, 2012). This method involves the longitudinal retesting of cross-sectional study participants on at least one additional occasion. The beauty of the cross-sequential method is that cohort effects can be distinguished from genuine longitudinal trends. This allows researchers to identify typical changes resulting from intrinsic maturation.

It is important to mention that core issues of personality change may not be wholly amenable to traditional methods of measurement. Consider the case study of Ann, interviewed on videotape five times over a span of 40 years, from age 21 to age 61 (Mitchell, 2007). She was one of more than 100 participants in the monumental Mills Longitudinal Study, conducted by Ravenna Helson (Helson & Soto, 2005). Mitchell (2007) analyzed the videotaped interviews of Ann through the lens of attachment theory, which we summarize briefly before returning to her story.

Attachment theory (Ainsworth & Bowlby, 1965) broadly distinguishes secure attachment from insecure attachment. In secure attachment, distressed infants seek proximity to caregivers and receive nurturance from them without pause or ambivalence. In insecure attachment, distressed infants are unable to receive a sense of security from caregivers who are themselves limited or erratic (George & Solomon, 1999). Insecure attachment is further subdivided into three types: avoidant, ambivalent, and disorganized (Main & Hesse, 1990). Volumes have been written about these styles; we can provide only the barest of outlines here. In the avoidant style, the distressed infant appears emotionally distant and the caregiver is disengaged. In the ambivalent style, the distressed infant becomes anxious, insecure, and angry, and the caregiver is inconsistent. In the disorganized style, the distressed infant seems depressed, angry, and passive, and the caregiver is extremely erratic.

Attachment theory is relevant to adult personality development because, in the words of Mitchell (2007), “Attachment status becomes personality style” (p. 97). Corresponding to the four styles of infant attachment mentioned above (secure, avoidant, ambivalent, and disorganized), the linked attachment styles in adulthood are described as secure, dismissing of attachment, preoccupied with attachment, and disorganized-fearful (Main & Solomon, 1986). Questionnaires have been developed to assess attachment style in adulthood (e.g., Simpson, Rholes, &

Nelligan, 1992), but they are limited and drab in comparison to qualitative analysis of interview materials.

In the case study of Ann, Mitchell (2007) determined that Ann started her journey into adulthood (age 21) with a distinctly insecure attachment of the avoidant style. In narrative statements, Ann described a frightening childhood in which her mother died a prolonged death from cancer. This was bad enough, but compounding the trauma was that her father, previously a source of security, proved incapable of breaking the painful news to Ann, leaving it to her grandfather instead. Then, her father withdrew and became distant, which Ann experienced as even more devastating than the death of her mother. Ann developed an avoidant attachment style. She feared abandonment for most of her life:

This narrative presents a set of rich characters in a plot that devolves from intimate tenderness to death, abandonment, and benign neglect. The strong-minded girl escaped, but in the process a door was closed that would not open again for nearly 40 years.
(Mitchell, 2007, p. 100)

The door opened gradually after the birth of an adored daughter, four years of therapy to deal with attachment concerns, divorce, falling in love again, remarriage, return to school, and a new career. When last interviewed, Ann revealed an amazing shift to a secure attachment style:

At 61, Ann was phasing in retirement and was “much less stressed, much more easy going.” She was learning foreign languages, doing photography, involved in local politics, and often with her partner, family, and friends (p. 113).

The analysis provided by Mitchell (2007) is full of rich detail that we cannot recount here. The point of this somewhat lengthy digression into the case of Ann is that analyses based on average test scores from large groups of research participants, whether longitudinal or cross-sectional, will not capture the depth and vibrancy available from the qualitative study of individual lives in transition. Even so, empirical analyses provide a general framework for understanding stability and change in personality. Thus, we review key studies and conclusions below.

Personality Stability and Change in Middle and Late Life

Do people change in personality traits across the life course? Several researchers have sought to identify mean-level changes or normative changes that are generalizable patterns of development found in most people (Caspi & Roberts, 1999). Most commonly, investigators use the Big Five model of personality as their measurement perspective (Goldberg, 1981b). As the reader will recall, this is the view that personality is best conceived as five factors labeled neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness.

Individual reports of developmental trends in the Big Five factors over the life course often seem inconclusive or contradictory. In a study of 2,274 participants in their forties retested after 6 to 9 years, Costa, Herbst, McCrae, and Siegler (2000) found minimal or no change in mean level of the Big Five factors, even though popular accounts indicate that midlife is a time of crisis and turmoil. In contrast, others report that personality traits continue to transform in middle and old age, with increases in conscientiousness and agreeableness, and decreases in some elements of extraversion (Helson, Kwan, John, & Jones, 2002).

How can we reconcile these contradictory reports? Perhaps the best approach to this dilemma is a comprehensive synthesis of all relevant studies by means of meta-analysis. Meta-analysis is a sophisticated statistical procedure for combining data from multiple studies. In this method, results from studies using different measurement techniques can be transformed to a common metric, the effect size, and then combined for powerful statistical analyses (Cohen, 1988). One type of effect size is Cohen’s d, which is the mean difference on a variable between two comparison groups divided by the standard deviation of the pooled groups on that variable, or $d = (M_1 - M_2)/s_p$. While effect sizes exist theoretically on an infinite range in positive and negative directions, it is rare in everyday research that they exceed the bounds of +3.0 to −3.0, a value of 0 indicating no difference between groups. The beauty of meta-analysis is that studies using diverse tests, measuring slightly different constructs, based on varying scales of measurement, nonetheless can be transformed to the common metric of effect size and then combined for comprehensive analysis.

In regard to shifts in Big Five personality factors over the life course, Roberts, Walton, and Viechtbauer (2006) completed a meta-analysis of 92 longitudinal samples to determine the patterns of normative change. Their findings constitute an authoritative synthesis of research in the field. They sorted the various personality test results into six categories closely resembling the Big Five taxonomy of personality traits. Their categories are effectively identical to the Big Five, except they split extraversion into two subcategories of social dominance and social vitality. The six categories they investigated were emotional stability, conscientiousness, agreeableness, social dominance, social vitality, and openness to experience. They summarize their findings as follows:

This study demonstrates that personality traits show a clear pattern of normative change across the life course. People become more socially dominant, conscientious, and emotionally stable mostly in young adulthood, but in several cases also in middle and old age. We found that individuals demonstrated gains in social vitality and openness to experience early in life and then decreases in these two trait domains in old age (Roberts et al., 2006, p. 14).

Further, they note that contrary to popular views about personality development, the biggest shifts occur

not in adolescence, but in young adulthood when social role expectations are more taxing. Young adulthood is when most persons leave home, find a career, and integrate with the community. The authors caution that their findings are based entirely on Western samples and generalization to non-Western cultures therefore is unknown.

Soto, John, Gosling, and Potter (2011) pursued the question of age differences in personality traits with an intriguing and massive cross-sectional research project. Their sample consisted of an astonishing 1,267,218 individuals (age 10 to 65) who responded to a Web-based questionnaire on Big Five personality traits. Their assessment instrument was the Big Five Inventory (BFI), a simple 44-item measure with excellent psychometric qualities (John, Donahue, & Kentle, 1991; John, Naumann, & Soto, 2008). The BFI is freely available to researchers for noncommercial purposes. The format of the instrument is depicted in Table 9.4. The test developers isolated two distinctive subscales, called Facet scales, for each of the Big Five domains.

Table 9.4 The BFI Facet Scales: Names and Example Items
BFI Facet Scale             Example Items
Extraversion
  Assertiveness             1. Has an assertive personality.
                            2. Is sometimes shy, inhibited. (R)
  Activity                  3. Is full of energy.
                            4. Generates a lot of enthusiasm.
Agreeableness
  Altruism                  1. Is helpful and unselfish with others.
                            2. Is considerate and kind to almost everyone.
  Compliance                3. Has a forgiving nature.
                            4. Starts quarrels with others. (R)
Conscientiousness
  Order                     1. Tends to be disorganized. (R)
                            2. Can be somewhat careless. (R)
  Self-Discipline           3. Perseveres until the task is finished.
                            4. Is easily distracted. (R)
Neuroticism
  Anxiety                   1. Worries a lot.
                            2. Remains calm in tense situations. (R)
  Depression                3. Is depressed, blue.
                            4. Can be moody.
Openness to Experience
  Openness to Aesthetics    1. Values artistic, aesthetic experiences.
                            2. Has few artistic interests. (R)
  Openness to Ideas         3. Likes to reflect, play with ideas.
                            4. Is curious about many things.
NOTE: Reverse-keyed items are denoted by (R). The common stem for all BFI items is “I see myself as someone who . . . ” BFI = Big Five Inventory.
Source: Reprinted with permission from Soto, C. J., John, O. P., Gosling, S. D., & Potter, J. (2011). Age differences in personality traits from 10 to 65: Big Five domains and facets in a large cross-sectional sample. Journal of Personality and Social Psychology, 100, 330–348.

Their assessment tool, the BFI, is appropriate for children and adults of any age with a fifth-grade reading level. However, for participants younger than 10 and older than 65, sample sizes were too small to provide highly reliable estimates. The minimum sample size for each year of age was 922, and at least 422 persons of each gender were included. Their study is vast and comprehensive in its conclusions. We need to keep in mind that cross-sectional age differences may not mirror longitudinal age trends. As discussed above, cohort effects always could be at play. However, Soto et al. (2011) collected their cross-sectional data over a period of 7 years, and thus were able to examine for cohort effects, which they did not find. We summarize here a few essential and remarkable findings:

• Scores for Agreeableness and Conscientiousness take a nosedive after age 10, reaching their lowest levels by far in the entire life span at age 13 and then climbing sharply into young adulthood at age 20. The popular stereotype that young teenagers are disagreeable and lacking in self-discipline rings true in this study.
• Scores on Agreeableness, Conscientiousness, and Openness to Experience all climb gradually or moderately throughout the entire span of adulthood, from age 20 to 65. Some qualities do appear to improve indefinitely with age (at least to age 65).
• Scores on Extraversion are at their highest level at age 10, drop sharply until age 15, and then remain level across the life span. The contribution of the Activity component (e.g., “Is full of energy”) appears to explain the very high scores on Extraversion at age 10. After age 15 there is essentially no change in Extraversion scores.
• Scores on Neuroticism reveal abrupt gender differences. Women outscore men, sometimes dramatically so, at all age levels. For women, scores rise sharply to their highest levels at age 15–16 and then decline at a moderate pace for the remainder of the life span. It appears that the mid-teen years are especially difficult for girls. For men, scores on Neuroticism decline moderately from age 10 to 20, remain level from age 20 to age 50, and then decline moderately to age 65. The higher scores for women compared to men document an established epidemiological trend in which women are more likely to manifest anxiety and depression than men (McLean, Asnaani, Litz, & Hofmann, 2011).
• Women score higher than men at all ages for Agreeableness, Conscientiousness, and Extraversion. Gender differences on Openness to Experience are complex, but at all ages men score higher than women on the Ideas facet. The interpretation of these gender differences is unclear.

The literature on age differences and longitudinal trends in Big Five personality domains is vast. We refer the reader to Helson and Soto (2005), Lüdtke, Roberts, Trautwein, and Nagy (2011), Specht et al. (2011), and Wortman, Lucas, and Donnellan (2012).

9.1.6: The Assessment of Moral Judgment

Kohlberg has proposed one of the few theories of moral development that is both comprehensive and empirically based (Colby, Kohlberg, Gibbs, & Lieberman, 1983; Kohlberg, 1958, 1981, 1984; Kohlberg & Kramer, 1969). Although he was more concerned with theory-based problems of moral development than with the nuances of standardized measurement, Kohlberg did generate a method of assessment that is widely used and intensely debated. We will review the underlying rationale for his measurement tool and discuss the psychometric properties of the instrument as well. In addition, we will take a brief look at a more objectively based adaptation of Kohlberg’s approach known as the Defining Issues Test (Rest, 1979; Rest & Thoma, 1985).

Stages of Moral Development

Kohlberg’s theory grew out of Piaget’s (1932) stage theory of moral development in childhood. Kohlberg extended the stages into adolescence and adulthood. In order to explore reasoning about difficult moral issues, he devised a series of moral dilemmas. One of the most famous is the dilemma of Heinz and the druggist:

In Europe, a woman was near death from a special kind of cancer. There was one drug that the doctors thought might save her. It was a form of radium that a druggist in the same town had recently discovered. The drug was expensive to make, but the druggist was charging ten times what the drug cost him to make. He paid $200 for the radium and charged $2000 for a small dose of the drug. The sick woman’s husband, Heinz, went to everyone he knew to borrow the money, but he could only get together about $1000 which is half of what it cost. He told the druggist that his wife was dying, and asked him to sell it cheaper or let him pay later. But the druggist said, “No, I discovered the drug and I’m going to make money from it.” So Heinz got desperate and broke into the man’s store to steal the drug for his wife.
(Kohlberg & Elfenbein, 1975)

After reading or hearing this story, the respondent is asked a series of probing questions. The questions might be as follows: Should Heinz have stolen the drug? What if Heinz didn’t love his wife? Would that change anything? What if the person dying was a stranger? Should Heinz steal the drug anyway? Based on answers to this and other dilemmas, Kohlberg concluded that there are three main levels of moral reasoning, with two substages within each level (Table 9.5). One use of his measurement instrument, the Moral Judgment Scale, is to determine a respondent’s stage of moral reasoning.¹

¹ Even though the Moral Judgment Scale has been widely used for empirical research, Kohlberg (1981, 1984) suggests that its most valuable application is for the promotion of self-understanding and the development of moral reasoning in the individual respondent.

Table 9.5 Kohlberg’s Levels and Stages of Moral Development
Level 1: Preconventional
  Stage 1. Punishment and obedience orientation: The physical consequences determine what is good or bad.
  Stage 2. Instrumental relativism orientation: What satisfies one’s own needs is good.
Level 2: Conventional
  Stage 3. Interpersonal concordance orientation: What pleases or helps others is good.
  Stage 4. “Law-and-order” orientation: Maintaining the social order and doing one’s duty is good.
Level 3: Postconventional or Principled
  Stage 5. Social contract-legalistic orientation: Values agreed upon by society determine what is good.
  Stage 6. Universal ethical-principle orientation: What is right is a matter of conscience derived from universal principles.
Source: Based on Kohlberg (1984).

The Moral Judgment Scale consists of several hypothetical dilemmas such as Heinz and the druggist, presented one at a time (Colby, Kohlberg, Gibbs, & others, 1978). In its latest revision, the scale comes in three versions called Forms A, B, and C. Scoring is quite complex, based on the examiner’s judgment of responses in relation to extensive criteria outlined in a detailed scoring manual (Colby & Kohlberg, 1987). Although there are several different dimensions to scoring, the one element most frequently cited in research studies is the overall stage of moral reasoning that characterizes a respondent.

Critique of the Moral Judgment Scale

Early versions of the Moral Judgment Scale suffered serious shortcomings of scoring and interpretation. For example, in his doctoral dissertation, Kohlberg (1958) proposed two scoring systems: one using the sentence or completed thought as the unit of scoring, the other relying upon a global rating of all the subject’s utterances as the unit of analysis. Neither approach was fully satisfactory, and early reviews of the scale were justifiably critical of its reliability and validity (Kurtines & Greif, 1974).

In response to these criticisms, Kohlberg and his associates developed a scoring system that is unparalleled in its clarity, detail, and sophistication (Rest, 1986). Fortuitously, since the moral dilemmas of the Moral Judgment Scale have remained constant over the years, it is possible to apply the new scoring system to old data. The capacity to reanalyze old data and compare them with new data is invaluable in determining the reliability and validity of an existing scale. A most important study in this regard has been published by Kohlberg and associates (Colby et al., 1983).

This investigation reports the results of using the new scoring system in a longitudinal study spanning more than 20 years. The results are impressive and offer strong support for the reliability and validity of the instrument.
Test–retest correlations for the three forms were in the high .90s, as were interrater correlations. Longitudinal scores of subjects tested at three- to four-year intervals over 20 years revealed theory-consistent trends. Fifty-six of 58 subjects showed upward change, with no subjects skipping any stages. Furthermore, only 6 percent of the 195 comparisons showed backward shifts between two testing sessions. The internal consistency of scores was also excellent: about 70 percent of the scores were at one stage, and only 2 percent of the scores were spread further than two adjacent stages. Cronbach’s alpha was in the mid-.90s for the three forms. These findings have been corroborated by Nisan and Kohlberg (1982). Heilbrun and Georges (1990) also report favorably upon the validity of the Moral Judgment Scale, insofar as postconventional development is correlated with higher levels of self-control, as would be predicted from the fact that morally mature persons often oppose social pressure or legal constraints. In sum, the Moral Judgment Scale is reliable, internally consistent, and possesses a theory-confirming developmental coherence.

The Defining Issues Test
The Defining Issues Test (DIT) is similar to the Moral Judgment Scale but incorporates a much simpler and completely objective scoring format (Rest, 1979, 1986). The examinee reads a series of moral dilemmas similar to those designed by Kohlberg and then chooses a proper action for each. For example, one dilemma involves a patient dying a painful death from cancer. In her lucid moments, she requests an overdose of morphine to hasten her death. What should the doctor do? Three options of the following kind are listed:
• He should give the woman a fatal overdose
• Should not give the overdose
• Can’t decide
The examinee’s choice does not enter directly into the determination of the moral judgment score. The real purpose in forcing a choice is to cause the examinee to think about the importance of various factors in making the decision. Following the choice of proper action, the examinee rates the importance of several factors on a five-point Likert scale: great, much, some, little, or no importance. The factors are distinct for each dilemma. The factors differ in the level of moral judgment they signify, ranging from Kohlberg’s stage 1 through stage 6. In the case of the preceding dilemma, the factors include such matters as follows:
• Whether the doctor can make it look like an accident
• Can society afford to let people end their lives when they want to
• Whether the woman’s family favors giving the overdose or not
These ratings form the basis for generating several quantitative scores that pertain to the moral judgment of the examinee. The most widely used score is the P score, which is a percentage of principled thinking. Reliability of the P score ranges from .71 to .82 in test–retest studies (Rest, 1979, 1986). Validity has been studied by contrasting groups known to differ on principled thinking. For example, graduate students in moral philosophy and political science, general college students, high school seniors, and ninth-grade students were found to differ appropriately and systematically on the P score. In longitudinal studies, significant upward trends were found over six years and four testings. Recently, Rest has recommended a new measure of moral judgment, the N2 index, calculated on the basis of several complex formulas that use both ranking and rating data. The two indices are highly correlated in the .90s. Nonetheless, in a retrospective analysis of previous studies, the N2 index outperformed the P index by a substantial margin (Rest, Thoma, Narvaez, & Bebeau, 1997).
Over 600 articles have been published on the Defining Issues Test (McCrae, 1985). In general, the instrument is considered a useful alternative to Kohlberg’s Moral Judgment Scale, particularly for research on group differences in moral reasoning. However, reviewers do note several cautions about the DIT (Westbrook & Bane, 1992). First, the test uses two moral dilemmas from the Vietnam War and is, therefore, somewhat dated. Many young examinees have little knowledge of (and perhaps no interest in) this topic and may find it difficult to identify with these questions. Another dilemma, the classic case of whether Heinz should steal a drug to save his wife’s life, is also of dubious value since it has been widely publicized and reprinted in college textbooks. A significant proportion of prospective examinees are no longer naive about this moral dilemma.
Richards and Davison (1992) have pressed the point that the DIT is biased against conservatively religious individuals. Certainly, it is well established that conservative or fundamentalist religious people tend to score lower than average on the P score of the Defining Issues Test (Getz, 1984; Richards, 1991). According to Richards and Davison (1992), the reason for this is that stage 3 and stage 4 items (unintentionally) possess strong theological implications that cause fundamentalist individuals to endorse the items, thereby lowering their score on the test. Consider items that tap stage 4 reasoning, which is the “law and order” orientation that equates “moral” with doing one’s duty and maintaining the social order. Whereas nonreligious persons might support the laws of the land (and endorse stage 4 items) because they believe that legal authorities define what is right and moral, religious minorities such as Mormons believe that supporting the laws of the land is a theological and religious
obligation that flows directly from articles of faith in their religion:

While Mormons place a high value on obeying the law and supporting legal authorities, this value is due to their theological belief that God has commanded them to do so, and not because they believe, as do true Stage 4 thinkers, that the laws of the land or legal authorities define what is right or moral.
(Richards & Davison, 1992, p. 470)

These researchers demonstrate empirically that certain DIT items measure a different construct for conservative religious persons than for the general population. As a consequence, the validity of the test in these groups is open to question.
Relatively few studies have investigated the relationship between level of moral development on the DIT and moral behavior. This is understandable, given that the purpose of the DIT is not directly to predict behavior but to evaluate moral development. Still, it is a reasonable assumption that individuals who receive higher P scores on the DIT should also refrain from moral transgressions such as cheating on tests. A study by Cummings, Maddux, Harlow, and Dyas (2002) investigated this particular relationship by asking 145 college students majoring in education to anonymously fill out both the DIT and the Assessment of Academic Misconduct (AAM). The AAM is a 41-item measure of misbehaviors such as copying test answers, downloading term papers, retrospectively changing test answers, and so forth. Although these individuals reported an average (but prolific!) level of academic misconduct for college students (fully three-fourths admitted to one or more transgressions), there was absolutely no relationship between scores on the DIT and scores on the AAM. Certainly, more research is needed on the connection (or disconnection) between moral reasoning and moral action.
Another concern about the DIT is the dearth of norms pertinent to minority groups. Finally, Westbrook and Bane (1992) argue that the technical manual for the DIT lacks essential details needed to evaluate the adequacy of the test. In spite of the concerns listed here, the DIT is a widely respected test, particularly for research on moral reasoning. Thoma (2006) provides a thorough review of research on the DIT.

9.1.7: The Assessment of Spiritual and Religious Concepts
Within the field of psychology, transcendent topics such as spiritual well-being or faith maturity never have received mainstream attention. Many years ago, Gordon Allport (1950) lamented that the subject of religion “seems to have gone into hiding” among intellectuals and academic researchers:

Whatever the reason may be, the persistence of religion in the modern world appears as an embarrassment to the scholars of today. Even psychologists, to whom presumably nothing of human concern is alien, are likely to retire into themselves when the subject is broached. (p. 1)

The situation is little improved in contemporary times. For example, except for a few specialty journals, spiritual and religious topics are virtually absent from the psychological literature.
Yet researchers have no right to retire from the field, given its significance to the average person. Consider these statistics on religious belief in the United States, stable since 1944 when national polls first came into use (Hoge, 1996):
• Belief in God has remained constant at about 92 to 95 percent of the population.
• Belief in the divinity of Jesus Christ has been endorsed by 75 to 80 percent of adults.
• Belief in an afterlife has remained at about 75 percent of the population.
Comparable statistics are not available worldwide, but it seems likely that the percentage of believing individuals (whether Muslim, Buddhist, Hindu, Jew, or other) is very high. Most people embrace a spiritual perspective in life, and surely this must have some bearing on their adjustment, behavior, and outlook.
Unfortunately, the field of psychology, including the specialty area of testing, largely has maintained an indifference to this important aspect of human experience. Worse yet, in many intellectual circles the endorsement of spiritual or religious sentiments is seen as evidence of psychopathology. Among others, Sigmund Freud endorsed a cynical view of religion in his aptly titled essay, The Future of an Illusion (1927/1961). Yet for many persons, a connection with the transcendent is essential to meaning in life. This is especially so in times of extreme duress, as when personal annihilation knocks at the front door. Consider the experience of Viktor Frankl (1963), a Nazi death camp survivor and founding figure of existential psychology. At one point during World War II he had to surrender his coat with a cherished manuscript in the pockets in exchange for the worn-out rags of an inmate sent to the gas chamber:

Instead of the many pages of my manuscript, I found in a pocket of the newly acquired coat a single page torn out of a Hebrew prayer book, which contained the main Jewish prayer, Shema Yisrael. How should I have interpreted such a “coincidence” other than as a challenge to live my thoughts instead of merely putting them on paper?

In the remainder of this topic, we take the view that spiritual and religious dimensions to life often serve constructive purposes and that assessment within these domains is worthy of additional study.

Challenges and Purposes of Religious and Spiritual Assessment
Other than personal or
scholarly curiosity about religious and spiritual matters, what might be the motivation for religious and spiritual assessment? Further, what is spirituality, and how is it distinguished from religiousness? It appears evident that some people can be religious without being spiritual, ghost walking through religious traditions with no involvement of heart. But is it possible to be spiritual without being religious? Before we review specific assessment tools, it will prove helpful to examine the distinction between spirituality and religiousness, and to discuss the reasons for assessment in the first place.
According to the Yearbook of American and Canadian Churches (2012), total church membership has declined steadily for many years, even though some denominations have increased in popularity. Alongside this general decline in traditional forms of worship, spiritual practices have expanded in popularity, as witnessed by the proliferation of meditation, 12-step, Eastern, yoga, and other broadly spiritual practices. For example, mindfulness meditation, with roots in Buddhism, is more popular than ever (Williams & Penman, 2011). It is recommended for problems with anxiety, depression, pain, hyperactivity, sleep, parenting, stress, tinnitus, psoriasis, and Parkinson’s disease; the list goes on and on. Those who practice mindfulness, for whatever initial purpose, often embrace it as a way of being in the world, a spiritual discipline.
But what is spirituality, and how is it distinguished from religiousness? Certainly the two share broad overlap in many cases, but each must possess unique qualities if assessment is to succeed. Kapuscinski and Masters (2010) review the vexing problem of definition and conclude that the terms continue to be used separately but with little agreement on meaning. Others think we are beginning to see a consensus in the field:

Despite definitional difficulties, there is agreement among researchers that individuals have the capacity to experience spirituality outside the context of religious institutions. Religion is frequently defined by institutional affiliation, whereas spirituality is not. Religion is also often considered more external or mediated by a group, whereas spirituality is more closely associated with personal experience and is less doctrinaire
(Masters & Hooker, 2012, p. 2).

Heedful that definitions and distinctions will remain fuzzy, we believe there is merit in developing measures of spirituality and religiousness as separate but overlapping constructs (Hill & Pargament, 2008).
In spite of challenges with definition, efforts to develop measures of spirituality and religiousness have flourished in recent years. For example, Hill and Hood (1999) compiled information on 125 measures of spirituality/religiosity. Dozens of new scales have been developed since the release of their compendium. The Search Institute, which serves educators, parents, youth groups, faith communities, and researchers in efforts to create a better world for children, lists 18 easily accessible measures of spirituality, the majority published in recent years (www.search-institute.org). There is an abundance of available instruments.
The motivations for completing an assessment of spirituality or religiousness might include personal curiosity, but are there other purposes for these tools? Richards and Bergin (2005) make a strong case that clinicians need to include spiritual and religious assessment in psychotherapy. They list five reasons for a spiritual-religious assessment of clients, which include: understanding client world view and improving the capacity of the therapist to empathize; establishing the impact of spiritual-religious views on the presenting problem; determining if the spiritual-religious views of the client can be used for growth or coping; identifying which spiritual interventions might be useful in therapy; and recognizing any spiritual doubts that need to be addressed in therapy. These benefits of spiritual-religious assessment can be extended beyond the therapeutic alliance. Even individuals who are functioning within the normal spectrum of personality will benefit from feedback about their spiritual-religious health.

Historical Overview on Spiritual and Religious Assessment
Interest in the psychology of religion can be traced to the early 1900s when William James (1902) composed his masterpiece, The Varieties of Religious Experience. In this book, James catalogued the manifold ways in which humans reveal their interest in transcendent matters. His overall conclusion was that religion is “an essential organ of our life, performing a function which no other portion of our nature can so successfully fulfill.”
Although many writers have offered psychological analyses of religion since the seminal writings of James, it was not until the 1960s that scales for the assessment of religious variables began to appear (Wulff, 1996). One of the first such measures was the Allport-Ross Religious Orientation scales, which proposed to assess two dimensions of religious expression, the intrinsic and the extrinsic (Allport & Ross, 1967). Intrinsically religious persons were thought to live their religion (e.g., to find meaning, direction, outlook), whereas extrinsically religious persons were believed to use their religion (e.g., to seek security, status, sociability). In his earlier writings on this topic, Allport referred to intrinsic religious expression as a genuine or mature religious orientation, whereas extrinsic religious expression was viewed as immature. Later he dropped the mature–immature designations because the labels seemed overly judgmental.
The impetus for development of these scales was Allport’s distressing observation of a positive relationship between religiosity (in certain forms) and authoritarian, bigoted, prejudicial attitudes. As a devoutly religious person, Allport was convinced that intrinsically oriented religious
individuals rarely would harbor these attitudes. After all, an essential precept of almost every religious faith is an attitude of love toward one’s neighbors. In the Christian faith, this view is summed up in the famous dictum “Love your neighbor as yourself” (Mark 12:31). Yet the evidence was overwhelming to Allport that at least some religious individuals did reveal hatred, bigotry, and prejudice toward their neighbors. The usual targets of these malicious attitudes were racial minorities, Jews, and homosexual persons, among others. He reasoned that religious persons with intolerant attitudes possessed a predominantly extrinsic religious orientation; that is, their faith served external goals such as status in the community, belonging to an in-group, and the like. The investigation of this hypothesis (that extrinsically religious persons would be more authoritarian, bigoted, and prejudiced than intrinsically religious persons) required appropriate tools. For this purpose, Allport and colleagues developed the Religious Orientation scales.
Examples of the kinds of items on the 11-item Extrinsic scale and the 9-item Intrinsic scale are as follows:
• The church is important as a place to develop good social relationships. (Extrinsic)
• Sometimes I find it necessary to compromise my religious beliefs for economic reasons. (Extrinsic)
• I try hard to carry my religion over into other aspects of my life. (Intrinsic)
• My religion is important because it provides meaning to my life. (Intrinsic)
Although originally devised in a yes–no format, modern applications of these scales utilize a nine-point continuum from (1) strongly disagree to (9) strongly agree (Batson, Schoenrade, & Ventis, 1993).
Research on the Religious Orientation scales has not provided strong support for Allport’s original hypothesis (Wulff, 1996). In fact, several studies have shown that persons scoring high on the Intrinsic scale actually reveal higher levels of authoritarianism, close-mindedness, and prejudice toward African Americans, gays, and lesbians. Hunsberger (1995) concludes that it is not religion per se that makes for prejudice, nor is it intrinsic/extrinsic religious orientation. Instead, “it is the way in which religious beliefs are held that seems most directly associated with prejudice, and this is best explained by the tendency for fundamentalism and right-wing authoritarianism to be closely linked.” Specifically, he links prejudice against minorities with authoritarian religious traditions that promote an absolute truth, divide the world into “Good” and “Evil,” and shun complexity or doubt in their belief systems. These aspects of religious expression are not typically measured by paper-and-pencil tests.

Religion as Quest
Increasingly, the conceptual basis for the distinction between intrinsic and extrinsic religious orientation has been questioned. Kirkpatrick and Hood (1990) summarized the major theoretical and methodological criticisms of the scales as follows:
• A lack of conceptual clarity in what the Intrinsic–Extrinsic scales are supposed to be measuring. Are these types of motivation (i.e., the motives associated with religious belief and practice), or personality variables (i.e., pervasive aspects of institutional behavior or involvement), or something else?
• A confusion over the relationship between the Intrinsic–Extrinsic scales. In particular, are these opposite ends of a single bipolar dimension, or do the scales measure separate dimensions (so that conceivably some persons could score high on both)?
Other problems cited include weaknesses in the factorial structure, reliability, and construct validity of the scales; excessive reliance on a “good religion” versus “bad religion” dichotomy; and the folly of defining and studying religiousness independent of belief content (Kirkpatrick & Hood, 1990).
In response to the limitations of the Religious Orientation scales, Batson and his associates (1993) developed a measure of a third religious orientation known as Quest. These researchers consider Quest to be a more mature and flexible religious outlook than the intrinsic and extrinsic orientations. Actually, Allport recognized the elements inherent to this orientation but failed to incorporate them in his Intrinsic scale. Religion as Quest is characterized by complexity, doubt, and tentativeness as ways of being religious. Examples of the kinds of items on the 12-item Quest scale are as follows:
• My life experiences have led me to reconsider my religious convictions.
• I find religious doubts upsetting. (reverse scored)
• As I grow and mature, I expect my religious beliefs to change.
• Questions are more important to my religious faith than answers.
Items are scored on the same nine-point continuum from (1) strongly disagree to (9) strongly agree. Results are reported as an average rating. Research with 424 undergraduates interested in religion indicates that Quest is, indeed, a dimension of religious experience independent from both Intrinsic and Extrinsic orientations. Whereas Intrinsic and Extrinsic scores correlated .72, Quest revealed negligible relationships with both scales (–.05 with Intrinsic and .16 with Extrinsic).
But exactly what does the Quest scale measure? The intention of its authors was that it assess “the degree to which an individual’s religion involves an open-ended, responsive dialogue with existential questions raised by the contradictions and tragedies of life” (Batson et al.,
1993, p. 169). The three components of the Quest orientation are (1) readiness to face existential questions without reducing their complexity, (2) self-criticism and perception of religious doubts as positive, and (3) openness to change. But critics have charged that the scale may not measure anything religious at all, that instead it may assess agnosticism, anti-orthodoxy, religious doubt, or religious conflict. In response to these criticisms, Batson et al. (1993) note the following:
• Students at Princeton Theological Seminary scored significantly higher (p < .001) on the Quest scale (mean of 6.7) than undergraduates at the same institution (mean of 5.2). This finding supports the view that the scale is a valid measure of something religious.
• The 32 members of a charismatic Bible study group scored significantly higher (p < .001) on the Quest scale (mean of 5.5) than the 26 members of a traditional Bible study group (mean of 4.6). The charismatic group placed emphasis on religion as a shared search; most prayed with hands raised, and some members spoke in tongues.
Quest is its own dimension of religious expression, and substantial research on the meaning and correlates of this faith orientation has been completed. Batson et al. (1993) summarize research with the Quest scale by noting that it appears to measure a religion of less faith but more works.
Quest arose as a response to the limitations of the Intrinsic and Extrinsic approach to the measurement of religious orientation. But this brief 12-item scale possesses its own limitations, chief among them its brevity and factorial simplicity. Several other instruments have been proposed to measure aspects of religious experience. We survey a few prominent and representative approaches in the following sections.

The Spiritual Well-Being Scale
The concept of spiritual well-being can be traced to a paper by Moberg (1971) that proposed this form of well-being as an essential component of healthy aging. Spiritual well-being was conceptualized as a two-dimensional construct consisting of a vertical dimension and a horizontal dimension. The vertical dimension concerned well-being in relation to God or a higher power, whereas the horizontal dimension involved existential well-being, which is a sense of purpose in life without any specific religious reference. The challenge of developing a scale to measure these components of well-being was taken up by Ellison (1983) and Paloutzian and Ellison (1982).
Their instrument was designated the Spiritual Well-Being Scale (SWB Scale). The SWB Scale consists of two subscales: Religious Well-Being (RWB), which assesses the vertical dimension of well-being in relation to God; and Existential Well-Being (EWB), which measures the horizontal dimension of well-being in relation to life purpose and life satisfaction. Each subscale consists of 10 items that are scored from 1 (strongly disagree) to 6 (strongly agree). The items from the two subscales are combined on the SWB Scale, with odd-numbered items assessing religious well-being and even-numbered items assessing existential well-being. Some items are worded negatively; these are reverse scored so that a higher score always indicates greater well-being. Examples of SWB-like items include “My relationship with God helps me through hard times” and “Life is inherently without meaning” (reverse scored).

The Assessment of Spirituality and Religious Sentiments (ASPIRES) Scale
The Assessment of Spirituality and Religious Sentiments (ASPIRES) scale is a recent and promising measure of spiritual and religious variables (Piedmont, 2010). What makes the test unique is its predictive power above and beyond the Big Five personality factors. In other words, ASPIRES represents an extension of these well-established components into a sixth dimension of personality (Piedmont, 1999). The scale also is robust across cultures and useful within nonreligious samples, including agnostics and atheists.
The 35-item ASPIRES scale measures two dimensions, spiritual transcendence and religious sentiments. Spiritual transcendence is further subdivided into three facets: prayer fulfillment, universality, and connectedness. Religious sentiments consists of two facets: religious involvements and religious crisis. The overall structure of the ASPIRES scale, with descriptions of dimensions and facets, is shown in Table 9.6. Items resemble “I find a sense of peace in the quiet of my prayers” and “I follow the precepts of an organized faith.” Responses are provided on a 5-point Likert scale (strongly agree, agree, neutral, disagree, strongly disagree).

Table 9.6 Structure and Description of the ASPIRES Scale (Piedmont, 2010)
Scale or Facet Name | Measure of
Spiritual Transcendence Scale (STS) | The motivational capacity to create a broad sense of personal meaning for one’s life
Prayer Fulfillment (PF) Facet | The ability to create a personal space that enables one to feel a positive connection to some larger reality
Universality (UN) Facet | The belief in a larger meaning and purpose to life
Connectedness (CN) Facet | Feelings of belonging and responsibility to a larger human reality that cuts across generations and groups
Religious Sentiments Scale (RSS) | The extent to which an individual is involved in and committed to the precepts, teachings, and practices of a specific religious tradition
Religious Involvements (RI) Facet | How actively involved a person is in performing various religious rituals and activities
Religious Crisis (RC) Facet | Extent to which a person may be experiencing problems, difficulties, or conflicts with the God of their understanding
Source: Reprinted with permission from Brown, I. T., Chen, T., Gehlert, N. C., & Piedmont, R. L. (2012, October 8). Age and gender effects on the Assessment of Spirituality and Religious Sentiments (ASPIRES) Scale: A cross-sectional analysis. Psychology of Religion and Spirituality, online publication.
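The scoring conventions described earlier for the Quest scale (nine-point ratings, reported as an average) and the SWB Scale (six-point ratings, odd-numbered items for RWB, even-numbered items for EWB, negatively worded items reverse scored) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the published scoring key for either instrument: the function names are hypothetical, the sets of reverse-scored item numbers are made up for the example, and summing ratings to obtain the SWB subscale totals is an assumption that the text does not spell out.

```python
def reverse_score(rating, low, high):
    """Flip a rating on a low..high scale (2 on a 1..6 scale becomes 5)."""
    return low + high - rating


def keyed_ratings(ratings, reverse_items, low, high):
    """Return ratings keyed so that a higher value always means more of the trait.

    ratings maps 1-based item numbers to raw ratings; reverse_items is the
    set of item numbers treated as negatively worded.
    """
    keyed = {}
    for item, rating in ratings.items():
        if not low <= rating <= high:
            raise ValueError(f"item {item}: rating {rating} is outside {low}..{high}")
        if item in reverse_items:
            rating = reverse_score(rating, low, high)
        keyed[item] = rating
    return keyed


def score_swb(ratings, reverse_items):
    """Score a 20-item SWB-style protocol rated from 1 to 6.

    Odd-numbered items feed Religious Well-Being (RWB) and even-numbered
    items feed Existential Well-Being (EWB). Summing is an assumption.
    """
    keyed = keyed_ratings(ratings, reverse_items, low=1, high=6)
    rwb = sum(r for item, r in keyed.items() if item % 2 == 1)
    ewb = sum(r for item, r in keyed.items() if item % 2 == 0)
    return {"RWB": rwb, "EWB": ewb, "SWB": rwb + ewb}


def score_quest(ratings, reverse_items):
    """Score a Quest-style protocol rated from 1 to 9 as the mean keyed rating."""
    keyed = keyed_ratings(ratings, reverse_items, low=1, high=9)
    return sum(keyed.values()) / len(keyed)


if __name__ == "__main__":
    # Hypothetical respondent who marks 5 on all 20 SWB-style items; items 2
    # and 5 are treated as negatively worded purely for illustration.
    print(score_swb({i: 5 for i in range(1, 21)}, reverse_items={2, 5}))
    # Hypothetical respondent who marks 7 on all 12 Quest-style items; item 2
    # is treated as reverse scored purely for illustration.
    print(score_quest({i: 7 for i in range(1, 13)}, reverse_items={2}))
```

In the first example each reverse-scored 5 becomes a 2, so the two subscale totals are 47 each and the combined total is 94. The point of reverse scoring is simply that agreement with a negatively worded item should lower, not raise, the final score.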
The ASPIRES scale demonstrates strong psychometric qualities. Alpha reliabilities for the facet scales range from .60 (CN) to .95 (PF) with a mean alpha of .82 (Piedmont, 2010). The normative sample consists of nearly 3,000 individuals, ages 17 to 94, from four geographic areas of the Midwestern and East Coast regions of the United States. The STS portion of the scale correlates with religious and spiritual variables and incrementally predicts (above and beyond the Big Five dimensions) relevant outcomes such as social support and prosocial behavior (Piedmont, 1999, 2001). The test holds up well cross-culturally, revealing a robust factor structure in diverse religious groups and cultures (Nelson & Piedmont, 2008; Piedmont, Werdel, & Fernando, 2009). The STS component of ASPIRES yields incremental validity in the prediction of treatment outcome in spiritually based programs for alcohol and drug abuse (Piedmont, 2004). These findings further support the validity of ASPIRES and also uphold the contention that spirituality supplements the Big Five personality dimensions.

In later writings, Ellison described the SWB Scale as a measure of psychospiritual personality integration and resultant well-being (Ellison & Smith, 1991). According to this view, well-being consists of “the integral experience of a person who is functioning as God intended, in consonant relationship with Him, with others, and within one’s self” (p. 36). This is the biblical notion of shalom, which denotes being harmoniously at peace within and without. If this conceptualization is correct, healthy spirituality as measured by the SWB Scale should show positive relationships with independent measures of health and subjective well-being. Literally dozens of studies have investigated this broad-range hypothesis, with generally positive findings.

The one identified shortcoming of the SWB Scale is an apparent low ceiling, especially in religious samples. Ledbetter, Smith, Vosler-Hunter, and Fischer (1991) caution that the clinical usefulness of the scale is limited to low scores (since high-functioning religious persons tend to “top out” on the scale). They also offer suggestions for revision (e.g., rewording items in more extreme directions) toward the goal of increasing the ceiling level of the SWB Scale. Bufford, Paloutzian, and Ellison (1991) have published norms for the test but caution that in many religious samples the typical individual receives the maximum score. This would indicate that the scale is helpful in research but is not useful for distinguishing among individuals with high levels of spiritual well-being.

The Faith Maturity Scale In 1987, six major Protestant denominations undertook a national four-year study of personal faith, denominational allegiance, and their determinants (Benson, Donahue, & Erickson, 1993). Funded in part by the Lilly Endowment, this project spawned what is undoubtedly the most sophisticated measure of spiritual maturity ever conceived. The Faith Maturity Scale (FMS) arose as a practical tool to serve three research purposes:

1. Provide baseline data on the vitality of faith in mainstream Protestant congregations
2. Identify the contributions of demographic, personal, and congregational variables to faith development
3. Furnish a criterion variable for evaluating the impact of religious education in mainstream denominations

The development of the scale was a time-consuming and careful process that began with a working definition:

Faith maturity is the degree to which a person embodies the priorities, commitments, and perspectives characteristic of vibrant and life-transforming faith, as they have been understood in “mainline” Protestant traditions.
(Benson, Donahue, & Erickson, 1993, p. 3)

Using open-ended questionnaires with a convenience sample of 410 mainline Protestant adults, the test developers next identified eight core dimensions of faith maturity. Three advisory panels provided ongoing counsel during this stage and the next phase of item writing. These interactions assured that the scale possessed face and content validity.

The resulting FMS is a 38-item test that embodies key indicators of faith maturity in eight core areas (Table 9.7). Items are answered on a seven-point scale from 1 = never true to 7 = always true. Based upon the areas assessed, the reader will notice that right belief is only one aspect of a mature faith. In large measure, faith maturity is defined by

Table 9.7 The Eight Core Dimensions and Sample Items from the Faith Maturity Scale

A. Trusts and believes (5 items)
   Every day I see evidence that God is at work in the world
B. Experiences the fruits of faith (5 items)
   I feel weighed down by all my responsibilities (reverse scored)
C. Integrates faith and life (5 items)
   My faith influences how I think and act every day
D. Seeks spiritual growth (4 items)
   I take time to meditate or pray
E. Experiences and nurtures faith in community (4 items)
   I talk with others about my faith
F. Holds life-affirming values (6 items)
   I tend to be critical of other persons (reverse scored)
G. Advocates social change (4 items)
   I believe the churches of this nation should get involved in political issues
H. Acts and serves (5 items)
   I offer significant amounts of time to help others

NOTE: The sample items are similar to those on the Faith Maturity Scale.
Source: Based on Benson, P., Donahue, M., & Erickson, J. (1993). The Faith Maturity Scale: Conceptualization, measurement, and empirical validation. In M. L. Lynn & D. O. Moberg (Eds.), Research in the social scientific study of religion (vol. 5). Greenwich, CT: JAI Press.
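The seven-point response format and the reverse-scored sample items in Table 9.7 suggest how an FMS score might be computed. The scale is scored as the mean of its 38 items, so scores fall between 1 and 7. The Python sketch below is an illustration only. It assumes the conventional rule that a reverse-scored response is flipped as 8 minus the response, and the item positions it treats as reverse scored are hypothetical, because the published scoring key is not reproduced here.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7      # 1 = never true, 7 = always true
N_ITEMS = 38

# Hypothetical 0-based positions of reverse-scored items (illustration only;
# these are NOT taken from the published scoring key).
REVERSE_SCORED = frozenset({7, 27})


def fms_score(responses, reverse_scored=REVERSE_SCORED):
    """Return the mean item score after flipping reverse-scored items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not (SCALE_MIN <= r <= SCALE_MAX) for r in responses):
        raise ValueError("responses must lie between 1 and 7")
    # Assumed convention: a reversed item becomes (1 + 7) - response.
    adjusted = [
        SCALE_MIN + SCALE_MAX - r if i in reverse_scored else r
        for i, r in enumerate(responses)
    ]
    return mean(adjusted)


# A respondent who answers 5 to every item: the two reversed items become 3.
print(round(fms_score([5] * N_ITEMS), 2))  # prints 4.89
```

Because the score is a mean rather than a sum, it stays on the same 1-to-7 metric as the individual items, which makes group averages easy to interpret.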

value and behavioral consequences. As the authors note, the Faith Maturity Scale “parts company with more traditional ways of defining and measuring personal religion.” Yet it does embody the kinds of behaviors and attitudes that derive from a dynamic, life-transforming faith. These behaviors and attitudes are consistent with the theology found in most religious traditions but are especially pertinent for the particular purpose of assessing faith maturity in the Protestant context.

The FMS is scored as the mean of the 38 items, which yields a potential range of 1 to 7. The average score for 3,040 adults in five Protestant denominations was 4.63, which indicates that the instrument avoids the “ceiling effect” found on other scales such as the Spiritual Well-Being Scale, discussed previously. The estimated reliability of the scale is very robust across age, gender, occupation, and denomination, with typical coefficient alphas of .88 (Benson et al., 1993). Test–retest reliability was not reported.

The validity of the scale is supported by several lines of evidence, beginning with the careful approach to item selection, by which face validity and content validity were built in. Construct validity was demonstrated in several ways. First, it was predicted and confirmed that groups presumed to differ in levels of faith maturity would obtain significantly different mean scores on the FMS. Indeed, pastors scored the highest (5.3), followed by church education coordinators (4.9), teachers (4.7), adults (4.6), and youth (4.1), with each successive group scoring significantly lower than those before it. Second, pastors’ ratings of the faith maturity of 123 congregation members on a 1 to 10 scale correlated very substantially (r = .61) with the FMS scores of these persons, indicating a correspondence between independent expert ratings and self-report. The scale also revealed predictive utility. Specifically, FMS scale scores were strongly related to a variety of prosocial behaviors such as donating time to help those who are poor, hungry, or sick; promoting a greater role for women in the church; and endorsing the use of foreign policy to challenge apartheid.

9.2: Positive Psychological Assessment

9.2 Explain positive psychological assessment

With few exceptions, clinical psychology since World War II has focused on what is wrong with people and how to alleviate or diminish a host of symptoms and syndromes. Research abounds on the assessment and treatment of anxiety, depression, serious mental illnesses, dementia, marital discord, drug abuse, mental retardation, and brain damage, to name a few areas of significant inquiry.

There is nothing inherently wrong with this extensive body of research on psychopathology. In fact, huge strides have been made in the understanding and treatment of many conditions that entail serious and crippling emotional pain or other forms of disability. Even so, this one-sided emphasis from the perspective of disease and repair has led to a relative void of positive perspectives. Consider the results of Table 9.8, which compiles the number of PsycINFO listings conjured up for a variety of terms, some pathological and some positive. The reader will notice that pathological concepts like Depression or Dementia are 50 to 100 times more likely to be the topic of inquiry than positive concepts like Resilience or Gratitude.

Table 9.8 Number of PsycINFO Listings for a Sampling of Pathological and Positive Terms

Pathological Term       Number of Listings
Depression              130,033
Abuse                   106,772
Anxiety                 113,316
Schizophrenia            74,979
Brain damage             70,235
Addiction                51,969
Mental retardation       39,660
Dementia                 29,860

Positive Term           Number of Listings
Resilience                5,668
Optimism                  4,784
Wisdom                    4,712
Altruism                  3,502
Genius                    1,818
Courage                   1,740
Forgiveness               1,667
Gratitude                   751

In recent years, a movement known as positive psychology has emerged to redress this imbalance. A simple definition of positive psychology is the scientific and practical pursuit of optimal human functioning (Lopez & Snyder, 2003). One of the founders of the movement, Martin Seligman, provides a detailed perspective on the movement:

The field of positive psychology at the subjective level is about valued subjective experiences: well-being, contentment, and satisfaction (in the past); hope and optimism (for the future); and flow and happiness (in the present). At the individual level, it is about positive individual traits: the capacity for love and vocation, courage, interpersonal skill, aesthetic sensibility, perseverance, forgiveness, originality, future-mindedness, spirituality, high talent, and wisdom.
(Seligman & Csikszentmihalyi, 2000, p. 5)

Also included in positive psychology are civic virtues such as altruism, tolerance, and work ethic. In sum, positive psychology is a broad movement linked by the focus

on life-affirming concepts. The goal is to bring balance to psychology by helping to build human strengths.

An important element of this movement is positive psychological assessment, which can be defined as the measurement of specific human strengths such as those mentioned above. After all, if a psychological movement proposes to increase human strengths and virtues, it is also obligated to develop measurement approaches for purposes of research and assessment. In recent years, psychologists have paid increasing attention to positive forms of assessment, resulting in dozens of new instruments and approaches. In their path-breaking edited book on positive psychological assessment, Lopez and Snyder (2003) compiled 24 chapters, each detailing several instruments. In other words, there are now hundreds of instruments available for positive psychological assessment. Some of the constructs measured with psychological tests include hope, emotional intelligence, optimism, romantic love, empathy, forgiveness, gratitude, and wisdom-related performance, to name just a few.

A comprehensive review of positive psychological assessment would entail a textbook in its own right (if not several). The best we can do here is focus on a few key areas of assessment with a small number of tests that illustrate important or interesting approaches to positive psychological assessment. In particular, we will review issues involved in the assessment of creativity, emotional intelligence, optimism, hope, forgiveness, and gratitude.

9.2.1: Assessment of Creativity

The topic of creativity has fascinated and yet also vexed psychologists and educators for more than a century. Researchers are beginning to understand fundamental elements common to many forms of creativity, yet a simple definition of creativity remains elusive, and its assessment continues to be problematic. It is no exaggeration to state that hundreds of tests of creativity have been published. Some of these instruments possess respectable psychometric qualities, but most are of questionable validity. Unlike other fields of assessment such as intelligence or personality, where a few instruments have risen to the top and dominate the field, the field of creativity has no acknowledged “gold standards” for assessment. In part, this is because of the criterion problem, that is, the difficulty in defining creativity. Thus, we begin with a foundational question: What is creativity?

Psychologists have sought to understand creativity since at least the early 1900s. For example, John B. Watson, the famous American behaviorist, suggested simplistically that a poem or brilliant essay is the mere product of shifting words around until a new pattern is hit upon (Watson, 1928). Fortunately, Watson’s simplistic views were followed by a large number of more thoughtful formulations. We have quoted below a few perspectives on creativity from eminent researchers:

• If a response is to be called original, it must be to some extent adaptive to reality (Barron, 1955, p. 553).
• We may proceed to define the creative thinking process as the forming of associative elements into new combinations that either meet specified requirements or are in some way useful (Mednick, 1962, p. 221).
• Creativity can be regarded as the quality of products or responses judged to be creative by appropriate observers, and it can also be regarded as the process by which something so judged is produced (Amabile, 1983, p. 31).
• Creativity involves bringing something into being that is original (new, unusual, novel, unexpected) and also valuable (useful, good, adaptive, appropriate) (Ochse, 1990, p. 2).
• Creativity is the ability to produce work that is both novel (i.e., original, unexpected) and appropriate (i.e., useful, adaptive concerning task constraints) (Sternberg & Lubart, 1999, p. 3).
• Creativity is a specific capacity to not only solve problems but to solve them originally and adaptively (Feist & Barron, 2003, p. 63).
• Creativity is the ability to come up with ideas or artifacts that are new, surprising, and valuable (Boden, 2004, p. 1).

These conceptual definitions emphasize novelty and usefulness of the creative product, but also suggest that creativity is a particular kind of process as well. On these elements, there is broad agreement in the field of creativity research. However, going from conceptual definitions to operational definitions has proved to be difficult, to say the least. Prentky (2001) notes that “what creativity is, and what it is not, hangs as the mythical albatross around the neck of scientific research on creativity” (p. 97).

Relevant to assessment, one controversy overshadows the study of creativity. This is the question of whether creativity is general or domain-specific in nature. Kaufman and Baer (2004) articulate the concern as follows:

Is there perhaps something we might label c, analogous to the g of intelligence, that transcends domains and enhances the creativity of a person in all fields of endeavor? And does it make sense to call someone “creative,” or should attributions of creativity always be qualified in some way (e.g., “a creative storyteller” or “a creative mathematician,” but not “a creative person”)? (p. 4).

In their lengthy review chapter, Kaufman and Baer (2004) acknowledge the complexity of the specific versus general debate, noting that the answer hinges on the definition of creativity and the assessment methods employed.

But they also render a final conclusion that the evidence for c (general creativity) is weak. We agree with their verdict that creativity appears to be domain-specific.

What, then, is the best way to partition the domains of creativity? One answer might be to claim that there are as many domains of creativity as there are fields of inquiry or expression, whether in science, art, economics, service, leadership, entrepreneurship, or whatever. But this anarchical response rings hollow. People who are creative in one field typically reveal talent in closely allied fields as well. Gifted writers usually can be good poets, if they choose, and vice versa. A creative scientist might excel at mechanical problem-solving as well. The number of domains must be somewhere between huge (nearly infinite) and small (a handful). But creativity is not a single general factor.

Several investigators have derived empirical classifications of creativity, with the number of domains typically in the range of 5 to 10 (Carson, Peterson, & Higgins, 2005; Kaufman, Cole, & Baer, 2009; Ivcevic & Mayer, 2009). The study by Kaufman (2012) is representative, and we provide modest details here. His investigation was based on the commonsense view that layperson perceptions of constructs like intelligence, wisdom, personality, or creativity, when analyzed collectively, embody some degree of practical wisdom (Sternberg, 1985). Participants were 2,318 college students asked to rate an initial collection of 94 items as follows:

Compared to people of approximately your age and life experience, how creative would you rate yourself for each of the following acts? For acts that you have not specifically done, estimate your creative potential based on your performance on similar tasks
(Kaufman, 2012, p. 300).

Students rated themselves on a 5-point Likert scale from 1 (much less creative) to 5 (much more creative) on each item. The items were gleaned from several prior research projects. The 94 items coalesced into five factors (from factor analysis), which provided a basis for reducing the scale to 50 items organized into 5 domains of about 10 items each. The emergent domains were the following:

The Five Domains of Creativity

The new instrument, called the Kaufman Domains of Creativity Scale (K-DOCS), demonstrated strong psychometric qualities, with internal consistency coefficients of .83 to .86 and test–retest reliabilities (132 participants retested after two weeks) of .78 to .86. In addition to finding a clear-cut five-factor structure for the test, additional evidence of validity was found in the domain scale correlations with Big Five personality dimensions, which were theoretically sensible. For example, Openness to Experience correlated significantly with all creativity domains except Mechanical/Scientific (Kaufman, 2012).

We turn now to a brief discussion of instruments for the assessment of creativity. Over the years, creativity has been studied in terms of cognitive processes, personal characteristics, and behavioral products (Batey & Furnham, 2006). We will review these approaches in turn and examine the assessment methods that each has spawned.

Creativity as Process Several theorists and researchers have focused on underlying cognitive processes in their understanding of creativity. Of historical interest is Wertheimer’s (1945) suggestion that creativity arises when the thinker grasps the essential features of a problem and their relation to a final solution, the so-called “aha!” phenomenon. Wallas (1926) theorized that such insights often occur after a period of incubation wherein the unconscious mind rearranges the features of the puzzle even while the conscious mind takes “time off” from the problem.

Mednick (1962) proposed that creativity is the capacity to combine remote associations. According to this view, creativity is a matter of novel arrangements of unusual associations to a given stimulus. Consider the invention of the grain reaper by McCormick, based on the association between grain and hair (Weber, 1969). It occurred to the inventor that grain is like the hair on a person’s head. Since mechanical clippers are used to cut hair, something like hair clippers could be used to cut grain. We see in this example how a creative invention was developed from a remote association.

Based on his process-oriented view of creativity, Mednick (1962) developed the Remote Associates Test (RAT), a clever index of the remoteness of verbal associations. The RAT is a timed, 40-minute paper-and-pencil test with interitem reliability consistently above .90 (Mednick & Mednick, 1966). Some examples of the kinds of items encountered on the RAT:

• rat–blue–cottage
• out–dog–cat
• wheel–electric–high
• surprise–line–birthday

For each triplet, the examinee must find a fourth word that “fits” in the sense of having reasonable (but often

remote) associations to the other three words. (The correct answers above are cheese, house, wire, and party.) Competent performance on this test would appear to require a capacity to examine several novel or remote associations at the same time and to search for the one association that is common to all three stimulus words.

Validity studies of the RAT have been mixed in outcome. Early studies were promising and indicated that high RAT-scorers tended to receive higher ratings for the creativity of their products (e.g., architectural designs, research projects, suggestions, and drawings) than low scorers (Mednick & Mednick, 1966). One early study showed that high RAT-scoring scientists tended to write more research proposals, to win more research grants, and to win bigger grants than lower scorers (Gordon & Charanian, 1964). However, later studies indicated complex patterns between RAT scores and other creativity indices. For example, Andrews (1975) found that RAT scores predicted the innovativeness of research for medical sociologists only for a small subsample of the respondents whose environment provided certain “prerequisites” for achieving payoff from creative ability. Specifically, among those researchers who were responsible for initiating new activities, who hired their own research assistants, who had stable employment and low interference from superiors, the correlation between RAT scores and innovativeness of research was a healthy +.55. But these researchers constituted less than a fourth of the sample; for the remainder of the subjects there was no relationship between the RAT and creativity. These complex and contradictory findings are typical of research on the assessment of creativity.

Ochse (1990) provides a thorough appraisal of RAT validity. Ochse concludes that the test may predict scores on tests of verbal fluency, but fails to predict creativity in general. In other words, the RAT is not so much a general measure of creativity as a specialized measure of verbal intelligence. Recently, Bowden and Jung-Beeman (2003) published extensive normative data for RAT-type items. Based on 289 university students, their normative data consist of percentage correct for 144 items under four time limits (2, 7, 15, and 30 seconds). They recommend using these normative data to investigate process factors such as incubation, the impact of hints, and techniques to facilitate problem solving.

Creativity as Personal Characteristics Guilford (1950) was one of the first researchers to define creativity in terms of the person when he asserted that “creativity refers to the abilities that are most characteristic of creative people.” His pronouncement helped inspire an expansion of research on the personal characteristics of creative persons. Much of this research has relied upon contrasts of peer-nominated high- and low-creative persons in various professions (Barron, 1968; Martindale, 1981). In this methodology, colleagues within a field of study nominate other individuals who are high and low in creativity, and their consensus view is used to identify two select groups of individuals (high-creative, low-creative). These groups are then contrasted on personality measures, including self-checked adjectives and standard personality inventories.

Based on hundreds of studies, a fairly stable set of core characteristics of creative persons has emerged (Barron & Harrington, 1981; Dellas & Gaier, 1970). Interestingly, the distinguishing characteristics of creative individuals appear to be largely temperamental, although a certain minimum level of intelligence also is required. Harrington (1975) has captured a not altogether flattering portrait of the creative person in his Composite Creative Personality Scale, which consists of 42 self-checked adjectives (from a larger list) that empirically distinguish creative from noncreative persons. These adjectives include many positive terms such as active, curious, imaginative, inventive, original, resourceful, and sensitive, but also embrace negative terms such as argumentative, cynical, egotistical, impulsive, rebellious, and unconventional. These qualities fit well with the observation of Feist (1999):

One of the most distinguishing characteristics of creative people is their desire and preference to be somewhat removed from regular social-contact, to spend time alone working on their craft . . . to be autonomous and independent of the influence of a group. (p. 158)

In addition to the broad generalizations noted above, the particular link between personality characteristics and creative behavior also depends on the specific domain of investigation. For example, compared with their less creative counterparts, creative artists tend to be more spontaneous, creative writers tend to be more nonconforming, creative architects tend to be less flexible, and creative engineers tend to be better adjusted than other groups (Piirto, 1998). In attempting to predict creative behavior from personality characteristics, one creative personality type may not fit all creative occupations (Kerr & Gagliardi, 2003). Batey and Furnham (2006) provide an excellent review of the complex literature on creativity and personality.

Recently, Sternberg (2002) has proposed that creative individuals are distinguished not so much by specific traits as by the heartfelt decision to be creative:

I believe that, although creative people differ in an astonishing number of ways, there is, in fact, one key attribute that they all possess. . . . This attribute is the decision to be creative. People who create decide that they will forge their own path and follow it, for better or for worse. The path is a difficult one because people who defy convention often are not rewarded. (p. 376)

This perspective suggests that creative individuals will be characterized by a stubborn dedication to their creative endeavors, even when rewards for their activities seem to be lacking.

The opinion that creativity resides within qualities of the person continues to be popular. From this perspective, self-report measures are the natural and preferred assessment method (Silvia, Wigert, Reiter-Palmon, & Kaufman, 2012). Table 9.9 summarizes a few promising instruments.

Table 9.9 Self-Report Measures of Creativity

Creativity as Product The most enduring definitions of creativity have used the product as the distinguishing sign of this capacity. According to this approach, creative persons create products (ideas, inventions, writings, artistic outputs, etc.) that meet certain criteria. For example, Jackson and Messick (1968) applied four criteria to creativity:

• Novelty: Creative products are new, or at least represent a new application of the familiar.
• Appropriateness: The product must be appropriate to the context, not merely novel.
• Transcendence of constraints: A product transcends constraints when it goes beyond the traditional.
• Coalescence of meaning: The value of creative products may not be apparent at first; the full significance may be appreciated only with time.

The Jackson and Messick (1968) criteria have proved helpful in delineating the special characteristics of a creative outcome, but they do not constitute a psychological measure of creativity. For measures of creativity based on the product-oriented approach, we must examine the seminal studies of Joy Paul Guilford and the various tests inspired by his factor-analytic research.
As the reader will recall from an earlier chapter, Guilford (1959, 1985) formulated a structure of intellect model that parceled intelligence into 150 factors aligned upon three dimensions: operations, contents, and products. One of the operations that emerged from Guilford’s factor analyses was divergent thinking:

Divergent thinking is defined as the kind that goes off in different directions. It makes possible changes of direction in problem solving and also leads to a diversity of answers, where more than one answer may be acceptable.
(Guilford, 1959)

Divergent thinking is virtually the opposite of convergent thinking. Convergent thinking is the production of a single correct answer determined by facts and reason. Western civilization places such a heavy emphasis on convergent thinking that we are inclined to dismiss the value of divergent thinking, even to mock it as undisciplined and, therefore, unproductive. But divergent thinking is essential to creative discovery. Unconstrained, freewheeling thought is the hallmark of the creative person. Tests of divergent thinking are therefore considered excellent measures of creativity.

Guilford and his colleagues developed about a dozen experimental measures of divergent thinking (Guilford & Hoepfner, 1971), some of which were subsequently standardized and published as the Christensen-Guilford Fluency Tests. Subtests and items similar to his measures include:

Subtests and items similar to the experimental measures of divergent thinking

Although Guilford’s tests never received wide usage and eventually faded into obscurity, his theories and contributions were highly influential in the field of creativity studies. In particular, Guilford’s influence is found in the work of E. Paul Torrance (1915–2003), who developed a group of tests still in use today.

The Torrance Tests of Creative Thinking (TTCT) (Kim, 2006; Torrance, 1966) are based loosely on Guilford’s model, although Torrance was more concerned with the interest level of his measures than with their factorial purity. These tests purport to assess a global cognitive construct of creativity, a style of thinking believed to be essential to creative achievements. The TTCT subtests do not assess motivation, expertise, intelligence, or other capacities that could contribute to creative productivity. The test comes in two parallel forms, A and B, which are highly comparable. The comments below refer to both forms.

The TTCT consists of two parts: the TTCT-Verbal and the TTCT-Figural. Suitable for ages 6 through 18 and beyond, the TTCT-Verbal contains six subtests:

Asking Questions
Guessing Causes
Guessing Consequences
Product Improvement
Unusual Uses
Just Suppose

The first three verbal subtests are based on the same stimulus card, which shows a simple pen-and-ink drawing of one or two human-like figures engaged in ambiguous activity. A TTCT-like drawing is shown in Figure 9.1. In the first activity, Asking Questions, the child is encouraged to ask questions about the picture. In the second activity, Guessing Causes, the child is told to guess the causes of the action in the picture. In the third activity, Guessing Consequences, the child is instructed to speculate about the immediate and long-term consequences. The time limit for each activity is five minutes.

Figure 9.1 Example Stimulus Card Used for the First Three TTCT-Verbal Subtests
NOTE: A stimulus card similar to the above is used for the Asking Questions, Guessing Causes, and Guessing Consequences subtests.

In the fourth activity of the Verbal subtests, Product Improvement, the task is to suggest improvements to a toy that would make it more appealing to children. For example, the child might be shown a picture of a stuffed rabbit and asked to think of ways to change the toy so that others would have more fun playing with it. Unusual Uses, the fifth activity, is a familiar standby in creativity assessment, namely, thinking of unusual uses for a common object such as a brick. The final Verbal subtest is Just Suppose, which involves asking the examinee to list the problems and benefits that might arise from an improbable situation. For example, the child might be told “Just suppose that clouds had strings hanging down from them. What might be some problems or benefits of this situation?”

The verbal subtests are scored according to three criteria:

1. Fluency: the raw number of relevant ideas;
2. Originality: the inventiveness or creativity of the ideas;
3. Flexibility: the variety of categories that the ideas span.

Of course, the manual for the TTCT, which is periodically updated for normative data, provides significant guidance on scoring (Torrance, 1974, 1998).

The TTCT-Figural consists of three activities, which are suitable for ages 5 through 18 and beyond:

Picture Construction
Picture Completion
Repeated Figures

The time limit for each activity is 10 minutes. In the first activity, Picture Construction, the child draws a picture using a simple shape (jelly bean or pear) as a starting point. The stimulus shape must become an integral part of the constructed picture. In the second activity, Picture Completion, the examinee encounters 10 incomplete figures and is asked to complete a drawing from each and then to name each drawing. An example of a TTCT-like drawing (with completion and title) is shown in Figure 9.2. In the last activity, Repeated Figures, the child is provided two or three pages of repeated figures (e.g., circles) and asked to use them in constructing pictures that are then named. For example, the child might draw a rectangle encompassing six circles and name it “swiss cheese.”

Figure 9.2 Example TTCT-Figural Picture Completion Drawing with Title
NOTE: This sample resembles one of the ten incomplete figures used on the Picture Completion subtest.

Scoring of the TTCT-Figural subtests is based on five norm-referenced measures and 13 criterion-referenced outcomes. The five norm-referenced measures include:

1. Fluency: the raw number of stimuli provided;
2. Originality: the number of statistically infrequent drawings;
3. Abstractness of Titles: the abstraction level of the titles;
4. Elaboration: the provision of details and elaboration;
5. Resistance to Premature Closure: the degree of openness for incomplete figures.

The 13 criterion-referenced measures include a variety of creative strengths expressed in the drawings such as emotional fluency, unusual visual perspective, humor, colorful imagery, and fantasy.

Although scoring of the TTCT is tedious and elaborate, especially for the Figural subtests, experienced testers produce interrater reliabilities in the .90s. Test–retest reliability coefficients are lower, in the range of .50 to .93 (Kim, 2006). Reliability data certainly are strong enough to support the use of the test for group testing and research purposes (Trefflinger, 1985). However, making individual decisions (e.g., admission to a special program for gifted children) solely on the basis of TTCT scores could be problematic.

The validity of the TTCT is a more complicated question, especially in light of the difficulty of defining the criterion: what is creativity? Yet, the instrument is reasonably predictive of later creative accomplishments, even in the long run. For example, in a sample of 80 participants, the correlation between a TTCT creativity index derived from assessment in elementary years and the quality of highest creative achievements in adulthood (40-year follow-up) was a healthy r = .43 (Cramond, Matthews-Morgan, Bandalos, & Zuo, 2005). In this study, the quality of creative achievements was rated blindly from autobiographical materials supplied by the research participants. The correlation, r = .43, was higher than the observed relationship between childhood IQ and adult creativity, r = .32. Creativity as measured by the TTCT appears to be more predictive of certain forms of achievement than intelligence.

Overall, with its 50 years of research and strong psychometric properties, the TTCT is one of the best instruments for creativity assessment. The test has been translated into 35 languages and has spawned more research than any other measure in the field. Among its many strong features, age- and grade-norms are available for more than 50,000 participants, kindergarten through high school. Applications of the test are mainly with school-aged children, although norms are provided for adults as well (Kim, 2006).

Comment on Creativity Tests

Tests of creativity have served a useful function in highlighting the diversity of skills that make up the whole of intellectual ability. As a consequence of research on creativity, educators and psychologists now realize that an exclusive emphasis on “correct” thinking (i.e., convergent problem solving) is too narrow a focus for education and assessment alike. However, the validity of creativity tests is still an open question. One problem is that definitions of creativity (e.g., Jackson & Messick, 1968, above) do not lend themselves easily to psychometric measurement, that is, tests of creativity do not operationalize the construct of creativity very well (Chase, 1985). In part, the failure to operationalize creativity stems from the multifactorial nature of this puzzling ability. Consider this observation: whereas a general factor almost always can be extracted
from intelligence and ability tests, it seems clear that there is no corresponding factor in the realm of creativity. For example, a creative painter is unlikely to be a creative musician or a creative research scientist. Creativity is almost always specific to the realm in which it is identified. This specificity poses a difficult obstacle to general measures of creativity.

9.2.2: Measures of Emotional Intelligence

In the history of psychology, emotions and intelligence generally have been viewed as distinct capacities of the individual, each capable of influencing the other, but separate nonetheless. For example, Thomas Chalmers (1833) wrote an early chapter titled On the Connection between the Intellect and the Emotions. Chalmers was a Scottish church leader who catalogued the disruptive influence of emotions on clear thinking. In like manner, the American psychologist Henry H. Goddard (1919) proposed a separation of the emotions and intelligence. He argued that intelligence, properly exercised, can modify and influence emotions for the benefit of the individual.

The first person to hint at a possible union of emotional and intellectual factors was the eminent American psychologist E. L. Thorndike (1920). In a short essay published in Harper’s Magazine for a general audience, Thorndike spoke of three kinds of intelligence: abstract, mechanical, and social. The first two types are well known in assessment and have been validated repeatedly. However, the third kind of intelligence, social intelligence, has proved more elusive. Thorndike defined social intelligence as “the ability to understand and manage people.” An essential part of this ability is the accurate recognition of emotions in others. Unfortunately, early attempts to measure social intelligence proved fruitless (Thorndike & Stein, 1937). The concept gradually fell out of favor.

Recently, the idea that emotions and intellect might constitute a single cluster of intertwined abilities has reemerged in the concept of emotional intelligence, as proposed by Mayer, Salovey, and colleagues (Salovey & Mayer, 1989–90; Mayer, Salovey, & Caruso, 2008). The notion of emotional intelligence has been pursued by other researchers as well (discussed below); however, the Mayer-Salovey model boasts the strongest theoretical and empirical underpinnings, so we begin with their approach. Mayer et al. (2008) define emotional intelligence as follows:

• Managing emotions so as to attain specific goals;
• Understanding emotions, emotional language, and the signals conveyed by emotions;
• Using emotions to facilitate thinking; and
• Perceiving emotions accurately in oneself and others. (p. 507)

These theorists propose that emotional intelligence is an instance of traditional intelligence, not something different from it. In other words, emotional intelligence (EI) is an important and overlooked subset of abilities that contribute to human efficiency and adaptation. Thus, just as prior researchers have documented verbal forms of intelligence (e.g., verbal comprehension) and perceptual forms of intelligence (e.g., perceptual reasoning), Mayer et al. (2008) propose that emotional intelligence is a third major subdivision that complements the traditional dichotomy of verbal and perceptual abilities.

To understand how emotional intelligence differs from traditional forms of intelligence, imagine a situation in which you visit a close friend in the hospital. He has just emerged from emergency surgery after a serious head injury from a fall. He lies still in bed with his eyes closed. Standing around your friend are anxious family members and a stern-faced doctor. What would you do or say? Would you press forward to join the family members? Would you leave the room and return later? Would you hug or console others? Would you ask the doctor for an update? You will need to make these and many other choices in a matter of seconds. Adaptive functioning in this complex situation would require you to manage your own emotions (maybe you feel strong relief that you are not the one in the hospital bed), understand the subtle emotional signals conveyed by others (perhaps the glassy stare of the sister indicates that you are not welcome at this time), use your emotions to facilitate thinking (maybe your anguish is so strong that you think it wise to remain quiet), and perceive emotions accurately in others (perhaps everyone is quiet because your friend has just drifted off to sleep). Successful navigation of this difficult and painful situation would require high levels of emotional intelligence.

Because of the subtlety and complexity of the construct, the assessment of emotional intelligence has proved challenging. However, with innovative forms of testing such as embodied in the MSCEIT or Mayer-Salovey-Caruso Emotional Intelligence Test (Mayer, Salovey, & Caruso, 2002), progress is being made. This instrument consists of 141 items that yield a total emotional intelligence score as well as two Area scores, four Branch scores, and eight Task scores. Table 9.10 provides a brief description of the test, which is designed for adults age 17 and older. Normative data are based on a sample of more than 5,000 individuals.

Table 9.10 Brief Description of the MSCEIT Tasks

The overall score on the MSCEIT is called the Emotional Intelligence (EI) score. This score is normed to a mean of 100 and standard deviation of 15. The two Area scores (Experiential and Strategic) and the four Branch scores (Perceiving, Facilitating, Understanding, and Managing) likewise are normed to these traditional benchmarks. While scores are provided for the eight Tasks (see Table 9.10), the test developers caution against overinterpretation of these elemental scores because of their lower reliability. The overall EI score demonstrates strong internal reliability, in the low .90s, whereas the reliability of the two Area scores is slightly lower and more variable, typically in the high .80s (Mayer, Salovey, & Caruso, 2002). Test–retest reliability of the overall score is respectable at .86 (Brackett & Mayer, 2003).

An interesting issue with tests of emotional intelligence like the MSCEIT is how to determine the correct answers. After all, the questions involve subtle emotional concepts, for which the “correct” responses are not necessarily obvious. Consider the following question, which resembles some found on the MSCEIT:

What emotion(s) might prove helpful to feel when talking with a police officer who has just stopped you for speeding?

Deference not helpful … 1 … 2 … 3 … 4 … 5 … very helpful
Mild anxiety not helpful … 1 … 2 … 3 … 4 … 5 … very helpful
Surprise not helpful … 1 … 2 … 3 … 4 … 5 … very helpful
Irritation not helpful … 1 … 2 … 3 … 4 … 5 … very helpful

The authors of the MSCEIT propose two different scoring methods: consensus scoring and expert scoring. In consensus scoring, the majority choices of the normative sample are used to identify the correct options. For instance, in the example above, if 67% of the general population circled the number “1” for “irritation” (i.e., it is not helpful), this answer would be coded as the correct alternative. Respondents would receive lower scores to the extent they deviated from this alternative. This method is also called general scoring because the reference point is the general, normative sample.

The second approach, expert scoring, relies on the judgment of experts in the field of emotion to determine the correct options. In particular, the authors used 21 experts attending a conference of the International Society for Research on Emotion. Scoring for this approach relies on the consensus of these experts. Fortunately, the two scoring approaches (general and expert) reveal a very high agreement, on the order of .96 to .98 (Mayer, Salovey, & Caruso, 2002).

The rationale for consensus scoring, whether based on the general population or experts, is that emotions and their expression possess an evolutionary and social basis. Emotions constitute a “signal system” that conveys important information to those around us. For example, the emotion of sadness signals loss and wanting to be comforted; the emotion of anger indicates the individual feels threatened and could respond forcefully; the emotion of happiness conveys an interest in joining others. Individuals who do not “read” emotions in a consensual manner likely will experience difficulty in a broad range of social situations.

The validity of the MSCEIT has been investigated from numerous perspectives, including factorial, discriminant, and predictive validity. Some results indicate that the instrument measures a unitary skill that can be subdivided into the four branches described above (Mayer, Salovey, Caruso, & Sitarenios, 2003). Further, EI as measured by the MSCEIT reveals generally low correlations with verbal intelligence, general intelligence, and major dimensions of
personality, that is, the construct provides something that goes beyond established measures (Mayer, Salovey, & Caruso, 2004). EI is potentially useful because of its inverse relationship with deviant behaviors such as bullying, substance abuse, and violence. These relationships (high EI scores corresponding to low deviance) hold true even after the statistical control of intelligence and personality variables (Rubin, 1999; Trinidad & Johnson, 2002).

In spite of the supportive literature provided by proponents of EI measures, other reviewers maintain a cautious stance about the MSCEIT and similar tests. For example, in a comprehensive review of the psychometrics of emotional intelligence, Zeidner, Roberts, and Matthews (2008, p. 71) concluded that there has been “irrational enthusiasm surrounding the practical utility of emotional intelligence.” They note that evidence regarding the role of EI in occupational success is weak, based largely on anecdotal reports and popular sources like Daniel Goleman’s (1995) book, Emotional Intelligence: Why It Can Matter More than IQ.

Even the developers of the MSCEIT acknowledge the potential for misuse of their instrument. Mayer, Salovey, Caruso, and Sitarenios (2003, p. 104) state flatly that “the applied use of EI testing must proceed with great caution.” The growing trend to use these instruments in selection of employees is, therefore, disquieting. As Conte (2005, p. 438) notes, managers and organizational leaders “should be wary of making this leap unless more rigorous discriminant, predictive, and incremental validity evidence for EI measures is shown.”

In addition to the MSCEIT, a few other measures of emotional intelligence have gained recognition. One of these is the Emotional Competence Inventory (Sala, 2002), based on Goleman’s (1995) conception of emotional intelligence. The Emotional Competence Inventory (ECI) contains 110 items organized into four clusters: (1) Self-Awareness, (2) Social Awareness, (3) Self-Management, and (4) Social Skills. One appealing feature of this instrument is the 360-degree feedback that it yields. In this method, self-ratings, peer ratings, and supervisor ratings are reported separately for comparison and contrast. The ECI is used mainly in large corporate settings for formative evaluation of employees. The publishers have maintained tight proprietary control over the test, which has limited independent research on its psychometric qualities.

Another widely used test is the Bar-On Emotional Quotient Inventory (Bar-On, 2000), which is traditionally known by the acronym EQ-i. This 133-item self-report instrument yields an overall EQ score as well as five composite scores: (1) intrapersonal, (2) interpersonal, (3) adaptability, (4) general mood, and (5) stress management. Reviewers of the EQ-i have noted that the theory behind the test is unclear (Matthews, Zeidner, & Roberts, 2002). Further, the test appears to overlap substantially with major personality constructs. For example, a correlation of r = −.77 with the anxiety scale from Cattell’s 16PF is reported (Newsome et al., 2000). The EQ-i appears to demonstrate strong reliability, with test–retest reliability of .85 after one month (Bar-On, 1997). What remains unclear is whether the test measures emotional intelligence as a construct, as it is understood by others (Conte, 2005).

9.2.3: Assessment of Optimism

Optimism is another fruitful area for psychometric research and assessment. Typically this construct is viewed as one end of a bipolar continuum, optimism–pessimism. The difference between the two ends of the spectrum is captured in the familiar adage about the glass of water that is half-full to the optimist and half-empty to the pessimist. Whether this bipolar depiction is an accurate portrayal of the underlying construct(s) is a topic we take up below. Nonetheless, it is certainly the starting point for many theorists and for the perceptions of the lay public as well. Carver and Scheier capture why this area of assessment is important: “Optimists are people who expect good things to happen to them; pessimists are people who expect bad things to happen to them. Does this difference among people matter? It certainly does. Optimists and pessimists differ in several ways that have a big impact on their lives. They differ in how they approach problems and challenges they encounter, and they differ in the manner and the success with which they cope with life’s difficulties” (2003, p. 75). In short, optimism and pessimism have to do with people’s expectations for the future. Optimists expect a better future than pessimists and generally have more confidence in their ability to manage challenges when they arise. Generally, optimists fare better than pessimists in terms of personal adjustment and even physical health, although the differences for health are not substantial (Peterson, 2000). How these individual differences arise in personal development is an important and intriguing question that we do not pursue here. Instead we focus on assessment issues, namely, how is optimism measured?

The most widely used instrument is the revised Life Orientation Test (LOT-R; Scheier, Carver, & Bridges, 1994). This is an intriguingly simple scale that consists of six scored items and four “filler” items (10 items total). Respondents indicate their extent of agreement with the items on a five-point Likert scale ranging from 1 or “strongly disagree” to 5 or “strongly agree.” Items similar to those found on the LOT-R include:

I have a positive outlook and expect the best in life
I don’t expect good things to happen to me (reverse scored)
I enjoy my family life a great deal (filler)
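
The scoring rules just described for the LOT-R (six scored items, four fillers, ratings from 1 to 5, reverse scoring of negatively worded items) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published scoring key: the item positions treated as filler, positive, and negative are placeholders chosen for the example, and the names `score_lot_r` and `reverse` are invented here. Separate optimism and pessimism subscores are returned alongside the total because researchers often report all three.

```python
# Sketch of LOT-R style scoring as described in the text.
# ASSUMPTION: the item positions below are illustrative placeholders,
# not the published scoring key.
POSITIVE_ITEMS = (1, 4, 10)   # positively worded, scored as rated
NEGATIVE_ITEMS = (3, 7, 9)    # negatively worded, reverse scored
FILLER_ITEMS = (2, 5, 6, 8)   # fillers, ignored in scoring
SCALE_MIN, SCALE_MAX = 1, 5   # 1 = strongly disagree, 5 = strongly agree


def reverse(rating):
    """Reverse a rating on the 1-5 scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - rating


def score_lot_r(responses):
    """Score one respondent.

    responses: dict mapping item number (1-10) to a rating from 1 to 5.
    Returns the optimism subscore, the pessimism subscore, and the total.
    """
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        rating = responses[item]
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"item {item}: rating {rating} is off the scale")
    optimism = sum(responses[i] for i in POSITIVE_ITEMS)
    # ASSUMPTION: the pessimism subscore is the raw (unreversed) sum,
    # so higher values mean more pessimism.
    pessimism = sum(responses[i] for i in NEGATIVE_ITEMS)
    # The total adds the reverse-scored negative items to the positive ones.
    total = optimism + sum(reverse(responses[i]) for i in NEGATIVE_ITEMS)
    return {"optimism": optimism, "pessimism": pessimism, "total": total}


# A respondent who answers 3 (the scale midpoint) to every item.
neutral = {item: 3 for item in range(1, 11)}
print(score_lot_r(neutral))  # total is 18, midway between 6 and 30
```

With six scored items rated 1 to 5, the total can only range from 6 to 30. Filler responses never enter the sum, which is why they can be left out of the validation loop.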

Of course, negatively worded items are reverse scored. Responses on the six scored items are then summed to yield a total from 6 (highly pessimistic) to 30 (highly optimistic). Even though “pessimist” and “optimist” are categories in popular language, the LOT-R instead provides a score on a continuum, without strict cut-offs. In large samples of respondents, the score distribution tends to be skewed toward the optimistic side, but not excessively so (Carver & Scheier, 2003).

Although the theoretical basis for the LOT-R postulates an optimism-pessimism continuum, psychometric analyses by Herzberg, Glaesmer, and Hoyer (2006) with huge samples of adults (N = 46,133) reveal that the optimism and pessimism items on the test measure two independent constructs rather than a single, bipolar trait. This is a counterintuitive finding which suggests that optimism and pessimism are partly independent. Conceivably, an individual could earn high scores on both (or low scores on both), although these outcomes probably are rare. In practice, many researchers now report three scores from the LOT-R: an optimism score based on the positively worded items, a pessimism score based on the negatively worded items, and a total score that combines the two.

An additional finding of the Herzberg et al. study (2006) is that the reliability of the instrument is low (Cronbach alphas of .71 for the Optimism items and .68 for the Pessimism items). Thus, the test is recommended for group research only; it is not suitable for clinical practice with individuals.

A substantial literature points to the general conclusion that LOT-R optimists fare much better than pessimists on a wide variety of outcome measures (Snyder & Lopez, 2007). For example, in a sample of 275 Japanese college students, LOT-R total scores correlated r = .39 with social support, and r = −.26 with interpersonal conflict (Sumi, 2006). In a sample of 504 Australian high school students, LOT-R scores correlated r = .55 with self-esteem and r = −.38 with psychological distress (Creed, Patton, & Bartrum, 2002). In other words, for both studies LOT-R total scores modestly predicted good social adjustment.

Steptoe, Wright, Kunz-Ebrecht, and Iliffe (2006) investigated the relationship between LOT-R scores and numerous health behaviors in 128 community-dwelling seniors 65 to 80 years old. Dispositional optimism as measured by the LOT-R total score was associated with many healthful behaviors, including moderate alcohol consumption, not smoking, brisk walking, and vigorous physical activities (women only). Self-rated health and physical health status both were associated with optimism, although the direction of influence would be difficult to determine from this cross-sectional study. The full scale was more consistently associated with these positive relationships than either the optimism or pessimism subscales of the test. Carver and Scheier (2002) review additional external correlates of optimism as measured by the LOT-R.

9.2.4: Assessment of Gratitude

As Emmons, McCullough, and Tsang (2003) observe, gratitude is difficult to define. In part, this is because the concept can be viewed as an attitude, an emotion, a disposition, or a personality trait. A simple definition is that gratitude is a response of thankfulness and joy when receiving a gift. But delving further, difficulties arise. What constitutes a gift? What are the possible sources of a gift? Some gifts are obvious and not debatable, as when neighbors deliver a precooked meal to someone who is grieving a loss. Almost everyone would experience gratitude in this situation. But what about viewing a sunrise, taking a hot shower, or seeing a baby smile in the supermarket? Should we experience gratitude for these opportunities as well? In other words, does gratitude require a personal benefactor, or can it be expanded to the countless ways in which life pleasantly surprises the mindful person?

Regardless of how it is conceptualized, gratitude is universally recognized as a personal virtue because it promotes social cohesion and provides an inner buffer against the toil and pain of everyday life. In general, people with a grateful disposition experience greater well-being than those without this asset (Emmons et al., 2003). The German-French theologian and physician Albert Schweitzer (1969), who founded a hospital in west central Africa and received the Nobel Peace Prize for his philosophy of “Reverence for Life,” referred to gratitude as the “secret of life” (p. 36). Truly, that is a strong statement! In general, gratitude has received less attention as a topic of measurement than it deserves. But recent efforts are beginning to redress this deficiency.

One such effort is the Gratitude Questionnaire-Six Item Form (GQ-6) developed by McCullough, Emmons, and Tsang (2002). The GQ-6 is a simple self-report measure of the disposition to experience gratitude (Figure 9.3). The test consists of the six best items from a longer list of statements that articulate gratitude and appreciation.

The reader will notice that the GQ-6 is based on a Likert-type format with seven alternatives ranging from 1 (strongly disagree) to 7 (strongly agree). Two items are stated in the reverse (and therefore reverse scored) as a way of inhibiting response bias. The development and choice of specific test items was based on a thorough analysis of the many facets of the grateful disposition (McCullough, Emmons, & Tsang, 2002). The authors determined that gratitude reflects intensity (feeling more intensely grateful), frequency (feeling grateful many times a day), span (grateful for many things), and density (grateful to many individuals). Initially, they proposed 39 items to measure these qualities. The GQ-6 is composed of the six best items, as determined by factor-analytic procedures performed with test results from two samples: 238 undergraduates and 1,228 adult volunteers surveyed via the Internet. Reliability of the instrument is good, with coefficient alphas between .82 and .87. Validity of the GQ-6 is based on
numerous theory-confirming relationships with other measures. For example, self-ratings on the GQ-6 correlated modestly with external observers’ perceptions of gratitude in the participants. Additional studies indicated that the GQ-6 is positively related to optimism, hope, spirituality, religiousness, forgiveness, empathy, and prosocial behavior. The scale is negatively related to depression, anxiety, materialism, and envy (McCullough et al., 2002).

Figure 9.3 The Gratitude Questionnaire-Six Item Form (GQ-6)
Source: Reprinted with permission of Michael McCullough and Robert Emmons. Copyright 2002, all rights reserved.

Using the scale below as a guide, write a number beside each statement to indicate how much you agree with it.

1 = strongly disagree
2 = disagree
3 = slightly disagree
4 = neutral
5 = slightly agree
6 = agree
7 = strongly agree

_____1. I have so much in life to be thankful for.
_____2. If I had to list everything that I felt grateful for, it would be a very long list.
_____3. When I look at the world, I don’t see much to be grateful for.*
_____4. I am grateful to a wide variety of people.
_____5. As I get older I find myself more able to appreciate the people, events, and situations that have been part of my life history.
_____6. Long amounts of time can go by before I feel grateful to something or someone.*
*Items 3 and 6 are reverse scored.

While the GQ-6 conceives of gratitude as a single dimension, other researchers have proposed a multidimensional model. For example, the Gratitude, Resentment, and Appreciation Test (GRAT; Watkins, Woodward, Stone, & Kolts, 2003) proposes three dimensions to gratitude:

• Appreciation of others, expressed as gratitude toward other people.
• Simple appreciation, expressed as gratitude toward non-social sources.
• Sense of abundance, expressed as the absence of general resentment.

The 42 items of the GRAT are rated on a 1 to 5 scale (strongly agree to strongly disagree). The test possesses excellent reliability for the three subscales and the total score (Thomas & Watkins, 2003), and reveals theory-consistent relationships with external criteria such as spirituality and the absence of materialism (Diessner & Lewis, 2007).

Even though the authors of the GRAT hypothesized a multidimensional model in the development of their test, subsequent research indicates that gratitude might actually be a unitary trait. Wood, Maltby, Stewart, and Joseph (2007) conducted a factor analysis of the three GRAT subscales and nine other indices of gratitude (including the GQ-6), and found a clear one-factor solution. The 12 measures were highly intercorrelated, indicating a single latent construct which the researchers called gratitude/appreciation. Gratitude is an essential element of human experience that deserves ongoing psychometric inquiry.

9.2.5: Sense of Humor: Self-Report Measures

Humor is a broad construct that has many meanings. Humor can refer to the characteristics of the material (a funny joke or cartoon) or the responses of the individual (a chuckle or belly laugh). Humor can be constructive when it brings people together, or destructive when it is at someone’s expense. In contemporary Western society, having a sense of humor is generally viewed as a virtue. It is thought that individuals with a “good” sense of humor will more easily befriend others and also will be able to weather the adversities of life with greater balance.

But how do we conceptualize the loose notion of “sense of humor”? Is this an enduring personality trait, an ability to make others laugh, a temperamental feature of good cheer, a world view that life is fundamentally absurd, or something else? Martin (2003, p. 315) argues that: “One of the challenges of research on humor in the context of positive psychology is to identify which aspects or components of the humor construct are most relevant to mental health and successful adaptation.” His answer is to conceptualize humor as a way of coping with stress and enhancing relationships. With this approach, Martin has developed three instruments used widely in humor research: The Coping Humor Scale, the Situational Humor Response Questionnaire, and the Humor Styles Questionnaire.

The Coping Humor Scale was designed to assess the extent to which individuals report using humor to cope with stress (Martin & Lefcourt, 1983). The CHS consists of 7 items similar to “When things are tense I look for something funny to say” or “I think humor is a useful way of coping with problems.” These items are rated on a scale from 1 (strongly disagree) to 4 (strongly agree). There is no neutral point on the scale, which forces the respondent to take a position.

The CHS has good test–retest reliability, with r = .80 over a 12-week period, but only fair internal consistency, with coefficient alphas of .60 to .70 (Martin, 1996). Regarding validity, Martin (2003, p. 317) summarizes a number of robust external correlates of the test. CHS total scores correlate strongly with the following constructs:

• Peer ratings of using humor to cope with stress
• Peer ratings of not taking one’s self too seriously
• Researcher ratings of funniness of monologues produced under stress
• Researcher ratings of using laughter and humor before dental surgery

The CHS is a respected instrument in humor research. Nonetheless, it has faded in use because later instruments (discussed below) provide broader measures of sense of humor.
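
The coefficient (Cronbach) alpha values quoted for the CHS and the other scales in this section summarize internal consistency. Alpha equals k / (k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores, where k is the number of items. The Python sketch below shows that arithmetic; it is a hedged illustration only. The function name `cronbach_alpha` and the ratings are invented for the example and do not come from any study cited here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of ratings.

    rows: list of equal-length lists, one list of item ratings per respondent.
    Raises ZeroDivisionError if every respondent has the same total score.
    """
    k = len(rows[0])                          # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))                  # one tuple of ratings per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented ratings: five respondents answering seven items on a 1-4 scale.
hypothetical_ratings = [
    [4, 3, 4, 4, 3, 4, 3],
    [2, 2, 1, 2, 3, 2, 2],
    [3, 3, 3, 2, 3, 4, 3],
    [1, 2, 1, 1, 2, 1, 2],
    [3, 4, 3, 3, 4, 3, 4],
]
print(round(cronbach_alpha(hypothetical_ratings), 2))
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they vary independently, which is why a short scale with loosely related items tends to show only fair internal consistency.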

The Situational Humor Response Questionnaire provides a measure of the degree to which the respondent is easily amused and laughs in a wide range of situations (Martin, 1996; Martin & Lefcourt, 1984). The SHRQ consists of 21 items, the first 18 of which describe ordinary life situations such as “You were at a party and the host accidentally spilled a drink on you.” Each item is rated on a scale from 1 (“I would not have been particularly amused”) to 5 (“I would have laughed heartily”). The last three items refer to laughing and being amused in general.
As summarized by Martin (1996), the SHRQ reveals adequate psychometric qualities, including test-retest correlations of around .70 and Cronbach alphas in the vicinity of .70 to .85. An interesting validity criterion used in several studies is the correlation of test scores with observed frequency of laughter, with rs ranging from .30 to .60. As noted by Martin (2003), frequency of laughter is a good validity criterion, but it is not perfect. After all, there is laughter without humor and humor without laughter. Fortunately, the validity evidence for this instrument includes a wide base of diverse studies, such as correlations with rated funniness of monologues produced by participants, and correlations with other humor scales. Another concern about the test is that the humor situations were designed with college students in mind and may not generalize to other groups. The humor situations date to the 1980s and earlier; some are no longer funny. After all, what is deemed funny shifts over time, is specific to cultures, and is sometimes idiosyncratic. For example, some viewers find the video clips featured on the television show America’s Funniest Home Videos to be hilarious, whereas others regard this weekly offering with bewilderment or even downright scorn.
Recently, Martin and colleagues have developed a new humor instrument that represents the culmination of decades of research. The Humor Styles Questionnaire (HSQ, Martin, Puhlik-Doris, Larsen, Gray, & Weir, 2003) assesses four dimensions that convey individual differences in uses of humor:

The Four Dimensions of the Humor Styles Questionnaire (HSQ)

The HSQ includes 32 self-descriptive statements (8 for each subscale) that depict specific uses of humor. For example, items on the Affiliative scale might resemble: “I like to tell silly jokes based on word play.” Items on the Aggressive scale might resemble: “I like to poke fun at people when they make mistakes.”
The first two styles, Affiliative and Self-enhancing, embody constructive and healthy uses of humor. The last two styles, Aggressive and Self-defeating, involve unhealthy uses of humor that distance the individual from others. For each item, respondents indicate agreement or disagreement on a 7-point scale ranging from 1 (totally disagree) to 7 (totally agree). The HSQ reveals excellent psychometric properties, with strong internal consistencies of the subscales (around .80), and good test–retest reliabilities (.80 to .85). Validity is based on convergent and discriminant correlations of the subscales with appropriate external criteria including well-being, hostility, intimacy, coping, satisfaction with relationships, and major personality variables (Martin et al., 2003).
How do individual differences in humor styles arise? A recent behavioral genetics analysis comparing HSQ scores of identical and fraternal twins found fascinating differences in developmental influences among the four humor styles (Vernon, Martin, Schermer, & Mackie, 2008). In this study of 300 pairs of identical twins and 156 pairs of fraternal twins, the positive forms of humor (Affiliative and Self-enhancing) were found to display significant genetic influences whereas the negative forms of humor (Aggressive and Self-defeating) arose in greater measure from common environmental influences. The authors offer the following conclusion:

These results may have implications for potential therapeutic interventions designed to modify individuals’ sense of humor. Because traits that are mainly influenced by environmental factors may be more malleable than those that are mainly influenced by genetic factors, our findings suggest that it may be easier to help people reduce their levels of aggressive and self-defeating humor styles than to increase their use of affiliative and self-enhancing humor. This is clearly a topic for further experimental study.
(Vernon et al., 2008, pp. 1123–1124)

The lesson here for psychological testing is that the development of good measures such as the HSQ often generates far-reaching consequences.
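
As a rough illustration of the HSQ response format described above (32 items, 8 per subscale, each rated from 1 to 7), the following Python sketch sums ratings into four subscale totals. Two things here are assumptions made only for illustration: that subscale scores are simple sums, and that items fall into consecutive blocks of eight. The block assignment is a placeholder, not the published scoring key, and the sketch ignores any reverse-keyed items the actual instrument may contain.

```python
# Illustrative sketch only: the item-to-subscale key below is hypothetical,
# not the published HSQ scoring key.
SUBSCALES = ["Affiliative", "Self-enhancing", "Aggressive", "Self-defeating"]
ITEMS_PER_SUBSCALE = 8
MIN_RATING, MAX_RATING = 1, 7


def score_hsq(ratings):
    """Sum 32 ratings (1-7) into four subscale totals.

    Assumes, purely for illustration, that items 1-8 belong to the first
    subscale, items 9-16 to the second, and so on.
    """
    expected = len(SUBSCALES) * ITEMS_PER_SUBSCALE
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    for r in ratings:
        if not MIN_RATING <= r <= MAX_RATING:
            raise ValueError(f"rating {r} is outside the 1-7 scale")
    scores = {}
    for i, name in enumerate(SUBSCALES):
        start = i * ITEMS_PER_SUBSCALE
        scores[name] = sum(ratings[start:start + ITEMS_PER_SUBSCALE])
    return scores


# A respondent who marks the scale midpoint (4) on every item.
midpoint_scores = score_hsq([4] * 32)
print(midpoint_scores)
```

Under this summing assumption, each subscale total can range from 8 (all ratings of 1) to 56 (all ratings of 7).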

Chapter Quiz: Assessment of Normality and Human Strengths
Chapter 10

Neuropsychological Assessment and Screening

Learning Objectives
10.1 Review neurobiological concepts relevant to psychological testing and assessment
10.2 Review prominent neuropsychological instruments, test batteries, and screening tools

10.1: Neurobiological Concepts and Behavioral Assessment

10.1 Review neurobiological concepts relevant to psychological testing and assessment

In the practice of assessment, psychologists often discover that their clients need assistance with serious problems that are best understood from a neurobiological standpoint. These problems typically arise as a consequence of head injury, learning disability, memory impairment, language disorder, or attentional difficulties, to list just a few examples. Tens of millions of individuals are affected. For example, in the United States an estimated 5 to 8 million children struggle with a learning disability (Dey, Schiller, & Tai, 2004), about 13 to 16 million adults live with memory loss and other symptoms related to dementia (Alzheimer’s Disease and Related Disorders Association, 2000), and approximately 1.7 million people experience a head injury each year (Faul, Xu, Wald, & Coronado, 2010).
These numbers are staggering, and they provide an ongoing mandate for psychologists to develop specialized tests and procedures at the interface of psychology and medicine. The purpose of this chapter is to summarize pertinent tests, concepts, methods, and issues encountered in neuropsychological assessment and ancillary areas of appraisal such as substance abuse evaluation and screening for dementia. In Module 10.1, Neurobiological Concepts and Behavioral Assessment, we provide a condensed review of neurobiological concepts relevant to psychological testing and assessment. The emphasis in this topic is upon the various brain systems that underlie effective cognitive and emotional functioning. Understanding these brain systems is essential for those who study or use psychological tests. In this primer, the reader also will encounter several of the simpler approaches to assessment used by neuropsychologists. In the process, a good foundation will be set for Module 10.2, Neuropsychological Tests, Batteries, and Screening Tools, which reviews prominent neuropsychological instruments, test batteries, and screening tools.

10.1.1: The Human Brain: An Overview

By convention the nervous system is divided into the central nervous system consisting of the brain and spinal cord, and the peripheral nervous system that includes the cranial nerves and the network of nerves emanating from the spinal cord. The brain is intimately involved in thinking, feeling, and behaving. For these reasons, our focus in this topic is the structure and function of the brain.
The brain is beyond doubt the most protected organ in the human body. The first line of defense against physical trauma is the skull, consisting of several intermeshed, rigid bones that almost completely encase the brain. Beneath the skull, the brain is also surrounded by the meninges, a thin layering of three tough membranes that encases the brain and spinal cord, providing additional protection. The middle spongy layer of the meninges is filled with another form of protection, cerebrospinal fluid, which buffers the brain against sudden acceleration and deceleration, such as from a blow to the head. The brain literally floats in a snugly fitting bath of cerebrospinal fluid. Buoyancy reduces the effective weight of the organ to a few ounces, vastly reducing pressure upon the base of the brain. Without the protection of this fluid, the brain would bruise easily from any rapid movement of the head.
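
The order-of-magnitude figures cited in this overview of the brain (about 10¹¹ neurons, a conservative thousand synapses per neuron, and up to a thousand transmissions across each synapse every second) can be checked with a few lines of Python. This is only an arithmetic check of the theoretical upper bound, not a model of neural activity, and the variable names are illustrative.

```python
# Order-of-magnitude check of the textbook estimate. All three inputs are
# the round figures quoted in the text, not measured values.
neurons = 10**11                  # about 100 billion neurons
synapses_per_neuron = 10**3       # conservative estimate
transmissions_per_second = 10**3  # upper limit per synapse

total_per_second = neurons * synapses_per_neuron * transmissions_per_second
print(f"{total_per_second:,}")    # prints 100,000,000,000,000,000
```

The product is 10¹⁷, or one hundred quadrillion possible transmissions per second, which matches the figure given in the text.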
When unbuoyed, the brain weighs less than three pounds. It is composed principally of five elements: gray matter, white matter, glial cells, cerebrospinal fluid (CSF), and the blood vessels of the vascular system that provide the brain with oxygen and nutrients.
The 10¹¹ or 100 billion neurons in the brain are arranged in complex networks that largely have defied understanding. In part, the inscrutability of the brain derives from its computational complexity. Neurons communicate by sending all-or-none electrochemical impulses to one another. Each neuron might send transmissions to thousands, perhaps tens of thousands, of other neurons at near and distant sites called synapses. Chemical communications across the synapses can occur up to a thousand times a second. Even if we use a conservative estimate of a thousand synapses per neuron, in theory the number of neural transmissions that could occur in just one second is a staggering 10¹⁷ or 100,000,000,000,000,000 (one hundred quadrillion). No wonder that staid neuroscientists such as Sir John Eccles (who received a Nobel Prize for his work in neurophysiology) resort to hyperbole and describe the brain as “without qualification the most highly organized and most complexly organized matter in the universe” (Eccles, 1973). Considering how little we know of the universe, the truth of this statement is open to question. But it does effectively underscore the point that neuroscientists approach the study of the human brain with a sense of awe.

Cerebrospinal Fluid and the Ventricular System
Cerebrospinal fluid (CSF) is a clear liquid that is continuously produced and replenished within the ventricles. The ventricles are hollow, interconnected chambers found in the middle of the brain. There are four ventricles: two side-by-side ventricles, called the lateral ventricles, and two midline ventricles known as the third and fourth ventricles.
In rare cases, the normal flow of CSF can become constricted, such as when the aqueduct leaving the third or fourth ventricle becomes too small. This can be a congenital condition present at birth or a disease-related state observed in adulthood. In children, the increase in pressure can lead to enlargement of the ventricles and compression of the brain against the skull. In time, the skull can even enlarge. This condition is known as hydrocephalus or, literally, “water on the brain.” Untreated, the consequence of hydrocephalus can be mental retardation and early mortality. Fortunately, effective treatments are available, including the insertion of a shunt to drain the excess fluid from the ventricles—usually into the child’s abdomen.

The Vascular System of the Brain
Metabolically, the brain is a highly active organ, needing substantial supplies of oxygen and glucose to function effectively. These energy sources are supplied by the flow of blood through the cardiovascular system. Hence, the general physical health of the client and the specific condition of his or her vascular system in the brain are essential to high-level cognitive functioning.
Two pairs of arteries carry blood to the brain. These are the left and right internal carotid arteries, found in the front of the neck, and the left and right vertebral arteries, found in the back of the neck. The vertebral arteries come together just below the base of the brain to form a single artery, the basilar artery. These three arteries—the left and right internal carotids and the basilar artery—all feed into a circular arterial structure at the base of the brain known as the circle of Willis. This circular network ensures that the brain receives a continual supply of blood, even if one of the input arteries is compromised.
From this circular arterial system at the base of the brain, three arteries branch upward on each side to the roughly symmetrical cerebral hemispheres of the brain. The anterior cerebral arteries supply blood to the left and right frontal lobes and some midline structures. The middle cerebral arteries provide blood to the vast majority of the lateral surface of each hemisphere, including the frontal, parietal, and temporal lobes, and to some internal structures as well. Finally, the posterior cerebral arteries supply blood to the left and right occipital lobes and to additional subcortical structures.
Especially with advancing age, it is not unusual for one or more arteries in the brain to become completely obstructed by a condition known as atherosclerosis, the gradual buildup of fatty plaque. When an artery in the brain becomes completely obstructed—whether gradually or suddenly from a piece of dislodged plaque—the brain tissue supplied by that vessel dies because it is deprived of oxygen. This event is called an infarct, which is one kind of stroke or cerebrovascular accident (CVA). Another kind of CVA occurs when a bulging area of arterial weakness, called an aneurysm, bursts open, allowing blood to spurt directly into the brain tissue. This is technically known as an arterial rupture. The effects of a CVA depend upon the size and location of the resulting damage to the brain. For example, an infarct occurring at the base of the left middle cerebral artery would have calamitous generalized effects (e.g., right-sided paralysis of the body, loss of speech), whereas an infarct occurring higher up, in a smaller offshoot from the artery, might have limited effects or even go unnoticed. One form of vascular impairment known as multi-infarct dementia (MID) occurs when the hardly noticeable individual effects of many small infarcts accumulate over a number of years. The symptoms of MID are varied but often impact the ability to perform everyday activities such as eating, dressing, and shopping. The symptoms might include forgetfulness, vague or circumstantial speech, lack of concentration, loss of
balance, physical weakness, difficulty following instructions, and problems handling money. Often the onset of MID is so gradual and insidious that relatives recognize only in retrospect that something has been wrong for months after the onset of problems.

10.1.2: Structures and Systems of the Brain

The organization of the human brain is difficult to comprehend because important structures are interwoven and folded over upon one another. As noted, the brain also contains an intricate system of fluid-filled caverns, the ventricles, further complicating the spatial arrangement of important brain structures. In addition, functional brain systems rarely obey any simple structural organization—they typically meander their way from one part of the brain to another. Hence, we will focus mainly on a functional systems approach to explaining the operation of the brain, alluding to structures when appropriate.
We begin with a quick overview of the central nervous system and its primary subdivisions. The most basic element of the nervous system is the cerebrum, consisting of the left and right cerebral hemispheres, which are connected by the corpus callosum, a band of fibers that transfers information from one hemisphere to the other. From the standpoint of evolution, the cerebrum is the most recent part of the brain to develop. This is where thought, perception, imagination, judgment, and decision occur. Some essential structures located beneath the cerebrum are the basal ganglia and the cerebellum (both important in coordinated movement), the diencephalon (including the thalamus), the midbrain (consisting of the cranial nerves and other important relay stations), the pons (connecting the cerebrum with the cerebellum and the spinal cord), and the medulla (mediating essential bodily functions).

Corpus Callosum
The corpus callosum is the major commissure that serves to integrate the functions of the two cerebral hemispheres. This large bundle of subcortical nerve fibers is about four inches long and a quarter inch thick. The corpus callosum spans the brain from side to side just above the level of the thalamus. Although there are exceptions, the corpus callosum generally connects homologous brain sites in the left and right hemispheres.
The function of the corpus callosum was poorly understood until the 1960s when Sperry, Gazzaniga, and others initiated sophisticated laboratory studies of so-called split-brain patients (Sperry, 1964; Gazzaniga, 1970; Gazzaniga & LeDoux, 1978). These patients were persons with epilepsy whose corpus callosa had been severed to prevent the transport of epileptic discharges from one hemisphere to the other. Although outwardly normal, split-brain patients revealed a striking isolation of consciousness when visual information was restricted to one hemisphere or the other. For example, when a picture of an apple was tachistoscopically presented to the left side of the examinee’s fixation point, this stimulus was processed only in the right hemisphere (on account of the normal crossing over of neural connections). Furthermore, because the corpus callosum was severed, the image of the apple remained trapped in the right hemisphere. As the reader will discover later, the right hemisphere is usually mute and does not subserve important language functions. Thus, when asked, “What did you see?” the examinees, responding from the verbal left hemisphere, would honestly reply, “Nothing.” Yet, these patients could readily identify the object by pointing to it with the left hand (which is under the neural control of the right hemisphere). This suggests that although the right hemisphere cannot talk, it has a separate and independent capacity to perceive, learn, remember, and issue commands for motor tasks.
In a normal individual with intact corpus callosum, consciousness appears unitary because the two halves of the brain can communicate and forge a compromise as regards perception, thought, and action. Much of our knowledge of hemispheric specializations, discussed later, has been garnered from the detailed study of split-brain patients. Further insight has been gained from studies of persons living with the congenital absence of this structure, a condition known as agenesis of the corpus callosum (ACC). Present in about 1 in 4,000 live births, ACC manifests with a variety of deficits, superbly summarized by Paul, Brown, Adolphs, and others (2007). Even though overall IQ is minimally impacted, impairments are observed in abstract reasoning, problem solving, and category fluency (e.g., the ability to list multiple items in a category such as animals). One intriguing symptom that bears on current understanding of language function is that persons with ACC show marked difficulty in the verbal expression of emotional experience. Parents of children with the disorder consistently describe conversations that are meaningless or out of place (Paul et al., 2007). This corresponds well with known lateralization of brain function, in which logical components of language are underwritten by the left hemisphere, whereas the emotional aspects of language are subserved by the right hemisphere. In the absence of a corpus callosum, individuals with ACC find it particularly difficult to synthesize these two elements of language.

Cerebral Cortex
The cerebral cortex, the outermost layer of the brain, is the source of the highest levels of sensory, motor, and cognitive processing. Also called the neocortex, the cerebral cortex is a very recent evolutionary development. It is the functional capacity of this brain system—a uniform six layers deep—that most dramatically separates humans from the lower animals.
The tissue of the cerebral cortex is folded over into elaborate convolutions consisting of bulges and grooves. The prominent bulges are called gyri (singular gyrus), whereas the clefts, fissures, and grooves are called sulci (singular sulcus). This arrangement allows the brain to have a great deal more cerebral cortex than if the surface were smooth. Although the pattern of gyri and sulci is subtly unique for each person, certain major landmarks such as the central sulcus and the lateral sulcus (Figure 10.1) are always discernible in a normal brain.

Figure 10.1 Major Landmarks of the Left Cerebral Hemisphere

A small portion of the cerebral cortex is committed cortex. These sites are dedicated to basic sensory processing of vision, hearing, touch, and motor control. Nonetheless, the specificity of committed cortex is relative, not absolute. For example, the precentral gyrus classically is regarded as the motor cortex (see Figure 10.1), but only a fraction of the neurons subserving voluntary movement are located there. This has been demonstrated through neurosurgical investigations of the exposed cortex in persons with epilepsy, beginning with the pioneering work of Wilder Penfield (1958). The fully conscious patient received local anesthesia while surgeons opened a skull flap to expose one side of the brain. Then a stylus was used to deliver a small, brief, harmless electrical charge to specific sites in the sensory, motor, and language areas. The purpose of this procedure was to map the topography of the cortex so that vital brain sites were not excised. Using this approach, Uematsu, Lesser, Fisher, and others (1992) reconfirmed that a significant proportion—more than one-third—of motor responses originate outside the classic narrow cortical strip. Some motor responses emanate from the sensory strip, and others from adjoining brain sites. Furthermore, the motor strip contains a sizeable proportion of sensory cells, too. Thus, cells that subserve each specific sensory or motor function are highly concentrated in the respective committed area, but also thin out and overlap with nearby brain sites. In brief, the committed cortex of the frontal lobe is dedicated to motor control, the parietal lobe is concerned with the processing of touch and other somatosensory information, the occipital lobe is involved in visual perception, and the temporal lobe is essential to the processing of auditory information. Of course, these brain regions serve other functions as well, but part of each major lobe is dedicated to a specific motor or sensory function (Figure 10.2).

Figure 10.2 The Structural Model of Left Hemisphere Language Functions

10.1.3: Survival Systems: The Hindbrain and Midbrain

The lowest part of the brain, located at the top of the spinal cord, consists of the hindbrain, which includes the medulla oblongata, the pons, the reticular formation, and the cerebellum. From the standpoint of evolution, the hindbrain was the first brain system to develop, which explains why so many vital bodily functions are governed by this brain area. For example, the automatic control of breathing is mediated here—we breathe even when asleep, or for that matter, when in a deep coma.
The lowest section of the hindbrain is the medulla oblongata, which mediates several essential bodily functions: breathing, swallowing, vomiting, blood pressure, and, partially, heart rate (Kandel, Schwartz, & Jessell, 1995). Aspects of talking and singing also are governed here, although higher brain sites are intimately involved in these functions as well.
Significant damage to the medulla usually is fatal. In rare cases, a small stroke in the medulla causes one or more of the following symptoms: opposite-sided paralysis, partial loss of pain and temperature sense, clumsiness, dizziness, partial loss of the gag reflex, and same-sided paralysis and atrophy of the tongue. Thus, one reason why neurologists ask patients to stick out their tongue and move it from side to side is specifically to check for neurological damage
in and around the medulla. The polio virus—rampant in the 1950s but now well controlled—may attack the medulla, shutting down the neural control of breathing and necessitating a mechanical respirator.
The pons and cerebellum are the highest structures in the hindbrain. Together they help coordinate muscle tone, posture, and hand and eye movement. The role of the cerebellum in motor control is discussed later. Lesions of the pons may render the individual incapable of making coordinated lateral eye movements. For this reason, neurologists and neuropsychologists commonly ask patients to demonstrate left-right and up-down eye movements.
Located just above the hindbrain is the midbrain, which includes a number of important relay stations involved in hearing and vision. In addition, the midbrain contains nuclei for many of the cranial nerves (some of which also emanate from the hindbrain). The 12 paired cranial nerves are major neural tracts whose functions are well understood and easily tested. Some are exclusively sensory, relaying information from the external world to the brain; some are exclusively motor, serving to execute commands from the brain; about a third of the cranial nerves possess both sensory and motor functions. Neurologists refer to the cranial nerves by number. The numbers correspond roughly to the top to bottom sequence of the nerves’ emergence from the brain (Table 10.1). The reader will notice that many cranial nerves mediate aspects of vision and eye movement, basic sensory functions, and movement of jaw, tongue, face, and head. Over the centuries, neurologists have devised a variety of simple confrontational techniques to assess the cranial nerves. As peculiar as it may appear, asking the patient to stick out his or her tongue and move it left, right, up, or down can provide important information about the functioning of the hypoglossal (12th) cranial nerve. In like manner, various simple tests of hearing, balance, eye movement, and so on are used to complete the examination of the cranial nerves.

Table 10.1 The Cranial Nerves and Their Functions

Number  Nerve                 Function
1       Olfactory             Sense of smell
2       Optic                 Vision
3       Oculomotor            Horizontal and vertical eye movement
4       Trochlear             Vertical eye movement
5       Trigeminal            Facial sensation, jaw movement
6       Abducens              Horizontal eye movement
7       Facial                Facial movement and taste
8       Auditory/vestibular   Hearing and balance
9       Glossopharyngeal      Taste, swallowing
10      Vagus                 Visceral reflexes
11      Accessory             Head movement
12      Hypoglossal           Tongue movement

10.1.4: Attentional Systems

Attention has been likened to a “spotlight” that our brain uses to identify what is relevant and ignore what is irrelevant (Andreasen, 2001). Attention is often a primitive, automatic cognitive system that is essential for survival. Consider the variety of competing stimuli encountered when you drive a car down the highway, perhaps with a friend sitting next to you. A realistic scenario is that your friend asks a question, an airplane flies low in the distant horizon, a billboard on the left lures your visual focus, a siren blares in the distance, your back aches from a strenuous workout—all these sources of stimulation compete for your attention. Then a car swerves into your lane. Instantly, without conscious forethought, your brain focuses every last fragment of attention on this one looming threat, ignoring all else.
Neuropsychologists have identified several kinds of attention, including the following types:
• Orienting
• Selective
• Divided
• Sustained
The exact neurological mechanisms of attention are not well understood. Kandel, Schwartz, and Jessell (1995) note that the “neuronal mechanisms of focused attention and conscious awareness are now emerging as one of the great unresolved problems in perception and indeed in all
of neurobiology” (p. 402). Neurologically, attention is a complex function that involves the collaborative effort of several brain sites. Furthermore, different forms of attention appear to invoke different brain systems. For example, sustained attention or vigilance is mediated by the reticular formation, a network of ascending and descending nerve cell bodies and fibers, which begins in the spinal cord and extends through the medulla all the way up to the thalamus. Specific nuclei within the reticular formation project through the thalamus to wide areas of the brain and thereby help mediate attention. Based upon the classic studies of Moruzzi and Magoun (1949) demonstrating that ascending nerve tracts within the reticular formation govern general arousal or consciousness, portions of this structure are also known as the reticular activating system. Damage to the reticular activating system gives rise to global diminution of consciousness ranging from chronic drowsiness to stupor or coma (Carpenter, 1991).
Selective attention appears to invoke brain sites in addition to the reticular formation. For example, based upon functional imaging studies that highlight active brain sites, it appears that the cingulate gyrus is essential for focusing upon relevant aspects of the environment while ignoring irrelevant information. One finding is that, when asked to perform complex attentional tasks, persons who suffer from schizophrenia and who, therefore, reveal deficits in selective attention also show dysfunction in the cingulate gyrus (Carter, Mintun, Nichols, & Cohen, 1997).

10.1.5: Motor/Coordination Systems

Although many brain sites are involved in motor control, three areas are of special significance: the cerebellum, the basal ganglia, and the motor cortex. The cerebellum sits just below the cerebrum at the back of the brain. Together with other brain structures, it helps coordinate muscle tone, posture, and hand and eye movements. Lesions in or near the cerebellum may render the individual incapable of making coordinated lateral eye movements. For this reason, neurologists and neuropsychologists commonly ask patients to demonstrate left-right and up-down eye movements. An individual with damage to the cerebellum might not be able to move his or her eyes with facility in all directions.
The cerebellum receives sensory information from every part of the body and coordinates the details of automatic skilled movements. Damage to the cerebellum may cause a variety of motor disturbances, depending upon the specific sites affected (Manto & Pandolfo, 2002). Slurred, hesitant speech known as dysarthria may be a symptom of cerebellar damage. Muscles may become flabby and tire easily. Rapid, coordinated tapping of the index finger may prove difficult. Measures of finger-tapping speed (Reitan & Wolfson, 1993) are, therefore, an important component of neuropsychological test batteries.
Bodily movements may lose their coordination in cerebellar disease, becoming spasmodic and jerky. Even a simple gesture such as reaching for a cup may result in the inadvertent thrusting of cup and contents halfway across the room. The characteristic wide-based gait of alcoholics—called ataxia—is a consequence of cerebellar degeneration (Ghez, 1991). Another symptom of cerebellar damage is intention tremor, so named because it is not present at rest but arises during voluntary, intentional movements of the hands. Nystagmus also is common in cerebellar disease. In this symptom, the eyes appear to jitter back and forth even when the individual attempts to hold a steady gaze.
In conjunction with the vestibular center in the inner ear, the cerebellum also helps coordinate the vestibuloocular reflex (VOR). The VOR acts to maintain the eyes on a fixed target when the head is rotated. Without the VOR, vision would be incredibly blurred whenever the head moved even a fraction of an inch. Instead, a small area of the cerebellum coordinates a rapid refixation of the eyes to compensate for head movements.
The basal ganglia consist of a collection of nuclei in the forebrain that makes connections with the cerebral cortex above and the thalamus below. The basal ganglia are traditionally considered as part of the motor system. The main constituents of the basal ganglia are three large subcortical nuclei: the caudate, the putamen, and the globus pallidus. Some authorities also consider the amygdala to be part of the basal ganglia (Carpenter, 1991). These structures are interconnected with and functionally related to the subthalamic nucleus and the substantia nigra. Along with the cerebellum, the corticospinal system, and the motor nuclei in the brain stem, the basal ganglia participate in the control of movement. Unlike the other components of the motor system, the basal ganglia do not have direct connections with the spinal cord. The motor functions of the basal ganglia are indirect and are mediated via neural connections with the frontal cerebral cortex.
The most common syndrome caused by damage to the basal ganglia is Parkinson’s disease (PD) (Factor & Weiner, 2008). In Parkinson’s disease, three characteristic types of motor disturbances are observed: involuntary movement, including tremor; poverty and slowness of movement without paralysis; and changes in posture and muscle tone. In its later stages, this disease is typified by an immobile, masklike facial expression, an extreme difficulty initiating movements, and a fine tremor that may disappear once a movement is under way.
Patients with Parkinson’s disease also reveal specific cognitive deficits, suggesting that the basal ganglia contribute not just to movement but to thinking as well. Deficits observed in these patients include problems formulating goals and evaluating progress, difficulties with attention, limitations in word-finding, and slowed thinking. Some patients with PD report that their brain
feels “swampy” (Tröster, 2012). A loss of spontaneity and a lack of initiative also are observed (La Rue, 1992).

The motor cortex is found on the precentral gyrus of the frontal lobe. Primary motor cells that subserve voluntary movement are located here and in adjoining brain sites. Motor control is substantially but not exclusively contralateral (opposite-sided), meaning that the left precentral gyrus subserves the right side of the body, and vice versa. Thus, when an individual makes a decision, say, to lift his right hand, motor neurons in the left precentral gyrus will be activated. For obvious reasons, this area is also known as the motor strip.

The fact that motor control is substantially opposite-sided is the basis for several neuropsychological procedures that compare the function of the two sides of the body as a means of determining the integrity of the left and right motor strips. Consider the finger-tapping test, employed with many neuropsychological test batteries (e.g., Reitan & Wolfson, 1993). In a typical finger-tapping procedure, the examiner uses standardized procedures with repeated trials to determine the maximal tapping rates of the left and right index fingers over a 10-second span. Of course, the preferred hand will have a slight advantage, with a normative expectation of a rate that is 10 percent higher. For example, in a right-handed person, a tapping rate of 55 for the right index finger and 50 for the left index finger might be typical.

Any significant deviation from this expected pattern may suggest impairment in the opposite-sided motor strip. For example, suppose a right-handed examinee has a tapping rate of 47 for the right index finger and 50 for the left index finger. Because the right-sided tapping rate is comparatively slower than expected (i.e., 6 percent slower instead of 10 percent faster than the left-sided tapping rate), this would suggest impairment in the left motor strip.

10.1.6: Memory Systems
Although the lay public thinks of memory as a single thing, psychologists have known for more than a century that there are many types of memory and also several stages of memory (Ebbinghaus, 1885/1913). We can provide only a cursory review here. The importance of reviewing these basic distinctions is that different brain systems may be involved in different kinds of memory.

As to types of memory, Andreasen (2001) posits the existence of at least four different polarities of memory: episodic versus semantic, working versus associative, declarative versus procedural, and explicit versus implicit. To this list, we would add a fifth dimension: short-term versus long-term memory. These dimensions are not completely separate and distinct from one another. Episodic memory refers to memory of events or experiences, such as recalling that you had oatmeal for breakfast. In contrast, semantic memory is general knowledge not tied to a specific learning experience, such as knowing that a butterfly is an insect, not a bird. Working memory is the retention of information that we need only briefly, such as remembering the digits of a phone number just long enough to complete the call. Associative memory involves memories that are invoked because of their association with particular cues, for example, recalling the smell and taste of popcorn when hearing the sound of it popping in the microwave. Declarative memory involves the “what” of memory (e.g., knowing that a bicycle has two wheels) whereas procedural memory involves the “how” of memory (e.g., knowing how to ride a bicycle). Another way of dividing memory is explicit versus implicit, which defines the difference between memories that are immediately accessible and obvious (e.g., knowing your name) compared to those that are latent, beneath the surface (e.g., surprising yourself when you are able to recall the name of your first-grade teacher).

Another important distinction is between short-term and long-term memory. Short-term memory is synonymous with working memory and is very short in duration, lasting from perhaps 10 seconds to a minute. If short-term memories are not “refreshed” through rehearsal, they disappear after this brief duration. Long-term memory refers to memories that have been consolidated in some way so that they are more lasting in duration (hours or years), although not necessarily permanent.

Describing the brain systems involved in memory is challenging because multiple brain sites are typically involved and different types of memory utilize different pathways. Even so, there is substantial evidence that structures within the temporal lobes are essential to many important features of memory. In particular, the hippocampus and the amygdala appear to be involved in various aspects of memory and learning. Specifically, these brain sites are involved in the consolidation of short-term memories into long-term memories. The amygdala may play a special role in integrating memories from different modalities and, especially, in consolidating memories with strong emotional meaning (Andreasen, 2001).

Humans have both a left hippocampus and right hippocampus (plural: hippocampi), located subcortically within the left and right temporal lobes. The same is true for the amygdala (plural: amygdalae), which is also a bilateral structure. The crucial role of these structures in the consolidation of memory was revealed by the case of H.M., a patient with intractable epilepsy who was treated by the surgical removal of the forward section of the temporal lobe on both sides of his brain (Milner, 1968). Prior to this case, many individuals with epilepsy had been successfully treated by the removal of the diseased portion of one temporal lobe. The goal of this kind of surgery is to remove the diseased brain areas that serve as the “trigger” or focus
point for seizure activity. The cognitive consequences of single-sided temporal lobe surgery had proved to be minimal. H.M. was the first carefully studied case of bilateral temporal lobe surgery.

The consequences of his surgery were devastating, which was a shocking revelation to everyone involved. Put simply, H.M. proved incapable of forming any new memories from the point of the surgery onward (Milner, 1968). His old long-term memories remained intact, so he could recall where he attended high school, and so forth. And his short-term memory was intact, so he could remember a phone number briefly, for example. But his ability to consolidate new long-term memories was completely annihilated. He could read the same magazine from day to day, unaware that he had read it, cover to cover, the day before. A new doctor remained a new doctor on each new visit. He was essentially a prisoner of the moment, able to converse and interact with apparent normality but unable to remember anything new for more than a few minutes.

Structured testing of H.M. confirmed that different forms of memory are subserved by different brain systems. Consider procedural memory, for example, the recollection of how to do something. H.M. was asked to undertake repeated trials of mirror drawing, a complex procedural task in which the examinee traces a path on a sheet of paper while looking in a mirror. This is a daunting assignment in which directionality (left and right) is effectively reversed. With practice, normal individuals typically show slow improvement, tracing the path more quickly and with fewer errors. Intriguingly, H.M. likewise showed normal improvement on this task from day to day, indicating that his procedural memory remained intact, even though he had no realization that he had seen the puzzle before (Corkin, 1968). Most likely, this kind of procedural memory is subserved by the cerebellum. Clearly, it is not underwritten by the temporal lobes.

10.1.7: Limbic System
The limbic system is a “primitive” central brain system that is involved in emotions and basic survival drives. This system overlaps with other brain sites, especially those involved in memory. The structures of the limbic system are involved in emotions, such as fear and aggression, as well as in the acquisition of memory. The pleasure centers of the brain are located here, too, within the nucleus accumbens. In addition to the hippocampus and amygdala, other limbic structures are the cingulate gyrus, mammillary bodies, and the fornix. Andreasen (2001) points out that the exact boundaries of what constitutes the limbic system are not well established because our understanding of this brain system has been steadily growing.

In evolutionary terms, the limbic system is very old and, consequently, involved in primitive survival functions. Because of its proximity to and connections with the hypothalamus, the limbic system indirectly exerts autonomic nervous system control over crucial bodily functions needed for continued existence.

The hypothalamus is a deceptively small structure that sits just below and in front of the thalamus. Even though it composes only about 0.3 percent of the brain’s weight, the hypothalamus is involved in numerous aspects of motivated behavior and bodily regulation: blood pressure, feeding, sexual behavior, sleep/wake cycle, temperature regulation, emotional behavior, and movement. Well studied in lower animals, the functions of the hypothalamus are less well known in humans (Kolb & Whishaw, 2011). It is known that the hypothalamus exerts proprietary control over the pituitary gland, thereby modulating a wide range of endocrine functions. The most common cause of a hypothalamic lesion is a severe head injury. Hypothalamic lesions often lead to disturbances of pituitary function, including excessive or deficient intake of food or water and temperature and blood pressure dysregulation (Kupfermann, 1991a). Dysfunction of the hypothalamus also can lead to emotional dysregulation (especially fear or rage) and sleep disturbance (hypersomnolence or insomnia).

10.1.8: Language Functions and Cerebral Lateralization

Language Functions of the Left Hemisphere
Language is primarily (but not exclusively) a left hemisphere function that involves widely separated cortical and subcortical structures. Because so many regions of the left hemisphere are involved in language, virtually any significant left hemisphere lesion will produce some kind of disturbance in the production or comprehension of language. For this reason a detailed profile of language skills offers a window to the integrity and functioning of the left hemisphere.

Yet, we need to keep in mind that virtually any high-level intellectual activity, including language expression and comprehension, requires the synthetic interaction of the entire brain. Speech is a case in point. While speech is primarily subserved by the left hemisphere in most individuals, the right cerebral hemisphere does provide the intonation patterns for speech. As a result, patients with right-sided lesions (particularly in the frontal area) may speak in an eerie monotone (Kalat, 2012).

Modern conceptions of brain–language correlations actually stem from the second half of the nineteenth century. In 1861, Paul Broca observed that damage to a small region just in front of the motor cortex of the left hemisphere caused a language disorder originally called expressive aphasia and now more typically known as nonfluent aphasia. Persons with damage to this left hemisphere premotor area, aptly named Broca’s area, speak in a slow, labored manner.
They have difficulty enunciating words correctly; the act of speaking seems to be torturous for them. Speech takes on a frankly telegrammatic nature; adjectives, adverbs, articles, and conjunctions (the words that add color to speech) frequently are omitted. Writing also is difficult for these persons. Fortunately, persons who experience Broca’s aphasia have little difficulty understanding either spoken or written language. In its pure form, the disorder involves expressive language only.

In 1874, Wernicke announced that damage to the upper and rearward portion of the left temporal lobe, a region now known as Wernicke’s area, was linked to a language disorder originally called receptive aphasia and now more typically known as fluent aphasia. Affected individuals appear unable to comprehend spoken or written language. Apparently, persons with Wernicke’s aphasia have no difficulty perceiving words but cannot associate the words with their underlying meaning. As a consequence, the written and verbal expressions of persons with this aphasia are fluent but meaningless. For example, when asked to define book, a patient might respond, “Book, a husbelt, a king of prepator, find it in front of a car ready to be directed.” The same person might define scarecrow as, “We’ll call that a three-minute resk witch, you’ll find one in the country in three witches” (Williams, 1979).

Building on the observations of Broca and Wernicke, Geschwind (1972) proposed a structural, neurological model of left hemisphere language functions that has been highly influential in neuropsychological assessment. This model bears directly upon the assessment of language skills; the major elements are outlined next and depicted in Figure 10.2.

Figure 10.2 The Structural Model of Left Hemisphere Language Functions
[Figure labels: Motor cortex; Arcuate fasciculus (subcortical); Angular gyrus; Broca’s area; Visual cortex; Wernicke’s area]

Geschwind postulated the following:

1. Spoken language is perceived in the left auditory cortex at the top of the temporal lobe and then transferred to Wernicke’s area.
2. In Wernicke’s area, the meanings of words are activated and the auditory codes are transported to a subcortical bundle of transmission fibers called the arcuate fasciculus.
3. The arcuate fasciculus sends the auditory codes directly to Broca’s area.
4. Upon reaching Broca’s area, the auditory code activates the corresponding articulatory code that specifies the sequence of muscle actions required to pronounce a word.
5. In turn, the articulatory code is transmitted to the portions of the motor cortex governing tongue, lips, larynx, and so forth in order to produce the desired spoken word.

Comprehending or speaking a written word involves most of the previously outlined pathways, but with a different starting point:

6. Written words are first registered in the visual cortex, then relayed through the visual association cortex to the angular gyrus.
7. In the angular gyrus, the visual form of the word is mapped into the auditory code stored in Wernicke’s area, thereby gaining access to the meaning of the written word, which can also be spoken (steps 2 through 5 previously).

In practice, few patients reveal aphasic symptoms that fall neatly into one or another of the preceding categories. Furthermore, modern conceptions of aphasia point to weaknesses in the classical model (e.g., its overly simplistic view of the structure of language) and propose a complex, nonlinear model of aphasia that is beyond the scope of coverage here (Bonner, Ash, & Grossman, 2010). Nonetheless, a thorough assessment of language functions is an essential part of every neuropsychological evaluation and the classical model of Broca, Wernicke, and Geschwind provides a useful starting point. Additional perspectives on aphasia and the structural model of language can be found in Benson (1994) and Mayeux and Kandel (1991).
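
As a rough illustration only, the classical model's seven steps can be written down as two ordered routes, one for a heard word and one for a read word. The sketch below is a teaching aid under that simplified model, not a clinical tool; the names SPOKEN_WORD_ROUTE, WRITTEN_WORD_ROUTE, route_for, and first_disrupted_station are hypothetical labels invented here, and the lesion lookup assumes a strictly serial pathway, which the text itself notes is an oversimplification.

```python
# Stations of the classical (Geschwind) model in the order given in steps 1-5.
# All identifiers are hypothetical labels chosen for this sketch.
SPOKEN_WORD_ROUTE = [
    "auditory cortex",
    "Wernicke's area",
    "arcuate fasciculus",
    "Broca's area",
    "motor cortex",
]

# A written word enters through the visual system (steps 6-7) and then
# joins the spoken-word route at Wernicke's area (steps 2-5).
WRITTEN_WORD_ROUTE = [
    "visual cortex",
    "visual association cortex",
    "angular gyrus",
] + SPOKEN_WORD_ROUTE[1:]


def route_for(modality):
    """Return the ordered stations for saying a heard ('spoken') or read ('written') word."""
    routes = {"spoken": SPOKEN_WORD_ROUTE, "written": WRITTEN_WORD_ROUTE}
    if modality not in routes:
        raise ValueError("modality must be 'spoken' or 'written'")
    return list(routes[modality])


def first_disrupted_station(modality, lesion_sites):
    """Return the first lesioned station on the route, or None if the route is intact.

    Assumes a strictly serial pathway, as in the classical model.
    """
    lesions = set(lesion_sites)
    for station in route_for(modality):
        if station in lesions:
            return station
    return None


if __name__ == "__main__":
    print(" -> ".join(route_for("written")))
    print(first_disrupted_station("spoken", ["Broca's area"]))
```

In this sketch a lesion limited to the angular gyrus interrupts only the written-word route, which mirrors the model's claim that reading and listening share the later stations but differ in their starting point.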

Specialized Functions of the Right Hemisphere
Based on thousands of studies of normal and brain-damaged persons, it is now well established that the right hemisphere is dominant for a variety of cognitive and perceptual skills. However, a detailed discussion of specialized right hemisphere functions is beyond the scope of this section. Competent reviews of the extensive literature on this topic can be found in Bradshaw and Mattingley (1995), Fonseca, Scherer, de Oliviera, and others (2009), Springer and Deutsch (1997), and Witelson (2007). In general, the right hemisphere appears to be dominant for the analysis of geometric and visual space, the comprehension and expression of emotion, the processing of music and nonverbal environmental sounds, the production of nonverbal and spatial memories, and the tactual recognition of complex shapes.

A frequent symptom of right hemisphere damage is constructional dyspraxia, the impaired ability to deal with spatial relationships either in a two- or three-dimensional framework (Reitan & Wolfson, 1993). This symptom is commonly exhibited by an impaired ability to copy simple shapes such as a cross. Left hemisphere lesions can also cause constructional dyspraxia, but the correlation is less consistent. Most neuropsychological test batteries include one or more copying tasks to screen for constructional dyspraxia. We include a summary of findings on cerebral lateralization in Table 10.2.

Table 10.2 A Summary of Findings on Cerebral Lateralization

Functional System   Left Hemisphere Dominance                     Right Hemisphere Dominance
Vision              Processing of the right visual field          Processing of the left visual field
                    Recognition of letters, words                 Recognition of faces
Audition            Processing of right ear                       Processing of left ear
                    Processing of language-related sounds         Processing of music and environmental sounds
Somatosensory       Sensory input from the right side             Sensory input from the left side
Movement            Motor output to the right side                Motor output to the left side
                    Complex voluntary movement, including speech
Language            Speech, reading, writing, and arithmetic      Intonation and emotional patterning to speech
Memory              Verbal memory                                 Pictorial memory
Spatial processes                                                 Analysis of geometric and visual space
Emotion                                                           Comprehension and expression of emotion
Olfaction           Smell in left nostril                         Smell in right nostril

10.1.9: Visual System
The primary sensory areas for vision are located in the occipital lobes; much of this projection area is on the mesial or midline surface that separates the two cerebral hemispheres. Each occipital lobe sees the opposite side of the visual world. Thus, all visual stimuli to the left of the reader’s fixation point are ultimately processed in the right occipital lobe, and vice versa. The split visual world is shared across the splenium, the rearward portion of the corpus callosum, producing a unified perception of the entire visual field. Damage to the primary visual area produces a corresponding loss of visual field on the opposite side. For example, an extensive lesion in the left occipital lobe would render a person blind to the right half of the visual world. A very small lesion might produce a scotoma or blind spot.

The forward portion of each occipital lobe is unimodal association cortex. These regions synthesize visual stimuli and produce meaning from them. This is where the high-level processing of visual information occurs. Damage to the association cortex of the occipital lobes may cause visual agnosia, a difficulty in the recognition of drawings, objects, or faces (Kandel, 1991). Luria (1973) described a typical case of a patient with such a lesion:

    The patient carefully examines the picture of a pair of spectacles shown to him. He is confused and does not know what the picture represents. He starts to guess. “There is a circle … and another circle … and a stick … a crossbar … why, it must be a bicycle?”

The visual agnosias are especially linked to right-sided lesions of occipital association cortex, but may also involve impairment of the parietal and temporal lobes as well. A particularly dramatic form of visual agnosia is prosopagnosia, the inability to recognize familiar faces. Benson (1994) cites the example of a 70-year-old man who suffered a series of strokes affecting the forward portions of the occipital lobes. The patient’s chief complaint was that he could not recognize his wife or his daughter by sight, although he immediately recognized them by their voices. In another case of visual agnosia known as object agnosia, a patient reproduced a drawing of a train with great skill but had no idea what he had drawn. Benson (1988) describes the many fascinating symptoms of visual agnosia.

10.1.10: Executive Functions
The executive functions of the brain provide the ability to respond to novel situations in an adaptive manner. Lezak, Howieson, and Loring (2004) propose that the executive functions consist of four components:

• Volition
• Planning
• Purposive action
• Effective performance

Volition is the capacity for intentional behavior, the ability to conceptualize a goal. Planning is the identification of the steps needed to achieve the goal. Purposive action is the capacity to take action and sustain it in an orderly manner. Effective performance requires the ability to monitor one’s activities in light of the original goals and shift strategies as needed. Thus, executive functions are implicated in a wide range of cognitive, emotional, and social skills.

An intriguing paradox of psychological testing is that few instruments are sensitive to impairments of executive functions. When provided with the structure of a typical psychological test, individuals with impaired executive functions often rise to the occasion and perform well. However, in the perplexity of real life, personal functioning may reveal catastrophic disability. For example, a successful financial planner who sustained a brain injury

    … can no longer formulate plans well because of an inability to take all aspects of a situation into account and integrate them. This disability is further aggravated by his lack of awareness of his mistakes. Problems occasioned by the man’s emotional lability and proneness to irritability are overshadowed by the crises resulting from his efforts to carry out inappropriate and sometimes financially hazardous plans.
    (Lezak, 1995, p. 650)

Yet, cognitive test scores for this individual (and others like him with impaired executive functions) might well be normal.

Executive functions are substantially but not exclusively underwritten by the frontal lobes. Although it is true that disturbances in executive functions can arise from a variety of neurological conditions that involve diverse brain sites, in the vast majority of cases damage to the frontal lobes is implicated. It is with the frontal lobes that humans create intentions, form plans, and regulate their behavior by comparing the effects of their actions with their original intentions. In short, the frontal lobes are essential for the programming, regulation, verification, and motor performance of executive functions.

Enacting a plan requires a bodily movement of some kind. People pursue their goals by physically manipulating the environment, whether with their hands or through the motor activity of speech. It is not surprising, then, to find that the primary motor cortex is located in the frontal lobes, where plans and intentions are also formed.

The primary motor cortex is found on the precentral gyrus, at the rear of the frontal lobe, just in front of the central sulcus. Motor control is opposite-sided, with the left motor cortex controlling bodily movements on the right, and vice versa. The topical organization of the motor strip was first mapped by Penfield (1958) during a series of operations to remove damaged cortical tissue in persons with epilepsy. He stimulated different areas of the motor cortex with a harmless electrical current to map the correspondence between cortex and different body parts. Penfield found that those areas of the body requiring precise control, such as fingers and mouth, occupy a disproportionately large amount of cortical space.

Just in front of the primary motor cortex is the supplementary motor cortex. The supplementary motor cortex is involved in the serial ordering of complex motor chains, that is, movement programming. A portion of the frontal lobes just below the supplementary motor cortex is involved in the control of voluntary eye gaze. The left frontal lobe also mediates expressive language, discussed in detail earlier in this chapter.

Damage to the primary motor cortex causes opposite-sided deficits in fine motor control and also reduces the speed and strength of limb movements. These effects are easily detected with simple motor tests such as finger-tapping speed. Severe damage to the motor cortex causes total paralysis of the affected bodily parts. Damage to the supplementary motor cortex causes deficits in the execution of motor sequences such as copying a series of arm or facial movements (Kolb & Milner, 1981).

The most common cause of frontal lobe damage is closed head injury, which is one type of traumatic brain injury. In a closed head injury, acceleration/deceleration forces are instantly applied to the entire brain, as when a person’s head strikes the dashboard in an automobile accident. Because of the irregular surfaces of the surrounding skull, the forward underside surfaces of the frontal lobes are almost always damaged (Jennett & Teasdale, 1981). The front ends of the temporal lobes also are highly vulnerable in closed head injury.

Nauta (1971) summarizes the effects of frontal lobe dysfunction as a “derangement of behavioral programming.”

What are the different kinds of behavioral disturbances resulting from generalized, bilateral frontal lobe damage?

Curiously, frontal lobe lesions may have little effect on old learning and well-established skills. Both Hebb and Penfield reported that surgical removal of frontal lobe tissue caused little change in IQ scores (Hebb, 1939; Penfield & Evans, 1935). Early studies of prefrontal lobotomy demonstrated much the same finding: no change in IQ or even a slight improvement after disconnection of the frontal lobes.

Devising adequate measures of frontal lobe function has proved to be difficult. Lezak et al. (2004) note that frontal lobe disorders change how a person responds, whereas most tests measure what a person knows. Lezak (1982) has devised an ingenious method called the Tinkertoy® Test, discussed in the next topic, to assess the programming difficulties experienced by persons with frontal lobe lesions. More commonly, clinicians rely upon observation and checklists to diagnose frontal lobe dysfunction. A generic example of a checklist for executive functions is provided in Figure 10.3.

Figure 10.3 Example of a Structured Checklist for the Assessment of Executive Functions

Awareness
  Is unaware of limitations      1 2 3 4 5   Has insight into limitations
Goal Selection
  Sets no goals                  1 2 3 4 5   Sets suitable long-term goals
Logical Analysis
  Is disorganized                1 2 3 4 5   Plans thoughtfully
Action Orientation
  Needs prompting                1 2 3 4 5   Takes decisive action
Self-Monitoring
  Is unable to identify errors   1 2 3 4 5   Detects and corrects mistakes
Impulse Control
  Is highly impulsive            1 2 3 4 5   Thinks before acting
Flexibility
  Is inflexible in approach      1 2 3 4 5   Learns from feedback

1 = profoundly deficient
2 = severely deficient
3 = moderately deficient
4 = mildly deficient
5 = normal

10.1.11: Neuropathology of Adulthood and Aging
Although most individuals age gracefully and maintain good health into old age, an unfortunate minority experience one or more neurological syndromes such as brain injury, dementia, or Parkinson’s disease. In this section we provide a brief synopsis of a number of more common neurological problems encountered in adulthood and old age. Because neuropsychological tests excel in the evaluation of these syndromes, a brief survey will provide an important backdrop to the selected instruments discussed in the second half of the chapter.

Traumatic Brain Injury
Traumatic brain injury or TBI is an inclusive term that encompasses everything from a “mild” concussion to severe brain injury (Silver, McAllister, & Yudofsky, 2011). TBI is most commonly the consequence of a blow to the head, and concussion is probably the most common form of TBI. The classic example of a concussion is the football player who receives a hard hit (“sees stars”), is rendered briefly unconscious and immobile, and then gradually walks off the field with assistance. Within hours or a few days, he is back to normal. The symptoms of concussion include a brief loss of consciousness followed by a low-grade headache, difficulty concentrating, fatigue, irritability, and other emotional symptoms. Although some concussions can have serious, lasting effects, most patients appear to make a full recovery in a few days or weeks. A concussion is one example of a closed head injury (CHI), a trauma to the head and brain in which the skull remains intact. But closed head injury is a broader term than concussion and potentially signifies a greater level of impairment than typically found in a concussion. Closed head injury is often contrasted with open head injury or OHI, a trauma to the head and brain in which the skull is penetrated. OHI is also known as penetrating head injury. Typically, the consequences of OHI are focal or localized in and near the site of impact, whereas the effects of CHI are more diffuse, affecting areas throughout the brain.

The neurological consequences of TBI depend upon the nature and severity of the injury, but any or all of the following are possible:

• a contusion or bruising of the brain underneath the site of impact known as a coup injury
• a contusion opposite the side of the impact, caused by rebound, and known as a contrecoup injury
• frequent contusions in the undersurfaces of the frontal lobes and the tips of the temporal lobes because of the bony skull protrusions located there
• diffuse axonal injury or nonspecific brain cell damage from shear-strain effects on neural pathways
• brain tissue damage due to obstructed blood flow when cerebral arteries are ruptured
• hematoma or bleeding into the brain between the skull and the surface of the brain
• edema or swelling of the brain, which can lead to secondary brain damage
• in the long term, possible shrinkage of the brain and enlargement of the ventricular system

As to the neurobehavioral effects of TBI, the most common and reliable complaints are of concentration and memory problems. This is why tests of concentration and memory are found in virtually every test battery used in neuropsychological assessment. Other generalizations about TBI are difficult because the nature and severity of the brain damage will not be the same in any two patients. Focal damage may lead to specific symptoms (e.g., damage to the left hemisphere language areas may cause expressive aphasia). Many studies suggest that TBI patients are more seriously handicapped by personality and emotional disturbances than by cognitive and physical disabilities (Lezak & O’Brien, 1990).

Modern warfare constitutes a major source of TBI cases. Beginning just after the stunning and devastating attacks of September 11, 2001, more than two million U.S. troops have been deployed to Afghanistan and Iraq. Almost half of these soldiers have been deployed more than once, totaling in excess of three million tours of duty (Marine Corps Times, December 18, 2009). In these contemporary war theatres, blast injuries from roadside bombs known as improvised explosive devices (IEDs) comprise a common source of TBI. The detonation of an IED produces a pressure shock wave that reverberates through the brain and body, often causing neuronal changes that include diffuse axonal injury. TBI from these deadly devices is recognized as the “signature injury” of the wars in Afghanistan and Iraq (Dixon, 2011). Even a “mild” blast can produce subtle deficits that are difficult to detect and measure.

The prevalence of troop exposure to IED blasts is not well appreciated by the public. In a study of 2,525 U.S. Army infantry soldiers conducted three to four months after a year-long deployment to Iraq (Hoge, McGurk, Thomas, and others, 2008), fully 62 percent of the sample reported that an IED had exploded near them on two or more occasions! From the large subsample of IED-exposed soldiers (N = 1,556), 7 percent reported an injury with loss of consciousness, 15 percent told of injury with altered mental status, and 18 percent reported other injury. Emotional and health consequences likewise were common, with many troops demonstrating Post-Traumatic Stress Disorder (PTSD), depression, and health problems such as stomach pain, headache, fatigue, and sleep disturbance. Overall, 15 percent of the original sample met the criteria for mild TBI (mTBI). The presence of mTBI was especially correlated with IED blasts that caused a loss of consciousness.

Neoplastic Disease (Tumor) Neoplastic disease or brain tumor encompasses many different forms of tumorous growth (Reitan & Wolfson, 1993). For example, gliomas are tendril-like tumors of the glial cells that infiltrate the brain over a period of weeks or months; meningiomas are slower-growing, globular-shaped tumors of the meninges (membranes encasing the brain) that press down upon the brain.

Brain tumors produce a variety of effects, depending upon their location, size, and rate of growth. A rapidly infiltrating tumor such as a glioma quickly may compromise many skills. For example, if the tumor is on the left side of the brain, motor and sensory functions on the right side of the body may be severely impacted, as well as language and problem-solving abilities. If the tumor is on the right side of the brain, constructional abilities (e.g., drawing, assembling three-dimensional objects) will be impaired as well as motor and sensory functions on the left side. A slower-growing meningioma may produce no symptoms for years and then create focal symptoms that relate to the site of encroachment on the brain. For example, if the right parietal area is affected, deficits in spatial ability may be observed.

Chronic Alcohol Abuse Chronic alcohol ingestion leads to neuronal changes that include a loss of dendritic branches and dendritic spines, especially in areas important for memory such as the hippocampus. Over time, enlargement of the ventricles and widening of the cerebral sulci also are observed. In severe cases, atrophy of the medial thalamus and mamillary bodies is found, leading to the pronounced memory problems that characterize Wernicke-Korsakoff’s syndrome (Davila, Shear, Lane, Sullivan, & Pfefferbaum, 1994). The neuropathology of alcoholism often is exacerbated by vitamin and nutritional deficiencies.

In those tragic cases of severe alcohol abuse in which the medial thalamus and mamillary bodies are damaged, the profound anterograde amnesia of Wernicke-Korsakoff’s syndrome is noted. Patients show an inability to retain memory of events for more than a short time even though immediate memory is intact and remote memory is only mildly impaired. The falsification of memory known as confabulation, in the presence of clear consciousness, is noted. Other symptoms of severe abuse include gait disturbance and gaze difficulties. In neurologically intact alcoholics, neurobehavioral effects are more elusive and controversial but may include subtle memory deficits and difficulties with novel problem solving (e.g., Waugh, Jackson, Fox, Hawke, & Tuck, 1989).

Recent research indicates that the brain changes and neurocognitive impairments caused by prolonged alcohol abuse can be partially reversed. A common problem observed in chronic alcoholics is, literally, shrinkage of brain tissue and enlargement of the ventricles. The ventricles are fluid-filled caverns at the center of the brain. The relationship is linear, with greater alcohol intake predicting greater brain shrinkage and larger ventricular enlargement (Anstey, Jorm, Reglade-Méslin, and others, 2006). Using sophisticated imaging techniques, Bartsch, Homola, Biller, and others (2007) studied longitudinal changes in brain volume in 15 alcoholics and 10 matched controls. After 6–7 weeks of abstinence, the alcoholics revealed a 2 percent

gain in volume of brain tissue, compared to no change among the controls. While a 2 percent improvement may not seem like much, it could foretell even more dramatic gains with long-term abstinence. The common metric among substance abuse professionals is that full cognitive recovery takes at least a year. In the Bartsch et al. study (2007), pretest versus posttest scores on the d2-test, a measure of attention and concentration, also improved in the recovery group but showed no change in the control group. Several other studies confirm improvement in neuropsychological test results after abstinence in recovering alcoholics, as summarized by Walker (2006).

Normal Pressure Hydrocephalus Hydrocephalus is a build-up of cerebrospinal fluid (CSF) inside the skull, which causes brain swelling. In normal pressure hydrocephalus (NPH), which mainly affects individuals aged 60 or older, there is an increase in CSF, but the pressure of the fluid remains normal. Even so, brain function is affected, leading to a classic triad of symptoms: gait ataxia, incontinence, and dementia. Conn (2011) describes his own case of NPH from a unique perspective (he is a physician) and suggests that many cases of dementia caused by NPH are misdiagnosed with potentially tragic consequences. NPH is highly treatable, whereas other forms of dementia resist intervention. His story is a warning against complacency and fatalism among health care workers who deal with assessment and diagnosis, including psychologists. His case of NPH

  … began in about 1992 as a trivial abnormality of gait that was misdiagnosed as Parkinson’s disease (PD). Over the next 10 years, during which I was being unsuccessfully treated with dopaminergic drugs for PD, the illness gradually progressed until I could barely walk with a walking frame, had become incontinent of urine and, sometimes, faeces and began to show signs of cognitive loss. In the process of obtaining a motorised wheelchair I was referred to a younger neurologist who recognised that I had run the whole classic course of NPH, a disease of which I had never heard. I had a ventriculoperitoneal shunt (VPS) implanted in 2003 and was miraculously restored virtually to normal (p. 162).

A VPS is a catheter extending beneath the skin from the ventricles of the brain to the abdominal cavity, allowing excess CSF to drain off.

The prevalence of NPH is difficult to ascertain because it resembles other forms of diffuse dementia. Many cases likely are overlooked. Based on his evaluation of published studies, Conn (2011) estimates that 1 percent of the population will develop NPH by the age of 80.

Alzheimer’s Disease The most common degenerative neurological disease is Alzheimer’s disease (AD), which features an insidious degeneration of the brain. The pathophysiology includes clumplike deposits in the brain known as neuritic plaques and neurofibrillary tangles (Koss, 1994). Additional brain changes include neuronal loss, shrinkage or atrophy of the brain, depletion of acetylcholine neurotransmitters involved in memory, and accumulation of foreign deposits in the cerebral vasculature; the course of the disease invariably is downhill. The disease was first described in 1907 by Alois Alzheimer, who portrayed his initial case as follows:

  The first noticeable symptom of illness shown by this 51-year-old woman was suspiciousness of her husband. Soon, a rapidly increasing memory impairment became evident; she could no longer orient herself in her own dwelling, dragged objects here and there and hid them, and at times, believing that people were out to murder her, started to scream loudly. On observation at the institution, her entire demeanor bears the stamp of utter bewilderment. She is completely disoriented to time and place.
  (La Rue, 1992)

Although Alzheimer’s disease is not part of normal aging, advanced age is an important risk factor. Rare before age 65, the disease afflicts 3 percent of persons 65 to 74 years of age, 18 percent of persons 75 to 84 years of age, and nearly half of those 85 years and older (Evans, Funkenstein, Albert, and others, 1989). Symptoms and examples suggestive of Alzheimer’s disease are listed in Table 10.3. These examples characterize other forms of dementia as well.

Table 10.3 General Symptoms and Specific Examples Suggestive of Alzheimer’s Disease
Significant memory problems that extend beyond benign forgetfulness
  Fails to recall what was eaten for breakfast
Difficulty with everyday tasks and commonplace activities
  No longer balances the checkbook, prepares the same meal
Loss of orientation to date, time and/or place
  Significantly off as to date or time, loses the way going home
Gradual and insidious onset
  Onset is hard to identify, problem is recognized in retrospect
Language and word finding difficulties
  Conversation characterized by circumlocution and vagueness
Problems with abstract thinking
  Difficulty following the rules of simple card games
Deterioration of social judgment
  Dresses inappropriately, neglects personal hygiene
Misplaces or loses important items
  Car keys disappear, eyeglasses are found in a kitchen drawer
Changes in personality
  Onset of suspiciousness, periods of agitation, mood changes
Loss of initiative
  Absence of self-initiation, needs prompting to become involved
NOTE: These examples characterize other forms of dementia as well.
Source: A synthesis based on Alzheimer’s disease websites.

As detailed by Storandt and Hill (1989), difficulty with the acquisition of new information (short-term memory dysfunction) is generally the most salient symptom in the early stages. As the disease progresses, patients may also show a prominent language dysfunction (e.g., pronounced word finding difficulty) or a striking visuospatial disturbance. Reports of personality change, including delusions and agitation, also are common. The late stages are characterized by severe, pervasive disability.

Vascular Dementia (Stroke) The second most common cause of dementia in the elderly is vascular dementia, caused by blockage of an artery and subsequent death of brain tissue due to insufficient blood supply (infarction) or bleeding into or around the brain (hemorrhage). Sudden onset is the rule, but the accumulation of small strokes over time, known as multi-infarct dementia (MID), may produce an apparently progressive disorder. The Hachinski Ischemic Score was developed to distinguish multi-infarct dementia from Alzheimer’s disease (Hachinski, Iliff, Zilha, and others, 1975). Using this index, MID is indicated by the presence of several of the following factors: abrupt onset, somatic complaints, stepwise deterioration, emotional incontinence, fluctuating course, history of hypertension, nocturnal confusion, history of strokes, personality preserved, atherosclerosis present, depression, and focal neurological signs. Because MID may be treatable to some degree, the differential diagnosis of MID versus Alzheimer’s disease is more than academic.

The stroke syndrome is defined by the acute onset of a focal deficit involving the central nervous system. The specific symptoms depend upon the site of infarction but may include motor weakness and impaired sensibility in the limbs on the opposite side; nonfluent aphasia if the dominant hemisphere is affected; partial loss of the visual field if the stroke occurs in the rear of the brain. The acute symptoms of stroke often subside in some measure and lead to a plateau of stable functioning.

Parkinson’s Disease (PD) Parkinson’s disease (PD) is almost nonexistent before age 40 and affects only 1 or 2 in 1,000 persons ages 70 and over (La Rue, 1992). Although PD is identified primarily as a movement disorder, cognitive and emotional problems are common. In fact, the late stages of PD may entail a clear dementia. The symptoms include slowness of movement (bradykinesia), tremor at rest, shuffling gait, and postural rigidity. The neuropathology includes depletion of dopamine and neuron loss in the basal ganglia.

Tremor is the most common and the least debilitating early symptom in PD. The rate of progression is quite variable, but movement disability in PD can become pronounced and lead to confinement; 10 to 20 percent of PD patients develop a clear dementia. Patients with PD reveal a deficit on neuropsychological tests requiring speed (e.g., Digit Symbol, Trail Making, reaction time measures). Surprisingly, tests of visual discrimination and paired-associate learning, which do not require speed, also differentiate patients with moderate to severe PD from matched controls (Pirozzolo, Hansch, Mortimer, Webster, & Kuskowski, 1982). About 40 to 60 percent of PD patients also experience depression (La Rue, 1992).

10.1.12: Behavioral Assessment of Neuropathology
Psychological testing can be essential in the evaluation of neuropathology, as we will see in the next topic. Yet, it is easy for psychologists to become enamored of tests and to overlook the value of simple observation, interview, and behavioral evaluation. In medicine, the field of behavioral neurology has recognized the merit of these straightforward approaches for at least 150 years, dating back to the pioneering observations of Paul Broca and Carl Wernicke on syndromes of aphasia (Pincus & Tucker, 2003). Psychologists make use of this long-established tradition when they conduct a mental status examination at the beginning of assessment (Sonne, 2012).

Assessment of Mental Status The mental status examination (MSE) is a loosely structured interview that usually precedes other forms of psychological and medical assessment. The purpose of the evaluation is to provide an accurate description of the patient’s functioning in the realms of orientation, memory, thought, feeling, and judgment. The MSE is the psychological equivalent of the general physical examination: Just as the physician reviews all the major organ systems, looking for evidence of disease, the psychologist reviews the major categories of personal and intellectual functioning, looking for signs and symptoms of psychopathology (Gregory, 1999). Although there is some latitude as to the scope of the MSE, certain mental functions are almost always investigated. A typical evaluation touches upon the areas listed in Table 10.4.

Some of the elements in this list can be assessed with short screening tests. In particular, cognition, memory, and orientation are intellectual functions that can be tested in a formal, structured manner (Hodges, 1994). These measures are most commonly used in the mental status evaluation of the elderly, especially when the client appears to have a dementia such as Alzheimer’s disease, as discussed later in this chapter. Formal tests of mental status are also helpful in the assessment of certain brain-impairing conditions such as head injury, schizophrenia, severe depression, and drug-induced delirium. It is important to emphasize that screening tests are supplementary; they do not replace

clinical judgment in the evaluation of mental status. Some areas covered by the MSE are simply impossible to quantify. For example, the evaluation of a patient’s insight requires keen observation and sensitive interviewing skills. An MSE screening test for insight does not exist.

Table 10.4 Major Areas of a Typical Mental Status Exam
Appearance and Behavior
  Grooming
  Facial expressions
  Gross motor behavior
  Eye contact
Speech and Communication Processes
  Speech content, rate, tone, volume
  Word difficulty, confusion, misuse
Thought Content
  Logic, clarity, appropriateness
  Delusions
Cognitive and Memory Functioning
  Calculating ability
  Immediate recall
  Recent and remote memory
  Fund of information
  Abstracting ability
Emotional Functioning
  Predominant mood
  Appropriateness of affect
Insight and Judgment
  Awareness of problems
Orientation
  Day, date, time, location
Source: Based on Gregory, R. J. (1999). Foundations of intellectual assessment: The WAIS-III and other tests in clinical practice. Boston: Allyn and Bacon.

Behavioral Rating Scales Another approach in the behavioral tradition is to utilize observations from persons familiar with the patient, such as a spouse, parent, close friend, or caretaker. Asking them questions about the patient is a good starting point. But a more efficient method is to employ a relevant behavior rating scale tied to the specific behaviors of the individual. This allows for reliable assessment and provides access to normative data. Hundreds of behaviorally based scales exist (Tate, 2010). These can be broad-spectrum (such as establishing the likelihood of dementia) or narrow in focus (such as verifying the presence of the syndrome of disinhibition). For purposes of illustration, we will summarize two instruments here, one for the evaluation of dementia in general, and another for the appraisal of specific frontal lobe syndromes.

The Behavioral and Psychological Assessment of Dementia (BPAD) is a proxy-report rating scale designed to assess dementia-related changes in behavior among adults 30 years of age and older (Schmidt & Gallo, 2007). In completing the BPAD, the informant rates the client on 78 items within the past four weeks (current), and also five years ago (past). Items are rated on a four-point scale. The BPAD assesses the symptoms for each of the two time periods (current and past) and also computes a change score. The change score reflects changes in mood and behavior that might signal the onset of dementia. Thus, three sets of scores emerge: Current, Past, and Change.

For each of the three sets of scores, the BPAD yields a total score and seven domain scores. All scores are reported as T-scores with a mean of 50 and standard deviation of 10, relative to the standardization sample. The test was standardized and validated on a large sample of men and women 30 to 90 years of age. The sample was matched to U.S. census proportions in regard to racial/ethnic makeup, educational backgrounds, and geographic regions.

The seven domains of the test are grouped into three clusters, as follows:

Psychopathological Symptom Cluster
  Perceptual Delusions
  Positive Mood/Anxiety
  Negative Mood/Anxiety
Behavioral Symptom Cluster
  Aggressive
  Perseverative/Rigid
  Disinhibited
Biological Symptom Cluster
  Biological Rhythms

The instrument also yields a total score based on the sum of all seven domains. The BPAD items are at a grade 6 reading level. The test can be used in a variety of settings (inpatient, outpatient, assisted living) with patients suspected of having Alzheimer’s disease, vascular dementia, and psychiatric problems.

The BPAD is a promising test, but there is scant validity research at this time. Certainly the domains exemplify good content validity, insofar as they overlap with the consensus of experts on the behavioral and psychological symptoms of dementia. For example, a prominent international group provides the following authoritative statement on the behavioral manifestations of dementia:

  Behavioral symptoms: Usually identified on the basis of observation of the patient, including physical aggression, screaming, restlessness, agitation, wandering, culturally inappropriate behaviors, sexual disinhibition, hoarding, cursing and shadowing.
  Psychological symptoms: Usually and mainly assessed on the basis of interviews with patients and relatives; these symptoms include anxiety, depressive mood, hallucinations and delusions. A psychosis of Alzheimer’s disease has been accepted since the 1999 conference (International Psychogeriatric Association, 2002).
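
As a brief aside on the T-score metric noted earlier for the BPAD (mean of 50, standard deviation of 10), the conversion from a raw score is a simple linear rescaling of the standard score. The sketch below illustrates only that general conversion. The function name and the normative mean and standard deviation are hypothetical values chosen for illustration; they are not taken from the BPAD manual, and published instruments typically supply their own norm tables.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T-score (mean 50, SD 10).

    norm_mean and norm_sd describe the standardization sample.
    The values used in the example below are hypothetical.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd   # standard (z) score
    return 50 + 10 * z                # rescale to the T metric


# Hypothetical norms: mean raw score of 20, standard deviation of 5.
print(t_score(30, 20, 5))  # raw score two SDs above the normative mean
print(t_score(20, 20, 5))  # raw score exactly at the normative mean
```

Under these assumed norms, a raw score two standard deviations above the mean corresponds to a T-score of 70, and a raw score at the mean corresponds to a T-score of 50.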

Although terminology is not identical, the BPAD domains possess a clear commonality with the above description of dementia.

A test that embodies a more specific application is the Frontal Systems Behavior Scale (FrSBe) (Grace & Malloy, 2001). The purpose of this instrument is to provide a behaviorally oriented assessment of three frontal lobe syndromes: apathy, disinhibition, and executive dysfunction. The scale consists of 46 items rated on a 5-point Likert scale by either the patient or a family member. Results from a family member are considered more reliable and valid. Items are written at a 6th grade level. Separate norms are provided for the patient and family forms. The scale also attempts to quantify behavioral changes over time by including a baseline (retrospective) and a current assessment. A highly desirable feature of the form is that it takes only 10 minutes to administer and 10–15 minutes to score.

The subscales include Apathy (14 items), Disinhibition (15 items), and Executive Dysfunction (17 items), which are reported as T-scores (mean of 50, SD of 10) derived from a community-based sample of 436 men and women with two levels of education. Comparison data also are provided for several clinical groups: frontotemporal dementia, frontal lesions, nonfrontal stroke, head injury, Alzheimer’s disease, Parkinson’s disease, and Huntington’s disease.

The construct validity of the FrSBe is firmly upheld by an exploratory factor analytic study of results for 324 neurological patients and research participants, the majority diagnosed with neurodegenerative disorders such as Alzheimer’s, Parkinson’s, and Huntington’s disease (Stout, Ready, Grace, Malloy, & Paulsen, 2006). The three-factor solution revealed that 83 percent of the items from the Apathy, Disinhibition, and Executive Dysfunction scales loaded prominently on the corresponding factors from the analysis. These results strongly support the utility of the scale in assessment of the three frontal syndromes.

In a study of 66 individuals with a history of traumatic brain injury, Reid-Arndt, Nehl, and Hinkebein (2007) found that the FrSBe was a better predictor of community integration than neuropsychological tests. Mendez, Licht, and Saul (2008) reported that the scale differentiates patients with frontotemporal dementia (FTD) from those with Alzheimer’s disease (AD) and vascular dementia (VaD). Specifically, the FTD patients had significantly greater scores on Disinhibition than the AD patients and the VaD patients. Chiaravalloti and DeLuca (2003) report that the FrSBe is sensitive to the behavioral changes observed in patients with multiple sclerosis. In sum, this simple, brief scale is an excellent measure for use with patients who display frontal lobe manifestations related to a variety of neurodegenerative disorders.

10.2: Neuropsychological Tests, Batteries, and Screening Tools
10.2 Review prominent neuropsychological instruments, test batteries, and screening tools
The purpose of this topic is to review a diverse collection of neuropsychological tests, batteries, and screening tools. We focus here on representative tests, prominent batteries, and useful screening tools, recognizing that comprehensive coverage is well beyond the scope of the book. For a complete treatment of neuropsychological assessment, the reader is referred to the authoritative tome amassed by Lezak, Howieson, Bigler, and Tranel (2012), which runs to an amazing 1,200 pages in length. By necessity, the coverage here is more discerning and emphasizes better-known tests and batteries.

Neuropsychologists and other clinicians often encounter clients who struggle with alcoholism or other types of substance abuse. For this reason, we also review a few simple but practical tools for rapid screening of clients with possible alcohol problems. This issue is vital because, at any given time, 10 percent of the adult population manifests an alcohol disorder (Yalisove, 2004). Although it might appear a straightforward matter to identify patients with alcohol problems (just ask them how much and how often they drink), in reality this is a vexing diagnostic challenge due to the active façade of denial maintained by most alcoholics. However, a number of screening tools summarized later are useful for this task.

Finally, it is important to emphasize that neuropsychological assessment involves more than the administration and scoring of specialized tests and screening tools. An essential component of any assessment is the evaluation of a client’s mental status. This is particularly true with elderly clients who may experience Alzheimer’s disease or other forms of dementia. Accordingly, we close this chapter with a focus upon mental status assessment in the elderly. In this concluding topic, we pay special attention to the Mini-Mental State Exam (Tombaugh, McDowell, Kristjansson, & Hubley, 1996), one of the most widely used screening tools in existence.

Neuropsychological tests and procedures encompass an eclectic assortment of methods and purposes. At one end of the spectrum are simple, 10-minute screening tests used to probe the need for further assessment. At the other end of the spectrum are exhaustive, six-hour test batteries designed to provide a comprehensive assessment. In between are hundreds of specialized instruments developed to measure particular neuropsychological abilities. At first glance, this multitude of tests would appear to resist simple categorization, as if researchers in this area

had followed an incoherent philosophy of trial and error in the development of new instruments and procedures. However, with closer scrutiny it is evident that most neuropsychological tests fit within a simple, logical model of brain–behavior relationships. We will use this model as a framework for discussing well-known neuropsychological tests and procedures.

10.2.1: A Conceptual Model of Brain–Behavior Relationships
Bennett (1988) has proposed a simplified model of brain–behavior relationships that is helpful in organizing the seemingly chaotic profusion of neuropsychological tests (Figure 10.4). His conceptualization is a slight expansion of the model presented by Reitan and Wolfson (1993). According to this view, each neuropsychological test or procedure evaluates one or more of the following categories:

1. Sensory input
2. Attention and concentration
3. Learning and memory
4. Language
5. Spatial and manipulatory ability
6. Executive functions:
   Logical analysis
   Concept formation
   Reasoning
   Planning
   Flexibility of thinking
7. Motor output

The order of the categories listed corresponds roughly to the order in which incoming information is analyzed by the brain in preparation for a response or motor output.

Figure 10.4 Conceptual Model of Brain–Behavior Relationships
Source: Adapted with permission from Reitan and Wolfson (1993).
[Flow diagram, top to bottom: Sensory Reception; Attention and Concentration; Memory and Learning; Left Hemisphere (Language, Linear Thinking) alongside Right Hemisphere (Visuospatial, Holistic Thinking); Executive Functions (Organization and Regulation of Goal-Directed Behavior); Motor Output]

In the remainder of this topic, the discussion of neuropsychological tests and procedures is organized around these seven categories. Within each category we will review established tests and also introduce new instruments that show promise of extending the horizons of neuropsychological assessment. However, the reader needs to know that neuropsychological assessment commonly involves a battery of tests. One approach is flexible or patient-centered testing in which an individualized test battery is fashioned for each client. These batteries are based upon the presenting complaints, referral issues, and an initial assessment (Kane, 1991; Larrabee, 2008). More typically, neuropsychologists employ a fixed battery of tests for most referrals. One of the most widely used fixed batteries, the Halstead-Reitan Neuropsychological Battery (HRNB), is outlined in Table 10.5. Even though the HRNB is an old test (the elements of the battery have not been changed since its inception in the 1950s), many neuropsychologists still regard this battery as the “gold standard” in the field (Horton, 2008; Sweeney et al., 2007). In large measure, this is because of the steadily accumulating body of affirming research on the battery, which includes 267 publications by its developer, Ralph Reitan, and literally hundreds of additional articles from the dozens of neuropsychologists mentored by him. Yet, the HRNB is not without competition. The chapter closes with a presentation of two other batteries, namely, the Neuropsychological Assessment Battery and the Luria-Nebraska Neuropsychological Battery.

10.2.2: Assessment of Sensory Input
The accuracy of sensory input is crucial to the proficiency of perception, thought, plans, and action. An individual who does not see stimuli correctly, hear sounds accurately, or process touch reliably may encounter additional handicaps at higher levels of perception and cognition. Neuropsychological assessment always incorporates a multimodal examination of sensory capacities.

Sensory-Perceptual Exam The procedures developed by Reitan and Klove are entirely typical of sensory-perceptual procedures (Reitan, 1984, 1985). The Reitan-Klove Sensory-Perceptual Examination consists of several methods for delivering unilateral and bilateral stimulation in the modalities of touch, hearing, and vision. The tasks are so simple that normal persons seldom make any errors at all. For example, the examinee is asked to say

which hand has been touched (with eyes closed), or to report which ear has received a barely audible finger snap, or to identify which number has been traced on the fingertip. The results of this test are especially diagnostic if the examinee consistently makes more errors on one side of the body than the other. The reader will recall from the previous chapter that neural innervation is almost exclusively opposite-sided. Furthermore, certain areas of the cerebral cortex are devoted to primary processing of touch, hearing, and vision. Thus, an examinee who finds it difficult to process touch in the right hand may have a lesion in the postcentral gyrus of the left parietal lobe. Similarly, difficulty processing sound in the right ear may indicate a lesion in the superior portion of the left temporal lobe, and right-sided visual defects may indicate brain impairment in the left occipital lobe.

Table 10.5 Tests and Procedures of the Halstead-Reitan Test Battery

Test: Description
Category Test*: Measures abstract reasoning and concept formation; requires examinee to find the rule for categorizing pictures of geometric shapes
Tactual Performance Test*: Measures kinesthetic and sensorimotor ability; requires blindfolded examinee to place blocks in appropriate cutout on an upright board with dominant hand, then nondominant hand, then both hands; also tests for incidental memory of blocks
Speech Sounds Perception Test*: Measures attention and auditory-visual synthesis; requires examinee to pick from four choices the written version of taped nonsense words
Seashore Rhythm Test*: Measures attention and auditory perception; requires examinee to indicate whether paired musical rhythms are same or different
Finger Tapping Test*: Measures motor speed; requires examinee to tap a telegraph keylike lever as quickly as possible for 10 seconds
Grip Strength: Measures grip strength with dynamometer; requires examinee to squeeze as hard as possible; separate trials with each hand
Trail Making, parts A, B: Measures scanning ability, mental flexibility, and speed; requires examinee to connect numbers (part A) or numbers and letters in alternating order (part B) with a pencil line under pressure of time
Tactile Form Recognition: Measures sensory-perceptual ability; requires examinee to recognize simple shapes (e.g., triangle) placed in the palm of the hand
Sensory-Perceptual Exam: Measures sensory-perceptual ability; requires examinee to respond to simple bilateral sensory tasks, e.g., detecting which finger has been touched, which ear has received a brief sound; assesses the visual fields
Aphasia Screening Test: Measures expressive and receptive language abilities; tasks include naming a pictured item (e.g., fork), repeating short phrases; copying tasks (not a measure of aphasia) included here for historical reasons
Supplementary: WAIS-III, WRAT-3, MMPI-2, memory tests such as Wechsler Memory Scale-III or Rey Auditory Verbal Learning Test

*Strictly speaking, these five measures constitute the Halstead-Reitan Test Battery. However, in common parlance reference to the Halstead-Reitan includes all of the measures listed in the table.

Finger Localization Test Finger localization is a venerable procedure developed by neurologists to evaluate possible sensory losses caused by impairment of brain functions. Most neuropsychological test batteries employ a variant of this test, in which examinees must identify those fingers that have been touched (without benefit of sight). Benton has developed a well-normed 60-item test of finger localization that consists of three parts: (1) with the hand visible, identifying single fingers touched by the examiner with the pointed end of a pencil (10 trials each hand); (2) with the hand hidden from view, identifying single fingers touched by the examiner (10 trials each hand); (3) with the hand hidden from view, identifying pairs of fingers simultaneously touched by the examiner (10 trials each hand). The method of response is left to the patient: naming, touching, or pointing to fingers on a diagram (Benton, Sivan, Hamsher, Varney, & Spreen, 1994). Each stimulus presentation is scored right or wrong, and normal adults typically make very few errors in the 60 trials. Mean scores for normal adults are near perfect, ranging from 56 to 60 in various samples. In contrast, patients with brain disease find finger localization to be a challenging task, particularly on the second and third parts of the test.

10.2.3: Measures of Attention and Concentration

The attentional capacity of the brain makes it possible to attend to meaningful stimuli, screen irrelevant sensory input from the profusion of incoming stimuli, and flexibly shift to alternative stimuli when conditions demand it (Kinsbourne, 1994). While in theory it might be possible to make subtle distinctions between simple attention, concentration, mental shifting, mental tracking, vigilance, and other variants of attention/concentration, in practice these skills are difficult to separate. Only one attentional measure, the Test of Everyday Attention (TEA), has succeeded in partitioning attention into its component sources. We discuss the TEA and other prominent measures of attentional impairment in the following sections.

Test of Everyday Attention The Test of Everyday Attention (TEA) is a promising measure devised in Great Britain by Robertson, Ward, Ridgeway, and Nimmo-Smith (1994, 1996). The TEA measures the subcomponents of attention, including sustained attention, selective attention, divided attention, and attentional switching. The subtests of the TEA are outlined in Table 10.6. The test has three parallel versions and has been well validated with closed head injury clients, stroke patients, and persons with Alzheimer’s disease. Normative data are based upon the performance of 154 healthy individuals between the ages of 18 and 80. Examinees enjoy the real-life scenarios of the TEA, which adds to the ecological validity of the instrument. The
298 Chapter 10

TEA is highly sensitive to normal age effects in the general population and is, therefore, well suited to geriatric assessment. With the exception of the Elevator Counting subtest, the eight subtests were standardized to yield equivalent scores with a common mean of 10 and standard deviation of 3. Thus, the TEA allows for subtest analysis as a means of identifying an individual’s particular strengths and weaknesses (Crawford, Sommerville, & Robertson, 1997). The TEA is highly sensitive to the effects of closed head injury (Chan, 2000), with the Map Search and Telephone Search subtests revealing the largest deficits from brain injury (Bate, Mathias, & Crawford, 2001). Chan and his colleagues developed a Cantonese version of the TEA and report favorably on its use with clinical and nonclinical Chinese participants (Chan, Lai, & Robertson, 2006; Chan & Lai, 2006). A children’s version (TEA-ch) is also available (Manly, Nimmo-Smith, Watson, and others, 2001).

Table 10.6 Subtests of the Test of Everyday Attention (TEA)

Map Search: A two-minute speeded search for 80 symbols on a colored map; measures selective attention.
Elevator Counting: Simulation of elevator floor counting from tape-presented tones; measures sustained attention.
Elevator Counting with Distraction: Same as above but with auditory distractors; measures sustained attention.
Visual Elevator: Visual simulation of elevator floor counting with up-down reversals; measures attentional switching.
Auditory Elevator with Reversal: Same as visual elevator, except it is presented on tape; measures attentional switching.
Telephone Search: Search for key symbols while searching entries in a simulated classified telephone directory; measures divided attention.
Telephone Search Dual Task: Combines Telephone Search with simultaneous counting of auditory tones; measures divided attention.
Lottery: Subject listens for winning numbers known to end in 55 and then writes down preceding stimuli; measures sustained attention.

Continuous Performance Test The Continuous Performance Test (CPT) is not really a single test but rather a family of similar procedures that dates back to the pathbreaking research of Rosvold, Mirsky, Sarason, and others (1956). These authors devised a measure of sustained attention (also called vigilance) that involved continuous presentation of letters on a screen. In some cases, examinees were to press a key when a certain letter appeared (e.g., x). In other instances, examinees were to press a key when a certain letter appeared after another letter (e.g., x when it occurs after a). Errors of omission are noted when the examinee fails to press for a target stimulus. Errors of commission are noted when the examinee presses the key for a nontarget stimulus. Normal subjects make few errors.

Although CPT tests are sensitive to a wide variety of brain-impairing conditions including hyperactivity, drug effects, schizophrenia, and overt brain damage, these tests are not a panacea for the diagnosis of attention-deficit disorders. For example, in one study of the popular Conners (1995) CPT, children with diagnosed Attention-Deficit/Hyperactivity Disorder (ADHD) did not score worse than clinical controls; on the other hand, children with diagnosed reading disorders showed impaired performance on the CPT (McGee, Clark, & Symons, 2000). In general, reviewers recommend that CPT tests should be interpreted in the context of a comprehensive test battery, especially when they are used in the assessment of persons with suspected attentional problems (Riccio, Reynolds, & Lowe, 2001).

The CPT is ideal for computerized adaptation, and dozens of different versions of it have appeared in the literature (e.g., Conners, 1995; Gordon & Mettelman, 1988). Unfortunately, the proliferation of similar but not identical tests has hindered research on the practical utility of this promising measure of attention. Sandford and Turner (1997) have published a computerized CPT that uses both visual and auditory stimuli. The Integrated Visual and Auditory Continuous Performance Test (IVA) is normed on 781 normal persons ranging from 5 to 90 years of age and screened for attention deficit, learning difficulties, emotional problems, and medication use. In one analysis, the IVA showed 92 percent sensitivity (i.e., an 8 percent rate of false negatives) and 90 percent specificity (i.e., a 10 percent rate of false positives) in differentiating children diagnosed with Attention-Deficit/Hyperactivity Disorder (ADHD) from normal children. Research by Tinius (2003) further endorses the validity of the IVA. He found that adults with mild traumatic brain injury or ADHD performed significantly lower than normal controls on IVA subtests assessing reaction time, inattention, impulsivity, and variability of reaction time. This instrument is just one of many promising neuropsychological tests that take advantage of microcomputer technology.

10.2.4: Tests of Learning and Memory

Learning and memory are intertwined processes that are difficult to discuss in isolation. Learning new material usually requires the exercise of memory. Furthermore, many tests of memory incorporate a learning curve through repeated administrations. The separation of learning and memory processes is theoretically possible but of little practical value in clinical assessment. We make no tight distinction between these processes.

Memory tests can be categorized according to several dimensions, including short term versus long term, verbal versus pictorial, and learning curve versus no learning curve. These dimensions reflect neurological factors discussed in the previous section. For example, verbal memory is significantly lateralized to the left hemisphere, whereas pictorial memory is largely underwritten by the right hemisphere. The interested reader can consult Lezak et al. (2012) for more detailed analyses of the neural
substrates for different types of memory. Here we will concentrate on the psychometric characteristics of four quite dissimilar memory tests.

Wechsler Memory Scale-IV The Wechsler Memory Scale-IV (Wechsler, 2009) is a monumental revision of the previous edition. The latest version is barely recognizable as the offspring of the original one-page test published more than 60 years ago (Wechsler, 1945). The fourth edition is an extensive, multiphasic memory test consisting of nine subtests, although seven are sufficient for the Standard Battery. The nine subtests are described in Table 10.7. The first seven subtests constitute the basis for obtaining age-adjusted scaled scores (mean of 100 and SD of 15) for five standard indices:

Immediate Memory Index
Delayed Memory Index
Auditory Memory Index
Visual Memory Index
Visual Working Memory Index

Table 10.7 WMS-IV Subtests

If the ancillary subtests (Logos and Names) are employed, five additional index scores can be computed. We confine our discussion here to the Standard Battery, although it is worth noting that the WMS-IV provides for five flexible batteries (e.g., Older Adult/Abbreviated Battery, Logical Memory/Designs Battery) using different combinations of the nine subtests. The standard battery requires about 75 minutes to administer, while the abbreviated battery can be completed in 35–40 minutes.
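
The index-score metric (mean of 100, SD of 15) lends itself to a brief illustration. The Python sketch below assumes normally distributed scores and uses hypothetical helper names (to_standard_score, percentile_rank). It is not the publisher's scoring procedure: actual WMS-IV index scores are read from age-based norm tables rather than computed from a formula.

```python
from statistics import NormalDist


def to_standard_score(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Rescale a z-score onto a standard-score metric (default: mean 100, SD 15)."""
    return mean + sd * z


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile rank of a score, ASSUMING scores are normally distributed."""
    return 100.0 * NormalDist(mean, sd).cdf(score)


# One standard deviation below the mean on the index-score metric.
print(to_standard_score(-1.0))           # 85.0
print(round(percentile_rank(85.0), 1))   # 15.9

# The same z-score on a metric with a mean of 10 and SD of 3.
print(to_standard_score(-1.0, 10, 3))    # 7.0
```

Under these assumptions, an index score of 85 falls at about the 16th percentile. The same rescaling applies to any standard-score metric, such as subtest scores with a mean of 10 and SD of 3.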
The WMS-IV was co-normed with the WAIS-IV in
2009. The standardization of the new instrument is superb.
Based on 2005 census data, the 2,200 participants were
stratified as to age (13 age bands spanning 16 to 90), gen-
der, race/ethnicity, educational level, and geographic
region.
Because the WMS-IV is a relatively new version, there
is currently little external research on its reliability and
validity. Even so, the WMS-IV Technical and Interpretive
Manual (Pearson, 2009) provides a mountain of supportive
data. Subtest internal consistencies range from a low of .74
(Visual Reproduction I) to highs of .94 and .97 (Verbal Paired
Associates I and Visual Reproduction II, respectively).
Internal consistencies of the index scores were excellent,
consistently in the mid- to high .90s. Test–retest reliabilities
for the index scores were lower, in the low .80s.
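
One practical consequence of these reliability values can be sketched with the classical standard error of measurement, SEM = SD × sqrt(1 − r). The reliabilities of .82 and .96 used below are illustrative values chosen from the ranges just cited, not figures reported in the manual, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% band around an obtained score (z = 1.96)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


# Illustrative reliabilities only (assumed values, not taken from the manual).
for r in (0.82, 0.96):
    sem = standard_error_of_measurement(15.0, r)
    low, high = confidence_band(100.0, 15.0, r)
    print(f"r = {r:.2f}: SEM = {sem:.1f}, 95% band = {low:.0f} to {high:.0f}")
```

With an SD of 15, a reliability of .96 implies an SEM of about 3 points, whereas a reliability of .82 implies an SEM of more than 6 points, roughly doubling the width of the band around an obtained index score.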
Validity of the battery appears strong, based on a vari-
ety of approaches, including confirmatory factor analysis,
correlations with other measures, and test profiles for spe-
cial groups (Pearson, 2009). In general, the index scores
reveal good convergent validity (high correlations with
similar measures) and good discriminant validity (low cor-
relations with dissimilar measures). Test profiles for special
groups (e.g., intellectual disability, traumatic brain injury,
Alzheimer’s disease, and schizophrenia) likewise make
theoretical sense in light of the aims of the test battery.
An important disclaimer with any multiphasic battery
like the WMS-IV is that distinctive profiles should not be
used in isolation for diagnosis. If A implies B, it does not
follow that B implies A; to assume otherwise is a logical fallacy. A specific
example will illustrate the point. If Alzheimer’s disease, on
average, yields a distinctive WMS-IV profile, it does not
follow that the presence of that profile in a new patient sig-
nifies that the patient has Alzheimer’s disease. Proper
diagnosis always entails the synthesis of many sources,
including interview with patient and informants.
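
The logic can be made concrete with Bayes' rule. The Python sketch below uses entirely hypothetical figures (a profile shown by 90 percent of patients with the disease and by 10 percent of those without it); these are not WMS-IV data. It shows how the probability that a patient with the profile actually has the disease depends on the base rate of the disease in the setting where testing occurs.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              base_rate: float) -> float:
    """P(disease | profile present), computed with Bayes' rule."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Hypothetical values: the profile appears in 90% of patients with the
# disease (sensitivity) and in 10% of those without it (1 - specificity).
for base_rate in (0.05, 0.30, 0.60):
    ppv = positive_predictive_value(0.90, 0.90, base_rate)
    print(f"base rate {base_rate:.0%}: P(disease | profile) = {ppv:.2f}")
```

Under these assumed figures, the same profile corresponds to a probability of disease of about .32 when the base rate is 5 percent, but about .93 when the base rate is 60 percent. A profile that is typical of a disorder is therefore not sufficient, by itself, to establish the diagnosis.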
Likewise, isolated low scores on a WMS-IV index should not be overinterpreted. Accessing the original standardization data, Brooks, Holdnack, and Iverson (2011) found that healthy people often obtain low scores on one or more index scores, especially when they had lower education levels or intelligence. Moderating influences need to be considered in test interpretation.

Rey Auditory Verbal Learning Test In the early 1900s, the Swiss psychologist Edouard Claparede (1873–1940) proposed a memory test consisting of the free recall of a 15-item word list. This test evolved into the Rey Auditory Verbal Learning Test (RAVLT), making it one of the oldest mental tests in continuous use (Boake, 2002). The test first appeared in French (Rey, 1964), but an English-language adaptation has been provided by Lezak (1983, 1995) and others. The RAVLT is a very popular test of memory, especially for purposes of clinical research. A search of PsycINFO from 1950 onward revealed more than 400 published articles using this simple instrument.

In administering the RAVLT, the examiner reads a list of 15 concrete nouns at the rate of one per second. The examinee recalls as many as possible in any order. Forewarning the examinee to recall all the words, including those previously recalled, the examiner reads the entire list a second time. A third, fourth, and fifth administration and recall then ensue; these are followed by an interference trial with a new list of words. Next, immediate recall of the original list is tested (without benefit of a new presentation). Finally, a recognition trial is included in which the examinee must underline the administered words from a longer written paragraph. The test yields a number of scores, including the number recalled (of 15) for each of the initial five trials, the total for the five trials (75 possible), the immediate recall after the distractor list is read, and the recognition score.

Rosenberg, Ryan, and Prifitera (1984) concluded that the RAVLT performs well in the identification of patients known to be memory impaired by other criteria. In addition to an overall reduction in performance, memory-impaired patients showed a reduced rate of improvement across the five learning trials. Abundant norms for the RAVLT can be found in Strauss, Sherman, and Spreen (2006). Schoenberg, Dawson, Duff, and others (2006) provide normative data for 392 individuals with documented neurological dysfunction.

The RAVLT is available in at least seven parallel versions, which is both a strength and a weakness of the test (Hawkins, Dean, & Pearlson, 2004). It is a strength because clinicians often employ repeat testing as they follow patients with memory difficulties. Of course, this raises the specter of practice effects: examinees will do better on second, third, and ensuing administrations to some degree because of their prior exposure to the specific items, regardless of whether their clinical condition is improving or getting worse. With parallel versions of a test, the impact of practice effects can be mitigated by using a different form for each administration. Yet, this is a potential weakness, too, because the equivalence of the seven parallel forms is not well established. In reviewing studies of the seven forms of the RAVLT, Hawkins, Dean, and Pearlson (2004) could locate only six studies, and four of these were limited to comparisons of the original test against one other form. Although differences between forms likely are minor, their exact magnitude is simply unknown.

Fuld Object-Memory Evaluation The Fuld Object-Memory Evaluation is a useful test of memory impairment in the elderly (Fuld, 1977). The test begins by presenting the examinee with a bag containing 10 common objects (ball, bottle, button, etc.). The task is not described as a memory test. The examinee is asked to determine whether he or she can identify objects by touch alone. Each object is felt and then named; the examinee then pulls it out of the bag to see if he or she was right. After all 10 items have been correctly identified, a distractor task is administered: rapidly naming words in a semantic category (e.g., names, foods, things that make people happy, vegetables, or things that make people sad). Then the examinee is asked to recall as many of the objects as possible. After each recall, the subject is slowly and clearly reminded verbally of each item omitted on that trial, a procedure called selective reminding (Buschke & Fuld, 1974). The examinee is then administered four more chances to recall the list by selective reminding, with a distractor task after each trial. Delayed recall is tested after a 5-minute interval. Finally, the test closes with a multiple-choice recognition test.

The Fuld test is often used to help confirm a diagnosis of Alzheimer’s disease, a degenerative neurological disorder described in the previous topic. In the early stages of Alzheimer’s disease the most prominent symptom is memory loss. Elderly persons with memory impairment not only score lower than control subjects on the Fuld Object-Memory Evaluation, but they also benefit very little from the selective reminding. Fuld (1977) has provided norms for community-active and healthy nursing-home residents in their 70s and 80s. Fuld, Masur, Blau, Crystal, and Aronson (1990) describe a prospective study in which the Fuld Object-Memory Evaluation demonstrated promise as a predictor of dementia in cognitively normal elderly. Lichtenberg, Manning, Vangel, and Ross (1995) describe a program of neuropsychological research using the Fuld test with older urban medical patients.

Chung (2009) reports very favorably on the validity of the Fuld test as a screening measure of dementia in Chinese elderly. In a sample of 192 community-dwelling
individuals, 57 with confirmed dementia, the optimal cut-off on the total retrieval score yielded an amazing 93 percent sensitivity and 90 percent specificity. In other words, 93 percent of the individuals with dementia were correctly spotted, and 90 percent of the normal individuals were appropriately classified. These are impressive findings for a simple screening test. Chung and Ho (2009) report similarly favorable results in a Chinese nursing-home sample.

Rivermead Behavioral Memory Test The Rivermead Behavioral Memory Test (RBMT) is a measure of everyday memory such as route finding, remembering names, and recalling information (Wilson, Cockburn, & Baddeley, 1991). The instrument includes the following subtests:

Names: A photograph is shown along with the first and second names of the person in the photograph. The examinee is tested on both the first and the second names.
Belonging: At the beginning of the test, the examinee is required to hand over a personal belonging (e.g., wallet), which is then hidden while the examinee observes. Later the examinee must remember to ask for the item and then also to find it.
Appointment: The examinee is asked to remember to ask the date of the next appointment when he or she hears the sound of an alarm timer.
Pictures: The examinee is shown 10 cards with simple pictures or drawings and later is asked to recognize them among a set of 20 cards.
Immediate Story: The examiner reads a short paragraph and immediately afterward asks the examinee to recall as many elements of the brief story as possible.
Delayed Story: After completing a number of additional subtests, the examinee is asked to recall as many elements of the story as possible.
Faces: The examinee is shown 5 cards with a face on them and then asked to recognize them among a set of 10 cards.
Immediate Route: The examiner demonstrates a short route with the examinee and leaves an envelope with a written message at the destination. The examinee is asked to reproduce the route and to recall the message.
Immediate Message: This item is linked to Immediate Route (above). The examinee is asked to recall the written message.
Delayed Message: After completing a number of intervening tasks, the examinee is asked to recall the written message again.
Orientation: This subtest consists of 10 items tapping knowledge of personal and societal information.
Date: The examinee is asked the date of the examination.

The RBMT is highly popular in geriatric and rehabilitation settings because of its robust ecological validity: the subtests parallel the tasks and activities of everyday life (Guaiana, Tyson, & Mortimer, 2004). Another strong point of the instrument is that it assesses many elements of memory. For example, the test evaluates all of the following aspects: short-term, long-term, verbal, spatial, retrospective, and prospective memory. The focus on prospective memory (remembering to do something in the future) is a rare but welcome addition to the appraisal of memory.

Man, Chung, and Mak (2009) developed an online version of the RBMT for use with Chinese examinees. They compared scores of 30 stroke patients on the original, face-to-face version of the test versus the online version, and found exceptionally strong correlations on the 12 subtests, with rs ranging from .84 to .93. The new version also was highly successful in distinguishing stroke patients from controls. In sum, the online adaptation looks highly promising as a replacement for the more cumbersome face-to-face edition.

Wide Range Assessment of Memory and Learning-2 The original version of the Wide Range Assessment of Memory and Learning (WRAML) was the first comprehensive memory scale designed for use with children (ages 5 to 17 years). The second edition of the test, the WRAML-2 (Sheslow & Adams, 2004), retains the pediatric focus but also extends the norms upward to 90 years of age. The WRAML-2 is, therefore, unique as the only memory scale that can be used with both children and adults. In addition to examiner convenience (no need to buy and learn several memory tests), there is clinical value as well in using a single test across a wide range of ages. Specifically, when clinicians desire to do follow-up testing on a child or teenage client who subsequently transitions into adulthood, using a single test avoids the pitfall of introducing measurement error associated with different tests.

The WRAML-2 consists of six core subtests that contribute to three Index scores: Verbal Memory, Visual Memory, and Attention/Concentration. Collectively, these Index scores establish the overall General Memory Index. A description of the core memory tasks is provided in Table 10.8.

In addition to the core memory subtests, the WRAML-2 also utilizes delayed memory tasks and recognition memory tasks. The delayed memory tasks require free recall of previously presented material whereas the recognition memory tasks involve mere recognition of the material. The two formats (delayed and recognition) help distinguish between storage and retrieval problems in memory. In particular, a client who performs poorly on delayed memory but who excels at recognition memory most likely has a problem with retrieval rather than storage. This is somewhat similar to not remembering a test item when a
fill-in-the-blank format is used but succeeding when a multiple-choice format is used. In fact, retrieval memory requires a different neurological substrate than recognition memory. Although capable functioning in both retrieval and recognition memory is typical throughout life, distinct differences (favoring recognition) are observed in old age, with certain neurological conditions such as Alzheimer’s disease, and in some forms of brain injury.

Table 10.8 Description of Core WRAML-2 Subtests

The WRAML-2 also includes optional subtests that can be used to evaluate a relatively new area of memory measurement, namely, working memory (Baddeley, 1986). Working memory is a complex form of short-term memory. In addition to simply holding on to rote information for several seconds, when using working memory the client is also “working” with a part of the memory trace without distorting the whole trace. For example, try to read the following sentence only once (i.e., do not reread the sentence to answer the question): If in a bag you had two red balls, three yellow balls, and one green ball, what is the probability the ball would be yellow if you reached into the bag and randomly chose one ball? To answer this question, the short-term verbal memory processor must hold on to all the words in the sentence until the last phrase containing the question. Then it must reproduce the sentence, remembering how many red balls there were, and so on, then hold that information secure, returning to accumulate all the numbers in order to compute the answer. There are two working memory subtests on the WRAML-2, one that examines verbal working memory and another that examines a combination of verbal and visual working memory.

The adult standardization age bands used in norming the WRAML-2 are similar to those of the WMS-III, with similar attention given to stratification variables such as age, gender, ethnicity, geographic region, and educational level. “Tighter” age bands exist for the 5- to 14-year-old samples because there is more change in memory abilities across these ages than in adulthood (except for the oldest age groups). For the WRAML-2, factor-analytic studies show strong support for the three discrete domains being measured (Verbal Memory, Visual Memory, and Attention/Concentration) as well as the newly introduced domain of Working Memory. Especially impressive are the analyses showing extremely low item bias for gender as well as ethnicity. As with the WMS-III, validity studies show clinical groups with neurological disorders scoring significantly lower than nonclinical groups on all WRAML-2 Indexes. The correlation of the WRAML-2 with WAIS-III Full Scale IQ is moderate, supporting the claim that it measures something different from, although related to, intelligence. Of interest, though, a much lower correlation with the WISC-III suggests that there is less correlation between intelligence and memory ability among children than among adults.

Because both tests claim to be memory tests and show some similarities across tasks used to assess memory, it is reasonable to wonder if the WMS-III and WRAML-2 yield similar scores (i.e., if there is reasonable concurrent validity). Using 79 adults from ages 17 through 74 years, the test developers showed that overall memory indexes of the two measures differed by only 4.7 points. However, the correlations between scores on the two memory instruments ranged from .29 to .60. These moderate correlations suggest that they are measuring somewhat different aspects of memory and are not interchangeable instruments.

Additional Tests of Learning and Memory Because of space limitations, we can do no more than briefly mention several other useful tests of learning and memory. The California Verbal Learning Test-II is
patterned after the Rey AVLT but provides software to
Assessing Attributes Description
quantify and analyze the pattern of results (Delis, Kramer,
Reading: The examiner requests the patient to read
Kaplan, & Ober, 2000). The Benton Visual Retention Test is and explain a short paragraph suited to
a design-copying test of visual memory (Sivan, 1991). prior level of education and intelligence.
The examiner may ask the patient to
Good reviews of memory tests can be found in Lezak et al. follow written instructions (e.g., “Close
(2012) and Strauss, Sherman, and Spreen (2006). your eyes” or “Clap your hands three
times”).
Writing and copying: The examiner asks the patient to write
10.2.5: Assessment of Language spontaneously and from dictation. Also,
Functions the examiner may ask the patient to copy
written matter and geometric shapes.
The examiner is interested in grossly
As noted in a previous section, language functioning offers ungrammatical written productions and
a window to the integrity of the left cerebral hemisphere. significant distortions in copying.
Thus, neuropsychologists are keenly interested in an exam- Calculation: The examiner asks the patient to perform
inee’s ability to speak, read, write, and comprehend what very simple mathematical calculations
(e.g., 17 × 3) with and without aid of
others say. Little wonder that a comprehensive neuropsy- scratch paper. The tasks are so simple
chological examination always includes one or more meth- that normal subjects rarely fail.
ods for assessing language functions.
Neuropsychologists exhibit a special interest in a vari- Based on the clinical assessment, the examiner may fill
ety of language dysfunctions known collectively as apha- out a rating scale for severity of aphasia. For example, the
sia. Briefly stated, aphasia is any deviation in language rating scale used in the Boston Diagnostic Aphasia Exam
performance caused by brain damage. In testing for apha- (Goodglass, Kaplan, & Barresi, 2000) includes the follow-
sia, a neuropsychologist might use any or all of three ing speech characteristics: melodic line, phrase length,
approaches: (1) a nonstandardized clinical examination, articulatory agility, grammatical form, word finding, and
(2) a standardized screening test, or (3) a comprehensive auditory comprehension.
diagnostic test of aphasia. We will provide examples of Screening and Comprehensive Diagnostic
each in our brief review of assessment methods in aphasia. Tests for Aphasia Standardized screening tests for
aphasia closely resemble the brief clinical exam. The essen-
Clinical Examination for Aphasia A clinical
tial difference is that standardized screening tests incorpo-
examination for aphasia has the advantages of simplicity,
rate objective and precise instructions for administration
flexibility, and brevity. These are important attributes
and scoring. The weakness of screening tests is that they
when assessing a severely impaired patient who may
will not detect subtle forms of aphasia.
require bedside testing. Every practitioner has a slightly
Comprehensive diagnostic tests for aphasia are quite
different version of the brief clinical exam (Lezak et al.,
lengthy and used mainly when a patient is known to expe-
2012; Reitan, 1984, 1985). Nonetheless, certain elements
rience aphasia. These tests provide a profile of language
commonly are assessed:
skills that is helpful in treatment planning. We provide a
Common Elements in the Clinical Assessment for Aphasia brief description of several aphasia tests in Table 10.9.

Assessing Attributes Description


10.2.6: Tests of Spatial
Spontaneous speech: The examiner looks for distinctive
symptoms of aphasia such as word- and Manipulatory Ability
finding difficulty or neologisms (e.g.,
referring to a comb as a “planker”). Tests of spatial and manipulatory ability are also known as
Repetition of sentences The examiner asks the patient to repeat tests of constructional performance. A constructional per-
and phrases: stimuli such as “No ifs, ands, or buts,” formance test combines perceptual activity with motor
and “Methodist Episcopal.” The repetition
tasks are so simple that normal subjects response and always has a spatial component (Lezak et al.,
almost never fail them. 2012). Because constructional ability involves several com-
Comprehension of The examiner asks questions (“Does plex functions, even mild forms of brain dysfunction will
­spoken language: a car have handlebars?”) and issues result in impaired constructional performance. However,
commands (“Take this paper, fold it in
half, and put it on the floor”). Again, the careful observation is needed to distinguish the cause of
tasks are so simple that normal subjects the failed performance, which may include spatial confu-
almost never fail them.
sion, perceptual deficiency, attentional difficulties, moti-
Word finding: The examiner points to common, easily
recognized objects and asks, “What’s vational problems, and apraxias. The term apraxia refers
this?” Typical items include watch, pen, to a variety of dysfunctions characterized by a breakdown
pencil, glasses, ring, and shoes. The
examiner may ask the patient to name in the direction or execution of complex motor acts (Strub
numbers, letters, or colors. & Black, 2000). For example, a patient who could not
demonstrate how to use a key would be diagnosed as suffering from ideomotor apraxia.

Table 10.9 Brief Description of Several Aphasia Tests

Tests of constructional performance embrace two large classes of activities: drawing and assembling. Owing to limitations of space, we will review only a few prominent instruments in each category.

Design Copying Tests Drawing a copy of simple geometric shapes such as two overlapping pentagons is a complex activity that requires accurate visual perception, correct spatial analysis, as well as intact motor functions and the executive ability to make mid-course corrections in the drawing. Because the activity of copying a design involves so many cognitive capacities, it is sensitive to a wide variety of brain-impairing conditions. For this reason, design copying has been a mainstay of cognitive screening for brain impairment.

One of the most widely used design copying tests (indeed, one of the most widely used individual tests of any kind) is the Bender Visual-Motor Gestalt Test (Bender, 1938), more commonly known as the Bender Gestalt Test (BGT). In the last half of the twentieth century, the BGT consistently ranked among the top four or five most frequently used tests in clinical psychology (Piotrowski, 1995). The original version consisted of nine stimulus drawings similar to those in Figure 10.5.

Figure 10.5 Stimuli Similar to Those From the Bender Gestalt Test-II.
NOTE: The Bender-Gestalt-II consists of sixteen stimuli similar to these.

The test is simple to explain and administer. The examinee is instructed to copy one drawing at a time on a sheet of blank paper. Erasures are discouraged. If needed, additional sheets of paper are provided. The examinee is told “this is not a test of artistic ability, but try to copy the drawings as accurately as possible. Work as fast or as slowly as you wish” (Hutt, 1977). Use of a ruler or straight edge is not permitted.

For the original version of the BGT, a number of complex scoring systems have been developed for adults (Hain, 1964; Hutt & Briskin, 1960; Lacks, 1999). In addition, Koppitz (1963, 1975) produced an elaborate scoring system for children aged 5 to 11. The Koppitz system yielded a raw score (total errors) that could be converted to an age-equivalent score as well. In contrast to the use of the BGT with adults, where the examiner is looking for signs of brain impairment, the primary purpose of the test when used with children is to assess the level of developmental maturity. Several interesting variations on the original BGT are discussed in Gregory (1999).

A revised and expanded version of the BGT was published by Brannigan and Decker (2003). The BGT-II adds to the original test rather than revamping it. Specifically, it includes the original nine stimulus cards supplemented by seven new drawings (four very easy drawings, and three that provide substantial challenge). The four additional “easy” cards are administered only to younger examinees 4 through 7 years of age, whereas the three “difficult” cards are administered only to older examinees 8 through 85+ years of age. Unlike previous editions of the test, which lacked serious efforts at standardization, the BGT-II norms are based on more than 4,000 individuals, ages 4 through 85, stratified on important demographics according to the 2000 census.

These new stimulus cards are intended to extend the measurement scale at the lower and higher extremes of ability. The authors also provide an explicit scoring
system whereby each reproduction is scored on a 5-point scale from 0 (no resemblance) to 4 (nearly perfect). Of course, comprehensive, census-based norms are provided by way of standard scores, T scores, percentile ranks, confidence intervals, and classification labels. The standard score is called the Visual Motor Integration (VMI) and is anchored to a mean of 100 and standard deviation (SD) of 15. This is a useful feature of the BGT-II because it allows for comparisons of the VMI score with IQs, memory quotients, and other indices normed to a mean of 100 and SD of 15. Marnic (2011) found that the test is valuable in the diagnosis of attention-deficit/hyperactivity disorder in referred children and adolescents. Decker (2008) provides a sophisticated analysis of subtle changes in BGT-II protocols across the life span, suggesting that visual-motor skills mature rapidly from childhood into middle adolescence, decline steadily through adulthood, and drop steeply in old age.

The Greek Cross (Reitan & Wolfson, 1993) is a very simple drawing task that is surprisingly sensitive to brain impairment. The examinee is requested to carefully copy the figure without lifting the pencil, that is, by tracing the perimeter. The stimulus figure and examples of defective performance are shown in Figure 10.6. This test is most often evaluated on a qualitative basis, although scoring guides do exist (Gregory, 1999).

Assembly Tests In his classic book on the parietal lobes, Critchley (1953) provided the rationale for including three-dimensional construction tasks in a neuropsychological test battery:

It is possible, and indeed useful, to proceed to problems in three-dimensional space though tests of this character are only too rarely employed. This is a more difficult undertaking, and patients who respond moderately well to the usual procedures with sticks and pencil-and-paper may display gross abnormalities when told to assemble bricks according to a three-dimensional pattern.

Benton, Sivan, Hamsher, Varney, and Spreen (1994) present a three-dimensional block construction test with excellent norms and scoring guide. The two forms of the test (Form A and Form B) consist of three block models that are presented one at a time to the patient. The patient is requested to construct an exact replica of the model by selecting the appropriate blocks from a set of loose blocks on a tray. Based on omissions, additions, substitutions, and displacements, the three models are scored from 0 to 6, 8, and 15 points, respectively. This test is quite sensitive to brain impairment, especially when the left or right parietal area is affected. Lezak et al. (2012) discuss other assembly tasks. We should mention that the Tactual Performance Test from the Halstead-Reitan battery is, in part, an assembly task that measures spatial and manipulatory abilities (see Table 10.4).
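The score metrics mentioned in this topic lend themselves to a brief worked illustration. The BGT-II VMI is expressed on the familiar standard-score metric (mean 100, SD 15), and T scores use a mean of 50 and SD of 10, so a score on one metric can be re-expressed on the other through its z score. The Python sketch below is only an illustration: the function names and the example score of 85 are hypothetical, and the percentile rank assumes a normal distribution, whereas a test manual would supply percentile ranks from its norm tables.

```python
from statistics import NormalDist


def convert_score(score, from_mean, from_sd, to_mean, to_sd):
    """Re-express a score on another metric by way of its z score."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd


def percentile_rank(score, mean, sd):
    """Percentile rank of a score, ASSUMING normally distributed scores."""
    return 100.0 * NormalDist(mean, sd).cdf(score)


# Hypothetical examinee with a VMI standard score of 85 (mean 100, SD 15)
vmi = 85
t_score = convert_score(vmi, 100, 15, 50, 10)   # 40.0, i.e., one SD below the mean
pr = percentile_rank(vmi, 100, 15)              # about 15.9

print(f"VMI {vmi} -> T score {t_score:.0f}, percentile rank {pr:.1f}")
```

Because both metrics are linear transformations of z, a VMI of 85 and a T score of 40 describe the same relative standing; this is what makes the comparisons with IQs and memory quotients described above straightforward.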

Figure 10.6 The Greek Cross Stimulus Figure and Reproductions from Persons with Known Brain Damage
(a) Stimulus figure.
(b) Clerical worker with diffuse right hemisphere dysfunction of unknown origin.
(c) College professor two years after a right hemisphere stroke.
(d) Patient with generalized, diffuse dementia.
Source: From Gregory, Robert J. Foundations of intellectual assessment: The WAIS-III and other tests in clinical practice, p. 197. Published by Allyn and Bacon, Boston, MA. Copyright © 1999 by Pearson Education. Adapted by permission of the publisher.

10.2.7: Assessment of Executive Functions

Executive functions include logical analysis, conceptualization, reasoning, planning, and flexibility of thinking. The assessment of executive functions presents an unusual quandary to neuropsychologists:

A major obstacle to examining the executive functions is
the paradoxical need to structure a situation in which
patients can show whether and how well they can make
structure for themselves. Typically in formal examina-
tions, the examiner determines what activity the subject is
to do with what materials, when, where, and how. Most
cognitive tests, for example, allow the subject little room
for discretionary behavior, including many tests thought
to be sensitive to executive—or frontal lobe—disorders …
The problem for clinicians who want to examine the exec-
utive functions becomes how to transfer goal setting,
structuring, and decision making from the clinician to the
subject within the structured examination.
(Lezak, 1995)

Many neuropsychologists resolve this quandary by


using the clinical method to evaluate executive functions
rather than administering formal tests (Cripe, 1996). For
example, Pollens, McBratnie, and Burton (1988) use inter-
view and observations to fill out the structured checklist on
executive functions mentioned in the previous topic.
Only a limited number of neuropsychological tests tap executive functions to any appreciable degree. Useful instruments in this regard include the Porteus Mazes, Wisconsin Card Sorting Test, and a novel approach known as the Tinkertoy® Test. We remind the reader that the Category Test from the Halstead-Reitan battery also captures executive functions to some extent (Table 10.4).

The Porteus Maze Test was devised as a culture-reduced measure of planning and foresight (Porteus, 1965). Without lifting the pencil and attempting to avoid dead ends, the examinee must trace a line through a series of increasingly difficult mazes. This underused instrument is quite sensitive to the effects of brain damage, particularly in the frontal lobes (Smith & Kinder, 1959; Smith, 1960).

Krikorian and Bartok (1998) have published contemporary Porteus Maze norms for children and young adults 7 to 21 years of age; these researchers also demonstrated that test scores are minimally related to IQ scores. Mack and Patterson (1995) investigated the Porteus test as a useful measure of executive function in elderly patients with Alzheimer’s disease. In a study of 276 pediatric patients who had sustained a traumatic brain injury (TBI), Levin, Song, Ewing-Cobbs, and Roberson (2001) found that the Porteus test was highly sensitive to TBI severity as measured by the volume of tissue damage in the prefrontal areas of the brain.

The Wisconsin Card Sorting Test (WCST) is a good measure of executive functions, although its differential sensitivity to frontal lobe damage is debated (Mountain & Snow, 1993). The instrument was devised to study abstract thinking and the ability to shift set (Berg, 1948; Heaton, Chelune, Talley, and others, 1993). The examinee is given a pack of 64 cards on which are printed one to four symbols (triangle, star, cross, or circle) in one of four colors (red, green, yellow, or blue). No two cards are identical. Thus, each card embodies a number, a particular shape, and a specific color. The examinee must sort these cards underneath four stimulus cards according to an unknown principle (Figure 10.7).

Figure 10.7 Cards and Sorting Piles Similar to the Wisconsin Card Sorting Test (labels: Red, Green, Yellow, Blue)

For example, the unknown principle might be “sort according to color.” As the examinee places cards, the examiner says “right” or “wrong.” After the examinee has sorted a run of 10 correct placements in a row, the examiner shifts the principle without warning. The test continues until the examinee has made six runs of 10 correct placements. The test can be scored in several different ways, including total number of trials to criterion (Axelrod, Greve, & Goldman, 1994). A common use of the WCST is to gauge ongoing recovery in patients with brain trauma of recent onset. Thus, the longitudinal constancy of test scores in patients with stabilized conditions is a reassuring characteristic of this test (Greve, Love, Sherwin, and others, 2002).

Lezak (1982) devised the Tinkertoy® Test to give patients the opportunity to demonstrate executive capacities within the structured format of an examination. Fifty pieces of a standard Tinkertoy® set are placed on a clean surface and the examinee is told, “Make whatever you want with these. You will have at least five minutes and as much more time as you wish to make something.” The test is scored from -1 to +12 based on several variables including the number of pieces used, the mobility of the construction, symmetry, and the naming of the construction. Head-injured patients produce impoverished designs consisting of a small number of pieces. These individuals often are unable to provide a name for their constructions.

Bayless, Varney, and Roberts (1989) studied the predictive validity of the Tinkertoy® Test by comparing the results of 50 patients with closed-head injuries versus 25 normal controls. Half of the head-injured individuals had returned to work while half had not. Whereas all but one of the head-injured who returned to work scored normally on the Tinkertoy® Test, nearly half of the nonreturnees performed below the level of the worst control subject. The researchers conclude:

The test seems particularly well suited for demonstrating the presence of deficits in executive functioning, which have proven to be difficult to demonstrate with clinical tests even though they have catastrophic sequelae in daily vocational or psychosocial endeavors.
(Bayless et al., 1989)

The Tinkertoy® Test also shows promise in the assessment of individuals with Alzheimer’s disease (Koss, Patterson, Mack, Smyth, & Whitehouse, 1998).

Neuropsychologists still need additional measures of executive functions. One promising approach in the early stages of development is real-world assessment of route finding. The ability to find an unfamiliar location in the city requires strategy, self-monitoring, and corrective maneuvers. These are executive functions applied to a realistic problem (Boyd & Sauter, 1993). Another promising approach to the assessment of executive functions is embodied in a recent battery called the Behavioral Assessment of the Dysexecutive System (BADS; Wilson,
Alderman, Burgess, and others, 1996). The BADS battery consists of six novel situational tests that resemble real-life daily activities:

Six Tests for Behavioral Assessment of the Dysexecutive System (BADS)

The battery also includes a 20-item dysexecutive questionnaire with items rated on a 5-point (0 to 4) Likert scale. The items involve likely changes when executive functions are impaired, for example, “I have difficulty thinking ahead and planning for the future.” The questions are in four broad areas: personality/emotional changes, motivational changes, behavioral changes, and cognitive changes. Spreen and Strauss (1998) provide a helpful review of this battery. Norris and Tate (2000) compared the BADS with six other commonly used tests of executive functioning. In a sample of 36 neurological patients, they demonstrated the ecological superiority of this new instrument in predicting competency in everyday role functioning. Simon, Giacomini, Ferrero, and Mohr (2003) found that the BADS was a fair measure of social adjustment in patients with schizophrenia, correlating r = .34 with an index of psychosocial adjustment. The BADS outperformed the Wisconsin Card Sorting Test and the Trail Making Test (part B) in this context. In a study comparing healthy controls, patients with mild cognitive impairment, and patients with mild Alzheimer’s disease, the BADS was highly sensitive to the impact of mild Alzheimer’s disease, but did not differentiate the other two groups (da Costa Armentano, Porto, Brucki, & Nitrini, 2009).

10.2.8: Assessment of Motor Output

Most neuropsychological test batteries include measures of manipulative speed and accuracy. Lezak et al. (2012) provide a comprehensive review. We will briefly summarize three approaches: finger tapping, pegboard performance, and line tracing.

Perhaps the most widely used test of motor dexterity is the Finger-Tapping Test from the Halstead-Reitan battery. This test consists of a tapping key that extends from a mechanical counting device attached to a flat board. With the index finger of each hand, the examinee completes a series of 10-second trials until five trials in a row are within a 5-point range. The score for each hand is the average of these five trials, rounded to the nearest whole number. With the dominant hand, males typically score about 54 taps (SD of 4), whereas females typically score about 51 taps (SD of 5; Dodrill, 1979; Morrison, Gregory, & Paul, 1979).

In general, the absolute level of performance is of less interest than the relative abilities on the two sides of the body. Normative expectation is that the nondominant hand will yield a tapping rate about 90 percent of the dominant hand. Significant deviations from this pattern are thought to indicate a lesion in the hemisphere opposite that of the slowed hand (Haaland & Delaney, 1981). However, such inferences must be made with great caution owing to the very low reliability of the ratio score. Although test–retest and interexaminer reliabilities for either hand alone approach .80, the reliability of the ratio score is a dismal .44 to .54 (Morrison, Gregory, & Paul, 1979). The ratio score should be used with extreme caution in making clinical inferences about lateralization of damage.

The Purdue Pegboard Test requires the examinee to place pegs in holes with the left hand, right hand, and then both hands. Each trial lasts only 30 seconds, so the entire test can be administered in a matter of minutes. Tiffin (1968) reports normative scores for work applicants. Relative slowing in one hand suggests a lesion in the opposite hemisphere, whereas bilateral slowing indicates diffuse or bilateral brain damage. Using the Purdue Pegboard Test in isolation, one study found an 80 percent accuracy in identifying brain impairment among a large group of normal subjects and neurological patients (Lezak, 1983). Other studies report much less favorable findings (Heaton, Smith, Lehman, & Vogt, 1978). The Purdue Pegboard Test is a useful addition to a comprehensive battery but should not be used in isolation for screening purposes. Spreen and Strauss (1998) provide an excellent summary of norms for this widely used test.

Klove has developed a variation on the pegboard test in which the pegs have a ridge along one side (Klove, 1963). Because each peg must be rotated into position, the Grooved Pegboard requires complex coordination in addition to motor dexterity. The Grooved Pegboard test is an excellent instrument for assessing lateralized brain damage (Haaland & Delaney, 1981).

Finally, we should mention that useful motor tests need not require sophisticated equipment. Lezak (1995) recommends a line tracing task to assess difficulties in motor regulation (Figure 10.8). The examinee is given a brightly colored felt-tipped pen and a sheet of paper with several figures and told to draw over the lines as rapidly as possible. Difficulties with motor regulation show up in overshooting corners, perseveration of an ongoing response, and inability to follow the reduced curves in the bottom figure. Because this task is easily completed by
most 10-year-olds, any noticeable deviations are suggestive of difficulties in motor regulation.

Figure 10.8 A Typical Line-Tracing Task (Reduced Size)

10.2.9: Test Batteries in Neuropsychological Assessment

We remind the reader that the Halstead-Reitan Neuropsychological Battery (Reitan & Wolfson, 1993), discussed earlier, is a respected and widely used battery in neuropsychological assessment. Here we summarize competing approaches.

The Luria-Nebraska Neuropsychological Battery Now that we have completed a tour of some individual neuropsychological tests and procedures, it is time once again to remind the reader that many neuropsychologists prefer to use a fixed battery rather than an ever-shifting, individualized assortment of instruments. Certainly, one of the most widely used fixed batteries is the Luria-Nebraska Neuropsychological Battery (LNNB; Golden, 2004; Golden, Purish, & Hammeke, 1980, 1986), now in its third edition (LNNB-III; Teichner, Golden, Bradley, & Crum, 1999).

The test consists of 269 discrete items, chosen from the work of Luria (1966) and formally standardized. These items are scored 0, 1, or 2 according to precise criteria in the administration and scoring manual. Similar items are grouped into 11 clinical scales, C1 through C11 (Table 10.10). Raw scores on each scale are converted into T scores, with a mean of 50 and a standard deviation of 10. Higher scores reflect more psychopathology; scores above 70 are especially suggestive of brain impairment.

Three summary scales are also derived from test performance: S1 (Pathognomonic), S2 (Left Hemisphere), and S3 (Right Hemisphere). The Pathognomonic scale reflects the degree of compensation that has occurred since an injury, such as functional reorganization of the brain as well as actual physical recovery. Higher scores reflect less compensation. The Left Hemisphere and Right Hemisphere scales can be used to help determine whether an injury is diffuse or lateralized. A number of other scales and interpretive factors are also available (Golden, Purish, & Hammeke, 1986).

Table 10.10 Tests and Procedures of the Luria-Nebraska Neuropsychological Battery

Ability Scale: Tasks Included
C1 Motor: Coordination, speed, drawing, complex motor abilities
C2 Rhythm: Attend to, discriminate, and produce verbal and nonverbal rhythmic stimuli
C3 Tactile: Identify tactile stimuli, including stimuli traced on the wrists
C4 Visual: Identify drawings, including overlapping and unfocused objects; solve progressive matrices and other visuospatial skills
C5 Receptive Speech: Discriminate phonemes and comprehend words, phrases, sentences
C6 Expressive Speech: Articulate sounds, words, and sentences fluently; identify pictured or described objects
C7 Writing: Use motor writing abilities in general; copy and write from dictation
C8 Reading: Read letters, words, and sentences; synthesize letters into sounds and words
C9 Arithmetic: Complete simple mathematical computations; comprehend mathematical signs and number structure
C10 Memory: Remember verbal and nonverbal stimuli under both interference and noninterference conditions
C11 Intelligence: Reasoning, concept formation, and complex mathematical problem solving

We cannot review the voluminous literature on the LNNB, but brief mention of a few key studies certainly is merited. The reliability of the LNNB has been evaluated from the usual perspectives (split-half, internal consistency, and test–retest), with excellent results. For example, the mean test–retest reliability for the clinical scales was near .90 (Bach, Harowski, Kirby, Peterson, & Schulein, 1981; Plaisted & Golden, 1982; Teichner et al., 1999). In various validity studies of classification of brain-damaged persons versus other criterion groups, the LNNB has shown hit rates of 80 percent or better (Golden, Moses, Graber, & Berg, 1981; Hammeke, Golden, & Purish, 1978; Moses & Golden, 1979; Teichner et al., 1999).

In spite of the positive appraisals of the LNNB reported by Golden and his colleagues, some neuropsychologists remain skeptical of the test (e.g., Lezak, 1995). One concern is that the heterogeneity of the scales is so great that the individual scale scores do not quantify specific neuropsychological deficits but instead serve only to differentiate normal persons from brain-damaged patients (Snow, 1992; Van Gorp, 1992). Early reviewers also expressed concern that the speech scales were not oriented to syndromes of aphasia and could therefore misdiagnose language deficits (Delis & Kaplan, 1982). In defense of the LNNB, Purish (2001) contends that initial criticisms were based on misconceptions as to the theoretical basis for the instrument. Furthermore, in his view, these criticisms have been largely negated by an expanding body of empirical research supporting the test.

Yet, it is possible that the LNNB and its chief rival, the Halstead-Reitan Neuropsychological Battery, have reached
their peak of popularity and clinical utility (Davis, Johnson, and D’Amato, 2005). New batteries emerge every few years. A promising addition is the Neuropsychological Assessment Battery.

The Neuropsychological Assessment Battery (NAB) The Neuropsychological Assessment Battery or NAB (Stern & White, 2003a, 2003b) is a recent and promising entry in the field that is remarkable for its breadth and sophistication. Suitable for adults 18 to 97 years of age, the NAB is a comprehensive battery of 24 individual tests in five modular areas: attention, language, memory, spatial, and executive functions. Twelve of the subtests also can be used as a separate screening module. The instrument comes in two parallel and psychometrically equivalent versions, Form 1 and Form 2. Norms are based on data from 1,448 neurologically healthy individuals matching the U.S. population on educational level, gender, ethnicity, and geographic region.

The five major modules, each consisting of four to six subtests, are listed below:

ATTENTION
• Orientation*: Questions about orientation to self, time, place, and situation
• Digits Forward*: Repetition of orally presented digit sequences of increasing length
• Digits Backward*: Orally presented digit sequences recalled in reverse order
• Dots: Delayed recognition of the “new” dot in visual presentation of dots
• Numbers & Letters*: Timed tests of letter cancellation, letter counting, serial addition
• Driving Scenes: Recognition of what is “new” in presentation of a second driving scene

LANGUAGE
• Oral Production: Speech output when the examinee orally describes a picture
• Auditory Comprehension: Comprehension of orally presented commands and instructions
• Naming*: Ability to name a pictured object, with cues if necessary
• Reading Comprehension: Reading comprehension of single words and sentences
• Writing: Writing sample scored for delivery, legibility, syntax, spelling
• Bill Payment: Real-world task of writing a check to pay a utility bill

MEMORY
• List Learning: Verbal learning of 12-word list with interference trial
• Shape Learning*: Visual learning of 9 shapes with delayed recognition
• Story Learning*: Verbal learning of a short narrative story of five sentences
• Daily Living Memory: Verbal learning of medication instructions, address, phone number

SPATIAL
• Visual Discrimination: Matching of stimuli presented visually from an array
• Design Construction: Assembling a tangram design from individual pieces
• Figure Drawing: Drawing task involving copy and recall of geometric shapes
• Map Reading: Answering practical questions based on the map of a city

EXECUTIVE FUNCTIONS
• Mazes*: Solving paper-and-pencil mazes of increasing complexity
• Categories: Classifying and categorizing task based on photos of six people
• Word Generation*: Creating three-letter words from two vowels and six consonants
• Judgment: Answering practical questions about home safety and health

*Subtests used on the Screening Module.

Subtests used in the Screening Module are indicated with an asterisk. One feature evident in this list is that each module contains one subtest designed to possess ecological validity as well as psychometric validity. Ecological validity refers to the congruence between testing situations and analogous real-world circumstances. A test with strong ecological validity is one that highly resembles practical behaviors required in the real world. Among the NAB subtests with ecological validity are Driving Scenes, Bill Payment, Daily Living Memory, Map Reading, and Judgment. Each resembles a real-world situation of importance in daily life. Ecological validity is beneficial because it increases the acceptability of testing to examinees.

The modular nature of the NAB allows for fixed administration of the entire battery (which takes about three hours), or flexible administration of the Screening Module followed by full administration of one or more of the five modules, depending on screening results. Once the test has been administered, software is available to compute a large array of output scores in a highly user-friendly computerized report. The module scores are reported as standard scores (M = 100, SD = 15), whereas the subtest scores are rendered as T-scores (M = 50, SD = 10).

The reliability of test scores is highly variable across the different modules and subtests, and is influenced by the examinee’s age as well. The average coefficient alphas
for the subtests in the five major modules revealed the following ranges (Stern & White, 2003b):

Attention Module: .78 to .79
Language Module: .48 to .84
Memory Module: .47 to .86
Spatial Module: .65 to .67
Executive Functions Module: .45 to .77

Test–retest reliability was evaluated with 95 individuals who were tested twice over an average span of 6 months. Understandably, these average coefficients were somewhat lower and more variable:

Attention Module: .44 to .87
Language Module: .23 to .70
Memory Module: .41 to .61
Spatial Module: .13 to .68
Executive Functions Module: .43 to .64

These relationships between test and retest NAB scores are respectable, given the lengthy test–retest interval.
The validity of the NAB is difficult to summarize concisely, because of the complexity of the instrument. The authors provide extensive documentation on validity, as evaluated from the traditional perspectives, including content validity, factor-analytic evidence of construct validity, and convergent and divergent correlations with similar and dissimilar external measures (all supportive). The authors conclude:

Although the data presented in this chapter support the validity of the NAB, these data and analyses should be considered only the beginning steps in the ongoing process of test validation.
(Stern & White, 2003b, p. 141)

Temple and Zgaljardic (2009) provide independent evidence for the validity of the Screening Module of the NAB. They note strong associations with a measure of functional independence in a sample of 70 individuals with moderate-to-severe traumatic brain injury at a residential post-acute rehabilitation facility. Yet, Iverson, Williamson, Ropacki, and Reilly (2007) come down on the other side of the fence. In their study of 37 outpatients with neurological problems, results on the Screening Module were better than expected. In other words, in their sample the instrument did not show good sensitivity.
We need to keep in mind that the establishment of test validity is a dynamic process, not something set in stone when a test is released. The meaning of test scores is sharpened and refined by ongoing research. Several recent reports support the validity of the NAB. For example, in a study of 54 patients with TBI and 54 matched controls, Donders and Levitt (2012) found that the Attention, Executive Functions, and Memory modules were highly sensitive to brain impairment. Gavett et al. (2012) reported that the Daily Living memory subtest provided the greatest accuracy in identifying patients with Alzheimer’s Disease. It will prove interesting in the years ahead to see how additional studies bear on the validity of the NAB.

Baseline Testing With Brief Neuropsychological Test Batteries As with most human attributes, variability in neurocognitive abilities from one person to the next is substantial. Some people are quick with reaction times, strong in memory skills, and facile with mathematical processing; others innately possess lower levels of ability; and most of us are somewhere in between. Individual differences present a quandary in assessment, especially when the objective is to identify mild or subtle neuropsychological deficits such as those seen in mild traumatic brain injury (mTBI). When do low scores indicate mTBI and when do they signify a typical level of functioning? Access to baseline testing can prove invaluable in making this distinction. For at least two areas of assessment, the acquisition of baseline test data has become the expected practice.
One application of baseline testing is the Automated Neuropsychological Assessment Metrics (ANAM) Traumatic Brain Injury (TBI) Battery used in the armed forces. U.S. military troops deployed to war zones are administered the latest version, the ANAM4 TBI Battery, to obtain baseline neurocognitive performance levels. In situations where a soldier has been exposed to trauma such as an IED blast, retesting with the ANAM4 TBI Battery will help identify the presence of TBI, even if it is mild in severity. The battery was designed to minimize retesting effects by providing a nearly endless source of potential stimuli within each test module. Developed under the guidance of the U.S. Army, the battery is widely available and used in diverse settings worldwide.
The full ANAM4 consists of 22 assessments that can be grouped into flexible or standardized batteries. The subtests include measures of reaction time, learning, memory, mathematical processing, spatial processing, executive functions, and symptoms. Based on decades of study by dozens of neuropsychological and human performance researchers, the subtests are highly sensitive to the impact of brain injury, degenerative disease, toxin exposure, medication effects, and rehabilitation efforts. All modules are administered with a personal laptop computer. For the performance-based measures, stimuli are presented visually, and the left–right mouse buttons are used for the forced-choice options.
The ANAM4 TBI Battery consists of eight assessments that can be administered in about 20 minutes, making it highly feasible as a follow-up test when a soldier has been exposed to trauma such as an IED blast. The eight modules are listed in Table 10.12. The ANAM4 software generates a full report providing the examiner with the current neurocognitive status of the soldier, comparisons to previous testing sessions, and comparisons to selected reference and

norm groups. Researchers can transfer data in spreadsheet format to preferred statistical packages.

Table 10.12 Subtests of the ANAM4 TBI Battery
Sleepiness scale: A self-assessment of the soldier’s sleepiness/fatigue level on a 7-point scale from “very alert” to “very sleepy.”
Mood scale: A self-assessment of the user’s mood state in seven categories (Vigor, Happiness, Depression, Anger, Fatigue, Anxiety, and Restlessness). A number of adjectives related to these mood categories are rated on a 7-point scale.
Simple reaction time (SRT): The user clicks the left mouse button when an asterisk appears on the screen at random intervals. A measure of attention and reaction time.
Code substitution: A display of digits 1 through 9 appears in a row at the top of the screen with a different symbol above each digit. A series of 72 individual probes appears at the bottom of the screen, each showing a pairing of a digit and symbol. The soldier clicks the left or right mouse button to signify a match or non-match, respectively, with the static display at the top of the screen. A measure of visual search, sustained attention, and encoding.
Procedural reaction time: A series of single digits (2, 3, 4, or 5) is presented in 32 trials. The user clicks the left mouse button to indicate the digit is “low” (2 or 3) or the right mouse button to indicate the digit is “high” (4 or 5). A measure of processing efficiency and rule-following.
Mathematical processing: A series of single-digit arithmetic equations (e.g., 3 + 4 − 1) is presented in 20 trials. The user clicks the left mouse button to indicate the answer is less than 5 or the right mouse button if the answer is greater than 5. A measure of basic computational skills, concentration, and working memory.
Matching to sample: A series of 4 × 4 matrices with cells in a 2-color format is presented in 20 trials. Following each stimulus, a pair of slightly different 4 × 4 matrices appears side-by-side. The user clicks the left or right mouse button to indicate the correct match to the previous stimulus. A measure of spatial processing and visuo-spatial working memory.
Code substitution delayed: A series of 36 probes appears as in the previous code substitution test. The soldier responds in the same fashion, but must access memory of the static display, which is not shown again. A measure of delayed recall for visual stimuli.
Source: Based on Eonta, S. E., Carr, W., McArdle, J. J., and others (2011). Automated Neuropsychological Assessment Metrics: Repeated assessments with two military samples. Aviation, Space, and Environmental Medicine, 82, 34–39.

Normative data based on extraordinarily large samples are available for the ANAM4 TBI Battery. Vincent, Roebuck-Spencer, Gilleland, and Schlegel (2012) collected test data from over 107,500 active duty service members 17 to 65 years of age. The norms are carefully stratified by age and gender. The main criticism of ANAM4 is the lack of research on its effectiveness in identifying mTBI in soldiers (Kennedy & Moore, 2010). While it is clear that the individual subtests possess strong psychometric qualities, there is surprisingly little research on such matters as sensitivity and specificity of the overall battery in the identification of mTBI.
Another laptop-based neurocognitive battery is ImPACT (Immediate Post-Concussion Assessment and Cognitive Testing), developed in the 1990s by Mark Lovell and Joseph Maroon (Lovell, 2006; Lovell, Iverson, Collins, and others, 2006). ImPACT is intended for sports settings to help in making return-to-play decisions following concussions. The 20-minute battery is widely used in clinical management of concussions for athletes ages 10 through adulthood. The instrument is intended for use when baseline results are available for individual team members. ImPACT is a highly popular computer-based testing program that is used in high school, college, and professional sports programs. It should be given only by persons trained in its administration and interpretation. The test developers caution that the battery should never be used as a “stand-alone” device for diagnosis or decision-making.
ImPACT typically is administered from a laptop computer by an athletic trainer, school nurse, team doctor, or psychologist to help determine when a player is ready to return to the field after a possible concussion from a hard “hit” or other head trauma. The six modules are described in Table 10.13.

Table 10.13 The Six Modules from the ImPACT Test Battery

Dozens of published studies pertain to the reliability and validity of ImPACT. See impacttest.com for a listing of references. We will summarize here two studies on the sensitivity and specificity of test scores in predicting certain outcomes. The reader will recall that sensitivity refers to the percentage of respondents with a known condition who are correctly detected, whereas specificity refers to the percentage of respondents without the condition who are correctly designated. Lau, Collins, and Lovell (2011) followed 108 male high school football players who sustained a concussion and then divided the group into protracted recovery (14 or more days) before returning to play, and short recovery (less than 14 days) before returning to play. A combination of four symptom clusters and four ImPACT scores yielded a sensitivity of 65 percent and specificity of 80 percent. Schatz, Pardini, Lovell, Collins, and Podell (2006) tested 12 recently concussed athletes with ImPACT and compared the data to results for 66 high school athletes with no history of concussion. The best discriminant function analysis correctly classified 82 percent of participants in the concussion group (sensitivity) and 89 percent of participants in the control group (specificity). These two studies support the overall utility of ImPACT.
But the battery is not without its critics. ESPN contributor Peter Keating (2012) cites a concern about the high false positive rate, and notes the conflict of interest in which the test developers, who have published the vast majority of research on the battery, also are involved in marketing the battery for profit. Further, he notes that

… in practice, it’s hard for any neuropsychological test to get good data. Some athletes intentionally try to perform poorly on baselines so their post-injury tests won’t keep them out of play. Peyton Manning [Denver Broncos quarterback] admitted to this practice, which players call sandbagging, in April 2011 (ESPN The Magazine, “Concussion Test May Not Be Panacea,” August 26, 2012).

After reviewing the available research, Mayers and Redick (2012) conclude that the empirical evidence does not support the use of the battery for making return-to-play decisions. ImPACT likely serves a positive purpose by sensitizing players, coaches, and others to the dangers of repeated concussion. But as the test developers acknowledge, test results alone should never be the basis for important decisions like returning to play after head trauma.
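The sensitivity and specificity figures quoted for ImPACT follow from a two-by-two table that compares test decisions with confirmed status. A minimal Python sketch of that arithmetic is given here. The counts are hypothetical and chosen only for illustration (they are not taken from Lau et al., 2011, or Schatz et al., 2006), and the names `ScreeningCounts`, `sensitivity`, and `specificity` are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Two-by-two counts of screening decisions against confirmed status."""
    true_positives: int   # condition present, test flags it
    false_negatives: int  # condition present, test misses it
    true_negatives: int   # condition absent, test clears it
    false_positives: int  # condition absent, test flags it


def sensitivity(c: ScreeningCounts) -> float:
    """Proportion of people with the condition who are correctly detected."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ScreeningCounts) -> float:
    """Proportion of people without the condition who are correctly cleared."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


# Hypothetical counts, for illustration only (not from any cited study).
counts = ScreeningCounts(true_positives=40, false_negatives=10,
                         true_negatives=180, false_positives=20)

print(f"sensitivity = {sensitivity(counts):.0%}")  # 40 of 50 cases detected
print(f"specificity = {specificity(counts):.0%}")  # 180 of 200 non-cases cleared
```

Because the two indices are computed on different groups of people, a battery can clear most healthy athletes and still miss a sizable share of true concussions, which is why both values are reported.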
The stakes are high for athletes and their families. In the long term, repeated blows to the head are known to cause chronic traumatic encephalopathy (CTE), a degenerative brain disease associated with memory loss, confusion, aggression, impulse control problems, Parkinsonian symptoms (tremor, gait abnormalities, slurred speech), and, eventually, progressive dementia (Saulle & Greenwald, 2012). Even “minor” blows to the head that do not result in serious symptoms can lead to CTE if they occur with sufficient frequency, as in boxing or football (McKee, Cantu, Nowinski, and others, 2009). In a recent post-mortem analysis of brain tissue in 85 former football players, hockey players, and military veterans, McKee, Stein, Nowinski, and others (2012) concluded that “for some athletes and war fighters, there may be severe and devastating long-term consequences of repetitive brain trauma that has traditionally been considered only mild” (p. 20). As a society, we may want to reconsider the glamorization of contact sports like football, boxing, and hockey.

10.2.10: Screening for Alcohol Use Disorders
The ways in which people can abuse alcohol include a spectrum of misfortune and tragedy ranging from an occasional hangover to, literally, drinking oneself to death. But clinicians and researchers generally recognize two diagnoses: alcohol abuse and alcohol dependence (American Psychiatric Association, 2000). Loosely speaking, the more generic syndrome of alcoholism refers to either diagnosis. A full discussion of these syndromes is not justified here, but a brief summary is warranted. Interestingly, neither alcohol abuse nor dependence is defined by ingestion of a particular amount of alcohol, although substantial quantities typically are involved. The criteria for alcohol abuse refer to the functional impact of drinking on the life of the patient. In particular, if an individual meets one or more of four criteria, a diagnosis of alcohol abuse is defensible.
Different criteria to diagnose alcohol abuse:

In addition to meeting one or more of these criteria, the patient must not meet the criteria for a diagnosis of substance dependence, which often entails a more serious and chronic syndrome. Specifically, if a patient meets three or more of seven criteria, a diagnosis of alcohol dependence is warranted.

Source for Table 10.13: Based on descriptions from impacttest.com and Lovell (2006).

Different criteria to diagnose alcohol dependence:

Given the high prevalence of alcohol use disorders in the United States, it is nearly inevitable that psychologists and other clinicians will encounter patients who experience problems in this spectrum. Fortunately, there are several simple devices useful for screening and assessment, which we review here. In some cases, these tools are pristinely simple and consist of the clinician casually asking a handful of “yes–no” questions. In other cases, a more traditional paper-and-pencil questionnaire is needed.
The CAGE questionnaire is a short screening instrument that consists of the practitioner asking if the client has thought about Cutting down on drinking, become Annoyed by criticism of his or her drinking, felt Guilty about his or her drinking, or had an Eye-opener drink in the morning. A simple “yes–no” question pertinent to each symptom is asked as part of a general health history. The exact wording of this copyrighted instrument can be found in Ewing (1984). The endorsement of even a single item suggests the presence of an alcohol use disorder, whereas saying “yes” to two or more items virtually guarantees that the patient will meet the criteria for alcohol abuse or dependence. Research indicates that the tool is more effective when it is not preceded by questions about how much or how often the patient drinks (Steinweg & Worth, 1993). Apparently, questions about quantity and frequency trigger denial in the patient, making accurate assessment nearly impossible.
The CAGE questionnaire has proved valuable as a screening tool in numerous locations, including general psychological practice and medical settings. In one study of a “walk-in” or immediate-care Veterans hospital clinic, the test correctly identified 86 percent of patients later confirmed to have alcoholism and accurately ruled out 93 percent of patients later confirmed not to have alcohol problems. Astonishingly, the prevalence rate for alcoholism was determined to be 22 percent for this largely male clinic population (Liskow, Campbell, Nickel, & Powell, 1995).
A recent epidemiological study conducted in and around Paris, France, casts doubt on the usefulness of the CAGE test as a screening device for alcoholism (Messiah et al., 2007). In 2005, the researchers conducted a follow-up to a 1991 study of 1,991 participant responses to the Cut-down, Annoyed, Guilt, and Eye-opener (CAGE) questionnaire through telephone interview of 5,382 residents. The time period in question, 1991 to 2005, was an era in which alcohol consumption was known to be in decline, so it was surprising to the researchers when they found that the percentage of respondents endorsing each of the symptoms had increased substantially. In fact, the magnitude of the paradoxical increase astonished the researchers. For example, when asked whether they had thought about cutting down on their drinking, the percentage of respondents who answered “yes” increased from 4.3 percent in 1991 to 16.6 percent in 2005. The researchers speculate that the results might indicate the emergence of a temperance movement in France. Whether or not this is true, the findings most certainly cast doubt on the value of the CAGE in general population surveys.
Some researchers find that the CAGE questionnaire is more effective for screening with men than women (Cherpitel, 2002). In response to this shortcoming, a similar instrument called the TWEAK questionnaire was developed specifically for women. The acronym refers to Tolerance for drinking, Worried friends or relatives, Eye-opener to get going in the morning, Amnesia for things done or said while drinking, and feeling the need to Kut down on intake (Russell, Martier, Sokol, and others, 1994). TWEAK is scored on a 7-point scale, with the first two items earning two points each and the last three items earning one point each. A total score of two or more points indicates the likelihood of an alcohol problem. TWEAK is highly accurate in screening for alcohol problems in women (Bradley, Boyd-Wickizer, Powell, & Burman, 1998).
CAGE and TWEAK are by no means the only acronymic screening tools for alcohol problems. Other instruments include the five-item RAPS questionnaire or Rapid Alcohol Problems Screen (Cherpitel, 1995) and the 10-item AUDIT questionnaire or Alcohol Use Disorders Identification Test (Saunders, Aasland, Babor, and others, 1993). A huge amount of effort was invested in the development and validation of the AUDIT questionnaire. Research on this instrument was underwritten by the World Health Organization (WHO), and the scale has been translated into many languages.
Dozens of additional screening tests could be mentioned, but we want to close this section by reviewing an interesting scale that embodies some appealing methods of test construction. The Substance Abuse Subtle Screening Inventory-3 or SASSI-3 (Miller, Roberts, Brooks, & Lazowski, 1997) consists of two types of questions: obvious and subtle. The obvious questions include 26 behaviors that are endorsed on a 4-point Likert-type continuum ranging from never to repeatedly. These questions embody high face validity and are on a par with “I have taken drugs to improve how I feel” and “I have had more to drink than I planned.” The subtle questions consist of 67 true–false

items that are more indirect and indicative of the attitudes and behaviors that commonly accompany substance abuse. These questions are on par with “I probably break the law more than others” and “I tend to be a responsible person” [reverse scored]. Both types of items (obvious and subtle) were carefully validated during test construction.
Test construction consisted of administering a large group of preliminary items to three groups of individuals: substance abusers, non–substance abusers, and substance abusers instructed to “fake good.” The SASSI-3 emerged after this large pool of items was winnowed down to a smaller number, based on group contrasts. The resulting instrument includes the direct items (those that discriminated substance abusers from non–substance abusers) and the indirect items (those that discriminated the “fake-good” substance abusers from non–substance abusers). In addition to the adult scale, an adolescent version now has been published, and the instrument is available for supervised online administration. A Spanish version also is available.
The test developers report excellent reliability for the SASSI-3, with two-week test–retest stability coefficients for 40 respondents ranging from .92 to 1.00 for the subscales and coefficient alpha of .93 for the test overall. A validity study of 419 respondents revealed a 95 percent rate of correct classification for substance abusers and a 93 percent correct classification rate for non–substance abusers, very impressive results for a short screening test (Miller & Lazowski, 1999). Laux, Salyers, and Kotova (2005) found strong test–retest reliability with the SASSI-3 in a sample of 103 college students, reporting r = .94 over a one-week period. Feldstein and Miller (2007) reviewed 36 studies on all editions of the SASSI and weigh in skeptically, citing high rates of false positives. They propose that public domain instruments (e.g., CAGE, AUDIT) perform just as well and have the added advantage of being free.
The SASSI-3 appears to be a capable tool. Yet, given the frequency of its use (the instrument has been administered millions of times), it is disconcerting that few independent studies have been published (Gray, 2001). A search of PsycINFO yielded only 15 studies on the test, and the majority of these were unpublished doctoral dissertations. More research is needed to corroborate the value of this promising inventory.

Mini-Mental State Exam The most widely used mental status tool with the elderly is the Mini-Mental State Examination (MMSE), a 5- to 10-minute screening test that yields an objective global index of cognitive functioning (Folstein, Folstein, & McHugh, 1975; Tombaugh, McDowell, Kristjansson, & Hubley, 1996). The test contains 30 scorable items having to do with orientation, immediate memory, attention, calculation, language production, language comprehension, and design copying. The items are so easy that normal adults almost always obtain scores in the range of 27 to 30 points (Figure 10.9).

Figure 10.9 Scoring Weights and Domains of the Mini-Mental State Examination
5 Orientation to Time (day, date, month, season, and year)
5 Orientation to Place (floor, building, city area, city, state)
3 Immediate Memory (three words presented orally)
5 Attention and Calculation (serial 7s, five subtractions)
3 Delayed Recall (three words presented orally above)
2 Naming (pencil and watch)
1 Repetition (brief sentence presented orally)
3 Comprehension (follow simple three-part oral command)
1 Reading (read simple command and obey)
1 Writing (compose a simple sentence)
1 Drawing (reproduce two intersecting pentagons)
30 Total

The reliability of this simple instrument is excellent. Folstein et al. (1975) report a 24-hour test–retest reliability of .89 for 22 patients with varied depressive symptoms. Reliability over a 28-day period for 23 clinically stable patients with diagnoses of dementia, depression, and schizophrenia was an impressive .99. Normative data are available from several sources (e.g., Lindal & Stefansson, 1993; Tombaugh, McDowell, Kristjansson, & Hubley, 1996).
Using a cutting score of 23 or below as abnormal and 24 or above as normal, the MMSE is about 80 to 90 percent accurate in identifying elderly patients with suspected Alzheimer’s disease or other dementia. This cutting score produces few false positives (normal patients classified as having dementia). The sensitivity of the instrument depends on a number of factors, including the cutting score used, the educational level of the examinee, the extent of the dementia, the nature of the underlying pathology, and the type of setting in which assessments are undertaken (Anthony, LeResche, Niaz, Von Korff, & Folstein, 1982; Tombaugh, McDowell, Kristjansson, & Hubley, 1996; Tsai & Tsuang, 1979). In spite of its limitations, the MMSE remains the most reliable and practical screening test for dementia in the elderly (Ferris, 1992). Drebing, Van Gorp, Stuck, and others (1994) recommend its use as part of a short screening battery for cognitive decline in the elderly.
Research on the MMSE continues unabated. A search of PsycINFO for articles with “MMSE” in the title yielded 128 hits with 27 of them published since 2010. A final caution is worth mentioning. The MMSE has become so popular that some practitioners use MMSE total scores as a shortcut toward a diagnosis of dementia (Nieuwenhuis-Mark, 2010). Tests should never be used as a substitute for clinical judgment.
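The scoring weights in Figure 10.9 and the 23/24 cutting score can be sketched in a few lines of Python. The point values and the cutoff are those given in the text; the domain keys, the function names, and the example examinee are hypothetical and serve only as an illustration. In keeping with the caution about clinical judgment, the result is labeled a screening outcome and not a diagnosis.

```python
# Maximum points per domain, transcribed from Figure 10.9 (keys are illustrative).
MMSE_MAX_POINTS = {
    "orientation_time": 5,
    "orientation_place": 5,
    "immediate_memory": 3,
    "attention_calculation": 5,
    "delayed_recall": 3,
    "naming": 2,
    "repetition": 1,
    "comprehension": 3,
    "reading": 1,
    "writing": 1,
    "drawing": 1,
}

NORMAL_CUTOFF = 24  # 24 or above treated as normal, 23 or below as abnormal


def mmse_total(domain_scores):
    """Sum domain scores after checking each against its maximum."""
    if set(domain_scores) != set(MMSE_MAX_POINTS):
        raise ValueError("scores must cover exactly the eleven MMSE domains")
    for domain, points in domain_scores.items():
        if not 0 <= points <= MMSE_MAX_POINTS[domain]:
            raise ValueError(f"{domain}: {points} is outside the allowed range")
    return sum(domain_scores.values())


def screening_outcome(total):
    """Apply the cutting score described in the text."""
    return "normal range" if total >= NORMAL_CUTOFF else "below cutoff"


# Hypothetical examinee: full credit except on two domains.
examinee = dict(MMSE_MAX_POINTS, attention_calculation=3, delayed_recall=1)
total = mmse_total(examinee)
print(total, screening_outcome(total))
```

The maxima sum to 30, matching the total row of the figure, and the hypothetical examinee above loses four points in all.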
Chapter 11
Industrial, Occupational, and Career Assessment
Learning Objectives
11.1 Review the role of psychological tests in making decisions about hiring, placement, promotion, and evaluation of resources
11.2 State the challenges of vocational psychologists who provide career guidance and assessment

11.1: Industrial and Organizational Assessment

11.1 Review the role of psychological tests in making decisions about hiring, placement, promotion, and evaluation of resources

In this chapter we explore the specialized applications of testing within two distinctive environments: occupational settings and vocational settings. Although disparate in many respects, these two fields of assessment share essential features. For example, legal guidelines exert a powerful and constraining influence upon the practice of testing in both arenas. Moreover, issues of empirical validation of methods are especially pertinent in occupational and vocational areas of practice. In Topic 11A, Industrial and Organizational Assessment, we review the role of psychological tests in making decisions about personnel such as hiring, placement, promotion, and evaluation. In Topic 11B, Assessment for Career Development in a Global Economy, we analyze the unique challenges encountered by vocational psychologists who provide career guidance and assessment. Of course, relevant tests are surveyed and catalogued throughout. But more important, we focus upon the special issues and challenges encountered within these distinctive milieus.
Industrial and organizational psychology (I/O psychology) is the subspecialty of psychology that deals with behavior in work situations (Borman, Ilgen, Klimoski, & Weiner, 2003). In its broadest sense, I/O psychology includes diverse applications in business, advertising, and the military. For example, corporations typically consult I/O psychologists to help design and evaluate hiring procedures; businesses may ask I/O psychologists to appraise the effectiveness of advertising; and military leaders rely heavily upon I/O psychologists in the testing and placement of recruits. Psychological testing in the service of decision making about personnel is, thus, a prominent focus of this profession. Of course, specialists in I/O psychology possess broad skills and often handle many corporate responsibilities not previously mentioned. Nonetheless, there is no denying the centrality of assessment to their profession.
We begin our review of assessment in the occupational arena by surveying the role of testing in personnel selection. This is followed by a discussion of ways that psychological measurement is used in the appraisal of work performance.

11.1.1: The Role of Testing in Personnel Selection

Complexities of Personnel Selection Based upon the assumption that psychological tests and assessments can provide valuable information about potential job performance, many businesses, corporations, and military settings have used test scores and assessment results for personnel selection. As Guion (1998) has noted, I/O research on personnel selection has emphasized criterion-related validity as opposed to content or construct validity. These other approaches to validity are certainly relevant but usually take a back seat to criterion-related validity, which preaches that current assessment results must predict the future criterion of job performance.
From the standpoint of criterion-related validity, the logic of personnel selection is seductively simple. Whether in a large corporation or a small business, those who select employees should use tests or assessments that have documented, strong correlations with the criterion of job performance, and then hire the individuals who obtain the highest test scores or show the strongest assessment results. What could be simpler than that?
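The “seductively simple” logic amounts to two steps: document the correlation between test scores and later job performance, then hire from the top of the score distribution. A minimal Python sketch follows. All scores, ratings, applicant names, and function names are hypothetical, and the sketch deliberately ignores the psychometric and legal complications that make real selection far less simple.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_down_select(applicant_scores, n_openings):
    """Return the names of the highest-scoring applicants."""
    ranked = sorted(applicant_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n_openings]]


# Step 1: validity evidence from current employees (hypothetical data).
test_scores = [52, 61, 47, 70, 58, 66]
performance_ratings = [3.1, 3.8, 2.9, 4.4, 3.5, 3.9]
validity = pearson_r(test_scores, performance_ratings)
print(f"criterion-related validity r = {validity:.2f}")

# Step 2: top-down hiring among new applicants (hypothetical data).
applicants = {"Applicant A": 50, "Applicant B": 70, "Applicant C": 60}
print(top_down_select(applicants, n_openings=2))
```

The validity coefficient here is simply the correlation between predictor and criterion; the selection step assumes a single test score per applicant, which is the simplification the following paragraphs call into question.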

Unfortunately, the real-world application of employment selection procedures is fraught with psychometric complexities and legal pitfalls. The psychometric intricacies arise, in large measure, from the fact that job behavior is rarely simple, unidimensional behavior. There are some exceptions (such as assembly-line production) but the general rule in our postindustrial society is that job behavior is complex, multidimensional behavior. Even jobs that seem simple may be highly complex. For example, consider what is required for effective performance in the delivery of the U.S. mail. The individual who delivers your mail six days a week must do more than merely place it in your mailbox. He or she must accurately sort mail on the run, interpret and enforce government regulations about package size, manage pesky and even dangerous animals, recognize and avoid physical dangers, and exercise effective interpersonal skills in dealing with the public, to cite just a few of the complexities of this position.
Personnel selection is, therefore, a fuzzy, conditional, and uncertain task. Guion (1991) has highlighted the difficulty in predicting complex behavior from simple tests. For one thing, complex behavior is, in part, a function of the situation. This means that even an optimal selection approach may not be valid for all candidates. Quite clearly, personnel selection is not a simple matter of administering tests and consulting cutoff scores.
We must also acknowledge the profound impact of legal and regulatory edicts upon I/O testing practices. Given that such practices may have weighty consequences (determining who is hired or promoted, for example), it is not surprising to learn that I/O testing practices are rigorously constrained by legal precedents and regulatory mandates.

Approaches to Personnel Selection Acknowledging that the interview is a widely used form of personnel assessment, it is safe to conclude that psychological assessment is almost a universal practice in hiring decisions. Even by a narrow definition that includes only paper-and-pencil measures, at least two-thirds of the companies in the United States engage in personnel testing (Schmitt & Robertson, 1990). For purposes of personnel selection, the I/O psychologist may recommend one or more of the following:

• Autobiographical data
• Employment interview
• Cognitive ability tests
• Personality, temperament, and motivation tests
• Paper-and-pencil integrity tests
• Sensory, physical, and dexterity tests

11.1.2: Autobiographical Data

According to Owens (1976), application forms that request personal and work history as well as demographic data such as age and marital status have been used in industry since at least 1894. Objective or scorable autobiographical data, sometimes called biodata, are typically secured by means of a structured form variously referred to as a biographical information blank, biographical data form, application blank, interview guide, individual background survey, or similar device. Although the lay public may not recognize these devices as true tests with predictive power, I/O psychologists have known for some time that biodata furnish an exceptionally powerful basis for the prediction of employee performance (Cascio, 1976; Ghiselli, 1966; Hunter & Hunter, 1984). An important milestone in the biodata approach is the publication of the Biodata Handbook, a thorough survey of the use of biographical information in selection and the prediction of performance (Stokes, Mumford, & Owens, 1994).
The rationale for the biodata approach is that future work-related behavior can be predicted from past choices and accomplishments. Biodata have predictive power because certain character traits that are essential for success also are stable and enduring. The consistently ambitious youth with accolades and accomplishments in high school is likely to continue this pattern into adulthood. Thus, the job applicant who served as editor of the high school newspaper (and who answers a biodata item to this effect) is probably a better candidate for corporate management than the applicant who reports no extracurricular activities on a biodata form.

The Nature of Biodata Biodata items usually call for “factual” data; however, items that tap attitudes, feelings, and value judgments are sometimes included. Except for demographic data such as age and marital status, biodata items always refer to past accomplishments and events. Some examples of biodata items are listed in Table 11.1.
Once biodata are collected, the I/O psychologist must devise a means for predicting job performance from this information. The most common strategy is a form of empirical keying not unlike that used in personality testing. From a large sample of workers who are already hired, the I/O psychologist designates a successful group and an unsuccessful group, based on performance, tenure, salary, or supervisor ratings. Individual biodata items are then contrasted for these two groups to determine which items most accurately discriminate between successful and unsuccessful workers. Items that are strongly discriminative are assigned large weights in the scoring scheme.
• Work sample and situational tests
New applicants who respond to items in the keyed direc-
We turn now to a brief survey of typical tests and assess- tion, therefore, receive high scores on the biodata instru-
ment approaches within each of these categories. We close this ment and are predicted to succeed. Cross validation of the
topic with a discussion of legal issues in personnel testing. scoring scheme on a second sample of successful and
unsuccessful workers is a crucial step in guaranteeing the validity of the biodata selection method. Readers who wish to pursue the details of empirical scoring methods for biodata instruments should consult Murphy and Davidshofer (2004), Mount, Witt, and Barrick (2000), and Stokes and Cooper (2001).

Table 11.1 Examples of Biodata Questions
How long have you lived at your present address?
What is your highest educational degree?
How old were you when you obtained your first paying job?
How many books (not work related) did you read last month?
At what age did you get your driver’s license?
In high school, did you hold a class office?
How punctual are you in arriving at work?
What job do you think you will hold in 10 years?
How many hours do you watch television in a typical week?
Have you ever been fired from a job?
How many hours a week do you spend on hobbies?
How many job projects did you manage in the last year?
In college, did you participate in a sports team?
How many hours per month do you volunteer?
What is your attitude toward others who use marijuana?

The Validity of Biodata
The validity of biodata has been surveyed by several reviewers, with generally positive findings (Breaugh, 2009; Stokes et al., 1994; Stokes & Cooper, 2004). An early study by Cascio (1976) is typical of the findings. He used a very simple biodata instrument (a weighted combination of 10 application blank items) to predict turnover for female clerical personnel in a medium-sized insurance company. The cross-validated correlations between biodata score and length of tenure were .58 for minorities and .56 for nonminorities.¹ Drakeley et al. (1988) compared biodata and cognitive ability tests as predictors of training success. Biodata scores possessed the same predictive validity as the cognitive tests. Furthermore, when added to the regression equation, the biodata information improved the predictive accuracy of the cognitive tests.

¹ The curious reader may wish to know which 10 biodata items could possess such predictive power. The items were age, marital status, children’s age, education, tenure on previous job, previous salary, friend or relative in company, location of residence, home ownership, and length of time at present address. Unfortunately, Cascio (1976) does not reveal the relative weights or direction of scoring for the items.

In an extensive research survey, Reilly and Chao (1982) compared eight selection procedures as to validity and adverse impact on minorities. The procedures were biodata, peer evaluation, interviews, self-assessments, reference checks, academic achievement, expert judgment, and projective techniques. Noting that properly standardized ability tests provide the fairest and most valid selection procedure, Reilly and Chao (1982) concluded that only biodata and peer evaluations had validities substantially equal to those of standardized tests. For example, in the prediction of sales productivity, the average validity coefficient of biodata was a very healthy .62.

Certain cautions need to be mentioned with respect to biodata approaches in personnel selection. Employers may be prohibited by law from asking questions about age, race, sex, religion, and other personal issues, even when such biodata can be shown empirically to predict job performance. Also, even though the incidence of faking is very low, there is no doubt that shrewd respondents can falsify results in a favorable direction. For example, Schmitt and Kunce (2002) addressed the concern that some examinees might distort their answers to biodata items in a socially desirable direction. These researchers compared the scores obtained when examinees were asked to elaborate their biodata responses versus when they were not. Requiring elaborated answers reduced the scores on biodata items; that is, it appears that respondents were more truthful when asked to provide corroborating details to their written responses.

Recently, Levashina, Morgeson, and Campion (2012) demonstrated the same point in a large-scale, high-stakes selection project with 16,304 applicants for employment. Biodata constituted a significant portion of the selection procedure. The researchers used the response elaboration technique (RET), which obliges job applicants to provide written elaborations of their responses. Perhaps an example will help. A naked, unadorned biodata question might ask:
• How many times in the last 12 months did you develop novel solutions to a work problem in your area of responsibility?
Most likely, a higher number would indicate greater creativity and empirically predict superior work productivity. The score on this item would be combined with others to produce an overall biodata score used in personnel selection. But notice that nothing prevents the respondent from exaggeration or outright lying. Now, consider the original question with the addition of response elaboration:
• How many times in the last 12 months did you develop novel solutions to a work problem in your area of responsibility?
• For each circumstance, please provide specific details as to the problem and your solution.

Levashina et al. (2012) found that using the RET produced more honest and realistic biodata scores. Further, for those items possessing the potential for external verification, responses were even more realistic. The
researchers conclude that RET decreases faking because it increases accountability.

As with any measurement instrument, biodata items will need periodic restandardization. Finally, a potential drawback to the biodata approach is that, by its nature, this method captures the organizational status quo and might, therefore, squelch innovation. Becker and Colquitt (1992) discuss precautions in the development of biodata forms.

The use of biodata in personnel selection appears to be on the rise. Some corporations rely on biodata almost to the exclusion of other approaches in screening applicants. The software giant Google is a case in point. In years past, the company used traditional methods such as hiring candidates from top schools who earned the best grades. But that tactic now is used rarely in industry. Instead, many corporations like Google are moving toward automated systems that collect biodata from the many thousands of applicants processed each year. Using online surveys, these companies ask applicants to provide personal details about accomplishments, attitudes, and behaviors as far back as high school. Questions can be quite detailed, such as whether the applicant has ever published a book, received a patent, or started a club. Formulas are then used to compute a score from 0 to 100, designed to predict the degree of fit with corporate culture (Ottinger & Kurzon, 2007). The system works well for Google, which claims to have only a 4 percent turnover rate.

There is little doubt, then, that purely objective biodata information can predict aspects of job performance with fair accuracy. However, employers are perhaps more likely to rely upon subjective information such as interview impressions when making decisions about hiring. We turn now to research on the validity of the employment interview in the selection process.

11.1.3: The Employment Interview
The employment interview is usually only one part of the evaluation process, but many administrators regard it as the vital make-or-break component of hiring. It is not unusual for companies to interview from 5 to 20 individuals for each person hired! Considering the importance of the interview and its huge costs to industry and the professions, it is not surprising to learn that thousands of studies address the reliability and validity of the interview. We can only highlight a few trends here; more detailed reviews can be found in Conway, Jako, and Goodman (1995), Huffcutt (2007), Guion (1998), and Schmidt and Zimmerman (2004).

Early studies of interview reliability were quite sobering. In various studies and reviews, reliability was typically assessed by correlating evaluations of different interviewers who had access to the same job candidates (Wagner, 1949; Ulrich & Trumbo, 1965). The interrater reliability from dozens of these early studies was typically in the mid-.50s, much too low to provide accurate assessments of job candidates. This research also revealed that interviewers were prone to halo bias and other distorting influences upon their perceptions of candidates. Halo bias (discussed in the next topic) is the tendency to rate a candidate high or low on all dimensions because of a global impression.

Later, researchers discovered that interview reliability could be increased substantially if the interview was jointly conducted by a panel instead of a single interviewer (Landy, 1996). In addition, structured interviews in which each candidate was asked the same questions by each interviewer also proved to be much more reliable than unstructured interviews (Borman, Hanson, & Hedge, 1997; Campion, Pursell, & Brown, 1988). In these studies, reliabilities in the .70s and higher were found.

Research on validity of the interview has followed the same evolutionary course noted for reliability: Early research that examined unstructured interviews was quite pessimistic, while later research using structured approaches produced more promising findings. In these studies, interview validity was typically assessed by correlating interview judgments with some measure of on-the-job performance. Early studies of interview validity yielded almost uniformly dismal results, with typical validity coefficients hovering in the mid-.20s (Arvey & Campion, 1982).

Mindful that interviews are seldom used in isolation, early researchers also investigated incremental validity, which is the potential increase in validity when the interview is used in conjunction with other information. These studies were predicated on the optimistic assumption that the interview would contribute positively to candidate evaluation when used alongside objective test scores and background data. Unfortunately, the initial findings were almost entirely unsupportive (Landy, 1996).

In some instances, attempts to prove incremental validity of the interview demonstrated just the opposite, what might be called decremental validity. For example, Kelly and Fiske (1951) established that interview information actually decreased the validity of graduate student evaluations. In this early and classic study, the task was to predict the academic performance of more than 500 graduate students in psychology. Various combinations of credentials (a form of biodata), objective test scores, and interview were used as the basis for clinical predictions of academic performance. The validity coefficients are reported in Table 11.2. The reader will notice that credentials alone provided a much better basis for prediction than credentials plus a one-hour interview. The best predictions were based upon credentials and objective test scores; adding a two-hour interview to this information actually decreased the accuracy of predictions. These findings
highlighted the superiority of actuarial prediction (based on empirically derived formulas) over clinical prediction (based on subjective impressions). We pursue the actuarial versus clinical debate in the last chapter of this text.

Table 11.2 Validity Coefficients for Ratings Based on Various Combinations of Information
Basis for Rating                                    Correlation with Academic Performance
Credentials alone                                   .26
Credentials and one-hour interview                  .13
Credentials and objective test scores               .36
Credentials, test scores, and two-hour interview    .32
Source: Based on data in Kelly, E. L., & Fiske, D. W. (1951). The prediction of performance in clinical psychology. Ann Arbor: University of Michigan Press.

Studies using carefully structured interviews, including situational interviews, provide a more positive picture of interview validity (Borman, Hanson, & Hedge, 1997; Maurer & Fay, 1988; Schmitt & Robertson, 1990). When the findings are corrected for restriction of range and unreliability of job performance ratings, the mean validity coefficient for structured interviews turns out to be an impressive .63 (Wiesner & Cronshaw, 1988). A meta-analysis by Conway, Jako, and Goodman (1995) concluded that the upper limit for the validity coefficient of structured interviews was .67, whereas for unstructured interviews the validity coefficient was only .34. Additional reasons for preferring structured interviews include their legal defensibility in the event of litigation (Williamson et al., 1997) and, surprisingly, their minimal bias across different racial groups of applicants (Huffcutt & Roth, 1998).

In order to reach acceptable levels of reliability and validity, structured interviews must be designed with painstaking care. Consider the protocol used by Motowidlo et al. (1992) in their research on structured interviews for management and marketing positions in eight telecommunications companies. Their interview format was based upon a careful analysis of critical incidents in marketing and management. Prospective employees were asked a set of standard questions about how they had handled past situations similar to these critical incidents. Interviewers were trained to ask discretionary probing questions for details about how the applicants handled these situations. Throughout, the interviewers took copious notes. Applicants were then rated on scales anchored with behavioral illustrations. Finally, these ratings were combined to yield a total interview score used in selection decisions.

In summary, under carefully designed conditions, the interview can provide a reliable and valid basis for personnel selection. However, as noted by Schmitt and Robertson (1990), the prerequisite conditions for interview validity are not always available. Guion (1998) has expressed the same point:

A large body of research on interviewing has, in my opinion, given too little practical information about how to structure an interview, how to conduct it, and how to use it as an assessment device. I think I know from the research that (a) interviews can be valid, (b) for validity they require structuring and standardization, (c) that structure, like many other things, can be carried too far, (d) that without carefully planned structure (and maybe even with it) interviewers talk too much, and (e) that the interviews made routinely in nearly every organization could be vastly improved if interviewers were aware of and used these conclusions. There is more to be learned and applied. (p. 624)

The essential problem is that each interviewer may evaluate only a small number of applicants, so that standardization of interviewer ratings is not always realistic. While the interview is potentially valid as a selection technique, in its common, unstructured application there is probably substantial reason for concern.

Why are interviews used? If the typical, unstructured interview is so unreliable and ineffectual a basis for job candidate evaluation, why do administrators continue to value interviews so highly? In their review of the employment interview, Arvey and Campion (1982) outline several reasons for the persistence of the interview, including practical considerations such as the need to sell the candidate on the job, and social reasons such as the susceptibility of interviewers to the illusion of personal validity. Others have emphasized the importance of the interview for assessing a good fit between applicant and organization (Adams, Elacqua, & Colarelli, 1994; Latham & Skarlicki, 1995).

It is difficult to imagine that most employers would ever entirely eliminate the interview from the screening and selection process. After all, the interview does serve the simple human need of meeting the persons who might be hired. However, based on 50 years' worth of research, it is evident that biodata and objective tests often provide a more powerful basis for candidate evaluation and selection than unstructured interviews.

One interview component that has received recent attention is the impact of the handshake on subsequent ratings of job candidates. Stewart, Dustin, Barrick, and Darnold (2008) used simulated hiring interviews to investigate the commonly held conviction that a firm handshake bears a critical nonverbal influence on impressions formed during the employment interview. Briefly, 98 undergraduates underwent realistic job interviews during which their handshakes were surreptitiously rated on 5-point scales for grip strength, completeness, duration, and vigor; degree of eye contact during the handshake also was rated. Independent ratings were completed at different times by five individuals involved in the process. Real human-resources
professionals conducted the interviews and then offered simulated hiring recommendations. The professionals shook hands with the candidates but were not asked to provide handshake ratings because this would have cued them to the purposes of the study. This is the barest outline of this complex investigation. The big picture that emerged was that the quality of the handshake was positively related to hiring recommendations. Further, women benefited more than men from a strong handshake. The researchers conclude their study with these thoughts:

The handshake is thought to have originated in medieval Europe as a way for kings and knights to show that they did not intend to harm each other and possessed no concealed weapons (Hall & Hall, 1983). The results presented in this study show that this age-old social custom has an important place in modern business interactions. Although the handshake may appear to be a business formality, it can indeed communicate critical information and influence interviewer assessments. (p. 1145)

Perhaps this study will provide an impetus for additional investigation of this important component of the job interview.

Barrick, Swider, and Stewart (2010) make the general case that initial impressions formed in the first few seconds or minutes of the employment interview significantly influence the final outcomes. They cite the social psychology literature to argue that initial impressions are nearly instinctual and based on evolutionary mechanisms that aid survival. Handshake, smile, grooming, manner of dress: the interviewer gauges these as favorable (or not) almost instantaneously. The purpose of their study was to examine whether these “fast and frugal” judgments, formed in the first few seconds or minutes even before the “real” interview begins, affect interview outcomes. Participants for their research were 189 undergraduate students in a program for professional accountants. The students were pre-interviewed for just 2–3 minutes by trained graduate students for purposes of rapport building, before a more thorough structured mock interview was conducted. After the brief pre-interview, the graduate interviewers filled out a short rating scale on liking for the candidate, the candidate’s competence, and perceived “personal” similarity. The interviewers then conducted a full structured interview and filled out ratings. Weeks after these mock interviews, participants engaged in real interviews with four major accounting firms (Deloitte Touche Tohmatsu, Ernst & Young, KPMG, and PricewaterhouseCoopers) to determine whether they would receive an offer of an internship. Just over half of the students received an offer. Candidates who made better first impressions during the initial pre-interview (that lasted just 2–3 minutes) received more internship offers (r = .22) and higher interviewer ratings (r = .42). In sum, initial impressions in the employment interview do matter.

11.1.4: Cognitive Ability Tests
Cognitive ability can refer either to a general construct akin to intelligence or to a variety of specific constructs such as verbal skills, numerical ability, spatial perception, or perceptual speed (Kline, 1999). Tests of general cognitive ability and measures of specific cognitive skills have many applications in personnel selection, evaluation, and screening. Such tests are quick, inexpensive, and easy to interpret.

A vast body of empirical research offers strong support for the validity of standardized cognitive ability tests in personnel selection. For example, Bertua, Anderson, and Salgado (2005) conducted a meta-analysis of 283 independent employee samples in the United Kingdom. They found that general mental ability as well as specific ability tests (verbal, numerical, perceptual, and spatial) are valid predictors of job performance and training success, with validity coefficients on the order of .5 to .6. Surveying a large number of studies and employment settings, Kuncel and Hezlett (2010) summarized correlations between cognitive ability and seven measures of work performance as follows:

Job performance, high complexity: .58
Job performance, medium complexity: .52
Job performance, low complexity: .40
Training success, civilian: .55
Training success, military: .62
Objective leader effectiveness: .33
Creativity: .37

Beyond a doubt, there is merit in the use of cognitive ability tests for personnel selection.

Even so, a significant concern with the use of cognitive ability tests for personnel selection is that these instruments may result in an adverse impact on minority groups. Adverse impact is a legal term (discussed later in this chapter) referring to the disproportionate selection of white candidates over minority candidates. Most authorities in personnel psychology recognize that cognitive tests play an essential role in applicant selection; nonetheless, these experts also affirm that cognitive tests provide maximum benefit (and minimum adverse impact) when combined with other approaches such as biodata. Selection decisions never should be made exclusively on the basis of cognitive test results (Robertson & Smith, 2001).

An ongoing debate within I/O psychology is whether employment testing is best accomplished with highly specific ability tests or with measures of general cognitive ability. The weight of the evidence seems to support the conclusion that a general factor of intelligence (the so-called g factor) is usually a better predictor
of training and job success than are scores on specific cognitive measures, even when several specific cognitive measures are used in combination. Of course, this conclusion runs counter to common sense and anecdotal evidence. For example, Kline (1993) offers the following vignette:

The point is that the g factors are important but so also are these other factors. For example, high g is necessary to be a good engineer and to be a good journalist. However for the former high spatial ability is also required, a factor which confers little advantage on a journalist. For her or him, however, high verbal ability is obviously useful.

Curiously, empirical research provides only mixed support for this position (Gottfredson, 1986; Larson & Wolfe, 1995; Ree, Earles, & Teachout, 1994). Although the topic continues to be debated, most studies support the primacy of g in personnel selection (Borman et al., 1997; Schmidt, 2002). Perhaps the reason that g usually works better than specific cognitive factors in predicting job performance is that most jobs are factorially complex in their requirements, stereotypes notwithstanding (Guion, 1998). For example, the successful engineer must explain his or her ideas to others and so needs verbal ability as well as spatial skills. Since measures of general cognitive ability tap many specific cognitive skills, a general test often predicts performance in complex jobs as well as, or better than, measures of specific skills.

Literally hundreds of cognitive ability tests are available for personnel selection, so it is not feasible to survey the entire range of instruments here. Instead, we will highlight three representative tests: one that measures general cognitive ability, a second that is germane to assessment of mechanical abilities, and a third that taps a highly specific facet of clerical work. The three instruments chosen for review (the Wonderlic Personnel Test-Revised, the Bennett Mechanical Comprehension Test, and the Minnesota Clerical Test) are merely exemplars of the hundreds of cognitive ability tests available for personnel selection. All three tests are often used in business settings and, therefore, worthy of specific mention. Representative cognitive ability tests encountered in personnel selection are listed in Table 11.3. Some classic viewpoints on cognitive ability testing for personnel selection are found in Ghiselli (1966), Hunter and Hunter (1984), and Reilly and Chao (1982). More recent discussion of this issue is provided by Borman et al. (1997), Guion (1998), and Schmidt (2002).

Table 11.3 Representative Cognitive Ability Tests Used in Personnel Selection

Wonderlic Personnel Test-Revised
Even though it is described as a personnel test, the Wonderlic Personnel Test-Revised (WPT-R) is really a group test of general mental ability (Hunter, 1989; Wonderlic, 1983). The revised version was released in 2007 and is now named the Wonderlic Contemporary Cognitive Ability Test. We refer to it as the WPT-R here. What makes this
instrument somewhat of an institution in personnel testing is its format (50 multiple-choice items), its brevity (a 12-minute time limit), and its numerous parallel forms (16 at last count). Item types on the Wonderlic are quite varied and include vocabulary, sentence rearrangement, arithmetic problem solving, logical induction, and interpretation of proverbs. The following items capture the flavor of the Wonderlic:

1. REGRESS is the opposite of
   a. ingest
   b. advance
   c. close
   d. open
2. Two men buy a car which costs $550; X pays $50 more than Y. How much did X pay?
   a. $500
   b. $300
   c. $400
   d. $275
3. HEFT CLEFT: Do these words have
   a. similar meaning
   b. opposite meaning
   c. neither similar nor opposite meaning

The reliability of the WPT-R is quite impressive, especially considering the brevity of the instrument. Internal consistency reliabilities typically reach .90, while alternative-form reliabilities usually exceed .90. Normative data are available on hundreds of thousands of adults and hundreds of occupations. Regarding validity, if the
WPT-R is considered a brief test of general mental ability, and the DAT Mechanical Reasoning subtest was an impres-
the findings are quite positive (Dodrill & Warner, 1988). sive .80. An intriguing finding is that the test proved to be
For example, Dodrill (1981) reports a correlation of .91 one of the best predictors of pilot success during World
between scores on the original WPT and scores on the War II (Ghiselli, 1966).
WAIS. This correlation is as high as that found between In spite of its psychometric excellence, the BMCT is
any two mainstream tests of general intelligence. Bell, in need of modernization. The test looks old and many
Matthews, Lassister, and Leverett (2002) reported strong items are dated. By contemporary standards, some BMCT
congruence between the WPT and the Kaufman Adoles- items are sexist or potentially offensive to minorities
cent and Adult Intelligence Test in a sample of adults. (Wing, 1992). The problem with dated and offensive test
Hawkins, Faraone, Pepple, Seidman, and Tsuang (1990) items is that they can subtly bias test scores. Moderniza-
report a similar correlation (r = .92) between WPT and tion of the BMCT would be a straightforward project that
WAIS-R IQ for 18 chronically ill psychiatric patients. could increase the acceptability of the test to women and
However, in their study, one subject was unable to man- minorities while simultaneously preserving its psycho-
age the format of the WPT, suggesting that severe visuos- metric excellence.
patial impairment can invalidate the test.
Another concern about the Wonderlic is that examinees whose native language is not English will be unfairly penalized on the test (Belcher, 1992). The Wonderlic is a speeded test. In fact, it has such a heavy reliance on speed that points are added for subjects aged 30 and older to compensate for the well-known decrement in speed that accompanies normal aging. However, no accommodation is made for nonnative English speakers who might also perform more slowly. One solution to the various issues of fairness cited would be to provide norms for untimed performance on the Wonderlic. However, the publishers have resisted this suggestion.

Bennett Mechanical Comprehension Test. In many trades and occupations, the understanding of mechanical principles is a prerequisite to successful performance. Automotive mechanics, plumbers, mechanical engineers, trade school applicants, and persons in many other “hands-on” vocations need to comprehend basic mechanical principles in order to succeed in their fields. In these cases, a useful instrument for occupational testing is the Bennett Mechanical Comprehension Test (BMCT). This test consists of pictures about which the examinee must answer straightforward questions. The situations depicted emphasize basic mechanical principles that might be encountered in everyday life. For example, a series of belts and flywheels might be depicted, and the examinee would be asked to discern the relative revolutions per minute of two flywheels. The test includes two equivalent forms (S and T).

The BMCT has been widely used since World War II for military and civilian testing, so an extensive body of technical and validity data exists for this instrument. Split-half reliability coefficients range from the .80s to the low .90s. Comprehensive normative data are provided for several groups. Based on a huge body of earlier research, the concurrent and predictive validity of the BMCT appear to be well established (Wing, 1992). For example, in one study with 175 employees, the correlation between the BMCT

Minnesota Clerical Test. The Minnesota Clerical Test (MCT), which purports to measure perceptual speed and accuracy relevant to clerical work, has remained essentially unchanged in format since its introduction in 1931, although the norms have undergone several revisions, most recently in 1979 (Andrew, Peterson, & Longstaff, 1979). The MCT is divided into two subtests: Number Comparison and Name Comparison. Each subtest consists of 100 identical and 100 dissimilar pairs of digit or letter combinations (Table 11.4). The dissimilar pairs generally differ in regard to only one digit or letter, so the comparison task is challenging. The examinee is required to check only the identical pairs, which are randomly intermixed with dissimilar pairs. The score depends predominantly upon speed, although the examinee is penalized for incorrect items (errors are subtracted from the number of correct items).

Table 11.4 Items Similar to Those Found on the Minnesota Clerical Test
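The rights-minus-errors scoring rule described for the Minnesota Clerical Test can be illustrated with a minimal sketch. The pairs below are hypothetical stand-ins rather than actual MCT items, and the function name `mct_style_score` is invented for illustration. The sketch also assumes that an identical pair left unchecked simply earns no credit, a detail the text does not specify.

```python
def mct_style_score(pairs, checked):
    """Score a comparison subtest as correct checks minus erroneous checks.

    pairs   -- list of (left, right) strings shown to the examinee
    checked -- set of indices the examinee marked as identical
    """
    # A check is correct only when the two members of the pair really match.
    correct = sum(1 for i in checked if pairs[i][0] == pairs[i][1])
    # Every other check was placed on a dissimilar pair and counts as an error.
    errors = len(checked) - correct
    return correct - errors


# Hypothetical items (not taken from the test itself).
pairs = [
    ("66273894", "66273984"),                   # dissimilar: digits transposed
    ("527384578", "527384578"),                 # identical
    ("New York World", "New York World"),       # identical
    ("Cargill Grain Co.", "Cargil Grain Co."),  # dissimilar: one letter dropped
    ("4820163", "4820163"),                     # identical
]

# The examinee checks items 1, 2, and 3: two correct checks and one error.
score = mct_style_score(pairs, {1, 2, 3})
print(score)
```

Because only checked items enter the score, a fast but careless examinee can lose much of the credit gained through speed, which is the point of the penalty described above.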
Industrial, Occupational, and Career Assessment 323

The reliability of the MCT is acceptable, with reported stability coefficients in the range of .81 to .87 (Andrew, Peterson, & Longstaff, 1979). The manual also reports a wealth of validity data, including some findings that are not altogether flattering. In these studies, the MCT was correlated with measures of job performance, measures of training outcome, and scores from related tests. The job performance of directory assistants, clerks, clerk-typists, and bank tellers was correlated significantly but not robustly with scores on the MCT. The MCT is also highly correlated with other tests of clerical ability.

Nonetheless, questions still remain about the validity and applicability of the MCT. Ryan (1985) notes that the manual lacks a discussion of the significant versus the nonsignificant validity studies. In addition, the MCT authors fail to provide detailed information concerning the specific attributes of the jobs, tests, and courses used as criterion measures in the reported validity studies. For this reason, it is difficult to surmise exactly what the MCT measures. Ryan (1985) complains that the 1979 norms are difficult to use because the MCT authors provide so little information on how the various norm groups were constituted. Thus, even though the revised MCT manual presents new norms for 10 vocational categories, the test user may not be sure which norm group applies to his or her setting. Because of the marked differences in performance between the norm groups, the vagueness of definition poses a significant problem to potential users of this test.

11.1.5: Personality Tests

It is only in recent years, with the emergence of the “big five” approach to the measurement of personality and the development of strong measures of these five factors, that personality has proved to be a valid basis for employee selection, at least in some instances. In earlier times, from the 1950s into the 1990s, personality tests were used by many in a reckless manner for personnel selection:

Personality inventories such as the MMPI were used for many years for personnel selection, in fact, overused or misused. They were used indiscriminately to assess a candidate’s personality, even when there was no established relation between test scores and job success. Soon personality inventories came under attack.
(Muchinsky, 1990)

In effect, for many of these earlier uses of testing, a consultant psychologist or human resource manager would look at the personality test results of a candidate and implicitly (or explicitly) make an argument along these lines: “In my judgment people with test results like this are [or are not] a good fit for this kind of position.” Sadly, there was little or no empirical support for such imperious conclusions, which basically amounted to a version of “because I said so.”

Certainly early research on personality and job performance was rather sobering for many personality scales and constructs. For example, Hough, Eaton, Dunnette, Kamp, and McCloy (1990) analyzed hundreds of published studies on the relationship between personality constructs and various job performance criteria. For these studies, they grouped the personality constructs into several categories (e.g., Extroversion, Affiliation, Adjustment, Agreeableness, and Dependability) and then computed the average validity coefficient for criteria of job performance (e.g., involvement, proficiency, delinquency, and substance abuse). Most of the average correlations were indistinguishable from zero! For job proficiency as the outcome criterion, the strongest relationships were found for measures of Adjustment and Dependability, both of which revealed correlations of r = .13 with general ratings of job proficiency. Even though statistically significant (because of the large number of clients amassed in the hundreds of studies), correlations of this magnitude are essentially useless, accounting for less than 2 percent of the variance.² Specific job criteria such as delinquency (e.g., neglect of work duties) and substance abuse were better predicted in specific instances. For example, measures of Adjustment correlated r = −.43 with delinquency, and measures of Dependability correlated r = −.28 with substance abuse. Of course, the negative correlations indicate an inverse relationship: higher scores on Adjustment go along with lower levels of delinquency, and higher scores on Dependability indicate lower levels of substance abuse. Apparently, it is easier to predict specific job-related criteria than to predict general job proficiency.

Beginning in the 1990s, a renewed optimism about the utility of personality tests in personnel selection began to emerge (Behling, 1998; Hurtz & Donovan, 2000). The reason for this change in perspective was the emergence of the Big Five framework for research on selection, and the development of robust measures of the five constructs confirmed by this approach such as the NEO Personality Inventory-Revised (Costa & McCrae, 1992). Evidence began to mount that personality, as conceptualized by the Big Five approach, possessed some utility for employee selection. The reader will recall from an earlier chapter that the five dimensions of this model are Neuroticism, Extraversion, Openness to Experience, Conscientiousness, and Agreeableness. Shuffling the first letters, the acronym OCEAN can be used to remember the elements. In place of Neuroticism (which

² The strength of a correlation is indexed by squaring it, which provides the proportion of variance accounted for in one variable by knowing the value of the other variable. In this case, the square of .13 is .0169, which is 1.69 percent.
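The arithmetic in footnote 2 can be checked with a few lines of code. The sketch below is illustrative only: the function name `variance_explained` is an assumption introduced here, and the three correlations are those attributed in the text to Hough et al. (1990).

```python
def variance_explained(r):
    """Proportion of variance in one variable accounted for by the other."""
    return r ** 2


# Correlations reported in the text (Hough et al., 1990).
reported = {
    "Adjustment vs. job proficiency": 0.13,
    "Adjustment vs. delinquency": -0.43,
    "Dependability vs. substance abuse": -0.28,
}

for label, r in reported.items():
    # Squaring removes the sign, so direction does not affect the proportion.
    print(f"{label}: r = {r:+.2f}, variance explained = {variance_explained(r):.2%}")
```

Squaring makes plain why a correlation of .13 is of little practical use (under 2 percent of the variance), whereas the correlation of −.43 with delinquency accounts for roughly 18 percent.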
324 Chapter 11

pertains to the negative pole of this factor), some researchers use the term Emotional Stability (which describes the positive pole of the same factor) so as to achieve consistency of positive orientation among the five factors.

A meta-analysis by Hurtz and Donovan (2000) solidified Big Five personality factors as important tools in predicting job performance. These researchers located 45 studies using suitable measures of Big Five personality factors as predictors of job performance. In total, their data set was based on more than eight thousand employees, providing stable and robust findings, even though not all dimensions were measured in all studies. The researchers conducted multiple analyses involving different occupational categories and diverse outcome measures such as task performance, job dedication, and interpersonal facilitation. We discuss here only the most general results, namely, the operational validity for the five factors in predicting overall job performance. Operational validity refers to the correlation between personality measures and job performance, corrected for sampling error, range restriction, and unreliability of the criterion. Big Five factors and validity coefficients were as follows:

Conscientiousness         .26
Neuroticism               .13
Extraversion              .15
Agreeableness             .05
Openness to Experience    .04

Overall, Conscientiousness is the big winner in their analysis, although for some specific occupational categories, other factors were valuable (e.g., Agreeableness paid off for Customer Service personnel). Hurtz and Donovan (2000) use caution and understatement to summarize the implications of their study:

What degree of utility do these global Big Five measures offer for predicting job performance? Overall, it appears that global measures of Conscientiousness can be expected to consistently add a small portion of explained variance in job performance across jobs and across criterion dimension. In addition, for certain jobs and for certain criterion dimensions, certain other Big Five dimensions will likely add a very small but consistent degree of explained variance. (p. 876)

In sum, people who describe themselves as reliable, organized, and hard-working (i.e., high on Conscientiousness) appear to perform better at work than those with fewer of these qualities.

For specific applications in personnel selection, certain tests are known to have greater validity than others. For example, the California Psychological Inventory (CPI) provides an accurate measure of managerial potential (Gough, 1984, 1987). Certain scales of the CPI predict overall performance of military academy students reasonably well (Blake, Potter, & Sliwak, 1993). The Inwald Personality Inventory is well validated as a preemployment screening test for law enforcement (Chibnall & Detrick, 2003; Inwald, 2008). The Minnesota Multiphasic Personality Inventory also bears mention as a selection tool for law enforcement (Sellbom, Fischler, & Ben-Porath, 2007). Finally, the Hogan Personality Inventory (HPI) is well validated for prediction of job performance in military, hospital, and corporate settings (Hogan, 2002). The HPI was based upon the Big Five theory of personality. This instrument has cross-validated criterion-related validities as high as .60 for some scales (Hogan, 1986; Hogan & Hogan, 1986).

11.1.6: Paper-and-Pencil Integrity Tests

Several test publishers have introduced instruments designed to screen theft-prone individuals and other undesirable job candidates such as persons who are undependable or frequently absent from work (Cullen & Sackett, 2004; Wanek, 1999). We will focus on issues raised by these tests rather than detailing the merits or demerits of individual instruments. Table 11.5 lists some of the more commonly used instruments.

One problem with integrity tests is that their proprietary nature makes it difficult to scrutinize them in the same manner as traditional instruments. In most cases, scoring keys are available only to in-house psychologists, which makes independent research difficult. Nonetheless, a sizable body of research now exists on integrity tests, as discussed in the following section on validity.

An integrity test evaluates attitudes and experiences relating to the honesty, dependability, trustworthiness, and pro-social behaviors of a respondent. Integrity tests typically consist of two sections. The first is a section dealing with attitudes toward theft and other forms of dishonesty such as beliefs about extent of employee theft, degree of condemnation of theft, endorsement of common rationalizations about theft, and perceived ease of theft. The second is a section dealing with overt admissions of theft and other illegal activities such as items stolen in the last year, gambling, and drug use. The most widely researched tests of this type include the Personnel Selection Inventory, the Reid Report, and the Stanton Survey. The interested reader can find addresses for the publishers of these and related instruments through an Internet search.

Apparently, integrity tests can be easily faked and might, therefore, be of less value in screening dishonest

applicants than other approaches such as background checks. For example, Ryan and Sackett (1987) created a generic overt integrity test modeled upon existing instruments. The test contained 52 attitude and 11 admission items. In comparison to a contrast group asked to respond truthfully and another contrast group asked to respond as job applicants, subjects asked to “fake good” produced substantially superior scores (i.e., better attitudes and fewer theft admissions).

Table 11.5 Commonly Used Integrity Tests

Validity of Integrity Tests. In a recent meta-analysis of 104 criterion-related validity studies, Van Iddekinge, Roth, Raymark, and Odle-Dusseau (2012) found that integrity tests were not particularly useful in predicting job performance, training performance, or work turnover (corrected rs of .15, .16, and .09, respectively). However, when counterproductive work behavior (CWB, e.g., theft, poor attendance, unsafe behavior, property destruction) was the criterion, the corrected r was a healthy .32. The correlation was even higher, r = .42, when based on self-reports of CWB as opposed to other reports or employee records. Overall, these findings support the value of integrity testing in personnel selection. Ones et al. (1993) requested data on integrity tests from publishers, authors, and colleagues. These sources proved highly cooperative: The authors collected 665 validity coefficients based upon 25 integrity tests administered to more than half a million employees. Using the intricate procedures of meta-analysis, Ones et al. (1993) computed an average validity coefficient of .41 when integrity tests were used to predict supervisory ratings of job performance. Interestingly, integrity tests predicted global disruptive behaviors (theft, illegal activities, absenteeism, tardiness, drug abuse, dismissals for theft, and violence on the job) better than they predicted employee theft alone. The authors concluded with a mild endorsement of these instruments:

When we started our research on integrity tests, we, like many other industrial psychologists, were skeptical of integrity tests used in industry. Now, on the basis of analyses of a large database consisting of more than 600 validity coefficients, we conclude that integrity tests have substantial evidence of generalizable validity.

This conclusion is echoed in a series of ingenious studies by Cunningham, Wong, and Barbee (1994). Among other supportive findings, these researchers discovered that integrity test results were correlated with returning an overpayment, even when subjects were instructed to provide a positive impression on the integrity test.

Other reviewers are more cautious in their conclusions. In commenting on reviews by the American Psychological Association and the Office of Technology Assessment, Camara and Schneider (1994) concluded that integrity tests do not measure up to expectations of experts in assessment, but that they are probably better than hit-or-miss, unstandardized methods used by many employers to screen applicants.

Several concerns remain about integrity tests. Publishers may release their instruments to unqualified users, which is a violation of ethical standards of the American Psychological Association. A second problem arises from the unknown base rate of theft and other undesirable behaviors, which makes it difficult to identify optimal cutting scores on integrity tests. If cutting scores are too stringent, honest job candidates will be disqualified unfairly.
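The trade-off in setting a cutting score can be made concrete with a small sketch. Everything numerical here is an assumption chosen only for illustration: the two normal score distributions, the 10 percent base rate (which the text stresses is unknown in practice), and the function name `screening_outcomes`. None of these values comes from an actual integrity test.

```python
from statistics import NormalDist

# Hypothetical integrity-score distributions in standard-score units.
# These are illustrative assumptions, not norms from any real instrument.
HONEST = NormalDist(mu=0.5, sigma=1.0)
THEFT_PRONE = NormalDist(mu=-0.5, sigma=1.0)


def screening_outcomes(cut, n_applicants=1000, base_rate=0.10):
    """Count both kinds of screening error when scores below `cut` are rejected.

    Returns (honest applicants rejected, theft-prone applicants passed).
    """
    n_theft_prone = n_applicants * base_rate
    n_honest = n_applicants - n_theft_prone
    # Honest applicants who fall below the cutting score are rejected unfairly.
    honest_rejected = n_honest * HONEST.cdf(cut)
    # Theft-prone applicants at or above the cutting score slip through.
    theft_prone_passed = n_theft_prone * (1.0 - THEFT_PRONE.cdf(cut))
    return round(honest_rejected), round(theft_prone_passed)


for cut in (-1.0, 0.0, 1.0):
    rejected, passed = screening_outcomes(cut)
    print(f"cut = {cut:+.1f}: honest rejected = {rejected}, theft-prone passed = {passed}")
```

Under these assumptions, raising the cutting score sharply increases the number of honest applicants who are turned away, while lowering it lets most theft-prone applicants through. Changing `base_rate` shifts both counts, which is why an unknown base rate makes the optimal cutting score hard to identify.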

Conversely, too lenient a cutting score renders the testing pointless. A final concern is that situational factors may moderate the validity of these instruments. For example, how a test is portrayed to examinees may powerfully affect their responses and therefore skew the validity of the instrument.

The debate about integrity tests juxtaposes the legitimate interests of business against the individual rights of workers. Certainly, businesses have a right not to hire thieves, drug addicts, and malcontents. But in pursuing this goal, what is the ultimate cost to society of asking millions of job applicants about past behaviors involving drugs, alcohol, criminal behavior, and other highly personal matters? Hanson (1991) has asked rhetorically whether society is well served by the current balance of power, in which businesses can obtain proprietary information about who is seemingly worthy and who is not. It is not out of the question that Congress could enter the debate. In 1988, President Reagan signed into law the Employee Polygraph Protection Act, which effectively eliminated polygraph testing in industry. Perhaps in the years ahead we will see integrity testing sharply curtailed by an Employee Integrity Test Protection Act. Berry, Sackett, and Wiemann (2007) provide an excellent review of the current state of integrity testing.

11.1.7: Work Sample and Situational Exercises

A work sample is a miniature replica of the job for which examinees have applied. Muchinsky (2003) points out that the I/O psychologist’s goal in devising a work sample is “to take the content of a person’s job, shrink it down to a manageable time period, and let applicants demonstrate their ability in performing this replica of the job.” Guion (1998) has emphasized that work samples need not include every aspect of a job but should focus upon the more difficult elements that effectively discriminate strong from weak candidates. For example, a position as clerk-typist may also include making coffee and running errands for the boss. However, these are trivial tasks demanding so little skill that it would be pointless to include them in a work sample. A work sample should test important job domains, not the entire job universe.

Campion (1972) devised an ingenious work sample for mechanics that illustrates the preceding point. Using the job analysis techniques discussed at the beginning of this topic, Campion determined that the essence of being a good mechanic was defined by successful use of tools, accuracy of work, and overall mechanical ability. With the help of skilled mechanics, he devised a work sample that incorporated these job aspects through typical tasks such as installing pulleys and repairing a gearbox. Points were assigned to component behaviors for each task. Example items and their corresponding weights were as follows:

Installing Pulleys and Belts                                        Scoring Weights
1. Checks key before installing against:
   ____ shaft                                                       2
   ____ pulley                                                      2
   ____ neither                                                     0

Disassembling and Repairing a Gear Box
10. Removes old bearing with:
   ____ press and driver                                            3
   ____ bearing puller                                              2
   ____ gear puller                                                 1
   ____ other                                                       0

Pressing a Bushing into Sprocket and Reaming to Fit a Shaft
4. Checks internal diameter of bushing against shaft diameter:
   ____ visually                                                    1
   ____ hole gauge and micrometers                                  3
   ____ Vernier calipers                                            2
   ____ scale                                                       1
   ____ does not check                                              0

Campion found that the performance of 34 male maintenance mechanics on the work sample measure was significantly and positively related to the supervisor’s evaluations of their work performance, with validity coefficients ranging from .42 to .66.

A situational exercise is approximately the white-collar equivalent of a work sample. Situational exercises are largely used to select persons for managerial and professional positions. The main difference between a situational exercise and a work sample is that the former mirrors only part of the job, whereas the latter is a microcosm of the entire job (Muchinsky, 1990). In a situational exercise, the prospective employee is asked to perform under circumstances that are highly similar to the anticipated work environment. Measures of accomplishment can then be gathered as a basis for gauging likely productivity or other aspects of job effectiveness. The situational exercises with the highest validity show a close resemblance with the criterion; that is, the best exercises are highly realistic (Asher & Sciarrino, 1974; Muchinsky, 2003).

Work samples and situational exercises are based on the conventional wisdom that the best predictor of future performance in a specific domain is past performance in that same domain. Typically, a situational exercise requires the candidate to perform in a setting that is highly similar to the intended work environment. Thus, the resulting performance measures resemble those that make up the prospective job itself.

Hundreds of work samples and situational exercises have been proposed over the years. For example, in an

earlier review, Asher and Sciarrino (1974) identified 60 procedures, including the following:

• Typing test for office personnel
• Mechanical assembly test for loom fixers
• Map-reading test for traffic control officers
• Tool dexterity test for machinists and riveters
• Headline, layout, and story organization test for magazine editors
• Oral fact-finding test for communication consultants
• Role-playing test for telephone salespersons
• Business-letter-writing test for managers

A very effective situational exercise that we will discuss here is the in-basket technique, a procedure that simulates the work environment of an administrator.

The In-Basket Test. The classic paper on the in-basket test is the monograph by Frederiksen (1962). For this comprehensive study Frederiksen devised the Bureau of Business In-Basket Test, which consists of the letters, memoranda, records of telephone calls, and other documents that have collected in the in-basket of a newly hired executive officer of a business bureau. In this test, the candidate is instructed not to play a role, but to be himself.³ The candidate is not to say what he would do, he is to do it.

³ We do not mean to promote a subtle sexism here, but in fact Frederiksen (1962) tested a predominantly (if not exclusively) male sample of students, administrators, executives, and army officers.

The letters, memoranda, phone calls, and interviews completed by him in this simulated job environment constitute the record of behavior that is scored according to both content and style of the responses. Response style refers to how a task was completed: courteously, by telephone, by involving a superior, through delegation to a subordinate, and so on. Content refers to what was done, including making plans, setting deadlines, seeking information; several quantitative indices were also computed, including number of items attempted and total words written. For some scoring criteria such as imaginativeness (the number of courses of action which seemed to be good ideas), expert judgment was required.

Frederiksen (1962) administered his in-basket test to 335 subjects, including students, administrators, executives, and army officers. Scoring the test was a complex procedure that required the development of a 165-page manual. The odd–even reliability of the individual items varied considerably, but enough modestly reliable items emerged (rs of .70 and above) that Frederiksen could conduct several factor analyses and also make meaningful group comparisons.

When scores on the individual items were correlated with each other and then factor analyzed, the behavior of potential administrators could be described in terms of eight primary factors. When scores on these primary factors were themselves factor analyzed, three second-order factors emerged. These second-order factors describe administrative behavior in the most general terms possible. The first dimension is Preparing for Action, characterized by deferring final decisions until information and advice is obtained. The second dimension is simply Amount of Work, depicting the large individual differences in the sheer work output. The third major dimension is called Seeking Guidance, with high scorers appearing to be anxious and indecisive. These dimensions fit well with existing theory about administrator performance and therefore support the validity of Frederiksen’s task.

A number of salient attributes emerged when Frederiksen compared the subject groups on the scorable dimensions of the in-basket test. For example, the undergraduates stressed verbal productivity, the government administrators lacked concern with outsiders, the business executives were highly courteous, the army officers exhibited strong control over subordinates, and school principals lacked firm control. These group differences speak strongly to the construct validity of the in-basket test, since the findings are consistent with theoretical expectations about these subject groups.

Early studies supported the predictive validity of in-basket tests. For example, Brass and Oldham (1976) demonstrated that performance on an in-basket test corresponded to on-the-job performance of supervisors if the appropriate in-basket scoring categories were used. Specifically, based on the in-basket test, supervisors who personally reward employees for good work, personally punish subordinates for poor work, set specific performance objectives, and enrich their subordinates’ jobs are also rated by their superiors as being effective managers. The predictive power of these in-basket dimensions was significant, with a multiple correlation coefficient of .54 between predictors and criterion. Standardized in-basket tests can now be purchased for use by private organizations. Unfortunately, most of these tests are “in-house” instruments not available for general review. In spite of occasional cautionary reviews (e.g., Brannick et al., 1989; Schroffel, 2012), the in-basket technique is still highly regarded as a useful method of evaluating candidates for managerial positions.

Assessment Centers. An assessment center is not so much a place as a process (Highhouse & Nolan, 2012). Many corporations and military branches, as well as a few progressive governments, have dedicated special sites to the application of in-basket and other simulation exercises in the training and selection of managers. The purpose of an assessment center is to evaluate managerial potential by exposing candidates

to multiple simulation techniques, including group presentations, problem-solving exercises, group discussion exercises, interviews, and in-basket techniques. Results from traditional aptitude and personality tests also are considered in the overall evaluation. The various simulation exercises are observed and evaluated by successful senior managers who have been specially trained in techniques of observation and evaluation. Assessment centers are used in a variety of settings, including business and industry, government, and the military. There is no doubt that a properly designed assessment center can provide a valid evaluation of managerial potential. Follow-up research has demonstrated that the performance of candidates at an assessment center is strongly correlated with supervisor ratings of job performance (Gifford, 1991). A more difficult question to answer is whether assessment centers are cost-effective in comparison to traditional selection procedures. After all, funding an assessment center is very expensive. The key question is whether the assessment center approach to selection boosts organizational productivity sufficiently to offset the expense of the selection process. Anecdotally, the answer would appear to be a resounding yes, since poor decisions from bad managers can be very expensive. However, there is little empirical information that addresses this issue.

Goffin, Rothstein, and Johnston (1996) compared the validity of traditional personality testing (with the Personality Research Form; Jackson, 1984b) and the assessment center approach in the prediction of the managerial performance of 68 managers in a forestry products company. Both methods were equivalent in predicting performance, which would suggest that the assessment center approach is not worth the (very substantial) additional cost. However, when both methods were used in combination, personality testing provided significant incremental validity over that of the assessment center alone. Thus, personality testing and assessment center findings each contribute unique information helpful in predicting performance.

Putting a candidate through an assessment center is very expensive. Dayan, Fox, and Kasten (2008) speak to the cost of assessment center operations by arguing that an employment interview and cognitive ability test scores can be used to cull the best and the worst applicants so that only those in the middle need to undergo these expensive evaluations. Their study involved 423 Israeli police force candidates who underwent assessment center evaluations after meeting initial eligibility. The researchers concluded in retrospect that, with minimal loss of sensitivity and specificity, nearly 20 percent of this sample could have been excused from more extensive evaluation. These were individuals who, based on interview and cognitive test scores, were nearly sure to fail or nearly certain to succeed.

11.1.8: Appraisal of Work Performance

The appraisal of work performance is crucial to the successful operation of any business or organization. In the absence of meaningful feedback, employees have no idea how to improve. In the absence of useful assessment, administrators have no idea how to manage personnel. It is difficult to imagine how a corporation, business, or organization could pursue an institutional mission without evaluating the performance of its employees in one manner or another.

Industrial and organizational psychologists frequently help devise rating scales and other instruments used for performance appraisal (Landy & Farr, 1983). When done properly, employee evaluation rests upon a solid foundation of applied psychological measurement; hence, its inclusion as a major topic in this text. In addition to introducing essential issues in the measurement of work performance, we also touch briefly on the many legal issues that surround the selection and appraisal of personnel. We begin by discussing the context of performance appraisal.

The evaluation of work performance serves many organizational purposes. The short list includes promotions, transfers, layoffs, and the setting of salaries, all of which may hang in the balance of performance appraisal. The long list includes at least 20 common uses identified by Cleveland, Murphy, and Williams (1989). These applications of performance evaluation cluster around four major uses: comparing individuals in terms of their overall performance levels; identifying and using information about individual strengths and weaknesses; implementing and evaluating human resource systems in organizations; and documenting or justifying personnel decisions. Beyond a doubt, performance evaluation is essential to the maintenance of organizational effectiveness.

As the reader will soon discover, performance evaluation is a perplexing problem for which the simple and obvious solutions are usually incorrect. In part, the task is difficult because the criteria for effective performance are seldom so straightforward as “dollar amount of widgets sold” (e.g., for a salesperson) or “percentage of students passing a national test” (e.g., for a teacher). As much as we might prefer objective methods for assessing the effectiveness of employees, judgmental approaches are often the only practical choice for performance evaluation.

The problems encountered in the implementation of performance evaluation are usually referred to collectively as the criterion problem, a designation that first appeared in the 1950s (e.g., Flanagan, 1956; Landy & Farr, 1983). The phrase criterion problem is meant to convey the difficulties involved in conceptualizing and measuring performance constructs, which are often complex, fuzzy, and multidimensional. For a thorough discussion of the
criterion problem, the reader should consult comprehensive reviews by Austin and Villanova (1992) and Campbell, Gasser, and Oswald (1996). We touch upon some aspects of the criterion problem in the following review.

11.1.9: Approaches to Performance Appraisal

There are literally dozens of conceptually distinct approaches to the evaluation of work performance. In practice, these numerous approaches break down into four classes of information: performance measures such as productivity counts; personnel data such as rate of absenteeism; peer ratings and self-assessments; and supervisor evaluations such as rating scales. Rating scales completed by supervisors are by far the preferred method of performance appraisal, as discussed later. First, we mention the other approaches briefly.

Performance Measures
Performance measures include seemingly objective indices such as number of bricks laid for a mason, total profit for a salesperson, or percentage of students graduated for a teacher. Although production counts would seem to be the most objective and valid methods for criterion measurement, there are serious problems with this approach (Guion, 1965).

Another problem is that production counts may be unreliable, especially over short periods of time. Finally, production counts may tap only a small proportion of job requirements, even when they appear to be the definitive criterion. For example, sales volume would appear to be the ideal criterion for most sales positions. Yet, a salesperson can boost sales by misrepresenting company products. Sales may be quite high for several years, until the company is sued by unhappy customers. Productivity is certainly important in this example, but the corporation should also desire to assess interpersonal factors such as honesty in customer relations.

Personnel Data: Absenteeism
Personnel data such as rate of absenteeism provide another possible basis for performance evaluation. Certainly employers have good reason to keep tabs on absenteeism and to reduce it through appropriate incentives. Steers and Rhodes (1978) calculated that absenteeism costs about $25 billion a year in lost productivity! Little wonder that absenteeism is a seductive criterion measure that has been researched extensively (Harrison & Hulin, 1989).

Unfortunately, absenteeism turns out to be a largely useless measure of work performance, except for the extreme cases of flagrant work truancy. A major problem is defining absenteeism. Landy and Farr (1983) list 28 categories of absenteeism, many of which are uncorrelated with the others. Different kinds of absenteeism include scheduled versus unscheduled, authorized versus unauthorized, justified versus unjustified, contractual versus noncontractual, sickness versus nonsickness, medical versus personal, voluntary versus involuntary, explained versus unexplained, compensable versus noncompensable, certified illness versus casual illness, Monday/Friday absence versus midweek, and reported versus unreported. When is a worker truly absent from work? The criteria are very slippery.

In addition, absenteeism turns out to be an atrociously unreliable variable. The test–retest correlations (absentee rates from two periods of identical length) are as low as .20, meaning that employees display highly variable rates of absenteeism from one time period to the next. A related problem with absenteeism is that workers tend to underreport it for themselves and overreport it for others (Harrison & Shaffer, 1994). Finally, for the vast majority of workers, absenteeism rates are quite low. In short, absenteeism is a poor method for assessing worker performance, except for the small percentage of workers who are chronically truant.

Peer Ratings and Self-Assessments
Some researchers have proposed that peer ratings and self-assessments are highly valid and constitute an important complement to supervisor ratings. A substantial body of research pertains to this question, but the results are often confusing and contradictory. Nonetheless, it is possible to list several generalizations (Harris & Schaubroeck, 1988; Smither, 1994):

• Peers give more lenient ratings than supervisors.
• The correlation between self-ratings and supervisor ratings is minimal.
• The correlation between peer ratings and supervisor ratings is moderate.
• Supervisors and subordinates have different ideas about what is important in jobs.

Overall, reviewers conclude that peer ratings and self-assessments may have limited application for purposes such as personal development, but their validity is not yet sufficiently established to justify widespread use (Smither, 1994).
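The generalizations above about agreement between rating sources, like the test–retest figures quoted for absenteeism, rest on correlation coefficients. As a rough illustration only, the following Python sketch computes a Pearson correlation between two sets of ratings and the mean difference between peer and supervisor ratings. The function name `pearson_r` and all of the rating values are hypothetical and were invented for this example; they do not come from the studies cited.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical 1-to-7 ratings of the same six employees from three sources
# (illustrative values only, not data from any cited study).
supervisor = [5, 3, 6, 4, 2, 6]
peer = [6, 4, 6, 5, 4, 7]
self_rating = [6, 6, 5, 7, 6, 6]

r_peer = pearson_r(peer, supervisor)         # peer vs. supervisor agreement
r_self = pearson_r(self_rating, supervisor)  # self vs. supervisor agreement

# A positive gap means peers rate more leniently than supervisors on average.
mean_gap = sum(peer) / len(peer) - sum(supervisor) / len(supervisor)

print(f"peer-supervisor r = {r_peer:.2f}")
print(f"self-supervisor r = {r_self:.2f}")
print(f"mean peer minus supervisor rating = {mean_gap:.2f}")
```

The same function could be applied to absence rates recorded for the same employees in two periods of equal length to obtain a test–retest coefficient of the kind described in the absenteeism discussion.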
Supervisor Rating Scales
Rating scales are the most common measure of job performance (Landy & Farr, 1983; Muchinsky, 2003). These instruments vary from simple graphic forms to complex scales anchored to concrete behaviors. In general, supervisor rating scales reveal only fair reliability, with a mean interrater reliability coefficient of .52 across many different approaches and studies (Viswesvaran, Ones, & Schmidt, 1996). In spite of their weak reliability, supervisor ratings still rank as the most widely used approach. About three-quarters of all performance evaluations are based upon judgmental methods such as supervisor rating scales (Landy, 1985).

The simplest rating scale is the graphic rating scale, introduced by Donald Paterson in 1922 (Landy & Farr, 1983). A graphic rating scale consists of trait labels, brief definitions of those labels, and a continuum for the rating. As the reader will notice in Figure 11.1, several types of graphic rating scales have been used.

Figure 11.1 Examples of Graphic Rating Scales
[Figure: six graphic rating scale formats for the dimension "Quality": (a) a line anchored by Poor and Excellent; (b) a numbered scale from 5 (Excellent) to 1 (Poor); (c) verbal anchors ranging from "Work Nearly Always Exceptional" to "Work Is Rarely Adequate"; (d) a numbered scale from 7 to 1; (e) a performance evaluation grid with the rating factors Accuracy, Neatness, and Clarity; (f) a scale numbered 1 to 20 with the labels Poor, Fair, Average, and Good.]

The popularity of graphic rating scales is due, in part, to their simplicity. But this is also a central weakness because the dimension of work performance being evaluated may be vaguely defined. Dissatisfaction with graphic rating scales led to the development of many alternative approaches to performance appraisal, as discussed in this section.

A critical incidents checklist is based upon actual episodes of desirable and undesirable on-the-job behavior (Flanagan, 1954). Typically, a checklist developer will ask employees to help construct the instrument by submitting specific examples of desirable and undesirable job behavior. For example, suppose that we intended to develop a checklist to appraise the performance of resident advisers (RAs) in a dormitory. Modeling a study by Aamodt, Keller, Crawford, and Kimbrough (1981), we might ask current dormitory RAs the following question:

Think of the best RA that you have ever known. Please describe in detail several incidents that reflect why this person was the best adviser. Please do the same for the worst RA you have ever known.

Based upon hundreds of nominated behaviors, checklist developers would then proceed to distill and codify these incidents into a smaller number of relevant behaviors, both desirable and undesirable. For example, the following items might qualify for the RA checklist:

____ stays in dorm more than required
____ breaks dormitory rules
____ is fair about discipline
____ plans special programs
____ fails to discipline friends
____ is often unfriendly
____ shows concern about residents
____ comes across as authoritarian

Of course, the full checklist would be much longer than the preceding. The RA supervisor would complete this instrument as a basis for performance appraisal. If needed, an overall summary score can be derived from an appropriate weighting of individual items.

Another form of criterion-referenced judgmental measure is the behaviorally anchored rating scale (BARS). The classic work on BARS dates back to Smith and Kendall (1963). These authors proposed a complex developmental procedure for producing criterion-referenced judgments. The procedure uses a number of experts to identify and define performance dimensions, generate behavior examples, and scale the behaviors meaningfully. Overall, the procedure is quite complex, time-consuming, and expensive. A number of variations and improvements have been suggested. An advantage to BARS and other behavior-based scales is their strict adherence to EEOC (Equal Employment Opportunity Commission) guidelines discussed later in this chapter. BARS and related approaches focus upon behaviors as opposed to personality or attitudinal characteristics. A behaviorally anchored scale for performance of college professors in posting office hours is depicted in Figure 11.2. Of course, the comprehensive evaluation of a college professor would include additional scales for other aspects of work.

Research on improving the accuracy of ratings with BARS is mixed. Some studies find fewer rating errors, especially a reduction in unwarranted leniency of evaluations, whereas other studies report no improvement with
BARS compared to other evaluation methods (Murphy & Pardaffy, 1989). Overall, Muchinsky (2003) concludes that the BARS approach is not much better than graphic rating scales in reducing rating errors. Nonetheless, the scale development process of BARS may have indirect benefits in that supervisors are compelled to pay close attention to the behavioral components of effective performance.

Figure 11.2 Behaviorally Anchored Rating Scale for Posting and Maintaining Office Hours

7  Could be expected to post required and extra office hours the first week of the semester, maintain them without exception, and greet students in a friendly manner.
6  Could be expected to post required and extra office hours the first week of the semester, and maintain them without exception.
5  Could be expected to post required and extra office hours the first week of the semester, and maintain them most of the time.
4  Could be expected to post required office hours the first week of the semester, and maintain them most of the time.
3  Could be expected to post required office hours by mid-semester, and maintain them most of the time.
2  Could be expected to post required office hours with “push” from department chair, but would miss office hours without notice.
1  Could be expected to resist posting office hours and fail to maintain them.

A behavior observation scale (BOS) is a variation upon the BARS technique. The difference between the two is that the BOS approach uses a continuum from “almost never” to “almost always” to measure how often an employee performs the specific tasks on each behavioral dimension. As with the BARS technique, researchers question whether behavior observation scales are worth the extra effort (Guion, 1998).

A forced-choice scale is designed to eliminate bias and subjectivity in supervisor ratings by forcing a choice between options that are equal in social desirability. In theory, this approach makes it impossible for the supervisor to slant ratings in a biased or subjective manner. We will use the pathbreaking research by Sisson (1948) to illustrate the features of this approach. He developed a scale to evaluate Army officers that consisted of tetrads of behavioral descriptors. Each tetrad contained two positive items matched for social desirability and two negative items also matched for social desirability. The four items in each tetrad were topically related to a single performance dimension. Unknown to the supervisors who completed the rating scale, one of the two positive items was judged very descriptive of effective Army officers and the other judged less so. Likewise, one of the two negative items was judged more descriptive of ineffective Army officers and the other judged less so. Here is a sample tetrad (Borman, 1991):

                                               Most          Least
                                               Descriptive   Descriptive
A. Cannot assume responsibility                ____          ____
B. Knows how and when to delegate authority    ____          ____
C. Offers suggestions                          ____          ____
D. Changes ideas too easily                    ____          ____

Supervisors were asked to review the items in each tetrad and to check one item as most descriptive and one item as least descriptive of the officer being evaluated. A score of +1 was awarded for responding “most descriptive” to the positively keyed item (in this case, alternative B) or “least descriptive” to the negatively keyed item (in this case, alternative A), whereas a score of −1 was awarded for responding “least descriptive” to the positively keyed item or “most descriptive” to the negatively keyed item. Responding to the nonkeyed items (alternatives C and D) as most or least descriptive earned a score of 0. Thus, each tetrad yielded a five-point continuum of scores: +2, +1, 0, −1, −2. The summary score used for performance appraisal consisted of the algebraic sum of the individual items.

The forced-choice approach has never really caught on, due largely to the effort required in scale construction. This is unfortunate because the method does effectively reduce unwanted bias. Borman (1991) refers to this approach as a “bold initiative” that produces a relatively objective rating scale.

11.1.10: Sources of Error in Performance Appraisal

The most difficult problem in the assessment of job performance is the proper definition of appraisal criteria. If the supervisor is using a poorly designed instrument that does not tap the appropriate dimensions of job behavior, then almost by definition the performance appraisal will be inaccurate, incomplete, and erroneous. Undoubtedly, the failure to identify appropriate criteria for acceptable and unacceptable performance is a major source of error in performance appraisal. But it is not the only source. Even when supervisors have access to excellent, well-designed measures of performance appraisal, various sorts of subtle errors can creep in. We discuss three such additional sources of rating error: halo effect, rater bias, and criterion contamination.

Halo Effect
The tendency to rate an employee high or low on all dimensions because of a global impression is called halo effect. Research on the halo effect can be traced back to the early part of the twentieth century (Thorndike, 1920). The
most common halo effect is a positive halo effect. In this case, an employee receives a higher rating than deserved because the supervisor fails to be objective when rating specific aspects of the employee’s behavior. A positive halo effect is usually based upon overgeneralization from one element of a worker’s behavior. For example, an employee with perfect attendance may receive higher-than-deserved evaluations on productivity and work quality, even though attendance is not directly related to these job dimensions.

Smither (1998) lists the following approaches to control for halo effects:

• Provide special training for raters
• Supervise the supervisors during the rating
• Practice simulations before doing the ratings
• Keep a diary of information relevant to appraisal
• Provide supervisors with a short lecture on halo effects

Additional approaches to rater training are discussed by Goldstein (1991). An intriguing analysis of the nature and consequences of halo error can be found in Murphy, Jako, and Anhalt (1993). Contrary to the reigning prejudice against halo errors, these researchers conclude that the halo effect does not necessarily detract from the accuracy of ratings. They point out that a presumed halo effect is often the by-product of true overlap on the dimensions being rated. The debate over halo effect is not likely to be resolved anytime soon (Arvey & Murphy, 1998).

Rater Bias
The potential sources of rater bias are so numerous that we can only mention a few prominent examples here. Leniency or severity errors occur when a supervisor tends to rate workers at the extremes of the scale. Leniency may reflect social dynamics, as when the supervisor wants to be liked by employees. Leniency is also caused by extraneous factors such as the attractiveness of the employee. Severity errors refer to the practice of rating all aspects of performance as deficient. In contrast, central tendency errors occur when the supervisor rates everyone as nearly average on all performance dimensions. Context errors occur when the rater evaluates an employee in the context of other employees rather than based upon objective performance. For example, the presence of a workaholic salesperson with extremely high sales volume might cause the sales supervisor to rate other sales personnel lower than deserved.

Recently, researchers have paid considerable attention to the possible biasing effects of whether a supervisor likes or dislikes a subordinate. Surprisingly, the trend of the findings is that supervisor affect (liking or disliking) toward specific employees does not introduce rating bias. In general, strong affect in either direction represents valid information about an employee. Thus, ratings of affect often correlate strongly with performance ratings, but this is because both are a consequence of how well or poorly the employee does the job (Ferris, Judge, Rowland, & Fitzgibbons, 1994; Varma, DeNisi, & Peters, 1996). Other forms of rater bias are discussed by Goldstein (1991) and Smither (1994).

Criterion Contamination
Criterion contamination is said to exist when a criterion measure includes factors that are not demonstrably part of the job (Borman, 1991; Harvey, 1991). For example, if a performance measure includes appearance, this would most likely be a case of criterion contamination, unless appearance is relevant to job success. Likewise, evaluating an employee on “dealing with the public” is only appropriate if the job actually requires the employee to meet the public. Goldstein (1992) outlines three kinds of criterion contamination:

Careful attention to job analysis as a basis for selection of appraisal criteria is the best way to reduce errors in performance appraisal. In addition, employers should follow certain guidelines in performance appraisal, as discussed in the following section.

Guidelines for Performance Appraisal
Performance appraisal is a formidable task. Not only must employers pay attention to the psychometric soundness of their approach, they must also design a practical system that meets organizational goals. For example, appraisal standards must be sufficiently difficult and detailed to ensure that organizational goals are accomplished. Another concern is that performance appraisal falls under the purview of Title VII of the Civil Rights Act of 1964. Hence, employers must develop fair systems that do not discriminate on the basis of race, sex, and other protected categories. To complicate matters, these standards (soundness, practicality, legality) may conflict with one another. The practical approach may be neither psychometrically sound nor legal. Often, appraisal methods that show the best measurement characteristics (e.g., strong interrater reliability) will fail to assess the most important aspects of performance; that is, they are not practical. This is a familiar
refrain within the measurement field. Too often, psychologists must choose between rigor and relevance, rarely achieving both at the same time. Finally, legal considerations must be taken into account when exploring the limits of performance appraisal.

Smither (1998) has published guidelines for developing performance appraisal systems that we paraphrase here:

• Base the performance appraisal upon a careful job analysis
• Develop specific, contamination-free criteria for appraisal from the job analysis
• Determine that the instrument used to rate performance is appropriate for the appraisal situation
• Train raters to be accurate, fair, and legal in their use of the appraisal instrument
• Use performance evaluations at regular intervals of six months to a year
• Evaluate the performance appraisal system periodically to determine whether it is actually improving performance

The training of raters is an especially important guideline. An appraisal system that seems perfectly straightforward to the employer could easily be misunderstood by an untrained rater, resulting in biased evaluations. Borman (1991) notes that two kinds of rater training are effective: rater error training, in which the trainer seeks simply to alert raters to specific kinds of errors (e.g., halo effect); and frame-of-reference training, in which the trainer familiarizes the raters with the specific content of each performance dimension. Research indicates that these kinds of training improve the accuracy of ratings.

Finally, we review an intriguing study conducted from an international perspective. Peretz and Fried (2012) remind us that cultural norms influence the nature, acceptability, and impact of different approaches to performance appraisal. They surveyed performance appraisal practices in 21 nations, obtained ratings on cultural norms for each nation, and determined their joint impact on organizational absenteeism and turnover. Specifically, the researchers collected data on personnel practices from thousands of organizations in these mainly European countries. Next, they obtained ratings for each country on four cultural practices: power distance (acceptance versus rejection of inequality), future orientation (present versus future orientation), person value (individualism versus collectivism), and uncertainty avoidance (acceptance versus avoidance of uncertainty). Each cultural norm was rated 1 to 7 for each nation based on an independent global database. Then, they examined the joint impact of personnel practices and cultural norms on absenteeism and turnover. Their study is complex and detailed, beyond the scope of fine-tuned analysis here. In sum, they found that congruence between societal norms and personnel assessment methods tended to reduce turnover and/or absenteeism. One example is the use of the so-called 360-evaluation, in which performance appraisal is based on input from people at all levels who interact with the employee. This practice is more effective (leading to less absenteeism and turnover) in some cultures than others. Peretz and Fried (2012) found that personnel assessment systems with several sources of raters (e.g., supervisors, coworkers, and subordinates) were most acceptable to employees in companies located in societies with low power distance, high future orientation, and respect for individualism. In contrast, multiple sources of assessment were not well received by employees working in collectivistic societies. It appears the best practices in personnel assessment depend upon the cultural context.

11.2: Assessment for Career Development in a Global Economy

11.2 State the challenges of vocational psychologists who provide career guidance and assessment

Prior to the 1700s, agrarian economies dominated cultural and economic life in the western world. Vocational opportunities for most people remained limited to farming, crafts, labor, and small businesses. The modern vision that individuals could pursue dozens or hundreds of careers likely did not exist for the masses who scrambled simply to survive (Zinn, 1995). With the advent of the first industrial revolution in the 1700s, including the invention of the steam engine and other labor-saving devices, the need for human labor diminished rapidly. In parallel, the vocational world expanded substantially, offering upward mobility to some of the working class and poor. Gradually, the concept of career identity emerged in the public consciousness.

Career identity is now recognized as essential to personhood and vital to a sense of well-being. When we meet someone for the first time, our natural inclination is to ask, or at least to wonder, “What do you do for a living?” The values, political views, and personal qualities of the individual are important, too, but how the individual contributes to society is typically the first thing we want to know. An occupational title communicates an abundance of information, including personality characteristics, economic class, and social standing (Andersen & Vandehey, 2011).

Work and career are so central to personal well-being that unemployment, especially when prolonged, consistently
causes a wide range of physical, psychological, and spiritual maladies. These include:

. . . economic hardship, loss of health insurance, foreclosure, and mental health problems. The mental health problems include depression and anxiety, feelings of hopelessness and shame, and familial tension and conflict (Jones & Barber, 2012, p. 18).

A meta-analysis of 104 empirical studies revealed that the negative impact of unemployment is buffered by the availability of coping resources (e.g., family and financial support) and, conversely, made worse by work-role centrality (e.g., the belief that work is central to one’s life and satisfaction) (McKee-Ryan, Song, Wanberg, & Kinicki, 2005).

Except in a few totalitarian states where occupational access is rigidly controlled by the ruling elite, individuals usually have some degree of latitude in finding their own way to a vocation. They also possess some capacity to change occupations in their lifetimes. Although the widely cited assertion that the average individual will switch careers seven times has no factual basis, career change likely is more common now than in years past (Bialik, 2010). Also, initial career choice for the young adult remains a vexing issue for many, especially with the continual emergence of substantially new vocations. The advent of new vocations is driven by technological innovations and the aging of the population. A few examples of new careers include cloud computing expert, market research data analyst, and corporate listening officer (Forbes magazine, May 5, 2011).

The need for flexibility in career development originates, in part, from the globalization of the world economies, spelled out in the provocative best seller, The World Is Flat (Friedman, 2009). Information technology is now instantly available to everyone, linking knowledge centers into a single worldwide network, creating a more level economic playing field, and requiring corporations to restructure as new opportunities emerge. One concrete example of the new, flat world: For the previous edition of this textbook, the editorial production and composition services were completed by the skilled and efficient employees of a dynamic company located in India. After a few phone calls and email exchanges of PDF files with the author, the text was ready for printing in the United States in a matter of weeks.

In summary, psychologists who provide career guidance will need new approaches to assessment that are sensitive to the need for transition planning in a rapidly changing and increasingly competitive global economy. But practitioners need to avoid the “Test and Tell” trap:

Clients often come to career counseling assuming an expert will administer some test that tells the client “the answer” as to what occupation is “the right one.” The client’s expectation for “test and tell” sets the stage for the client and the counselor to depend on a limited, structured approach (Andersen & Vandehey, 2011, p. 10).

The problem with this method is that the counselor will fail to discern the unique needs of the client in a developmentally sensitive context. Guidance will be far more effective if the practitioner slows the process down and provides the opportunity for mutual exploration. In other words, career guidance is a tactic of assessment in the broader sense, not a limited method of testing in the narrow sense.

Assessment for career development requires knowledge of theories of career development, sensitivity to issues of diversity, and understanding of information resources. Thus, before turning to a survey of suitable instruments, we begin with a brief review of prominent career development theories. We start with a simple but provocative question pursued by Blustein (2006), “What is work for?”

11.2.1: Career Development and the Functions of Work

For some people, gainful employment provides more than just a means to pay for food and housing. Psychologists who provide assessment for career development need to keep in mind the multiple functions of work, reviewed here. Yet, it is also true that many people, perhaps the majority, do not have access to the educational and employment opportunities that would allow them to develop a work vision or to realize a career dream.

Since recorded time, humanity has been plagued by various forms of structural barriers based on race, culture, immigration status, religion, gender, age, sexual orientation, and social class that have had a differential impact on individuals. Our belief is that counselors need to be fully cognizant of how these barriers affect clients so that they are able to provide maximally effective interventions that do not inadvertently blame the victims of social oppression (Blustein, Kenna, Gill, & DeVoy, 2008, p. 297).

It bears repeating that discrimination continues to obstruct career potential for minorities. A subtle racism on the part of employers and agencies often is the source. Many studies could be cited to buttress this point as a global issue. For reasons of space, we offer just two examples. A recent study from Great Britain confirms that ethnic minorities experience an “ethnic penalty” with higher unemployment rates, greater concentrations in dead-end assembly line jobs, and lower earnings than Whites, even for the same job (Bell & Casebourne, 2008). Immigrants to Great Britain likewise face career barriers. When able to find work, it is typically in just a few industries such as catering, language translation, shop work, and clerical jobs. Professional employment was notably lacking, despite previous experience (Bloch, 2002).
Unfortunately, most theories of career development do not acknowledge the profound challenges faced by low income individuals, minorities, and immigrants. The psychology-of-working viewpoint provided by Blustein and his collaborators is an exception. These researchers provide a meta-theoretical perspective that can be used alongside traditional models of career development. We begin with a summary of their model.

According to Blustein and colleagues (2008), work can fulfill any or all of three sets of needs:

In addition to discrimination, structural barriers often prevent career development among minorities. For example, African Americans may lack relevant social networks, lack public transit for employment, and lack savings needed to relocate for available work (Weller & Fields, 2011). Further, unemployment is itself a serious structural barrier. In 2011, unemployment among African Americans was about 16 percent, double that of Whites. These data do not include those who have quit looking for work, or who are chronically underemployed. Being out of work tends to become a vicious, self-perpetuating cycle, with the unemployed individual losing work skills with each passing month, further reducing employment prospects.

11.2.2: Origins of Career Development Theories

Implicitly or explicitly, practitioners make use of a theoretical framework in their practice of assessment in career counseling. Thus, we provide a short review of essential viewpoints here. We begin with an historical note, acknowledging the seminal contributions of Frank Parsons, considered by many the founder of the field of career guidance. In 1909, he published Choosing a Vocation, a practical manual for providing career direction to young men and women. Parsons (1909) advocated making a career choice based on matching personal traits with job factors:

In the wise choice of a vocation there are three broad factors: (1) a clear understanding of yourself, your aptitudes, abilities, interests, ambitions, resources, limitations, and their causes; (2) a knowledge of the requirements and conditions of success, advantages and disadvantages, compensation, opportunities, and prospects in different lines of work; (3) true reasoning on the relations of these two groups of facts (p. 5).

Parsons provided a 116-item questionnaire to survey the accomplishments, interests, and aptitudes of the client. This was followed by a lengthy, penetrating interview designed to illuminate aspects of social presentation and personal character (e.g., “Do you smile naturally and easily?” “Is your handshake warm and cordial?” “Are you careful about voice modulation?” “Are you honest, truthful, and candid?” “Are you industrious, hard-working, and persistent?” “Do you welcome people of different creed or political faith?”). His manual also provided an extensive analysis of the qualities needed for success in dozens of vocations. Consultation with each client continued over a span of several weeks. The task of the counselor was to match the traits of the client with the requirements of specific lines of work. Effectively, this was an early, rudimentary form of the method advocated by John Holland and others, known as person–environment fit.

11.2.3: Theory of Person–Environment Fit

Over 50 years ago, John Holland (1959) established the framework for a sophisticated theory of vocational choice that has engendered more research than any other approach in the field. From the beginning, he also constructed and validated assessment tools that embodied the practical application of his model, known as Person–Environment Fit. He proposed that personality traits/interests tend to cluster into a small number of vocationally relevant patterns, called types. For each personality type, there is also a corresponding work environment best suited to that type. According to Holland, there are six types: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. Each type corresponds to both a set of personality traits/interests and also to a set of environmental work demands. Table 11.6 depicts this approach, sometimes known as the RIASEC model, in reference to the first letters of the six types. The types are idealizations that few people (or work environments) fit completely. The RIASEC personality patterns and corresponding work environments are found in Table 11.6.
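
Because few people fit a single type completely, a profile across all six types is usually more informative than one label. The following is a minimal sketch, assuming hypothetical scores on each RIASEC type. The numbers, the tie-breaking rule, and the holland_code helper are illustrative assumptions and are not taken from any published instrument. It simply ranks the six scores and reports the letters of the three strongest types in descending order.

```python
# Illustrative sketch only: the scores are invented, and holland_code() is a
# hypothetical helper, not the scoring procedure of any published inventory.

RIASEC_ORDER = "RIASEC"  # Realistic, Investigative, Artistic, Social, Enterprising, Conventional


def holland_code(scores, length=3):
    """Return the letters of the highest-scoring types, strongest first.

    Ties are broken by RIASEC order; that rule is an assumption made here
    so the result is deterministic.
    """
    missing = set(RIASEC_ORDER) - set(scores)
    if missing:
        raise ValueError(f"missing scores for types: {sorted(missing)}")
    ranked = sorted(
        RIASEC_ORDER,
        key=lambda t: (-scores[t], RIASEC_ORDER.index(t)),
    )
    return "".join(ranked[:length])


example_scores = {"R": 10, "I": 40, "A": 25, "S": 32, "E": 12, "C": 8}
print(holland_code(example_scores))  # prints ISA
```

A real inventory would derive the six scores from validated item responses; the ranking step shown here is only the final, mechanical part of summarizing a profile.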

Table 11.6 RIASEC Personality Types

Source: Based on Holland, J. L. (1985). Vocational Preference Inventory (VPI) manual, 1985 edition. Odessa, FL: Psychological Assessment Resources.

Regarding the six personality types, it is rare that an individual is a “pure” representation of only one type. Instead, most individuals reveal a preferred type, but display some resemblance to a secondary and a tertiary type as well. For example, someone who was very strong on the Investigative dimension (likes to analyze) might reveal a secondary emphasis for the Social aspect (enjoys helping others), and a lesser emphasis on the Artistic type (reveals a creative element). Using the first letters of these three types in descending order of emphasis, we arrive at the Holland code for the individual, namely, ISA. We will say more about Holland codes when we discuss assessment tools such as the Self-Directed Search developed for this purpose. For now it will suffice to know that excellent tools exist for the empirically validated assessment of the six types.

Consistency and differentiation are two concepts important in the Holland approach. Referring to the hexagonal model depicted in Figure 11.3, adjacent personality types bear greater similarity to one another than types that are separated on the figure. For example, the Realistic and Conventional types (side by side) are somewhat similar, whereas the Realistic and Social types (across the hexagon) are quite different or inconsistent. Thus, a client whose Holland code was RCE (adjacent codes) would be considered more consistent than a client whose code was REA (separated codes). This is relevant to assessment and career guidance because work environments tend to possess consistency in regard to types. It is easier for clients to find person–environment fit when they possess consistency, too.

Figure 11.3 Holland’s Hexagonal Model of Personality Types and Occupational Themes

Differentiation refers to the relative strength of the first, second, and third personality types of the Holland code. A client with strong differentiation will reveal a marked preference for his or her first category, and less interest in the second and third categories. A client with weak differentiation might demonstrate scores that are nearly tied on the top three categories of the Holland code. This could indicate a difficulty committing to one kind of work environment. Most work environments require some degree of differentiation. Hence, the undifferentiated client may struggle to find a satisfying work environment.

Holland’s theoretical approach has been so influential that nearly every assessment tool in the field of career guidance makes reference to his six personality types. But the simple elegance of this approach is also a potential weakness. The

assessment tools that embody the Holland model typically list suitable occupations and rule out nonmatching environments. Counselors and clients can foreclose on further exploration. It is easy to fall into the “test and tell” trap.

11.2.4: Theory of Person–Environment Correspondence

The theory of Person–Environment Correspondence (PEC) evolved from the Theory of Work Adjustment (TWA). First envisioned in the 1950s, TWA arose as a basis for conducting research on the work adjustment of vocational rehabilitation clients. Soon it became clear that TWA applied to situations other than rehabilitation, and that the approach was a specific case of a more general method, which came to be known as Person–Environment Correspondence or PEC (Dawis, 2002).

PEC bears modest similarity to the person–environment approach advocated by Holland and colleagues. The central point of similarity is that, in determining suitable careers, both theories compare the attributes of individuals with the qualities needed in occupations (Dawis, 1996; Dawis & Lofquist, 1991). One difference is that PEC places greater emphasis on individual abilities and their match to the ability patterns required by specific occupations. Ability is different from skill level, which can be acquired with preparation. Ability refers to aptitude, indicating the level of mastery an individual can achieve with suitable training and experience. Another difference is that PEC places greater weight on individual values and their correspondence to the value fulfillments provided by specific occupations (Dawis, 2002; Eggerth, 2008).

PEC theory identifies six crucial values that need to be considered in assessment and counseling for career development. These values are as follows:

1. Achievement: the importance of using one’s abilities and having a feeling of accomplishment
2. Altruism: the importance of harmony with, and being of service to, others
3. Autonomy: the importance of being independent and having a sense of control
4. Comfort: the importance of feeling comfortable and not being stressed
5. Safety: the importance of stability, order, and predictability
6. Status: the importance of recognition and being in a dominant position (Dawis, 2002, p. 446).

This list is not comprehensive and it is likely that additional values will emerge with further research. Of course, correspondence between personal values held by the client and the potential for their fulfillment in an occupation is central to work satisfaction and productivity.

PEC theory is rich in complexity because it has evolved over more than five decades; we can only provide a few highlights. The central principle is that the more closely the rewards of the job or the organization correspond to the core values of the individual, the more likely it is that he or she will find satisfaction with the position. But PEC also invokes cognitive, personality, and environmental styles in its understanding of work adjustment. For example, environmental styles include celerity, pace, rhythm, and endurance required to complete the job, which are each assessed on a continuum (Dawis, 1996):

Highlights of the Person–Environment Correspondence Theory (PEC)

Andersen and Vandehey (2012) provide a useful illustration of how these environmental styles play out for specific occupations:

Two examples demonstrating differing styles are an emergency room and a gemsmith. An emergency room requires cyclical, intense work periods as well as down times. Medical personnel need high celerity (be fast) with a high level of effort (pace). Also, some surgeries could last up to 16 hours, requiring high endurance. By contrast, a gemsmith is ill advised to be fast when cutting gems, and the celerity requirements are low. In addition, several outstanding gems may be worth more money than many poorly cut stones (low pace). The work environment has a steady rhythm and probably requires varying amounts of endurance, depending upon the stone size and complexity of the cuts (p. 47).

Of course, these four dimensions also manifest as measurable personality styles. In the world of career counseling, a mismatch between these two broad factors (environmental style required by a job, personality style preferred by the client) often is a precipitating referral issue.

Dawis and colleagues offer 17 testable propositions derived from PEC and provide a wealth of supporting research (Dawis, 2002; Dawis & Lofquist, 1984). For example, one proposition is:

Proposition III: P’s satisfaction is a function of the correspondence of E’s reinforcers to P’s values, provided that P’s abilities correspond to E’s ability requirements.

Put simply, a person’s satisfaction with a job is a function of the match of the available environmental reinforcers with the values of the individual, provided that his or her abilities correspond to those required by the position. This is an empirically testable hypothesis that has stood up well in research studies (Dawis, 2002).

11.2.5: Stage Theories of Career Development

Beginning in the 1950s, Donald Super and colleagues developed an influential stage theory in the field of career guidance and development (Super, 1953, 1994). His approach departs from the trait-factor method preferred by many in the field, and embraces a more flexible, holistic, life span perspective. The essentials of the theory were stated with elegant simplicity in his first and most widely cited article, “A theory of vocational development” (Super, 1953). Later papers provided additional details to the original framework (Super, Savickas, & Super, 1996).

Super acknowledged the obvious fact that people differ in their abilities, interests, and personalities, but also believed that most people were qualified for several occupations, not just a few positions. Individuals and occupations were each flexible enough to “allow both some variety of occupations for each individual and some variety of individuals in each occupation” (Super, 1953, p. 189).

He argued that the individual self-concept evolves with time and experience, so that vocational choice and adjustment are continuous and lifelong processes. He envisioned five occupational life stages: growth, exploration, establishment, maintenance, and decline. These stages are sometimes known as a career ladder (Super et al., 1996).

The growth stage extends into the teenage years and involves the observation of adult behavior and the exploration of fantasies and interests. The exploration stage was subdivided into fantasy, tentative, and realistic phases, as the young adult tries out one or more lines of training or education toward an eventual career. The establishment stage begins around age 25 or 30, and was subdivided into the trial and stabilization phases. Vocational development tasks encountered in this stage include the assimilation of organizational climate, the consolidation of positive relationships with coworkers, and the advancement of career responsibilities through promotion (Super, 1990).

In the maintenance stage of middle age, the individual may need to innovate, update skills, or face career stagnation. Additionally, some persons ask: “Should I remain in this career?” If the answer is “No” then the individual would reenter the exploration and establishment stages before attaining the maintenance stage. The last stage, decline, is hypothesized to occur in old age and may require possible specialization, disengagement, or retirement.

The stage development theory proposed by Super provides a useful reminder that career development does not end in young adulthood but extends throughout the life span. However, the theory was based on career development as found in the dominant culture of his time, which was mainly white and often middle class or higher. In a changing global economy, some of the developmental stages no longer seem as relevant. In particular, the maintenance phase is difficult for many to sustain because of the need for frequent career transitions (Friedman, 2009). Super died in 1994. Toward the end of his career, he acknowledged new realities:

Work and occupation provide a focus for personality organization for most men and women, although for some individuals this focus is peripheral, incidental, or even nonexistent. Then other foci such as leisure activities and homemaking, may be central. Social traditions such as sex-role stereotyping and modeling, racial and ethnic biases, and the opportunity structure, as well as individual differences are important determinants of preferences for such roles as worker, student, leisurite, homemaker, and citizen (Super et al., 1996, p. 126).

The brief mention of “opportunity structure” is important to underscore, in light of the Great Recession experienced worldwide in the early part of the twenty-first century.

11.2.6: Social Cognitive Approaches

Social cognitive approaches to career development acknowledge that people learn and develop attitudes about work within a social context through observation and modeling of behavior. Prominent exemplars of this approach include Gottfredson (2005), Lent, Brown, and Hackett (2000), and Krumboltz (2009). In our coverage here, we summarize the recent views of John Krumboltz because of their direct relevance to matters of assessment. Krumboltz (2009) calls his approach the Happenstance Learning Theory (HLT). In brief:

HLT posits that human behavior is the product of countless numbers of learning experiences made available by both planned and unplanned situations in which individuals find themselves. The learning outcomes include skills, interests, knowledge, beliefs, preferences, sensitivities, emotions, and future actions (p. 135).

The theory is practical and compassionate in style, attempting to explain how and why each person follows a unique path, and describing how counselors can facilitate development. In regard to the how and why of behavior, Krumboltz surveys genetic influences, learning experiences, environmental conditions, parents and caretaker influences, peer groups, and structured educational settings. He concludes by noting that “Social justice is not equally distributed among humans on our planet.” He

argues powerfully that practitioners have a responsibility to help overcome social injustice. The proper uses of assessment might be a small part of the solution.

HLT is based on four premises (Krumboltz, 2009):

1. The goal of career counseling is to help clients learn to take actions to achieve more satisfying career and personal lives, not to make a single career decision.

Krumboltz notes that the future is uncertain for everyone, especially in the world of work, where new careers emerge and old ones die out. In his view, making a single career decision is potentially foolhardy. A more tentative, exploratory approach is to be preferred.

2. Assessments are used to stimulate learning, not to match personal characteristics with occupational characteristics.

For example, in regard to interest assessment, Krumboltz contends that the goal is to help clients find attractive activities to explore now. In regard to happenstance, it is his experience that helping clients commit to new actions often will open up unexpected opportunities. A similar argument holds for personality assessment, which can be used to stimulate discussion about alternative settings for the client, and to identify areas of needed change (e.g., assertiveness training for an introverted client). It may also prove helpful to identify dysfunctional career beliefs by using the Career Beliefs Inventory (Krumboltz & Vosvick, 1996), which is discussed later in this topic.

Krumboltz (1993, 1996) has been critical of many interest inventories because most clients have little or no experience with the topics being assessed. Instead of marking items as like, dislike, or indifferent, he playfully suggests that the response options should be “I don’t know yet,” “I haven’t tried that yet,” or “I’d like to learn more about that before I answer” (Krumboltz, 1996, p. 57). He also finds fault with these instruments because they focus excessively on cognitive matching of client to work environments, and overlook the emotional problems, including dysfunctional career beliefs, that hamper career development.

3. Clients learn to engage in exploratory actions as a way of generating beneficial unplanned events, not to plan all their actions in advance.

The statement that “chance favors only the prepared mind” is attributed to Louis Pasteur (1822–1895), the French biologist and chemist. But the statement can be applied to career development as well. Krumboltz asserts that the goal of the counselor is to help clients engage in activities that are likely to generate unplanned events, and to prepare clients to benefit from these happenstance occurrences. An example might be encouraging an unemployed client to join a health club as a means of exploring her interests in yoga. At the club she befriends a bank manager who is impressed by her winsome personality, which leads to a job interview and a new career endeavor.

4. The success of counseling is assessed by what the client accomplishes in the real world outside the counseling session, not by what takes place during counseling.

HLT is an action-based theory. The task of the counselor is to collaboratively identify things that the client can do outside of the consultation that will promote new learning and new opportunities. A simple example is asking the client to commit to one action step between appointments (e.g., ask three people how they came to be working in their current job) and to report back by email how things went.

11.2.7: O*NET in Career Development

The Occupational Information Network or O*NET is the primary source of occupational information in the United States. O*NET is sponsored by the U.S. Department of Labor and is free and open to anyone in the world who has an Internet connection. This is a rich and sophisticated database that includes detailed information on nearly 1,000 specific occupations. For each occupation, the website lists the knowledge, skills, and abilities needed. Personality qualities needed, education required, technology needs, and typical salary also are given.

The website provides several assessment tools for career exploration, including a number of instruments that can be self-administered. For example, the O*NET Interest Profiler is an online test consisting of 60 occupational activities that are rated on a five-point scale from strongly dislike to strongly like. The test not only yields a score for each of the six RIASEC dimensions, but also links to a user-friendly list of specific occupations suited to the preparation level selected by the examinee. Further, these occupations are individually rated for employment outlook, environmental or “green” appeal, and apprenticeship needed.

11.2.8: Inventories for Career Assessment

One guiding motif in this topic is that successful assessment for career guidance requires ongoing interaction with clients. Career counseling extends well beyond mere testing. Avoiding the “test and tell” trap is vital. Even so, the use of appropriate assessment tools can be helpful, sometimes even essential. The number of instruments available for career assessment is huge, and new tools emerge every year. We survey a number of widely used tests here, to provide a sense of the diversity available. We begin with a specialized tool designed to challenge maladaptive career beliefs.

Career Beliefs Inventory

Krumboltz (1991) created the Career Beliefs Inventory to identify and measure attitudes and beliefs that might block career development. In his work with clients, he often noted that people firmly hold to self-limiting beliefs that prevent them from finding a satisfying job or career.

The Career Beliefs Inventory (CBI) was designed to increase the awareness of clients to underlying career beliefs and to gauge the potential influence of these beliefs on occupational choice and life satisfaction.

The CBI can be taken individually or administered in a group setting to persons in grade 8 or higher. The paper-and-pencil test can be hand-scored, but computer-scoring is preferable because it yields an elegant 12-page report. Hand scoring is also confusing and likely to introduce errors.

The 96 test items, all in Likert format, are grouped into 25 scales organized under the following five headings:

1. Your Current Career Situation. Four scales: Employment Status, Career Plans, Acceptance of Uncertainty, and Openness.
2. What Seems Necessary for Your Happiness. Five scales: Achievement, College Education, Intrinsic Satisfaction, Peer Equality, and Structured Work Environment.
3. Factors that Influence Your Decisions. Six scales: Control, Responsibility, Approval of Others, Self-other Comparisons, Occupation/College Variation, and Career Path Flexibility.
4. Changes You Are Willing to Make. Three scales: Post-training Transition, Job Experimentation, and Relocation.
5. Effort You Are Willing to Initiate. Seven scales: Improving Self, Persisting While Uncertain, Taking Risks, Learning Job Skills, Negotiating/Searching, Overcoming Obstacles, and Working Hard.

Standardization of the CBI is based on more than 7,500 individuals in the United States and Australia. The sample was reasonably diverse, with an age range of 12 to 75, including junior high, high school, and college students, as well as adults, both employed and unemployed. Initial test–retest reliability data for the CBI are mixed, with one-month reliabilities ranging from the .30s to the .70s for the high school sample. Internal consistencies were likewise modest, with coefficients mainly in the range of .40 to .50. This might be due to the small number of items for some scales, as few as two items for several scales. Fuqua and Newman (1994) suggest that the CBI could be improved if additional items were added to some of the scales.

Walsh (1996) supplemented the original standardization sample for the CBI with nearly 600 additional participants. She reported more promising results, with internal consistencies ranging from the low .30s to the high .80s, with a mean coefficient alpha of .57 for the CBI scale scores. Regarding validity, results of factor analyses did find reproducible clusters of beliefs, but these did not correspond to the scale clusters provided in the CBI reports. She suggests that the practical application of the CBI might rest with exploring client beliefs at the level of the individual items (Walsh, Thompson, & Kapes, 1996).

In a study of convergent validity correlating CBI results with data from four other personality and vocational inventories, Holland, Johnston, Asama, and Polys (1993) reported at least moderate construct validity for most of the CBI scales. They concluded that the test seems to be measuring variance in career variables not assessed by other instruments. In addition, significant correlation of some CBI scales with the State-Trait Anxiety Inventory indicated that certain self-limiting and irrational beliefs caused emotional discomfort.

11.2.9: Inventories for Interest Assessment

In most applications of psychological testing, the goals of assessment are reasonably clear. For example, intelligence testing helps predict school performance; aptitude testing foretells potential for accomplishment; and personality testing provides information about social and emotional functioning. But what is the purpose of interest assessment? Why would a psychologist recommend it? What can a client expect to gain from a survey of his or her interests?

Interest assessment promotes two compatible goals: life satisfaction and vocational productivity. It is nearly self-evident that a good fit between individual interests and chosen vocation will help foster personal life satisfaction. After all, when work is interesting we are more likely to experience personal fulfillment as well. In addition, persons who are satisfied with their work are more likely to be productive. Thus, employees and employers both stand to gain from the artful application of interest assessment. Several useful instruments exist for this purpose, and we will review the most widely used interest inventories later.

In the selection of employees, the consideration of personal interests may be of great practical significance to employers and, therefore, circumstantially relevant to the job candidates as well. We may sketch out a rough equation as follows: productivity = ability × interest. In other words, high ability in a specific field does not guarantee success; neither does high interest level. The best predictions are possible when both variables are considered together. Thus, employers have good reason to determine whether a potential employee is well matched to the position; the employee should like to know as well.

Working from the Holland RIASEC model described earlier, Nye, Su, Rounds, and Drasgow (2012) recently completed an intriguing quantitative summary of 60 years of research on the relationship between vocational interests, person–environment fit, and job performance. Their review was based on 568 correlations from published empirical studies. The basic premise of their survey was that:

Holland’s theory suggests that the similarities between an individual’s interest profile and the profile of his or her occupation should predict tenure and performance in academic and work domains (p. 387).

This is exactly what their analyses revealed. For the employment studies reviewed, the correlations between “fit” (congruence between an individual’s Holland code and the code of his/her chosen occupation) and job performance ranged from .21 to .30, depending on the inventory used and the characteristics of the study. The same pattern emerged in the academic samples. The correlations between “fit” (congruence between a student’s Holland code and the code of his/her chosen major) and grades were mainly in the range of .27 to .31. In other words, when employees or students possess interest patterns that match the expectations of their job or major, they are more likely to be productive in their work or studies.

We turn now to a critical examination of major interest tests. The four instruments chosen for review include:

• The Strong Interest Inventory-Revised (SII-R), the latest revision of the well-known Strong Vocational Interest Blank (SVIB)
• The Vocational Preference Inventory (VPI), a useful inventory that embodies the RIASEC model of John Holland
• The Self-Directed Search (SDS), a self-administered and self-scored guide to exploring career options
• The Campbell Interest and Skill Survey (CISS), an appealing test that is simple in format but sophisticated in execution

Strong Interest Inventory-Revised (SII-R)

The Strong Interest Inventory-Revised (SII-R) is the latest revision of the Strong Vocational Interest Blank (SVIB), one of the oldest and most prominent instruments in psychological testing (Donnay, Thompson, Morris, & Schaubhut, 2004). We can best understand the SII-R by studying the history of its esteemed predecessor, the SVIB. In particular, we need to review the guiding assumptions used in the construction of the SVIB that have been carried over into the SII-R.

The first edition of the SVIB appeared in 1927, eight years after E. K. Strong formulated the essential procedures for measuring occupational interests while attending a seminar at the Carnegie Institute of Technology (Campbell, 1971; Strong, 1927). In constructing the SVIB, Strong employed two little-used techniques in measurement. First, the examinee was asked to express liking or disliking for a large and varied sample of occupations, educational disciplines, personality types, and recreational activities. Second, the responses were empirically keyed for specific occupations. In an empirical key, a specific response (e.g., liking to roller skate) is assigned to the scale for a particular occupation only if successful persons in that occupation tend to answer in that manner more often than comparison subjects.

Although Strong did not express his underlying assumptions in a simple and straightforward manner, it is clear that the theoretical foundation for the SVIB derives from a typological, trait-oriented conception of personality. Tzeng (1987) has identified the following basic assumptions in the development and application of the SVIB:

1. Each occupation has a desirable pattern of interests and personality characteristics among its workers. The ideal pattern is represented by successful people in that occupation.
2. Each individual has relatively stable interests and personality traits. When such interests and traits match the desirable interest patterns of the occupation the individual has a high probability to enter that occupation and be more likely to succeed in it.
3. It is highly possible to differentiate individuals in a given occupation from others-in-general in terms of the desirable patterns of interests and traits for that occupation.

Strong constructed the scales of his inventory by contrasting the responses of several specific occupational criterion groups with those of a people-in-general group. The subjects for each criterion group were workers in that occupation who were satisfied with their jobs and who had been so employed for at least three years. The items that differentiated the two groups, keyed in the appropriate direction, were selected for each occupational scale. For example, if members of a specific occupational group disliked “buying merchandise for a store” more often than people-in-general, then that item (keyed in the dislike direction) was added to the scale for that occupation.

The first SVIB consisted of 420 items and a mere handful of occupational scales (Strong, 1927). Separate editions

for men and women followed shortly. The inventory has undergone numerous revisions over the years (Tzeng, 1987), culminating in the modern instrument known as the Strong Interest Inventory-Revised (Campbell, 1974; Hansen, 1992; Hansen & Campbell, 1985; Donnay et al., 2004). Although the Strong Interest Inventory-Revised (SII-R) was fashioned according to the same philosophy as the SVIB, the latest revision departs from its predecessors in a number of ways.

The SII-R consists of 291 items answered in a 5-point Likert format, with options of Strongly Like, Like, Indifferent, Dislike, Strongly Dislike. The standardization sample (N = 2,250) consists of an equal number of employed men and women from the U.S. workforce. The sample is restricted to employed persons because the main purpose of the test is to determine interest patterns within occupational groups. Racial and ethnic groups accurately represent the U.S. population and constitute 30 percent of the sample.

Test results are organized in six sections. At the most global level are the six General Occupational Theme scores, namely, Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. These scores are based on the theoretical analysis of Holland (1966, 1985), whose work was discussed earlier. Each theme score pertains to a major interest area that describes both a work environment and a type of person. For example, persons scoring high on the Realistic theme are generally quite robust, have difficulty expressing their feelings, and prefer to work outdoors with heavy machinery.

The 30 Basic Interest Scales are found within the general theme scores. These identify specific interest domains, indicating areas likely to be stimulating and rewarding to the client. Examples of these scales include Counseling and Helping, Visual Arts and Design, Marketing and Advertising, Finance and Investing, Medical Science, and Mechanics and Construction. The interest scales are empirically derived and consist of substantially intercorrelated items.

The most detailed results consist of 130 Occupational Scales, with separate normative data for each gender. Scores on these scales indicate how similar the client’s responses are to those of people of the client’s gender who have been working in, and are satisfied with, the listed occupation. Each scale produced at least a one standard deviation separation between the occupational sample and the reference sample, supporting the distinctiveness of specific career paths (Donnay et al., 2004).

The SII-R also yields five Personal Style Scales. These are designed to measure preferences for broad styles of living and working. These scales assist in vocational guidance by showing the level of comfort with distinctive styles.

The personal style scales each have a mean of 50 and a standard deviation of 10. Note that these are truly bipolar scales for which each pole is distinct and meaningful.

The SII-R can be scored only by prepaid answer sheets or booklets that are mailed or faxed to the publisher, or through purchase of a software system that provides on-site scoring for immediate results. The results consist of a lengthy printout that is organized according to several themes. All scores are expressed as standard scores with a mean of 50 and an SD of 10.

Evaluation of the SII-R
The SII-R represents the culmination of over 70 years of study, involving literally thousands of research reports and hundreds of thousands of respondents. In evaluating this instrument, we can only outline basic trends in the research, referring the reader to other sources for details (Bailey, Larson, Borgen, & Gasser, 2008; Savickas, Taber, & Spokane, 2002; Hansen, 1992; Hansen & Campbell, 1985). We should also point out that evaluations of the reliability and validity of the SII-R are based in part upon its similarity to the SII and SVIB, for which a huge amount of technical data exists.

Based upon test–retest studies, the reliability of the Strong has proved to be exceptionally good in the short run, with one- and two-week stability coefficients for the occupational scales generally in the .90s. When the test–retest interval is years or decades, the correlations drop to the .60s and .70s for the occupational scales, except for respondents who were older (over age 25) upon first testing. For younger respondents first tested as adolescents,

the median test–retest correlation after 15 years is around .50 (Lubinski, Benbow, & Ryan, 1995). But for older respondents, first tested after the age of 25, the median test–retest correlation 10 to 20 years later is a phenomenal .80 (Campbell, 1971). Apparently, by the time we pass through young adulthood, personal interests become extremely stable. The questions on the SII-SVIB capture that stability in the occupational scores, providing support for the trait conception of personality upon which these instruments were based.

The validity of the Strong is premised largely on the ability of the initial occupational profile to predict the occupation eventually pursued. Strong (1955) reported that the chances were about two in three that people would be in occupations predicted by high occupational scale scores, and about one in five that respondents would be in occupations for which they had shown little interest when tested. Although other researchers have quibbled with the exact proportions (Dolliver, Irvin, & Bigley, 1972), it is clear that the SII-SVIB has impressive hit rates in predicting occupational entry. The instrument functions even better in predicting the occupations that an examinee will not enter. In a recent study, Donnay and Borgen (1996) provide evidence for construct validity by demonstrating strong overall differentiation between 50 occupational groups on the SII:

    The big picture is that people in diverse occupations show large and predictable differences in likes and dislikes, whether in terms of vocational interests or in terms of personal styles. And the Strong provides valid, structural, and comprehensive measures of these differences. (p. 290)

The SII-R is used mainly with high school and college students and adults seeking vocational guidance or advice on continued education. Because most students’ interests are undeveloped and unstabilized prior to age 13 or 14, the test is not recommended for use below high-school level. As evident in the reliability data reported, the SII-R becomes increasingly valuable with older subjects, and it is not unusual to see middle-aged persons use the results of this instrument for guidance in career change.

Vocational Preference Inventory
The Vocational Preference Inventory is an objective, paper-and-pencil personality interest inventory used in vocational and career assessment (Holland, 1985c). The VPI measures 11 dimensions, including the six personality–environment themes of Realistic, Investigative, Artistic, Social, Enterprising, and Conventional, and five additional dimensions of Self-Control, Masculinity/Femininity, Status, Infrequency, and Acquiescence. The test items consist of 160 occupational titles toward which the examinee expresses a feeling by marking y (yes) or n (no). The VPI is a brief test (15 to 30 minutes) and is intended for persons 14 years and older with normal intelligence.

As noted previously, Holland proposes that personality traits tend to cluster into a small number of vocationally relevant patterns, called types. For each personality type there is also a corresponding work environment best suited to that type. According to Holland, there are six types: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. This is sometimes known as the RIASEC model, in reference to the first letters of the six types.

Test–retest reliability coefficients for the six major scales range from .89 to .97. VPI norms are based upon large convenience samples of college students and employed adults from earlier VPI editions. The characteristics of the standardization sample are not well defined, which makes the norms somewhat difficult to interpret (Rounds, 1985).

The validity of the VPI is essentially tied to the validity of Holland’s (1985a) hexagonal model of vocational interests. Literally hundreds of studies have examined this model from different perspectives. We will cite trends and representative studies. The reader is referred to Holland (1985c) and Walsh and Holland (1992) for more details.

Several VPI studies have investigated a key assumption of Holland’s theory: that individuals tend to move toward environments that are congruent with their personality types. If this assumption is correct, then the real-world match between work environments and personality types of employees should be substantial. We should expect to find that Realistic environments have mainly Realistic employees, Social environments have mainly Social employees, and so on. Research on this topic has followed a straightforward methodology: Subjects are tested with the VPI and classified by their Holland types (using up to six letters); the work environments of the subjects are then independently classified by an appropriate environmental measure; finally, the degree of congruence between persons and environments is computed. In better studies, a correction for chance agreement is also applied.

Using his hexagonal model, Holland has developed occupational codes as a basis for classifying work environments (Gottfredson & Holland, 1989; Holland, 1966, 1978, 1985c). For example, landscape architect is coded as RIA (Realistic, Investigative, Artistic) because this occupation is known to be a technical, skilled trade (Realistic component) that requires scientific skills (Investigative component) and also demands artistic aptitude (Artistic component). The Realistic component is listed first because it is the most important for landscape architect, whereas the Investigative and Artistic components are of secondary and tertiary importance, respectively. Some other occupations and their codes are taxi driver (RSE), mathematics teacher (ISC), reporter (ASE), police officer (SRE), real estate appraiser (ECS), and secretary (CSA). In a similar manner, Holland has also worked out codes for different college majors.
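
The congruence methodology described above (classify each person, independently classify each work environment, compute the agreement, then correct for chance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Cohen's kappa as the chance correction, which the text does not name; it compares first-letter (high-point) Holland codes only; and the ten person and environment codes are invented for the example.

```python
from collections import Counter


def congruence_with_chance_correction(person_types, environment_types):
    """Return (observed agreement, chance agreement, Cohen's kappa).

    Each argument is a sequence of one-letter Holland codes, one per subject.
    Cohen's kappa is an assumed choice of chance correction, not one the
    source text specifies.
    """
    if not person_types or len(person_types) != len(environment_types):
        raise ValueError("need two non-empty sequences of equal length")

    n = len(person_types)
    # Proportion of subjects whose type matches their environment's type.
    observed = sum(p == e for p, e in zip(person_types, environment_types)) / n

    # Agreement expected by chance, from the marginal frequency of each code.
    person_counts = Counter(person_types)
    environment_counts = Counter(environment_types)
    expected = sum(
        person_counts[code] * environment_counts[code] for code in person_counts
    ) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1.0")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, expected, kappa


# Hypothetical first-letter codes for ten subjects and their workplaces.
persons = list("RRRSSSIIEC")
environments = list("RRSSSSIAEC")
observed, expected, kappa = congruence_with_chance_correction(persons, environments)
print(f"observed={observed:.2f} chance={expected:.2f} kappa={kappa:.2f}")
```

For these invented data, raw agreement is .80 but the chance-corrected value is about .74, which shows why an uncorrected hit rate can overstate congruence when a few types dominate the sample.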

One approach to congruence studies is to compare VPI results of students or workers with the Holland codes that correspond to their college majors or occupations. For example, VPI Holland codes for a sample of police officers should consist mainly of profiles that begin with S and should contain a larger-than-chance proportion of specifically SRE profiles. Furthermore, the degree of congruence should be related to the degree of expressed satisfaction with that line of work or study.

Research with college students provides strong support for the congruence prediction: Students tend to select and enter college majors that are congruent with their primary personality types (Holland, 1985a; Walsh & Holland, 1992). Thus, Artistic types tend to major in art, Investigative types tend to major in biology, and Enterprising types tend to major in business, to cite just a few examples. These results provide strong support for the VPI and the theory upon which it is based.

This short review has barely touched the surface of supportive validity studies with the VPI. Walsh and Holland (1992) cite several additional lines of research that buttress the validity of this test. But not all studies of the VPI affirm its validity. Furnham, Toop, Lewis, and Fisher (1995) failed to find a relationship between personality–environment (P–E) “fit” and job satisfaction, a key theoretical underpinning of the test. According to Holland’s theory, the better the P–E fit, the greater should be job satisfaction. In three British samples, the relationships were weak or nonexistent, suggesting that the VPI does not “travel well” in cultures outside of the United States.

Self-Directed Search
Holland has always shown a keen interest in the practical applications of his research on vocational development. Consistent with this interest, he developed the Self-Directed Search, a highly practical, brief test that is appealing in its simplicity (Holland, 1985a, b). As the name suggests, the Self-Directed Search is designed to be a self-administered, self-scored, and self-interpreted test of vocational interest. The SDS measures the six RIASEC vocational themes described previously.

The SDS consists of dichotomous items that the examinee marks “like” or “dislike” (or “yes” or “no”) in four sections: (1) Activities (six scales of 11 items each); (2) Competencies (six scales of 11 items each); (3) Occupations (six scales of 14 items each); and (4) Self-Estimates (two sets of six ratings). For each section, the face-valid items are grouped by RIASEC themes. For each theme, the total number of “like” and “yes” answers is combined with the self-estimates of ability to come up with a total theme score. The SDS takes 30 to 50 minutes for completion and is intended for persons 15 years and older.

The RIASEC themes on the SDS showed test–retest reliabilities that range from .56 to .95 and internal consistencies that range from .70 to .93. Norms for SDS scales and codes are reported for pooled convenience samples of 4,675 high school students, 3,355 college students, and 4,250 employed adults ages 16 through 24 (Holland, 1985a, b). However, SDS results are typically interpreted in an individualized, ipsative manner (“Is this occupation a good fit for this client?”), so normative data are of limited relevance.

The SDS is available in a hand-scored paper-and-pencil version and in a computerized version. Unfortunately, the paper-and-pencil version is prone to a 16 percent clerical error rate when used by high school students (Holland, 1985a, b). The user-friendly microcomputer test is probably the preferred version because of the ease of administration and the error-free scoring and interpretation.

When a subject takes the SDS, the three highest theme scores are used to denote a summary code. For example, a person whose three highest scores were on Investigative, Artistic, and Realistic would have a summary code of IAR. In a separate booklet distributed with the test, the Occupations Finder, the examinee can look up his or her summary code and find a list of occupations that provide the best “fit.” For example, an examinee with an IAR summary code would learn that he or she most closely resembles persons in the following occupations: anthropologist, astronomer, chemist, pathologist, and physicist. The test booklet contains additional information, which helps the examinee explore relevant career options.

The SDS serves a very useful purpose in providing a quick and simple format for prompting young persons to examine career alternatives. By eliminating the time-consuming process of administration, scoring, interpretation, and counselor feedback, the test makes it possible for a wide audience to receive an introductory level of career counseling. Holland (1985a, b) proposes that the SDS is appropriate for up to 50 percent of students and adults who might desire career guidance. Presumably, the other 50 percent would find the SDS an insufficient basis for career exploration. Holland (1985a, b) rightfully warns users to consider many sources of information in career choice and not to rely too heavily on test scores per se. Levinson (1990) discusses the integration of SDS data with other psychoeducational data to make specific vocational recommendations for high school students.

LaBarbera (2005) illustrates the potential application of the SDS in a study of 463 physician assistants (PAs) known to be well satisfied with their work. The PAs are medical professionals who provide care under the supervision of a licensed physician. This is a demanding profession with well-defined duties that include many of the same functions provided by a general practitioner. Who is a good candidate for this up-and-coming profession in high demand? LaBarbera (2005) determined that the Holland profile was a distinctive SIR for men, especially those with interests in surgery, whereas the profile for women maintained the first two letters (SI) but yielded a muddle

for the third theme. This is valuable information for prospective students and career counselors.

The validity of the SDS is linked to the validity of the hexagonal model of personality and environments upon which the test is based. One aspect of validity, then, is whether the model makes predictions that are confirmed by SDS results in the real world. In general, the results from over 400 studies support the construct validity of the SDS (Dumenci, 1995; Holland, 1985a, b, 1987).

One approach to construct validity is to determine whether the relationships among SDS scales make theoretical sense. One tenet of construct validity is that similar scales should reveal stronger relationships, dissimilar scales weaker relationships. For example, it is not hard to imagine one person combining Artistic and Investigative themes in personality and work environment. After all, these themes are mildly similar, so we would predict a moderately positive correlation between them. This is exactly what Holland (1985a, b) found. In a general reference sample of 175 women aged 26 to 65 years, scores on these two themes correlated modestly, r = .26, as would be predicted. Further, unrelated themes like Investigative and Enterprising (which bear little in common) should reveal a weak correlation. The value turned out to be a negligible r = −.02. Overall, the various correlations among the six themes of the SDS make theoretical sense, which supports the construct validity of the test.

The predictive validity of the SDS has been investigated in several dozen studies, which are summarized by Holland (1985a, b, 1987). The typical methodology for these studies is that SDS high-point codes for large samples of students are compared with the first letter of their occupational choices (or aspirations) one to three years later. Overall, the findings indicate that the SDS has moderate to high predictive efficiency, depending upon the age of the sample (hit rates go up with age), the length of the time interval (hit rates go down with time), and the specific category predicted (hit rates are better for Investigative and Social predictions) (Gottfredson & Holland, 1975).

Campbell Interest and Skill Survey
The Campbell Interest and Skill Survey (CISS; Campbell, Hyne, & Nilsen, 1992) is a newer measure of self-reported interests and skills. The test is designed to help individuals make better career choices by describing how their interests and skills match the occupational world. The primary target population for the CISS is students and young adults who have not entered the job market, but the test is also suitable for older workers who are considering a change in careers. The test is appropriate for persons 15 years of age and older with a sixth-grade reading level, although younger children can be tested in exceptional circumstances.

The CISS consists of 200 interest items and 120 skill items. The interest items include occupations, school subjects, and varied working activities that the examinee rates on a six-point scale from strongly like to strongly dislike. The interest items resemble the following:

    A pilot, flying commercial aircraft
    A biologist, working in a research lab
    A police detective, solving crimes

The skill items include a list of activities that the examinee rates on a six-point scale from expert (widely recognized as excellent in this area) to none (have no skills in this area). The skill items resemble the following:

    Helping a family resolve its conflicts
    Making furniture, using woodworking and power tools
    Writing a magazine story

CISS results are scored on several different kinds of scales: Orientation Scales, Basic Interest and Skill Scales, Occupational Scales, Special Scales, and Procedural Checks. All scale scores are reported as T scores, normed to a population average of 50, with a standard deviation of 10.

The Orientation Scales serve to organize the CISS profile: the interest, skill, and occupational scales are reported under the appropriate Orientations. The seven Orientations are as follows (Campbell et al., 1992, pp. 2–3):

• Influencing: influencing others through leadership, politics, public speaking, and marketing
• Organizing: organizing the work of others, managing, and monitoring financial performance
• Helping: helping others through teaching, healing, and counseling
• Creating: creating artistic, literary, or musical productions, and designing products or environments
• Analyzing: analyzing data, using mathematics, and carrying out scientific experiments
• Producing: producing products, using “hands-on” skills in farming, construction, and mechanical crafts
• Adventuring: adventuring, competing, and risk taking through athletic, police, and military activities

There are 29 pairs of Basic Scales, each pair consisting of parallel interest and skill scales. The Basic Scales are clustered within the seven Orientations, based upon their intercorrelations. For example, the Helping Orientation contains the following Basic Scales, each with separate interest and skill components: Adult Development, Counseling, Child Development, Religious Activities, and Medical Practice.

The 58 pairs of Occupational Scales, each with separate interest and skill components, provide feedback on the degree of similarity between the examinee and satisfied workers in that occupation. These scales were constructed empirically by contrasting the responses of happily

Figure 11.4 Representative Sections from the Campbell Interest and Skill Survey
NOTE: The full profile consists of an 11-page printout.

Source: From Campbell Interest and Skill Survey (CISS). Copyright © 1997 David Campbell, Ph.D. Reproduced with permission of the publisher NCS Pearson, Inc. All rights reserved. “Campbell” and “CISS” are trademarks, in the US and/or other countries, of Pearson Education, Inc. or its affiliates.

CAMPBELL™ INTEREST AND SKILL SURVEY INDIVIDUAL PROFILE REPORT
SAMPLE REPORT    Date Scored: 07/27/2005

Orientations and Basic Scales
Scores are plotted on a standard-score axis from 30 to 70 (Very Low, Low, Mid-Range, High, Very High).

Orientations and Basic Scales     Interest   Skill   Interest/Skill Pattern
Influencing                          52        48
  Leadership                         55        54     Develop
  Law/Politics                       60        49     Develop
  Public Speaking                    30        47
  Sales                              56        52     Develop
  Advertising/Marketing              48        53
Organizing                           40        38     Avoid
  Supervision                        46        34
  Financial Services                 45        46
  Office Practices                   42        30     Avoid
Helping                              61        57     Pursue
  Adult Development                  60        59     Pursue
  Counseling                         66        60     Pursue
  Child Development                  68        52     Develop
  Religious Activities               36        42     Avoid
  Medical Practice                   63        50     Develop
Creating                             29        37     Avoid
  Art/Design                         34        41     Avoid
  Performing Arts                    32        42     Avoid
  Writing                            39        53
  International Activities           57        54     Develop
  Fashion                            34        37     Avoid
  Culinary Arts                      35        38     Avoid
Analyzing                            59        53     Develop
  Mathematics                        55        54     Develop
  Science                            55        50     Develop
Producing                            56        60     Pursue
  Mechanical Crafts                  57        59     Pursue
  Woodworking                        55        63     Pursue
  Farming/Forestry                   54        57     Explore
  Plants/Gardens                     45        45     Avoid
  Animal Care                        59        58     Pursue
Adventuring                          64        70     Pursue
  Athletics/Physical Fitness         63        68     Pursue
  Military/Law Enforcement           56        66     Pursue
  Risks/Adventure                    70        67     Pursue

Figure 11.4 Continued

CAMPBELL™ INTEREST AND SKILL SURVEY INDIVIDUAL PROFILE REPORT
SAMPLE REPORT    Date Scored: 07/27/2005

Influencing Orientation

Orientation Scale (standard scores* plotted from 30 to 70: Very Low, Low, Mid-Range, High, Very High)

Scale                        Interest   Skill   Interest/Skill Pattern**
Influencing                     52        48

Basic Interest and Skill Scales (standard scores* plotted from 30 to 70)

Scale                        Interest   Skill   Interest/Skill Pattern**
Leadership                      55        54     Develop
Law/Politics                    60        49     Develop
Public Speaking                 30        47
Sales                           56        52     Develop
Advertising/Marketing           48        53

Occupational Scales (standard scores* plotted from 25 to 75)

Occupation                      Orientation Code***   Interest   Skill   Interest/Skill Pattern**
Attorney                        I                        62        52     Develop
Financial Planner               IO                       48        50
Hotel Manager                   IO                       42        43     Avoid
Manufacturer's Representative   IO                       35        53
Marketing Director              IO                       50        49
Realtor                         IO                       58        60     Pursue
CEO/President                   IOA                      42        50
Human Resources Director        IOH                      60        59     Pursue
School Superintendent           IOH                      75        51     Develop
Advertising Account Executive   IC                       42        51
Media Executive                 IC                       38        49
Public Relations Director       IC                       38        46
Corporate Trainer               ICH                      46        59     Explore

The Influencing Orientation focuses on influencing others through leadership, politics, public speaking, sales, and marketing. Influencers like to make things happen. They are often visible because they tend to take charge of activities that interest them. They typically work in organizations where they are responsible for directing activities, setting policies, and motivating people. Influencers are generally confident of their ability to persuade others and they usually enjoy the give-and-take of debating and negotiating. Typical high-scoring individuals include company presidents, corporate managers, school superintendents, sales representatives, and attorneys.

Your Influencing interest and skill scores are both mid-range. People who have this pattern of scores typically report moderate interest and confidence in leading, negotiating, marketing, selling, and public speaking.

Your scores on the Influencing Basic Scales, which provide more detail about your interests and skills in this area, are reported above on the left-hand side of the page. Your scores on the Influencing Occupational Scales, which show how your pattern of interests and skills compares with those of people employed in Influencing occupations, are reported above on the right-hand side of the page. Each occupation has a one-, two-, or three-letter code that indicates its highest Orientation score(s). The more similar the Orientation code is to your highest Orientation scores (which are reported on page 2), the more likely it is that you will find satisfaction working in that occupation.

* Standard Scores: I = Interests; S = Skills
** Interest/Skill Pattern: Pursue = High Interests, High Skills; Develop = High Interest, Lower Skills; Explore = High Skills, Lower Interests; Avoid = Low Interest, Low Skills
*** Orientation Code: I=Influencing; O=Organizing; H=Helping; C=Creating; N=Analyzing; P=Producing; A=Adventuring
Range of middle 50% of people in the occupation: Solid Bar = Interests; Hollow Bar = Skills
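
The Interest/Skill Pattern legend in Figure 11.4 can be expressed as a small decision rule. The sketch below is illustrative only: the cutoffs of 55 ("high") and 45 ("low") are assumptions inferred from the sample report rows, not values stated in the text or taken from the publisher's scoring rules, and the function name is hypothetical.

```python
def interest_skill_pattern(interest, skill, high=55, low=45):
    """Label a pair of CISS standard scores with an Interest/Skill Pattern.

    The `high` and `low` cutoffs are assumed values inferred from the
    sample report in Figure 11.4; the actual scoring rules may differ.
    Returns an empty string when no pattern label applies.
    """
    if interest >= high and skill >= high:
        return "Pursue"   # high interest, high skill
    if interest >= high:
        return "Develop"  # high interest, lower skill
    if skill >= high:
        return "Explore"  # high skill, lower interest
    if interest <= low and skill <= low:
        return "Avoid"    # low interest, low skill
    return ""


# A few (interest, skill) pairs copied from the sample report.
sample_rows = {
    "Leadership": (55, 54),
    "Counseling": (66, 60),
    "Farming/Forestry": (54, 57),
    "Office Practices": (42, 30),
    "Public Speaking": (30, 47),
}
for scale, (interest, skill) in sample_rows.items():
    label = interest_skill_pattern(interest, skill) or "(no label)"
    print(f"{scale}: {label}")
```

With these assumed cutoffs the function reproduces the labels printed for the rows above (Develop, Pursue, Explore, Avoid, and no label, respectively), which is consistent with the legend but does not confirm the exact thresholds.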

employed persons in specific occupations with responses of a general reference sample drawn from the working population at large.

In addition to Basic and Occupational Scales, the CISS incorporates three special scales: Academic Focus, a measure of interest and confidence in intellectual, scientific, and literary activities; Extraversion, a measure of social extraversion; and Variety, a measure of the examinee’s breadth of interests and skills. Finally, the CISS reports a variety of Procedural Checks to detect possible problems in test taking such as random responding or excessive omissions.

Overall, the reliability of CISS scales is exceptionally strong. For example, coefficient alpha for the Orientation Scales is typically in the high .80s, and three-month test–retest reliabilities for 324 respondents are in the mid- to high .80s. Similar findings for reliability are reported for the Basic and Occupational Scales. Norms for the CISS are based upon 5,000 subjects spread over the 58 occupations. The authors report extensive validity data for the Occupational Scales, including sample means for each occupational sample as well as lists of the three highest- and lowest-scoring occupations for each scale (Campbell et al., 1992). These data document that the scales do discriminate between occupations in an effective and meaningful way. For example, the average T score of accountants on the Accountant scale is 75.8. Statisticians, bookkeepers, and financial planners achieve the next three highest scores for this scale, with average T scores in the low 60s. Commercial artists, professors, and social workers obtain the three lowest scores, with average T scores around 40. Because these results fit well with our expectations about occupational interest and skill patterns, they provide support for the validity of the CISS.

Independent correlational studies also support the validity of the CISS. For example, in a sample of 221 college students, Hansen (2007) correlated CISS Skill Scale scores with SII scores and found strong evidence for convergent and discriminant validity (i.e., strong correlations with similar scales, negligible correlations with dissimilar scales). In a sample of 118 adults, Savickas et al. (2002) correlated scores from individual occupational scales of the CISS with scores from the scales of other mainstream instruments such as the Strong Interest Inventory. They also found strong support for both convergent validity (i.e., modest correlations for same-named pairs of scales) and discriminant validity (i.e., negligible correlations for unlike pairs of scales). In a sample of 128 college students, Hansen and Neuman (1999) confirmed the concurrent validity of the CISS by finding a good fit between occupational scale scores and students’ chosen majors. The fit was considered “excellent” or “moderately good” for more than 70 percent of the students. Boggs (1999) provides a review and critique of the CISS. Campbell (2002) presents the history and development of the instrument.

This instrument will almost certainly receive increased attention in the years ahead. One noteworthy feature of the CISS is the comprehensiveness and clarity of the profile report form. The report consists of 11 user-friendly pages. We have reprinted two pages in Figure 11.4 for illustrative purposes. This format is preferable to the detail-rich but eye-straining graphs encountered with many instruments. The CISS promises to rival the Strong Interest Inventory for vocational guidance of young adults.

Chapter Quiz: Industrial, Occupational, and Career Assessment
Chapter 12
Legal Issues and the Future
of Testing
Learning Objectives
12.1a Review the essential laws that regulate the use of tests in a variety of settings: schools, the workplace, and hospitals
12.1b Review several ways that psychologists interface with the legal system in the field of forensic assessment
12.2a Identify contemporary applications of the computer in psychological assessment
12.2b Discuss the professional and social issues raised by this practice

12.1: Psychological Testing and the Law

12.1a Review the essential laws that regulate the use of tests in a variety of settings—schools, the workplace, and hospitals

12.1b Review several ways that psychologists interface with the legal system in the field of forensic assessment

In the previous chapters we have outlined the myriad of ways in which tests are used in decision making. Furthermore, we have established that psychological testing is not only pervasive, but it is also consequential. Test results matter. Test findings may warrant a passage to privilege. Conversely, test findings may sanction the denial of opportunity. For many reasons, then, it is appropriate to close the book with two special topics that bear upon the potential repercussions of psychological testing. In Module 12.1, Psychological Testing and the Law, we review critical legal issues pertaining to the use of psychological tests. In this topic, we survey the essential laws that regulate the use of tests in a variety of settings—schools, employment situations, medical settings, to name just a few arenas in which the law constrains psychological testing. We also examine several ways that psychologists interface with the legal system in the field of forensic assessment. In Module 12.2, Computerized Assessment and the Future of Testing, contemporary applications of the computer in psychological assessment are surveyed, and then the professional and social issues raised by this practice are discussed. The book closes with thoughts on the future of testing—which will be forged in large measure by increasingly sophisticated applications of computer technology but also greatly affected by legal standards.

12.1.1: The Sources and Nature of Law

The law establishes a number of guidelines that define the permissible scope and applications of psychological testing. However, before investigating the key legal guidelines that impact testing, it will be helpful to understand the sources and nature of law. Broadly speaking, there are three sources of law: constitutional provisions, legislative edicts, and judicial opinions. We examine each briefly.

Constitutional Sources of Law
The United States has a constitutional form of government, meaning that the U.S. Constitution is the final authority for all legal matters in the country. All other forms of law must be consistent with this seminal document. Thus, the Constitution places limits on legislative actions and judicial activity. The United States is also a federation of states, which means that each state retains its own government and system of laws, while ceding some powers to the central government. For example, the power to regulate interstate commerce and the responsibility to provide for the national defense both reside with the federal government. Each state has its own constitution as well, which is another
source of laws that affects citizens living in a state. Of course, state constitutions cannot contradict the U.S. Constitution and, in most cases, they are highly similar to the federal document.

Three provisions of the U.S. Constitution potentially bear upon the practice of psychological testing: the Fifth, Sixth, and Fourteenth Amendments to the Constitution (Melton et al., 1998). The Fifth Amendment provides a privilege against self-incrimination, which impacts the nature of psychological assessment in forensic evaluations. For example, as discussed previously, a forensic practitioner might be asked by the court to evaluate an alleged offender for competency to stand trial. In many states, self-incriminating disclosures made during an evaluation of competency to stand trial cannot be used to determine guilt (i.e., they are inadmissible as evidence during trial).

The Sixth Amendment states that every person accused of a crime has the right to counsel (i.e., the right to a lawyer). This is understood to mean both the presence of counsel during legal proceedings and also the right to effective assistance from counsel. Does this mean that counsel must be present during a pretrial assessment, such as a court-ordered evaluation for competency to stand trial? This will depend upon the state and jurisdiction in which the proceedings occur. Although most courts have held that the defendant does not have a right to the presence of counsel during pretrial psychological evaluations, a minority of courts have held that the Sixth Amendment guarantee does apply to such pretrial assessments (Melton et al., 1998). In these jurisdictions, the defendant’s lawyer can be present during any psychological testing or evaluation. This raises difficult questions as to the validity of assessments undertaken in the presence of a third party. For example, what if the client asks his or her lawyer for advice on how to answer certain questions? Surely, this is not standard protocol in psychological assessment and might drastically affect the validity of the results. Fortunately, most courts favor alternative methods for protecting the rights of defendants during pretrial evaluations, such as tape-recording the session, having a defense psychologist observe the evaluation, or providing for an independent evaluation.

The Fourteenth Amendment provides that no state shall deprive any person of life, liberty, or property without “due process of law.” The amendment also specifies “equal protection of the laws.” The relevant section reads:

No State shall make or enforce any law which shall abridge the privileges or immunities of citizens of the United States; nor shall any State deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.

It is mainly the “due process” feature of this amendment that has impacted psychological practice. This influence is limited largely to forensic practitioners who deal with competency to stand trial, civil and criminal commitment, or the right to refuse treatment. For example, psychologists who are involved in the civil commitment of an individual who needs treatment typically must show—as a direct consequence of the due process clause of the Fourteenth Amendment—that several stringent criteria are fulfilled:

• The individual must be reliably diagnosed as suffering from severe mental illness;
• In the absence of treatment, the prognosis for the individual is major distress;
• The individual is incompetent; that is, the illness substantially impairs the person’s ability to understand or communicate about the possibility of treatment;
• Treatment is available;
• The risk–benefit ratio of treatment is such that a reasonable person would consent to it. (Melton et al., 1998, p. 310)

Whether these conditions are met would be determined at a commitment hearing during which the individual would have full procedural rights such as the presence of counsel. The psychologist’s role would be to offer professional opinions on these guidelines. Of course, the validity of psychological assessment is relevant to these criteria in several ways, including the following: understanding the reliability of psychiatric diagnosis, choosing appropriate tests for competency (see the topic below, Forensic Applications of Assessment), and comprehending risk–benefit analysis.

Legislative Sources of Law
In addition to constitutional sources, laws also emanate from the actions of state and federal legislative bodies. These laws are called statutes and are codified by subject areas into codes. For example, the laws passed by Congress at the federal level are codified into 50 topics identified as Title 1 through Title 50 with each area devoted to a specific theme. Three examples include Title 18, Crimes and Criminal Procedure; Title 20, Education; and Title 29, Labor. Each titled area is further subdivided. For example, Title 20, Education, is gargantuan. It consists of 77 chapters, a few of them hundreds of pages in length. This includes Chapter 70, Strengthening and Improvement of Elementary and Secondary Schools, in which literally hundreds of specific statutes passed over the last few decades have been collated and cross-referenced. For example, one federal statute mandates that school systems must show adequate yearly progress in order to be eligible for further federal funding. The law further stipulates that “adequate yearly progress” shall be defined by the State in a manner that

(i) applies the same high standards of academic achievement to all public elementary school and secondary school students in the State;
(ii) is statistically valid and reliable;
(iii) results in continuous and substantial academic improvement for all students;
(iv) measures the progress of public elementary schools, secondary schools and local educational agencies and the State based primarily on the academic assessments described in paragraph (3);
(v) includes separate measurable annual objectives for continuous and substantial improvement for each of the following:
    (I) The achievement of all public elementary school and secondary school students.
    (II) The achievement of
        (aa) economically disadvantaged students;
        (bb) students from major racial and ethnic groups;
        (cc) students with disabilities; and
        (dd) students with limited English proficiency;
except that disaggregation of data under subclause (II) shall not be required in a case in which the number of students in a category is insufficient to yield statistically reliable information or the results would reveal personally identifiable information about an individual student.
(U.S. Code, Title 20, Chapter 70, http://uscode.house.gov)

As can be seen, legal codes are written with such specificity that their intention cannot easily be overlooked or bypassed. The preceding sample is just one small snippet of law—barely discernible in a vast ocean of literally hundreds of pages of edicts that impact educational practices. But it is clear that these legislative rulings influence psychological testing. For example, in the preceding excerpt, an inescapable inference is that school systems must use standardized educational achievement tests with established reliability and validity—or else they risk losing federal funds.

Legislatures cannot possibly oversee the implementation of all the statutes they enact. Consequently, it is increasingly common for these bodies to delegate rule-making authority to agencies within the executive branch of government. For example, the U.S. Congress has passed several laws designed to prohibit discrimination in employment. But the enforcement of these laws is left to the Equal Employment Opportunity Commission (EEOC). The following federal laws bear, at least in part, on job discrimination:

• Civil Rights Act of 1964, which prohibits employment discrimination based on race, color, religion, gender, or national origin
• Equal Pay Act of 1963, which protects women (and men) who perform equal work in the same organization from gender-based wage discrimination
• Age Discrimination in Employment Act of 1967, which protects individuals who are 40 years of age or older
• Americans with Disabilities Act of 1990, which prohibits employment discrimination against qualified individuals with disabilities in both government and the private sector
• Rehabilitation Act of 1973, which prohibits discrimination against qualified individuals with disabilities who work in the federal government
• Civil Rights Act of 1991, which authorizes monetary damages in cases of intentional employment discrimination

The EEOC is the federal agency in charge of the administrative and judicial enforcement of the civil rights laws listed earlier. We discuss this important regulatory body in further detail later.

Judicial Sources of Law
Another source of law is the judiciary, specifically, the federal courts and the United States Supreme Court. Indirectly, these bodies make law in several ways. First, they have the authority to review all federal legislative edicts to determine their constitutionality and interpretation. In addition, they can appraise the constitutional validity of any state law, whether constitutional, statutory, or regulatory in origin. In doing so, they have the opportunity to sharpen the focus of laws promulgated by these other sources. For example, in ruling on the constitutionality of state civil commitment laws, federal courts not only have found them unconstitutional, but they have also used this opportunity to publish permissible criteria and procedures for commitment (as discussed previously in relation to the Fourteenth Amendment). The courts also hear lawsuits filed on behalf of individuals or groups. In these cases, court rulings can establish new law. Finally, the courts can make law when the original sources such as constitutional laws or legislative statutes are silent on an important issue:

In performing their interpretive function, courts will first look at the plain words of any relevant constitutional provision, statute, or regulation and then review the legislative history of a given law, including statements made by the law’s sponsors or during committee or public hearing sessions. But if neither of these sources is helpful, or if no relevant law exists, the courts themselves must devise principles to govern the case before them. The principles articulated by courts when they create law are collectively known as common law, or judge-made law.
(Melton et al., 1998, p. 29)

Typically, common law is conservative, based to the extent possible on the precedent of past cases, rather than created at the whim of the judiciary.

In sum, there are several sources of law: state and federal constitutions, legislative statutes, regulations enacted by agencies such as the EEOC, and judicial interpretations from federal courts and the Supreme Court. These are the primary sources of law that might intersect
with the practice of psychological testing. Other sources of law include presidential executive orders and international law, which we do not discuss here because they rarely impact psychological practice.

Now that the reader has an understanding of how, why, and where laws originate, we turn to a review of particular laws that impact the practice of psychological assessment. We partition the discussion into three topics: legal influences on psychological testing in school systems, disability assessment and the law, and legal issues in employment testing. The division is somewhat artificial; for example, the assessment of learning disability—greatly impacted by law—involves both the practice of testing in school systems and the assessment of disability.

12.1.2: Testing in School Systems and the Law

The law has impacted school-based testing in two broad ways: (1) Federal legislation has mandated specific practices in the assessment of students, especially those with disabilities; and (2) lawsuits have shaped and reshaped particular testing practices in school systems over the last 60 years. We will discuss legislative influences in the next section on disability assessment and the law. Our goal here is to provide an overview of influential lawsuits that have molded testing practices in the schools. In the main, these lawsuits have assailed the use of tests, especially in special education placement and as a requirement for high school graduation.

Attacks on cognitive testing in school systems have been with us for a long time. Beginning in the 1960s, these attacks took a new form: lawsuits filed by minority plaintiffs seeking to curtail or ban the use of school-based cognitive tests, especially intelligence tests. In this section we will review the major court cases, summarized in Table 12.1. Later, we will discuss the implications of court decisions for the contemporary use of cognitive tests in schools.

Table 12.1 Major Legal Landmarks in School-Based Cognitive Testing

1967 Hobson v. Hansen
Court ruled against the use of group ability tests to “track” students on the grounds that such tests discriminated against minority children.

1970 Diana v. State Board of Education
Court ruled against traditional testing procedures for educable mentally retarded (EMR) placement of Mexican American children; State Board of Education enacted special provisions for testing minority children (e.g., bilingual assessment).

1979 Debra P. v. Turlington
Court did not rule against the use of a minimum competency test as a condition for high school graduation—a test with excessive failure rate for African American students—but did suspend its use for four years, as a means of providing due process about notification of the new requirement.

1979 Larry P. v. Riles
Court ruled that standardized IQ tests are culturally biased against African American children for EMR evaluation and stipulated that the proportion of African American children in these classes must match their proportion in the school population.

1980 PASE v. Hannon
In complete contradiction to the Larry P. v. Riles decision, the court ruled that standardized IQ tests are not racially or culturally biased.

1984 Georgia NAACP v. Georgia
Court ruled that traditional procedures of evaluation do not discriminate against African American children; court also rejected the view that disproportionate representation in EMR classes constituted proof of discrimination.

1994 Crawford v. Honig
The judge in the Larry P. v. Riles case overruled his earlier ruling so as to allow the use of a standardized IQ test for the evaluation of African American students diagnosed with learning disability.

2000 GI Forum v. Texas Education Agency
Court ruled that the use of the Texas Assessment of Academic Skills as part of a high school graduation requirement was permissible despite high failure rates of African American and Latino students.

Many of the legal assaults on testing have arisen from the controversial practice of using cognitive test results for purposes of assigning low-functioning students to “vocational” school tracks or to special classes for educable mentally retarded (EMR) persons. Invariably, minority children are assigned to these special tracks and classes in surprising disproportion to their representation in the school population. For example, a typical finding is that minority children are two to three times more likely to be classified as EMR than white children (Agbenyega & Jiggetts, 1999). In a school system comprised of 25 percent minority students, this could translate to EMR classes with about 50 percent minority student representation.

Therein lies the crux of the legal grievance, for special education classes are equated by many with inferior education. Written two decades ago, these observations still hold true:

If special education actually worked, which it does not, and minority children assigned to EMR classes in the primary grades eventually reached the same level of reading and math achievements as children in regular classrooms, I doubt whether the plaintiffs in these cases would have brought suit. A major problem in the educational system is that special education, even with smaller classes and better trained teachers, still does not work to bring such children up to par. Rather, special education classes perpetuate educational disadvantage.
(Scarr, 1987)

Something is amiss in education when well-intentioned placement policies inadvertently perpetuate a legacy of mistreatment of minorities. The legal challenges to school-based testing are certainly understandable, even though sometimes misplaced. After all, the problem is not so much with the tests—which assess academically relevant skills with reasonable validity—but with educational policies
that isolate low-functioning students to inefficient placements. Even experts sympathetic to the lawsuits acknowledge that tests often are quite useful, so it is worth examining why killing the messenger has been a popular response to concerns about discriminatory placements.

Hobson v. Hansen (1967)
The first major court case to challenge the validity of ability tests was Hobson v. Hansen (1967). In that landmark case, plaintiffs argued that the allocation of financial and educational resources in the Washington, DC, public school system favored white children and, therefore, discriminated against minority children. Among the issues addressed in the trial was the use of standardized group ability tests such as the Metropolitan Readiness and Achievement Test and the Otis Quick-Scoring Mental Ability Test to “track” students according to ability. Children were placed in honors, regular, or basic tracks according to ability level on the tests. One consequence of this tracking method was that minority children were disproportionately represented in the lowest track, which focused on skills and preparation for blue-collar jobs. Placement in this track virtually ruled out entrance to college and entry to a well-paying profession.

Judge Skelly Wright decided the Hobson case in 1967, ruling against the use of a tracking system based on group ability tests. Most commentators view his banishment of ability testing for tracking purposes as justified. However, there is good reason to worry about the further implications of Judge Wright’s decision, which implied that acceptable tests must measure children’s innate capacity to learn. Bersoff (1984) commented on the Hobson decision as follows:

Hobson, when read in its entirety, represents the justified condemnation of rigid, poorly conceived classification practices that negatively affected the educational opportunities of minority children and led to permanent stigmatization of blacks as unteachable. But swept within Hobson’s condemnation of harmful classification practices were ability tests used as the sole or primary decision-making devices to justify placement. Not only was ability grouping as then practiced in the District of Columbia abolished, but tests were banned unless they could be shown to measure children’s innate capacity to learn.

Not even ardent hereditarians believe that tests solely measure innate ability. No test could ever pass the criterion mandated by this case.

The Hobson case concerned group ability tests and had no direct bearing on the use of individual intelligence tests in school systems. However, it did portend an increasing skepticism about the use of any test—whether group or individual—for purposes of educational placement.

Diana v. State Board of Education (1970)
In Diana v. State Board of Education (1970), plaintiffs questioned the use of individual intelligence tests (the WISC and Stanford-Binet) for purposes of placing Mexican American schoolchildren in classes for educable mentally retarded (EMR) persons. Diana was a class action suit filed on behalf of nine Mexican American elementary school children who had been placed in EMR classes. The placements were based on individual IQ tests administered by a non-Spanish-speaking psychometrist. When retested in English and Spanish, eight of these nine children showed substantial—sometimes huge—increases in IQ and were, therefore, removed from EMR classes. Faced with this evidence, the California State Board of Education decided to enact a series of special provisions for the testing of Mexican American and Chinese American children. These provisions included the testing of minority children in their primary language, elimination of certain vocabulary and information items that minority children could not be expected to know, retesting of minority children previously placed in EMR classes, and development of new tests normed on Mexican American children. These provisions answered the concerns of plaintiffs, eliminating the need for further court action.

Debra P. v. Turlington (1979)
This was a class action lawsuit filed on behalf of all African American students in Florida against Ralph Turlington, the state Commissioner of Education. At issue was the use of the State Student Assessment Test-Part 2 (SSAT-II), a functional literacy test, as one requirement for awarding a high school diploma. In the 1970s, Florida was one of the states at the forefront of the functional literacy movement. Functional literacy has to do with practical knowledge and skills used in everyday life. A test of functional literacy might require students to:

Currently, about 20 states use a functional literacy test of this genre as one condition of awarding the high school diploma.

However, in Florida in the late 1970s, African American students failed the functional literacy test at a substantially
higher rate than white students. Plaintiffs argued the SSAT-II was unfair because African American students received inferior education in substantially segregated schools. The purpose of the lawsuit was to void the use of the test as a requirement for graduation. The information in the following discussion was retrieved from the appeals court decision (Debra P. v. Turlington, U.S. Court of Appeals for the Eleventh Circuit, April 27, 1984).

With practical finesse, the court decision offered something to both sides, although state officials likely were happier with the outcome than were the plaintiffs. The nature of the ruling also revealed admirable sensitivity to issues of test validity and psychological measurement on the part of the court. Based on the reasonable belief that a high school diploma should signify functional literacy, the state was permitted to use the test as a diploma requirement. However, the court delayed implementation of the new diploma testing program for four years. This delay served two purposes. First, it provided due process to current students (and their parents), alerting them that a new requirement was being set in place. Second, it gave the state time to prove that the SSAT-II was a fair test of that which is taught in Florida’s classrooms. The court wanted proof of what it called “instructional validity.” Put simply, the court wanted assurance that the state was teaching what it was testing.

The state undertook a massive evaluation project to prove instructional validity. The Florida Department of Education hired a consulting firm to conduct a four-part study that included (1) teacher surveys asking expressly if the skills tested by the SSAT-II were taught; (2) administrator surveys to demonstrate that school districts utilized remedial programs when appropriate; (3) site visits to verify all aspects of the study; and (4) student surveys to discern if students perceived they were being taught the skills required on the functional literacy test.

Weighing all the evidence carefully over a period of several years, the court ruled that the State of Florida could deny diplomas to students who had not yet passed the SSAT-II, beginning with the class of 1983. Furthermore, the court concluded that the use of the SSAT-II actually helped to mitigate the impact of vestiges of school segregation by motivating students, teachers, and administrators toward a common goal:

The remarkable improvement in the SSAT-II pass rate among black students over the last six years demonstrates that use of the SSAT-II as a diploma sanction will be effective in overcoming the effects of past segregation. Appellants argue that the improvement has nothing to do with diploma sanctions because the test has not yet been used to deny diplomas. However, we think it likely that the threat of diploma sanction that existed throughout the course of this litigation contributed to the improved pass rate, and that actual use of the test as a diploma sanction will be equally, if not more, effective helping black students overcome discriminatory vestiges and pass the SSAT-II. Thus, we affirm the finding that use of the SSAT-II as a diploma sanction will help remedy vestiges of past discrimination.
(U.S. Court of Appeals for the Eleventh Circuit, April 27, 1984)

In sum, the case of Debra P. v. Turlington appears to confirm that functional literacy testing can play a constructive role in secondary education.

Larry P. v. Riles (1979)
The case of Larry P. v. Riles raised concerns about the use of intelligence tests for assigning African American children to EMR special education classes. In November 1971 attorneys representing several San Francisco families filed for a preliminary injunction seeking to prohibit the use of traditional IQ tests for EMR placement of African American children. The specific grievance was that six African American children in the San Francisco school district had been inappropriately placed into “dead-end” EMR classes based on scores from IQ tests said to be racially and culturally biased against African Americans. As a consequence of this placement it was alleged that the children had suffered irreparable harm. The plaintiffs sought a ban on the use of “culturally biased” IQ tests, asked for reevaluation of all African American EMR children, requested special assistance for those who returned to the regular classroom, and sought a quota limiting assignment of African American children to EMR classes. The quota was defined in proportion to overall African American representation in the school district population.

In 1972 Judge Robert Peckham granted a preliminary injunction, restraining school officials in San Francisco from placing primary reliance on IQ tests in EMR placements for African American children. He also ordered that African American EMR children should be reevaluated and that those who were returned to regular classes should be given special help. However, he was wary of the plaintiffs’ proposed ratio system limiting African American enrollment in EMR classes.

The case of Larry P. eventually went to trial in 1978. More than 50 expert witnesses were called and over 200 reports, studies, and exhibits were received in evidence. In the end the plaintiffs prevailed. In 1979 Judge Peckham ruled that individual intelligence tests “are racially and culturally biased, have a discriminatory impact against black children, and have not been validated for the purpose of essentially permanent placements of black children into educationally dead-end, isolated, and stigmatizing classes for the so-called educable mentally retarded.”

This decision was based, in part, on certain assumptions about the nature of intelligence that are not necessarily shared by experts in the field. For example, after
reviewing the trial transcript—some ten thousand pages in length—Elliott (1987) concluded that the legal opinion in Larry P. was based on the following assumptions: that intelligence is the innate ability to learn, that a culturally fair test should measure innate ability, and that a culturally fair test should produce equal scores for all relevant subgroups. If these assumptions are correct, then the legal opinion cited in Larry P. follows with inexorable logic. However, very few assessment specialists embrace the antiquated view that it is meaningful or useful to define intelligence as innate ability to learn.

Within California, the decision effectively abolished the use of individual intelligence tests for placement of African American students in EMR classes. In 1984 the decision was affirmed by the U.S. Ninth Circuit Court of Appeals, and in 1986 the ban was extended so that IQ tests could not be used for any special education placement of African American children in the public schools of California.

Although it is arguable whether the Larry P. decision was good social science, there is no denying the profound policy implications of this case:

For special education, the negative results are reduced precision and objectivity of assessment, reduced precision of placement, reduced morale of and faith in the professionals charged with assessment, some downgrading of the once-central importance of developing intellectual skills, and reduced services for slow-learning, non-LD children in the 65–80 range. The positive results are broader and newer kinds of assessment (if there is time for the breadth, and norms for the novel tests) and some fresh thinking about programs for children having difficulty in school. (Elliott, 1987)

presiding judge came to exactly the opposite conclusion. Judge John Grady ruled that intelligence tests are not culturally biased against African American children.

Astonishingly, in his written opinion Judge Grady commented on the cultural fairness of every single item on the WISC, WISC-R, and Stanford-Binet, finding all but 9 of the 488 items to be culturally fair. He concluded that the 9 biased items were not sufficient in number to render the tests discriminatory, and he endorsed their ongoing use for evaluation of minority children. Although little has been made of the judge’s transgression, it would be considered a colossal breach of professional ethics were a psychologist to publish individual test items in the public record.

Georgia NAACP v. Georgia (1984)
In this case the NAACP alleged that evaluation procedures used in the state of Georgia discriminated against African American children, resulting in their overrepresentation in EMR classes. However, the U.S. Court of Appeals ruled in 1984 that discrimination did not exist. Furthermore, the court rejected the notion that overrepresentation of African American children in EMR classes was a sufficient basis to prove discrimination.

Crawford v. Honig (1994)
This case initiated a reexamination of the rights of minority children in special education in California. Contrary to other cases in which the lawyers and parents of minority children asked for a ban on the use of traditional tests, the purpose of Crawford v. Honig was exactly the opposite—to obtain legal permission for using tests such as the Wechsler Intelligence Scale for
Children-Revised (WISC-R) with African American chil-
One major consequence of Larry P. has been a huge dren. The case was filed by the parents of Demond Craw-
reduction in the number of children assigned to self-con- ford, an African American student diagnosed with learning
tained EMR classes. For example, in California the number disability. His parents understood the value of standard-
of EMR children went from a high of 58,000 in 1968–1969 to ized intelligence tests in the assessment of learning disabil-
approximately 13,000 in 1984. For some mildly retarded ity and wanted school psychologists to use these traditional
children, alternative placement in regular classrooms has instruments in their evaluation. However, as a direct con-
been beneficial, but for others who are now not eligible for sequence of the Larry P. v. Riles decision, it was illegal in
any special help, the aftermath of court-influenced place- 1994 for psychologists to administer the WISC-R, or any
ment policies is more questionable (Powers & Hagans- other mainstream IQ test, to African American children in
Murillo, 2004). California, even with the permission of the parents. A psy-
chologist who did so risked fines and jail time for breaking
Parents in Action on Special Education (PASE) v.
the law. In this lawsuit, Judge Robert Peckham, the same
Joseph P. Hannon (1980) PASE v. Hannon was litigated
judge who presided in the Larry P. v. Riles case, overruled
in 1980, just one year after the landmark Larry P. v. Riles
his earlier finding so as to permit the use of standardized
case. In this suit, attorneys for two African American stu-
IQ tests in the evaluation of African American children
dent plaintiffs argued that the children were inappropri-
upon the formal request of their parents. This is an excel-
ately placed in educable mentally handicapped (EMH)
lent example of the fact that laws can be reshaped in
classes because of racial bias in the IQ tests used for place-
response to changing social conditions.
ment. The case was tried as a class action suit, meaning that
the plaintiffs represented the category of all similar children GI Forum v. Texas Education Agency (2000) In this
in Chicago. Even though the issues in the PASE class action court suit, filed on behalf of seven African American and
suit were substantially the same as the preceding case, the Latino high school students in Texas, plaintiffs challenged
the use of the Texas Assessment of Academic Skills (TAAS) as a requirement for high school graduation on the grounds that it discriminated unfairly against minority students and violated their right to due process. They pointed out that substantial disparities in resources existed between “white” schools (those with a preponderance of white students) and minority schools (those with a preponderance of minority students). In the view of plaintiffs, this was the explanation for the differential failure rates. In fact, 67 percent of African American, 59 percent of Latino, and 31 percent of white students failed the exam the first time it was used in 1991.

After hearing expert witnesses over many months, the court ruled in favor of state education officials, citing several compelling reasons. Although the court agreed with plaintiffs that disparities in resources did exist, it found no evidence that these inequalities caused the higher failure rate of minority students. The court also pointed out that the TAAS was constructed with great care and possessed “curricular validity”; that is, it tested what was actually taught. This quality of a test is the same thing as instructional validity, as described earlier in Debra P. v. Turlington. Officials also noted that the TAAS was just one condition of awarding the diploma, not the sole factor; attendance, passing grades, and completion of the required curriculum also are needed. The court praised the humane manner of test implementation, noting that students first encounter the TAAS in the tenth grade and are provided remedial courses for any of the three subsections (reading, math, writing) that they fail. The cutoff score of 70 percent for each curricular area was deemed reasonable. Moreover, the court noted, students have a minimum of seven additional opportunities to pass the test. Finally, the court found it “highly significant that minority students have continued to narrow the passing rate gap at a rapid rate.” Similar to the findings in Debra P. v. Turlington, this case demonstrated that a well-designed graduation test can be an engine of positive social change.

12.1.3: Disability Assessment and The Law

Individuals with disabilities are afforded many legal protections, some of which impact the use of psychological tests. In this section, we review two broad areas in which legislation has been written to defend individuals with disabilities: school-based assessment of children with disabilities, and employment-based testing of persons with disabilities. The coverage is purposefully brief. Readers can find lengthier discussions in Bruyere and O’Keeffe (1994), Salvia and Ysseldyke (2001), and Stefan (2001).

Public Law 94-142. In 1975, the U.S. Congress passed a compulsory special education law, Public Law 94-142, known as the Education for All Handicapped Children Act.¹ According to Ballard and Zettel (1977), this law was designed to meet four major goals:

1. To ensure that special education services are available to children who need them
2. To guarantee that decisions about services to disabled students are fair and appropriate
3. To establish specific management and auditing requirements for special education
4. To provide federal funds to help the states educate disabled students

Many practices in the assessment of disabled persons stem directly from the provisions of Public Law 94-142. For example, the law specifies that each disabled student must receive an individualized education plan (IEP) based on a comprehensive assessment by a multidisciplinary team. The IEP must outline long-term and short-term objectives and specify plans for achieving them. In addition, the IEP must indicate how progress toward these objectives will be evaluated. The parents are intimately involved in this process and must approve the particulars of the IEP.

Pertinent to testing practices, PL 94-142 includes a number of provisions designed to ensure that assessment procedures and activities are fair, equitable, and nondiscriminatory. Salvia and Ysseldyke (1988) summarize key provisions which include assessment in the native language with validated tests administered by trained personnel; appraisal in areas related to the specific disability, including, when appropriate, hearing, vision, emotional functioning, academic performance, communication skills, motor skills, and general intelligence; and, evaluation by a multidisciplinary team that includes a teacher or specialist with knowledge of the area of suspected disability.

PL 94-142 also contains a provision that disabled students should be placed in the least restrictive environment, one that allows the maximum possible opportunity to interact with nonimpaired students. Separate schooling is to occur only when the nature or the severity of the disability is such that instructional goals cannot be achieved in the regular classroom. Finally, the law contains a due process clause that guarantees an impartial hearing to resolve conflicts between the parents of disabled children and the school system.

In general, the provisions of PL 94-142 have provided strong impetus to the development of specialized tests that are designed, normed, and validated for children with specific disabilities. For example, in the assessment of a child with visual impairment, the provisions of PL 94-142 virtually dictate that the examiner must use a well-normed test devised just for this population rather than relying upon traditional instruments.

¹ Each congressional law receives two numbers, one referring to the particular Congress that passed it, the other referring to the law itself. Thus, Public Law 94-142 is the 142nd law passed by the 94th Congress.

Public Law 99-457. In 1986, Congress passed several amendments to the Education for All Handicapped Children Act, expanding the provisions of PL 94-142 to include disabled preschool children. Public Law 99-457 requires states to provide free appropriate public education to disabled children ages 3 through 5. The law also mandates financial grants to states that offer interdisciplinary educational services to disabled infants, toddlers, and their families, thus establishing a huge incentive for states to serve children with disabilities from birth through age 2. Public Law 99-457 also provides a major impetus to the development and validation of infant tests and developmental schedules. After all, the early and accurate identification of at-risk children would appear to be the crucial first step in effective interdisciplinary intervention.

No Child Left Behind Act. In the context of school-based testing and the law, an important development is the 2001 No Child Left Behind Act (NCLB). The ambition of this act was to improve education through standards-based reforms that require states to implement assessments in basic educational skills. NCLB is a complex and far-reaching law that expands the federal role in public education. There are important implications for educational and psychological testing in this act. The six elements of the law include:

In the years since its inception, NCLB has remained controversial, and efforts to modify it often make headlines. Whether the act is accomplishing its stated intentions is still an open question. But it is an issue that can be investigated in an empirical, nonpartisan manner. For example, Wang, Beckett, and Brown (2006) provide an even-handed synthesis of research-based findings on the impact of NCLB, summarizing pros and cons. The general tone of their review is mildly supportive. While citing several problems with NCLB (e.g., failure to provide adequate funding for test development and personnel training, failure to acknowledge genetic and socioeconomic influences), the authors conclude that the law is bringing about positive changes in student learning.

But not all reviewers agree with this optimistic inference. The potential distorting effects of the high-stakes testing dictated by NCLB remain a serious concern. Nichols, Glass, and Berliner (2006) analyzed longitudinal data from 25 states on the relationship between high-stakes pressure and improvements in student achievement as measured by the National Assessment of Educational Progress (NAEP). NAEP consists of periodic assessments in mathematics, reading, science, writing, the arts, civics, economics, geography, and U.S. history. The tests are administered uniformly across the nation with the same test booklets. Based on sophisticated correlational analyses across time, Nichols et al. (2006) found no value in high-stakes testing. To the contrary, they found the impact to be insidiously negative. Specifically, their analyses revealed that

• States with greater proportions of minority students implement accountability systems that exert greater pressure. This suggests that any problems associated with high-stakes testing will disproportionately affect America’s minority students.
• High-stakes testing pressure is negatively associated with the likelihood that eighth and tenth graders will
move into 12th grade. Study results suggest that increases in testing pressure are related to larger numbers of students being held back or dropping out of school.
• Increased testing pressure produced no gains in NAEP reading scores at the fourth- or eighth-grade levels (Nichols et al., 2006, p. 5).

These authors call for a moratorium on policies that force school systems to use high-stakes testing. By implication, this would mean that key elements of NCLB ought to be suspended.

Americans with Disabilities Act. The 1990 Americans with Disabilities Act (ADA) forbids discrimination against qualified individuals with disabilities in both the public sector (e.g., government agencies and entities receiving federal grants) and the private sector (e.g., corporations and other for-profit employers). Under the ADA, disability is defined as a physical or mental impairment that substantially limits one or more of the major life activities (Parry, 1997). Examples of ADA-recognized disabilities include sensory and physical impairments (e.g., blindness, paralysis), many mental illnesses (e.g., major depression, schizophrenia), learning disabilities, and attention-deficit/hyperactivity disorder.

Under the ADA, the process of qualifying an individual for work or educational accommodations requires current, detailed, and professional documentation. For example, a graduate student who was seeking a special arrangement for taking tests (such as a quiet room) because of attentional problems might need to submit a comprehensive endorsement from a licensed psychologist, detailing the history, current functioning, clinical diagnosis of attention-deficit/hyperactivity disorder, and necessity for accommodations (Gordon & Keiser, 1998). In other words, the ADA is a civil rights act, not a program of entitlement:

The ADA does not guarantee equal outcomes, establish quotas, or require preferences favoring individuals with disabilities. Rather, the ADA is intended to ensure access to equal employment opportunities based on merit. The ADA is designed to “level the playing field” by removing the barriers that prevent qualified individuals with disabilities from having access to the same employment opportunities that are available to individuals without disabilities. (Klimoski & Palmer, 1994, p. 45)

In sum, the purpose is to ensure that individuals who are otherwise qualified for jobs or educational programs are not denied access or put at improper disadvantage simply because of a disability.

In regard to psychological testing, an important provision of the ADA is that agencies and institutions must make reasonable testing accommodations for persons with disabilities. With appropriate documentation (discussed earlier), the relevant accommodations might include any of the following:

• Assistance in completing answer sheets
• Audiotape or oral presentation of written tests
• Special seating for tests
• Large-print examinations
• Retaking exams
• Dictating rather than writing test answers
• Printed version of verbal instructions
• Extended time limit

In general, changes in the testing medium (e.g., from written to oral) are consistent with the intention of ADA, if such a change is needed to accommodate a disability. For example, an appropriate accommodation in the testing medium would be the audiotaped presentation of test items for persons who are visually impaired. On the other hand, changing a test from a printed version into a sign language version for persons with hearing impairment would be considered translation into another language, not a simple change of medium.

In most testing accommodations mandated by the ADA, it is necessary to change the time limits, usually by providing extra time. This raises problems of test interpretation, especially when a strict time limit is essential to the validity of a test. For example, Willingham, Ragosta, Bennett, and others (1988) found that extended time limits on the SAT significantly reduced the validity of the test as a predictor of first-year college grades. This was especially true for examinees with learning disabilities, whose first-year grades were subsequently overpredicted by their SAT scores. Thus, although it seems fair to provide extra time on a test when the testing medium has been changed (e.g., audiotaped questions replacing the printed versions), from a psychometric standpoint, the challenge is to determine how much extra time should be provided so that the modified test is comparable to the original version. Nester (1994) and Phillips (1994) provide thoughtful perspectives on the range of reasonable accommodations required by the ADA.

Cognitive Disability and the Death Penalty. One way that laws evolve in American society is through decisions of the Supreme Court. In a 2002 court case (Atkins v. Virginia), the Supreme Court held that the execution of mentally retarded convicts is “cruel and unusual punishment” prohibited by the Eighth Amendment. In speaking for the 6-3 majority, Justice John Paul Stevens wrote:

We are not persuaded that the execution of mentally retarded criminals will measurably advance the deterrent or the retributive purpose of the death penalty. Construing and applying the Eighth Amendment in the light of our “evolving standards of decency,” we therefore conclude that such punishment is excessive and that the Constitution
“places a substantive restriction on the State’s power to take the life” of a mentally retarded offender. (Atkins v. Virginia, 2002, p. 321)

This new constitutional standard has profound implications, literally of life and death, for the proper application of psychological tests with persons who display intellectual disability. Choosing the appropriate tests, getting the results right, and offering an accurate diagnosis of intellectual disability could determine whether some examinees face death row. This was certainly relevant for Doil Lane, who was convicted of the heinous rape and murder of a nine-year-old girl and sentenced to death, principally on his confession (DNA testing was inconclusive). This confession of a highly suggestible young man with intellectual disability may have been false. Whether or not his confession was true, there is no question as to the presence of significant intellectual disability:

As a child, he spent years as a resident of a special school in Texas for mentally disabled students. His I.Q. has tested between 62 and 70. His mental deficiencies are so obvious that the report by the Kansas police officer who first interviewed him noted Lane seemed “mentally retarded.” The former chief psychologist of the Texas Division of Criminal Justice assessed his intelligence in 1998 and concluded he had mental retardation. When his police interrogation was over, Lane, a thirty-year-old, climbed into the interrogating officer’s lap. At his trial in Texas, Lane asked the judge for crayons so that he could color pictures. The judge denied the request. (Human Rights Watch, 2001, p. 38)

In response to the Supreme Court decision, Texas Governor Rick Perry commuted the death sentence of Doil Lane to life in prison.

12.1.4: Legal Issues in Employment Testing

Nearly every aspect of the employment relationship is subject to the law: recruitment, screening, selection, placement, compensation, promotion, and performance appraisal all fall within the domain of legal interpretations (Cascio, 1987). However, courts and legislative bodies have reserved special scrutiny for employment-related testing. The practitioner who refuses to learn relevant legal guidelines in personnel testing does so at great peril, because unwise practices can lead to costly and time-consuming litigation (Case Exhibit 12.1).

Case Exhibit 12.1
Unwise Testing Practices in Employee Screening

According to the Associated Press of July 11, 1993, the Target discount chain agreed to settle out of court in a class-action lawsuit filed on behalf of an estimated 2,500 job applicants. Prospective security guards for Target were required to take the Rodgers Psychscreen, a 704-item condensed combination of the CPI and the MMPI. Several applicants objected to answering the test, which included questions about God, sex, and bowel movements. Target agreed to pay $1.3 million, including $60,000 to four plaintiffs named in the lawsuit. Although Target admitted no wrongdoing in the case, corporate officers agreed not to use the Psychscreen test for at least five years.

Sibi Soraka was one of the plaintiffs in the lawsuit. He found the questions to be “off-the-wall and bizarre.” He claimed that the cumulative effect of answering the questions made him palpably ill. He added: “It doesn’t take Einstein to figure out that these questions really don’t have any bearing on our world and life today, or certainly on a job walking around looking for shoplifters.” Target corporation defended the testing practice, noting that Psychscreen is commonly used in the evaluation of law enforcement officers. Attorneys for Soraka disagreed, citing a lack of evidence that the test helped identify good versus poor risks for employment. They noted that about 800 of the 2,500 applicants were denied employment based solely upon Psychscreen results.

This case illustrates that the psychometric soundness of an instrument is not the only criterion in test selection. In addition, test users must show that the instrument is relevant to their application. Furthermore, issues of acceptability to prospective examinees must be considered.

Personnel testing is particularly sensitive because the consequences of an adverse decision are often grave: The applicant does not get the job, or an employee does not get the desired promotion or placement. Recognizing that employment testing performs a sensitive function as gatekeeper to economic advantage, Congress has passed laws sharply regulating the use of testing. The courts have also rendered decisions that help define unfair test discrimination. In addition, regulatory bodies have published guidelines that substantially impact testing practices. We will provide a current perspective on the regulation of personnel testing by tracing the development of laws, regulations, and major court cases.

It may surprise the reader to learn that employment testing has raised legal controversy only in the last 35 years (Arvey & Faley, 1988). During this period, several definitive court decisions and pathbreaking governmental directives have helped define current legal trends. These landmarks are depicted in Table 12.2, beginning with the Civil Rights Act of 1964, proceeding through the federal regulations of the Equal Employment Opportunity Commission (EEOC), and concluding with very recent court cases and legislative developments. We will review these landmarks in chronological order.
Table 12.2 Major Legal Landmarks in Employment Testing

Early Court Cases and Legislation. During the presidency of Lyndon Johnson, Congress passed the Civil Rights Act of 1964. This early civil rights legislation had a profound effect on employee-testing procedures. In addition to broad provisions designed to prevent discrimination in many social contexts, Title VII of this act prohibits employment practices that discriminate on the basis of race, color, religion, sex, or national origin. The act established several important general principles relevant to employment testing (Cascio, 1987):

• Discriminatory preference for any group, minority or majority, is barred by the act.
• The employer bears the burden of proof that all requirements for employment, including test scores, are related to job performance.
• Professionally developed tests used in personnel testing must be job related.
• In addition to open and deliberate discrimination, the law forbids practices that are fair in form but discriminatory in operation.
• Intent is irrelevant: the plaintiff need not show that discrimination was intentional.
• In spite of these proscriptions, job-related tests and other measuring devices are deemed both legal and useful.

The 1964 legislation also created the Equal Employment Opportunity Commission (EEOC) to develop guidelines defining fair employee-selection procedures. The initial guidelines, published in 1966, were vague. Later revisions of these guidelines, including the Uniform Guidelines on Employee Selection (1978), were quite specific and have been used by the courts to help resolve legal disputes regarding employment-testing practices (see the following section).

The 1964 Myart v. Motorola case marked the first involvement of the courts in employment testing. The issues raised by this landmark case are still reverberating today. Leon Myart was an African American applicant for a job at one of Motorola’s television assembly plants. Even though he had highly relevant job experience, Mr. Myart was refused a position because his score on a brief screening test of intelligence fell below the company cutoff. Claiming racial discrimination, he filed an appeal with the Illinois Fair Employment Practices Commission. The state examiner found in favor of the complainant and directed that the Motorola company should offer Mr. Myart a job. In addition, the examiner ruled that the particular test should not be used in the future and that any new test should “take into account the environmental factors which contribute to cultural deprivation.” In essence, the examiner concluded that Motorola’s employment-testing practices were unfair because they acted as a barrier to the employment of culturally deprived and disadvantaged applicants. Even though the case was later overturned for lack of evidence, Myart v. Motorola did set the precedent to hear such complaints in the court system (Arvey & Faley, 1988).

Advent of EEOC Employment Testing Standards. During the 1970s, several court cases helped shape current standards and practices in employment testing. The focus of Griggs v. Duke Power Company (1971) was the use of tests (in this case the Wonderlic Personnel Test and the Bennett Mechanical Comprehension Test) as eligibility criteria for employees who wanted to transfer to other departments. In particular, employees at Duke Power Company who lacked a high school education could qualify for transfer if they scored above the national median on both tests. This policy appeared to discriminate against African American employees since it was disproportionately difficult for them to gain eligibility for transfer. However, lower courts found no discriminatory intent and therefore found in favor of the power company.

In 1971, the Supreme Court reversed the lower court findings, ruling against the use of tests without their validation. The decision emphasized several points of current relevance (Arvey & Faley, 1988):

• Fairness in employment testing is determined by consequences, not motivations.
• Testing practices must have a demonstrable link to job performance.
• The employer has the burden of showing that an employment practice such as testing is job related.
• Diplomas, degrees, or broad testing devices are not adequate as measures of job-related capability.
• The EEOC testing standards deserve considerable deference from employment testers.

These employment testing guidelines were further refined in a 1973 court decision, United States v. Georgia Power Company. In this case, the Georgia Power Company presented a validation study to support its employment-testing practices when its policies were shown to have an adverse impact upon the hiring and transferring of African Americans. However, the validation study was weak, in part because it was based upon multiple discriminant analysis, a complex statistical technique rarely used for this purpose. The courts ruled that the validation study was inadequate since it did not adhere to EEOC guidelines for evaluating validity studies. This finding ensconced the EEOC guidelines as virtually the law of the land in employment-testing practices.

Several other court cases in the 1970s and 1980s also served to strengthen the authority of EEOC testing guidelines. These cases were quite complex and involved multiple issues in addition to those cited here. In Albemarle v. Moody (1975), the Supreme Court deferred to EEOC guidelines in finding that subjective supervisory ratings are ambiguous and, therefore, constitute a poor basis for evaluating the validity of an employment selection test. The central issue in Washington v. Davis (1976) was whether performance in a training program (as opposed to actual on-the-job performance) was a sufficient basis for determining the job-relatedness of the employment selection procedures. In this case, the Supreme Court ruled that performance in a police officer training program was a sufficient criterion against which to validate a selection test.

In State of Connecticut v. Teal, the U.S. Supreme Court sided with four African American state employees who had failed a written test that was used to screen applicants for the position of welfare eligibility supervisor. The workers claimed unfair discrimination, noting that only 54 percent of minority applicants passed, compared to 80 percent for whites. In its defense, the state of Connecticut argued that discrimination did not exist, since 23 percent of the successful African American applicants were ultimately promoted, compared to 14 percent for whites. The Court was not impressed with this argument, noting that Title VII of the 1964 Civil Rights Act was specifically designed to protect individuals, not groups. Thus, any unfairness to an individual is unacceptable. Further analysis of fair employment

legalistic manner. However, the existence of several sets of competing guidelines was confusing, and strong pressures were exerted upon the involved parties to forge a compromise. These efforts culminated in a consensus document known as the 1978 Uniform Guidelines on Employee Selection. The Uniform Guidelines quickly earned respect in court cases and were frequently cited in the resolution of legal disputes. The new guidelines contain interpretation and guidance not found in earlier versions, particularly regarding adverse impact, fairness, and the validation of selection procedures, as discussed later.

The Uniform Guidelines provide a very specific definition of adverse impact. In general, when selection procedures favor applicants from one group (usually males or whites), the basis for selection is said to have an adverse impact on other groups (usually females or nonwhites) with a lower selection proportion. The Uniform Guidelines define adverse impact with a four-fifths rule. Specifically, adverse impact exists if one group has a selection rate less than four-fifths of the rate of the group with the highest selection rate. For example, consider an employer who has 200 applicants in a year, 100 African American and 100 white. If 120 persons were hired, including 80 whites and 40 African Americans, then the percentage of whites hired is 80 percent (80/100), whereas the percentage of African Americans hired is 40 percent (40/100). Since the selection rate for African Americans is only half that of whites (40 percent/80 percent = 0.50, well below the four-fifths threshold of 0.80), the employer might be vulnerable to charges of adverse impact. We should note that the Uniform Guidelines suggest caution about this rule when sample sizes are small.

The Uniform Guidelines also pay more attention to fairness than previous documents. Fairness is treated in the following manner:

When members of one racial, ethnic, or sex group characteristically obtain lower scores on a selection procedure than members of another group, and the differences are not reflected in differences in a measure of job performance, use of the selection procedure may unfairly deny opportunities to members of the group that obtain the lower scores. Furthermore, in cases where two or more selection procedures are equally valid, the employer is obliged to use the method that produces the least adverse impact.

The Uniform Guidelines also establish a strong affirmative action responsibility on the part of employers. If an
employer finds a substantial disparity in persons hired
court cases can be found in Arvey and Faley (1988), Cascio
from a subgroup compared to their availability in the job
(1987), Kleiman and Faley (1985), and Russell (1984).
market, several corrective steps are recommended. These
Uniform Guidelines on Employee Selec- corrective measures include specialized recruitment pro-
tion During the 1970s, several federal agencies and pro- grams designed to attract qualified members of the group
fessional groups proposed revisions and extensions of the in question, on-the-job training programs so that affected
existing EEOC employment testing guidelines. The revi- minorities do not get locked into dead-end jobs, and a
sions were developed in response to court decisions that revamping of selection procedures to reduce or eliminate
had interpreted EEOC guidelines in a narrow, inflexible, exclusionary effects.
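
The arithmetic behind the four-fifths rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the group labels, and the dictionary layout are assumptions made for the example and are not part of the Uniform Guidelines. The applicant and hiring counts are the ones given in the worked example in the text.

```python
from typing import Dict

# Threshold named in the text: a group's selection rate should be at least
# four-fifths (0.8) of the highest group's selection rate.
FOUR_FIFTHS = 0.8


def selection_rates(hired: Dict[str, int], applicants: Dict[str, int]) -> Dict[str, float]:
    """Selection rate for each group = number hired / number of applicants."""
    return {group: hired[group] / applicants[group] for group in applicants}


def impact_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's selection rate divided by the highest selection rate."""
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


def flags_adverse_impact(rates: Dict[str, float], threshold: float = FOUR_FIFTHS) -> Dict[str, bool]:
    """True for any group whose ratio to the highest rate falls below the threshold."""
    return {group: ratio < threshold for group, ratio in impact_ratios(rates).items()}


# Counts from the worked example in the text (labels are hypothetical keys).
applicants = {"white": 100, "african_american": 100}
hired = {"white": 80, "african_american": 40}

rates = selection_rates(hired, applicants)   # 80/100 and 40/100
ratios = impact_ratios(rates)                # each rate relative to the highest rate
flags = flags_adverse_impact(rates)          # which groups fall below four-fifths

for group in applicants:
    print(group, round(rates[group], 2), round(ratios[group], 2), flags[group])
```

With these counts the ratio for African American applicants is one-half, well below four-fifths, which matches the conclusion in the text that the employer might be vulnerable to charges of adverse impact. As the text notes, the Uniform Guidelines suggest caution about the rule when sample sizes are small, so a flag from a sketch like this is a prompt for closer review rather than a finding.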
362 Chapter 12

Finally, the guidelines provide specific technical standards for evaluating validity studies of employee selection procedures. The courts will almost certainly consult these Uniform Guidelines if employees bring suit against the company for alleged unfairness in employee selection practices. Thus, it is a foolish employer who does not pay special attention to these technical criteria. For example, one criterion concerns the use of performance scores obtained during training programs:

Where performance in training is used as a criterion, success in training should be properly measured and the relevance of the training should be shown either through a comparison of the content of the training program with the critical or important work behavior(s) of the job(s), or through a demonstration of the relationship between measures of performance in training and measures of job performance.

Thus, preemployment evaluation of job candidates in a training program may constitute a valid method of employee selection, but only if a strong link exists between the task demands of training and the requirements of the actual job.

The Uniform Guidelines contain many other criteria that we cannot review here. We urge the reader to read this fascinating and influential document which is often cited in court cases on employment discrimination.

Legal Implications of Subjective Employment Devices

In many corporations, promotions are based upon the subjective judgment of senior managers. A common practice is for one or more managers to interview several qualified employees and offer a promotion to the one candidate who appears most promising. The selection of this candidate is typically based on subjective appraisal of such factors as judgment, originality, ambition, loyalty, and tact. Until recently, these subjective employment devices appeared to be outside the scope of fair employment practices codified in the Uniform Guidelines and other sources.

However, in a civil rights case, Watson v. Fort Worth Bank and Trust (1988), the Supreme Court made it easier for employees to prove charges of race or sex discrimination against employers who use interview and other subjective assessment devices for employee selection or promotion. We outline the factual background of this important case before discussing the legal implications (Bersoff, 1988).

Clara Watson, an African American employee at Fort Worth Bank and Trust, was rejected for promotion to supervisory positions four times in a row. Each time, a white applicant received the promotion. Watson obtained evidence showing that the bank had never had an African American officer or director, had only one African American supervisor, and paid African American employees lower salaries than equivalent white employees. Furthermore, all supervisors had to receive approval from a white male senior vice president for their promotion decisions. The bank did not dispute that it made hiring and promotion decisions solely on the basis of subjective judgment. When an analysis of promotion patterns confirmed statistically significant racial disparities, Watson brought suit against the bank.

Two legal theories were available for Watson to litigate her claim under Title VII of the 1964 Civil Rights Act. The two theories are called “disparate treatment” and “disparate impact.” A disparate treatment case is more difficult to litigate, since the plaintiff must prove that the employer engaged in intentional discrimination. In a disparate impact case, intention is irrelevant. Instead, the plaintiff need merely show that a particular employment practice—such as using a standardized test—results in an unnecessary and disproportionately adverse impact upon a protected minority.

The lower courts ruled that Watson was restricted to the more limited disparate treatment approach since the employer had used subjective evaluation procedures. Furthermore, the lower courts ruled that the bank had not engaged in intentional discrimination and did have legitimate reasons for not promoting Watson. Nonetheless, the Supreme Court agreed to hear the case in order to determine whether a disparate impact analysis could be applied to subjective employment devices such as interview. Relying heavily upon a brief from the American Psychological Association (APA, 1988), the Supreme Court ruled unanimously that the disparate impact analysis is applicable to subjective or discretionary promotion practices based on interview. In effect, the Court ruled that subjective employment devices such as interview can be validated. Thus, employers do not have unmonitored discretion to evaluate applications for promotion based on subjective interview. As a consequence of Watson v. Fort Worth Bank and Trust, employers must be ready to defend all their promotion practices—including subjective interview—against claims of adverse impact.

Recent Developments in Employee Selection

Recent court cases also have impacted personnel testing. The issue in Soraka v. Dayton Hudson was whether corporations can use a personality test as a basis for preemployment screening for mental health problems in job applicants. As discussed previously, Soraka was required to take the Rodgers Psychscreen as part of the application process for a position as security guard. The Psychscreen is a true-false personality inventory intended to identify persons with psychological problems such as depression and anxiety. Soraka filed suit against the department store, claiming that individual questions about his sexual practices and religious beliefs were a violation of his civil rights. This case was interesting because it pertained to the value and validity of individual items as opposed to overall test scores. The courts have long held that preemployment testing must have demonstrated relevance to job performance or it cannot be used.
Legal Issues and the Future of Testing 363

However, the courts have not required validity evidence for individual test items. Soraka won his case, which was appealed by Dayton Hudson. In 1993, the company settled out of court. This litigation is summarized in Case Exhibit 12.1 found earlier in this section.

Another recent court case illustrates how litigation will continue to clarify the scope of ADA in regard to psychological testing. In Karraker v. Rent-A-Center (2005), a federal appeals court unanimously invalidated the use of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) as a job screening test, citing ADA restrictions on preemployment medical tests. The defendants argued in vain that their use of the test was solely to measure traits of character and personality such as honesty, preferences, and reliability—all legal under ADA. The appeals court held that the MMPI-2 was designed, at least in part, to reveal mental illness. As such, the effect of using the test was to hurt employment prospects for individuals with a mental disability, a direct violation of ADA.2 The defendants paid a substantial sum to settle a class action suit filed by employees and agreed to stop using the test in California.

2 Oddly enough, in one of those twists so typical of how law is interpreted, it appears that the MMPI-2 still can be used legally in employment settings if the employer makes a conditional offer of employment before requiring that candidates take the test.

Forensic Applications of Assessment

Psychology and the legal system have had a long and uneasy alliance characterized by mistrust on both sides. Within the legal system, lawyers and judges maintain antipathy toward the testimony of psychologists because of a concern that their opinions are based upon “junk science” (or perhaps no science at all) and also because of a belief (not entirely unfounded) that some expert witnesses will profess almost any viewpoint that serves the interests of a defendant. Within the mental health profession, psychologists find the adversarial aspect of courtroom testimony—based upon the expectation of yes-no opinions expressed as virtual certainties—to be an impossible arena in which to pursue the truth about human behavior. As the reader will discover, this essential tension between law and psychology is a constant backdrop that shapes and informs the nature of psychological practice in the courtroom.

For better or for worse, psychologists do testify in court cases, and the focus of their testimony often pertains to the interpretation of psychological tests and assessment interviews. When are test results and psychological opinions based upon them admissible in court? What criteria do judges use in determining whether to admit psychological testimony? Psychologists who represent themselves as experts and who use tests to justify their opinions must have a firm grounding in legal issues that pertain to assessment. In this topic we examine the relevance of legal standards to testimony based upon psychological tests and evaluations. We also explore a few specialized instruments useful in forensic assessment.

The role of the psychological examiner can intersect with the legal system in a multitude of ways. The practitioner might be called upon for the following:

• Evaluation of possible malingering
• Assessment of mental state for the insanity plea
• Determination of competency to stand trial
• Assessment of personal injury
• Specialized forensic personality assessment

These are the primary applications of forensic practice, which we examine here. A variety of additional applications are surveyed in Melton, Petrila, Poythress, and Slobogin (1998).

In addition to meeting the general guidelines for ethical practice required of any clinician, practitioners who offer expert testimony based upon psychological tests will encounter additional standards of practice unique to the U.S. jurisprudence system. We summarize major concerns regarding psychological tests and courtroom testimony here. The reader can find extended discussions of this topic in Melton et al. (1998) and Wrightsman, Nietzel, Fortune, and Greene (2002).

Each of the previously listed topics raises unique questions about the role of the psychologist in the courtroom. However, one issue is common to all forms of courtroom testimony: When is a psychologist an expert witness? We discuss this general issue before returning to specific applications of psychological evaluation that intersect with the U.S. legal system.

Standards for the Expert Witness

Just as psychologists are concerned with issues of standards and competence, so too are lawyers and judges. U.S. jurisprudence has developed various guidelines for courtroom testimony, including several general principles regarding the testimony of an expert witness. These standards are found in Federal Rules of Evidence (1975) and have been upheld by various court decisions. We can summarize the principles of expert testimony as follows:

• The witness must be a qualified expert. Not all psychologists who are asked to testify will be allowed to do so. Based on a summary of the expert’s education, training, and experience, the judge decides whether the testimony of the witness is to be admitted.
• The testimony must be about a proper subject matter. In particular, the expert must present information beyond the knowledge and experience of the average juror.
• The value of the evidence in determining guilt or innocence must outweigh its prejudicial effect. For example, if the expert’s testimony might confuse the issue at
hand or might prejudice the members of the jury, it is generally not admissible.
• The expert’s testimony should be in accordance with a generally accepted explanatory theory. In most courts, guidance on this matter is provided by Frye v. United States, a 1923 court case pertaining to the admissibility of expert testimony.

In Frye v. United States, the counsel for a murder defendant attempted to introduce the results of a systolic blood pressure deception test. The lawyer offered an expert witness to testify to the result of the deception test. It was asserted that emotionally induced activation of the sympathetic nervous system causes systolic blood pressure to rise gradually if the examinee attempts to deceive the examiner. In other words, the expert witness asserted that in the course of an interrogation about a crime, the pattern of change in systolic blood pressure could be used as a form of lie detector test. The defense counsel wanted their expert witness to testify in support of the client’s innocence. Counsel for the prosecution objected, and the Court of Appeals of the District of Columbia upheld the objection, ruling:

While courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs. (cited in Blau, 1984)

The court concluded that the systolic blood pressure deception test had not gained acceptance among physiological and psychological authorities and, therefore, refused to allow the testimony of the expert witness.

According to these guidelines, a test, inventory, or assessment technique must have been available for a fairly long period of time in order to have a history of general acceptance. For this reason, the prudent expert witness will choose well-established, extensively researched instruments as the basis for testimony, rather than relying upon recently developed tests that might not stand up to cross-examination under the constraints of Frye v. United States.

In the mid- to late 1990s, the standards for expert testimony were refined further, beginning with a Supreme Court decision in Daubert v. Merrell Dow Pharmaceuticals (1993). The Court’s written opinion added extensive guidelines about factors to be considered in weighing scientific testimony in trials. Two additional court cases (General Electric Co. v. Joiner, 1997; Kumho Tire Co., Ltd. v. Carmichael, 1999) further extended the parameters of expert testimony defined by Daubert. Sometimes known as the Daubert trilogy, these three cases generated several new guidelines that trial judges may use in determining the admissibility of expert testimony (Grove & Barden, 1999):

1. Is the proposed theory (or technique), on which the testimony is to be based, testable?
2. Has the proposed theory (or technique) been tested using valid and reliable procedures and with positive results?
3. Has the theory (or technique) been subjected to peer review?
4. What is the known or potential error rate of the scientific theory or technique?
5. What standards controlling the technique’s operation maximize its validity?
6. Has the theory (or technique) been generally accepted as valid by a relevant scientific community?
7. Do the expert’s conclusions reasonably follow from applying the theory (or technique) to this case?

The ramifications of the Daubert trilogy rulings for the expert testimony of psychologists are unclear at this time. For example, it is uncertain whether testimony based upon the Rorschach Inkblot Test (discussed earlier in this text) would be admissible under these newer, more restrictive guidelines (Grove, Barden, Garb, & Lilienfeld, 2002; Ritzler, Erard, & Pettigrew, 2002). What is clear at this point is that judges generally have tightened the standards for admitting expert evidence in U.S. courts (Dixon & Gill, 2002). For example, some courts have used the Daubert ruling as a basis for denying testimony from mental health professionals, including psychologists. In some courts, testimony about psychological evaluations of sexually abused children has been ruled inadmissible. Increasingly, courts will demand that testimony from psychologists has a strict scientific basis (Melton et al., 1998).

The Nature of Forensic Assessment

Before we turn to specific applications, it will prove helpful to explore crucial differences between forensic assessment and traditional assessment. The most general divergence is that forensic assessment is molded by the prerequisites of the legal system, whereas traditional assessment is shaped by the needs of the client and current professional standards. Although the two approaches occasionally will look the same—for example, the examiner might use the MMPI-2 in both cases—the types of information sought, the strategies for gathering it, and the manner of report writing will be noticeably different.

One major difference is the scope of evaluation in traditional and forensic assessment (Melton et al., 1998). Whereas traditional assessment usually is broadscale and provides a comprehensive picture of a client’s functioning and treatment needs, forensic assessment engages a narrow focus that may not even appear to be “clinical” in nature. For example, when evaluating a forensic client for competency to stand trial, the client’s symptom pattern, mental status, diagnosis, and so forth are only of tangential interest. What matters most, and what the legal system will want to know, is whether the client meets the criteria for competency or not, as discussed later in this topic. Lawyers
and judges will prefer and expect a “yes” or “no” answer to the competency question, and they may regard a lengthy description of symptoms as so much drivel.

Another huge difference has to do with the client’s role in the process. Whereas in traditional assessment the client voluntarily agrees to an assessment and may even help determine its scope and nature, in forensic assessment the client really has little choice in the matter, unless he or she wants to aggravate the judge who has authorized the assessment. In fact, in forensic assessment the “client” is not really the client! The psychologist is working at the behest of a judge or lawyer. Put simply, a judge, lawyer, or other court officer is usually the real client. It would be more accurate to refer to the individual undergoing the assessment as the “examinee.”

Threats to validity also differ in the two settings. Although it is true that clients in traditional assessment may want to present themselves in a good light and, therefore, distort the truth when responding, this pales in comparison to the blatant faking of psychopathology (malingering) that may occur in a forensic setting. Malingering is discussed in more detail later in this topic. For now, consider the true case of an inmate evaluated by the author. He complained that he was “seeing things” and that he needed “medication” to sleep better. When asked to describe what he was “seeing,” he was blandly inarticulate. During the interview, he appeared calm and collected. Furthermore, his personality test profile (MMPI-2) was a mountain range of scale elevations, a classic fake-bad profile. Beyond a doubt, he was fabricating his symptoms in hopes of receiving a prescription for antianxiety medications.

Finally, it is important to mention that the nature of the written report will differ in the two settings. In traditional assessment, the audience typically is other professionals who are familiar with jargon, diagnostic terminology, and treatment options. In forensic assessment, the audience is legal personnel who care mainly about the referral question. Melton et al. (1998) describe a number of pertinent qualities for forensic reports; namely, reports should separate facts from inferences, stay within the scope of the referral question, avoid information overkill, and minimize clinical jargon. Melton and colleagues also note that the clinician must take special care to protect the privacy rights of individuals mentioned in a report, insofar as this information most certainly will become part of the public record. Clinicians must write forensic reports with great care:

Finally, and most important, the report and the clinician who writes it will, or at least should, receive close scrutiny during adversary negotiations or proceedings. A well-written report may obviate courtroom testimony. A poorly written report may become, in the hands of a skillful lawyer, an instrument to discredit and embarrass its author. Therefore, attention to detail and to the accuracy of information is required. (Melton et al., 1998, p. 523)

Woe to the forensic examiner who writes a sloppy report that becomes part of court proceedings. Not only might this unnecessarily confuse the court case, but it also could result in literally hours of ill-mannered and humiliating cross-examination of the report writer.

Evaluation of Suspected Malingering

In most settings, a psychologist safely can assume that clients will be reasonably honest about their mental and emotional state. Clients want to tell their stories and they want to get things right. At worst, they may overstate symptoms slightly so as to impress the clinician that help truly is deserved and needed. Yet outright deception and manipulation are uncommon—for the simple reason that clients rarely have incentive for these strategies.

However, the rules of clinical engagement are turned upside-down in forensic settings. The typical forensic client has much to gain from a case formulation that emphasizes illness and disability. Indeed, the context of the assessment almost guarantees that clients will seek to look “crazy” or disabled, whether by exaggeration or (more rarely) deceptive design. In the mind of the forensic client, fabrication of symptoms may serve to excuse unacceptable behavior (e.g., favoring the insanity plea), sway sentencing recommendations (e.g., against capital punishment), or gain entitlements (e.g., certification for disability). These client maneuvers clearly influence the validity of forensic assessments. Hovering in the background of every forensic assessment is this troubling question: Was the client reasonably honest and forthright?

The forensic examiner must make a judgment about the honesty of the client’s self-portrayal during the evaluation. And yet while common sense dictates that the examiner should expect some degree of deception, the conclusion that a client has consciously malingered needs to be reached with caution:

Given the significant potential for deception and the implications for the validity of their findings, mental health professionals should develop a low threshold for suspecting deceptive responding. At the same time, because the label of “malingerer” may carry considerable weight with legal decisionmakers and potentially tarnish all aspects of the person’s legal position, conclusions that a person is feigning should not be reached hastily. (Melton et al., 1998)

The most common and venerable method for identifying dishonest clients is the clinical interview. However, a more objective approach should be preferred. The assessment of potential malingering with interview hinges upon the judgment of the clinician (e.g., “This client is inconsistent in his presentation of symptoms and appears eager to be sick, so I conclude that he is malingering”), which may prove erroneous. In contrast, an objective approach provides normative data, hit rates, and the like for the evaluation. Not
only might this improve the accuracy of the assessment, but more standardized approaches should also find greater acceptability in many court systems.

According to DSM-IV, malingering is defined as:

… the intentional production of false or grossly exaggerated physical or psychological symptoms, motivated by external incentives such as avoiding military duty, avoiding work, obtaining financial compensation, evading criminal prosecution, or obtaining drugs. (American Psychiatric Association, 1994, p. 683)

The incidence of malingering among referred clients is hard to pin down, although thought to be significant, at least in forensic settings. Forensic practitioners estimate the occurrence to be 15 to 20 percent of their cases (Rogers, 1986; Rogers, Sewell, & Goldstein, 1994). For reasons of justice and fairness, the detection of malingering with empirically validated procedures is an important obligation of forensic psychologists.

Several promising tests of malingering have emerged in recent years (Rogers, 2008). For reasons of space, we will focus here on three procedures that illustrate the breadth of approaches available: the Structured Interview of Reported Symptoms (SIRS), the Test of Memory Malingering (TOMM), and certain MMPI-2 indices.

Structured Interview of Reported Symptoms (SIRS)

One promising instrument is the Structured Interview of Reported Symptoms (SIRS), a 172-item interview schedule designed expressly for the evaluation of malingering (Rogers, Bagby, & Dickens, 1992). The approach embodied in the SIRS was based on strategies identified in the clinical literature as potentially useful for detecting malingering. Using a structured interview method, malingering is assessed on eight primary scales:

• Rare Symptoms (overreporting of infrequent symptoms)
• Symptom Combinations (real psychiatric symptoms that rarely occur together)
• Improbable or Absurd Symptoms (symptoms reveal a fantastic quality)
• Blatant Symptoms (overendorsement of obvious signs of mental disorder)
• Subtle Symptoms (overendorsement of everyday problems)
• Severity of Symptoms (symptoms portrayed with extreme, unbearable severity)
• Selectivity of Symptoms (indiscriminate endorsement of psychiatric problems)
• Reported versus Observed Symptoms (comparison of observed and reported symptoms)

In addition to the eight primary scales, five supplementary scales are used to interpret response styles. Of the 172 questions, 32 are repeated inquiries to detect inconsistency of responding. Examples of the kinds of structured interview questions include: “Do you ever feel like the fillings in your teeth can pick up radio messages?” (Rare Symptoms); “Do you have severe headaches at the same time as you have a fear of germs?” (Symptom Combinations); “Does the furniture where you live seem to get bigger or smaller from day to day?” (Improbable or Absurd Symptoms); “Do you have any serious problems with thoughts about suicide?” (Blatant Symptoms). The scale takes less than an hour to administer.

Results allow for classification of examinees as definite feigning, probable feigning, and honest. Reliability of the instrument is good, with internal-consistency reliability coefficients for subscales ranging from .66 to .92. Interrater reliability estimates are superb, ranging from .89 to 1.00.

Although the validity of the SIRS can be discussed along the familiar lines of content, criterion-related, and construct validity (and the test performs well in these domains), the real measure of its clinical utility pertains to the capacity of the test to discriminate known or suspected malingerers from psychiatric patients and normal controls. One recent study indicates that the test performs well in this capacity (Gothard, Viglione, Meloy, & Sherman, 1996). In a mixed sample of 125 males referred for competency evaluation (including 30 persons asked to simulate malingering, 7 individuals strongly suspected of malingering, and 88 persons for whom malingering appeared unlikely), the SIRS was overall 97.8 percent accurate in classifying participants as malingered or nonmalingered.

In a review and meta-analysis of the SIRS, Green and Rosenfeld (2011) found that studies published since the initial validation demonstrate higher sensitivity (correct detection of those known to be malingerers) but lower specificity (correct identification of those known to possess real psychological difficulties). In other words, genuine patient samples are more likely to be misclassified as feigning than nonclinical samples.

Another concern about the SIRS is the comparative lack of research with populations in the criminal justice system, where the instrument often is used. This population is relatively uneducated, and minorities are heavily overrepresented, constituting 57 percent of the jail population nationwide (tables from www.census.gov). By one estimate, more than 80 percent of the urban jail population in the United States is African American (Dixon, 1995). How well does SIRS perform with incarcerated populations? In a large sample of jail inmates, McDermott and Sokolov (2009) reported that 66 percent of respondents scored in the malingering range on the test. But inmates designated as malingering in their charts were no more likely than others to score in the malingering range, raising questions about the validity of the test, the accuracy of clinical judgments, or both. Based on a study of 43 individuals with intellectual
disability, Weiss, Rosenfeld, and Farkas (2011) recommend caution with the SIRS because a
high proportion of their sample was incorrectly classified as feigning.

Test of Memory Malingering (TOMM) The TOMM is a 50-item visual recognition test that
includes two learning trials and an optional retention trial (Tombaugh, 1997). The secret to
the test is that it appears to be difficult, while actually it is quite easy. As a result,
malingerers encounter an enticing opportunity to perform poorly, whereas others complete the
task with near-perfect scores. In the first learning trial, 50 line drawings are presented
to the individual for 3 seconds each. Then, in the first test phase each stimulus is
presented alongside three distractor drawings; the examinee is asked to pick each item shown
previously. Of course, the position of each correct drawing alongside the three distractor
choices is varied randomly. A second learning trial then ensues, with a second test phase. A
delayed retention trial, consisting only of the test phase, is administered after a
20-minute delay. The results for the TOMM consist of the number of correct choices (out of
50) on Trial 1, Trial 2, and the Retention test.

Although there are several ways to summate and interpret TOMM test scores, the most common
approach is to utilize a cutting score of 44/45 on the Trial 2 outcome. In other words,
scores of 45 or higher on Trial 2 are considered normal, whereas scores of 44 or lower
indicate the likelihood of malingering. This interpretive strategy emerged from several
studies indicating that individuals who have no motivation to malinger (whether they are
normal adults or patients with brain impairment) rarely score lower than 45 on the second
trial. In contrast, individuals with motivation to malinger (e.g., brain-injured persons
involved in litigation, or others who have something to gain from poor test performance)
often score well below 45. We have summarized the score ranges for a few studies in
Table 12.3.

The three samples portrayed in this table include a large nonclinical sample of 405 adults
(Tombaugh, 1997), a small sample of 22 individuals with confirmed traumatic brain injury
(TBI) but no motivation to malinger (i.e., no pending litigation), and a small sample of 28
individuals with mild head injury seeking compensation and therefore possessing a strong
motivation to malinger (last two samples from Haber & Fichtenberg, 2006). The reader will
notice that 99 percent of the intact adults scored between 45 and 50, and 100 percent of the
TBI patients with no motivation to malinger scored in this normal range. In contrast, a
whopping 64 percent of the patients seeking compensation for head injury (and therefore
having motivation to malinger) scored below the cut-off, far worse than the sample with
confirmed brain damage! Recent research confirms the value of the TOMM in a variety of
settings, including clinic-referred pediatric patients (Kirk, Harris, Hutaff-Lee, and
others, 2010), juvenile offenders (Gast & Hart, 2010), and Spanish-speaking TBI populations
(Strutt, Scott, Lozano, Tieu, & Peery, 2012).

Assessment of Mental State for the Insanity Plea In criminal trials the defendant may
invoke a variety of defenses including entrapment, diminished capacity (e.g., from mental
subnormality), automatism (e.g., from hypnotic suggestion), and the insanity plea. Whenever
a special defense is invoked, an evaluation of the defendant’s mental state at the time of
the offense (MSO) is required. In some courts, a psychologist is qualified to offer opinions
about the MSO of a defendant. We restrict the discussion here to the insanity plea since
this is the most common doctrine that would trigger the need for an MSO evaluation.

Almost everyone is familiar with the insanity defense, but only the exceptional person
understands its provisions. Technically, the insanity defense is known as not guilty by
reason of insanity (NGRI). Based on a few sensational and widely publicized trials such as
the case of John Hinckley, who attempted to assassinate President Ronald Reagan, the lay
public generally has concluded that the insanity defense is commonly employed by cynical
lawyers to help dangerous clients evade legal responsibility for heinous crimes. Nothing
could be further from the truth. In reality, the NGRI plea is widely respected by
jurisprudence experts and is invoked in fewer than 1 in 1,000 trials (Blau, 1984). And in
this tiny fraction of all criminal cases, the defense succeeds less than 1 time in 4 (Melton
et al., 1998). The widespread belief that persons found NGRI “walk” away from their crimes
also is inaccurate: Most receive hospital

Table 12.3 Score Ranges for the Test of Memory Malingering

Percent Obtaining TOMM Trial 2 Score of: 50 49 48 47 46 45 40–44 30–39 <30

Samples:
Intact adults no motive (N = 405)a 91 7 1 1 1
Definite TBI with no motive (N = 22)b 73 14 5 5 5
Possible TBI with motive (N = 28)b 25 4 0 7 0 0 18 25 21

a Based on data from Tombaugh (1997)
b Based on data from Haber & Fichtenberg (2006)
NOTE: TBI = traumatic brain injury. Percentages are rounded off and row totals therefore do not equal exactly 100 percent.
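The 44/45 cutting score on Trial 2 amounts to a one-line decision rule. The short Python sketch below is illustrative only: the function and variable names are hypothetical and are not part of the TOMM materials, and a score below the cut-off signals only a likelihood of malingering, not a conclusion. The percentages are copied from the third row of Table 12.3 (possible TBI with motive), the one row for which every score band is reported.

```python
# Illustrative sketch only; all names here are hypothetical.
TRIAL2_CUTOFF = 45  # Trial 2 scores of 45 or higher are considered normal


def below_cutoff(trial2_score: int) -> bool:
    """Return True when a Trial 2 score (0 to 50) is 44 or lower."""
    if not 0 <= trial2_score <= 50:
        raise ValueError("TOMM Trial 2 scores range from 0 to 50")
    return trial2_score < TRIAL2_CUTOFF


# Percent of the 'possible TBI with motive' sample (N = 28) in each
# Trial 2 score band, as listed in Table 12.3.
possible_tbi_with_motive = {
    "50": 25, "49": 4, "48": 0, "47": 7, "46": 0, "45": 0,
    "40-44": 18, "30-39": 25, "<30": 21,
}

# Bands that fall entirely below the 44/45 cutting score.
BELOW_CUTOFF_BANDS = ("40-44", "30-39", "<30")
percent_below = sum(possible_tbi_with_motive[band] for band in BELOW_CUTOFF_BANDS)

print(below_cutoff(44), below_cutoff(45))  # True False
print(percent_below)  # 64
```

The sum of the three lowest bands reproduces the 64 percent figure cited in the text for patients seeking compensation.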

treatment that lasts several years. Recidivism rates are perhaps lower (and certainly not
higher) than those of felons convicted of similar offenses (Melton et al., 1998). Even
though outlawed in some states, the insanity defense has shown remarkable resiliency,
probably because it performs a desirable role in a modern and compassionate society.

Several legal tests for insanity have had significant influence in the United States,
including the M’Naughten rule, the Durham rule, the Model Penal Code rule, and the Guilty
But Mentally Ill (GBMI) verdict (Wrightsman et al., 2002). Some jurisdictions include
irresistible impulse as a supplement to the M’Naughten rule. A few states have abolished the
insanity defense altogether. We will survey the different standards briefly before
commenting upon the role of psychological tests in determining legal insanity.

The M’Naughten rule is the oldest, stemming from an 1843 case in England. Daniel M’Naughten
was plagued by paranoid delusions that the prime minister, Robert Peel, was part of a
conspiracy against him. M’Naughten stalked the prime minister and, in a case of mistaken
identity, shot his male secretary at No. 10, Downing Street. M’Naughten was found not guilty
by reason of insanity, a verdict that touched off a national furor. In response to the
furor, Queen Victoria commanded all 15 high judges of England to appear before the House of
Lords and clarify the newly forged guidelines on insanity. The M’Naughten rule states:

The jury ought to be told in all cases that every man is to be presumed to be sane, and to
possess a sufficient degree of reason to be responsible for his crimes, until the contrary
be proved to their satisfaction; and that to establish a defense on the grounds of insanity
it must be clearly proved that, at the time of committing the act, the accused was laboring
under such a defect of reason, from disease of the mind, as not to know the nature and
quality of the act he was doing, or, if he did know it, that he did not know what he was
doing was wrong.
(cited in Wrightsman et al., 1994)

Thus, the M’Naughten rule “excuses” criminal behavior if the defendant, as a consequence of
a “disease of the mind,” did not know what he or she was doing (e.g., a paranoid
schizophrenic who believed he or she was shooting the literal devil) or did not know that
what he or she was doing was wrong (e.g., a person with mental retardation who believed that
it was acceptable to shoot an obnoxious panhandler). Approximately half of the states use
the M’Naughten rule.

Some jurisdictions also allow “irresistible impulse” as a supplement to the M’Naughten rule.
An irresistible impulse is generally defined as a behavioral response that is so strong that
the accused could not resist it by will or reason. But when is an impulse irresistible as
opposed to simply unresisted? This has proved difficult to define. For obvious reasons,
legal experts are unhappy with the notion of irresistible impulse, and its use as part of an
insanity plea appears to be waning.

The Durham rule was formulated in 1954 by the District of Columbia Federal Court of Appeals
in Durham v. United States. Dissatisfied with the M’Naughten rule, Judge David Bazelon
proposed a new test, known as the Durham rule, which provided for the defense of insanity if
the criminal act was a “product” of mental disease or defect. The purpose of the Durham rule
was to give mental health professionals a wider latitude in presenting information pertinent
to the defendant’s responsibility. Legal scholars hailed Durham as a great step forward, but
in 1972 the rule was dropped by the circuit that had formulated it.

The Durham rule was replaced by the Model Penal Code rule proposed by the American Law
Institute. Adopted in 1972, the Model Penal Code rule is as follows:

A person is not responsible for criminal conduct if at the time of such conduct, as a result
of mental disease or defect, he lacks substantial capacity either to appreciate the
criminality (wrongfulness) of his conduct or to conform his conduct to the requirements of
the law.
(cited in Melton et al., 1998)

The Model Penal Code rule also contains provisions that prohibit the inclusion of the
psychopath or antisocial personality within the insanity defense.

The Model Penal Code rule differs from the M’Naughten rule in three important ways:
1. By using the term appreciate, it acknowledges the emotional determinants of criminal
action.
2. It does not require a total lack of appreciation by offenders for the nature of their
conduct, only a lack of “substantial capacity.”
3. It includes both a cognitive element and a volitional element, making defendants’
inability to control their actions an independent criterion for insanity (Wrightsman et al.,
2002, p. 329).

About 20 states now follow the Model Penal Code rule or slight variants of it.

A recent development in the insanity plea is the Guilty But Mentally Ill (GBMI) verdict.
Approximately one-fourth of the states allow juries to reach a verdict of GBMI in cases in
which the defendant pleads insanity. Typically, in states that allow the GBMI verdict, the
judge instructs the jury to return with one of four verdicts:
• Guilty of the crime
• Not guilty of the crime
• Not guilty by reason of insanity
• Guilty but mentally ill
The intention of the last alternative is that a defendant found GBMI should receive the same
sentence as if found guilty of the crime, but he or she begins the sentence in a

psychiatric hospital. After treatment is completed, the defendant then serves the remainder
of the sentence in a prison.

But the intention of GBMI and its reality are two different things. Initial support for the
GBMI verdict as a humane variant of the insanity plea has waned in recent years. Wrightsman
et al. (2002) point out that jurors express confusion when asked to make the difficult
distinction between mental illness that results in insanity (NGRI) and mental illness that
does not (GBMI). Melton et al. (1998) find little virtue in the verdict:

The GBMI verdict is conceptually flawed, has significant potential for misleading the
factfinder, and does not appear to achieve its goals of reducing insanity acquittals or
prolonging confinement of offenders who are mentally ill and dangerous. The one goal it may
achieve is relieving the anxiety of jurors and judges who otherwise would have difficulty
deciding between a guilty verdict and a verdict of not guilty by reason of insanity. It is
doubtful this goal is a proper one or worth the price. (p. 215)

Empirical studies indicate that offenders found GBMI seldom receive adequate treatment.
Furthermore, they may receive harsher sentences than their counterparts found merely guilty
(Callahan, McGreevy, Cirincione, & Steadman, 1992). In fact, some defendants found GBMI have
been sentenced to death!

Now that the reader has been introduced to variants of the insanity plea, we review the role
of the psychologist in determining legal insanity. An important point is that psychologists
are rightfully cautious in offering an interview-based opinion as to a person’s mental state
at the time of a criminal offense. After all, the crime usually occurred days, weeks,
months, or even years before, and the client may be unable to assist in the accurate
reconstruction of events and mental states. Consequently, psychological testimony regarding
legal insanity should be cautious and conservative. Reliability studies of insanity
evaluations also suggest that caution is appropriate. In a review of seven studies, Melton
et al. (1998) determined that interrater agreement (as to whether a defendant was legally
insane) ranged from a low of 64 percent (between prosecution and defense psychiatrists) to a
high of 97 percent (between psychologists with forensic training who used structured
instruments, discussed later).

In spite of controversy over the role of the psychologist in MSO determinations, some
experts foresee an increased role for psychological assessment in cases involving the
insanity plea. In particular, neuropsychological assessments may provide objective, valid
data to help the courts decide the merits of an insanity defense. Recent court rulings
affirm that neuropsychological test findings can be used to show that a defendant has
impaired capability to choose right and refrain from wrong (Blau, 1984; Heilbrun, 1992).
Martell (1992) has discussed the relevance of neuropsychological assessment to the insanity
plea as defined by the Model Penal Code. The Model Penal Code defines a defendant as not
guilty by reason of insanity if he or she “lacks substantial capacity” to appreciate the
criminality of his or her conduct. Neuropsychological test results have a direct bearing
upon this issue.

Rating scales such as the Rogers Criminal Responsibility Assessment Scales (R-CRAS) also
provide a useful basis for evaluating criminal responsibility (Rogers, 1984, 1986). The
R-CRAS is completed by the examiner immediately following a review of clinical records,
police investigative reports, and the final clinical interview with the patient-defendant.
The instrument consists of clear descriptive criteria for 25 items assessing both
psychological and situational factors. The items are scored with respect to the time of the
crime on five scales measuring these variables:
• Patient Reliability
• Organicity
• Psychopathology
• Cognitive Control
• Behavioral Control
The individual items on the R-CRAS were derived from the Model Penal Code standard of
insanity (Table 12.4).

Table 12.4 Sample Items from the R-CRAS
Source: Adapted and reproduced by special permission of the Publisher, Psychological
Assessment Resources, Inc., Odessa, FL 33556, from the Rogers’ Criminal Responsibility
Assessment Scales by Richard Rogers, Ph.D. Copyright 1984 by PAR, Inc. Further reproduction
is prohibited without permission from PAR, Inc.
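The interrater agreement figures cited in this topic (a low of 64 percent, a high of 97 percent) can be read as the share of cases in which two evaluators reach the same sane-versus-insane conclusion. The Python sketch below shows that calculation under the assumption that simple percent agreement is the statistic in question; the function name is hypothetical and the ratings are invented purely for demonstration, not drawn from any study.

```python
# Illustrative sketch only; the ratings below are invented for demonstration.
def percent_agreement(rater_a, rater_b):
    """Percentage of cases on which two raters give the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Ten hypothetical defendants, each judged independently by two evaluators.
rater_a = ["sane", "sane", "insane", "sane", "insane",
           "sane", "sane", "sane", "insane", "sane"]
rater_b = ["sane", "sane", "insane", "sane", "sane",
           "sane", "sane", "sane", "insane", "sane"]

print(percent_agreement(rater_a, rater_b))  # 90.0
```

Percent agreement does not correct for agreement expected by chance, which is one reason such figures are best interpreted cautiously.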

Interrater reliabilities of the R-CRAS scales ranged from .48 (for a Malingering subscale)
to 1.00 (for Organicity). Construct validity was established by comparing the disposition of
93 legal cases with R-CRAS data. Even though legal outcome is determined by many variables
besides the psychological state of the person at the time of the crime, there was 95 percent
agreement in the determination of sanity and 73 percent agreement in the determination of
insanity.

Even though reviewers recognize the promise of the R-CRAS, for some a healthy skepticism
still prevails. One concern is that the subscales of the instrument represent an ordinal
level of measurement, whereas an interval level of quantification is implied. Another
concern is that the test developers claim to “quantify areas of judgment that are logical
and/or intuitive in nature,” which leads to a false sense of scientific certainty (Melton et
al., 1998). Certainly, the R-CRAS performs a valuable function by helping clinicians
organize their thinking and evaluation. The utility of the overall decision (sane versus
insane) will rest upon additional validational research (Howell & Richards, 1989). In
support of test validity, Rogers and Sewell (1999) reanalyzed 413 insanity cases and found
that the R-CRAS contributed substantially to the determination of criminal responsibility.

Competency to Stand Trial The Sixth Amendment to the U.S. Constitution, passed in 1791,
guarantees every accused citizen the right to an impartial, speedy, and public trial with
benefit of counsel. If the defendant is unable to exercise these constitutional rights for
any reason, then a proper trial cannot take place. Specifically, if the defendant has a
mental defect, illness, or condition that renders him or her unable to understand the
proceedings or to assist in his or her defense, the defendant would be considered
incompetent to stand trial. This standard was confirmed by the U.S. Supreme Court in Dusky
v. United States (1960) as “whether [the defendant] has sufficient present ability to
consult with his lawyer with a reasonable degree of rational understanding, and whether he
has a rational as well as factual understanding of the proceedings against him.” In
practice, competency to stand trial refers to four elements and distinctions (Melton et al.,
1998):
1. The defendant’s capacity to understand the criminal process, including the role of the
participants in that process
2. The defendant’s ability to function in that process, primarily through consulting with
counsel in the preparation of a defense
3. The defendant’s capacity, as opposed to willingness, to relate to counsel and understand
the proceedings
4. The defendant’s reasonable degree of understanding, as opposed to perfect or complete
understanding
Most U.S. courts follow this standard, which emphasizes current functioning of the accused.

The presiding judge may request a psychological or psychiatric evaluation to assist in
determining a defendant’s competency to stand trial. One recent report indicates that more
than 25,000 evaluations of competency to stand trial are performed in the United States each
year (McDonald, Nussbaum, & Bagby, 1992). It is important to emphasize that psychologists,
psychiatrists, and other mental health professionals merely assist in a competency hearing
by presenting expert opinions. Only the judge has the power to make a competency
determination. Although there is no standard format for a competency determination, most
judges request that the psychologist consider most or all of the 11 factors cited in
Table 12.5.

Incompetency to stand trial is entirely separate from legal insanity; these two issues are
judged by completely different standards. Legal insanity pertains to the moment of the
criminal act, whereas incompetency implies a current, ongoing condition. Furthermore,
incompetency is not synonymous with mental illness, although the two may occur together. In
the event that the judge rules the defendant incompetent, the trial is postponed, usually
for a period of six months or so. In some cases, persons found incompetent are placed in a
mental institution for treatment to restore their competency so that a trial can be held
later. Individuals charged with less-serious crimes may receive outpatient treatment.

In addition to information obtained from the clinical interview, psychological test results
are important components of a competency evaluation. For example, a low IQ may constitute
evidence of incompetence in the eyes of the court. Although there are no firm guidelines,
most courts rule that persons with significant intellectual deficits (say, an IQ in the
range of moderate mental retardation or lower) are incompetent to stand trial. Likewise, a
pattern

Table 12.5 Factors Considered in Determining Competency to Stand Trial
1. Defendant’s appreciation of the charges
2. Defendant’s appreciation of the nature and range of penalties
3. Defendant’s understanding of the adversary nature of the legal process
4. Defendant’s capacity to disclose to attorney pertinent facts surrounding the alleged offense
5. Defendant’s ability to relate to attorney
6. Defendant’s ability to assist attorney in planning defense
7. Defendant’s capacity to realistically challenge prosecution witnesses
8. Defendant’s ability to manifest appropriate courtroom behavior
9. Defendant’s capacity to testify relevantly
10. Defendant’s motivation to help himself in the legal process
11. Defendant’s capacity to cope with the stress of incarceration prior to trial
Source: Florida Rules of Criminal Procedure, cited in Wrightsman, L. S., Nietzel, M. T.,
& Fortune, W. H. (1994). Psychology and the legal system (3rd ed.). Pacific Grove, CA:
Brooks/Cole.

of test results indicating severe neuropsychological deficit may warrant a finding of legal
incompetence, even if the client’s IQ is in the normal range. For example, a defendant with
severe stroke-induced deficits in language comprehension may be found incompetent to stand
trial.

Several formalized screening tests and procedures are available to assist in competency
evaluation. Rogers and Johansson-Love (2009) provide an outstanding introduction to
evidence-based practice in evaluating competency to stand trial. We focus our attention here
on the MacArthur Competence Assessment Tool–Criminal Adjudication (MacCAT-CA), one of the
most promising and psychometrically sound of the many tests developed for this purpose
(Hoge, Bonnie, Poythress, & Monahan, 1999).

The MacCAT-CA consists of 22 items grouped into three subscales of psycholegal abilities:
Understanding, Reasoning, and Appreciation. The examiner begins the test by reading a
hypothetical short story to the defendant about two men who get into a fight (one of them is
later charged). The first subscale (8 items) assesses the defendant’s ability to understand
the legal system with questions like “What is the job of the defendant’s lawyer?” These
questions are scored 0, 1, or 2, based on degree of understanding. The second subscale (8
items) evaluates the defendant’s ability to reason in regard to the hypothetical story, and
to evaluate legal options for the hypothetical defendant. The third subscale (6 items)
departs from the imaginary scenario and assesses the defendant’s capacity to understand his
or her legal situation. The questions explore the defendant’s appraisal of how he or she is
likely to function and to be treated during the course of the trial.

The psychometric properties of the MacCAT-CA were evaluated in a study of 729 felony
defendants (Poythress, Monahan, Bonnie, Otto, & Hoge, 2002). The researchers found good
internal consistency for the three subscales, with coefficient alphas of .81 to .88. The
construct validity of the instrument is well supported by confirmatory factor analyses that
yield the three factors posited by the test developers (Zapf, Skeem, & Golding, 2005). In
general, the MacCAT-CA and similar instruments are a useful beginning to a competency
evaluation but should not be the sole method of assessment. Most forensic experts emphasize
the complexities of the legal process and the need to use competency screening instruments
sparingly, and mainly in complex cases, as an adjunct to the clinical interview. The
MacCAT-CA and similar approaches prove less helpful when clients put forth little effort,
demonstrate cognitive impairment, or come from diverse cultural backgrounds (Pinals,
Tillbrook, & Mumley, 2006). Additional competency screening tests are reviewed by Zapf and
Roesch (2009).

A serious concern in competency evaluations is whether the client is malingering. After all,
delaying a trial date for a long time provides a strong motive to appear incompetent.
Clinicians have a variety of methods and tests (described previously) for identifying
clients who might be malingering. Even so, the process of competency evaluation is not
foolproof, as indicated by such high-profile cases as the Connecticut man who avoided
prosecution for murder (Associated Press, June 30, 1998). This individual had allegedly
murdered his former girlfriend and her current boyfriend with a handgun, then shot himself
in the head. He suffered brain damage and partial paralysis and was declared incompetent to
stand trial by four psychiatrists in four separate hearings. They argued that he was
incapable of communicating effectively with his lawyer. A court order that he undergo yearly
competency evaluations was overturned, dropping him through the cracks and leaving him a
free man. Nine years later he was found attending college as a pre-med student with a 3.3
grade point average. Examples like this are reason for humility and caution when
psychologists approach competency evaluations.

Personal Injury and Related Testimony Personal injury, as from an automobile accident, is
often a source of litigation for monetary compensation. In personal injury lawsuits,
attorneys may hire psychologists to testify as to the lifelong consequences of traumatic
stress or acquired brain damage. For example, a clinical neuropsychologist might administer
a comprehensive test battery (see Chapter 10, Neuropsychological Assessment and Screening)
and then testify as to the long-term functional implications of known brain damage.

In general, a consulting psychologist who testifies in court will encounter extremely high
practice standards. We have already mentioned the Frye standard, which provides that
testimony must be based upon tests and procedures that have “gained general acceptance” in
the field. Thus, a test or procedure that is relevant or useful in everyday clinical
practice, but which is not widely accepted in the field, might be greeted with skepticism in
the courts. A judge may even rule that testimony is inadmissible if it is based on tests or
procedures with flimsy validation. Worse yet, the judge may allow such testimony, which
opens the expert witness to criticism and ridicule by opposing attorneys. With these
concerns in mind, Heilbrun (1992) has published guidelines for the practice of forensic
assessment. These include the use of well-documented tests with reliability of .80 or
higher, interpretation with actuarial formulas where available, and evaluation for
malingering, defensiveness, and other reasons to discount the test data.

Increasingly, courts have been willing to compensate mental injuries in addition to physical
injuries. The damage is variously referred to as “psychic trauma” or “emotional distress” or
“emotional harm.” The evaluation of emotional injury will rely somewhat on psychological
test results (especially personality tests), but the assessment requires great clinical
skill including “a longitudinal history of the

impairment, its treatment, and attempts at rehabilitation, including the claimant’s
motivation to recover” (Melton et al., 1998, p. 381). We see once again that the question of
malingering haunts most forms of forensic assessment.

Specialized Personality Assessment in Forensic Settings On occasion, psychologists are
asked to provide specialized forms of personality assessment in forensic settings. For
example, a prison psychologist might evaluate an inmate for antisocial tendencies, or a
forensic psychologist might assess a treated pedophile for sexual interest in young
children. The range of tools and techniques useful for specialized assessment is broad. We
cover only one specialized approach here.

In prison settings, examiners have a special interest in determining whether inmates possess
the traits of psychopathic personality. Similar to antisocial personality as described in
DSM-IV (APA, 2000), the concept of psychopathic personality has a long and rich history that
dates back to Emil Kraepelin (1856–1926), the father of diagnostic psychiatry. But it was
the psychiatrist Cleckley (1941) who first provided a detailed description of the psychopath
in his pathbreaking book, The Mask of Sanity. Based upon extensive clinical work with
individuals whom he labeled psychopaths, Cleckley identified a number of personality traits
and behavioral signs displayed by these individuals. The key qualities appear to be a lack
of remorse or shame in a charming individual who uses other people and whose life lacks any
goal or direction. Good at lying, the psychopath also shows poor judgment and is considered
incapable of love.

Although his description is antiquated in places, contemporary researchers continue to find
descriptive and predictive value in Cleckley’s conception of the psychopath. In fact, one
researcher has developed a highly respected and widely used assessment tool based closely on
this original formulation of psychopathic personality.

The Psychopathy Checklist-Revised (PCL-R; Hare, 2003; Hare & Neumann, 2006) consists of a
20-item rating scale carefully designed to assess the qualities of psychopathic personality
in a quantitative and empirical fashion. Prior to filling out the rating scale, the examiner
conducts a lengthy semistructured interview (90 to 120 minutes) with the client. The
interview concerns the Cleckley-based traits of psychopathy, as slightly revised and
expanded by Hare and colleagues (Hare, Harpur, Hakstian, and others, 1990). Each item
reflects a particular symptom, such as glibness and superficial charm, grandiose sense of
self-worth, pathological lying, lack of remorse or guilt, or failure to accept
responsibility. Items are rated on a 3-point scale (0 = doesn’t apply, 1 = applies somewhat,
2 = definitely applies). The rating is based on lifetime functioning rather than the present
state, which explains why a long interview is essential for correct use of the scale. The
item scores are then summed to yield a total score (range of 0 to 40) that reflects the
extent to which the individual resembles the prototypical psychopath. Two factor scores also
can be derived, each based on eight or nine items. Factor 1 reflects the selfish, callous,
and remorseless use of others, whereas factor 2 indicates a chronically unstable and
antisocial lifestyle.

A substantial body of research indicates that the PCL-R possesses strong reliability and
validity. For example, interrater reliability is typically in the .90s, test-retest
coefficients approach .90, and internal consistency coefficients are in the mid- to high
.80s (Schroeder, Schroeder, & Hare, 1983). The predictive validity of the instrument is
bolstered by the capacity of PCL-R scores to predict a variety of antisocial behaviors,
including violent recidivism following prison release, poor response to correctional
treatment programs, and disorderly behavior while in prison (Hare, 1996; Sreenivasan,
Walker, Weinberger, Kirkish, & Garrick, 2008). For example, in one study inmates with high
PCL-R scores were twice as likely to engage in fights and more than three times as likely to
be belligerent as other inmates (Hare & McPherson, 1984). In another study of a therapeutic
community treatment program for adult male offenders, psychopaths showed less clinical
improvement, demonstrated lower levels of motivation, and were discharged from the program
earlier than nonpsychopaths. The differences were not small: The psychopathic lawbreakers
stayed in the treatment program less than half as long as the other offenders (Ogloff, Wong,
& Greenwood, 1990). Clearly, psychopathic personality is a useful concept, and the PCL-R is
a practical measure of the construct.

One recent study does raise concerns about examiner differences in the scoring of the PCL-R
in field settings (Boccaccini, Turner, & Murrie, 2008). Although the manual for the
instrument does provide specific guidelines for each item, judgment is needed to
differentiate scores of 0, 1, or 2. It is possible for examiners to differ in their scoring
tendencies for a variety of reasons, including the evaluators’ readiness to seek outside
information, diligence in following the required protocol, response bias in using higher or
lower ends of the scales, and drift over time in observance of scoring rules. Even the
characteristics of the examiner (e.g., warm versus cold demeanor) can cause item scores to
shift up or down.

Boccaccini et al. (2008) examined the PCL-R total scores for 20 different Texas
state-contracted evaluators who were hired to screen 321 referrals for civil commitment as
sexually violent predators. The evaluators encountered their referrals in a quasi-random
manner, so it is reasonable to expect that the average PCL-R scores across examiners should
be reasonably similar. This proved, in dramatic fashion, not to be the case. Restricting the
comparison to examiners completing at least 20 evaluations (to provide constancy of
results), the researchers found sizable disparities in average PCL-R scores: the means
varied from a low of 17.5 (SD
of 8.8) to a high of 27.1 (SD of 6.1). These are large differences on the PCL-R, which has a maximum possible score of 40. In this study, examiner differences account for a large degree of variability in the PCL-R total scores. There may be a need to improve the field reliability of assessment for this forensic instrument. Perhaps some form of training program is needed to certify individuals in its use.

12.2: Computerized Assessment and the Future of Testing

12.2a Identify contemporary applications of the computer in psychological assessment

12.2b Discuss the professional and social issues raised by this practice

Computers are now used in virtually every aspect of assessment, including the administration, scoring, and interpretation of many tests. In fact, for many instruments it is now possible for the practitioner to seat the client in front of a computer with instructions that consist of “Please follow the instructions.” Minutes later, the practitioner receives a lengthy report that contains not only summary scores but also a sophisticated narrative interpretation. Although the use of computers in testing is manifestly a positive development, it also raises a number of troubling questions. In this topic, current applications of the computer in psychological assessment are surveyed, and the professional and social issues raised by this practice are discussed. The topic closes with thoughts on the future of testing—which will be forged in large measure by increasingly sophisticated applications of computer technology. We begin with an overview and history of the computer in testing.

12.2.1: Computers in Testing: Overview and History

In many counseling centers it is possible for a client to make an appointment with a microcomputer to explore career options. Other than a brief interaction with the receptionist to schedule time at the computer, the client need not interact with any other human being during the entire assessment process. The exact scenario will differ from one setting to the next but might resemble the following. Instructions on the computer screen encourage the user to press any key. The computer then prompts the client to answer a series of questions about activities and interests by pressing designated numeric keys. After completion of the inventory, the computer calculates raw scores for a long list of occupational scales and makes appropriate statistical transformations. Next, a brief report appears on the screen. The report provides a list of careers that best fit the interests of the client. A hard copy is also printed for later review. Presumably, the client is better informed about compatible career options and, therefore, more likely to choose a satisfying line of work. This scenario is a simple example of computer-assisted psychological assessment (CAPA), a recent development hailed by many psychologists but criticized by others.

It is common knowledge that computers are now used widely in psychological testing. However, the breadth of these applications might surprise the reader. In addition to straightforward applications such as presenting test questions, scoring test data, and printing test results (as described earlier), computers can be used to (1) design individualized tests based upon real-time feedback during testing, (2) interpret test results according to complex decision rules, (3) write lengthy and detailed narrative reports, and (4) present test stimuli in engaging and realistic formats, including high-definition video and virtual reality. We touch upon all of these topics in our review. The umbrella term computer-assisted psychological assessment (CAPA) refers to the entire range of computer applications in psychological assessment. CAPA holds great promise for the practice of psychology but also presents a variety of practical and ethical problems that demand careful and thoughtful consideration. A brief history of CAPA is a good backdrop to the discussion of practical and ethical concerns (Table 12.6).

Table 12.6 Historical Landmarks in CAPA

12.2.2: Computer-Based Test Interpretation: Current Status

Computer-based test interpretation, or CBTI, refers to test interpretation and report writing by computer. Every major test publisher now offers computer-based test
interpretations. These services are available by mail-in, online computer with modem, or on-site microcomputer package. Moreover, the market for computer-based testing and report writing is so lucrative that we can anticipate massive growth in this field for many years to come. Butcher (1987, App. A) listed 169 vendors as of 1986. Conoley, Plake, and Kemmerer (1991) note that the number of computerized psychological test interpretations had increased to more than 400 by 1990. New computerized test systems are reported virtually every month in trade magazines and newspapers (e.g., APA Monitor). Computer-based test interpretation is here to stay.

In this section we will provide an overview of the types of computer-based test interpretations currently available. A comprehensive review of products could easily span several volumes, so the reader will have to settle for a discussion of diverse and representative examples of CBTI. We will examine four approaches to CBTI: scoring reports, descriptive reports, actuarial reports, and computer-assisted clinical reports (Moreland, 1992).

Scoring Reports

Scoring reports consist of scores and/or profiles. In addition, a scoring report may include statistical significance tests and confidence intervals plotted for the test scores. By definition, scoring reports do not include narrative text or explanation of scores. Moreland (1992) discusses the appeal of scoring reports:

    These kinds of data make it possible to identify especially meaningful scores and meaningful differences among scores at a glance. They should also increase a user’s confidence that those scores are in fact important. Statistical significance tests are undoubtedly superior to “clinical rules of thumb” when it comes to accurate interpretation of test scores. And who has time to hand calculate confidence intervals—especially for tests with dozens of scales?

An example of a scoring report for the Jackson Vocational Interest Survey (Jackson, 1991) is shown in Figure 12.1. You will notice that a great deal of information is presented in an efficient, condensed manner. This is typical of scoring reports. In a single page, this hypothetical respondent would learn that his interests are highly similar to majors in liberal arts, education, and business. In terms of occupational fit, he also learns that he is highly compatible with counselors, teachers, lawyers, administrators, and other professions with an emphasis upon human relations.

Figure 12.1 A Scoring Report from the online version of the Jackson Vocational Interest Survey
NOTE: The full report consists of an 11-page printout.
Source: Reprinted with permission from JVIS.com. © 2008, SIGMA Assessment Systems, Inc. All rights reserved.

Similarity to College Students

JVIS profiles from over 10,000 university students who were enrolled in more than 150 different major fields, ranging from accounting to zoology, have been collected and analyzed. That analysis indicated that the major fields could be classed into 17 broad academic clusters. Each cluster is based on data from both males and females and represents a set of educational majors that shared a similar pattern of JVIS scores.

The chart below ranks the similarity of your JVIS Basic Interest profile to each of the student clusters. A high score indicates that your pattern of interests is similar to students in the fields of concentration defining the cluster, while a low score indicates dissimilarity. These scores indicate your probable interest and satisfaction with these academic clusters. These scores do not tell you whether or not you will be successful in any particular field.

Score   Similarity           University Major Cluster
+0.62   Very Similar         Environmental Resource Management
+0.55   Similar              Health, Physical Education and Recreation
+0.39   Moderately Similar   Agribusiness and Economics
+0.37   Moderately Similar   Art and Architecture
+0.30   Moderately Similar   Food Science
+0.12   Neutral              Engineering
+0.03   Neutral              Science
−0.03   Neutral              Computer Science
−0.08   Neutral              Performing Arts
−0.12   Neutral              Social Service
−0.12   Neutral              Health Services and Science
−0.19   Neutral              Mathematical Sciences
−0.25   Neutral              Business
−0.25   Dissimilar           Communication Arts
−0.30   Dissimilar           Behavioral Science
−0.32   Dissimilar           Education
−0.54   Dissimilar           Social Science, Law and Politics

Your JVIS profile is most similar to college students whose academic areas of specialization are in the three clusters listed below. Sample majors for each of these three areas are also listed.

University Major Cluster: Environmental Resource Management
Sample Majors: Wildlife Technology, Recreation and Parks, Environmental Resource Management, Agricultural Business Management, Agriculture, Forest Science and Technology, Horticulture.

University Major Cluster: Health, Physical Education and Recreation
Sample Majors: Health and Physical Education, Recreation and Parks.

University Major Cluster: Agribusiness and Economics
Sample Majors: Agricultural Economics and Rural Sociology, Agricultural Business Management, Food Service and Housing Administration.

Descriptive Reports

A descriptive report goes one step further than a scoring report by providing brief scale-by-scale interpretation of test results. Descriptive reports are especially useful when test findings are conveyed to mental health professionals who have little knowledge of the test in question. For example, most clinical psychologists know that a high score on the MMPI Psychasthenia scale signifies worry and dissatisfaction with social relationships—but other mental health practitioners may not have a clue as to the meaning of an elevation on this scale. A descriptive report can convey invaluable information in a half page or less. A variety of descriptive reports have been developed over the years. We have provided a generic example in Figure 12.2.

Figure 12.2 Generic Example of an MMPI-2 Brief Descriptive Report

Brief Descriptive Report for John Sample, 35-year-old married man with 20 years of education

Scale:  VRIN  TRIN  L   F   K
Score:  47    50    48  53  46

Scale:  Hs  D   Hy  Pd  Mf  Pa  Pt  Sc  Ma  Si
Score:  56  82  60  59  48  67  78  52  40  76

Major Features:
Validity: Valid profile from a cooperative patient with openness and lack of exaggeration
D  82  Depressed and pessimistic, lacks energy and concentration, thinks about self-harm
Pt 78  Worried and full of turmoil, feels anxious and tense, reports physical complaints
Si 76  Introverted and shy, uncomfortable in large social events, prefers a few close friends
Pa 67  Sensitive and moralistic, feels victimized and suspicious, tends to blame others
Hy 60  Mildly demanding and self-centered, lacks insight, somewhat attention seeking
Pd 59  Mildly rebellious and impulsive, restless, independent, energetic, and assertive
Hs 56  No special interpretation, average level of bodily and health concerns
Sc 52  No special interpretation, absence of unusual beliefs or strange behaviors
Mf 48  No special interpretation, mix of esthetic pursuits and interests in the outdoors
Ma 40  Lacking in energy, probably feels fatigued, possible depressive symptoms

The reader will notice that the 35-year-old male patient is described as shy, sensitive, worried, and severely depressed. Referral of this medical patient to a psychologist or psychiatrist clearly is warranted. This report is a model of simplicity and clarity. By comparison, most contemporary computer-based descriptive reports provide excessive detail. Typically, the clinician must wade through several pages of narrative to extract essential features about the client.

Actuarial Reports: Clinical versus Actuarial Prediction

The actuarial approach to computer-based test interpretation is based upon the empirical determination of relationships between test results and the criteria of interest. The nature of this approach is best understood in the context of the longstanding debate on clinical versus actuarial prediction. A brief detour is needed here to introduce relevant concepts and issues before discussing actuarial reports.

Many computer-based test interpretations make predictions about the test taker. These predictions are often disguised in the language of classification or diagnosis, but they are predictions nonetheless. For example, when a computer-based neuropsychological test report tentatively classifies a client as having brain damage, this is actually an implicit prediction that can be confirmed or disconfirmed by external criteria such as brain scans and neurological consultation. Likewise, when a computer-based MMPI-2 report provides a tentative DSM-IV diagnosis of a clinic referral, this is also a prediction that can be validated or invalidated by external criteria such as intensive clinical interview. A final example: When a computer-based CPI screening report for police candidates warns that an applicant will make a poor adjustment in law enforcement, this is also a prediction that could be proved correct or incorrect by an inspection of personnel records at a later date.

The use of computers for test-based prediction highlights an essential distinction known as clinical versus actuarial judgment (Dawes, Faust, & Meehl, 1989; Garb, 1994; Meehl, 1954, 1965, 1986). In clinical judgment, the decision maker processes information in his or her head to diagnose, classify, or predict behavior. An example: A clinical psychologist uses experience, intuition, and textbook knowledge to determine whether an MMPI profile indicates psychosis. Psychosis is a broad category that includes serious mental disorders often characterized by hallucinations, delusions, and disordered thinking. Thus, a clinician’s prediction of psychosis (or lack thereof) can be validated against external criteria such as detailed interview.

In actuarial judgment, an empirically derived formula is used to diagnose, classify, or predict behavior. An example: A clinical psychologist merely plugs scale scores into a research-based formula to determine whether an MMPI profile indicates psychosis. The actuarial prediction, too, can be validated against appropriate external criteria.
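
To make the idea of plugging scale scores into a formula concrete, the following minimal Python sketch applies the Goldberg (1965) rule quoted later in this section: sum the T scores on L, Pa, and Sc, subtract Hy and Pt, and diagnose psychosis if the result exceeds 44. The function names and the dictionary layout are illustrative assumptions, not part of any published scoring software. The sample scores are borrowed from Figure 12.2 purely to show the arithmetic; that profile comes from an MMPI-2 report, whereas the rule was derived for the MMPI, so the printed label should not be read as a clinical statement.

```python
# Illustrative sketch only: names and data layout are assumptions.
CUTOFF = 44  # rule as quoted in the text: index above 44 -> psychosis


def goldberg_index(t_scores):
    """Return L + Pa + Sc - Hy - Pt for a dict of T scores keyed by scale."""
    return (t_scores["L"] + t_scores["Pa"] + t_scores["Sc"]
            - t_scores["Hy"] - t_scores["Pt"])


def classify(t_scores):
    """Apply the cutoff mechanically; no clinical judgment is involved."""
    return "psychosis" if goldberg_index(t_scores) > CUTOFF else "neurosis"


# T scores taken from the generic profile in Figure 12.2 (arithmetic demo only).
sample = {"L": 48, "Pa": 67, "Sc": 52, "Hy": 60, "Pt": 78}
print(goldberg_index(sample), classify(sample))  # 29 neurosis
```

The point of the sketch is that the decision is fully prespecified: two examiners given the same five scores cannot disagree, which is exactly what separates an actuarial rule from a clinical impression.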

The essence of actuarial judgment is the careful development and subsequent use of an empirically based formula for diagnosis, classification, or prediction of behavior. A common type of actuarial formula is the regression equation in which subtest scores are combined in a weighted linear sum to predict a relevant criterion. But other statistical approaches may work well for decision making, too, including simple cutoff scores and rule-based flow charts. Of course, statistical rules lend themselves to computer implementation, so it is fitting to discuss clinical versus actuarial judgment in this section on computer-based test interpretation.

Although computers facilitate the use of the actuarial method, we need to emphasize that “actuarial” and “computerized” are not synonymous. To be truly actuarial, test interpretations must be automatic (prespecified or routinized) and based on empirically established relations (Dawes, Faust, & Meehl, 1989). If a computer program incorporates such automatic, empirically based decision-making rules, then it is making an actuarial prediction. Conversely, if a computer program embodies the thinking and judgment of a clinician—no matter how wise that person is—then it is making a clinical prediction.

Meehl (1954) was the first to introduce the issue of clinical versus actuarial judgment to a broad range of social scientists. He stated the issue with pure simplicity: “When shall we use our heads instead of the formula?” Consider the practical problem of distinguishing between neurosis and psychosis on the basis of MMPI results. Neurosis is an outdated (but still used) diagnostic term that refers to a milder form of mental disorder in which symptoms of anxiety or dysphoria predominate. As noted previously, psychosis is a more serious form of mental disorder that may include hallucinations, delusions, and disordered thinking. The differential diagnosis between these two broad classes of mental disorder is important. Persons with neurosis often respond well to individual psychotherapy, whereas a patient with psychosis may need powerful antipsychotic medications that produce adverse side effects. Which is superior for MMPI-based diagnostic decision making, the head of the well-trained psychologist or an appropriate formula based upon prior research? We return to this issue later.

Meehl (1954) specified two conditions for a fair comparison of these contrasting approaches to decision making. First, both methods should base judgments on the same data. For example, in comparing the experienced clinician against an actuarial equation, both approaches should prognosticate from the same pool of MMPI profiles and only those profiles. Second, we must avoid conditions that can artificially inflate the accuracy of the actuarial approach. For example, the actuarial equation should be derived on an initial sample, prior to the comparison with clinical decision making on a new sample of MMPI profiles. Otherwise, the actuarial decision rules will capitalize on chance relations among variables and produce a spuriously high rate of correct decisions.

When the conditions are met for a fair test of clinical versus actuarial decision making, the latter method is superior in the vast majority of cases. The actuarial approach is clearly better for the task cited previously—differential diagnosis of neurosis or psychosis from the MMPI. L. R. Goldberg (1965) determined that a simple linear sum of selected MMPI scale scores resulted in 70 percent correct classifications, whereas Ph.D. psychologists averaged only 62 percent, with the single best psychologist achieving 67 percent correct decisions. The decision rule that defeated all human contenders was: if the T-score sum on L + Pa + Sc − Hy − Pt exceeds 44, diagnose psychosis; otherwise, diagnose neurosis.³

³ Respectively, the full names for these scales are L (validity scale), Paranoia, Schizophrenia, Hysteria, and Psychasthenia.

Dawes, Faust, and Meehl (1989) cited nearly 100 comparative studies in the social sciences. In almost every case, the actuarial method equaled or surpassed the clinical method, sometimes substantially. The research by Leli and Filskov (1984) is typical in this regard. They studied the diagnosis of progressive brain dysfunction based upon neuropsychological testing. An actuarial decision rule derived from one set of cases was applied to a new sample with 83 percent correct identification. Working from precisely the same test data, groups of inexperienced and experienced clinicians correctly identified only 63 percent and 58 percent of the new cases, respectively. The reader will notice the disturbing and embarrassing fact that experience did not improve hit rates for this clinical decision-making task.

A study by McMillan, Hastings, and Coldwell (2004) also illustrates the value of simple actuarial methods for predicting clinical outcomes. Their investigation involved 124 residents of a forensic intellectual disability hospital in England. In this setting, violence is not a rare occurrence, and predicting who might be violent (and therefore require greater attention) is of utmost importance. The researchers compared two approaches to prediction of hospital violence in the next six months: (1) the actuarial approach was merely to tally the number of documented episodes in the prior six months and use this information as the index of risk; and (2) the clinical approach was to use the judgment of the clinical team (psychiatrist, psychologist, nursing staff, and attendants) on a 9-point risk scale (0 = ‘no risk’ and 8 = ‘very high risk’). Briefly, the actuarial approach proved slightly but not significantly superior to the clinical approach. Both approaches revealed strong predictive validity. The results substantiate the common adage that “the best predictor of future behavior is past behavior.”

A recent meta-analysis of 136 studies by Grove, Zald, Lebow, Snitz, and Nelson (2000) provides additional support
for the superiority of actuarial prediction over clinical prediction. These researchers analyzed diverse studies in the fields of medicine, education, and clinical psychology in which practitioners predicted such outcomes as academic performance, job success, medical diagnosis, psychiatric diagnosis, criminal recidivism, and suicide. In each study, the clinical predictions of the practitioners (physicians, professors, and psychologists) were compared to the actuarial predictions derived from empirically based statistical formulas. Although the researchers found a few scattered instances in which the clinical method was notably more accurate than the statistical method, on the whole, their survey confirmed prior findings on this topic. The authors conclude:

    Even though outlier studies can be found, we identified no systematic exceptions to the general superiority (or at least material equivalence) of mechanical prediction. It holds in general medicine, in mental health, in personality, and in education and training settings. It holds for medically trained judges and for psychologists. It holds for inexperienced and seasoned judges. (p. 25)

Perhaps the most disturbing conclusion of these researchers was that the availability of clinical interview actually detracted from the accuracy of practitioner predictions in the diverse fields studied. Compared to the empirically based statistical predictions, the clinical predictions were outperformed by an even greater margin when information from clinical interview was available to the practitioners. The reasons for this are unclear but likely include the susceptibility of humans to certain cognitive biases (e.g., paying too much attention to vivid interview information). Also, clinicians typically do not receive adequate feedback as to the accuracy of their judgments and, hence, have no basis for correcting maladaptive predictions.

The lesson to be learned from this literature is that computerized narrative test reports should incorporate actuarial methods, when possible. For example, computer-generated reports should use existing actuarial formulas to determine the likelihood of various psychiatric diagnoses, rather than relying upon the programmed logic of a master clinician. Unfortunately, as the reader will discover in the following, most computerized narrative test reports are clinically based—which raises concerns about their validity.

Actuarial Interpretation: Sample Approach

The developers of the Personality Inventory for Children (PIC) produced an exemplary system for computer-based actuarial test interpretation, which we will describe for illustrative purposes. The reader will recall from a previous chapter that the PIC, now updated as the PIC-2, is a true-false inventory that the parent or caregiver completes with respect to the child’s behavior. Based upon these responses, a profile of T scores (mean of 50, SD of 10) is produced for four validity scales (e.g., Defensiveness), 12 clinical scales (e.g., Delinquency), and four factor scales (e.g., Social Incompetence). In total, T scores are reported for 20 scales on the PIC. Of course, higher T scores indicate a greater likelihood of psychopathology.

Actuarial interpretation of the PIC rests upon the empirically derived correlations between individual scales and important nontest criteria. Research subjects for the Lachar and Gdowski (1979) study consisted of 431 children referred to a busy teaching clinic. As part of the evaluation process for each child, the staff members, parents, and teachers completed a comprehensive questionnaire, which listed 322 descriptive statements concerning behavior and other variables. In addition, parents or caretakers filled out the PIC.

In the first phase of the actuarial study, the 322 descriptive statements were correlated with the 20 PIC scales to identify significant scale correlates. In the second phase, the significant correlates were analyzed further to determine the relationship between descriptive statements and T-score ranges on the PIC scales. The outcome of this prodigious effort was a series of actuarial tables not unlike the tables used by insurance companies to predict the likelihood of illness, death, accidents, and the like, based upon population demographics such as age, sex, and residence. Some examples of actuarial correlates of the Delinquency, or DLQ, Scale are depicted in Table 12.7.

Table 12.7 Occurrence Rates for Actuarial Descriptors of the PIC Delinquency Scale

                                    T-Score Ranges
Descriptor            Base Rate*  30–59  60–69  70–79  80–89  90–99  100–109  110–119  >120
Refuses to go to bed      30        18     26     23     33     36      33       42      38
Lies                      62        44     36     48     73     71      79       90      91
Uses drugs                12         0      2      6      7     11      18       32      53
Rejects school            40        16     26     40     42     50      47       56      67
Involved with police      17         0      4      6     10     21      19       58      63

* Percentage of all children rated as displaying the characteristic.
NOTE: These five descriptors are merely a representative sample of the 51 actuarial correlates of the Delinquency Scale.
Source: Material from Actuarial Assessment of Child and Adolescent Personality: An Interpretive Guide for the Personality Inventory for Children Profile copyright © 1979 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, CA 90025, United States of America.
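
Reading an actuarial table of this kind is a purely mechanical lookup: find the T-score column, then read off the occurrence rate for each descriptor. The short Python sketch below encodes the five rows of Table 12.7 and returns the rates for a given Delinquency T score, for example the scores of 114 and 54 used in the illustration that follows. The function and variable names are hypothetical, and treating the last column (">120") as "120 and above" is an assumption made only so that every score of 30 or more maps to a column.

```python
from bisect import bisect_right

# Sketch only: data are the five rows of Table 12.7; names are hypothetical.
# Lower bound of each T-score column, in table order (30-59, 60-69, ..., >120).
RANGE_FLOORS = [30, 60, 70, 80, 90, 100, 110, 120]

# Percentage of children displaying each descriptor, one value per column.
OCCURRENCE = {
    "Refuses to go to bed": [18, 26, 23, 33, 36, 33, 42, 38],
    "Lies":                 [44, 36, 48, 73, 71, 79, 90, 91],
    "Uses drugs":           [0, 2, 6, 7, 11, 18, 32, 53],
    "Rejects school":       [16, 26, 40, 42, 50, 47, 56, 67],
    "Involved with police": [0, 4, 6, 10, 21, 19, 58, 63],
}


def descriptor_rates(dlq_t):
    """Return {descriptor: percent} for the column containing dlq_t."""
    if dlq_t < RANGE_FLOORS[0]:
        raise ValueError("Table 12.7 lists no column below T = 30")
    column = bisect_right(RANGE_FLOORS, dlq_t) - 1
    return {name: rates[column] for name, rates in OCCURRENCE.items()}


print(descriptor_rates(114)["Lies"])  # 90
print(descriptor_rates(54)["Lies"])   # 44
```

Nothing in the lookup depends on who performs it, which is the sense in which the resulting interpretation is actuarial rather than clinical.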

Actuarial tables capture a wealth of information useful in clinical practice. Consider two hypothetical 12-year-old children, Jimmy and Johnny, each referred to a clinician with the same presenting problem: school underachievement. As part of the intake procedure, the clinician asks each mother to fill out the PIC. Suppose that the Delinquency, or DLQ, Scale score for Jimmy is highly elevated at a T score of 114, whereas Johnny obtains an average range T score of 54. Based upon these scores, the clinician would know the likelihood—listed here as percentages—that certain behavioral descriptions apply to each child:

                       Jimmy (DLQ = 114)   Johnny (DLQ = 54)
Refuses to go to bed          42%                 18%
Lies                          90%                 44%
Uses drugs                    32%                  0%
Rejects school                56%                 16%
Involved with police          58%                  0%

The reader will immediately recognize that Jimmy fits a pattern of pervasive conduct disorder, whereas Johnny appears to have few such behavior problems. In Jimmy’s case, the underachievement is most likely secondary to a pattern of antisocial behavior, whereas for Johnny the clinician must look elsewhere to understand the school failure. Of course, this is only a small fraction of the information that would be available from a computer-based actuarial interpretation of the PIC. In a full report, the clinician would receive statistics and narrative statements pertinent to all 20 scales from the PIC.

Unfortunately, great effort and expense are needed to develop actuarial tables like those provided for the original PIC. Few test publishers are willing to take on the financial burden. Increasingly, test developers rely on clinical judgment as the basis for computer-assisted assessment.

Computer-Assisted Clinical Reports

In a computer-assisted clinical report, the interpretive statements assigned to test results are based upon the judgment of one or more expert clinicians. The expert clinicians formalize their thought processes and develop automated decision rules that are then translated into computer code. This method differs crucially from the computer-assisted actuarial approach in which interpretive statements are based strictly upon formal research findings. Superficially, the two approaches may appear to be identical insofar as each is rule based and automated. The difference has to do with the origin of the rules: empirical research (actuarial approach) versus clinician judgment (clinical approach).

Even though clinicians generally recognize the superiority of the actuarial method, there is one significant advantage to the computer-assisted clinical approach. The advantage is that the clinical approach can be designed to interpret all test profiles, whereas some test profiles will be uninterpretable by means of an actuarial approach. The discouraging truth about actuarial “cookbook” systems for test interpretation is that the classification rate usually plummets when a system is used in a new setting. The classification rate refers to the percentage of test results that fit the complex profile classification rules necessary for actuarial interpretation. For example, in the Gilberstadt and Duker (1965) actuarial MMPI system, the 1-2-3 code type is defined by these rules for the Hs (Hypochondriasis), D (Depression), Hy (Hysteria), and L, F, K (validity) scales:

1. Hs, D, and Hy over T score 70
2. Hs > D > Hy
3. No other scales over T score 70
4. L < T score 66, F < T score 86, and K < T score 71

Persons who produce this kind of MMPI profile often suffer from psychophysiological overreactivity, not to mention a host of other empirically confirmed characteristics. Of course, there are several additional code types, each defined by a set of complex decision rules, and each accompanied by an elaborate, actuarially based description of personality and psychopathology. A typical finding is that a computer-assisted actuarial system developed within one client population will be capable of interpreting up to 85 percent of the test profiles encountered in that setting. However, when the actuarial system is applied to a new client population, perhaps 50 percent of the test profiles will fit the decision rules. This means that about half of the test profiles do not fit the rules. At best, these clients will receive a superficial, scale-by-scale interpretation rather than a more sophisticated actuarial interpretation based upon code types. The problem of shrinkage in classification rate is observed in virtually all studies of actuarial interpretation (Moreland, 1992).

Computer-assisted clinical reports tend to be lengthy and detailed, full of scale scores, item indices, and graphs. Of course, these reports also include several pages of narrative report, usually phrased in terms of hypotheses as opposed to confirmed findings. The shortest such report is about six pages (e.g., the Karson Clinical Report for the 16PF), whereas longer ones can run to 10 or 20 pages (e.g., MMPI-2 interpretations).

12.2.3: Interactive Video, Virtual Reality, and Smartphones

The digital revolution continues to accelerate, with far-reaching consequences for every aspect of society, including psychological assessment. Digital devices such as smartphones, tablets, and laptops have become smaller, faster, and cheaper, and possess bandwidth hardly imagined a few years ago. The revolution promises to enhance psychological assessment in ways we are just beginning to
comprehend, but also guarantees new ethical challenges. Consider one seemingly inconsequential quandary posed by the digital revolution: Before conducting an assessment, should a psychologist Google a client? Unless personal safety could be at stake, the answer is “No”:

Curiosity about a client is not a clinically appropriate reason to do an Internet search. Let’s put it this way: If you know that your client plays in a soccer league, it would be a little odd if on Saturday afternoon you drove by the game to see how your client is doing. In the same way, if you’re doing a search, thinking, “What can I find out about this person?” that raises questions about the psychologist’s motives. (Martin, 2010).

The promise of the digital revolution is huge, but alongside new developments, psychology and other health professions will need new ethical guidelines.

Interactive Video in Assessment

One recent approach to assessment made possible by modern computer technology is the flexible, interactive presentation of high-quality, captivating video segments. At IBM, researchers have been developing the Workplace Situations test to assess job applicants for manufacturing positions (Drasgow, Olson-Buchanan, & Moberg, 1999). What is unique about the test is the nature of the stimuli. Rather than merely describing work situations, the test displays computer-driven interactive video of realistic work scenes. The assessment consists of 30 short scenes in a fictional organization named Quintronics. The scenes depict work-related interpersonal episodes arising in the manufacture of hypothetical electronic products called quintelles and alpha pinhole boards. The computer vignettes depict such concerns as excessive workloads, poor training, interpersonal conflict, poor productivity, and flawed work. Each scene is presented and then the screen pauses with a description of five ways of responding to the workplace problem. The scenes have a highly realistic feeling to them, which enhances the face validity of the test. This kind of interactive video test likely provides a more accurate assessment than paper-and-pencil tests of how people would actually respond on the job. Tests that use interactive video are especially good at tapping examinees’ abilities to deal with complex, real-life problems, such as decision making under time pressure or conflict resolution in the workplace.

Olson-Buchanan et al. (1998) have developed an interactive video test of conflict resolution that reveals both the promise and the perils of this new technology. Their instrument, the Conflict Resolution Skills Assessment (CRSA), consists of nine conflict scenes, each with the potential for multiple branchings, depending upon the examinee’s ongoing response pattern:

A typical item on the Conflict Resolution Skills Assessment begins by presenting a conflict scene (1–3 minutes in duration) to an individual. At a critical point the scene is stopped and four options for addressing the conflict are provided; the assessee is asked to choose the option that best describes what he or she would do in this situation. Depending on the option chosen, the computer branches to an extension of the first scene depicting how events might unfold. Again, the conflict escalates, the scene is frozen, four options for addressing the conflict are presented, and the assessee decides which option would best resolve the conflict. The computer then branches to an entirely new conflict scene. (p. 180)

The perils of this effort include the increased expense required for test development (e.g., cost of producing high-quality, convincing videos) as well as daunting theoretical issues (e.g., challenge of conceptualizing “good” conflict resolution skills). This kind of interactive, branching, video-based test also poses unique psychometric problems. For example, how do you assess the reliability of specific subelements of the test when only a few of the examinees may have taken that “route” through the test?

In spite of these challenges, the development of pathbreaking instruments such as the CRSA is well worth the effort. Consider one important payoff, namely, scores on the CRSA show essentially no correlation with general cognitive ability (Drasgow et al., 1999). Psychologists long have suspected that social skills are distinct from cognitive skills, but when both are assessed with traditional paper-and-pencil instruments, moderate to strong correlations are the rule. Most likely, this is because of shared method variance, namely, verbal test-taking skills help an examinee navigate any paper-and-pencil test, regardless of the construct being measured. By using interactive video as the primary test stimulus, instruments such as the CRSA provide a purer measure of social skills than paper-and-pencil tests. This unique instrument illustrates that social skills contribute something different than cognitive skills to effective work performance.

Another potential application of interactive video is in personnel screening for entry-level police officers. Law enforcement personnel must have good observational and evaluative skills, which can be assessed realistically with video stimuli. For example, an assessment might consist, in part, of a videotape of witnesses at a crime scene. Police candidates might be asked to determine the truth of the witnesses and to draw conclusions about the crime based upon their observational powers (APA Monitor, June 1994). This example—currently hypothetical—illustrates the potential for multimedia to revolutionize psychological assessment.

It is worth noting that interactive video tests can be virtually free of reading and writing requirements on the part of the examinee. Talented job candidates who do not possess good reading or writing abilities but who do have
practical job skills can be identified by means of these tests. For some jobs, interactive video might be fairer than the paper-and-pencil approach.

Virtual Reality Approaches to Assessment

Virtual reality (VR) is a sophisticated mode of human-computer interface that allows users to navigate and manipulate three-dimensional environments in a naturalistic fashion (Vince, 2004). The participant wears a pair of goggles that transmit realistic, three-dimensional images of a simulated environment, called a virtual environment (VE). In the most sophisticated applications of VR, the user also would wear gloves that are interfaced with the video display so that objects in the VE can be manipulated. More commonly, especially in psychological assessments with VR, the simulated environment is navigated with a joystick or similar device. The VR user can walk, run, or even fly through the VE and explore points of reference that would be difficult or impossible in the real world (Vince, 2004).

New assessment tools that utilize virtual reality are in their infancy, but show great promise. One positive feature of these tests is that most possess good ecological validity. The required tasks highly resemble real-world issues and concerns. Consider the contrast between a paper-and-pencil measure and a VR measure of executive functions. The Trail-Making Test, part B (TMT-B), requires the user to draw a line between numbers and letters in alternating order as quickly as possible (Reitan & Wolfson, 1993). It is considered a measure of executive functions, among other skills. The VE Grocery Store test requires the user to navigate a VR-simulated grocery store in search of shopping list items (Parsons, Rizzo, Brennan, Bittman, & Zelinski, 2008). This, too, is considered a measure of executive functions. The TMT-B may be a good test, but the task demands are foreign to everyday living. In real life, we never connect numbers and letters with a pencil line. In contrast, almost everyone has the need to shop for groceries. The VE Grocery Store test embodies good ecological validity. The tasks of the test include:

1. navigating through a virtual grocery store by following specified routes through the aisles,
2. finding and selecting items needed to prepare simple meals, such as making a peanut butter and jelly sandwich,
3. pricing and selecting other items so that no more than a budgeted amount is spent, and
4. performing a prospective memory task when a certain individual is encountered.

Researchers can vary the density of items on the shelves, the similarity of packaging, and the background distractions (e.g., loudspeaker announcements). Because of its strong ecological validity, the VE Grocery Store test produces “subjective engagement that is equivalent to engagement in the real world” (Parsons et al., 2008, p. 187). While promising, the test is currently an experimental measure in need of further validation.

Buxbaum, Dawson, and Linsley (2012) recently validated a virtual reality measure of hemispatial neglect, suitable for patients who have sustained a right-hemisphere stroke. In 40 to 50 percent of cases, right-hemisphere stroke patients demonstrate impairment in the detection of objects, persons, and events on the left side of space, a condition known as hemispatial neglect. This condition constitutes a serious barrier to independence. For example, patients tend to bump into things when walking. While paper-and-pencil tasks such as line bisection are useful in diagnosis, they do not detect subtle cases of the disorder.

Dawson, Buxbaum, and Rizzo (2008) developed the Virtual Reality Lateralized Attention Test (VRLAT) to provide an ecologically valid and more sensitive measure of hemispatial neglect (Dawson, Buxbaum, & Rizzo, 2008). The test requires patients to travel along a virtual path using a joystick (participant condition) or to view the environment passively while the examiner controls the pace (examiner condition). The task of the patient is to identify virtual objects on both sides of the path and to avoid collisions with the objects. As patients negotiate the path, they are instructed to call out the objects they see, including unique attributes (e.g., blue tree, pig statue, camel statue, orange tree). Each path is negotiated in both directions, so every object occurs once on the left side, and once on the right side. In total, users complete six pathways. Individual responses are scored 0 to 3 depending on whether the object is detected and correctly described. Of course, there are separate scores for left-sided objects and right-sided objects. A disparity between the scores favoring the detection of right-sided objects would be interpreted as left-sided hemispatial neglect. This is precisely what the researchers found in a sample of 64 postacute right-hemisphere stroke patients. Based on appropriate statistical analyses, the researchers concluded that the VRLAT possesses strong sensitivity and specificity, minimal practice effects, and strong validity (Buxbaum et al., 2012). Further, it was a better predictor of real-world collisions than a battery of paper-and-pencil tests. The collision test involved patients quickly traversing a 150-foot-long maze with multiple left and right turns. The maze corridor was 3.5 feet in width. The number of collisions with the walls was recorded.

For purposes of illustration, we have described here only two assessment tools that utilize virtual reality, the VE Grocery Store test and the Virtual Reality Lateralized Attention Test. Dozens more VR tests are available or in production. Innovative approaches based on VR can be found in several journals, including Virtual Reality and CyberPsychology, Behavior, and Social Networking.
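
The left-versus-right scoring logic described for the VRLAT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lateralized_scores`, the (side, score) data layout, and the example responses are hypothetical, and the published test rests on statistical analysis of patient samples rather than on a raw difference score.

```python
from typing import Dict, Iterable, Tuple


def lateralized_scores(responses: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum item scores (0 to 3) separately for left- and right-sided objects.

    `responses` is a hypothetical layout: (side, score) pairs, where side is
    "left" or "right" and score is the 0-3 rating given to one object.
    """
    totals = {"left": 0, "right": 0}
    for side, score in responses:
        if side not in totals:
            raise ValueError(f"unknown side: {side!r}")
        if not 0 <= score <= 3:
            raise ValueError(f"score out of range: {score}")
        totals[side] += score
    # A positive asymmetry means right-sided objects were detected better than
    # left-sided ones, the pattern the text links to left-sided neglect.
    totals["asymmetry"] = totals["right"] - totals["left"]
    return totals


# Made-up responses for illustration only (not data from the study).
example = [("left", 1), ("right", 3), ("left", 0),
           ("right", 3), ("left", 2), ("right", 2)]
result = lateralized_scores(example)
print(result)
```

Keeping the two totals separate, rather than reporting only the difference, mirrors the text's point that left-sided and right-sided objects receive separate scores.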
12.2.4: Evaluation of Computer-Based Test Interpretation

Computerized testing has clear advantages but also some potentially serious disadvantages in comparison to the traditional clinical approach to psychological testing. We offer a brief survey here, stressing both the advantages and disadvantages of computer-based testing, diagnosis, and report writing. More detail on this topic can be found in Butcher (1987), Moreland (1992), Roid and Johnson (1998), Butcher, Perry, and Atlis (2000), and Mills, Potenza, Fremer, and Ward (2002).

Advantages of Computerized Testing and Report Writing

The main advantages of computer-based testing are quick turnaround, inexpensive cost, near-perfect reliability, and complete objectivity. In addition, some measurement applications such as flexible adaptive testing virtually require the use of computers for their implementation. We explore these points in more detail later.

In a busy clinical practice, delays between testing and submission of the consulting report are common, almost inevitable. These delays not only tarnish the reputation of the consultant, but they may also adversely affect the treatment outcome for the client. For example, a college student with learning disabilities may need immediate intervention in order to avert an academic disaster. A delay of two or three weeks in submission of a consulting report could spell, indirectly, the difference between failure and success in academic performance. Computer-based reports can speed up the entire consultation process. Many software systems produce reports that can be transferred into a standard word-processing program for immediate customized editing, thereby speeding up the turnaround time (e.g., Psychological Corporation, 1994; Tanner, 1992).

Cost is another consideration in computer-based testing. Although there are no definitive studies on this topic, most authorities assert that computer-scored and interpreted psychological tests cost considerably less than those produced entirely by clinician effort (Butcher, 1987). In their studies of automated testing at the Salt Lake City VA Hospital, Klingler, Miller, Johnson, and Williams (1977) concluded that the computer cut the cost of testing in half. Certainly, as the computerized testing programs become more sophisticated and are used by larger numbers of clinicians, the cost per consultation will plummet.

Reliability and objectivity are the hallmarks of the computer. Assuming that the software is accurate and error-free, computers simply do not make clerical scoring errors, nor do they vary their methods of stimulus presentation from one day to the next, nor do they yield different narrative reports based on the same input. The product is the same no matter how many times the computer program is used. Furthermore, because computerized reports are based on objective rules, they are not distorted by halo effects or other subjective biases that might enter into a clinically derived report. Butcher (1987) asserts that computerized reports could have special significance in court cases, because they would be viewed as “untouched by human hands.” This is an intriguing possibility, but perhaps somewhat overly optimistic. Lawyers and judges will still want to know who programmed the software, how the narrative statements were developed, and so on.

Disadvantages of Computerized Testing and Report Writing

Consider the following illustration, hypothetical yet realistic and probably not a rare occurrence. A hospital physician refers a difficult medical patient to the psychology service for a personality evaluation. The patient is escorted to the testing center where a receptionist seats him at a table in front of a microcomputer. Instructions appear on the computer monitor to answer a series of self-statements true or false by pressing the T or F key. The patient completes the computerized objective personality inventory and is escorted back to the medical service. Seconds later, a narrative report based on the patient’s responses emerges from the printer. The consulting psychologist peruses the report briefly, then sends it (unsigned) through departmental mail to the physician. The report is handsome, ever so crisp in its laser-printed appearance, with a graphic summary of scales on the cover page. Furthermore, the narrative is valid-sounding and reads as if it were copyedited by a professional writer (in fact, it was). The physician is impressed and takes the report to heart, making treatment decisions based on the personality evaluation.

This scenario illustrates an essential quandary with computer-based testing and report writing: Computers can so dominate the testing process that the clinical psychologist is demoted to a mere clerk—or is removed from the assessment loop entirely. Although most psychologists acknowledge that computers are a welcome addition to the practice of psychological testing, critics have raised a number of disquieting concerns about recent assessment practices such as those depicted here. Computerization of the testing process raises practical, legal, ethical, and measurement issues that deserve thoughtful review.

In general, skeptics do not attack the practice of computerizing the mechanics of test administration and scoring; these computer applications are seen as efficient and appropriate uses of modern technology. Nonetheless, even the most ardent proponents acknowledge the need to investigate test-form equivalency when an existing test is adapted to computerized administration. In particular,
practitioners should not assume that the computerized adaptation and the original version of a test produce identical results. Equivalency is an empirical issue that must be demonstrated by appropriate research. For most tests, equivalency can be demonstrated, but this must not be taken for granted (Lukin, Dowd, Plake, & Kraft, 1985; Schuldberg, 1988).
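
One way such equivalency research might summarize its data is with a standardized mean difference between the two administration formats. The sketch below is illustrative only: the function name `cohens_d`, the 0.2 margin, and the scores are assumptions made for this example, not values or methods taken from the studies cited. A real equivalence study would also compare variances, score distributions, and the rank ordering of examinees.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores from two randomly assigned groups (illustration only).
computer_scores = [52, 55, 49, 61, 58, 54]
booklet_scores = [50, 53, 47, 59, 56, 52]

d = cohens_d(computer_scores, booklet_scores)
# The 0.2 cutoff is an arbitrary illustrative margin, not a published standard.
looks_equivalent = abs(d) < 0.2
print(f"d = {d:.2f}, within margin: {looks_equivalent}")
```

A small difference in raw means can still be a sizable standardized difference when scores vary little, which is one reason equivalency has to be checked rather than assumed.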
Some tests do not maintain score equivalency when translated to computer. The Category Test (CT) from the Halstead-Reitan Neuropsychological Battery is a case in point. In a comparison of computerized and standard versions of the Category Test with rehabilitation patients, Berger, Chibnall, and Gfeller (1994) found a huge difference in error rate for two groups of subjects who had equivalent backgrounds: an average of 84 errors on the computerized CT versus an average of 66 errors on the standard CT. Apparently, the computerized CT is much more difficult than the standard version, which means that separate norms must be developed for its interpretation. Much smaller differences between computerized and standard test administration have also been reported for the MMPI, with computer-based scores tending to underestimate (very slightly) the booklet-based scores (Watson, Thomas, & Anderson, 1992).

12.2.5: Computerized Adaptive Testing

A final advantage of computer-based testing is its application to flexible adaptive testing. Adaptive testing is nothing new—Binet used it when he worked out the methods for finding the basal and ceiling items on his famous intelligence test. Binet placed his items along a continuum of difficulty so that the examiner could test downward to find the examinee’s basal level and test upward to find the ceiling level. This procedure eliminated the need to administer irrelevant items—those so easy (below the basal level) that the examinee would surely pass them, or those so hard (above the ceiling level) that the examinee would surely fail them. Another example of adaptive testing is the two-stage procedure whereby results on an initial routing test are used to determine the entry level for subsequent scales. For example, on the Stanford-Binet: Fifth Edition, results of the initial vocabulary and matrices subtests determine the starting points for subsequent subtests. By reducing the time needed to obtain an accurate measure of ability, adaptive testing fulfills a very constructive purpose.

Computerized adaptive testing (CAT) is a family of procedures that allows for accurate and efficient measurement of ability (Wainer, 2002). Although details differ from one method to another, most forms of computerized adaptive testing share similar features.

The measurement advantages of CAT can be summarized in two words: precision and efficiency (Weiss & Vale, 1987). Regarding precision, CAT guarantees that each examinee is measured with the same degree of precision because testing continues until this criterion is met. This is not so with traditional tests in which scores at both tails of the distribution reflect greater levels of measurement error than scores in the middle of the distribution. Regarding efficiency, the CAT approach requires far fewer test items than are needed in traditional testing. For example, written certification examinations usually include 200 to 500 items, while CAT examinations are always shorter, often including fewer than 100 items to achieve a more accurate level of measurement (Lunz & Bergstrom, 1994). In one analysis, the reliability of alternative computer-adaptive tests for certification in medical technology was .96 (Lunz, Bergstrom, & Wright, 1994). This is remarkable because shorter tests (the goal in CAT testing) tend to have lower reliability than longer tests (such as found in traditional testing programs).

In addition to increased measurement efficiency, CAT has many other advantages over traditional paper-and-pencil assessment. Wainer (2000) points out that CAT allows for better test security, immediate scoring and feedback, equal challenge to all examinees, presenting of new items, and the use of a variety of question types. Regarding the last point, examples of novel item types not possible on a traditional multiple-choice exam include spoken words (such as for a spelling test), open-ended math problems (the answer is typed in), and video segments (followed by written questions).

The CAT approach to psychological testing has been used mainly by large organizations such as the U.S. Army
and the Educational Testing Service for assessment of intelligence and special abilities. In recent years, national licensing boards (e.g., in medicine) have begun to implement CAT testing because of convenience in scheduling tests, tighter control over test security, reduced costs, and the opportunity for better data collection (Lunz & Bergstrom, 1994). Technical information on CAT systems is proprietary and difficult to obtain. Nonetheless, it is clear that the efficiency of the CAT approach is substantial. CAT uses fewer items of better quality than a conventional test of the same length. A general finding is that CAT reduces test length by about 50 percent, with reductions for individual examinees of up to 80 percent, with no loss in measurement accuracy (Laatsch & Choca, 1994; Weiss & Vale, 1987).

One recent study revealed phenomenal success for the CAT approach in reducing the time spent in testing, and simultaneously providing better discriminant validity in the assessment of depressive symptomatology (Gibbons, Weiss, Kupfer, Frank, Fagiolini, and others, 2008). The study involved 800 outpatients who completed the 616-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to develop and evaluate a CAT version of the MASS, while the second administration confirmed the functioning of the CAT method in live testing. The CAT version utilized an average of 95 percent fewer test items (30 instead of 616) and yet was shown to provide better discrimination of seriously depressed versus mildly depressed patients.

Recently, CAT has been applied to mainstream personality tests like the MMPI-2 with encouraging results. Forbey and Ben-Porath (2007) provide a review of the MMPI-2 computerized adaptive version. They conclude that the new approach provides the same accuracy of measurement with about a 20 percent reduction in the number of items administered. Even so, CAT approaches with the MMPI-2 are experimental at this point, and not likely to receive significant clinical usage for several years.

There may well be reasons for caution in the application of CAT to personality testing. An unavoidable consequence of using CAT is that item order will change from one examinee to another, which may invoke context effects that influence subsequent item responding. Ortner (2008) investigated this prospect with a CAT version of the Eysenck Personality Profiler (EPP) with 362 German adults. Some participants first encountered items representing extreme trait levels, while others were first exposed to items representing medium trait levels. These initial exposures distorted their subsequent responses to the point where scores on three out of seven scales of the EPP shifted up or down significantly. These findings indicate that context effects may be a problem in using CAT with personality inventories.

As the cost of computing continues to plummet, more and more large-scale applications of CAT will be developed. In the late 1990s, the Educational Testing Service moved toward near total reliance on CAT versions of the Graduate Record Examination and other selection tests. Licensing and certification boards such as the National Council of State Boards of Nursing also have introduced CAT versions of their certification tests. Mills and Stocking (1996) discuss practical issues in large-scale computerized testing.

12.2.6: The Future of Testing

What is the future of psychological testing in the twenty-first century? We will hazard a few speculations here, cognizant that prognostications about the future often are wrong. Forecasting developments in testing is especially difficult because the enterprise is increasingly constrained, directly or indirectly, by public opinion. For example, at one point in the 1980s the legislature of the state of California made it illegal for school psychologists to use traditional intelligence tests as a basis for placing minority students in special education classes. These restraints on testing were driven by public outrage over the excessive placement of minority students in special education classes. Thus, even when a particular technology of testing is feasible and promoted by psychologists, there exists the possibility that it might be strictly controlled or even banned.

A case in point is Matarazzo’s (1992) prediction that biological measures of intelligence will gain prominence in the twenty-first century. Certainly it appears true that biological measures of ability such as averaged evoked potential (gauged from EEG waves), or glucose metabolic rate in the brain (gauged from PET scans), or relative brain size (gauged from MRI scans) will prove to be effective approaches to assessment. But Matarazzo (1992) goes further in asserting that these and other biological approaches actually will receive common usage:

Therefore, another of my predictions is that in the early decades of the 21st century we may see the further development and use in practice of these and other biological indices of brain function and structure in a test (or a test battery) for the measurement of individual differences in mental ability, thus heralding the first clear break from test items and tests in the Binet tradition in a century. (p. 1012, italics in the original)

Although Matarazzo’s prediction could come true, a more likely scenario is that the general public will be threatened when biological indices are used in assessment and will, therefore, take steps (e.g., pressure on legislators) to ensure that such measures receive limited (if any) application. The public will be threatened because, rightly or wrongly, biological characteristics such as glucose metabolic rate in the brain are perceived to be relatively permanent and immutable. The fear will arise that biological tests will sort people into a caste system. Even if (or when) the
validity of biological tests is firmly established, it will be decades (if ever) until they are found acceptable by the general public.

Trends in Testing: A Few Confident Predictions

The computerization of testing is already a fixture of industrialized societies and this trend can only increase in the future. Existing tests will be adapted to the desktop computer with increasing regularity. An example of this trend is Fepsy (Ferrum + Psyche), a system for automated neuropsychological testing that is available online at 220 sites throughout the Netherlands and most of Europe. Fepsy is described on the Internet at www.euronet.nl/users/fepsy. Fepsy consists of the following subtests:

1. Auditory reaction time
2. Binary choice reaction time
3. Tapping task
4. Visual searching task
5. Recognition tasks
6. Vigilance task
7. Rhythm task
8. Classification task
9. Six Visual Half field tasks
10. Corsi block tapping

A common use is pre- and postoperative testing of patients who undergo epilepsy surgery for relief of seizures. The system has even been used with fully conscious patients during surgery. Under local anesthesia, the patient works on a subtest while simultaneously receiving harmless electrical stimulation at distinctive sites on the cortex. The purpose is to determine whether specific cognitive functions might be affected when scar tissue is excised from the brain. The advantage of using a multicenter, multinational, computerized testing system is that the examiner has access to normative data for thousands of patients with specific conditions.

Another prediction is that fewer and fewer wide-spectrum tests (e.g., personality inventories and individual intelligence tests) will be released by test publishers (Gregory, 1998). Instead, publishers will concentrate on tests designed to assess particular areas of functioning for specific target populations (e.g., measures of memory functioning for elderly persons suspected of having dementia). The reasons for these complementary trends are economic:

Test publishing is big business, a respectable way for large corporations to earn a profit. Publishers will be reluctant to make the major investment needed to develop new instruments that have the grandiose ambition of assessing many aspects of personality or intellect for a wide range of subjects. The cost is too high and—in light of the existing competition—the risk is too great. (Gregory, 1998, pp. 76–77)

Test publishers likely will focus on less-expensive and less-risky forms of test development such as instruments that embody distinctive constructs relevant to specific target groups. Examples might include tests to measure risky behaviors in adolescents, mental decline in elderly persons, faulty cognitions in depressed persons, or communication problems in maritally distressed couples. These kinds of instruments will flourish, whereas publishers will rarely invest in new omnibus tests of personality or ability, preferring instead to revise and recycle existing instruments.

We can also predict with some confidence that the movement toward evidence-based assessment will gain strength in the years ahead. In evidence-based assessment, the soundness of a testing tool is evaluated not just by means of the standard psychometric indices of reliability and validity but through considerations of clinical utility as well (Barlow, 2005).

Clinical utility is a broad and imprecise concept that consists of several features, including treatment utility, monetary cost, psychological cost, and client acceptability (Hunsley & Mash, 2005). Treatment utility is vital. This is the extent to which assessment data contributes to positive treatment outcomes. Does the client get better, faster, as a result of the assessment? If not, what is the point? But better outcomes are not the only consideration in clinical utility. The monetary cost of assessment needs to be considered as well. Even a helpful assessment may prove to be counterproductive if the cost is prohibitive. Someone has to pay for assessments, whether it is the clients or the insurance companies. More money for assessment means less money is available for other purposes such as ongoing psychotherapy. Cost is always an issue in health care (Cummings, 2007). Clinical utility also includes the psychological cost of measurement errors. For example, when assessment incorrectly suggests that an older adult shows signs of dementia, the client and family pay an emotional price. Likewise, false negative results (e.g., concluding that a patient with mild dementia is normal) also exact an emotional toll. Emotional costs are intangible, but nonetheless an important consideration in clinical utility. Finally, client acceptability needs to be considered as well (Hunsley & Mash, 2005). Will the client agree to complete the assessment? Or will the client resist the assessment and thereby produce misleading results? All of these factors need to be considered in clinical utility.

Driven in large measure by the insistence of the insurance industry that therapies must be empirically based, there is a growing demand for brief, effective treatments throughout the entire field of health care. Evidence-based assessment is inescapably intertwined with this national movement toward evidence-based therapy for medical and psychological illnesses. Side by side with this trend, we can expect to see a keen emphasis upon empirically based psychological assessments.

Finally, we can predict that positive psychological assessment will gain greater popularity. Positive psychological assessment is a natural spin-off of the positive psychology movement, which is defined as “the scientific and practical pursuit of optimal human functioning” (Lopez & Snyder, 2003). Proponents of the positive psychology movement find the current focus of assessment, with its emphasis on pathology and what is wrong with people, to be lopsided and incomplete. A full understanding of persons also includes an appraisal of what is right with them. It includes a census of such positive qualities as hope, creativity, wisdom, courage, forgiveness, humor, gratitude, and coping. The traditional instruments of psychological testing (for example, the Rorschach, MMPI-2, MCMI-III, and so forth) provide essentially no information on these positive human qualities. In the years ahead, new instruments and original philosophies of assessment will most certainly redress the imbalance.

The Smartphone Revolution By 2025, more than 5 billion people will be using ultra-broadband smartphones with capacities far beyond current technology (Miller, 2012). Although smartphones were not designed for purposes of psychological testing, they possess the potential for implementing a large range of ecologically valid assessments. To envision the smartphones of the future, we need a phenomenal leap of imagination. We also need new vocabulary; specifically, we need to know the definition of teraflop. This is a measure of computer speed and refers to a trillion decimal number computations per second, in other words, really fast. In 2025, smartphones are likely to possess

at least eight 200 GHZ [gigahertz] processors, yielding about 10 teraflops, making them ten times faster than the first teraflop supercomputer in 1997, the $50 million, 9,600 processor Intel ASCI Red that filled a whole room. Such powerful smartphones could run complex psych apps continually in the background (e.g., running emotion detection algorithms on voice input or combining GPS and geographic information system (GIS) data into measures of daily movement patterns), without disrupting other apps and annoying participants. Future smartphones, basically handheld supercomputers, will be able to run psych apps of nearly limitless complexity
(Miller, 2012, p. 224).

Future applications will extend well beyond simple surveys and tests. Once clients download the appropriate “testing apps,” smartphones could be programmed for countless forms of creative assessment. Here are a few examples of current and future assessment uses (Miller, 2012). In each case, the permission of the user would be needed:

• estimating fast food exposure from a user’s GPS and GIS data
• gathering data from biosensors for remote physiology assessment
• analyzing social context and behavior from ambient sounds
• correlating mood reports with GPS locations and ambient noise levels
• analyzing voice modulation for patterns of stress

With the enlistment of the users, a wide range of ecological momentary assessments could be gathered (Courvoisier, Eid, & Lischetzke, 2012). One example: Migraine headache sufferers could provide pain ratings at random intervals throughout the day, to determine the efficacy of treatments. The smartphone might chime a distinctive tone, signaling a request to tap an on-screen scale. Completing the scale would take about five seconds. Another example: Short surveys could be administered following phone calls from significant others to identify patterns of emotional reactivity.

While holding great promise, smartphones in assessment also carry likely pitfalls (Miller, 2012). Obtaining truly informed consent will be more problematic because users do not read software licensing agreements before clicking “I agree.” Further, confidentiality is difficult to guarantee because of the vulnerability of digital systems to hacking. Liability is another concern. Developers of testing apps could be liable for unintended consequences. A programming bug might cause smartphones to malfunction or could prevent emergency calls. Further, technology is changing so fast that established practitioners will need constant updating:

How can older researchers grow comfortable with such a futuristic technology, one that is, to all intents and purposes, indistinguishable from magic?
(Miller, 2012, p. 234).

Smartphone applications in assessment will raise challenging ethical and practical issues. Even so, the future is bright in this new arena of testing.

Testing and the Next Big Questions in Psychology In this closing section of the book, we turn to some admittedly more speculative predictions about the future of testing. In doing so, the reader is invited to exercise his or her imagination as well. After all, psychological testing is here to stay, and will continue to evolve and adapt. As it has for more than a century, testing will continue to play a significant role in psychology and modern society. But exactly how?

The starting point for this final conversation is a fascinating issue of Perspectives on Psychological Science, a journal of the Association for Psychological Science (Diener, 2009). The journal editor asked a number of leading psychologists (none from the field of psychometrics) to write about the most important questions to be asked in their particular specialty in the upcoming decade. These questions have been reproduced in paraphrased manner (for

consistency and clarity) in Table 12.8. A few esoteric contributions have been omitted.

Table 12.8 The Next Big Questions in Diverse Fields of Psychology

What is the connection between complex psychological states such as emotion or cognition and the physical substrates of the brain? (Barrett, 2009)
Why do people do what they do? And, what are the important situational and personality variables in answering this question? (Funder, 2009)
What is the nature and nurture of plasticity in early human development? (Belsky & Pluess, 2009)
How do we achieve a synthesized understanding of early cognitive development from studies of separate cognitive abilities? (Oakes, 2009)
How can evolutionary psychology successfully explain personality and individual differences? (Buss, 2009)
How do stressful events and negative emotions influence the immune system, and how big are the effects? (Kiecolt-Glaser, 2009)
How can you tell if a particular memory belonging to you or someone else is true or false? (Bernstein & Loftus, 2009)
Can we improve our physical health by altering our social networks? (Cohen & Janicki-Deverts, 2009)
How can decision making be improved? (Milkman, Chugh, & Bazerman, 2009)
How can we promote self knowledge (“Know thyself”) and what are the results of greater self knowledge? (Wilson, 2009)
Can psychological research on correcting cognitive errors promote human welfare? (Lilienfeld, Ammirati, & Landfield, 2009)
Is it possible to teach intuition and can it be enhanced by virtual simulation? (Seligman & Kahana, 2009)
What are the mechanisms of gene-environment interaction effects in the development of conduct disorder? (Dodge, 2009)
Why do different individuals progress along different life trajectories? (Smith, 2009)
How can we live well? How can we achieve and sustain a good life? (Park & Peterson, 2009)
What is the near and distant future of human-android interaction? (Roese & Amir, 2009)

Of course, the whole point in asking a question is to hope that an answer will be found. While at first glance it may seem only a slight possibility that psychological testing could contribute to answering any of these questions, on closer examination it seems likely that testing will play an essential role in many cases.

Consider the question of nature and nurture of plasticity in early human development (Belsky & Pluess, 2009). This is a general topic that invites many specific lines of inquiry. For example, Davis et al. (2007) found that prenatal maternal depression and elevated cortisol levels in late pregnancy predicted negative reactivity in children at age 2. But what is “negative reactivity”? The dependent variable in this line of research, reactivity in children, is a construct measured by rating scales and situational tests. Thus, one line of answers to the underlying question (“What is the nature and nurture of plasticity in early human development?”) likely will hinge on the development of precise and valid measures of reactivity in children. This is a clear role for psychological testing in answering one of the big questions in psychology.

Another question on the list pertains to evolutionary explanations of personality and individual differences (Buss, 2009). Regardless of the particular directions pursued in this line of research, accurate measurement of appropriate personality constructs will be required. By appropriate, we are referring here not to just any old personality constructs, but to those that “cut nature at its joints.” In other words, the personality constructs measured in evolutionary psychology need to capture essential underlying elements of personality that could be susceptible to evolutionary influence. As an example, consider research on humor styles, discussed in an earlier chapter. Using the Humor Styles Questionnaire (HSQ; Martin et al., 2003) in a behavior genetics analysis of identical and fraternal twins, Vernon et al. (2008) found that positive forms of humor (Affiliative and Self-enhancing) revealed significant genetic influence whereas negative forms of humor (Aggressive and Self-defeating) arose from environmental influences. The point of this digression is that the HSQ successfully partitions humor into meaningful elements, including some that can be explained in evolutionary terms. For example, we could hypothesize that Affiliative humor promotes group bonding which, in turn, promotes individual survival and thus allows for genes to be passed on. This conclusion is possible because of the careful analysis of humor implicit in the development of the relevant personality test, the Humor Styles Questionnaire.

Finally, consider the question whether we can improve our physical health by altering our social networks (Cohen & Janicki-Deverts, 2009). There is no doubt that membership in a diverse social network is correlated with a variety of positive health outcomes, such as resistance to cognitive decline with aging, better prognosis when facing chronic illnesses, and even greater resistance to infectious disease (Cohen & Janicki-Deverts, 2009). But these results are correlational, not necessarily causal. The pressing question is: If individuals alter their social networks, will this improve physical health? From the standpoint of psychological testing, the construct of social network is crucial to the answer. What is a social network? How is it assessed or measured? Research in this area of behavioral medicine will require the development of straightforward and valid measures of social network, another role for testing in answering one of the big questions of psychology.

As a final challenge, the reader is invited to review the list in Table 12.8. What roles can you see for psychological testing in answering these big questions?

Chapter Quiz: Legal Issues and the Future of Testing
Appendix A
Major Landmarks in the History of Psychological Testing
2200 b.c. Chinese begin civil service examinations.
1838 Jean Esquirol distinguishes between mental illness and mental retardation.
1862 Wilhelm Wundt uses a calibrated pendulum to measure the “speed of thought.”
1866 O. Edouard Seguin writes the first major textbook on the assessment and treatment of mental retardation.
1869 Wundt founds the first experimental laboratory in psychology in Leipzig, Germany.
1884 Francis Galton administers the first test battery to thousands of citizens at the International Health Exhibit.
1890 James McKeen Cattell uses the term mental test in announcing the agenda for his Galtonian test battery.
1896 Emil Kraepelin provides the first comprehensive classification of mental disorders.
1901 Clark Wissler discovers that Cattellian “brass instruments” tests have no correlation with college grades.
1904 Charles Spearman proposes that intelligence consists of a single general factor g and numerous specific factors s1, s2, s3, and so forth.
1904 Karl Pearson formulates the theory of correlation.
1905 Alfred Binet and Theodore Simon invent the first modern intelligence test.
1908 Henry H. Goddard translates the Binet-Simon scales from French into English.
1912 Stern introduces the IQ, or intelligence quotient: the mental age divided by chronological age.
1916 Lewis Terman revises the Binet-Simon scales, publishes the Stanford-Binet; revisions appear in 1937, 1960, 1986, and 2003.
1917 Robert Yerkes spearheads the development of the Army Alpha and Beta examinations used for testing WWI recruits.
1917 Robert Woodworth develops the Personal Data Sheet, the first personality test.
1920 The Rorschach inkblot test is published.
1921 Psychological Corporation—the first major test publisher—is founded by Cattell, Thorndike, and Woodworth.
1926 Florence Goodenough publishes the Draw-A-Man Test.
1926 The first Scholastic Aptitude Test is published by the College Entrance Examination Board.
1927 The first edition of the Strong Vocational Interest Blank is published.
1935 The Thematic Apperception Test is released by Morgan and Murray at Harvard University.
1936 Lindquist and others publish the precursor to the Iowa Tests of Basic Skills.
1936 Edgar Doll publishes the Vineland Social Maturity Scale for assessment of adaptive behavior in those with mental retardation.
1938 L. L. Thurstone proposes that intelligence consists of about seven group factors known as primary mental abilities.
1938 Raven publishes the Raven's Progressive Matrices, a nonverbal test of reasoning intended to measure Spearman's g factor.
1938 Lauretta Bender publishes the Bender Visual Motor Gestalt Test, a design-copying test of visual-motor integration.
1938 Oscar Buros publishes the first Mental Measurements Yearbook.
1938 Arnold Gesell releases his scale of infant development.
1939 The Wechsler-Bellevue Intelligence Scale is published; revisions are published in 1955 (WAIS), 1981 (WAIS-R), 1997 (WAIS-III), and 2008 (WAIS-IV).
1939 Taylor–Russell tables published for determining the expected proportion of successful applicants with a test.
1939 The Kuder Preference Record, a forced-choice interest inventory, is published.
1942 The Minnesota Multiphasic Personality Inventory (MMPI) is published.
1948 Office of Strategic Services (OSS) uses situational techniques for selection of officers.


1949 The Wechsler Intelligence Scale for Children is published; revisions are published in 1974 (WISC-R), 1991 (WISC-III), and 2003 (WISC-IV).
1950 The Rotter Incomplete Sentences Blank is published.
1951 Lee Cronbach introduces coefficient alpha as an index of reliability (internal consistency) for tests and scales.
1952 American Psychiatric Association publishes the Diagnostic and Statistical Manual (DSM-I).
1953 Stephenson develops the Q-technique for studying the self-concept and other variables.
1954 Paul Meehl publishes Clinical vs. Statistical Prediction.
1956 The Halstead-Reitan Test Battery begins to emerge as the premier test battery in neuropsychology.
1957 C. E. Osgood describes the semantic differential.
1958 Lawrence Kohlberg publishes the first version of his Moral Judgment Scale; research with it expands until the mid-1980s.
1959 Campbell and Fiske publish a test validation approach known as the multitrait-multimethod matrix.
1963 Raymond Cattell proposes the theory of fluid and crystallized intelligences.
1967 In Hobson v. Hansen the court rules against the use of group ability tests to “track” students on the grounds that such tests discriminate against minority children.
1968 American Psychiatric Association publishes DSM-II.
1969 Nancy Bayley publishes the Bayley Scales of Infant Development (BSID). The revised version (BSID-2) is published in 1993.
1969 Arthur Jensen proposes the genetic hypothesis of African American versus white IQ differences in the Harvard Educational Review.
1971 In Griggs v. Duke Power the Supreme Court rules that employment test results must have a demonstrable link to job performance.
1971 George Vaillant popularizes a hierarchy of 18 ego adaptive mechanisms and describes a methodology for their assessment.
1971 Court decision requires that tests used for personnel selection must be job relevant (Griggs v. Duke Power).
1972 The Model Penal Code rule for legal insanity is published and widely adopted in the United States.
1974 Rudolf Moos begins publication of the Social Climate Scales to assess different environments.
1974 Friedman and Rosenman popularize the Type A coronary-prone behavior pattern; their assessment is interview-based.
1975 The U.S. Congress passes Public Law 94-142, the Education for All Handicapped Children Act.
1978 Jane Mercer publishes SOMPA (System of Multicultural Pluralistic Assessment), a test battery designed to reduce cultural discrimination.
1978 In the Uniform Guidelines on Employee Selection adverse impact is defined by the four-fifths rule; also guidelines for employee selection studies are published.
1979 In Larry P. v. Riles the court rules that standardized IQ tests are culturally biased against low-functioning black children.
1980 In Parents in Action on Special Education v. Hannon the court rules that standardized IQ tests are not racially or culturally biased.
1985 The American Psychological Association and other groups jointly publish the influential Standards for Educational and Psychological Testing.
1985 Sparrow and others publish the Vineland Adaptive Behavior Scales, a revision of the pathbreaking 1936 Vineland Social Maturity Scale.
1987 American Psychiatric Association publishes DSM-III-R.
1989 The Lake Wobegon Effect is noted: Virtually all states of the union claim that their achievement levels are above average.
1989 The Minnesota Multiphasic Personality Inventory-2 is published.
1992 American Psychological Association publishes a revised Ethical Principles of Psychologists and Code of Conduct (American Psychologist, December 1992).
1994 American Psychiatric Association publishes DSM-IV.
1994 Herrnstein and Murray revive the race and IQ heritability debate in The Bell Curve.
1999 APA and other groups publish revised Standards for Educational and Psychological Testing.
2003 New revision of APA Ethical Principles of Psychologists and Code of Conduct goes into effect.
Appendix B
Standard and Standardized-Score Equivalents of Percentile Ranks in a Normal Distribution

This table lists the equivalence between percentile ranks and four other types of scores: z scores (mean of 0, SD of 1.00), deviation IQs (mean of 100, SD of 15), T scores (mean of 50, SD of 10), and GRE-like scores (mean of 500, SD of 100). The application of the table assumes that the distribution of scores on a test or variable is normally distributed.

We illustrate how this appendix can be used with two examples. Suppose that we desire to know the WAIS-IV IQ that is equivalent to a percentile rank of 97. Reading across the row that begins with PR 97, we discover that the equivalent IQ is 128. Suppose that we desire to know the percentile rank that is equivalent to a GRE score of 675. In the far right column, we locate a score of 675 and read across to the left-hand column to discover that the equivalent percentile rank is 96.

PR          z       Deviation IQ   T Score   GRE-Like Score

Mean        0.00    100            50        500
St. Dev.    1.00    15             10        100

99          2.33    135            73        733
98          2.05    131            71        705
97          1.88    128            69        688
96          1.75    126            68        675
95          1.64    125            66        664
94          1.55    123            66        655
93          1.48    122            65        648
92          1.41    121            64        641
91          1.34    120            63        634
90          1.28    119            63        628
89          1.22    118            62        622
88          1.18    118            62        618
87          1.13    117            61        613
86          1.08    116            61        608
85          1.04    116            60        604
84          0.99    115            60        599
83          0.95    114            60        595
82          0.91    114            59        591
81          0.88    113            59        588
80          0.84    113            58        584
79          0.80    112            58        580
78          0.77    112            58        577
77          0.74    111            57        574
76          0.71    111            57        571
75          0.67    110            57        567
74          0.64    110            56        564
73          0.61    110            56        561
72          0.58    109            56        558
71          0.55    108            56        555
70          0.52    108            55        552
69          0.49    107            55        549
68          0.47    107            55        547
67          0.44    107            54        544
66          0.41    106            54        541
65          0.39    106            54        539
64          0.36    105            54        536
63          0.33    105            53        533
62          0.31    105            53        531
61          0.28    104            53        528
60          0.25    104            53        525
59          0.23    104            52        523
58          0.20    103            52        520
57          0.18    103            52        518
56          0.15    102            52        515
55          0.12    102            51        512
54          0.10    102            51        510
53          0.07    101            51        507
52          0.05    101            51        505
51          0.03    100            50        503
50          0.00    100            50        500
49         −0.03    100            50        497
48         −0.05    99             49        495
47         −0.07    99             49        493
46         −0.10    98             49        490
45         −0.12    98             49        488
44         −0.15    98             48        485
43         −0.18    97             48        482
42         −0.20    97             48        480
41         −0.23    96             48        477
40         −0.25    96             47        475
39         −0.28    96             47        472
38         −0.31    95             47        469
37         −0.33    95             47        467
36         −0.36    95             46        464
35         −0.39    94             46        461
34         −0.41    94             46        459
33         −0.44    93             46        456
32         −0.47    93             45        453
31         −0.49    93             45        451
30         −0.52    92             45        448
29         −0.55    92             44        445
28         −0.58    91             44        442
27         −0.61    90             44        439
26         −0.64    90             44        436
25         −0.67    90             43        433
24         −0.71    89             43        429
23         −0.74    89             43        426
22         −0.77    88             42        423
21         −0.80    88             42        420
20         −0.84    87             42        416
19         −0.88    87             41        412
18         −0.91    86             41        409
17         −0.95    86             40        405
16         −0.99    85             40        401
15         −1.04    84             40        396
14         −1.08    84             39        392
13         −1.13    83             39        387
12         −1.18    82             38        382
11         −1.22    82             38        378
10         −1.28    81             37        372
9          −1.34    80             37        366
8          −1.41    79             36        359
7          −1.48    78             35        352
6          −1.55    77             34        345
5          −1.64    75             34        336
4          −1.75    74             32        325
3          −1.88    72             31        312
2          −2.05    69             29        295
1          −2.33    65             27        267
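
The two lookups used as examples in this appendix can also be computed directly. The following Python sketch is an illustration only and is not part of the original appendix. It assumes a perfectly normal distribution and relies on the standard library class statistics.NormalDist; the names SCALES, equivalents, and percentile_rank are hypothetical, chosen for this example. Computed values may differ by a point from some printed entries, depending on how intermediate values were rounded.

```python
from statistics import NormalDist

# Means and standard deviations of the score scales listed in this appendix.
SCALES = {
    "deviation_iq": (100, 15),
    "t_score": (50, 10),
    "gre_like": (500, 100),
}

def equivalents(pr):
    """Convert a percentile rank (1 to 99) to z and to each standardized scale."""
    z = NormalDist().inv_cdf(pr / 100)  # z score that cuts off pr percent of cases
    result = {"z": round(z, 2)}
    for name, (mean, sd) in SCALES.items():
        result[name] = round(mean + sd * z)  # linear transformation of z
    return result

def percentile_rank(score, mean, sd):
    """Convert a score on a normally distributed scale to a rounded percentile rank."""
    return round(100 * NormalDist(mean, sd).cdf(score))

print(equivalents(97))                  # {'z': 1.88, 'deviation_iq': 128, 't_score': 69, 'gre_like': 688}
print(percentile_rank(675, 500, 100))   # 96
```

Both results agree with the worked examples: a percentile rank of 97 corresponds to an IQ of 128, and a GRE-like score of 675 corresponds to a percentile rank of 96.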
Glossary
accommodation in Piaget’s theory, the adjustment of an unsuccessful schema so that it works.

achievement test a test that measures the degree of learning, success, or accomplishment in a subject matter.

actuarial judgment the kind of automated judgment in which an empirically derived formula is used to diagnose or predict behavior.

adverse impact in hiring, adverse impact is said to exist if one group has a selection rate less than four-fifths of the rate of the group with the highest selection rate (Uniform Guidelines on Employee Selection, 1978).

age norm a type of standardization that depicts the level of test performance for each separate age group in the normative sample.

alcohol abuse an alcohol use disorder characterized by the functional impact of drinking on the life of the patient (e.g., unsafe behavior, legal problems, family conflicts).

alcohol dependence an alcohol use disorder characterized by tolerance, withdrawal, and preoccupation with drinking.

alternate-forms reliability a form of reliability in which alternate forms of the same test are given to a group of heterogeneous and representative subjects; scores for the two forms are then correlated.

Alzheimer’s disease a degenerative neurological disorder; in the early stages, the most prominent symptom is memory loss.

Americans with Disabilities Act an act passed by Congress in 1990 that forbids discrimination against qualified individuals with disabilities.

amygdala an almond-shaped mass of gray matter located in the anterior temporal lobe, involved in emotions and other capacities.

analogue behavioral assessment the observation of clients in a contrived but plausible setting in which they are instructed to engage in relevant tasks designed to elicit behaviors of interest.

aphasia any deviation in language performance caused by brain damage.

apraxia a variety of dysfunctions characterized by a breakdown in the direction or execution of complex motor acts.

aptitude test a test that measures one or more clearly defined and relatively homogeneous segments of ability.

architectural system likened to “hardware” in the information-processing approach to intelligence, the architectural system refers to biologically based properties (e.g., memory span, speed of encoding) necessary for information processing.

assessment appraising or estimating the level or magnitude of some attribute of a person; testing is one small part of assessment which also incorporates observations, interviews, rating scales, and checklists.

assessment center an approach to assessment of managerial talent, which consists of multiple simulation techniques, including group presentations, problem-solving exercises, group discussion exercises, interviews, and in-basket techniques.

assimilation in Piaget’s theory, the application of a schema to an object, person, or event.

attention the cognitive capacity of the brain to identify what is important and to ignore what is irrelevant.

attention-deficit/hyperactivity disorder a behavioral syndrome characterized by fidgeting, distractibility, impulsivity, attentional deficits, poor social skills, and not considering consequences.

attitude learned cognitive, affective, and behavioral predispositions to respond positively or negatively to certain objects, situations, institutions, concepts, or persons.

basal ganglia a collection of nuclei in the forebrain that make connections with the cerebral cortex above and the thalamus below; the basal ganglia participate in the control of movement.

basal level for tests in which subtest items are ranked from easiest to hardest, the level below which the examinee would almost certainly answer all questions correctly.

base rate in decision theory, the proportion of successful applicants who would be selected using current methods, without benefit of the new test.

behavior observation scale a variation upon the BARS technique which uses a continuum from “almost never” to “almost always” to measure how often an employee performs specific tasks on each behavioral dimension.

behavior sample in testing, the notion that a test is just a sample of behaviors that permits the examiner to make inferences about a larger domain of relevant behaviors.

behavior therapy the application of the methods and findings of experimental psychology to the modification of maladaptive behavior; also called behavior modification.

behavioral assessment a variety of techniques that concentrate on behavior itself rather than on underlying traits, hypothetical causes, or presumed dimensions of personality.

behavioral avoidance test a behavioral procedure in which the therapist measures how long the client can tolerate an anxiety-inducing stimulus.

behavioral procedure a procedure for assessing the antecedents and consequences of behavior; behavioral procedures include checklists, rating scales, interviews, and structured observations.

behaviorally anchored rating scale a criterion-referenced rating scale.

bias in construct validity a type of bias demonstrated when a test is shown to measure different hypothetical traits (psychological constructs) for one group than another or to measure the same trait but with differing degrees of accuracy.

bias in content validity a type of bias demonstrated when an item or subscale is relatively more difficult for members of one group than another after the general ability level of the two groups is held constant.

bias in predictive validity a type of bias demonstrated when the inference drawn from the test score is not made with the smallest feasible random error or if there is constant error in an inference or prediction as a function of membership in a particular group.

biodata objective or scoreable autobiographical data; recognized as a valid adjunct to personnel selection.

Broca’s aphasia also known as expressive aphasia, a form of language disturbance characterized by effortful, nonfluent speech and few words.

C scale a variant on the stanine scale with 11 units.


ceiling level for tests in which subtest items are ranked from easiest to hardest, the level above which the examinee would almost certainly fail all remaining questions.

cerebellum part of the hindbrain responsible for helping to coordinate muscle tone, posture, and skilled movements.

cerebral cortex the outermost layer of the brain that is the source of the highest levels of sensory, motor, and cognitive processing.

cerebrospinal fluid a clear liquid, continuously produced and replenished within the ventricles of the brain, that provides protection against external buffeting.

cerebrovascular accident a “stroke” most often caused by plugging up (infarction) of a brain artery, leading to death of surrounding brain tissue.

cerebrum the most substantial part of the brain, consisting of the two hemispheres that each contain four lobes.

certification testing to determine that a person has at least a minimum proficiency in some discipline or activity.

classical theory of measurement the dominant theory in psychological testing; the theory assumes that an observed score consists of a true score plus measurement error.

classification in testing, the process of using tests to assign a person to one category rather than another.

clerical scoring error in testing, an error in test scoring related to the mechanics of scoring, such as adding subscores incorrectly or consulting the wrong conversion table.

clinical judgment the kind of judgment in which the decision maker processes information in his or her head to diagnose or predict behavior.

coaching in testing, the attempt to boost test scores by providing the examinee with extra practice on testlike materials, review of fundamental concepts likely to be covered by the test, and advice about optimal test-taking strategies.

coding complexity in observational rating situations, the use of too many categories, or ill-defined categories, which leads to low inter-rater reliability.

coefficient alpha an index of reliability that may be thought of as the mean of all possible split-half coefficients, corrected by the Spearman-Brown formula.

cognitive behavior therapy an approach to behavior change that emphasizes changing the client’s belief structure.

concurrent validity a type of criterion-related validity in which the criterion measures are obtained at approximately the same time as the test scores.

concussion a transitory alteration of consciousness from a blow to the head; may be followed by temporary amnesia, dizziness, nausea, weak pulse, and slow respiration, yet there is no demonstrable organic brain damage.

conservation in Piaget’s theory, the awareness that physical quantities do not change in amount when they are superficially altered in appearance.

construct a theoretical, intangible quality or trait in which individuals differ.

construct validity a type of validity that refers to the appropriateness of test-based inferences about the underlying construct purportedly measured by the test.

constructional dyspraxia impairment of the ability to deal with spatial relationships either in a two- or three-dimensional framework.

consumer psychology the branch of industrial/organizational psychology that deals with the development, advertising, and marketing of products and services.

content validity the type of validity that is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.

contextual intelligence in Sternberg’s theory, the mental activity involved in purposive adaptation to, shaping of, and selection of real-world environments relevant to one’s life.

contingency management procedure an approach to behavior therapy in which the therapist identifies and alters the consequences of unwanted behaviors.

convergent validity a type of validity that is demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs.

corpus callosum the major commissure that serves to integrate the functions of the two cerebral hemispheres.

correction for guessing in group testing, the practice of revising a subject’s final score in light of apparent guessing.

correlation coefficient a numerical index of the degree of linear relationship between two sets of scores; correlation coefficients can vary between −1.00 and +1.00.

correlation matrix a complete table of intercorrelations between all
collaboratory Internet-based arrangements that facilitate the the variables that is the beginning point of factor analysis.
collaboration of test specialists, regardless of geographical
cranial nerves 12 paired neural tracts that help govern basic
location.
sensory and motor functions such as vision, smell, facial
competency to stand trial the determination by the presiding judge movement, taste, and hearing.
that a defendant does not have a mental defect, illness, or condition
creativity test a test that assesses the ability to produce new ideas,
that renders him or her unable to understand the proceedings or to
insights, or artistic creations that are accepted as being of social,
assist in his or her defense.
aesthetic, or scientific value.
componential intelligence in Sternberg’s theory, the internal
criterion contamination a source of error in test validation when
mental mechanisms that are responsible for intelligent behavior.
the criterion is “contaminated” by its artificial commonality with
computer-assisted psychological assessment CAPA refers to the the test, such as test and criterion contain nearly identical items.
entire range of computer applications in psychological assessment Also, a form of evaluation error in which a criterion measure
and includes testing, scoring, report writing, and individualized includes factors that are not demonstrably part of the job, for
test administration. example, rating appearance when it is not job related.
computer-based test interpretation CBTI refers to test criterion-keyed approach a test development approach in which
interpretation and report writing by computer, which is a major test items are assigned to a particular scale if, and only if, they
component of computer-assisted psychological assessment (CAPA). discriminate between a well defined criterion group and a relevant
control group.
computerized adaptive testing a family of procedures that allows
for accurate and efficient measurement of ability; individualized criterion problem the difficult problem of conceptualizing and
testing continues until a predetermined level of measurement measuring work performance constructs which are often complex,
precision is reached. fuzzy, and multidimensional.
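
As a rough illustration of the correlation coefficient and coefficient alpha entries on this page, the sketch below computes both from raw scores. The function names and the toy data are assumptions made for this example and do not come from the text. Alpha is computed with the usual variance-based formula, using population variances, rather than by literally averaging split-half coefficients.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coefficient_alpha(item_scores):
    """Coefficient alpha for a table with one row per examinee and one
    column per item (hypothetical layout chosen for this sketch)."""
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: two sets of scores that rise together, and two that move oppositely.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [3, 2, 1])

# Toy data: two items that rank three examinees identically.
alpha = coefficient_alpha([[1, 1], [2, 2], [3, 3]])
```

With these toy inputs the correlations fall at the two ends of the possible range, and alpha reaches its maximum because the two items agree perfectly.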

criterion-referenced test a test in which the objective is to determine where the examinee stands with respect to very tightly defined educational objectives; no comparison is made to the performance of other examinees.

criterion-related validity the type of validity that is demonstrated when a test is shown to be effective in estimating an examinee’s performance on some outcome measure.

critical incidents checklist a form of performance evaluation based upon actual episodes of desirable and undesirable on-the-job behavior.

cross-sectional design a research design in which subjects of different ages are tested at one point in time.

cross-sequential design a research design that combines cross-sectional and longitudinal methods.

cross-validation for predictive tests, the practice of using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did in the original sample.

crystallized intelligence in Cattell and Horn’s theory, what one has already learned through the investment of fluid intelligence in cultural settings (e.g., learning algebra in school).

culture-fair test a test designed to minimize irrelevant influences of cultural learning and social climate and thereby produce a cleaner separation of natural ability from specific learning.

custody evaluation in divorce cases, the psychological evaluation of a child (or children) and both parents so as to offer an opinion to the court as to the best interests of the child (or children) in custody arrangements.

decision theory an approach to psychological measurement that considers the costs and benefits of test-based decisions, for example, in personnel selection.

defense mechanisms unconscious mental strategies available to the ego in dealing with the conflicting demands of id, superego, and external reality.

diagnosis determining the nature and source of a person’s abnormal behavior, and classifying the behavior pattern within an accepted diagnostic system.

discriminant validity a type of validity that is demonstrated when a test does not correlate with variables or tests from which it should differ.

divergent production the creation of numerous appropriate responses to a single stimulus situation.

divergent thinking the kind of thinking that goes off in different directions.

Durham rule the legal provision for the defense of insanity if the criminal act was a “product” of mental disease or defect; dropped in 1972 and replaced by the Model Penal Code.

duty to warn stemming from the Tarasoff case, the responsibility of clinicians to communicate any serious threat to the potential victim, law enforcement agencies, or both.

dysarthria slurred, hesitant speech (not drug or alcohol induced) that often signifies damage to the cerebellum.

ecological momentary assessment using wireless technology to measure patient experience (e.g., pain, fatigue, mood) in the real world at the point of experience.

ego in psychoanalytic theory, the conscious self that mediates between the id and reality.

equilibration in Piaget’s theory, the entire process of assimilation, accommodation, and equilibrium.

evidence-based assessment evaluation of a testing tool not only by means of the standard psychometric indices of reliability and validity but also through considerations of clinical utility.

executive functions brain functions that include logical analysis, conceptualization, reasoning, planning, and flexibility of thinking.

executive system likened to “software” in the information-processing approach to intelligence, the executive system refers to environmentally learned components that steer problem solving and provide overall guidance.

expectancy table a table that portrays the established relationship between test scores and expected outcome on a relevant task.

experiential intelligence in Sternberg’s theory, the ability to deal effectively with novel tasks.

expert rankings a scaling method that relies upon the judgment of experts to determine the rankings for individual components.

expert witness in court cases, a witness whom the judge deems qualified to testify about a proper subject matter.

extravalidity concerns the side effects and unintended consequences of testing.

extraversion a sociable, outgoing, excitement-seeking personality disposition.

extrinsic religious expression the use of religion for external goals such as security, status, and friendship.

face validity for tests, the appearance of validity to test users, examiners, and especially the examinees; not a technical form of validity, but important for the social acceptability of a test.

factor an underlying construct or variable that helps explain the correlations between several tests or measures.

factor analysis a family of statistical procedures that researchers use to summarize relationships among variables that are correlated in highly complex ways; the goal of factor analysis is to derive a parsimonious set of derived factors.

factor loading in factor analysis, the correlation between an individual test and a single factor.

factor matrix a table of correlations between variables and factors; the correlations are called factor loadings.

false negatives in decision theory, a subject who is incorrectly predicted to fail on the criterion.

false positives in decision theory, a subject who is incorrectly predicted to succeed on the criterion.

fear survey schedule a behavioral assessment device which requires respondents to indicate the presence and intensity of their fears in relation to various stimuli, typically on a 5- or 7-point Likert scale.

fetal alcohol effect a subtle version of fetal alcohol syndrome in which physical abnormalities are not observed, but behavioral problems such as attentional difficulties are noted.

fetal alcohol syndrome a cluster of physical and behavioral abnormalities, including mental retardation, caused by the mother’s drinking of alcohol during pregnancy.

fluid intelligence in Cattell and Horn’s theory, a largely nonverbal and relatively culture-reduced form of mental efficiency.

forced-choice method in personality test development, an item-writing method in which the alternatives are matched for social desirability.

forced-choice scale a performance evaluation scale designed to eliminate bias and subjectivity in supervisor ratings by forcing a choice between options that are equal in social desirability.

forebrain the large, outermost portion of the brain consisting of the cerebral cortex and underlying structures such as the corpus callosum, basal ganglia, limbic lobe, thalamus, and hypothalamus.

freedom from distractibility the third factor on the WISC-III consisting of Arithmetic and Digit Span.
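
The false positives and false negatives entries on this page can be made concrete with a small tally of selection decisions against later outcomes. This is only a sketch: the function name, the dictionary keys, and the sample data are hypothetical and chosen for illustration.

```python
def tally_decisions(predicted_success, actual_success):
    """Count decision outcomes given parallel lists of booleans:
    the test-based prediction and the later result on the criterion."""
    counts = {
        "true_positive": 0,   # predicted to succeed, did succeed
        "false_positive": 0,  # predicted to succeed, actually failed
        "false_negative": 0,  # predicted to fail, actually succeeded
        "true_negative": 0,   # predicted to fail, did fail
    }
    for predicted, actual in zip(predicted_success, actual_success):
        if predicted and actual:
            counts["true_positive"] += 1
        elif predicted and not actual:
            counts["false_positive"] += 1
        elif not predicted and actual:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


# Hypothetical data for five applicants.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
decision_counts = tally_decisions(predicted, actual)
```

A decision-theory analysis would then weigh the costs of the two kinds of error, which this sketch does not attempt.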

frequency distribution a method of summarizing data or test scores by specifying a small number of usually equal-sized class intervals and then tallying how many scores fall within each interval.

frequency polygon a method of summarizing data or test scores in graphic form; similar to a histogram, except that the frequency of the class intervals is represented by single points rather than columns.

frontal lobe the part of the cerebral cortex at the front of the brain that is required for the programming, regulation, and verification of executive functions and motor performance.

frustration in Rosenzweig’s system, the state that occurs whenever an organism encounters an obstacle or obstruction en route to the satisfaction of a need.

functionalist definition of validity the view that a test is valid if it serves the purpose for which it is used.

fundamental lexical hypothesis in personality theory, the notion that trait terms have survived in language because they convey important information about our dealings with others.

general factor according to Spearman, the single general factor of intelligence that must exist to account for the observed correlations between a large number of tests.

generalizability theory a domain sampling model of reliability that recognizes several alternatives of generalization for test results.

gifted the designation of a person as gifted typically means that he or she has extraordinary ability in some area.

glial cells cells that provide structural support for the neurons and also supply nutrients and perform other functions.

grade norm a type of standardization that depicts the level of test performance for each separate school grade in the normative sample.

graphic rating scale a scale that consists of trait labels, brief definitions of those labels, and a continuum for the rating.

gray matter those parts of the brain that consist of densely packed cell bodies of neurons that are gray in color.

group achievement tests also called educational achievement tests, these instruments are commonly administered to dozens or hundreds of students at the same time to gauge achievement levels in one or more well-defined academic domains.

group tests mainly pencil-and-paper measures suitable to the testing of large groups of persons at the same time.

guilty but mentally ill (GBMI) a verdict allowed in some states in which the intention is for the accused to begin his or her sentence in a psychiatric hospital.

halo effect the tendency to rate an employee high or low on all dimensions because of a global impression.

heritability index an estimate of how much of the total variance in a given trait is due to genetic factors; the index can vary from 0.0 to 1.0.

hindbrain the lowest, most simply organized, brain structures; the hindbrain consists of the myelencephalon and metencephalon.

hippocampus part of a complex, ill-defined memory circuit that consolidates new experiences into long-term memories.

histogram a method of summarizing data or test scores in graphic form; a histogram contains the same information as a frequency distribution.

homogeneous scale a scale in which the individual items tend to measure the same thing; homogeneity is gauged by item-total correlations.

hypothalamus a small structure at the center of the brain that helps govern motivated behavior and bodily regulation: feeding, sexual behavior, sleeping, temperature regulation, emotional behavior, and movement.

id in psychoanalytic theory, the unconscious part of personality that is the seat of all instinctual needs such as for food, water, sexual gratification, and avoidance of pain.

illusory validation in projective testing, the finding that subjects ignore disconfirming instances and cling to their preexisting stereotypes.

implicit association test a covert measure of attitudes that makes use of automatic or “unconscious” associations to target concepts (e.g., racial groups) as determined by sophisticated reaction time analyses.

in-basket technique a realistic work sample test that simulates the work environment of an administrator.

index of intellectual deterioration on the Shipley Institute of Living Scale, an index based on the discrepancy between verbal and abstractions ability that was intended to gauge the effects of organic brain impairment.

individual achievement tests achievement tests administered one-on-one to gauge achievement levels; these tests are essential in the assessment of potential learning disabilities.

individual tests instruments which by their design and purpose must be administered one on one.

informed consent in testing, the principle that test takers or their representatives are made aware, in language that they can understand, of the purposes and likely consequences of testing.

insanity plea in court cases, a defense based upon reference to legal insanity as spelled out by the Model Penal Code or other legal statutes.

instructional validity a view promoted by court systems that school districts must actually teach what it is they test for on statewide achievement tests.

integrative model a model of career assessment in which information from interest, ability, and personality domains is considered simultaneously.

integrity test an instrument designed to screen potential employees for theft-proneness and other undesirable qualities; overt integrity tests contain questions about attitudes toward theft and items dealing with admission of theft and other illegal activities.

intelligence according to experts, (1) the capacity to learn from experience and (2) the capacity to adapt to one’s environment.

intelligence test although there are exceptions, an intelligence test generally yields an overall summary score based on results from a heterogeneous sample of items (e.g., verbal skills, reasoning, spatial thinking).

interest inventory a test that measures the preference for certain activities or topics and thereby helps determine occupational choice.

interscorer reliability for tests that involve judgmental scoring, the typical degree of agreement between scorers.

interval scale a measurement scale that provides information about ranking and the relative strength of ranks; based on the assumption of equal-sized units or intervals for the underlying scale.

intrinsic religious expression the use of religion for internal goals such as finding meaning and direction in life.

introversion a quiet, “bookish,” reserved personality disposition.

ipsative test a test in which the average of the subscales is always the same for every examinee; thus, for an individual examinee, high scores on subscales must be balanced by low scores on other subscales.

IQ constancy on the Wechsler tests, the axiomatic assumption that IQ must remain constant with normal aging, even though raw intellectual ability might shift or decline.
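
To illustrate the frequency distribution entry on this page, the following sketch tallies integer scores into equal-sized class intervals. The function name, the interval width, and the scores are assumptions made for this example. Intervals that contain no scores are simply omitted from the result.

```python
def frequency_distribution(scores, width, start):
    """Tally integer scores into class intervals of equal width,
    beginning at `start`. Returns {(lower, upper): count}."""
    counts = {}
    for score in scores:
        # Lower limit of the class interval that contains this score.
        lower = start + ((score - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return {(lower, lower + width - 1): counts[lower] for lower in sorted(counts)}


# Hypothetical test scores grouped into intervals 50-59, 60-69, 70-79.
scores = [52, 55, 61, 67, 68, 70, 79]
distribution = frequency_distribution(scores, width=10, start=50)
```

Plotting these counts as columns would give a histogram, and plotting them as single points joined by lines would give a frequency polygon.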

item-characteristic curve a graphical display of the relationship between the probability of a correct response and the examinee’s position on the underlying trait measured by the test.

item-difficulty index for a single test item, the proportion of examinees in a large tryout sample who get that item correct.

item-discrimination index a statistical index of how efficiently an item discriminates between persons who obtain high and low scores on the entire test.

item information function a graph portraying the relationship between the trait level of examinees and the information provided by a test item.

item-reliability index $s_i r_{iT}$, the product of a test item’s internal consistency as indexed by the correlation with the total score ($r_{iT}$) and its variability as indexed by the standard deviation ($s_i$).

item response function a mathematical equation that describes the relation between the amount of a latent trait an individual possesses and the probability that he or she will give a designated response to a test item designed to measure that construct.

item response theory also known as latent trait theory, a modern framework for test construction in which the psychometrician posits a single dimension of skill or underlying trait on which all of the test items rely; each respondent is assumed to have a certain amount of the latent trait being measured.

item-validity index $s_i r_{iC}$ consists of the product of a test item’s standard deviation ($s_i$) and the point-biserial correlation with the criterion ($r_{iC}$).

job analysis the process of defining a job in terms of the behaviors necessary to perform it; includes job description (physical characteristics of the work) and job specification (personal characteristics needed).

kappa the index of inter-rater agreement, corrected for chance, used as one measure of the reliability of diagnostic systems and rating scales.

Kuder–Richardson formula 20 an index of reliability that is relevant to the special case where each test item is scored 0 or 1 (e.g., right or wrong).

Lake Wobegon effect the observation that virtually all states of the union claim that average achievement scores for their school systems exceed the 50th percentile.

latent trait theory a modern framework for test construction in which a single dimension of skill or underlying trait is posited. See item response theory.

learning disability an indistinct concept that typically refers to a severe discrepancy between general ability and individual achievement that cannot be explained by sensory/motor handicaps, mental retardation, emotional problems, or cultural deprivation.

legally blind this term applies to individuals with central visual acuity of 20/200 or less in the better eye (with correction) or to those with significant reduction in their visual field to a diameter of 20 degrees or less; used to determine eligibility for government benefits.

Lexile scale a measure of reading demand of a text, on a scale from 200 to 1,700, based on semantic difficulty (vocabulary) and syntactic complexity (sentence).

Likert scale a scale that presents the examinee with five responses ordered on an agree/disagree or approve/disapprove continuum.

limbic lobe a group of subcortical structures responsible for elaboration of emotion and the control of visceral activity.

limbic system a group of interconnected brain structures, located deep within the brain, and involved in olfaction, emotion, and motivation.

local norms norms derived from a representative local sample, as opposed to a national sample.

locus of control a construct that refers to perceptions that people have about the source of things that happen to them (e.g., internal versus external).

longitudinal design a research design in which the same subjects are tested at several points in time.

mean the arithmetic average of a group of scores.

measurement error everything other than the true score that makes up an examinee’s obtained test score.

median the middlemost score when all the scores in a sample have been ranked.

medulla oblongata part of the hindbrain that helps mediate swallowing, vomiting, breathing, the control of blood pressure, respiration, and, partially, heart rate.

memory a complex and multifaceted phenomenon that allows for the recall of previously learned information and skills.

meninges a thin layering of three tough membranes that encase the brain and spinal cord, providing protection against external buffeting.

mental retardation significantly subaverage general intellectual functioning resulting in or associated with concurrent impairments in adaptive behavior and manifested during the developmental period.

mental state at the time of the offense (MSO) the mental state of a defendant at the time of the offense is relevant in special pleadings such as the insanity defense; psychologists and psychiatrists may offer opinions as to the MSO of defendants.

method of absolute scaling a procedure for obtaining a measure of absolute item difficulty based upon results for different age groups of test takers.

method of empirical keying a scale development method in which test items are selected based entirely on how well they contrast a criterion group from a normative sample.

method of equal-appearing intervals a method for constructing interval-level scales from attitude statements.

method of rational scaling a scale construction method in which all scale items correlate positively with each other and also with the total score for the scale; also known as the internal consistency approach.

midbrain the middle portion of the brain consisting of cranial nerves and relay stations for vision and hearing.

mixed-standard scale a complex approach to performance evaluation designed to minimize rating errors in performance appraisal; items for performance dimensions are randomly ordered on the scale.

M’Naughten rule one of several standards of legal insanity; essentially, “the party accused was laboring under such a defect of reason, from disease of the mind, as not to know the nature and quality of the act he was doing.…”

mode the most frequently occurring score.

Model Penal Code rule a standard of legal insanity: “A person is not responsible for criminal conduct if at the time of such conduct, as a result of mental disease or defect, he lacks substantial capacity either to appreciate the criminality [wrongfulness] of his conduct or to conform his conduct to the requirements of the law.”

moral dilemma a brief story that involves a difficult moral choice such as whether to steal to prolong someone’s life; used in the study of moral reasoning.

motor cortex the strip of brain tissue located on the precentral gyrus that is involved in bodily movement.
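
Several item-level indices defined on this page can be computed directly from a table of right/wrong responses. The sketch below covers the item-difficulty index, the item-reliability index (item standard deviation times item-total correlation), and Kuder–Richardson formula 20. All function names and the tiny data set are hypothetical. Population variances are assumed throughout, and some sources use sample variances instead, which changes the values slightly.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_difficulty(item):
    """Proportion of examinees who answered the item correctly (scored 1)."""
    return sum(item) / len(item)


def item_reliability_index(item, totals):
    """Item standard deviation times the item-total correlation."""
    return pstdev(item) * pearson_r(item, totals)


def kr20(responses):
    """KR-20 for rows of 0/1 item scores (one row per examinee)."""
    n, k = len(responses), len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / total_var)


# Hypothetical tryout data: four examinees, three items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
totals = [sum(row) for row in responses]
second_item = [row[1] for row in responses]

difficulty_first = item_difficulty([row[0] for row in responses])
reliability_second = item_reliability_index(second_item, totals)
kr20_value = kr20(responses)
```

Note that the total score here includes the item being evaluated, a simplification that inflates the item-total correlation for very short tests.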

multi-infarct dementia a form of vascular brain impairment in which the hardly noticeable individual effects of many small infarcts or “strokes” accumulate over a number of years.

multimedia the collective capacity of the modern computer to use still images, live video segments, music, tables, charts, animation, and other approaches in an interactive format.

multitrait-multimethod matrix a research design for assessing convergent and discriminant validity that calls for the assessment of two or more traits by two or more methods.

neuropsychological tests tests and procedures with proven sensitivity to the effects of brain damage.

neuropsychology the study of the relationship between brain function and behavior.

nominal scale a measurement scale in which the categories are arbitrary and do not designate “more” or “less” of anything; the simplest and lowest level of measurement.

nonverbal behavior the subtler forms of human communication contained in glance, gesture, body language, tone of voice, and facial expression.

norm group a sample of examinees who are representative of the population for whom the test is intended.

norm-referenced test a test in which the performance of each examinee is interpreted in reference to a relevant standardization sample.

normal distribution a symmetrical, mathematically defined, bell-shaped frequency distribution.

normal ogive the normal distribution graphed in cumulative form.

normalized standard score a score obtained by a transformation that renders a skewed distribution into a normal distribution.

norms a summary of test results for a large and representative group of subjects.

not guilty by reason of insanity (NGRI) a verdict allowed in some states in which the defendant is found not guilty because his or her criminal act was the result of mental disease or defect.

oblique axes in factor analysis, the assumption that factors are correlated with one another, that is, not at right angles.

observer drift in observational rating situations, the tendency for an observer to become fatigued and less vigilant over time, thus failing to notice target behaviors when they occur.

occipital lobe the part of the cerebral cortex at the rear of the brain that contains the vision centers.

occupational reinforcer patterns an evaluation of jobs in terms of the worker-perceived reinforcers that are present or absent.

operational definition a definition of a concept in terms of the way it is measured, such as, intelligence is “what the tests test.”

ordinal scale a measurement scale that allows for ranking; ordinal scales do not provide information about the relative strength of ranking.

orthogonal axes in factor analysis, the assumption that the factors are at right angles to one another, which means that they are uncorrelated.

overt integrity test an employment test that seeks to assess attitudes toward theft; these instruments may also contain a section dealing with overt admissions of theft.

paralinguistics the nonverbal aspects of speech such as tone of voice and rate of speaking.

parietal lobe the part of the cerebral cortex that mediates spatial integration and sensory awareness of what is happening on the surface of the body.

Parkinson’s disease a degenerative brain disease characterized by three types of motor disturbance: involuntary movement, including tremor; poverty and slowness of movement without paralysis; and changes in posture and muscle tone.

percentile the percentage of persons in the standardization sample who scored below a specific raw score; percentiles vary from 0 to 100.

perceptual organization the second factor on the WISC-III consisting of Picture Arrangement, Picture Completion, Block Design, and Object Assembly.

personal injury in personal injury lawsuits, attorneys may hire psychologists to testify as to the lifelong consequences of traumatic stress or acquired brain damage.

personality an inexplicit construct which is invoked to explain behavioral consistency within persons and behavioral distinctiveness between persons.

personality coefficient a term used to refer to the finding that the predictive validity of personality scales rarely exceeds .30.

personality test a test that measures the traits, qualities, or behaviors that determine a person’s individuality; this information helps predict future behavior.

phallometry the assessment of sexual arousal by attaching a flexible band around the penis of an examinee who views standard slides and pictures.

phrenology the discredited idea, attributed to Franz Joseph Gall (1758–1828), that cranial “bumps” signify a prominence of certain mental faculties and personality traits.

physiognomy the historical and discredited idea that we can judge the inner character of people from their outward appearance, especially the face.

pineal body a pea-sized structure that sits at the center of the brain; it secretes the hormone melatonin in a cyclic biological rhythm, but its functions are not well understood.

placement in testing, the sorting of persons into different programs appropriate to their needs or skills.

polygraph a device that monitors ongoing physiological responses, including changes in breathing, pulse rate, blood pressure, and perspiration; inaccurately referred to as a “lie detector.”

positive psychological assessment the appraisal of what is right with people, for example, evaluation of hope, creativity, wisdom, courage, forgiveness, humor, gratitude, and coping.

positive psychology the scientific and practical pursuit of optimal human functioning.

power test a test that allows enough time for test takers to attempt all items; however, the test is difficult enough that no test taker is able to obtain a perfect score.

predictive validity a type of criterion-related validity in which the criterion measures are obtained in the future, usually months or years after the test scores are obtained, such as when college grades are predicted from an entrance exam.

primary mental abilities the seven group factors of intelligence posited by Thurstone.

processing speed the fourth factor on the WISC-III consisting of Coding and Symbol Search.

projective hypothesis the assumption that personal interpretations of ambiguous stimuli must necessarily reflect the unconscious needs, motives, and conflicts of the examinee.

projective test a test in which the examinee encounters vague, ambiguous stimuli and responds with his or her own constructions.

psychometrician a specialist in psychology or education who develops and evaluates psychological tests.

psychophysics the empirical study of the functional relationship between physical stimuli and mental phenomena.
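
The percentile entry on this page translates into a short computation. The sketch below follows that definition literally, counting only scores strictly below the raw score. The function name and the sample of scores are assumptions made for illustration, and other definitions of percentile rank, for instance ones that count half of the tied scores, would give different values.

```python
def percentile(raw_score, standardization_sample):
    """Percentage of the standardization sample scoring below raw_score."""
    below = sum(1 for score in standardization_sample if score < raw_score)
    return 100 * below / len(standardization_sample)


# Hypothetical standardization sample of ten raw scores.
sample = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
percentile_of_20 = percentile(20, sample)   # five of ten scores fall below 20
percentile_of_10 = percentile(10, sample)   # no score falls below the minimum
```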

Public Law 93-112 a “Bill of Rights” for persons with disabilities
that outlawed discrimination based upon disability.

Public Law 94-142 the Education for All Handicapped Children
Act that mandated that schoolchildren with disabilities receive
appropriate assessment and educational opportunities.

Public Law 99-457 legislation that requires states to provide a free
appropriate public education to children ages 3 through 5 who
have disabilities.

pupillometrics the measurement of pupil size to gauge interest in,
or pleasure in, the observed stimulus.

Q-technique a technique for studying changes in self-concept and
other variables by the sorting of statements into a near-normal
distribution for assigned categories.

qualified individualism in testing for selection, the ethical stance
that age, sex, race, or other demographic characteristics must not
be used, even if knowledge of these factors would improve the
validity of selection.

quotas in testing for selection, the ethical stance that the
best-qualified candidates within definable subgroups should be selected
in proportion to their representation in the population.

random sampling a selection strategy in which every subject has an
equal chance of being chosen.

rapport in testing, a comfortable, warm atmosphere that serves to
motivate examinees and elicit cooperation.

Rasch Model named after the Danish mathematician Georg
Rasch, this mathematical model uses complex equations to predict
the probability of respondents at different skill levels correctly
answering test questions.

rater bias the tendency for supervisor ratings to be inaccurate
because of leniency, severity, and other forms of evaluation
errors.

ratio scale a measurement scale that yields equal-sized units or
intervals and that possesses a conceptually meaningful zero point;
the highest level of measurement.

raw score the most basic level of information provided by a
psychological test, for example, the number of questions answered
correctly.

reactivity of measurement the phenomenon in which the process
of measurement (e.g., clients knowing that they are being observed
and rated) changes what we seek to measure.

real definition a definition that seeks to tell us the true nature of the
thing being defined.

regression equation an equation that describes the best-fitting
straight line for estimating the criterion from the test; the
best-fitting line is one that minimizes the sum of the squared deviations
from the line.

reliability the attribute of consistency in measurement.

reliability coefficient the ratio of true score variance to the total
variance of test scores.

religion as Quest the view that complexity, doubt, and
tentativeness are aspects of mature religious expression.

restriction of range a phenomenon in which the range on a
variable is restricted, causing correlations with other variables to be
artificially low.

response to intervention RTI is a relatively recent approach
to learning disabilities in school systems that stresses early
identification and lack of response to intervention as important
factors in LD identification.

reticular formation a network of ascending and descending
nerve cell bodies and fibers that governs general arousal or
consciousness.

RIASEC model a theory of person–environment types that
proposes six themes: Realistic, Investigative, Artistic, Social,
Enterprising, and Conventional (RIASEC).

rotation to positive manifold in factor analysis, a method of
rotating the factor matrix that seeks to eliminate as many of the
negative factor loadings as possible.

rotation to simple structure in factor analysis, a method of
rotating the factor matrix that seeks to simplify the factor loadings
so that each test has significant loadings on as few factors as
possible.

routing procedure in tests such as the Stanford-Binet: Fifth
Edition, the first items or subtests administered for the purpose
of determining the appropriate starting points for subsequent
subtests.

routing test an initial subtest used to determine the entry level for
all remaining subtests; used with individual intelligence tests such
as the SB:FE.

savant an individual who has mental deficiencies and a highly
developed talent in a single area such as art, rapid calculation,
memory, or music.

schema in Piaget’s theory, an organized pattern of behavior or a
well-defined mental structure that leads to knowing how to do
something.

screening the use of quick and simple tests or procedures to
identify persons who might have special characteristics or needs.

self-efficacy in Bandura’s theory, the personal judgment of how
well one can execute courses of action required to deal with
prospective situations.

self-monitoring a therapeutic approach in which the client chooses
the goals and actively participates in supervising, charting, and
recording progress toward the end point(s) of therapy.

semantic differential a rating technique in which the subject uses
a seven-point continuum to rate a concept on a number of bipolar
adjectives such as good-bad, strong-weak, active-passive.

sensitivity the ability of a test, expressed as a percentage, to
accurately “rule in” or identify individuals who manifest a trait or
syndrome of interest.

simultaneous processing a form of information processing
characterized by the simultaneous execution of several different
mental operations.

situational exercise an assessment procedure in which the
prospective employee is asked to perform under circumstances that
are highly similar to the anticipated work environment.

skewness the symmetry or asymmetry of a frequency distribution;
positive skew indicates that scores are piled up at the low end and
negative skew indicates that scores are piled up at the high end.

social desirability response set the tendency of examinees to react
to the perceived desirability (or undesirability) of a test item rather
than responding accurately to its content.

social intelligence the capacity to understand other people and to
relate effectively to them.

source traits the stable and constant sources of behavior that are
less visible than surface traits but more important in accounting for
behavior.

Spearman-Brown formula a formula for adjusting split-half
correlations so that they reflect the full length of a scale.

specific factor according to Spearman, a factor of intelligence
specific to an individual test.

specificity the ability of a test, expressed as a percentage, to
accurately “rule out” or identify individuals who do not manifest a
trait or syndrome of interest.
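
The sensitivity and specificity entries can be made concrete with a short worked computation. The following is a minimal sketch rather than anything taken from the text: the function name and the screening counts are hypothetical, chosen only to show how the two percentages come out of the four possible screening outcomes.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as percentages.

    sensitivity = TP / (TP + FN): share of people WITH the syndrome
                  whom the test correctly "rules in".
    specificity = TN / (TN + FP): share of people WITHOUT the syndrome
                  whom the test correctly "rules out".
    """
    sensitivity = 100.0 * true_pos / (true_pos + false_neg)
    specificity = 100.0 * true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical screening results (illustrative counts, not data from the text):
# 50 people have the syndrome and 40 of them test positive;
# 200 people do not have it and 180 of them test negative.
sens, spec = sensitivity_specificity(true_pos=40, false_neg=10,
                                     true_neg=180, false_pos=20)
print(f"sensitivity = {sens:.0f}%, specificity = {spec:.0f}%")
# prints: sensitivity = 80%, specificity = 90%
```

In this made-up case the test misses 10 of the 50 affected individuals and wrongly flags 20 of the 200 unaffected ones, which is why the two indices have to be reported separately.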

speed test a timed test that contains items of uniform and generally
simple level of difficulty; the time limit is strict enough that few
subjects finish a speed test.

split-half reliability a form of reliability in which scores from
the two halves of a test (e.g., even items versus odd items) are
correlated with one another; the correlation is then adjusted for
test length.

standard deviation a statistical index that reflects the degree of
dispersion in a group of scores; the square root of the variance.

standard error of measurement an index of measurement error
which indicates the extent to which an examinee’s score might vary
over a number of parallel tests.

standard error of the difference a statistical index that can help
a test user determine whether, for an individual examinee, the
difference between scores on two tests or subtests is significant.

standard error of estimate SE_est is the margin of error to be
expected in the predicted criterion score.

standard of care the standard of care that is usual, customary, or
reasonable.

standard score a transformed score in which the original score is
expressed as the distance from the mean in standard deviation
units.

standardization fallacy the fallacious view that a test standardized
on one population is ipso facto unfair when used in any other
population.

standardization sample a large group of subjects representative of
the population for whom the test is intended.

standardized procedure in testing, the attempt through
carefully written instructions to ensure that the procedures for
administering a test are uniform from one examiner and setting to
another.

stanine scale a scale in which all raw scores are converted to a
single-digit system of scores ranging from 1 to 9.

state anxiety the transitory feelings of fear or worry that most
persons experience on occasion.

sten scale a 10-unit scale with five units above and five units below
the mean.

stereotype threat the threat of confirming, as self-characteristic, a
negative stereotype about one’s group.

stratified random sampling a selection strategy in which subjects
are chosen randomly, with the constraint that the sample matches
the population on relevant background variables such as race, sex,
occupation, and so on.

subgroup norms norms derived from an identified subgroup, as
opposed to a diversified national sample.

successive processing a form of information processing in which a
proper sequence of mental operations must be followed.

superego in psychoanalytic theory, that part of personality that is
roughly synonymous with conscience and comprises the societal
standards of right and wrong that are conveyed to us by our
parents.

surface traits in Cattell’s theory, the more obvious aspects
of personality that typically emerge in the first stages of
factor analysis when individual test items are correlated with
each other.

systematic measurement error a type of measurement error that
arises when, unknown to the test developer, a test consistently
measures something other than the trait for which it was
intended.

table of specifications in test development, a table that lists the
exact number of items in relevant content areas; such a table also
specifies the precise number of items which must embody different
cognitive processes.

technical manual in testing, the manual that summarizes the
technical data about a new instrument.

temporal lobe the part of the cerebral cortex involved in
processing of auditory sensations, long-term memory storage,
and modulation of biological drives such as aggression, fear, and
sexuality.

teratogen a substance that crosses the placental barrier and causes
physical deformities in the fetus.

test a standardized procedure for sampling behavior and
describing it with categories or scores. In addition, most tests have
norms or standards by which the results can be used to predict
other, more important, behaviors.

test anxiety a constellation of phenomenological, physiological,
and behavioral responses that accompany concern about possible
failure on a test.

test bias in popular usage, a test is biased if it discriminates
unfairly against racial and ethnic minorities, women, and the poor;
technically, test bias refers to differential validity for definable,
relevant subgroups of persons.

test fairness the extent to which the social consequences of test
usage are considered fair to relevant subgroups; a matter of social
values, test fairness is especially pertinent when tests are used for
selection decisions.

test of functional literacy a test that evaluates practical knowledge
and skills used in everyday life.

test-retest reliability a form of reliability in which the same test is
given twice to the same group of heterogeneous and representative
subjects; scores for the two sessions are then correlated.

thalamus a key structure that provides sensory input and
information about ongoing movement to the cerebral cortex; the
thalamus is the major relay station in the brain.

token economy a behavioral approach in which many different
forms of prosocial behavior are rewarded with tokens that can be
later exchanged for material rewards or privileges.

trait any relatively enduring way in which one individual differs
from another.

trait anxiety the relatively stable tendency of an individual to
respond anxiously to a stressful predicament.

true score an examinee’s hypothetical real score on a test; the
true score can be estimated probabilistically, but is never directly
known.

T score a transformed score with mean of 50 and standard
deviation of 10.

Type A coronary-prone behavior pattern a behavior pattern
consisting of insecurity of status, hyperaggressiveness, free-floating
hostility, and a sense of time urgency (hurry sickness).

unqualified individualism in testing for selection, the ethical
stance that, without exception, the best-qualified candidates
should be selected for employment, admission, or other
privilege.

user’s manual in testing, the manual that gives instructions
for administration and also provides guidelines for test
interpretation.

validity a test is valid to the extent that inferences made from it are
appropriate, meaningful, and useful.

validity coefficient the correlation between test and criterion (r_xy).
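
The validity coefficient, regression equation, and standard error of estimate entries are closely linked, and a small computation may make the link easier to see. The sketch below is illustrative only: the function names and the five pairs of scores are hypothetical, the correlation is the ordinary Pearson coefficient, and the standard error of estimate uses the common textbook form SD_y * sqrt(1 - r_xy^2) with a population standard deviation, which is an assumption about the intended formula rather than a quotation from the text.

```python
import math


def pearson_r(x, y):
    """Validity coefficient: Pearson correlation between test (x) and criterion (y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_line(x, y):
    """Least-squares slope and intercept for predicting the criterion from the test."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error_of_estimate(y, r):
    """SE_est = SD_y * sqrt(1 - r^2), using the population SD of the criterion."""
    n = len(y)
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return sd_y * math.sqrt(1 - r ** 2)


# Hypothetical scores for five examinees (illustrative, not data from the text).
test_scores = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

r_xy = pearson_r(test_scores, criterion)
slope, intercept = regression_line(test_scores, criterion)
se_est = standard_error_of_estimate(criterion, r_xy)
print(f"r_xy = {r_xy:.2f}, predicted = {intercept:.2f} + {slope:.2f} * test, "
      f"SE_est = {se_est:.2f}")
# prints: r_xy = 0.77, predicted = 2.20 + 0.60 * test, SE_est = 0.69
```

The higher the validity coefficient, the smaller SE_est becomes relative to the spread of the criterion, which is the sense in which a more valid test yields tighter predictions.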

validity shrinkage the common discovery in cross-validation
research that a test predicts the relevant criterion less accurately
with the new sample of examinees than with the original tryout
sample.

value according to Rokeach and others, a shared and enduring
belief about ideal modes of behavior or end states of existence.

variance a statistical index that reflects the degree of dispersion in a
group of scores.

ventricles fluid-filled caverns within the brain.

verbal comprehension the first factor on the WISC-III consisting of
Information, Similarities, Vocabulary, and Comprehension.

virtual reality the use of sophisticated computer images projected
to wrap-around goggles to portray a moving, changing,
three-dimensional environment.

visual agnosia a difficulty in the recognition of drawings, objects,
or faces caused by brain damage.

Wernicke’s aphasia also known as receptive aphasia, a form of
language disturbance in which speech is fluent but meaningless,
presumably because language comprehension is impaired.

white matter those parts of the brain that consist of axons wrapped
in a white, fatty substance called the myelin sheath.

work sample an assessment procedure that uses a miniature replica
of the job for which examinees have applied.

work values the needs, motives, and values that influence
vocational choice, job satisfaction, and career development.
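
Several statistical entries in this glossary (standard score, T score, Spearman-Brown formula, and standard error of measurement) reduce to one-line formulas. The sketch below collects them as a quick reference. It is an illustration under stated assumptions: the function names and the example numbers are hypothetical, and the formulas are the standard forms (z = (X - M) / SD, T = 50 + 10z, r_full = 2 r_half / (1 + r_half), SEM = SD * sqrt(1 - r)), not quotations from the text.

```python
import math


def standard_score(raw, mean, sd):
    """z score: distance of a raw score from the mean in standard deviation units."""
    return (raw - mean) / sd


def t_score(raw, mean, sd):
    """T score: standard score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * standard_score(raw, mean, sd)


def spearman_brown(r_half):
    """Adjust a split-half correlation so that it reflects the full test length."""
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical values chosen for illustration only.
print(f"z = {standard_score(115, 100, 15):.2f}")                # z = 1.00
print(f"T = {t_score(115, 100, 15):.0f}")                       # T = 60
print(f"full-length r = {spearman_brown(0.70):.2f}")            # full-length r = 0.82
print(f"SEM = {standard_error_of_measurement(15, 0.91):.1f}")   # SEM = 4.5
```

Note how the Spearman-Brown adjustment raises the half-test correlation of .70 to about .82: a longer test is expected to be more reliable, so the unadjusted split-half value underestimates the reliability of the full scale.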
References
Aamodt, M. G., Keller, R., Crawford, K., & Kimbrough, W. (1981). Allport, G. W., & Odbert, H. (1936). Trait names, a psycholexical
A critical incident job analysis of the university housing resident study. Psychological Monographs, 47 (Whole No. 211).
assistant position. Psychological Reports, 49, 983–986.
Allport, G. W., & Ross, J. (1967). Personal religious orientation and
Abel, E. L. (1995). An update on incidence of FAS: FAS is not an equal prejudice. Journal of Personality and Social Psychology, 5, 432–443.
opportunity birth defect. Neurobehavioral Toxicology, 17, 437–443.
Altepeter, T. S. (1989). The PPVT-R as a measure of psycholinguistic
Abel, E. L. (2009). Fetal alcohol syndrome: Same old, same old. functioning: A caution. Journal of Clinical Psychology, 45, 935–941.
­Addiction, 104, 1274–1275.
Altepeter, T. S., & Johnson, K. A. (1989). Use of the PPVT-R
Abell, S. C., Briesen, P., & Watz, L. (1996). Intellectual evaluations of for intellectual screening with adults: A caution. Journal of
children using human figure drawings: An empirical investigation Psychoeducational Assessment, 7, 39–45.
of two methods. Journal of Clinical Psychology, 52, 67–74.
Alzheimer'.s Disease and Related Disorders Association. (2000).
Achenbach, T. M. (1991). Manual for the Teacher's Report Form and General statistics/demographics. Chicago: Author.
1991 Profile. Burlington: University of Vermont, Department of
Amabile, T. M. (1983). The social psychology of creativity. New York:
­Psychiatry.
Springer-Verlag.
Achenbach, T. M. (1992). Manual for the Child Behavior Checklist/2–3
Ambrosini, P. J. (2000). Historical development and present status of
and 1992 Profile. Burlington: University of Vermont, Department of
the schedule for affect disorders and schizophrenia for school-age
Psychiatry.
children (K-SADS). Journal of the American Academy of Child and
Achenbach, T. M., & Rescorla, L. A. (2000). Manual for the ASEBA Adolescent Psychiatry, 39, 49–58.
preschool forms and profiles. Burlington: University of Vermont,
American Association for Counseling and Development. (1988).
Research Center for Children, Youth, and Families.
Ethical standards. Washington, DC: Author.
Adams, G. A., Elacqua, T., & Colarelli, S. (1994). The employment
American Educational Research Association, American Psychological
interview as a sociometric selection technique. Journal of Group
Association, & National Council on Measurement in Education.
Psychotherapy, Psychodrama, and Sociometry, 47 (Fall), 99–113.
(1985). Standards for educational and psychological testing. Washington,
Adams, K. M., & Heaton, R. K. (1985). Automated interpretation DC: American Psychological Association.
of neuropsychological test data. Journal of Consulting and Clinical
American Educational Research Association, American
Psychology, 53, 790–802.
Psychological Association, & National Council on Measurement
Agbenyega, S., & Jiggetts, J. (1999). Minority children and their ­over- in Education. (1999). Standards for educational and psychological
representation in special education. Education, 119, 619–633. testing (2nd ed.). Washington, DC: American Psychological
Aguinis, A., Culpepper, S. A., & Pierce, C. A. (2010). Revival of Association.
test bias research in preemployment testing. Journal of Applied American Federation of Teachers, National Council on Measurement
­Psychology, 95, 648–680. in Education, & National Education Association. (1990). Standards
Ahrens, J., Evans, R., & Barnett, R. (1990). Factors related to dropping for teacher competence in educational assessment of students.
out of school in an incarcerated population. Educational and Washington, DC: Author.
Psychological Measurement, 50, 611–617. American Psychiatric Association. (1994). Diagnostic and statistical
Aiken, L. R. (1989). Assessment of personality. Boston: Allyn and Bacon. manual of mental disorders (4th ed.). Washington, DC: Author.
Ainsworth, M., & Bowlby, J. (1965). Child care and the growth of love. American Psychiatric Association. (2000). Diagnostic and statistical
London: Penguin Books. manual of mental disorders (4th ed., text revision). Washington, DC:
Author.
Albers, C., & Grieve, A. (2007). Test Review: Bayley, N. (2006). Bayley
Scales of Infant and Toddler Development—Third Edition. San Antonio, American Psychological Association. (1953). Ethical standards of
TX—Harcourt Assessment. Journal of Psychoeducational Assessment, psychologists. Washington, DC: Author.
25, 180–190. American Psychological Association. (1986). Guidelines for computer-
Albert, S., Fox, H. M., & Kahn, M. W. (1980). Faking psychosis on based tests and interpretations. Washington, DC: Author.
the Rorschach: Can expert judges detect malingering? Journal of American Psychological Association. (1988). In the Supreme Court of
Personality Assessment, 44, 115–119. the United States: Clara Watson v. Fort Worth Bank & Trust. American
Alkhadher, O., Clarke, D., & Anderson, N. (1998). Equivalence and Psychologist, 43, 1019–1028.
predictive validity of paper-and-pencil and computerized adaptive American Psychological Association. (1992a). Ethical principles
formats of the Differential Aptitude Tests. Journal of Occupational and of psychologists and code of conduct. American Psychologist, 47,
Organizational Psychology, 71, 205–217. 1597–1611.
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. American Psychological Association. (1992b). Psychological testing of
Monterey, CA: Brooks/Cole. language minority and culturally different children. Washington, DC:
Allport, G. W. (1937). Personality: A psychological interpretation. New Author.
York: Holt, Rinehart and Winston. American Psychological Association. (1993). Guidelines for providers
Allport, G. W. (1950). The individual and his religion. New York: of psychological services to ethnic, linguistic, and culturally diverse
­Macmillan. populations. American Psychologist, 48, 45–48.

400
References 401

American Psychological Association. (1994). Report of the ethics Arvey, R. D., & Faley, R. H. (1988). Fairness in selecting employees.
committee, 1993. American Psychologist, 49, 659–666. Reading, MA: Addison-Wesley.
American Psychological Association. (2002). Ethical principles of Arvey, R. D., & Murphy, K. R. (1998). Performance evaluation in
psychologists and code of conduct. American Psychologist, 57, work settings. Annual Review of Psychology, 49, 141–168.
1060–1073.
Asher, J. J., & Sciarrino, J. A. (1974). Realistic work samples: A review.
American Psychological Association. (2012). Specialty guidelines for Personnel Psychology, 27, 519–533.
forensic psychology. American Psychologist, 68, 7–19.
Assel, M., & Anthony, J. (2009). Factor structure of the DIAL-3: A test
American Speech-Language-Hearing Association. (1991). Code of of a theory-driven conceptualization versus an empirically driven
ethics of the American Speech-Language Hearing Association. Rockville, conceptualization in a nationally representative sample. Journal of
MD: Author. Psychoeducational Assessment, 27, 113–124.
Ammer, C. (2003). The American Heritage dictionary of idioms. New Atkins v. Virginia. (2002). U.S. Supreme Court Cases, 536, 304–354.
York: Houghton Mifflin Harcourt. Retrieved from http://docs.justia.com/cases/supreme/536/304.
Anastasi, A. (1975). Review of the Goodenough-Harris Drawing Test. pdf
The seventh mental measurements yearbook. Lincoln: University of Atkinson, L., Bevc, I., Dickens, S., & Blackwell, J. (1992). Concurrent
Nebraska Press. validities of the Stanford-Binet (Fourth Edition), Leiter, and
Anastasi, A. (1985). Psychological testing (6th ed.). New York: Macmillan. Vineland with developmentally delayed children. Journal of School
Psychology, 30, 165–173.
Anastasi, A. (1986). Emerging concepts of test validation. Annual
Review of Psychology, 37, 1–15. Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992.
Journal of Applied Psychology, 77, 836–874.
Anastasi, A. (1988). Psychological testing (6th ed.). New York:
Macmillan. Axelrod, B. N., Greve, K., & Goldman, R. (1994). Comparison of four
Wisconsin Card Sorting Test Scoring guides with novice raters.
Andersen, P., & Vandehey, M. A. (2011). Career counseling and Assessment, 1, 115–121.
development in a global economy (2nd ed.). Belmont, CA: Cengage
Learning. Aylward, G. P., & Carson, A. (2005, April 1). Use of the Test Observation
Checklist with the Stanford-Binet Intelligence Scales for Early
Andersson, H. W. (1996). The Fagan Test of Infant Intelligence: Childhood, Fifth Edition (Early SB5). Paper presented at the National
Predictive validity in a random sample. Psychological Reports, 78, Association of School Psychologists, Atlanta, GA.
1015–1026.
Bach, P. J., Harowski, K., Kirby, K., Peterson, P., & Schulein, M. (1981).
Andreasen, N. (2001). Brave new brain: Conquering mental illness in the The interrater reliability of the Luria-Nebraska Neuropsychological
era of the genome. New York: Oxford University Press. Battery. Clinical Neuropsychology, 3, 19–21.
Andreasen, N. C., & Black, D. (1995). Introductory textbook of psychiatry Baddeley, A. (1986). Working memory. Oxford: Clarendon Press/
(2nd ed.). Washington, DC: American Psychiatric Press. Oxford University Press.
Andrew, D. M., Peterson, D. G., & Longstaff, H. P. (1979). Minnesota Baer, D. M., Harrison, R., Fradenburg, L., Petersen, D., & Milla, S.
Clerical Test Manual. San Antonio, TX: The Psychological Corporation. (2005). Some pragmatics in the valid and reliable recording of
Andrews, F. M. (1975). Social and psychological factors which directly observed behavior. Research on Social Work Practice, 15,
influence the creative process. In I. A. Taylor & J. W. Getzels (Eds.), 440–451.
Perspectives in creativity. Chicago: Aldine. Bagby, R. M., Rogers, R., Buis, T., & Kalemba, V. (1994). Malingered
Ansorge, C. J. (1985). Review of the Cognitive Abilities Test. Ninth and defensive response styles on the MMPI-2: An examination of
mental measurements yearbook. Lincoln: University of Nebraska Press. validity scales. Assessment, 1, 31–38.
Anstey, K. J., Jorm, A. F., Reglade-Méslin, C., & others. (2007). Bailey, D., Larson, L., Borgen, F., & Gasser, C. (2008). Changing of the
Weekly alcohol consumption, brain atrophy, and white matter guard: Interpretive continuity of the 2005 Strong Interest Inventory.
hyperintensities in a community-based sample aged 60 to 64 years. Journal of Career Assessment, 16, 135–155.
Psychosomatic Medicine, 68, 778–785. Baker, C., Koenig, A., & Sowell, V. (1995). Relationship of the Blind
Anthony, J. C., LeResche, L., Niaz, U., Von Korff, M., & Folstein, Learning Aptitude Test to Braille reading skills. Journal of Visual
M. (1982). Limits of the Mini-Mental State as a screening test for Impairment & Blindness, 89, 440–447.
dementia and delirium among hospital patients. Psychological Baker, F. B. (2001). The basics of item response theory (2nd ed.).
Medicine, 12, 397–408. College Park, MD: ERIC Clearing House on Assessment and
Anthony, J., & Assel, M. (2007). A first look at the validity of the Evaluation.
DIAL-3 Spanish version. Journal of Psychoeducational Assessment, 25, Balboni, G., Pedrabissi, L., Molteni, M., & Villa, S. (2001).
165–179. Discriminant validity of the Vineland Scales: Score profiles of
APA Task Force. (2006). Evidence-based practice in psychology. individuals with mental retardation and a specific disorder.
American Psychologist, 61, 271–285. American Journal of Mental Retardation, 106, 162–172.
Arizona Senate Research Staff. (2008, August 27). Arizona State Senate Ballard, J., & Zettel, J. (1977). Public Law 94-142 and Sec. 504: What
issue brief: AIMS (Arizona Instrument to Measure Standards). Phoeniz, they say about rights and protections. Exceptional Children, 44,
AZ: Author. 177–185.
Arnau, R. C., Meagher, M. W., Norris, M. P., & Bramson, R. (2001). Baltes, P. B., Reese, H., & Nesselroade, J. (1977). Life-span
Psychometric evaluation of the Beck Depression Inventory-II with developmental psychology: Introduction to research methods. Belmont,
primary care medical patients. Health Psychology, 20, 112–119. CA: Wadsworth.
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A Bandura, A. (1965). Vicarious processes: A case of no-trial learning. In
summary and review of recent research. Personnel Psychology, 35, L. Berkowitz (Ed.), Advances in experimental social psychology (vol. 2).
281–332. New York: Academic Press.
402 References

Bandura, A. (1971). Social learning theory. Morristown, NJ: General Batey, M. (2007). A psychometric investigation of everyday creativity.
Learning Press. Unpublished doctoral dissertation, University College, London.
Bandura, A. (1977). Social learning. Englewood Cliffs, NJ: Prentice Batey, M., & Furnham, A. (2006). Creativity, intelligence, and
Hall. personality: A critical review of the scattered literature. Genetic,
Bandura, A. (1982). Self-efficacy mechanism in human agency. Social, and General Psychology Monographs, 132, 355–429.
American Psychologist, 37, 122–147. Batson, C. D., Schoenrade, P., & Ventis, W. (1993). Religion and the
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: individual: A social-psychological perspective. New York: Oxford
Freeman. University Press.

Bandura, A. (2006). Guide for constructing self-efficacy scales. In Bausell, R. B. (1986). A practical guide to conducting empirical research.
T. Urdan & F. Pajares (Eds.), Self-efficacy beliefs of adolescents (pp. New York: Harper & Row.
307–337). Greenwich, CT: Information Age Publishing. Bayless, J. D., Varney, N. R., & Roberts, R. J. (1989). Tinker Toy Test
Bandura, A., & Walters, R. H. (1963). Social learning and personality performance and vocational outcome in patients with closed-head
development. New York: Holt, Rinehart and Winston. injuries. Journal of Clinical and Experimental Neuropsychology, 11,
913–917.
Barber, M., & Stott, D. (2004). Validity of the Telephone Interview for
Cognitive Status (TICS) in post-stroke subjects. International Journal Bayley, N. (1969). Bayley Scales of Infant Development. San Antonio, TX:
of Geriatric Psychiatry, 19, 75–79. The Psychological Corporation.

Barkley, R. A. (1996). Attention-deficit/hyperactivity disorder. In E. Bayley, N. (2006). Bayley Scales of Infant and Toddler Development—
J. Mash & R. A. Barkley (Eds.), Child psychopathology (pp. 63–112). Third Edition. San Antonio, TX: Harcourt Assessment.
New York: Guilford. Beck, A. T. (1976). Cognitive therapy and the emotional disorders. New
Barlow, D. (2005). What'.s new about evidence-based assessment? York: New American Library.
Psychological Assessment, 17, 308–311. Beck, A. T. (1983). Negative cognitions. In E. Levitt, B. Lubin, & J.
Barnett, W. S., & Camilli, G. (2002). Compensatory preschool Brooks (Eds.), Depression: Concepts, controversies, and some new facts
education, cognitive development, and “race.” In J. Fish (Ed.), Race (2nd ed.). Hillsdale, NJ: Erlbaum.
and intelligence: Separating science from myth. Mahwah, NJ: Erlbaum. Beck, A. T. (1987). Cognitive models of depression. Journal of Cognitive
Bar-On, R. (1997). Bar-On Emotional Quotient Inventory: Technical Psychotherapy: An International Quarterly, 1, 5–37.
manual (EQ-i). Toronto, Canada: Multi-Health Systems. Beck, A. T., & Steer, R. A. (1987). Manual for the revised Beck Depression
Bar-On, R. (2000). Emotional and social intelligence: Insights Inventory. San Antonio, TX: The Psychological Corporation.
from the Emotional Quotient Inventory (EQ-i). In R. Bar-On & J. Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck
Parker (Eds.), Handbook of emotional intelligence (pp. 363–388). San Depression Inventory-II. San Antonio, TX: The Psychological
Francisco: Jossey-Bass. Corporation.
Bar-On, R., & Parker, J. D. (2000). Bar-On Emotional Quotient Inventory: Beck, A. T., Steer, R. A., & Garbin, M. G. (1988). Psychometric
Youth version. North Tonawanda, NY: Multi-Health Systems properties of the Beck Depression Inventory: Twenty-five years of
Incorporated. evaluation. Clinical Psychology Review, 8, 77–100.
Barrett, L. F. (2009). The future of psychology: Connecting mind to Beck, A. T., Ward, C. H., Mendelsohn, M., Mock, J., & Erbaugh, J.
brain. Perspectives on Psychological Science, 4, 326–339. (1961). An inventory for measuring depression. Archives of General
Barrett, P. K. (2000). Validation of the Test of Nonverbal Intelligence- Psychiatry, 4, 561–571.
Third Edition (TONI-3) for Jamaican students. Unpublished Doctoral Behling, O. (1998). Employee selection: Will intelligence and consci-
Dissertation, Auburn University, Auburn, AL. entiousness do the job? Academy of Management Executive, 12, 77–86.
Barrick, M. R., Swider, B. W., & Stewart, G. L. (2010). Initial Beirne-Smith, M., Ittenbach, R. F., & Patton, J. R. (2002). Mental
evaluations in the interview: Relationships with subsequent retardation (6th ed.). Upper Saddle River, NJ: Merrill (Prentice Hall).
interviewer evaluations and employment offers. Journal of Applied
Belcher, M. J. (1992). Review of the Wonderlic Personnel Test. The
Psychology, 95(6), 1163–1172.
eleventh mental measurements yearbook. Lincoln: University of
Barron, F. (1953). An ego-strength scale which predicts response to Nebraska Press.
psychotherapy. Journal of Consulting Psychology, 17, 327–333.
Bell, L., & Casebourne, J. (2008). Increasing employment for ethnic
Barron, F. (1955). The disposition toward originality. Journal of minorities: A survey of research findings. London: Center for Economic
Abnormal and Social Psychology, 51, 478–485. and Social Inclusion.
Barron, F. (1968). Creativity and personal freedom. Princeton, NJ: Van Bell, N., Lassiter, K., Matthews, T., & Hutchinson, M. (2001).
Nostrand. Comparison of the Peabody Picture Vocabulary Test-Third Edition
Barron, F., & Harrington, D. M. (1981). Creativity, intelligence, and and Wechsler Adult Intelligence Scale-Third Edition with university
personality. Annual Review of Psychology, 32, 439–476. students. Journal of Clinical Psychology, 57, 417–422.
Barry, A. E. (2005). How attrition impacts the internal and external Bell, N., Matthews, T., Lassister, K., & Leverett, J. (2002). Validity of
validity of longitudinal research. Journal of School Health, 75, the Wonderlic Personnel Test as a measure of fluid or crystallized
267–270. intelligence: Implications for career assessment. North American
Journal of Psychology, 4, 113–120.
Bartol, C., & Bartol, A. (2004). Introduction to forensic psychology:
Research and application. Thousand Oaks, CA: Sage. Bellak, L. (1992). The Thematic Apperception Test, the Children's
Apperception Test, and the Senior Apperception Technique in clinical use
Bartsch, A. J., Homola, G., Biller, A., & others. (2007). Manifestations
(5th ed.). Orlando, FL: Grune & Stratton.
of early brain recovery associated with abstinence from alcoholism.
Brain, 130, 36–47. Bellak, L., & Bellak, S. S. (1991). Children's Apperception Test Manual
(CAT) (8th rev. ed.). Larchmont, NY: C. P. S.
Bate, A., Mathias, J., & Crawford, J. (2001). Performance on the Test of
Everyday Attention and standard tests of attention following severe Bellak, L., & Bellak, S. S. (1994). Children's Apperception Test Human
traumatic brain injury. Clinical Neuropsychologist, 15, 405–422. Figures (CAT-H) (11th ed.). Larchmont, NY: C. P. S.
Belsky, J., & Pluess, M. (2009). The nature (and nurture?) of plasticity Berry, D. J., Bridges, L. J., & Zaslow, M. J. (2004). Early childhood
in early human development. Perspectives on Psychological Science, 4, measures profiles. Washington, DC: Child Trends.
345–351. Bersoff, D. N. (1988). Should subjective employment devices be
Bem, D., & Funder, D. (1978). Predicting more of the people more scrutinized? It's elementary, my dear Ms. Watson. American
of the time: Assessing the personality of situations. Psychological Psychologist, 43, 1016–1018.
Review, 85, 485–501. Bertrand, J., Floyd, R., Weber, K., & others. (2004). National task force
Bender, L. (1938). A visual motor gestalt test and its clinical use. New on fetal alcohol syndrome and fetal alcohol effect. Fetal alcohol syndrome:
York: American Orthopsychiatric Association. Guidelines for referral and diagnosis. Atlanta, GA: Centers for Disease
Control and Prevention.
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Fifth edition
manual for the Differential Aptitude Tests, Forms S and T. San Antonio, Bialik, C. (2010, September 4). Seven careers in a lifetime? Think
TX: The Psychological Corporation. twice, researchers say. Wall Street Journal.
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1982). Differential Bianchini, K., Etherton, J., Greve, K., Heinly, M., & Meyers, J. (2008).
Aptitude Tests: Administrator's handbook. San Antonio, TX: The Classification accuracy of MMPI-2 validity scales in the detection
Psychological Corporation. of pain-related malingering: A known-groups study. Assessment, 15,
435–449.
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1984). Differential
Aptitude Tests: Technical Supplement. San Antonio, TX: The Bickley, P. G., Keith, T. Z., & Wolfe, L. M. (1995). The three-stratum
Psychological Corporation. theory of cognitive abilities: Test of the structure of intelligence
across the life span. Intelligence, 20, 309–328.
Bennett, T. (1988). Use of the Halstead-Reitan Neuropsychological
Test Battery in the assessment of head injury. Cognitive Bilker, W. B., Hansen, J. A., Brensinger, C. M., & others. (2012).
Rehabilitation, 6, 18–25. Development of abbreviated nine-item forms of the Raven's
Standard Progressive Matrices Test. Assessment, 19, 354–369.
Ben-Porath, Y. S., & Butcher, J. N. (1989). Psychometric stability of
rewritten MMPI items. Journal of Personality Assessment, 53, 645–653. Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic
du niveau intellectuel des anormaux. Année Psychologique, 11,
Ben-Porath, Y. S., & Tellegen, A. (2008). MMPI-2-RF (Minnesota
191–244.
Multiphasic Personality Inventory-2 Restructured Form): Manual for
administration, scoring, and interpretation. Minneapolis: University of Blake, R. J., Potter, E., III, & Sliwak, R. (1993). Validation of the
Minnesota Press. structural scales of the CPI for predicting the performance of junior
officers in the U.S. Coast Guard. Journal of Business Psychology, 7,
Benson, D. F. (1988). Disorders of visual gnosis. In J. W. Brown (Ed.),
431–448.
The neuropsychology of visual perception. Hillsdale, NJ: Erlbaum.
Blin, Dr. (1902). Les débilités mentales. Revue de Psychiatrie, 8, 337–345.
Benson, D. F. (1994). The neurology of thinking. New York: Oxford
University Press. Bloch, A. (2002). Refugees' opportunities and barriers in employment
and training, Research Report 179. Leeds, UK: Department for
Benson, P. G. (1985). Minnesota Importance Questionnaire. In D. J.
Work and Pensions.
Keyser & R. C. Sweetland (Eds.), Test critiques (vol. 2). Kansas City,
MO: Test Corporation of America. Block, J. (1961). The Q-sort method in personality assessment and
psychiatric research. Springfield, IL: Charles C. Thomas.
Benson, P., Donahue, M., & Erickson, J. (1993). The Faith Maturity
Scale: Conceptualization, measurement, and empirical validation. Block, J. (2008). The Q-Sort in character appraisal: Encoding subjective
In M. L. Lynn & D. O. Moberg (Eds.), Research in the social scientific impressions of persons quantitatively. Washington, DC: American
study of religion (vol. 5). Greenwich, CT: JAI Press. Psychological Association.
Benton, A., Hamsher, K., Rey, G., & Sivan, A. (1994). Multilingual Blum, G. (1950). The Blacky Pictures. New York: The Psychological
Aphasia Examination (3rd ed.). Iowa City, IA: AJA Associates. Corporation.
Benton, A., Sivan, A., Hamsher, K., Varney, N., & Spreen, O. (1994). Blumenthal, J. A. (1985). Review of Jenkins Activity Survey. In J. V.
Contributions to neuropsychological assessment (2nd ed.). New York: Mitchell, Jr. (Ed.). The ninth mental measurements yearbook (vol. 1).
Oxford University Press. Lincoln: Buros Institute of Mental Measurements of the University
of Nebraska-Lincoln.
Beran, T. (2007). Differential Ability Scales (2nd ed.). Canadian Journal
of School Psychology, 22, 128–132. Blustein, D. L. (2006). The psychology of working: A new perspective.
New York: Routledge.
Berg, E. A. (1948). A simple objective test for measuring flexibility in
thinking. Journal of General Psychology, 39, 15–22. Blustein, D. L., Kenna, A., Gill, N., & DeVoy, J. (2008). The psychology
of working: A new framework for counseling practice and public
Berger, S. G., Chibnall, J., & Gfeller, J. (1994). The Category Test: A
policy. The Career Development Quarterly, 56, 294–308.
comparison of computerized and standard versions. Assessment, 3,
255–258. Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue:
Tracing the history of intelligence testing. Journal of Clinical and
Berk, R. A. (Ed.). (1984). A guide to criterion-referenced test construction.
Experimental Neuropsychology, 24, 383–405.
Baltimore: Johns Hopkins University Press.
Board of Trustees of the Society for Personality Assessment. (2005).
Bernreuter, R. G. (1931). The personality inventory. Stanford, CA:
The status of the Rorschach in clinical and forensic practice:
Stanford University Press.
An official statement by the Board of Trustees of the Society for
Bernstein, D. M., & Loftus, E. F. (2009). How to tell if a particular Personality Assessment. Journal of Personality Assessment, 85,
memory is true or false. Perspectives on Psychological Science, 4, 219–237.
370–374.
Boccaccini, M., Turner, D., & Murrie, D. (2008). Do some evaluators
Bernstein, I., & Nunnally, J. (1994). Psychometric theory. New York: report consistently higher or lower PCL-R scores than others?
McGraw-Hill. Findings from a statewide sample of sexually violent predator
Berry, C., Sackett, P., & Wiemann, S. (2007). A review of recent evaluations. Psychology, Public Policy, and Law, 14, 262–283.
developments in integrity test research. Personnel Psychology, 60, Boden, M. (2004). The creative mind: Myths and mechanisms (2nd ed.).
271–301. London: Routledge.
Boggs, D. H., & Simon, J. R. (1968). Differential effect of noise on tasks Braden, J. (1992). Intellectual assessment of deaf and hard of hearing
of varying complexity. Journal of Applied Psychology, 52, 148–153. people: A quantitative and qualitative research synthesis. School
Psychology Review, 21, 82–94.
Boggs, K. (1999). Campbell Interest and Skill Survey: Review and
critique. Measurement and Evaluation in Counseling and Development, Braden, J., & Hannah, J. (1998). Assessment of hearing impaired and
32, 168–182. deaf children with the WISC-III. In D. Saklofske & A. Prifitera (Eds.),
Use of the WISC-III in clinical practice. New York: Houghton Mifflin.
Bond, L. (1996). Norm- and criterion-referenced testing. Practical
Assessment, Research and Evaluation, [On-line journal], 5. Available: Bradley, K., Boyd-Wickizer, J., Powell, S., & Burman, M. (1998).
ericae.net. Review: Some alcohol screening tests have acceptable test
properties for use in general clinical populations of U.S. women.
Bonner, C. M. (1988). Utilization of spiritual resources by patients
Journal of the American Medical Association, 280, 166–171.
experiencing a recent cancer diagnosis. Unpublished master's thesis,
University of Pittsburgh. Bradley, R., Corwyn, R., Pipes McAdoo, H., & Garcia Coll, C. (2001).
The home environments of children in the United States Part I:
Bonner, M. F., Ash, S., & Grossman, M. (2010). The new classification
Variations by age, ethnicity, and poverty status. Child Development,
of primary progressive aphasia into semantic, logopenic, or
72, 1844–1867.
nonfluent/agrammatic variants. Current Neurology and Neuroscience
Reports, 10, 484–490. Bradley, R. H., & Caldwell, B. M. (1984). 174 children: A study
of the relationship between home environment and cognitive
Boring, E. G. (1923, June). Intelligence as the tests test it. New
development during the first 5 years. In A. W. Gottfried (Ed.), Home
Republic, 35–37.
environment and early cognitive development: Longitudinal research.
Boring, E. G. (1950). A history of experimental psychology (2nd ed.). Orlando, FL: Academic Press.
New York: Appleton-Century-Crofts.
Bradley, R. H., & Rock, S. L. (1985). The HOME Inventory: Its relation
Borkowski, J. (1985). Signs of intelligence: Strategy generalization to school failure and development of an elementary-age version.
and metacognition. In S. R. Yussen (Ed.), The growth of reflection in In W. K. Frankenburg, R. N. Emde, & J. W. Sullivan (Eds.), Early
children. Orlando: Academic Press. identification of children at risk. New York: Plenum.
Borman, W., Ilgen, D., Klimoski, R., & Weiner, I. (2003). Handbook of Bradley, R. H., Mundfrom, D., Whiteside, L., Case, P., & Barrett,
psychology, industrial and organizational psychology. San Francisco: K. (1994). A factor analytic study of the Infant-Toddler and Early
Jossey-Bass. Childhood versions of the HOME Inventory administered to white,
Bornstein, M. H. (1994). Infancy. In R. J. Sternberg (Ed.), Encyclopedia Black, and Hispanic American parents of children born preterm.
of human intelligence. New York: Macmillan. Child Development, 65, 880–888.

Bornstein, R. F., & Masling, J. M. (2005). Scoring the Rorschach: Seven Bradley, R. H., Rock, S. L., Caldwell, B. M., & Brisby, J. A. (1989). Use
validated systems. Mahwah, NJ: Erlbaum. of the HOME Inventory for families with handicapped children.
American Journal on Mental Retardation, 94, 313–330.
Boter, R., & Hoekstra-Vrolijk, S. (1994). ITVIC, an intelligence test
for visually impaired children. In A. Kooijman & P. Looijestijn Bradley-Johnson, S. (2001). Cognitive assessment for the youngest
(Eds.), Low vision: Research and new developments in rehabilitation children: A critical review of tests. Journal of Psychoeducational
(pp. 135–138). Amsterdam: IOS Press. Assessment, 19, 19–44.

Bouchard, T. J., Jr. (1994). Twin studies. In R. J. Sternberg (Ed.), Bradshaw, J. L., & Mattingley, J. B. (1995). Clinical neuropsychology:
Encyclopedia of human intelligence. New York: Macmillan. Behavioral and brain science. San Diego, CA: Academic Press.

Bouchard, T. J., Jr., Lykken, D., McGue, M., Segal, N., & Tellegen, A. Bradway, K. P. (1944). IQ constancy on the Revised Stanford-Binet
(1990). Sources of human psychological differences: The Minnesota from the preschool to the junior high school level. Journal of Genetic
Study of Twins Reared Apart. Science, 250, 223–228. Psychology, 65, 197–217.
Braithwaite, V., & Law, H. (1985). Structure of human values: Testing
Bowden, E., & Jung-Beeman, M. (2003). Normative data for 144
the adequacy of the Rokeach Value Survey. Journal of Personality and
compound remote associate problems. Behavior Research Methods,
Social Psychology, 49, 250–263.
Instruments & Computers, 35, 634–639.
Brannick, M. T., Michaels, C. E., & Baker, D. P. (1989). Construct
Bowers, T., & Pantle, M. (1998). Shipley Institute for Living Scale and
validity of in-basket scores. Journal of Applied Psychology, 74, 957–963.
the Kaufman Brief Intelligence Test as screening instruments for
intelligence. Assessment, 5, 187–195. Brannigan, G. G., & Decker, S. L. (2003). Bender Visual-Motor Gestalt
Test (2nd ed.). Itasca, IL: Riverside Publishing.
Bowling, A. (1997). Measuring health: A review of quality of life
measurement scales (2nd ed.). Buckingham, UK: Open University Brass, D. J., & Oldham, G. R. (1976). Validating an in-basket test
Press. using an alternative set of leadership scoring dimensions. Journal of
Applied Psychology, 61, 652–657.
Bowling, A. (2001). Measuring disease: A review of disease-specific
quality of life measurement scales (2nd ed.). Buckingham, UK: Open Brauer, B., Braden, J., Pollard, R., & Hardy-Braz, S. (1998). Deaf and
University Press. hard of hearing people. In J. Sandoval, C. Frisby, K. Geisinger, J.
Scheuneman, & J. Grenier (Eds.), Test interpretation and diversity.
Bowman, M. (1989). Testing individual differences in ancient China.
Washington, DC: American Psychological Association.
American Psychologist, 44, 576–578.
Brazelton, T. B., & Nugent, J. (1995). Neonatal Behavioral Assessment
Boyd, T. M., & Sauter, S. (1993). Route-finding: A measure of
Scale (3rd ed.). London: Cambridge University Press.
everyday executive functioning in the head-injured adult. Applied
Cognitive Psychology, 7, 171–181. Breaugh, J. A. (2009). The use of biodata for employee selection: Past
research and future directions. Human Resource Management Review,
Bracken, B. A., & Fagan, T. K. (1990). Guest editors' introduction
19, 219–231.
to the conference “Intelligence: Theories and Practice.” Journal of
Psychoeducational Assessment, 8, 221–222. Bremner, J. D. (2005). Brain imaging handbook. New York: Norton.
Brackett, M., & Mayer, J. (2003). Convergent, discriminant, and Breslau, N. (1994). A gradient relationship between low birth weight
incremental validity of competing measures of emotional and IQ at age 6 years. Archives of Pediatric and Adolescent Medicine,
intelligence. Personality and Social Psychology Bulletin, 29, 1147–1158. 148, 377–383.
Breslau, N., Chilcoat, H., Susser, E., & others. (2001). Stability and Buss, A. (1997). Evolutionary perspectives on personality traits. In
change in children's Intelligence Quotient scores: A comparison of R. Hogan, J. Johnson, & S. Briggs (Eds.), Handbook of personality
two socioeconomically disparate communities. American Journal of psychology. San Diego, CA: Academic Press.
Epidemiology, 154, 711–717.
Buss, D. M. (2009). How can evolutionary psychology successfully
Breuer, J., & Freud, S. (1893–1895). Studies on hysteria. In J. Strachey explain personality and individual differences? Perspectives on
(Ed., in collaboration with A. Freud). The standard edition of the Psychological Science, 4, 359–366.
complete psychological works of Sigmund Freud (vol. 2). London:
Butcher, J. N. (1985). Introduction to the special series. Journal of
Hogarth, 1955.
Consulting and Clinical Psychology, 53, 746–747.
Brief, D. E., & Comrey, A. L. (1993). A profile of personality for a Butcher, J. N. (1993). The Minnesota Report user's guide. Minneapolis,
Russian sample: As indicated by the Comrey Personality Scales. MN: National Computer System.
Journal of Personality Assessment, 60, 267–284.
Butcher, J. N. (2005). MMPI-2: A practitioner's guide. Washington, DC:
Britt, G., & Myers, B. (1994). The effects of Brazelton intervention: American Psychological Association.
A review. Infant Mental Health Journal, 15, 278–292.
Butcher, J. N. (2011). A beginner's guide to the MMPI-2 (3rd ed.).
Brodal, A. (1981). Neurological anatomy (3rd ed.). New York: Oxford Washington, DC: American Psychological Association.
University Press.
Butcher, J. N. (Ed.). (1987). Computerized psychological assessment:
Brody, E. B., & Brody, N. (1976). Intelligence: Nature, determinants and A practitioner's guide. New York: Basic Books.
consequences. New York: Academic Press.
Butcher, J. N. (Ed.). (2000). Basic sources on the MMPI-2. Minneapolis,
Bromberg, W. (1959). The mind of man: A history of psychotherapy and MN: University of Minnesota Press.
psychoanalysis. New York: Harper & Row.
Butcher, J. N., & Williams, C. L. (1992). Essentials of MMPI-2 and
Brooks, B., Holdnack, J. A., & Iverson, G. L. (2011). Advanced MMPI-A interpretation. Minneapolis: University of Minnesota Press.
clinical interpretation of the WAIS-IV and WMS-IV: Prevalence of
low scores varies by level of intelligence and years of education. Butcher, J. N., & Williams, C. L. (2000). Essentials of MMPI-2 and
Assessment, 18, 156–167. MMPI-A interpretation. Minneapolis: University of Minnesota Press.

Brooks, B., Iverson, G., Holdnack, J., & Feldman, H. (2008). Potential Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer,
for misclassification of mild cognitive impairment: A study of B. (1989). Minnesota Multiphasic Personality Inventory-2: Manual for
memory scores on the Wechsler Memory Scale-III in healthy older administration and scoring. Minneapolis: University of Minnesota Press.
adults. Journal of the International Neuropsychological Society, 14, Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y.
463–478. S. (1990). Development and use of the MMPI-2 content scales.
Brooks-Gunn, J., Klebanov, P., & Duncan, G. (1996). Ethnic differences Minneapolis: University of Minnesota Press.
in children's intelligence test scores: Role of economic deprivation, Butcher, J., Perry, J., & Atlis, M. (2000). Validity and utility of computer-
home environment, and maternal characteristics. Child Development, based test interpretation. Psychological Assessment, 12, 6–18.
67, 396–408.
Buxbaum, L. J., Dawson, A. M., & Linsley, D. (2012). Reliability
Brown, I. T., Chen, T., Gehlert, N. C., & Piedmont, R. L. (2012, October and Validity of the Virtual Reality Lateralized Attention Test in
8). Age and gender effects on the Assessment of Spirituality and Assessing Hemispatial Neglect in Right-Hemisphere Stroke.
Religious Sentiments (ASPIRES) Scale: A cross-sectional analysis. Neuropsychology, 26, 430–441.
Psychology of Religion and Spirituality [online publication].
Caldwell, B. M., & Bradley, R. H. (1984). Home observation for
Bruininks, R. H., Woodcock, R. W., Weatherman, R. F., & Hill, B. K. measurement of the environment. Little Rock: University of Arkansas
(1996). Scales of Independent Behavior-Revised, Interviewer's Manual. at Little Rock.
Allen, TX: DLM Teaching Resources.
Caldwell, B. M., & Bradley, R. H. (1994). Environmental issues in
Bruyere, S. M., & O'Keeffe, J. (Eds.). (1994). Implications of the developmental follow-up research. In S. L. Friedman & H. C.
Americans with Disabilities Act for psychology. New York: Springer. Haywood (Eds.), Developmental follow-up: Concepts, domains, and
Buck, J. (1948). The H-T-P technique, a qualitative and quantitative methods. San Diego, CA: Academic Press.
scoring method. Journal of Clinical Psychology Monograph Supplement, Caldwell, B. M., & Richmond, J. (1967). Social class level and the
5, 1–120. stimulation potential of the home. In J. Hellmuth (Ed.), The
Buck, J. (1981). The House-Tree-Person technique: A revised manual. Los exceptional infant (vol. 1). Seattle, WA: Special Child Publications.
Angeles: Western Psychological Services. Callahan, L. A., McGreevy, M., Cirincione, C., & Steadman, H. (1992).
Bufford, R., & Parker, T., Jr. (1985). Religion and well-being: Concurrent Measuring the effects of the Guilty But Mentally Ill (GBMI) verdict.
validation of the Spiritual Well-Being Scale. Paper presented at the Law and Human Behavior, 16, 447–462.
annual meeting of the American Psychological Association, Los Campbell, C. D. (1988). Coping with hemodialysis: Cognitive
Angeles. appraisals, coping behaviors, spiritual well-being, assertiveness,
Bufford, R., Paloutzian, R., & Ellison, C. (1991). Norms for the and family adaptability and cohesion as correlates of adjustment
Spiritual Well-Being Scale. Journal of Psychology and Theology, 19, (Doctoral dissertation, Western Conservative Baptist Seminary,
56–70. 1983). Dissertation Abstracts International, 49, 538B.

Bullock, E., & Reardon, R. (2008). Interest profile elevation, Big Five Campbell, D. (2002). The history and development of the Campbell
personality traits, and secondary constructs on the Self-Directed Interest and Skill Survey. Journal of Career Assessment, 10, 150–168.
Search: A replication and extension. Journal of Career Assessment, 16, Campbell, D. P. (1971). Handbook for the Strong Vocational Interest
326–338. Blank. Stanford, CA: Stanford University Press.
Burke, H. R. (1958). Raven's Progressive Matrices: A review and Campbell, D. P. (1974). Manual for the Strong-Campbell Vocational
critical evaluation. Journal of Genetic Psychology, 93, 199–228. Interest Blank. Stanford, CA: Stanford University Press.
Buschke, H., & Fuld, P. A. (1974). Evaluating storage, retention, Campbell, D. P., Hyne, S., & Nilsen, D. (1992). Manual for the Campbell
and retrieval in disordered memory and learning. Neurology, 24, Interest and Skill Survey. Minneapolis, MN: National Computer
1019–1025. Systems.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant Cattell, H. E. P., & Mead, A. D. (2008). The Sixteen Personality
validation by the multitrait-multimethod matrix. Psychological Factor Questionnaire (16PF). In G. J. Boyle, G. Matthews, & D.
Bulletin, 56, 81–105. H. Saklofske (Eds.), The SAGE handbook of personality theory and
Campbell, J. P., Gasser, M., & Oswald, F. (1996). The substantive nature assessment (vol. 2, pp. 135–159). Thousand Oaks, CA: SAGE
of job performance variability. In K. R. Murphy (Ed.), Individual Publishers.
differences and behavior in organizations. San Francisco: Jossey-Bass. Cattell, J. McK. (1890). Mental tests and measurements. Mind, 15,
Campbell, J., & McCord, D. (1996). The WAIS-R Comprehension and 373–380.
Picture Arrangement Subtests as measures of social intelligence: Cattell, R. (1950). Personality: A systematic theoretical and factual study.
Testing traditional interpretations. Journal of Psychoeducational New York: McGraw-Hill.
Assessment, 14, 240–249. Cattell, R. B. (1941). Some theoretical issues in adult intelligence
Campbell, J., Bell, S., & Keith, L. (2001). Concurrent validity of the testing. Psychological Bulletin, 38, 592 (abstract).
Peabody Picture Vocabulary Test-Third Edition as an intelligence Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston:
and achievement screener for low SES African American children. Houghton Mifflin.
Assessment, 8, 85–94.
Cattell, R. B. (1973). Personality pinned down. Psychology Today, 7,
Campion, J. E. (1972). Work sampling for personnel selection. Journal 40–46.
of Applied Psychology, 56, 40–44.
Cautela, J. R. (1977). Behavioral analysis forms for clinical intervention.
Campion, M. A., Pursell, E. D., & Brown, B. K. (1988). Structured Champaign, IL: Research Press.
interviewing: Raising the psychometric properties of the
employment interview. Personnel Psychology, 41, 25–42. Ceci, S. (1996). On intelligence: A bio-ecological treatise on intellectual
development. (Expanded ed.). Cambridge, MA: Harvard University
Campione, J., & Brown, A. (1978). Toward a theory of intelligence: Press.
Contributions from research with retarded children. Intelligence, 2,
279–304. Ceci, S. J. (1994). Bioecological theory of intellectual development. In R. J.
Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Canfield, A. A. (1951). The “sten” scale—A modified C-scale.
Educational and Psychological Measurement, 11, 295–297. Centers for Disease Control and Prevention. (2012). Alcohol use and
binge drinking among women of childbearing age—United States,
Cannell, J. J. (1988). Nationally normed elementary achievement 2006–2010. Morbidity and Mortality Weekly Report, 61, 534–538.
testing in America's public schools: How all 50 states are above the
national average. Educational Measurement: Issues and Practice, 7, 5–9. Chaffee, J. W. (1985). The thorny gates of learning in Sung China: A social
history of examinations. Cambridge: Cambridge University Press.
Capraro, R., & Capraro, M. (2002). Myers-Briggs Type Indicator score
reliability across studies: A meta-analytic reliability generalization Chalmers, T. (1833). On the power, wisdom, and goodness of God
study. Educational and Psychological Measurement, 62, 590–602. as manifested in the adaptation of external nature to the moral and
intellectual constitution of man. London: William Pickering.
Carless, S. (2000). The validity of scores on the Multidimensional
Aptitude Battery. Educational and Psychological Measurement, 60, Chamberlin, J. (2009). How do you spot raw legal talent? Take this
592–603. test. Monitor on Psychology, 40(6), 12.

Carlson, C. F., Kula, M., & St. Laurent, C. (1997). Rorschach revised Chan, R. (2000). Attentional deficits in patients with closed head
DEPI and CDI with inpatient major depressives and borderline injury: A further study to the discriminative validity of the Test of
personality disorder with major depression: Validity issues. Journal Everyday Attention. Brain Injury, 14, 227–236.
of Clinical Psychology, 53, 51–58. Chan, R., & Lai, M. (2006). Latent structure of the Test of Everyday
Carpenter, M. B. (1991). Core text of neuroanatomy (4th ed.). Baltimore: Attention: Convergent evidence from patients with traumatic brain
Williams & Wilkins. injury. Brain Injury, 20, 653–659.

Carroll, D. (1988). How accurate is polygraph lie detection? In A. Chan, R., Lai, M., & Robertson, I. (2006). Latent structure of the Test
Gale (Ed.), The polygraph test: Lies, truth and science. London: Sage. of Everyday Attention in a non-clinical Chinese sample. Archives of
Clinical Neuropsychology, 21, 477–485.
Carroll, J. B. (1993). Human cognitive abilities. New York: Cambridge
University Press. Chapell, M. S., Blanding, Z. B., Silverstein, M. E., & others. (2005).
Test Anxiety and Academic Performance in Undergraduate and
Carson, S., Peterson, J. B., & Higgins, D. M. (2005). Reliability, validity Graduate Students. Journal of Educational Psychology, 97, 268–274.
and factor structure of the Creative Achievement Questionnaire.
Creativity Research Journal, 17, 37–50. Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but
erroneous psychodiagnostic observations. Journal of Abnormal
Carter, C., Mintun, M., Nichols, T., & Cohen, J. (1997). Anterior Psychology, 74, 271–280.
cingulate gyrus dysfunction and selective attention deficits in
schizophrenia. American Journal of Psychiatry, 154, 1670–1675. Chase, C. I. (1985). Review of the Torrance Tests of Creative Thinking.
Ninth mental measurements yearbook. Lincoln, NB: University of
Carver, C., & Scheier, M. (2002). Optimism. In C. R. Snyder & S. Nebraska Press.
Lopez (Eds.), The handbook of positive psychology (pp. 434–445). New
York: Oxford University Press. Cherpitel, C. (2002). Screening for alcohol problems in the U.S.
general population: Comparison of the CAGE, RAPS4, and
Carver, C., & Scheier, M. (2003). Optimism. In S. Lopez & C. R. RAPS4-QF by gender, ethnicity, and service utilization. Alcoholism:
Snyder (Eds.), Positive psychological assessment: A handbook of models Clinical and Experimental Research, 26, 1686–1691.
and measures. Washington, DC: American Psychological Association.
Chiaravalloti, N. D., & DeLuca, J. (2003). Assessing the behavioral
Cascio, W. F. (1976). Turnover, biographical data, and fair consequences of multiple sclerosis: An application of the Frontal
employment practice. Journal of Applied Psychology, 61, 576–580. Systems Behavior Scale (FrSBe). Cognitive and Behavioral Neurology,
Cascio, W. F. (1987). Applied psychology in personnel management 16, 54–67.
(3rd ed.). Englewood Cliffs, NJ: Prentice Hall. Chibnall, J., & Detrick, P. (2003). The NEO-PI-R, Inwald Personality
Cathers-Schiffman, T., & Thompson, M. (2007). Assessment of Inventory, and MMPI-2 in the prediction of police academy
English- and Spanish-speaking students with the WISC-III and performance: A case for incremental validity. American Journal of
Leiter-R. Journal of Psychoeducational Assessment, 25, 41–52. Criminal Justice, 27, 233–248.
Chin, C., Ledesma, H., Cirino, P., & others. (2001). Relation between Committee on Ethical Guidelines for Forensic Psychologists. (1991).
Kaufman Brief Intelligence Test and WISC-III scores of children Specialty guidelines for forensic psychologists. Law and Human
with RD. Journal of Learning Disabilities, 34, 2–8. Behavior, 15, 655–665.
Choi, H., & Proctor, T. (1994). Error-prone subtests and error types in Community Research Partners. (2007). School readiness assessment:
the administration of the Stanford-Binet Intelligence Scale: Fourth A review of the literature. Columbus, OH: Author.
Edition. Journal of Psychoeducational Assessment, 12, 165–171. Comrey, A. (1995). Career assessment and the Comrey Personality
Chung, J. (2009). Clinical validity of Fuld Object Memory Evaluation Scales. Journal of Career Assessment, 3, 140–156.
to screen for dementia in Chinese society. International Journal of Comrey, A. L. (1970). Manual for the Comrey Personality Scales. San
Geriatric Psychiatry, 24, 156–162. Diego, CA: EdITS.
Chung, J., & Ho, W. (2009). Validity of Fuld Object Memory Evaluation for the detection of dementia in nursing home residents. Aging and Mental Health, 13, 274–279.
Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Mahwah, NJ: Erlbaum.
Clark, D. A. (1988). The validity of measures of cognition: A review of the literature. Cognitive Therapy and Research, 12, 1–20.
Clarkin, J. F., Hull, J., Cantor, J., & Sanderson, C. (1993). Borderline personality disorder and personality traits: A comparison of SCID-II BPD and NEO-PI. Psychological Assessment, 5, 472–476.
Clarren, S., Randels, S., Sanderson, M., & Fineman, R. (2001). Screening for fetal alcohol syndrome in primary schools: A feasibility study. Teratology, 63, 3–10.
Cleary, T. A., Humphreys, L. G., Kendrick, S. A., & Wesman, A. (1975). Educational uses of tests with disadvantaged students. American Psychologist, 30, 15–41.
Cleckley, H. (1941). The mask of sanity. St. Louis, MO: C. V. Mosby.
Cleckley, H. (1976). The mask of sanity (5th ed.). St. Louis, MO: Mosby.
Clemans, W. V. (1971). Test administration. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130–135.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, M. (1997). Children's Memory Scale. San Antonio, TX: Psychological Corporation.
Cohen, S., & Janicki-Deverts, D. (2009). Can we improve our physical health by altering our social networks? Perspectives on Psychological Science, 4, 375–378.
Colby, A., & Kohlberg, L. (1987). The measurement of moral judgment (vol. I). Cambridge: Cambridge University Press.
Colby, A., Kohlberg, L., Gibbs, J. C., & others. (1978). Measuring moral judgment: Standardized scoring manual. Cambridge, MA: Harvard University, Moral Education Research Foundation.
Colby, A., Kohlberg, L., Gibbs, J., & Lieberman, M. (1983). A longitudinal study of moral judgment. Monographs for the Society for Research in Child Development, 48, 1, 2.
Cole, N. S., & Moss, P. A. (1989). Bias in test use. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: ACE/Macmillan.
College Board. (2005). Retrieved from www.collegeboard.com/student/testing/sat/ on September 21, 2005.
Collins, J. M., & Schmidt, F. L. (1993). Personality, integrity, and white collar crime: A construct validity study. Personnel Psychology, 46, 295–311.
Colom, R., Quiroga, M., & Juan-Espinosa, M. (1999). Are cognitive sex differences disappearing? Evidence from Spanish populations. Personality and Individual Differences, 27, 1189–1195.
Comrey, A. L. (1973). A first course in factor analysis. New York: Academic Press.
Comrey, A. L. (1980). Handbook of interpretations for the Comrey Personality Scales. San Diego, CA: EdITS.
Comrey, A. L. (2008). The Comrey Personality Scales. In G. J. Boyle, G. Matthews, & D. H. Saklofske (Eds.), The SAGE handbook of personality theory and assessment, vol 2: Personality measurement and testing (pp. 113–134). Thousand Oaks, CA: Sage Publications.
Comrey, A. L., & Backer, T. (1970). Construct validation of the Comrey Personality Scales. Multivariate Behavior Research, 5, 469–477.
Comrey, A. L., & Schiebel, D. (1983). Personality test correlates of psychiatric outpatient status. Journal of Consulting and Clinical Psychology, 51, 756–762.
Comrey, A. L., & Schiebel, D. (1985). Personality test correlates of psychiatric case history data. Journal of Consulting and Clinical Psychology, 53, 470–479.
Conn, H. O. (2011). Normal pressure hydrocephalus (NPH): More about NPH by a physician who is the patient. Clinical Medicine, 11(2), 162–165.
Conners, C. K. (1990). Conners' Rating Scales. Los Angeles: Western Psychological Services.
Conners, C. K. (1991). Conners' Teacher Rating Scales-39. North Tonawanda, NY: Multi-Health Systems, Inc.
Conners, C. K. (1995). Conners' Continuous Performance Test II (CPT II). North Tonawanda, NY: Multi-Health Systems, Inc.
Conners, C. K. (1997). Conners' Rating Scales-Revised. North Tonawanda, NY: Multi-Health Systems.
Conoley, C. W. (1992). Review of Beck Depression Inventory. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Conoley, C. W., Plake, B., & Kemmerer, B. (1991). Issues in computer-based test interpretive systems. Computers in Human Behavior, 7, 97–101.
Constantino, G., & Malgady, R. (1996). Development of TEMAS, a multicultural thematic apperception test: Psychometric properties and clinical utility. In G. R. Sodowsky & J. C. Impara (Eds.), Multicultural assessment in counseling and clinical psychology. Lincoln, NE: The Buros Institute of Mental Measurements.
Constantino, G., & Malgady, R. (2000). Multicultural and cross-cultural utility of the TEMAS (Tell-Me-A-Story) Test. In R. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment. Mahwah, NJ: Erlbaum.
Constantino, G., Malgady, R., & Rogler, L. (1988). Tell-Me-A-Story (TEMAS): Manual. Los Angeles: Western Psychological Services.
Conte, J. (2005). A review and critique of emotional intelligence measures. Journal of Organizational Behavior, 26, 433–440.
Conway, J. M., Jako, R., & Goodman, D. (1995). A meta-analysis of interrater and internal consistency reliability of selection interviews. Journal of Applied Psychology, 80, 565–579.
Cooper, D., & Shepard, K. (1992). Review of DIAL-R. Learning Disabilities Research & Practice, 7, 171–174.
Corkin, S. (1968). Acquisition of motor skill after bilateral medial temporal-lobe excision. Neuropsychologia, 6, 255–265.
Cornelius, S. W., & Caspi, A. (1987). Everyday problem solving in adulthood and old age. Psychology and Aging, 2, 144–153.
Cosden, M. (1992). Review of the Draw A Person: A Quantitative Scoring System. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO PI-R) and the NEO Five Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Herbst, J. H., McCrae, R. R., & Siegler, I. C. (2000). Personality at midlife: Stability, intrinsic maturation, and response to life events. Assessment, 7, 365–378.
Costa, P. T., Jr. (1991). Clinical use of the five-factor model. Journal of Personality Assessment, 57, 393–398.
Costa, P. T., Jr., & McCrae, R. (1989). NEO Five-Factor Inventory test manual. Port Huron, MI: Sigma Assessment Systems.
Costa, P. T., Jr., & McCrae, R. (1992). NEO PI-R test manual. Port Huron, MI: Sigma Assessment Systems.
Costa, P. T., Jr., McCrae, R. R., & Holland, J. L. (1984). Personality and vocational interests in an adult sample. Journal of Applied Psychology, 69, 390–400.
Costa, P., McCrae, R., & Martin, T. (2005). The NEO-PI-3: A more readable revised NEO Personality Inventory. Journal of Personality Assessment, 84, 261–270.
Costa, P., McCrae, R., & Martin, T. (2008). Incipient adult personality: The NEO-PI-3 in middle-school-aged children. British Journal of Developmental Psychology, 26, 71–89.
Costenbader, V., & Ngari, S. (2001). A Kenya standardization of the Raven's Coloured Progressive Matrices. School Psychology International, 22, 258–268.
Cote, L., & Crutcher, M. D. (1991). The basal ganglia. In E. R. Kandel, J. H. Schwartz, & T. M. Jessell (Eds.), Principles of neural science (3rd ed.). New York: Elsevier.
Courvoisier, D. S., Eid, M., & Lischetzke, T. (2012). Compliance to a cell phone-based ecological momentary assessment study: The effect of time and personality characteristics. Psychological Assessment, 24, 713–720.
Coveny, T. E. (1972). A new test for the visually handicapped: Preliminary analysis of reliability and validity of the Perkins-Binet. Education of the Handicapped, 4, 97–101.
Cowdery, K. M. (1926–27). Measurement of professional attitudes: Differences between lawyers, physicians, and engineers. Journal of Personnel Research, 5, 131–141.
Craig, R. J. (Ed.). (1993). The Millon Clinical Multiaxial Inventory: A clinical research information synthesis. Hillsdale, NJ: Erlbaum.
Cramond, B., Matthews-Morgan, J., Bandalos, D., & Zuo, L. (2005). A report on the 40-year follow-up of the Torrance Tests of Creative Thinking: Alive and well in the new millennium. Gifted Child Quarterly, 49, 283–291.
Crandall, J. E. (1981). Theory and measurement of social interest: Empirical tests of Alfred Adler's concept. New York: Columbia University Press.
Crawford, J. R., Sommerville, J., & Robertson, I. (1997). Assessing the reliability and abnormality of subtest differences on the Test of Everyday Attention. British Journal of Clinical Psychology, 36, 609–617.
Creed, P., Patton, W., & Bartrum, D. (2002). Multidimensional properties of the LOT-R: Effects of optimism and pessimism on career and well-being related variables in adolescents. Journal of Career Assessment, 10, 42–61.
Cripe, L. (1996). The ecological validity of executive function testing. In R. J. Sbordone & C. J. Long (Eds.), Ecological validity of neuropsychological testing. Delray Beach, FL: GR Press/St. Lucie Press.
Critchley, M. (1953). The parietal lobes. London: Edward Arnold.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Culbertson, J., & Edmonds, A. (1996). Learning Disabilities. In R. Adams, O. Parsons, J. Culbertson, & S. Nixon (Eds.). Neuropsychology for clinical practice: Etiology, assessment, and treatment of common neurological disorders. Washington, DC: American Psychological Association.
Cullen, M., & Sackett, P. (2004). Integrity testing in the workplace. In J. Thomas (Ed.), Comprehensive handbook of psychological assessment, Vol. 4: Industrial and organizational assessment. Hoboken, NJ: John Wiley.
Cummings, N. A. (2007). Treatment and assessment take place in an economic context, always. In S. O. Lilienfeld & W. T. O'Donohue (Eds.), The great ideas of clinical science: 17 principles that every mental health professional should understand (pp. 163–184). New York: Routledge.
Cummings, R., Maddux, C., Harlow, S., & Dyas, L. (2002). Academic misconduct in undergraduate teacher education students and its relationship to their principled moral reasoning. Journal of Instructional Psychology, 29, 286–296.
Cunningham, M., Wong, D., & Barbee, A. (1994). Self-presentation dynamics on overt integrity tests: Experimental studies with the Reid Report. Journal of Applied Psychology, 79, 643–658.
Cureton, E. E. (1950). Validity, reliability, and baloney. Educational and Psychological Measurement, 10, 94–96.
Cutler, B. L., & Kovera, M. B. (2011). Expert psychological testimony. Current Directions in Psychological Science, 20, 53–57.
da Costa Armentano, C. G., Porto, C. S., Brucki, S., & Nitrini, R. (2009). Study on the Behavioural Assessment of the Dysexecutive Syndrome (BADS) performance in healthy individuals, mild cognitive impairment and Alzheimer's disease: A preliminary study. Dementia & Neuropsychologia, 3, 101–107.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1975). An MMPI handbook: Vol. II. Research applications. Minneapolis: University of Minnesota Press.
Daley, T., Whaley, S., Sigman, M., Espinosa, M., & Neumann, C. (2003). IQ on the rise: The Flynn effect in rural Kenyan children. Psychological Science, 14, 215–219.
Dana, R. H. (1959). Proposal for objective scoring of the TAT. Perceptual and Motor Skills, 10, 27–43.
Das, J. P., Naglieri, J., & Kirby, J. (1994). Assessment of cognitive processes: The PASS theory of intelligence. Boston: Allyn and Bacon.
Das, J. P. (1994). Serial and parallel processing. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Das, J. P., & Naglieri, J. A. (1993). Cognitive assessment system: Standardization version. Chicago: Riverside.
Das, J. P., Kirby, J. R., & Jarman, R. F. (1979). Simultaneous and successive cognitive processes. New York: Academic Press.
Das, J. P., Kirby, J. R., & Jarman, R. F. (1979). Simultaneous and successive cognitive processes. Orlando, FL: Academic Press.
Davis, A. S., Johnson, J. A., & D'Amato, R. C. (2005). Evaluating and using long-standing school neuropsychological batteries: The Halstead-Reitan and the Luria-Nebraska neuropsychological batteries. In R. C. D'Amato, E. Fletcher-Jansen, & C. R. Reynolds (Eds.), Handbook of school neuropsychology (pp. 236–263). Hoboken, NJ: Wiley.
Davis, C. (1980). Perkins-Binet Tests of Intelligence for the blind. Watertown, MA: Perkins School for the Blind.
Davis, E., Glynn, L., Schetter, C., & others. (2007). Prenatal exposure to maternal depression and cortisol influences infant temperament. Journal of the American Academy of Child and Adolescent Psychiatry, 46, 737–746.
Davison, M., Gasser, M., & Ding, S. (1996). Identifying major profile patterns in a population: An exploratory study of WAIS and GATB patterns. Psychological Assessment, 1, 26–31.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Dawis, R. V. (1996). The theory of work adjustment and person-environment correspondence counseling. In D. Brown & L. Brooks (Eds.), Career choice and development (3rd ed., pp. 75–120). San Francisco: Jossey-Bass.
Dawis, R. V. (2002). Person-Environment Correspondence theory. In D. Brown & Associates (Eds.), Career choice and development (4th ed., pp. 427–464). San Francisco: Jossey-Bass.
Dawis, R. V., & Lofquist, L. H. (1984). A psychological theory of work adjustment. Minneapolis: University of Minnesota Press.
Dawson, A., Buxbaum, L. J., & Rizzo, A. A. (2008). The Virtual Reality Lateralized Attention Test: Sensitivity and validity of a new clinical tool for assessing hemispatial neglect. Virtual Rehabilitation, 2008, 77–82.
Dayan, K., Fox, S., & Kasten, R. (2008). The preliminary employment interview as a predictor of assessment center outcomes. International Journal of Selection and Assessment, 16, 102–111.
de Bildt, A., Kraijere, D., Sytema, S., & Minderaa, R. (2005). The psychometric properties of the Vineland Adaptive Behavior Scales in children and adolescents with mental retardation. Journal of Autism and Developmental Disorders, 35, 53–62.
de Raad, B., & Perugini, M. (Eds.). Big five assessment. Ashland, OH: Hogrefe and Huber Publishers.
Decker, S. L. (2008). Measuring growth and decline in visual-motor processes with the Bender-Gestalt second edition. Journal of Psychoeducational Assessment, 26, 3–15.
DeCrans, M. (1990). Spiritual well-being in the rural elderly. Unpublished manuscript, Marquette University, Milwaukee, WI.
Delis, D. C., & Kaplan, E. (1982). Assessment of aphasia with the Luria-Nebraska Neuropsychological Battery: A conceptual critique. Journal of Consulting and Clinical Psychology, 50, 32–39.
Delis, D. C., Kramer, J., Kaplan, E., & Ober, B. (2000). California Verbal Learning Test—Second Edition. San Antonio, TX: The Psychological Corporation.
Dellas, M., & Gaier, E. L. (1970). Identification of creativity: The individual. Psychological Bulletin, 73, 55–73.
Deri, S. (1949). Introduction to the Szondi Test. New York: Grune & Stratton.
Dey, A. N., Schiller, J. S., & Tai, D. A. (2004). Summary health statistics for U.S. children: National Health Interview Survey, 2002. Washington, DC: National Center for Health Statistics.
Diamond, S. (1980). Wundt before Leipzig. In R. W. Rieber (Ed.), Wilhelm Wundt and the making of a scientific psychology. New York: Plenum Press.
Dickens, W., & Flynn, J. (2006). Black Americans reduce the racial IQ gap. Psychological Science, 17, 913–920.
Dickens, W., & Flynn, J. R. (2006). Black Americans reduce the racial IQ gap: Evidence from standardization samples. Psychological Science, 17, 913–920.
Diener, E. (2009). Editor's introduction: Special issue on the next big questions in psychology. Perspectives on Psychological Science, 4, 325.
Diessner, R., & Lewis, G. (2007). Further validation of the Gratitude, Resentment, and Appreciation Test. Journal of Social Psychology, 147, 445–447.
Digman, J. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–440.
Dikmen, S., Machamer, J., Winn, H., & Temkin, N. (1995). Neuropsychological outcome at 1-year post head injury. Neuropsychology, 9, 80–90.
DiLalla, L. F., Thompson, L. A., Plomin, R., & others. (1990). Infant predictors of preschool and adult IQ: A study of infant twins and their parents. Developmental Psychology, 26, 759–769.
Dillon, R. F., Pohlmann, J. T., & Lohman, D. F. (1981). A factor analysis of Raven's Advanced Progressive Matrices freed of difficulty factors. Educational and Psychological Measurement, 41, 1295–1302.
Dixon, C. E. (2011, May 16). Traumatic brain injury produced by exposure to blasts, a critical problem in current wars: Biomarkers, clinical studies, and animal modes. Proceedings of the Society of Photo-Optical Instrumentation Engineers, 80290M.
Dodge, K. A. (2009). Mechanisms of gene-environment interaction effects in the development of Conduct disorder. Perspectives on Psychological Science, 4, 408–414.
Dodrill, C. B. (1979). Sex differences on the Halstead-Reitan Neuropsychological Battery and on other neuropsychological measures. Journal of Clinical Psychology, 35, 236–241.
Dodrill, C. B. (1981). An economical method of measuring general intelligence in adults. Journal of Consulting and Clinical Psychology, 49, 668–673.
Dodrill, C. B., & Warner, M. H. (1988). Further studies of the Wonderlic Personnel Test as a brief measure of intelligence. Journal of Consulting and Clinical Psychology, 56, 145–147.
Doll, E. A. (1935). The Vineland Social Maturity Scale. Training School Bulletin, 32, 1–7, 25–32, 48–55, 68–74.
Dolliver, R. H., Irvin, J. A., & Bigley, S. E. (1972). Twelve-year follow-up of the Strong Vocational Interest Blank. Journal of Counseling Psychology, 19, 212–217.
Donders, J. (1995). Validity of the Kaufman Brief Intelligence Test (K-BIT) in children with traumatic brain injury. Assessment, 2, 219–224.
Donders, J., & Levitt, T. (2012). Criterion validity of the neuropsychological assessment battery after traumatic brain injury. Archives of Clinical Neuropsychology, 27, 440–445.
Donders, J., Tulsky, D., & Zhu, J. (2001). Criterion validity of new WAIS-III subtest scores after traumatic brain injury. Journal of the International Neuropsychological Society, 7, 892–898.
Donlon, T. F. (Ed.). (1984). The College Board technical handbook for the Scholastic Aptitude Test and Achievement Tests. New York: College Entrance Examination Board.
Donnay, D., & Borgen, F. (1996). Validity, structure, and content of the 1994 Strong Interest Inventory. Journal of Counseling Psychology, 43, 275–291.
Donnay, D., Thompson, R., Morris, M., & Schaubhut, N. (2004). Technical brief for the newly revised Strong Interest Inventory assessment: Content, reliability and validity. Mountain View, CA: Consulting Psychologists Press.
Drakeley, R. J., Herriot, P., & Jones, A. (1988). Biographical data, training success and turnover. Journal of Occupational Psychology, 61, 145–152.
Drasgow, F., Olson-Buchanan, J., & Moberg, P. (1999). Development of an interactive assessment: Trials and tribulations. In F. Drasgow & J. Olson-Buchanan (Eds.), Innovations in computerized assessment. Mahwah, NJ: Erlbaum.
Drebing, C., Van Gorp, W., Stuck, A., Mitrushina, M., & Beck, J. (1994). Early detection of cognitive decline in higher cognitively functioning older adults: Sensitivity and specificity of a neuropsychological screening battery. Neuropsychology, 8, 31–37.
DuBois, P. E. (1939). A test standardized on Pueblo Indian children. Psychological Bulletin, 36, 523.
DuBois, P. H. (1970). A history of psychological testing. Boston: Allyn and Bacon.
Dumenci, L. (1995). Construct validity of the Self-Directed Search using hierarchically nested structural models. Journal of Vocational Behavior, 47, 21–34.
Dumont, R., Cruse, C., Price, L., & Whelley, P. (1996). The relationship between the Differential Ability Scales (DAS) and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III). Psychology in the Schools, 33, 203–209.
Dunai, F., & Porter, R. (2001). Armed Services Vocational Aptitude Battery predictors of entry-level radiography students' success. Military Medicine, 166, 422–426.
Dunn, L. M., & Dunn, D. M. (2007). Examiner's manual: Peabody Picture Vocabulary Test—Fourth Edition. New York: Pearson.
Dunn, L. M., & Dunn, L. M. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.
Dunn, L. M., & Dunn, L. M. (1998). Examiner's Manual: Peabody Picture Vocabulary Test-III. Circle Pines, MN: American Guidance Service.
Dyce, J. A. (1996). Factor structure of the Beck Hopelessness Scale. Journal of Clinical Psychology, 52, 555–558.
Ebbinghaus, H. (1885/1913). Memory: A contribution to experimental psychology. Translated by Henry A. Ruger & Clara E. Bussenius. New York: Teachers College Press.
Ebbinghaus, H. (1897). Ueber eine neue Methode zur Pruefung geistiger Faehigkeiten und ihre Anwendung bei Schulkindern. Zeitschrift fuer Angewandte Psychologie, 13, 401–459.
Eccles, J. C. (1973). The understanding of the brain. New York: McGraw-Hill.
Educational Testing Service. (1989). Guidelines for proper use of GRE scores. Princeton, NJ: Author.
Eggerth, D. D. (2008). From theory of work adjustment to person-environment correspondence counseling: Vocational psychology as positive psychology. Journal of Career Assessment, 16, 60–74.
Eisenstein, N., & Engelhart, C. (1997). Comparison of the K-BIT with short forms of the WAIS-R in a neuropsychological population. Psychological Assessment, 9, 57–62.
Elder, G. (1974). Children of the great depression: Social change in life experience. Boulder, CO: Westview Press.
Elliott, C. D. (1990). The Differential Ability Scales: Introductory and technical handbook. San Antonio, TX: The Psychological Corporation.
Elliott, C. D. (2007). Differential Ability Scales—Second Edition: Introductory and technical manual. San Antonio, TX: Harcourt Assessment.
Ellis, A. (1962). Reason and emotion in psychotherapy. New York: Lyle Stuart.
Ellison, C. W. (1983). Spiritual well-being: Conceptualization and measurement. Journal of Psychology and Theology, 11, 330–340.
Ellison, C. W., & Smith, J. (1991). Toward an integrative measure of health and well-being. Journal of Psychology and Theology, 19, 35–48.
Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8, 341–349.
Embretson, S. E., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Emmons, R., McCullough, M., & Tsang, J. (2003). The assessment of gratitude. In S. Lopez & C. R. Snyder (Eds.), Positive psychological assessment (pp. 345–360). Washington, DC: American Psychological Association.
Eonta, S. E., Carr, W., McArdle, J. J., & others. (2011). Automated neuropsychological assessment metrics: Repeated assessments with two military samples. Aviation, Space, and Environmental Medicine, 82, 34–39.
Erard, R. E. (2012). Expert testimony using the Rorschach Performance Assessment System in psychological injury cases. Psychological Injury and the Law, 5, 122–134.
Erdberg, P. (1985). The Rorschach. In C. S. Newmark (Ed.), Major psychological assessment instruments. Boston: Allyn and Bacon.
Esquirol, J. E. D. (1845/1838). Mental maladies. (trans. E. K. Hunt). Philadelphia: Lea & Blanchard.
Estes, W. K. (1974). Learning theory and intelligence. American Psychologist, 29, 740–749.
Evans, D. A., Funkenstein, H., Albert, M., & others. (1989). Prevalence of Alzheimer's Disease in a community population of older persons. Journal of the American Medical Association, 262, 2551–2556.
Ewing, J. A. (1984). Detecting alcoholism: The CAGE questionnaire. Journal of the American Medical Association, 252, 1905–1907.
Exner, J. E., Jr. (1991). The Rorschach: A comprehensive system, Volume 2. Current research and advanced interpretation (2nd ed.). New York: Wiley.
Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system, Volume 1. Basic foundations (3rd ed.). New York: Wiley.
Exner, J. E., Jr. (1995). Issues and methods in Rorschach research. Mahwah, NJ: Erlbaum.
Exner, J. E., Jr., & Weiner, I. B. (1994). The Rorschach: A comprehensive system, Volume 3. Assessment of children and adolescents (2nd ed.). New York: Wiley.
Eyde, L. D., & Primhoff, E. S. (1992). Responsible test use. In M. Zeidner and R. Most (Eds.), Psychological testing: An inside view. Palo Alto, CA: Consulting Psychologists Press.
Eyde, L. D., Robertson, G. J., & Krug, S. (2009). Responsible test use: Case studies for assessing human behavior. Washington, DC: American Psychological Association.
Eyde, L. D., Robertson, G. J., Krug, S., & others. (1993). Responsible test use: Case studies for assessing human behavior. Washington, DC: American Psychological Association.
Eysenck, H. J. (1986). Is intelligence? In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition. Norwood, NJ: Ablex.
Eysenck, H. J. (1986). Toward a new model of intelligence. Personality and Individual Differences, 7, 731–736.
Eysenck, H. J., & Eysenck, M. W. (1975). Manual of the Eysenck Personality Questionnaire. San Diego: Educational and Industrial Testing Service.
Eysenck, H. J., & Eysenck, M. W. (1985). Personality and individual differences: A natural science approach. New York: Plenum Press.
Factor, S., & Weiner, W. (2008). Parkinson's disease: Diagnosis and clinical management (2nd ed.). New York: Demos Medical Publishing.
Fagan, J. F. III, & Haiken-Vasen, J. (1997). Selective attention to novelty as a measure of information processing across the lifespan. In J. Burack & J. Enns (Eds.), Attention, development, and psychopathology. New York: Guilford.
Fagan, J. F. III, & McGrath, S. K. (1981). Infant recognition memory and later intelligence. Intelligence, 5, 121–130.
Fagan, J. F. III, & Shepherd, P. A. (1986). The Fagan Test of Infant Intelligence: Training manual. Cleveland, OH: Infantest Corporation.
Fagan, J. F. III, Singer, L., Montie, J., & Shepherd, P. (1986). Selective screening device for the early detection of normal or delayed cognitive development in infants at risk for later mental retardation. Pediatrics, 78, 1021–1026.
Fagan, J. F. III. (1984). Infant memory. In M. Moscovitch (Ed.), Infant memory. New York: Plenum Press.
Fagan, J., & Holland, C. (2006). Racial equality in intelligence: Predictions from a theory of intelligence as processing. Intelligence, 15, 319–334.
Fancher, R. E. (1985). The intelligence men: Makers of the IQ controversy. New York: Norton.
Farrell, M., & Phelps, L. (2000). A comparison of the Leiter-R and the Universal Nonverbal Intelligence Test (UNIT) with children classified as language impaired. Journal of Psychoeducational Assessment, 18, 268–274.
Faul, M., Xu, L., Wald, M. M., & Coronado, V. G. (2010). Traumatic brain injury in the United States: Emergency department visits, hospitalizations, and deaths. Atlanta, GA: Centers for Disease Control and Prevention.
Federal Rules of Evidence for United States Courts and Magistrates. (1975). St. Paul, MN: West Publishing Company.
Fehring, R., Brennan, P., & Keller, M. (1987). Psychological and spiritual well-being in college students. Research in Nursing and Health, 10, 391–398.
Feist, G. (1999). Autonomy and independence. Encyclopedia of creativity (vol. 1, pp. 157–163). San Diego, CA: Academic Press.
Feist, G., & Barron, F. (2003). Predicting creativity from early to late adulthood: Intellect, potential, and personality. Journal of Research in Personality, 37, 62–88.
Feldman, R. D. (1982). Whatever happened to the quiz kids? Chicago: Chicago Review Press.
Feldstein, S., & Miller, W. (2007). Does subtle screening for substance abuse work? A review of the Substance Abuse Subtle Screening Inventory (SASSI). Addiction, 102, 41–50.
Feldt, L. S., & Brennan, R. L. (1989). In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Ferris, G., Judge, T., Rowland, K., & Fitzgibbons, D. (1994). Subordinate influence and the performance evaluation process: Test of a model. Organizational Behavior and Human Decision Processes, 58, 101–135.
Ferris, S. H. (1992). Diagnosis by specialists: Psychological testing. Acta Neurologica Scandinavica, 85, 32–35.
Finholt, T., & Olson, G. (1997). From laboratories to collaboratories: A new organizational form for scientific collaboration. Psychological Science, 8, 28–36.
Finn, S. E. (1996). A manual for using the MMPI-2 as a therapeutic intervention. Minneapolis: University of Minnesota Press.
Finn, S. E., & Tonsager, M. E. (1992). Therapeutic effects of providing MMPI-2 feedback to college students awaiting therapy. Psychological Assessment, 9, 374–385.
Finn, S. E., & Tonsager, M. E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278–287.
Finn, S. E., & Tonsager, M. E. (1997). Information-gathering and therapeutic models of assessment: Complementary paradigms. Psychological Assessment, 9, 374–385.
Fiorello, C. A., & Primerano, D. (2005). Research into practice: Cattell-Horn-Carroll cognitive assessment in practice: Eligibility and program development issues. Psychology in the Schools, 42, 525–536.
First, M., & Gibbon, M. (2004). The structured clinical interview for DSM-IV axis I disorders (SCID-I) and the structured clinical interview for DSM-IV axis II disorders (SCID-II). In M. Hilsenroth & D. Segal (Eds.), Comprehensive handbook of psychological assessment, Vol 2: Personality Assessment (pp. 134–143). Hoboken, NJ: John Wiley.
Fish, J. M. (Ed.). (2002). Race and intelligence: Separating science from myth. Mahwah, NJ: Erlbaum.
Fisher, S., & Greenberg, R. P. (1984). The scientific credibility of Freud's theories and therapy. New York: Columbia University Press.
Fiske, D. W. (1986). The trait concept and the personality questionnaire. In A. Angleitner & J. S. Wiggins (Eds.), Personality assessment via questionnaires: Current issues in theory and measurement. Berlin: Springer-Verlag.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327–358.
Flanagan, J. C. (1956). The evaluation of methods in applied psychology and the problem of criteria. Occupational Psychology, 30, 1–9.
Flanagan, R., & di Guiseppe, R. (1999). Critical review of the TEMAS: A step within the development of thematic apperception instruments. Psychology in the Schools, 36, 21–30.
Flavell, J. (1976). Metacognitive aspects of problem-solving. In L. Resnick (Ed.), The nature of intelligence. Hillsdale, NJ: Erlbaum.
Fletcher, J., & Vaughn, S. (2009). Response to intervention: Preventing and remediating academic difficulties. Child Development Perspectives, 3, 30–37.
Floyd, R. G., Evans, J. J., & McGrew, K. S. (2003). Relations between measures of Cattell-Horn-Carroll (CHC) cognitive abilities and mathematics achievement across the school-age years. Psychology in the Schools, 40, 155–171.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95, 29–51.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
Flynn, J. R. (1994). IQ gains over time. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Flynn, J. R. (2007a). What is intelligence: Beyond the Flynn effect. Cambridge: Cambridge University Press.
Flynn, J. R. (2007b). Solving the IQ puzzle. Scientific American Mind, 18, 25–31.
Flynn, J. R., & Rossi-Casé, L. (2012). IQ gains in Argentina between 1964 and 1998. Intelligence, 40, 145–150.
Folstein, M., Folstein, S., & McHugh, P. (1975). Mini-Mental State: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Fonseca, R., Scherer, L., de Oliveira, C., & others. (2009). Hemisphere specialization for communicative processing: Neuroimaging data on the role of the right hemisphere. Psychology and Neuroscience, 2, 25–33.
Forbey, J., & Ben-Porath, Y. (2002). Use of the MMPI-2 in the treatment of offenders. International Journal of Offender Therapy and Comparative Criminology, 46, 308–318.
Forbey, J., & Ben-Porath, Y. (2007). Computerized adaptive personality testing: A review and illustration with the MMPI-2 computerized adaptive version. Psychological Assessment, 19, 14–24.
Forrest, D. W. (1974). Francis Galton: The life and work of a Victorian genius. New York: Taplinger Publishing.
Forster, A. A. (1994). Learning Disability. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Fowler, R. D. (1985). Landmarks in computer-assisted psychological assessment. Journal of Consulting and Clinical Psychology, 53, 748–759.
Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use of the Wechsler tests of intelligence. New York: Pergamon Press.
Frank, G. (1990). Research on the clinical usefulness of the Rorschach: 1. The diagnosis of schizophrenia. Perceptual and Motor Skills, 71, 573–578.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389–413.
Frank, L. K. (1948). Projective methods. Springfield, IL: Thomas.
Franke, W. (1963). The reform and abolition of the traditional Chinese examination system. Cambridge, MA: Harvard University Press.
Frankenburg, W. K. (1985). The Denver approach to early case finding: A review of the Denver Developmental Screening Test and a brief training program in developmental diagnosis. In W. K. Frankenburg, R. M. Emde, & J. W. Sullivan (Eds.), Identification of children at risk: An international perspective. New York: Plenum Press.
Frankenburg, W. K., & Dodds, J. B. (1967). The Denver developmental screening tests. Journal of Pediatrics, 71, 181–191.
Frankenburg, W. K., Dodds, J., Archer, P., & others. (1990). Denver II: Technical manual. Denver, CO: Denver Developmental Materials.
Frankl, V. (1963). Man's search for meaning: An introduction to logotherapy. New York: Washington Square Press.
Frauenheim, J. G., & Heckerl, J. R. (1983). A longitudinal study of psychological and achievement test performance in severe dyslexic adults. Journal of Learning Disabilities, 16, 339–347.
Frechtling, J. A. (1989). Administrative uses of school testing programs. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Frederickson, L. C. (1985). Goodenough-Harris Drawing Test. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (vol. 2). Kansas City, MO: Test Corporation of America.
Frederiksen, N. (1962). Factors in In-basket Performance. Psychological Monographs, 76, Whole No. 541.
Freud, A. (1946). The ego and the mechanisms of defense. New York: International Universities Press.
Freud, S. (1900). The interpretation of dreams. In J. Strachey (Ed., in collaboration with A. Freud), The standard edition of the complete psychological works of Sigmund Freud. London: Hogarth, 1955, vols. 4 and 5.
Freud, S. (1927/1961). The future of an illusion (J. Strachey, trans.). New York: Basic Books. (Originally published 1927).
Freud, S. (1933). New introductory lectures on psychoanalysis. New York: Norton.
Frey, M. C., & Detterman, D. K. (2004). Scholastic Assessment or g? The relationship between the scholastic assessment test and general cognitive ability. Psychological Science, 15, 373–378.
Fridlund, A. J., Ekman, P., & Oster, H. (1987). Facial expressions of emotion: Review of literature 1970–1983. In A. W. Siegman & S. Feldstein (Eds.), Nonverbal behavior and communication (2nd ed.). Hillsdale, NJ: Erlbaum.
Friedman, A. F. (1987). Eysenck Personality Questionnaire. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques compendium. Kansas City, MO: Test Corporation of America.
Friedman, T. L. (2009). The world is flat 3.0: A brief history of the twenty-first century. New York: Picador.
Fuchs, D., & Fuchs, L. (2005). Responsiveness-to-intervention: A blueprint for practitioners, policymakers, and parents. Teaching Exceptional Children, 38, 57–61.
Fuess, C. M. (1950). The College Board: Its first fifty years. New York: Columbia University Press.
Fuld, P. A. (1977). Fuld Object-Memory Evaluation. Chicago: Stoelting Co.
Fuld, P. A., Masur, D. M., Blau, A. D., Crystal, H., & Aronson, M. K. (1990). Object-Memory Evaluation for prospective detection of dementia in normal functioning elderly: Predictive and normative data. Journal of Clinical and Experimental Neuropsychology, 12, 520–528.
Fuller, G. B., Parmelee, W. M., & Carroll, J. L. (1982). Performance of delinquent and nondelinquent high school boys on the Rotter Incomplete Sentences Blank. Journal of Personality Assessment, 46, 506–510.
Funder, D. C. (2009). Naïve and obvious questions. Perspectives on Psychological Science, 4, 340–344.
Fuqua, D. R., & Newman, J. L. (1994). An evaluation of the Career Beliefs Inventory. Journal of Counseling and Development, 72, 429–430.
Furnham, A., Batey, M., Anand, K., & Manfield, J. (2008). Personality, hypomania, intelligence and creativity. Personality and Individual Differences, 44, 1060–1069.
Furnham, A., Moutari, J., & Crump, J. (2003). The relationship between the revised NEO-Personality Inventory and the Myers-Briggs Type Indicator. Social Behavior and Personality, 31, 577–584.
Furnham, A., Toop, A., Lewis, C., & Fisher, A. (1995). P-E fit and job satisfaction: A failure to support Holland's theory in three British samples. Personality and Individual Differences, 19, 677–690.
Galton, F. (1879). Psychometric experiments. Brain, 2, 149–162.
Galton, F. (1883). Inquiries into human faculty and its development. London: Macmillan.
Galton, F. (1888). Natural inheritance. London: Macmillan.
Garb, H. N. (1994). Judgment research: Implications for clinical practice and testimony in court. Applied and Preventive Psychology, 3, 173–183.
Garb, H. N., Florio, C., & Grove, W. (1998). The validity of the Rorschach and the MMPI: Results from meta-analyses. Psychological Science, 9, 402–404.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Gardner, H. (1986). The waning of intelligence tests. In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition. Norwood, NJ: Ablex.
Gardner, H. (1992). Assessment in context: The alternative to standardized testing. In B. R. Gifford & M. C. O'Connor (Eds.), Alternative views of aptitude, achievement, and instruction. Boston: Kluwer.
Gardner, H. (1993). Multiple intelligences: The theory in practice. New York: Basic Books.
Gardner, H. (1998). Are there additional intelligences? The case for naturalistic, spiritual, and existential intelligences. In J. Kane (Ed.), Education, information, and transformation. Englewood Cliffs, NJ: Prentice Hall.
Gardner, J. (1967). The adjustment of drug addicts as measured by the sentence completion test. Journal of Projective Techniques and Personality Assessment, 31, 28–29.
Gardner, R. A. (1981). Digits forward and digits backward as two separate tests: Normative data on 1567 school children. Journal of Clinical Child Psychology, 10, 131–135.
Gast, J., & Hart, K. J. (2010). The performance of juvenile offenders on the Test of Memory Malingering. Journal of Forensic Psychology Practice, 10, 53–68.
Gaudry, E., Vagg, P., & Spielberger, C. D. (1975). Validation of the state-trait distinction in anxiety research. Multivariate Behavioral Research, 10, 331–341.
Gavett, B. E., Lou, K. R., Daneshvar, D. H., & others. (2012). Diagnostic accuracy statistics for seven Neuropsychological Assessment Battery (NAB) test variables in the diagnosis of Alzheimer's disease. Applied Neuropsychology, 19, 108–115.
Gazzaniga, M. S. (1970). The bisected brain. New York: Appleton-Century-Crofts.
Gazzaniga, M. S., & LeDoux, J. E. (1978). The integrated mind. New York: Plenum Press.
Geary, D. C., & Whitworth, R. H. (1988). Is the factor structure of the WISC-R different for Anglo- and Mexican-American children? Journal of Psychoeducational Assessment, 6, 253–260.
GED Testing Service (1991). Examiner's manual: Test of General Educational Development. Washington, DC: GED Testing Service of the American Council on Education.
Gelb, S. (1986). Henry H. Goddard and the immigrants, 1910–1917: The studies and their social context. Journal of the History of the Behavioral Sciences, 22, 324–332.
George, C., & Solomon, J. (1999). Attachment and caregiving: The caregiving behavioral system. In J. Cassidy & P. Shaver (Eds.), Handbook of attachment: Theory, research and clinical application (pp. 649–670). New York: Guilford Press.
Gerard, A. B. (1993). Manual for Parent–Child Relationship Inventory. Los Angeles: Western Psychological Services.
Geschwind, N. (1972). Language and the brain. Scientific American, 226, 76–83.
Geschwind, N., & Galaburda, A. M. (1987). Cerebral lateralization: Biological mechanisms, associations, and pathology. Cambridge, MA: MIT Press.
Getz, I. R. (1984). Moral judgment and religion: A review of the literature. Counseling and Values, 28, 94–116.
Ghez, C. (1991). The cerebellum. In E. R. Kandel, J. H. Schwartz, & T. M. Jessell (Eds.), Principles of neural science (3rd ed.). New York: Elsevier.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York: Wiley.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco: W. H. Freeman.
Gibbons, R., Weiss, D., Kupfer, D., Frank, E., Fagiolini, A., & others. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59, 361–368.
Gifford, R. (1991). Applied psychology: Variety and opportunity. Boston: Allyn and Bacon.
Gignac, G. (2006). A confirmatory examination of the factor structure of the Multidimensional Aptitude Battery: Contrasting oblique, higher order, and nested factor models. Educational and Psychological Measurement, 66, 136–145.
Gilberstadt, H., & Duker, J. (1965). A handbook for clinical and actuarial MMPI interpretation. Philadelphia: W. B. Saunders.
Glascoe, F. P. (1991). Developmental screening: Rationale, methods and application. Infants and Young Children, 4, 1–10.
Glascoe, F. P. (2005). Commonly used screening tests. Retrieved from www.dbpeds.org/articles on September 2, 2005.
Glascoe, F. P., & Byrne, K. E. (1993). The accuracy of three developmental screening tests. Journal of Early Intervention, 17, 368–379.
Glascoe, F. P., & Shapiro, H. (2005). Introduction to developmental and behavioral screening. Retrieved from www.dbpeds.org/articles on September 2, 2005.
Goddard, H. H. (1910a). A measuring scale for intelligence. The Training School, 6, 146–155.
Goddard, H. H. (1910b). Four hundred feebleminded children classified by the Binet method. Pedagogical Seminary, 17, 387–397.
Goddard, H. H. (1911). Two thousand normal children measured by the Binet measuring scale of intelligence. Pedagogical Seminary, 18, 232–259.
Goddard, H. H. (1912). Feeble-mindedness and immigration. Training School Bulletin, 9, 91.
Goddard, H. H. (1917). The mental level of a group of immigrants. Psychological Bulletin, 14, 68–69.
Goddard, H. H. (1919). Psychology of the normal and subnormal. New York: Dodd, Mead, and Co.
Goddard, H. H. (1928). Feeblemindedness: A question of definition. Journal of Psycho-Asthenics, 33, 219–227.
Goffin, R. D., Rothstein, M., & Johnston, N. (1996). Personality testing and the assessment center: Incremental validity for managerial selection. Journal of Applied Psychology, 81, 746–756.
Goffin, R., Rothstein, M., & Johnston, N. (2000). Predicting job performance using personality constructs: Are personality tests created equal? In R. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy. New York: Kluwer Academic/Plenum Publishers.
Goldberg, L. R. (1965). Diagnosticians vs. diagnostic signs: The diagnosis of psychosis vs. neurosis from the MMPI. Psychological Monographs, 79 (9, Whole No. 602).
Goldberg, L. R. (1981a). Developing a taxonomy of trait-descriptive terms. In D. Fiske (Ed.), New directions for methodology of social and behavioral science: Problems with language imprecision (no. 9). San Francisco: Jossey-Bass.
Goldberg, L. R. (1981b). Language and individual differences: The search for universals in personality lexicons. In L. Wheeler (Ed.), Review of personality and social psychology. Beverly Hills, CA: Sage.
Goldberg, L. R. (1990). An alternative “description of personality”: The big-five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.
Golden, C. (2004). The Adult Luria-Nebraska Neuropsychological Battery. In G. Goldstein, S. Beers, & M. Hersen (Eds.), Intellectual and neuropsychological assessment (pp. 133–146). Hoboken, NJ: Wiley.
Golden, C. J., Purish, A. D., & Hammeke, T. A. (1980). Luria-Nebraska Neuropsychological Battery: Manual. Los Angeles: Western Psychological Services.
Golden, C. J., Purish, A. D., & Hammeke, T. A. (1986). Luria-Nebraska Neuropsychological Battery: Forms I and II. Los Angeles: Western Psychological Services.
Goldfried, M. R., & Zax, M. (1965). The stimulus value of the TAT. Journal of Projective Techniques, 29, 46–57.
Golding, S. (1993). Interdisciplinary Fitness Interview—Revised: A training manual. State of Utah Division of Mental Health.
Goldstein, I. L. (1991). Training in work organizations. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (vol. 2). Palo Alto, CA: Consulting Psychologists Press.
Goldstein, I. L. (1992). Training (3rd ed.). Monterey, CA: Brooks/Cole.
Goldstein, K. (1944). The mental changes due to frontal lobe damage. Journal of Psychology, 17, 187–208.
Goleman, D. (1995). Emotional intelligence: Why it can matter more than IQ. New York: Bantam.
Goodenough, F. L. (1926). Measurement of intelligence by drawings. New York: Harcourt, Brace & World.
Goodenough, F. L. (1949). Mental testing: Its history, principles, and applications. New York: Rinehart.
Goodglass, H., Kaplan, E., & Barresi, B. (2000). Boston Diagnostic Aphasia Examination (3rd ed.). Austin, TX: The Psychological Corporation.
Goodman, J. (1990). Infant intelligence: Do we, can we, should we assess it? In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children: Intelligence and achievement. New York: Guilford.
Gordon, G., & Charanian, T. (1964). Measuring the creativity of research scientists and engineers. Working paper cited in I. A. Taylor and J. W. Getzels (Eds.), Perspectives in creativity. Chicago: Aldine.
Gordon, M., & Keiser, S. (1998). Accommodations in higher education under the Americans with Disabilities Act (ADA). DeWitt, NY: GSI Publications.
Gordon, M., & Mettelman, B. B. (1988). The assessment of attention: I. Standardization and reliability of a behavior-based measure. Journal of Clinical Psychology, 44, 688–690.
Goslin, D. A. (1963). The search for ability: Standardized testing in social perspective. New York: Russell Sage Foundation.
Gothard, S., Viglione, D., Meloy, J. R., & Sherman, M. (1996). Detection of malingering in competency to stand trial evaluations. Law and Human Behavior, 19, 493–505.
Gottfredson, G. D., & Holland, J. L. (1975). Vocational choices of men and women: A comparison of predictors from the Self-Directed Search. Journal of Counseling Psychology, 22, 28–34.
Gottfredson, G. D., & Holland, J. L. (1989). Dictionary of Holland Occupational Codes (2nd ed.). Odessa, FL: Psychological Assessment Resources.
Gottfredson, L. S. (2005). Using Gottfredson's theory of circumscription and compromise in career guidance and counseling. In S. D. Brown & R. W. Lent (Eds.), Career development and counseling: Putting theory and research to work (pp. 71–100). New York: John Wiley & Sons.
Gough, H. (1995). Career assessment and the California Psychological Inventory. Journal of Career Assessment, 3, 101–122.
Gough, H. G. (1984). A managerial potential scale for the California Psychological Inventory. Journal of Applied Psychology, 69, 233–244.
Gough, H. G. (1987). California Psychological Inventory manual. Palo Alto, CA: Consulting Psychologists Press.
Gough, H. G., & Bradley, P. (1992a). Comparing two strategies for developing personality scales. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view. Palo Alto, CA: Consulting Psychologists Press.
Gough, H. G., & Bradley, P. (1992b). Delinquent and criminal behavior as assessed by the Revised California Psychological Inventory. Journal of Clinical Psychology, 48, 298–307.
Gough, H. G., & Bradley, P. (1996). CPI manual (3rd ed.). Mountain View, CA: Consulting Psychologists Press.
Gould, S. J. (1981). The mismeasure of man. New York: Norton.
Gow, A. J., Johnson, W., Pattie, A., & others. (2011). Stability and change in intelligence from age 11 to ages 70, 79, and 87: The Lothian Birth Cohorts of 1921 and 1936. Psychology and Aging, 26, 232–240.
Grace, J., & Malloy, P. F. (2001). Frontal Systems Behavior Scale professional manual. Lutz, FL: Psychological Assessment Resources.
Graham, J. (1961). Lavater's physiognomy in England. Journal of the History of Ideas, 22, 561–572.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford University Press.
Graham, J. R. (1993). MMPI-2: Assessing personality and psychopathology. New York: Oxford.
Graham, J. R. (2000). MMPI-2: Assessing personality and psychopathology (3rd ed.). New York: Oxford University Press.
Granstrom, S. L. (1987). A comparative study of loneliness, Buberian religiosity and spiritual well-being in cancer patients. Paper presented at the conference of the National Hospice Organization.
Gray, B. (2001). A factor analytic study of the Substance Abuse Subtle Screening Inventory (SASSI). Educational and Psychological Measurement, 61, 102–118.
Green, D., & Rosenfeld, B. (2011). Evaluating the gold standard: A review and meta-analysis of the Structured Interview of Reported Symptoms. Psychological Assessment, 23, 95–107.
Greenough, W. T., Black, J. E., & Wallace, C. S. (1987). Experience and brain development. Child Development, 58, 539–559.
Gregory, R. J. (1987). Adult intellectual assessment. Boston: Allyn and Bacon.
Gregory, R. J. (1994a). Aptitude tests. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Gregory, R. J. (1994b). Profile interpretation. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Gregory, R. J. (1994c). Classification of intelligence. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Gregory, R. J. (1998). Testing in clinical psychology. In S. Cullari (Ed.), Foundations of clinical psychology. Boston: Allyn and Bacon.
Gregory, R. J. (1999). Foundations of intellectual assessment: The WAIS-III and other tests in clinical practice. Boston: Allyn and Bacon.
Gregory, R. J. (2009). Testing bias. In I. Weiner & E. Craighead (Eds.), Corsini's encyclopedia of psychology. New York: Wiley.
Gregory, R. J., & Gernert, C. H. (1990). Age trends for fluid and crystallized intelligence in an able subpopulation. Unpublished manuscript.
Gresham, F. M. (1993). “What's wrong in this picture?” Response to Motta et al.'s review of human figure drawings. School Psychology Quarterly, 8, 182–186.
Greve, K., Love, J., Sherwin, E., & others. (2002). Temporal stability of the Wisconsin Card Sorting Test in a chronic traumatic brain injury sample. Assessment, 9, 271–277.
Grös, D. F., Antony, M. M., Simms, L. J., & McCabe, R. E. (2007). Psychometric properties of the State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA): Comparison to the State-Trait Anxiety Inventory (STAI). Psychological Assessment, 19, 369–381.
Grossman, S. A., Richards, C., Anglin, D., & Hutson, H. (2000). Caring for the patient with mental retardation in the ED. Annals of Emergency Medicine, 35, 69–76.
Groth-Marnat, G. (1997). Handbook of psychological assessment (2nd ed.). New York: Wiley.
Groth-Marnat, G. (2003). Handbook of psychological assessment (4th ed.). New York: Wiley.
Grove, W., & Barden, C. (1999). Protecting the integrity of the legal system: The admissibility of testimony from mental health experts under Daubert/Kumho analyses. Psychology, Public Policy, and Law, 5, 224–242.
Grove, W., Barden, C., Garb, H., & Lilienfeld, S. (2002). Failure of Rorschach-Comprehensive-System-Based testimony to be admissible under the Daubert-Joiner-Kumho standard. Psychology, Public Policy, and Law, 8, 216–234.
Grove, W., Zald, D., Lebow, B., Snitz, B., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
Guaiana, G., Tyson, P., & Mortimer, A. (2004). The Rivermead Behavioural Memory Test can predict social functioning among schizophrenic patients treated with clozapine. International Journal of Psychiatry in Clinical Practice, 8, 245–249.
Gudjonsson, G. H. (1995). The Standard Progressive Matrices: Methodological problems associated with the administration of the 1992 adult standardisation sample. Personality and Individual Differences, 18, 441–442.
Guilford, J. P. (1954). Psychometric methods. New York: McGraw-Hill.
Guilford, J. P. (1959). Personality. New York: McGraw-Hill.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P. (1985). The Structure-of-Intellect model. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications. New York: Wiley.
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.
Guilford, J. P., & Guilford, J. S. (1980). Christensen-Guilford Fluency Tests. Orange, CA: Sheridan Psychological Services.
Guilford, J. P., & Hoepfner, R. (1971). The analysis of intelligence. New York: McGraw-Hill.
Guion, R. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Erlbaum.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Gunning, M. D., Denison, F. C., Stockley, C. J., & others. (2010). Assessing maternal anxiety in pregnancy with the State-Trait Anxiety Inventory: Issues of validity, location and participation. Journal of Reproductive and Infant Psychology, 28, 266–273.
Gutkin, R. B., & Reynolds, C. R. (1981). Factorial similarity of the WISC-R for white and black children from the standardization sample. Journal of Educational Psychology, 73, 227–231.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
Guttman, L. (1947). The Cornell technique for scale and intensity analysis. Educational and Psychological Measurement, 7, 247–280.
Gynther, M. D., & Gynther, R. A. (1976). Personality inventories. In I. B. Weiner (Ed.), Clinical methods in psychology. New York: Wiley.
Haaland, K. Y., & Delaney, H. D. (1981). Motor deficits after left or right hemisphere damage due to stroke or tumor. Neuropsychologia, 19, 17–27.
Haber, A., & Fichtenberg, N. (2006). Replication of the Test of Memory Malingering (TOMM) in a traumatic brain injury and head trauma sample. The Clinical Neuropsychologist, 20, 524–532.
Hachinski, V. C., Iliff, L., Zilha, E., & others. (1975). Cerebral blood flow in dementia. Archives of Neurology, 32, 632–637.
Hack, M., Taylor, G., Drotar, D., & others. (2005). Poor predictive validity of the Bayley Scales of Infant Development for cognitive function of extremely low birth weight children at school age. Pediatrics, 116, 333–341.
Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137, 660–681.
Hain, J. (1964). The Bender-Gestalt Test: A scoring method for identifying brain damage. Journal of Consulting and Clinical Psychology, 28, 34–40.
Haladyna, T. M. (1992). Review of the Millon Clinical Multiaxial Inventory-II. Eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Hale, J., Fiorello, C., Dumont, R., & others. (2008). “Differential Ability Scales-Second Edition”: (Neuro) Psychological predictors of math performance for typical children and children with math disorders. Psychology in the Schools, 45, 838–858.
Hall, P., & Hall, D. (1983). The handshake as interaction. Semiotica, 45, 249–264.
Hambleton, R. K. (1984). Validating the test scores. In R. A. Berk (Ed.), A guide to criterion-referenced test construction. Baltimore: Johns Hopkins University Press.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Hambleton, R. K., & Zenisky, A. (2003). Advances in criterion-referenced testing methods and practices. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children: Intelligence, aptitude, and achievement (2nd ed., pp. 377–404). New York: Guilford Press.
Hammill, D. D. (1999). Detroit Tests of Learning Aptitude-4 (DTLA-4). Austin, TX: PRO-ED.
Handler, L., & Clemence, A. (2005). The Rorschach Prognostic Rating Scale. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems. Mahwah, NJ: Erlbaum.
Hansen, J. (2007). Evidence of validity for the skill scale scores of the Campbell Interest and Skill Survey. Journal of Vocational Behavior, 71, 23–44.
Hansen, J. C. (1992). Strong user's guide, Revised edition. Palo Alto, CA: Consulting Psychologists Press.
Hansen, J. C., & Campbell, D. P. (1985). Manual for the Strong Interest Inventory Form T325 of the Strong Vocational Interest Blanks, Fourth Edition. Stanford, CA: Stanford University Press.
Hansen, J.-I., & Neuman, J. (1999). Evidence of concurrent prediction of the Campbell Interest and Skill Survey (CISS) for college major selection. Journal of Career Assessment, 7, 239–247.
Hanson, G. A. (1991). To catch a thief: The legal and policy implications of honesty testing in the workplace. Law and Inequality, 9, 497–531.
Hanzel, E. P. (2003). Assessment of cognitive abilities in high-functioning children with autistic disorder: A comparison of the WISC-III and the Leiter-R. Dissertation Abstracts International: Section B: The Sciences and Engineering, 64(3-B), 1492.
Hare, R. D. (1996). Psychopathy: A clinical construct whose time has come. Criminal Justice and Behavior, 23, 25–54.
Hare, R. D. (2003). The Hare Psychopathy Checklist-Revised (PCL-R) (2nd ed.). Toronto: Multi-Health Systems.
Hare, R. D., & McPherson, L. (1984). Violent and aggressive behavior by criminal psychopaths. International Journal of Law and Psychiatry, 7, 35–50.
Hare, R. D., Harpur, T., Hakstian, R., & others. (1990). The Revised Psychopathy Checklist: Descriptive statistics, reliability, and factor structure. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 6–17.
Hare, R., & Neuman, C. (2006). The PCL-R assessment of psychopathy: Development, structural properties, and new directions. In C. Patrick (Ed.), Handbook of psychopathy (pp. 58–88). New York: Guilford.
Hargrave, G., & Hiatt, D. (1989). Use of the California Psychological Inventory in law enforcement officer selection. Journal of Personality Assessment, 53, 267–277.
Hargrave, G., Hiatt, D., Ogard, E., & Karr, C. (1994). Comparison of the MMPI and the MMPI-2 for a sample of peace officers. Psychological Assessment, 6, 27–32.
Harmon, L. W. (1989). Counseling. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Harmon, L. W., Hansen, J. C., Borgen, F., & Hammer, A. (1994). Strong Haynes, S. N. (2001). Introduction to the special section on clinical
Interest Inventory applications and technical guide. Palo Alto, CA: applications of analogue behavioral observation. Psychological
Consulting Psychologists Press. Assessment, 13, 3–4.
Harrell, T. H., Honaker, L., & Parnell, T. (1992). Equivalence of the Hayslip, B., & Panek, P. E. (1989). Adult development and aging. New
MMPI-2 with the MMPI in psychiatric patients. Psychological York: Harper & Row.
Assessment, 4, 460–465. Heaton, R. K., Chelune, G., Talley, J., & others. (1993). Wisconsin Card
Harrington, D. M. (1975). Effect of explicit instructions to “be Sorting Test manual: Revised and expanded. Odessa, FL: Psychological
creative” on the psychological meaning of divergent thinking test Assessment Resources.
scores. Journal of Personality, 43, 434–454. Heaton, R. K., Smith, H. H., Jr., Lehman, R. A. W., & Vogt, A. T. (1978).
Harris, D. B. (1963). Children'.s drawings as measures of intellectual Prospects for faking believable deficits on neuropsychological
maturity. New York: Harcourt, Brace & World. testing. Journal of Consulting and Clinical Psychology, 46, 892–900.

Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self- Hebb, D. O. (1939). Intelligence in man after large removals of
supervisor, self-peer, and peer-supervisor ratings. Personnel cerebral tissue: Report of four left frontal lobe cases. Journal of
Psychology, 38, 43–62. General Psychology, 21, 73–87.

Harrison, D. A., & Hulin, C. L. (1989). Investigations of absenteeism: Heilbrun, A. B., Jr., & Georges, M. (1990). The measurement of
Using event history models to study the absence-taking process. principled morality by the Kohlberg Moral Dilemma Questionnaire.
Journal of Applied Psychology, 74, 300–316. Journal of Personality Assessment, 55, 183–194.

Harrison, D. A., & Shaffer, M. (1994). Comparative examinations of Heilbrun, K. (1992). The role of psychological testing in forensic
self-reports and perceived absenteeism norms: Wading through assessment. Law and Human Behavior, 16, 257–272.
Lake Wobegon. Journal of Applied Psychology, 79, 240–251. Helms, J. E. (1992). Why is there no study of cultural equivalence in
standardized cognitive ability testing? American Psychologist, 47,
Harrison, P. L., & Schock, H. H. (1994). Draw-A-Figure test. In R.
1083–1101.
J. Sternberg (Ed.), Encyclopedia of human intelligence. New York:
Macmillan. Helson, R., & Soto, C. J. (2005). Up and down in middle age:
Monotonic and nonmonotonic changes in roles, status, and
Hartung, P., Borges, N., & Jones, B. (2005). Using person matching
personality. Journal of Personality and Social Psychology, 89, 194–204.
to predict career specialty choice. Journal of Vocational Behavior, 67,
102–117. Helson, R., Kwan, V., John, O. P., & Jones, C. (2002). The growing
evidence for personality change in adulthood: Findings from
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality
research with personality inventories. Journal of Research in
schedule (Minnesota): I. Construction of the schedule. Journal of
Personality, 36, 287–306.
Psychology, 10, 249–254.
Hendriks, A., Hofstee, W., & De Raad, B. (1999). The Five-Factor
Hathaway, S. R., & McKinley, J. C. (1942). A multiphasic personality
Personality Inventory. Personality and Individual Differences, 27, 307–325.
schedule (Minnesota): III. The measurement of symptomatic
depression. Journal of Psychology, 14, 73–84. Herman, D. O. (1988). Blind Learning Aptitude Test. In D. J. Keyser &
R. C. Sweetland (Eds.), Test critiques (vol. 5). Kansas City, MO: Test
Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Corporation of America.
Personality Inventory (rev. ed.). Minneapolis: University of
Minnesota Press. Hernandez-Reif, M., Field, T., Diego, M., & Ruddock, M. (2006).
Greater arousal and less attentiveness to face/voice stimuli
Hawkins, D. B. (1988). Interpersonal behavior traits, spiritual by neonates of depressed mothers on the Brazelton neonatal
well-being, and their relationships to blood pressure (doctoral Behavioral Assessment Scale. Infant Behavior and Development, 29,
dissertation, Western Conservative Baptist Seminary, 1986). 594–598.
Dissertation Abstracts International, 48, 3680B.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and
Hawkins, D. B., & Larson, R. (1984). The relationship between measures class structure in American life. New York: Free Press.
of health and spiritual well-being. Unpublished manuscript, Western
Conservative Baptist Seminary, Portland, OR. Hersen, M., & Bellack, A. S. (Eds.). (1988). Dictionary of behavioral
assessment techniques. New York: Pergamon.
Hawkins, K. A., Faraone, S. V., Pepple, J. R., Seidman, L. J., & Tsuang,
Herzberg, P., Glaesmer, H., & Hoyer, J. (2006). Separating optimism
M. T. (1990). WAIS-R validation of the Wonderlic Personnel Test as
and pessimism: A robust psychometric analysis of the Revised Life
a brief intelligence measure in a psychiatric sample. Psychological
Orientation Test (LOT-R). Psychological Assessment, 18, 433–438.
Assessment: A Journal of Consulting and Clinical Psychology, 2,
198–201. Heyman, R. (2001). Observation of couple conflicts: Clinical
assessment applications, stubborn truths, and shaky foundations.
Hawkins, K., Dean, D., & Pearlson, G. (2004). Alternative forms of the
Psychological Assessment, 13, 5–35.
Rey Auditory Verbal Learning Test: A review. Behavioral Neurology,
15, 99–107. Hiatt, D., & Hargrave, G. E. (1988). MMPI profiles of problem peace
officers. Journal of Personality Assessment, 52, 722–731.
Hawthorne, J. (2009). Promoting development of the early parent-
infant relationship using the Neonatal Behavioural Assessment Higgs, M. (2001). Is there a relationship between the Myers-Briggs
Scale. In J. Barlow & P. Svanberg (Eds.), Keeping the baby in mind: Type Indicator and emotional intelligence? Journal of Managerial
Infant mental health in practice. New York: Routledge/Taylor & Psychology, 16, 509–533.
Francis Group. Highhouse, S., & Nolan, K. P. (in press). One history of the
Hayes, P. A. (2008). Addressing cultural complexities in practice: assessment center. In D. J. R. Jackson, C. E. Lance, & B. J. Hoffman
Assessment, diagnosis, and therapy (2nd ed.). Washington, DC: (Eds.), The psychology of assessment centers (pp. 25–44). New York:
American Psychological Association. Routledge/Taylor & Francis Group.

Haynes, S. G., Feinleib, M., & Eaker, E. (1983). Type A behavior and
the ten-year incidence of coronary heart disease in the Framingham
Heart Study. In R. H. Rosenman (Ed.), Psychosomatic risk factors and
coronary heart disease. Bern, Switzerland: Huber.

Hill, B. (2005). ICAP User's Group Home Page. Retrieved from
www.cpinternet.com/bhill/icap on September 13, 2005.

Hill, P. C., & Hood, R. W. (Eds.). (1999). Measures of religiosity.
Birmingham, AL: Religious Education Press.

Hill, P. C., & Pargament, K. I. (2008). Advances in the Holland, J. L. (1985c). Vocational Preference Inventory (VPI)
conceptualization and measurement of religion and spirituality: manual—1985 edition. Odessa, FL: Psychological Assessment
Implications for physical and mental health research. Psychology of Resources.
Religion and Spirituality, S(1), 3–17. Holland, J. L. (1987). 1987 manual supplement for the Self-Directed
Hilliard, A. G. (1984). IQ testing as the emperor's new clothes: A Search. Odessa, FL: Psychological Assessment Resources.
critique of Jensen's Bias in Mental Testing. In C. R. Reynolds & Holland, J. L., Johnston, J., Asama, N. F., & Polys, S. (1993).
R. T. Brown (Eds.), Perspectives on bias in mental testing. New York: Validating and using the Career Beliefs Inventory. Journal of Career
Plenum Press. Development, 19, 233–244.
Hintze, J., Volpe, R., & Shapiro, E. (2002). Best practices in the Hollander, E., Kolevzon, A., & Coyle, J. (2011). Textbook of autism
systematic direct observation of student behavior. In A. Thomas & spectrum disorders. Washington, DC: American Psychiatric Publishing.
J. Grimes (Eds.), Best practices in school psychology IV. Washington,
DC: National Association of School Psychologists. Hollingshead, A., & Redlich, F. (1958). Social class and mental illness.
New York: Wiley.
Hiskey, M. S. (1966). Manual for the Hiskey-Nebraska Test of Learning
Aptitude. Lincoln, NE: Union College Press. Hollingworth, H. L. (1943). Leta Stetter Hollingworth. Lincoln, NE:
University of Nebraska Press.
Hofer, S., Sliwinski, M., & Flaherty, B. (2002). Understanding ageing:
Further commentary on the limitations of cross-sectional designs Hollingworth, L. (1914). Variability as related to sex differences in
for ageing research. Gerontology, 48, 22–29. achievement: A critique. American Journal of Sociology, 19, 510–530.

Hoffart, A., Friis, S., Strand, J., & Olsen, B. (1994). Symptoms and Hollingworth, L. (1928). Children clustering at 165 IQ and children
cognitions during situational and hyperventilatory exposure clustering at 146 IQ compared for three years in achievement. In
in agoraphobic patients with and without panic. Journal of G. Whipple (Ed.), The twenty-seventh yearbook of the National Society
Psychopathology and Behavioral Assessment, 16, 15–32. for the Study of Education: Nature and nurture, Part II—Their influence
upon achievement. Bloomington, IL: Public School Publishing.
Hoffman, F. J., Sheldon, K. L., Minskoff, E. H., & others. (1987).
Needs of learning disabled adults. Journal of Learning Disabilities, Hollingworth, L. (1935). The comparative beauty of the faces of highly
20, 43–52. intelligent adolescents. Journal of Genetic Psychology, 47, 268–281.

Hofmann, S. G., & Reinecke, M. A. (2010). Cognitive-behavioral therapy Hollingworth, L., & Monahan, J. (1926). Tapping-rate of children who
with adults: A guide to empirically-informed assessment and intervention. test above 135 IQ (Stanford-Binet). Journal of Educational Psychology,
New York: Cambridge University Press. 17, 505–518.

Hogan, A. E., Scott, K. G., & Bauer, C. R. (1992). The Adaptive Social Holmes, T., & Rahe, R. (1967). The Social Readjustment Rating Scale.
Behavior Inventory (ASBI): A new assessment of social competence Journal of Psychosomatic Research, 11, 213–218.
in high-risk three-year-olds. Journal of Psychoeducational Assessment, Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The
10, 230–239. stability of a bi-factor solution. University of Chicago, Supplementary
Educational Monographs, No. 48.
Hogan, J., & Hogan, R. (1986). Manual for the Hogan Personnel Selection
System. Minneapolis, MN: National Computer Systems. Holzinger, K., & Harman, H. (1941). Factor analysis: A synthesis of
factorial methods. Chicago: University of Chicago Press.
Hogan, R. (2002). The Hogan Personality Inventory. In B. de Raad &
M. Perugini (Eds.), Big five assessment (pp. 329–346). Ashland, OH: Holzman, P., Levy, D., & Johnston, M. H. (2005). The use of the
Hogrefe and Huber. Rorschach technique for assessing formal thought disorder. In R.
F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven
Hogan, R. T. (1986). Manual for the Hogan Personality Inventory.
validated systems. Mahwah, NJ: Erlbaum.
Minneapolis, MN: National Computer Systems.
Honzik, M. (1957). Developmental studies of parent-child
Hoge, C. W., McGurk, D., Thomas, J. L., & others. (2008). Mild
resemblance in intelligence. Child Development, 28, 215–228.
traumatic brain injury in U.S. soldiers returning from Iraq. New
England Journal of Medicine, 358(5), 453–463. Hooper, S., Hatton, D., Baranek, G., Roberts, J., & Bailey, D. (2000).
Nonverbal assessment of IQ, attention, and memory abilities in
Hoge, D. R. (1996). Religion in America: The demographics of belief
children with fragile-X syndrome using the Leiter-R. Journal of
and affiliation. In E. P. Shafranske (Ed.), Religion and the clinical
Psychoeducational Assessment, 18, 255–267.
practice of psychology. Washington, DC: American Psychological
Association. Horn, J. L. (1968). Organization of abilities and the development of
intelligence. Psychological Review, 75, 242–259.
Hoge, S., Bonnie, R., Poythress, N., & Monahan, J. (1999). The
MacArthur Competence Assessment Tool—Criminal Adjudication. Horn, J. L. (1985). Remodeling old models of intelligence. In B. B.
Odessa, FL: Psychological Assessment Resources. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and
applications. New York: Wiley.
Holland, J. L. (1959). A theory of vocational choice. Journal of
Counseling Psychology, 6, 35–44. Horn, J. L. (1994). Theory of fluid and crystallized intelligence. In
R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York:
Holland, J. L. (1966). The psychology of vocational choice. Waltham, MA:
Macmillan.
Blaisdell.
Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory
Holland, J. L. (1978). The occupations finder. Palo Alto, CA: Consulting
of fluid and crystallized general intelligences. Journal of Educational
Psychologists Press.
Psychology, 57, 253–270.
Holland, J. L. (1985). Vocational Preference Inventory (VPI) Horn, J. L., & Masunaga, H. (2000). New directions for research
manual—1985 edition. Odessa, FL: Psychological Assessment into aging and intelligence: The development of expertise. In T. J.
Resources. Perfect & E. A. Maylor (Eds.), Models of cognitive aging (pp. 125–159).
Holland, J. L. (1985a). Making vocational choices: A theory of vocational Oxford, England: Oxford University Press.
personalities and work environments (2nd ed.). Englewood Cliffs, NJ: Horton, A. (2008). The Halstead-Reitan Neuropsychological Test
Prentice Hall. Battery: Past, present, and future. In A. Horton & D. Wedding
Holland, J. L. (1985b). Self-Directed Search: Professional manual—1985 (Eds.), The neuropsychology handbook (3rd ed.) (pp. 251–278). New
edition. Odessa, FL: Psychological Assessment Resources. York: Springer.

Hough, L. M., Eaton, N., Dunnette, M., Kamp, J., & McCloy, R. (1990). Ivcevic, Z., & Mayer, J. D. (2009). Mapping dimensions of creativity in
Criterion-related validities of personality constructs and the effect the life-space. Creativity Research Journal, 21, 152–165.
of response distortion on those validities [Monograph]. Journal of Iversen, G., Williamson, D., Ropacki, M., & Reilly, K. (2007).
Applied Psychology, 75, 581–595. Frequency of abnormal scores on the Neuropsychological
Howell, R. J., & Richards, L. (1989). Review of the Rogers Criminal Assessment Battery Screening Module (S-NAB) in a mixed
Responsibility Assessment Scales. The tenth mental measurements neurological sample. Applied Neuropsychology, 14, 178–182.
yearbook. Lincoln: University of Nebraska Press.
Jaberg, P. E., Dixon, D. J., & Weis, G. M. (2009). Replication evidence
Huffcutt, A. (2007). Employment interviews. In D. Whetzel & G. in support of the psychometric properties of the Devereux Early
Wheaton (Eds.), Applied measurement: Industrial psychology in human Childhood Assessment. Canadian Journal of School Psychology, 24,
resources management (pp. 181–199). New York: Taylor & Francis/ 158–166.
Erlbaum.
Jackson, A., Brooks-Gunn, J., Huang, C., & Glassman, M. (2000).
Huffcutt, A. I., & Roth, P. (1998). Racial group differences in Single mothers in low-wage jobs: Financial strain, parenting, and
employment interview evaluations. Journal of Applied Psychology, 83, preschoolers' outcomes. Child Development, 71, 1409–1423.
179–189.
Jackson, D. N. (1970). A sequential system for personality scale
Hughes, J. L., & McNamara, W. J. (1959). Manual for the revised development. In C. D. Spielberger (Ed.), Current topics in clinical and
Programmer Aptitude Test. New York: The Psychological community psychology (vol. 2). Orlando, FL: Academic Press.
Corporation.
Jackson, D. N. (1984a). Manual for the Multidimensional Aptitude
Human Rights Watch. (2001). Beyond reason: The death penalty and Battery. Port Huron, MI: Research Psychologists Press.
offenders with mental retardation. Human Rights Watch Publications,
Jackson, D. N. (1984b). Personality Research Form manual. Port Huron,
13, 1–50.
MI: Research Psychologists Press.
Humphreys, L. G. (1971). Theory of intelligence. In R. Cancro (Ed.),
Jackson, D. N. (1991). Jackson Vocational Interest Survey manual (3rd
Intelligence: Genetic and environmental influences. New York: Grune &
ed.). Port Huron, MI: Research Psychologists Press.
Stratton.
Jackson, D. N. (1998). Manual for the Multidimensional Aptitude Battery,
Hunsberger, B. (1995). Religion and prejudice: The role of religious
Second Edition. Port Huron, MI: Research Psychologists Press.
fundamentalism, quest, and right-wing authoritarianism. Journal of
Social Issues, 51, 113–129. Jackson, D. N. (2000). Career Directions Inventory manual. Port Huron,
MI: Sigma Assessment Systems.
Hunsley, J., & Bailey, J. (1999). The clinical utility of the Rorschach:
Unfulfilled promises and an uncertain future. Psychological Jackson, D. N., & Messick, S. (1968). Creativity. In P. London & D.
Assessment, 11, 266–277. Rosenhan (Eds.), Foundations of abnormal psychology. New York:
Holt.
Hunsley, J., & Mash, E. J. (2005). Introduction to the special section on
developing guidelines for the evidence-based assessment (EBA) of Jackson, J., Mulick, J., & Rojahn, J. (Eds.). (2007). Handbook of
adult disorders. Psychological Assessment, 17, 251–255. intellectual and developmental disabilities. New York: Springer.
Hunter, J. E. (1989). The Wonderlic Personnel Test as a predictor of James, W. (1902). The varieties of religious experience. New York:
training success and job performance. Northfield, IL: E. F. Wonderlic Longman.
Personnel Test. Jankowski, D. (2002). A beginner's guide to the MCMI-III. Washington,
Hunter, J. E. (1994). General Aptitude Test Battery. In R. J. Sternberg DC: American Psychological Association.
(Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Jennett, B., & Teasdale, G. (1981). Management of head injuries.
Hunter, J. E., & Schmidt, F. L. (1976). Critical analysis of the statistical Philadelphia: F. A. Davis.
and ethical implications of various definitions of test bias.
Jennett, B., Teasdale, G. M., & Knill-Jones, R. P. (1975). Predicting
Psychological Bulletin, 83, 1053–1071.
outcome after head injury. Journal of Royal College of Physicians of
Hurtz, G., & Donovan, J. (2000). Personality and job performance: London, 9, 231–237.
The Big Five revisited. Journal of Applied Psychology, 85, 869–879.
Jensen, A. (1998). The g factor: The science of mental ability. Westport,
Hutt, M. L. (1977). The Hutt Adaptation of the Bender-Gestalt Test. New CT: Praeger.
York: Grune & Stratton.
Jensen, A. R. (1969). How much can we boost IQ and scholastic
Hutt, M. L., & Briskin, G. J. (1960). The clinical use of the revised Bender- achievement? Harvard Educational Review, 39, 1–123.
Gestalt Test. New York: Grune & Stratton.
Jensen, A. R. (1977). Cumulative deficit in IQ of blacks in the rural
Institute of Medicine. (2001). Crossing the quality chasm: A new health south. Developmental Psychology, 13, 184–191.
system for the 21st century. Washington, DC: National Academy Press.
Jensen, A. R. (1979). g: Outmoded theory or unconquered frontier?
International Psychogeriatric Association. (2002). Behavioral and Creative Science and Technology, 2, 16–29.
Psychological Symptoms of Dementia (BPSD) educational pack. Skokie,
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
IL: Author.
Jensen, A. R. (1981). Raising the IQ: The Ramey and Haskins Study.
Inwald, R. (2008). The Inwald Personality Inventory (IPI) and Hilson
Intelligence, 5, 29–40.
Research Inventories: Development and rationale. Aggression and
Violent Behavior, 13, 298–327. Jensen, A. R. (1984). Test bias: Concepts and criticisms. In C. R.
Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing.
Inwald, R. E. (1988). Five-year follow-up of departmental
New York: Plenum Press.
terminations as predicted by 16 preemployment psychological
indicators. Journal of Applied Psychology, 73, 703–710. Jensen, A. R. (1998). The g factor: The science of mental ability. Westport,
Irwin, P. M. (1992). Elementary and Secondary Education Act of 1965: CT: Praeger.
FY 1993 Guide to Programs. Congressional Research Service. Jensen, A. R. (2006). Clocking the mind: Mental chronometry and individual
Washington, DC: Government Printing Office. differences. Amsterdam: Elsevier.
Itard, J. M. G. (1932/1801). The wild boy of Aveyron. Trans. by G. & M. Jensen, A. R. (2011). The theory of intelligence and its measurement.
Humphrey. New York: Appleton-Century-Crofts. Intelligence, 39, 171–177.

Jensen, A. R., & Osborne, R. T. (1979). Forward and backward digit span Kapuscinski, A. N., & Masters, K. S. (2010). The current status of
interaction with race and IQ: A longitudinal developmental comparison. measures of spirituality: A critical review of scale development.
Berkeley: University of California. (ERIC Document Reproduction Psychology of Religion and Spirituality, 2, 191–205.
Service No. ED 173 384).
Kaufman, A. S. (1983). Test review: WAIS-R. Journal of
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Psychoeducational Assessment, 1, 309–319.
Inventory: Versions 4a and 54. Berkeley, CA: University of California,
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence.
Berkeley, Institute of Personality and Social Research.
Boston: Allyn and Bacon.
John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the
Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC administration and
integrative Big-Five trait taxonomy: History, measurement, and
scoring manual. Circle Pines, MN: American Guidance Service.
conceptual issues. In O. P. John, R. W. Robins, & L. A. Pervin (Eds.),
Handbook of personality: Theory and research (pp. 114–158). New York: Kaufman, A. S., & Kaufman, N. L. (2004a). Kaufman Brief Intelligence
Guilford Press. Test (2nd ed.). Circle Pines, MN: American Guidance Service.
Johnson, J. H., & Williams, T. A. (1975). The use of on-line computer Kaufman, A. S., & Kaufman, N. L. (2004b). Kaufman Test of Educational
technology in a mental health admitting system. American Achievement (2nd ed.). Circle Pines, MN: American Guidance
Psychologist, 30, 388–390. System Publishing.
Johnson, R. C., McClearn, G. E., Yuen, S., Nagoshi, C. T., Ahern, F. Kaufman, A. S., & Lichtenberger, E. O. (2002). Assessing adolescent and
M., & Cole, R. E. (1985). Galton's data a century later. American adult intelligence (2nd ed.). Boston: Allyn & Bacon.
Psychologist, 40, 875–892. Kaufman, A. S., McLean, J. E., & Reynolds, C. R. (1988). Sex, race,
Johnson, S. T. (1994). Scholastic Assessment Tests. In R. J. Sternberg residence, region, and education differences on the 11 WAIS-R
(Ed.), Encyclopedia of human intelligence. New York: Macmillan. subtests. Journal of Clinical Psychology, 44, 231–248.
Johnston, D. W. (1986). Behavior therapy. In R. Harre & R. Lamb Kaufman, J. C., & Baer, J. (2004). Hawking's Haiku, Madonna's
(Eds.), The dictionary of physiological and clinical psychology. math: Why it is hard to be creative in every room of the house. In
Cambridge, MA: MIT Press. R. J. Sternberg, E. L. Grigorenko, & J. L. Singer (Eds.), Creativity:
From potential to realization (pp. 3–19). Washington, DC: American
Johnston, W. T., & Bolen, R. M. (1984). A comparison of the factor
Psychological Association.
structures of the WISC-R for Blacks and Whites. Psychology in the
Schools, 21, 42–44. Kaufman, J. C., Cole, J. C., & Baer, J. (2009). The construct of
creativity: A structural model for self-reported creativity ratings.
Joint Committee on Testing Practices. (1988). Code of fair testing
Journal of Creative Behavior, 43, 119–134.
practices in education. Washington, DC: Author.
Kaufman, J. C. (2012). Development of the Kaufman Domains of
Jones, K. L., Smith, D. W., Ulleland, C. N., & Streissguth, A. P. (1973).
Creativity Scale (K-DOCS). Psychology of Aesthetics, Creativity, and
Patterns of malformation in offspring of chronic alcoholic mothers.
the Arts, 6, 298–308.
Lancet, 1, 1267–1271.
Kausler, D. (1991). Experimental psychology, cognition, and human aging
Jones, K., & Barber, J. (2012). Help for unemployed Americans. APA
(2nd ed.). New York: Springer-Verlag.
Monitor, 43(1), 18–19.
Kazdin, A. E. (1990). Evaluation of the Automatic Thoughts
Julian, E. (2005). Validity of the Medical College Admission Test for
Questionnaire: Negative cognitive processes and depression among
predicting medical school performance. Academic Medicine, 80,
children. Psychological Assessment: A Journal of Consulting and Clinical
910–917.
Psychology, 2, 73–79.
Jung, C. G. (1910). The association method. American Journal of
Keith, T. Z. (1999). Effects of general and specific abilities on student
Psychology, 21, 219–269.
achievement: Similarities and differences across ethnic groups.
Kaiser, H. F., & Michael, W. B. (1975). Domain validity and School Psychology Quarterly, 14, 239–262.
generalizability. Educational and Psychological Measurement, 35,
Kelley, T. L. (1928). Crossroads in the mind of man: A study of
31–35.
differentiable mental abilities. Stanford, CA: Stanford University
Kalat, J. (2012). Biological psychology (11th ed.). Belmont, CA: Press.
Wadsworth.
Kelly, E. L., & Fiske, D. W. (1951). The prediction of performance in
Kamin, L. J., & Goldberger, A. S. (2001). Twin studies in behavioral clinical psychology. Ann Arbor: University of Michigan Press.
research: A skeptical view. Unpublished manuscript.
Kendall, P. C., & Hollon, S. D. (1989). Anxious self-talk: Development
Kamphaus, R. W. (1993). Clinical assessment of children's intelligence. of the Anxious Self-Statements Questionnaire (ASSQ). Cognitive
Boston: Allyn and Bacon. Therapy and Research, 13, 81–93.
Kanaya, T., Scullin, M., & Ceci, S. (2003). The Flynn effect and U.S. Kennedy, C., & Moore, J. (Eds.). (2010). Military neuropsychology. New
policies: The impact of rising IQ scores on American society via York: Springer Publishing.
mental retardation diagnoses. American Psychologist, 58, 778–790.
Kennedy, W. A., Van de Riet, V., & White, J. C., Jr. (1963). A normative
Kandel, E. R. (1991). Perception of motion, depth, and form. In E. R. sample of intelligence and achievement of negro elementary school
Kandel, J. H. Schwartz, & T. M. Jessell (Eds.), Principles of neural children in the southeast United States. Monographs of the Society for
science (3rd ed.). New York: Elsevier. Research in Child Development, 28 [No. 90], 68.
Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (1995). Essentials of Kent, G. H., & Rosanoff, A. J. (1910). A study of association in
neural science and behavior. Norwalk, CT: Appleton & Lange. insanity. American Journal of Insanity, 67, 37–96; 317–390.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S. A., & Kerr, B., & Gagliardi, C. (2003). Measuring creativity in research and
Hudspeth, A. J. (2013). Principles of neural science (5th ed. rev.). New practice. In S. Lopez & C. R. Snyder (Eds.), Positive psychological
York: McGraw-Hill Medical. assessment: A handbook of models and measures. Washington, DC:
American Psychological Association.
Kane, R. L. (1991). Standardized and flexible batteries in
neuropsychology: An assessment update. Neuropsychology Review, Kertesz, A. (1982). Aphasia and associated disorders: Taxonomy,
2, 281–339. localization, and recovery. New York: Grune & Stratton.

Kertesz, A. (2006). Western Aphasia Battery-Revised. San Antonio, TX: Kohlberg, L. (1958). The development of modes of moral thinking and
Harcourt. choice in the years ten to sixteen. Unpublished doctoral dissertation,
University of Chicago.
Keyser, D. J., & Sweetland, R. C. (Eds.). (1984–1994). Test Critiques
(volumes I–X). Kansas City, MO: Test Corporation of America. Kohlberg, L. (1981). Essays on moral development: Vol. 1. The philosophy
Khaleefa, O., & Lynn, R. (2008). Normative data for Raven's of moral development. San Francisco: Harper & Row.
Coloured Progressive Matrices Scale in Yemen. Psychological Reports, Kohlberg, L. (1984). Essays on moral development: Vol. 2. The psychology
103, 170–172. of moral development. San Francisco: Harper & Row.
Kiecolt-Glaser, J. K. (2009). Psychoneuroimmunology: Psychology's Kohlberg, L., & Elfenbein, D. (1975). The development of moral
gateway to the biomedical future. Perspectives on Psychological judgments concerning capital punishment. American Journal of
Science, 4, 367–369. Orthopsychiatry, 45, 614–639.
Kiernan, R., Mueller, J., & Langston, J. W. (2009). Cognistat manual. Kohlberg, L., & Kramer, R. (1969). Continuities and discontinuities
Fairfax, CA: Cognistat, Inc. in children and adult moral development. Human Development, 12,
Kifer, E. (1985). Review of ACT Assessment Program. Ninth mental 225–252.
measurements yearbook. Lincoln: University of Nebraska Press. Kolb, B., & Milner, B. (1981). Performance of complex arm and facial
Killian, G. A. (1987). House-Tree-Person technique. In D. J. Keyser & movements after focal brain lesions. Neuropsychologia, 19, 491–503.
R. C. Sweetland (Eds.), Test critiques compendium. Kansas City, MO: Kolb, B., & Whishaw, I. Q. (2002). Fundamentals of human
Test Corporation of America. neuropsychology (5th ed.). New York: Worth/Freeman.
Kim, K. H. (2006). Can we trust creativity tests? A review of the Kolb, B., & Whishaw, I. Q. (2011). An introduction to brain and behavior
Torrance Tests of Creative Thinking (TTCT). Creativity Research (3rd ed.). New York: Worth Publishers.
Journal, 18, 3–14.
Kolb, B., Milner, B., & Taylor, L. (1983). Perception of faces by patients
Kim, W. J., Kim, L. I., & Rue, D. S. (1997). Korean American children. with localized cortical excisions. Canadian Journal of Psychology, 37,
In G. Johnson-Powell, J. Yamamoto, G. E. Wyatt, & W. Arroyo 8–18.
(Eds.), Transcultural child development: Psychological assessment and
Koppitz, E. (1963). The Bender Gestalt Test for young children. New
treatment (pp. 183–207). Hoboken, NJ: John Wiley & Sons.
York: Grune and Stratton.
Kim, Y., Pilkonis, P. A., Frank, E., Thase, M. E., & Reynolds, C. F.
Koppitz, E. (1975). The Bender Gestalt Test for young children, Volume II:
(2002). Differential functioning of the Beck Depression Inventory in
Research and application, 1963–1975. New York: Grune and Stratton.
late-life patients: Use of item response theory. Psychology and Aging,
17, 379–391. Koss, E., Patterson, M., Mack, J., Smyth, K., & Whitehouse, P.
(1998). Reliability and validity of the Tinkertoy Test in evaluating
King, K. (2001). A critique of behavioral observational coding systems
individuals with Alzheimer's disease. Clinical Neuropsychologist, 12,
of couples' interaction: CISS and RCISS. Journal of Social and Clinical
325–329.
Psychology, 20, 1–23.
Kostrubala, C., & Braden, J. (1998). The American Sign Language
Kinnear, P. R., & Gray, C. D. (1997). SPSS for Windows made simple
translation of the WAIS-III. San Antonio, TX: The Psychological
(2nd ed.). Trowbridge, UK: Psychology Press.
Corporation.
Kinsbourne, M. (1994). Neuropsychology of attention. In D. W. Zaidel
Kraus, J. F., & MacArthur, D. L. (1996). Epidemiologic aspects of brain
(Ed.), Neuropsychology. San Diego, CA: Academic Press.
injury. Neurologic Clinics, 14(2), 435–450.
Kirk, J. W., Harris, B., Hutaff-Lee, C. F., & others. (2010). Performance
Krikorian, R., & Bartok, J. (1998). Developmental data for the Porteus
on the Test of Memory Malingering (TOMM) among a large clinic-
Maze Test. Clinical Neuropsychologist, 12, 305–310.
referred pediatric sample. Child Neuropsychology, 17, 242–254.
Krokoff, L. J., Gottman, J., & Hass, S. (1989). Validation of a global rapid
Kirkpatrick, L., & Hood, R. (1990). Intrinsic-Extrinsic Religious
couples interaction scoring system. Behavioral Assessment, 11, 65–79.
Orientation: The boon or bane of contemporary psychology of
religion? Journal for the Scientific Study of Religion, 29, 442–462. Krugman, M. (1970). H-T-P: House, Tree, and Person. In O. K. Buros
(Ed.), Personality tests and reviews. Highland Park, NJ: Gryphon
Kleiman, L., & Faley, R. (1985). The implications of professional and
Press.
legal guidelines for court decisions involving criterion-related
validity: A review and analysis. Personnel Psychology, 38, 803–833. Krumboltz, J. (1999). Career Beliefs Inventory: Applications and technical
guide. Palo Alto, CA: Consulting Psychologists Press.
Klieger, D. M., & Franklin, M. E. (1993). Validity of the fear
survey schedule in phobia research: A laboratory test. Journal of Krumboltz, J. D. (1993). Integrating career and personal counseling.
Psychopathology and Behavioral Assessment, 15, 207–218. Career Development Quarterly, 42, 143–148.
Klimoski, R., & Palmer, S. (1994). The ADA and the hiring process in Krumboltz, J. D. (1996). A learning theory of career counseling. In M.
organizations. In S. M. Bruyere & J. O'Keeffe (Eds.), Implications of L. Savickas & W. B. Walsh (Eds.), Handbook of career counseling theory
the Americans with Disabilities Act for psychology. New York: Springer. and practice (pp. 55–80). Palo Alto, CA: Davies-Black.
Kline, P. (1986). A handbook of test construction. New York: Methuen. Krumboltz, J. D. (2009). The happenstance learning theory. Journal of
Career Assessment, 17, 135–154.
Kline, P. (1993). The handbook of psychological testing. London:
Routledge. Krumboltz, J. D., & Vosvick, M. A. (1996). Career assessment and the
Career Beliefs Inventory. Journal of Career Assessment, 4, 345–361.
Kline, P. (1999). The handbook of psychological testing (2nd ed.). London:
Routledge. Kuder, G. F. (1934). Kuder preference record. Chicago: Science Research
Associates.
Klingler, D. E., Miller, D., Johnson, J., & Williams, T. (1977). Process
evaluation of an on-line computer-assisted unit for intake Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of
assessment of mental health patients. Behavior Research Methods and test reliability. Psychometrika, 2, 151–160.
Instrumentation, 9, 110–116. Kuncel, N. R., & Sackett, P. R. (2007). Selective citation mars
Klove, H. (1963). Clinical neuropsychology. In F. M. Forster (Ed.), The conclusions about test validity and predictive bias. American
medical clinics of North America. New York: Saunders. Psychologist, 62, 145–146.

Kuncel, N., Campbell, J., & Ones, D. (1998). Validity of the Graduate Larson, G. E. (1994). Armed Services Vocational Aptitude Battery. In
Record Examination: Estimated or tacitly known? American R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York:
Psychologist, 53, 567–568. Macmillan.
Kuncel, N., Hezlett, S., & Ones, D. (2001). A comprehensive Larson, G. E., & Wolfe, J. (1995). Validity results for g from an
meta-analysis of the predictive validity of the Graduate Record expanded test base. Intelligence, 20, 15–25.
Examinations: Implications for graduate student selection and Lassiter, K., & Bardos, A. (1995). The relationship between young
performance. Psychological Bulletin, 127, 162–181. children's academic achievement and measures of intelligence.
Kupfermann, I. (1991). Hypothalamus and limbic system: Peptidergic Psychology in the Schools, 32, 170–177.
neurons, homeostasis, and emotional behavior. In E. R. Kandel, J. Latham, G. P., & Skarlicki, D. (1995). Criterion-related validity of
H. Schwartz, & T. M. Jessell (Eds.), Principles of neural science (3rd the situational and patterned behavior description interviews
ed.). New York: Elsevier. with organizational citizenship behavior. Human Performance, 8,
Kurtines, W., & Greif, E. B. (1974). The development of moral 67–80.
thought: Review and evaluation of Kohlberg's approach. Lau, B. C., Collins, M. W., & Lovell, M. R. (2011). Sensitivity and
Psychological Bulletin, 81, 453–470. specificity of subacute computerized neurocognitive testing and
Kvaal, K., Ulstein, I., Nordhus, I. H., & Engedal, K. (2005). The symptom evaluation in predicting outcomes after sports-related
Spielberger State-Trait Anxiety Inventory (STAI): The state scale in concussion. American Journal of Sports Medicine, 39(6), 1209–1216.
detecting mental disorders in geriatric patients. International Journal Laux, J., Salyers, K., & Kotova, E. (2005). A psychometric evaluation
of Geriatric Psychiatry, 20, 629–634. of the SASSI-3 in a college sample. Journal of College Counseling, 8,
Kwate, N. (2001). Intelligence or misorientation? Eurocentrism in the 41–51.
WISC-III. Journal of Black Psychology, 27, 221–238. LaVoie, A. L. (1987). The Blacky Pictures. In D. J. Keyser & R. C.
La Rue, A. (1992). Aging and neuropsychological assessment. New York: Sweetland (Eds.), Test critiques compendium. Kansas City, MO: Test
Plenum. Corporation of America.

Laatsch, L., & Choca, J. (1994). Cluster-branching methodology for LeBuffe, P. A., & Naglieri, J. A. (1999a). Devereux Early Childhood
adaptive testing and the development of the Adaptive Category Assessment (DECA): A measure of within-child protective factors
Test. Psychological Assessment, 6, 345–351. in preschool children. NHSA Dialog, 3, 75–80.

LaBarbera, D. (2005). Physician assistant Self-Directed Search LeBuffe, P. A., & Naglieri, J. A. (1999b). Devereux Early Childhood
Holland Codes. Journal of Career Assessment, 13, 337–346. Assessment Program: Technical manual. Lewisville, NC: Kaplan Press.

Lachar, D. (1974). The MMPI: Clinical assessment and automated LeBuffe, P. A., & Naglieri, J. A. (2003). The Devereux Early Childhood
interpretation. Los Angeles: Western Psychological Services. Assessment Clinical Form (DECA-C): A measure of behaviors related to
risk and resilience in preschool children. Lewisville, NC: Kaplan Press.
Lachar, D. (1987). Automated assessment of child and adolescent
personality. In J. N. Butcher (Ed.), Computerized psychological Ledbetter, M., Smith, L., Vosler-Hunter, W., & Fischer, J. (1991). An
assessment: A practitioner'.s guide. New York: Basic Books. evaluation of the research and clinical usefulness of the Spiritual
Well-Being Scale. Journal of Psychology and Theology, 19, 49–55.
Lachar, D., & Gdowski, C. L. (1979). Actuarial assessment of child and
adolescent personality: An interpretive guide for the Personality Inventory Lee, M. S., Wallbrown, F., & Blaha, J. (1990). Note on the construct
for Children profile. Los Angeles: Western Psychological Services. validity of the Multidimensional Aptitude Battery. Psychological
Reports, 67, 1219–1222.
Lachar, D., & Gruber, C. (2001). Manual: Personality Inventory for
Children-2. Los Angeles: Western Psychological Services. Lefebvre, M. F. (1981). Cognitive distortion and cognitive errors
in depressed psychiatric and low back pain patients. Journal of
Lacks, P. (1999). Bender-Gestalt screening for brain dysfunction (2nd ed.). Consulting and Clinical Psychology, 49, 517–525.
New York: Wiley.
Lehman, R. E. (1978). Symptom contamination of the Schedule of
Lah, M. I. (1989). New validity, normative, and scoring data for the Recent Events. Journal of Consulting and Clinical Psychology, 46,
Rotter Incomplete Sentences Blank. Journal of Personality Assessment, 1564–1565.
53, 607–620.
Leiter, R. G. (1948). Leiter International Performance Scale. Chicago:
Lah, M. I., & Rotter, J. B. (1981). Changing college student norms on Stoelting Co.
the Rotter Incomplete Sentences Blank. Journal of Consulting and
Leiter, R. G. (1979). Leiter International Performance Scale: Instruction
Clinical Psychology, 49, 985.
manual. Chicago: Stoelting Co.
Lamp, R., & Krohn, E. (2001). A longitudinal predictive validity
Leli, D. A., & Filskov, S. B. (1984). Clinical detection of intellectual
investigation of the SB:FE and K-ABC with at-risk children. Journal
deterioration associated with brain damage. Journal of Clinical
of Psychoeducational Assessment, 19, 334–349.
Psychology, 40, 1435–1441.
Landy, F. (1996). The psychology of work behavior (5th ed.) Monterey,
Lent, R. W., Brown, S. D., & Hackett, G. (2000). Contextual supports
CA: Brooks/Cole.
and barriers to career choice: A social cognitive analysis. Journal of
Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Counseling Psychology, 47, 36–49.
Methods, theory and applications. New York: Academic Press.
Lester, B. M. (1984). Data analysis and prediction. In T. B. Brazelton
Lane, S. (1992). Review of the Iowa Tests of Basic Skills. Eleventh (Ed.), Neonatal Behavioral Scale (2nd Ed.). London: Spastics
mental measurements yearbook. Lincoln: University of Nebraska International Medical Publications.
Press.
Levashina, J., Morgeson, F. P., & Campion, M. A. (2012). Tell me some
LaPiana, W. P. (1998). A history of the Law School Admission more: Exploring how verbal ability and item verifiability influence
Council and the LSAT. Keynote Address to the 1998 LSAC Annual responses to biodata questions in a high-stakes selection context.
Meeting. Personnel Psychology, 65, 359–383.
Larrabee, G. (2008). Flexible vs. fixed batteries in forensic Levin, H., Song, J., Ewing-Cobbs, L., & Roberson, G. (2001). Porteus
neuropsychological assessment: Reply to Bigler and Hom. Archives Maze performance following traumatic brain injury in children.
of Clinical Neuropsychology, 23(7–8), 763–776. Neuropsychology, 15, 557–567.

Levinson, E. M. (1990). Vocational assessment involvement and use of the Self-Directed Search by school psychologists. Psychology in the Schools, 27, 217–228.
Lewinsohn, P. M. (1965). Psychological correlates of overall quality of figure drawings. Journal of Consulting Psychology, 29, 504–512.
Lewinsohn, P. M., Munoz, R. F., Youngren, M. A., & Zeiss, A. M. (1986). Control your depression: Reducing depression through learning self-control techniques, relaxation training, pleasant activities, social skills, constructed thinking, planning ahead, and more (rev. ed.). New York: Prentice Hall.
Lewinsohn, P., & Talkington, J. (1979). Studies on the measurement of unpleasant events and relations with depression. Applied Psychological Measurement, 3, 83–101.
Lewis, M., & Brooks-Gunn, J. (1981). Visual attention at three months as a predictor of cognitive functioning at two years of age. Intelligence, 5, 131–140.
Lewis, M., & Sullivan, M. W. (1985). Infant intelligence and its assessment. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications. New York: Wiley.
Lezak, M. (1982). The problem of assessing executive functions. International Journal of Psychology, 17, 281–297.
Lezak, M. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Lezak, M. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Lezak, M. D., & O'Brien, K. P. (1990). Chronic emotional, social, and physical changes after traumatic brain injury. In E. D. Bigler (Ed.), Traumatic brain injury: Mechanisms of damage, assessment, intervention, and outcome. Austin, TX: PRO-ED.
Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). New York: Oxford University Press.
Lezak, M., Howieson, D., & Loring, D. (2004). Neuropsychological assessment (4th ed.). New York: Oxford University Press.
Lichtenberg, P., Manning, Vangel, S., & Ross, T. (1995). Normative and ecological validity data in older urban medical patients: A program of neuropsychological research. Advances in Medical Psychotherapy, 8, 121–136.
Lichtenberger, E., & Kaufman, A. (2009). Essentials of WAIS-IV assessment. New York: Wiley.
Lien, M. T., & Carlson, J. S. (2009). Psychometric properties of the Devereux Early Childhood Assessment in a Head Start sample. Journal of Psychoeducational Assessment, 27, 386–396.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140.
Lilienfeld, S. O., Ammirati, R., & Landfield, K. (2009). Giving debiasing away: Can psychological research on correcting cognitive errors promote human welfare? Perspectives on Psychological Science, 4, 390–398.
Lilienfeld, S., Wood, J., & Garb, H. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 2, 27–66.
Lilienfeld, S., Wood, J., & Garb, H. (2001, May). What's wrong with this picture? Scientific American, 81–87.
Lindal, E., & Stefansson, J. (1993). Mini-Mental State Examination scores: Gender and lifetime psychiatric disorders. Psychological Reports, 72, 631–641.
Lindenberger, U., & Baltes, P. (1994). Aging and intelligence. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Lindvall, C. M. (1967). Measuring pupil achievement and aptitude. New York: Harcourt, Brace & World.
Lindzey, G. (1959). On the classification of projective techniques. Psychological Bulletin, 56, 158–168.
Linn, R. L. (1989). Review of the Iowa Tests of Basic Skills. Tenth Mental Measurements Yearbook. Lincoln: University of Nebraska Press.
Lipsitt, P. D. (1970). Competency Screening Test. Boston: Competency to Stand Trial and Mental Illness Project.
Lipsitz, J. D., Dworkin, R., & Erlenmeyer-Kimling, L. (1993). Wechsler Comprehension and Picture Arrangement subtests and social adjustment. Psychological Assessment, 5, 430–437.
Lishman, W. A. (1997). Organic psychiatry: The psychological consequences of cerebral disorder (3rd ed.). Oxford: Blackwell Scientific Publications.
Liskow, B., Campbell, J., Nickel, E., & Powell, B. (1995). Validity of the CAGE questionnaire in screening for alcohol dependence in a walk-in (triage) clinic. Journal of Studies on Alcohol, 56, 277–281.
Loe, S. A., Kadlubek, R. M., & Williams, W. J. (2007). Administration and scoring errors on the WISC-IV among graduate student examiners. Journal of Psychoeducational Assessment, 25, 237–247.
Lofquist, L. H., & Dawis, R. V. (1991). Essentials of person-environment correspondence counseling. Minneapolis: University of Minnesota Press.
Lohman, D., & Hagen, E. (2001). Cognitive Abilities Test, Form 6; Examiner's manual. Boston: Houghton Mifflin.
Lopez, S. J., & Snyder, C. R. (Eds.). (2003). Positive psychological assessment: A handbook of models and measures. Washington, DC: American Psychological Association.
Lopez, S., & Snyder, C. R. (Eds.). (2003). Positive psychological assessment: A handbook of models and measures. Washington, DC: American Psychological Association.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Menlo Park, CA: Addison-Wesley.
Lord, F., & Novick, M. (1968). Statistical theories of mental tests. New York: Addison-Wesley.
Lovell, M. R. (2006). The ImPACT neuropsychological test battery. In R. J. Echemendia (Ed.), Sports neuropsychology: Assessment and management of traumatic brain injury (pp. 193–215). New York: Guilford Press.
Lovell, M. R., Iverson, G. L., Podell, M. W., & others. (2006). Measurement of symptoms following sports-related concussion: Reliability and normative data for the post-concussion scale. Applied Neuropsychology, 13(3), 166–174.
Lowe, P. A., Lee, S. W., Witteborg, K. M., & others. (2008). The Test Anxiety Inventory for Children and Adolescents (TAICA): Examination of the psychometric properties of a new multidimensional measure of test anxiety among elementary and secondary school students. Journal of Psychoeducational Assessment, 26, 215–230.
Lubinski, D., Benbow, C., & Ryan, J. (1995). Stability of vocational interests among the intellectually gifted from adolescence to adulthood: A 15-year longitudinal study. Journal of Applied Psychology, 80, 196–200.
Lüdtke, O., Roberts, B. W., Trautwein, U., & Nagy, G. (2011). A random walk down university avenue: Life paths, life events, and personality trait change at the transition to university life. Journal of Personality and Social Psychology, 101, 620–637.
Lukasik, C. (2004). The physiognomy of biometrics. Retrieved from www.common-place.org, 5, 1–4.
Lukin, M. E., Dowd, E. T., Plake, B., & Kraft, R. (1985). Comparing computerized vs. traditional psychological assessment. Computers in Human Behavior, 1, 49–58.
Lunz, M., & Bergstrom, B. (1994). Computer adaptive testing: A national pilot study. In M. Wilson (Ed.), Objective measurement: Theory into practice (vol. 2). Norwood, NJ: Ablex.

Lunz, M., Bergstrom, B., & Wright, B. (1994). Reliability of alternate computer-adaptive tests. In M. Wilson (Ed.), Objective measurement: Theory into practice (vol. 2). Norwood, NJ: Ablex.
Luria, A. R. (1966). Higher cortical functions in man. New York: Basic Books.
Luria, A. R. (1970). The functional organization of the brain. Scientific American, 222, 66–78.
Luria, A. R. (1973). The working brain. New York: Basic Books.
Lynn, R. (1987). Japan: Land of the rising IQ. A reply to Flynn. Bulletin of the British Psychological Society, 40, 464–468.
Lynn, R. (2009). What has caused the Flynn effect? Secular increases in the Development Quotients of infants. Intelligence, 37, 16–24.
Lyon, G. R. (1996b). Special education for students with disabilities. The Future of Children, 6, 1–19.
Lyon, G. R. (1996a). Learning disabilities. Special Education for Students With Disabilities, 6, 1–18.
MacAndrew, C. (1965). The differentiation of male alcoholic outpatients from nonalcoholic psychiatric patients by means of the MMPI. Quarterly Journal of Studies on Alcohol, 26, 238–246.
Machover, K. (1949). Personality projection in the drawing of the human figure. Springfield, IL: Charles C. Thomas.
Machover, K. (1951). Drawing of the human figure: A method of personality investigation. In H. Anderson & G. Anderson (Eds.), An introduction to projective techniques. New York: Prentice Hall.
Mack, J., & Patterson, M. (1995). Executive dysfunction and Alzheimer's disease: Performance on a test of planning ability, the Porteus Maze Test. Neuropsychology, 9, 556–564.
Mackenzie Ross, S. J., Brewin, C., Curran, H. V., & others. (2010). Neuropsychological and psychiatric functioning in sheep farmers exposed to low levels of organophosphate pesticides. Neurotoxicology and Teratology, 32, 452–459.
MacPhillamy, D. J., & Lewinsohn, P. M. (1982). The Pleasant Events Schedule: Studies on reliability, validity, and scale intercorrelation. Journal of Consulting and Clinical Psychology, 50, 363–380.
Maddi, S. R. (2000). Personality theories: A comparative analysis (6th ed.). Prospect Heights, IL: Waveland Press.
Mahoney, M., & Arnkoff, D. (1978). Cognitive and self-control therapies. In S. Garfield & A. Bergin (Eds.), Handbook of psychotherapy and behavior change: An empirical analysis. New York: Wiley.
Main, M., & Hesse, E. (1990). Parents' unresolved traumatic experiences are related to infant disorganized attachment status: Is frightened and/or frightening parental behavior the linking mechanism? In M. Greenberg, D. Cicchetti, & E. Cummings (Eds.), Attachment in the preschool years (pp. 161–182). Chicago: University of Chicago Press.
Main, M., & Solomon, J. (1986). Discovery of a new, insecure-disorganized/disoriented attachment pattern. In T. B. Brazelton & M. W. Yogman (Eds.), Affective development in infancy (pp. 95–124). Norwood, NJ: Ablex Publishing.
Majnemer, A., & Mazer, B. (1998). Neurologic evaluation of the newborn infant: Definition and psychometric properties. Developmental Medicine and Child Neurology, 40, 708–715.
Malgady, R. G., Constantino, G., & Rogler, L. H. (1984). Development of a Thematic Apperception Test (TEMAS) for urban Hispanic children. Journal of Consulting and Clinical Psychology, 52, 986–996.
Maloney, M. P., & Ward, M. P. (1979). Mental retardation and modern society. New York: Oxford University Press.
Man, D., Chung, J., & Mak, M. (2009). Development and validation of the Online Rivermead Behavioral Memory Test (OL-RBMT) for people with stroke. Neurorehabilitation, 24, 231–236.
Manly, T., Nimmo-Smith, I., Watson, P., & others. (2001). The differential assessment of children's attention: The Test of Everyday Attention for Children (TEA-Ch), normative sample and ADHD performance. Journal of Child Psychology and Psychiatry, 42, 1065–1081.
Manning, W. H., & Jackson, R. (1984). College entrance examinations: Objective selection or gatekeeping for the economically privileged. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing. New York: Plenum Press.
Manto, M., & Pandolfo, M. (Eds.). (2002). The cerebellum and its disorders. New York: Cambridge University Press.
Marcus, D. K., Fulton, J. J., & Clarke, E. J. (2010). Lead and conduct problems: A meta-analysis. Journal of Clinical Child and Adolescent Psychology, 39, 234–241.
Mardell, C., & Goldenberg, D. (2011). Developmental indicators for the assessment of learning—Fourth edition (DIAL-4). San Antonio, TX: Pearson.
Marks, P. A., & Seeman, W. (1963). The actuarial description of abnormal personality. Baltimore: Williams & Wilkins.
Markwardt, F. C. (1997). Peabody Individual Achievement Test-Revised/Normative Update. Circle Pines, MN: American Guidance Service.
Marnic, L. R. (2011). Evaluating the Bender Visual Motor Gestalt Test II as a diagnostic screening instrument among clinically referred children and adolescents. Dissertation Abstracts International: Section B: The Sciences and Engineering, 72(5-B), 3118.
Martell, D. A. (1992). Forensic neuropsychology and the criminal law. Law and Human Behavior, 16, 313–336.
Martin, J. C. (1994). Birth defects. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Martin, R. (2003). Sense of Humor. In S. Lopez & C. R. Snyder (Eds.), Positive psychological assessment: A handbook of models and measures. Washington, DC: American Psychological Association.
Martin, R. A. (1996). The Situational Humor Response Questionnaire (SHRQ) and Coping Humor Scale (CHS): A decade of research findings. Humor: International Journal of Humor Research, 9, 251–272.
Martin, R. A., & Lefcourt, H. M. (1983). Sense of humor as a moderator of the relation between stressors and moods. Journal of Social and Personality Psychology, 45, 1313–1324.
Martin, R. A., & Lefcourt, H. M. (1984). Situational Humor Response Questionnaire: Quantitative measure of sense of humor. Journal of Social and Personality Psychology, 47, 145–155.
Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003). Individual differences in uses of humor and their relation to psychological well-being: Development of the Humor Styles Questionnaire. Journal of Research in Personality, 37, 48–75.
Martin, S. (2010). The internet's ethical challenges. APA Monitor, 41(7), 32.
Martindale, C. (1981). Cognition and consciousness. Homewood, IL: Dorsey.
Martuza, V. R. (1977). Applying norm-referenced and criterion-referenced measurement in education. Boston: Allyn and Bacon.
Masten, A. S., Best, K. M., & Garmezy, N. (1990). Resilience and development: Contributions from the study of children who overcame adversity. Development and Psychopathology, 2, 425–444.
Masters, K. S., & Hooker, S. A. (2012, November 12). Religiousness/spirituality, cardiovascular disease, and cancer: Cultural integration for health research and intervention. Journal of Consulting and Clinical Psychology, online publication.
Matarazzo, J. D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). Baltimore: Williams & Wilkins.
Matarazzo, J. D. (1990). Psychological assessment versus psychological testing: Validation from Binet to the school, clinic, and courtroom. American Psychologist, 45, 999–1017.

Matarazzo, J. D. (1992). Psychological testing and assessment in the 21st century. American Psychologist, 47, 1007–1018.
Matson, J. (Ed.). (2007). Handbook of assessment in persons with intellectual disability. London: Academic Press.
Matson, J. L., & Tureck, K. (2012). Early diagnosis of autism: Current status of the Baby and Infant Screen for Children with Autism Traits (BISCUIT-Parts 1, 2, and 3). Research in Autism Spectrum Disorders, 6, 1135–1141.
Matson, J. L., Boisjoli, J. A., & Wilkins, J. (2007). Baby and Infant Screen for Children with Autism Traits (BISCUIT). Baton Rouge, LA: Disability Consultants, LLC.
Matson, J. L., Boisjoli, J. A., Hess, J. A., & Wilkins, J. (2010). Factor structure and diagnostic fidelity of the Baby and Infant Screen for Children with Autism Traits-Part 1 (BISCUIT-Part 1). Developmental Neurorehabilitation, 13, 72–79.
Matson, J. L., Wilkins, J., & Fodstad, J. C. (2011). The validity of the Baby and Infant Screen for Children with Autism Traits: Part 1 (BISCUIT: Part 1). Journal of Autism and Developmental Disorders, 41, 1139–1146.
Matthews, G., Zeidner, M., & Roberts, R. (2002). Emotional intelligence: Science and myth. Cambridge, MA: MIT Press.
Mattis, S. (2001). Dementia Rating Scale-2. Lutz, FL: Psychological Assessment Resources.
Maxwell, J. K., & Wise, F. (1984). PPVT IQ validity in adults: A measure of vocabulary, not of intelligence. Journal of Clinical Psychology, 40, 1048–1053.
May, P. A., Gossage, J. P., Kalberg, W. O., & others. (2009). Prevalence and epidemiologic characteristics of FASD from various research methods with an emphasis on recent in-school studies. Developmental Disabilities Research Reviews, 15, 176–192.
Mayer, J. D. (2007–2008). The big questions of personality psychology: Defining common pursuits of the discipline. Imagination, Cognition and Personality, 27, 3–26.
Mayer, J., & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433–442.
Mayer, J., Salovey, P., & Caruso, D. (2002). Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) user's manual. Toronto, ON: Multi-Health Systems.
Mayer, J., Salovey, P., & Caruso, D. (2004). Emotional intelligence: Theory, findings, and implications. Psychological Inquiry, 15, 197–215.
Mayer, J., Salovey, P., & Caruso, D. (2008). Emotional intelligence: New ability or eclectic traits? American Psychologist, 63, 503–517.
Mayer, J., Salovey, P., Caruso, D., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3, 97–105.
Mayers, L., & Redick, T. S. (2012). Clinical utility of ImPACT assessment for postconcussion return-to-play counseling: Psychometric issues. Journal of Clinical and Experimental Neuropsychology, 34, 235–242.
Mayeux, R., & Kandel, E. R. (1991). Disorders of language: The aphasias. In E. R. Kandel, J. H. Schwartz, & T. M. Jessell (Eds.), Principles of neural science (3rd ed.). New York: Elsevier.
McAllister, L. W. (1986). A practical guide to CPI interpretation. Palo Alto, CA: Consulting Psychologists Press.
McCall, R. B. (1976). Toward an epigenetic conception of mental development in the first three years of life. In M. Lewis (Ed.), Origins of intelligence: Infancy and early childhood. New York: Plenum Press.
McCall, R. B. (1979). The development of intellectual functioning in infancy and the prediction of later IQ. In J. D. Osofsky (Ed.), Handbook of infant development. New York: Wiley.
McCall, W. A. (1939). Measurement. New York: Macmillan.
McCallum, R. S. (1990). Determining the factor structure of the Stanford-Binet: Fourth Edition—the right choice. Journal of Psychoeducational Assessment, 8, 436–442.
McCoy, B. (2000). Quack! Tales of medical fraud from the museum of questionable medical devices. Santa Monica, CA: Santa Monica Press.
McCrae, R. R. (1985). Review of the Defining Issues Test. Ninth mental measurements yearbook. Lincoln: University of Nebraska Press.
McCrae, R. R., & Costa, P. T., Jr. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 2, 81–90.
McCrae, R., Costa, P., & Martin, T. (2005). The NEO-PI-3: A more readable revised NEO Personality Inventory. Journal of Personality Assessment, 84, 261–270.
McCullough, M. E., Emmons, R. A., & Tsang, J. (2002). The Grateful disposition: A conceptual and empirical topography. Journal of Personality and Social Psychology, 82, 112–127.
McDermott, B. E., & Sokolov, G. (2009). Malingering in a correctional setting: The use of the structured interview of reported symptoms in a jail sample. Behavioral Sciences & the Law, 27, 753–765.
McDonald, A., Nussbaum, D., & Bagby, R. (1992). Reliability, validity, and utility of the Fitness Interview Test. Canadian Journal of Psychiatry, 36, 480–484.
McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.
McGee, R., Clark, S., & Symons, D. (2000). Does the Conners' continuous performance test aid in ADHD diagnosis? Journal of Abnormal Child Psychology, 28, 415–424.
McGlynn, F. D., & Rose, M. P. (1998). Assessment of anxiety and fear. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (4th ed.). Boston: Allyn and Bacon.
McGrath, E., Wypij, D., Rappaport, L., Newburger, J., & Bellinger, C. (2004). Prediction of IQ and achievement at age 8 from neurodevelopmental status at age 1 in children with D-transposition of the great arteries. Pediatrics, 114, 572–576.
McGrath, R., Pogge, D., Stokes, J., & others. (2005). Field reliability of Comprehensive System scoring in an adolescent inpatient sample. Assessment, 12, 199–209.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 151–179). New York: Guilford.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-battery assessment. Boston: Allyn & Bacon.
McGue, M., Bouchard, T., Iacono, W., & Lykken, D. (1993). Behavior genetics of cognitive ability: A life-span perspective. In R. Plomin & G. McClearn (Eds.), Nature, nurture, and psychology. Washington, DC: American Psychological Association.
McGurk, F. C. J. (1953a). On white and Negro test performance and socio-economic factors. Journal of Abnormal and Social Psychology, 48, 448–450.
McGurk, F. C. J. (1953b). Socioeconomic status and culturally-weighted test scores of Negro subjects. Journal of Applied Psychology, 37, 276–277.
McGurk, F. C. J. (1975). Race differences—twenty years later. Homo, 26, 219–239.
McKee, A. C., Cantu, R. C., Nowinski, C. J., & others. (2009). Chronic traumatic encephalopathy in athletes: Progressive tauopathy following repetitive head injury. Journal of Neuropathology and Experimental Neurology, 68, 709–735.

McKee, A. C., Stein, T. D., & Nowinski, C. J. (2012, October 1). The spectrum of disease in chronic traumatic encephalopathy. Brain, online publication.
McKee-Ryan, F. M., Song, Z., Wanberg, C., & Kinicki, A. (2005). Psychological and physical well-being during unemployment: A meta-analytic study. Journal of Abnormal Psychology, 90, 53–76.
McKey, R. H., & others. (1985). The impact of Head Start on children, families and communities. Washington, DC: U.S. Government Printing Office.
McKinley, J. C., & Hathaway, S. R. (1940). A Multiphasic Personality Schedule (Minnesota): II. A differential study of hypochondriasis. Journal of Psychology, 10, 255–268.
McKinley, J. C., & Hathaway, S. R. (1944). The MMPI: V. Hysteria, hypomania and psychopathic deviate. Journal of Applied Psychology, 28, 153–174.
McKinley, J. C., Hathaway, S. R., & Meehl, P. E. (1948). The MMPI: VI. The K scale. Journal of Consulting Psychology, 12, 20–31.
McLean, C. P., Asnaani, A., Litz, B. T., & Hofmann, S. G. (2011). Gender differences in anxiety disorders: Prevalence, course of illness, comorbidity and burden of illness. Journal of Psychiatric Research, 45, 1027–1035.
McMillan, D., Hastings, R., & Coldwell, J. (2004). Clinical and actuarial prediction of physical violence in a forensic intellectual disability hospital: A longitudinal study. Journal of Applied Research in Intellectual Disabilities, 17, 255–265.
McNulty, J., Graham, J., Ben-Porath, Y., & Stein, L. (1997). Comparative validity of MMPI-2 scores of African American and Caucasian mental health center clients. Psychological Assessment, 9, 464–470.
McReynolds, P., & Ludwig, K. (1984). Christian Thomasius and the origin of psychological rating scales. Isis, 75, 546–553.
McReynolds, P., & Ludwig, K. (1987). On the history of rating scales. Personality and Individual Differences, 8, 281–283.
Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 3, 220–232.
Mednick, S., & Mednick, M. (1966). Manual: Remote Associates Test. Boston: Houghton Mifflin.
Medoff-Cooper, B., & Ratcliffe, S. (2005). Development of preterm infants: Feeding behaviors and Brazelton Neonatal Behavioral Assessment Scale at 40 and 44 weeks' postconceptual age. Advances in Nursing Science, 28, 356–363.
Meehl, P. E. (1954). Clinical versus statistical prediction. Minneapolis: University of Minnesota Press.
Meehl, P. E. (1965). Seer over sign: The first good example. Journal of Experimental Research in Personality, 1, 29–32.
Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50, 370–375.
Megargee, E. (1972). The California Psychological Inventory handbook. San Francisco: Jossey-Bass.
Meichenbaum, D. (1977). Cognitive-behavior modification: An integrative approach. New York: Plenum Press.
Meier, S. T. (1984). The construct validity of burnout. Journal of Occupational Psychology, 57, 211–219.
Meier, V. J., & Hope, D. A. (1998). Assessment of social skills. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (4th ed.). Boston: Allyn and Bacon.
Meijer, E., Verschuere, B., Merckelbach, H., & Crombez, G. (2008). Sex offender management using the polygraph: A critical review. International Journal of Law and Psychiatry, 31, 423–429.
Meisels, S., & Atkins-Burnett, S. (2005). Developmental screening in early childhood: A guide (5th ed.). Washington, DC: National Association for the Education of Young Children.
Meisels, S., Marsden, D., Wiske, M., & Henderson, L. (1997). Early Screening Inventory-Revised. San Antonio, TX: The Psychological Corporation.
Meisels, S., Wiske, M., & Henderson, L. (2008). Early Screening Inventory—Revised. San Antonio, TX: The Psychological Corporation.
Melton, G. B. (1995). Review of the Ackerman-Schoendorf Scales for Parent Evaluation of Custody. The Twelfth mental measurements yearbook. Lincoln: University of Nebraska Press.
Melton, G. B., Petrila, J., Poythress, N., & Slobogin, C. (1998). Psychological evaluation for the courts (2nd ed.). New York: Guilford.
Mendez, M., Licht, E., & Saul, R. E. (2008). The Frontal Systems Behavior Scale in the evaluation of dementia. International Journal of Geriatric Psychiatry, 23, 1203–1204.
Menzies, G. (2003). 1421: The year China discovered America. New York: William Morrow.
Mercer, J. R., & Lewis, J. F. (1978). System of Multicultural Pluralistic Assessment. San Antonio, TX: The Psychological Corporation.
Merenda, P. F. (1985). Comrey Personality Scales. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (vol. 4). Kansas City, MO: Test Corporation of America.
Messiah, A., Encrenaz, G., Sapinho, D., & others. (2007). Paradoxical increase of positive answers to the Cut-down, Annoyed, Guilt, Eye-opener (CAGE) questionnaire during a period of decreasing alcohol consumption: Results from two population-based surveys in Ile-de-France, 1991 and 2005. Addiction, 103, 598–603.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–1027.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mevarech, Z. (1995). Metacognition, general ability, and mathematical understanding. Early Education and Development, 6, 155–168.
Meyer, G. J. (1997). Assessing reliability: Critical corrections for a critical examination of the Rorschach Comprehensive System. Psychological Assessment, 9, 480–489.
Meyer, G. J., & Eblin, J. J. (2012). An overview of the Rorschach Performance Assessment System (R-PAS). Psychological Injury and Law, 5, 107–121.
Meyer, G. J., & Handler, L. (1997). The ability of the Rorschach to predict subsequent outcome: A meta-analysis of the Rorschach Prognostic Rating Scale. Journal of Personality Assessment, 69, 1–38.
Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2011). Rorschach Performance Assessment System: Administration, coding, interpretation, and technical manual. Toledo, OH: Rorschach Performance Assessment System.
Mickley, J. (1990). Spiritual well-being, religiousness, and hope: Some relationships in a sample of women with breast cancer. Unpublished master's thesis, University of Maryland, School of Nursing, College Park, MD.
Middleton, H., Keene, R., & Brown, G. (1990). Convergent and discriminant validities of the Scales of Independent Behavior and the revised Vineland Adaptive Behavior Scales. American Journal of Mental Retardation, 94, 669–673.
Miele, F. (1979). Cultural bias in the WISC. Intelligence, 3, 149–164.
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on Psychological Science, 4, 379–383.
Miller, F. G., & Lazowski, L. (1999). The adult SASSI-3 manual. Springville, IN: The SASSI Institute.
Miller, F. G., Roberts, J., Brooks, M., & Lazowski, L. (1997). SASSI-3 user's guide: A quick reference for administration and scoring. Bloomington, IN: Baugh Enterprises.
Miller, G. (2012). The smartphone psychology manifesto. Perspectives on Psychological Science, 7, 221–237.
Miller, L. K. (1989). Musical savants: Exceptional skill in the mentally retarded. Hillsdale, NJ: Erlbaum.
Miller, S. D., & Duncan, B. L. (2000). Outcome and Session Rating Scales: Administration and scoring manual. Chicago: Institute for the Study of Therapeutic Change.
Miller, S. D., Duncan, B. L., Brown, J., Sparks, J., & Claud, D. (2003). The Outcome Rating Scale: A preliminary study of the reliability, validity, and feasibility of a brief visual analog measure. Journal of Brief Therapy, 2, 91–100.
Miller, T. R. (1991). Personality: A clinician's experience. Journal of Personality Assessment, 57, 415–433.
Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: ACE/Macmillan.
Millon, T. (1969). Modern psychopathology: A biosocial approach to maladaptive learning and functioning. Philadelphia: Saunders.
Millon, T. (1981). Disorders of personality: DSM-III, Axis II. New York: Wiley.
Millon, T. (1983). Millon Clinical Multiaxial Inventory manual (2nd ed.). Minneapolis, MN: National Computer Systems.
Millon, T. (1986). A theoretical derivation of pathological personalities. In T. Millon & G. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV. New York: Guilford.
Millon, T. (1987). Manual for the Millon Clinical Multiaxial Inventory-II (MCMI-II) (2nd ed.). Minneapolis, MN: National Computer Systems.
Millon, T. (1994). Manual for the Millon Clinical Multiaxial Inventory-III (MCMI-III) (3rd ed.). Minneapolis, MN: National Computer Systems.
Millon, T., & Davis, R. (1996). The Millon Clinical Multiaxial Inventory-III (MCMI-III). In C. S. Newmark (Ed.), Major psychological assessment instruments (2nd ed.). Boston: Allyn and Bacon.
Mills, C., & Tissot, S. (1995). Identifying academic potential in students from underrepresented populations: Is using the Raven's Progressive Matrices a good idea? Gifted Child Quarterly, 39, 209–217.
Mills, C., Potenza, M., Fremer, J., & Ward, W. (2002). Computer-based testing: Building the foundation for future assessments. Mahwah, NJ: Erlbaum.
Milner, B. (1968). Disorders of memory after brain lesions in man. Neuropsychologia, 6, 175–179.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mischel, W., Shoda, Y., & Mendoza-Denton, R. (2002). Situation-behavior profiles as a locus of consistency in personality. Current Directions in Psychological Science, 11, 50–54.
Mitchell, T. W., & Klimoski, R. J. (1986). Estimating the validity of cross-validity estimation. Journal of Applied Psychology, 71, 311–317.
Mitchell, V. (2007). Earning a secure attachment style: A narrative of personality change in adulthood. In R. Josselson, A. Lieblich, & D. P. McAdams (Eds.), The meaning of others: Narrative studies of relationships (pp. 93–116). Washington, DC: American Psychological Association.
Moberg, D. O. (1971). Spiritual well-being: Background and issues. Washington, DC: White House Conference on Aging.
Montague, M., & Bos, C. S. (1990). Cognitive and metacognitive characteristics of eighth grade students' mathematical problem solving. Learning and Individual Differences, 2, 371–388.
Moore, E. G. J. (1986). Family socialization and the IQ-test performance of traditionally and transracially adopted children. Developmental Psychology, 22, 317–326.
Moore, R. C., Viglione, D. J., Rosenfarb, I. S., Patterson, T. L., & Mausbach, B. T. (2012, November 12). Rorschach measures of cognition relate to everyday and social functioning in schizophrenia. Psychological Assessment, online publication.
Moore, W. P. (1994). The devaluation of standardized testing: One district's response. Applied Measurement in Education, 7, 343–368.
Moreland, K. L. (1992). Computer-assisted psychological assessment. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view. Palo Alto, CA: Consulting Psychologists Press.
Moreno, K. E., & Segall, D. O. (1997). Reliability and construct validity of the CAT-ASVAB. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
Morgan, C. D., & Murray, H. A. (1935). A method for investigating phantasies: The Thematic Apperception Test. Archives of Neurology and Psychiatry, 34, 289–306.
Morgan, C. D., Shoenberg, M., Dorr, D., & Burke, M. (2002). Overreport on the MCMI-III: Concurrent validation with the MMPI-2 using a psychiatric inpatient sample. Journal of Personality Assessment, 78, 288–300.
Mori, L., & Armendariz, G. (2001). Analogue assessment of child behavior problems. Psychological Assessment, 13, 36–45.
Morrison, M. W., Gregory, R. J., & Paul, J. J. (1979). Reliability of the Finger Tapping Test and a note on sex differences. Perceptual and Motor Skills, 48, 139–142.
Morrison, T., & Morrison, M. (1995). A meta-analytic assessment of the predictive validity of the quantitative and verbal components of the Graduate Record Examination with graduate grade point average representing the criterion of graduate success. Educational and Psychological Measurement, 55, 309–316.
Morrow, C., Bandstra, E., Anthony, J., & others. (2001). Influence of prenatal cocaine exposure on full-term infant neurobehavioral functioning. Neurotoxicology and Teratology, 23, 533–544.
Moruzzi, G., & Magoun, H. W. (1949). Brain stem and reticular formation and activation of the EEG. Electroencephalography and Clinical Neurophysiology, 1, 455–473.
Motowidlo, S. J., Carter, G., Dunnette, M., Tippins, N., Werner, S., Burnett, J., & Vaughan, M. (1992). Studies of the structured behavioral interview. Journal of Applied Psychology, 77, 571–587.
Motta, R. W., Little, S., & Tobin, M. (1993). The use and abuse of human figure drawings. School Psychology Quarterly, 8, 162–169.
Mount, M., Witt, L., & Barrick, M. (2000). Incremental validity of empirically keyed biodata scales over GMA and the five factor personality constructs. Personnel Psychology, 53, 299–323.
Mountain, M., & Snow, W. (1993). Wisconsin Card Sorting Test as a measure of frontal pathology: A review. Clinical Neuropsychologist, 7, 108–118.
Muchinsky, P. (2003). Psychology applied to work: An introduction to industrial and organizational psychology (7th ed.). Belmont, CA: Wadsworth.
Murphy, K. R. (1984). Review of Armed Services Vocational Aptitude Battery. In D. Keyser & R. Sweetland (Eds.), Test critiques (vol. 1). Kansas City, MO: Test Corporation of America.
Murphy, K. R. (1992). Review of TONI-2. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Murphy, K. R., & Davidshofer, C. O. (1988). Psychological testing. Englewood Cliffs, NJ: Prentice Hall.
Murphy, K. R., & Davidshofer, C. O. (2004). Psychological testing (6th ed.). Englewood Cliffs, NJ: Prentice Hall.
Murphy, K. R., & Pardaffy, V. A. (1989). Bias in behaviorally anchored rating scales: Global or scale-specific? Journal of Applied Psychology, 74, 343–346.
Murphy, K. R., Jako, R., & Anhalt, R. (1993). Nature and consequences of halo error: A critical analysis. Journal of Applied Psychology, 78, 218–225.
Murray, H. A. (1938). Explorations in personality. New York: Oxford University Press.
Murray, H. A. (1943). Thematic Apperception Test—Manual. Cambridge, MA: Harvard University Press.
Museum of Modern Art. (1955). The family of man. New York: Maco Magazine Corporation.
Myers, D. (2002). Social psychology (7th ed.). New York: McGraw-Hill.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A guide to the development and use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Myers, I., & McCaulley, M. (1985). A guide to the development and use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Myrtek, M. (2007). Type A behavior and hostility as independent risk factors for coronary heart disease. In J. Jordan, B. Bardé, & A. M. Zeiher (Eds.), Contributions toward evidence-based psychocardiology: A systematic review of the literature (pp. 159–183). Washington, DC: American Psychological Association.
Naglieri, J. A. (1981). Concurrent validity of the Revised Peabody Picture Vocabulary Test. Psychology in the Schools, 18, 286–289.
Naglieri, J. A. (1988). Draw A Person: A quantitative scoring system. San Antonio, TX: The Psychological Corporation.
Naglieri, J. A., & Das, J. P. (2005a). Planning, Attention, Simultaneous, and Successive (PASS) cognitive processes as a model for intelligence. Journal of Psychoeducational Assessment, 8, 303–337.
Naglieri, J. A., & Das, J. P. (2005b). Planning, Attention, Simultaneous, Successive (PASS) theory: A revision of the concept of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 120–135). New York: Guilford Press.
Naglieri, J. A., & Paolitto, A. (n.d.). Attention deficit diagnosis and treatment: Current status/future directions. Unpublished paper available at: www.riverpub.com/products/cas/cas_add.html.
Naglieri, J. A., & Pfeiffer, S. (1983). Stability, concurrent and predictive validity of the PPVT-R. Journal of Clinical Psychology, 39, 965–967.
Naglieri, J. A., & Pfeiffer, S. (1992). Performance of disruptive behavior disordered and normal samples on the Draw A Person: Screening Procedure for Emotional Disturbance. Psychological Assessment, 4, 156–159.
Naglieri, J. A., & Rojahn, J. (2001). Intellectual classification of Black and White children in special education programs using the WISC-III and the cognitive assessment system. American Journal on Mental Retardation, 106, 359–367.
Naglieri, J. A., & Yazzie, C. (1983). Comparison of the WISC-R and PPVT-R with Navajo children. Journal of Clinical Psychology, 39, 598–600.
Naglieri, J. A., Das, J. P., & Goldstein, S. (2012). Cognitive Assessment System—Second edition. Austin, TX: PRO-ED.
Naglieri, J. A., Rojahn, J., Matto, H. C., & Aquilino, S. A. (2005). Black-White differences in cognitive processing: A study of the planning, attention, simultaneous, and successive theory of intelligence. Journal of Psychoeducational Assessment, 23, 146–160.
Naglieri, J. A., Taddei, S., & Williams, K. M. (2012, September 17). Multigroup confirmatory factor analysis of U.S. and Italian children's performance on the PASS theory of intelligence as measured by the Cognitive Assessment System. Psychological Assessment, online publication.
Naglieri, J., & Das, J. (1990). Planning, attention, successive, and simultaneous cognitive processes as a model for intelligence. Journal of Psychoeducational Assessment, 8, 165–170.
Naglieri, J., McNeish, T., & Bardos, A. (1991). Draw-A-Person: Screening Procedure for Emotional Disturbance. Austin, TX: ProEd.
National Association of School Psychologists. (1992). Principles for professional ethics. Silver Spring, MD: Author.
National Association of School Psychologists. (2010). Principles for professional ethics. Silver Springs, MD: Author.
National Joint Committee on Learning Disabilities. (1988). A position paper of the National Trust Committee on Learning Disabilities. Journal of Learning Disabilities, 21, 53–55.
Naugle, R. I., Chelune, G., & Tucker, G. (1993). Validity of the Kaufman Brief Intelligence Test. Psychological Assessment, 5, 182–186.
Nauta, W. J. H. (1971). The problem of the frontal lobe. Journal of Psychiatric Research, 8, 167–187.
Naveh-Benjamin, M., McKeachie, W. J., & Lin, Y. (1987). Two types of test-anxious students: Support for an information processing model. Journal of Educational Psychology, 79, 131–136.
Needleman, H. L., Gunnoe, C., Leviton, A., Reed, R., Peresie, H., Maher, C., & Barrett, P. (1979). Deficits in psychologic and classroom performance of children with elevated dentine lead levels. The New England Journal of Medicine, 300, 689–695.
Needleman, H. L., Schell, A., Bellinger, D., Leviton, A., & Allred, E. (1990). The long-term effects of exposure to low doses of lead in childhood. New England Journal of Medicine, 322, 83–88.
Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association.
Neisser, U., Boodoo, G., Bouchard, T., & others. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Nelson, R., & Piedmont, R. L. (2008, August). Psychometric utility of the ASPIRES Scales in non-Christian samples. Paper presented at the American Psychological Association Conference, Boston.
Nester, M. A. (1994). Psychometric testing and reasonable accommodation for persons with disabilities. In S. M. Bruyere & J. O'Keeffe (Eds.), Implications of the Americans with Disabilities Act for psychology. New York: Springer.
Nestor, P. G., & Schutt, R. K. (2012). Research methods in psychology: Investigating human behavior. Thousand Oaks, CA: SAGE.
Nettelbeck, T., & Wilson, C. (2004). The Flynn effect: Smarter, not faster. Intelligence, 32, 85–93.
Netter, B., & Viglione, D., Jr. (1994). An empirical study of malingering schizophrenia on the Rorschach. Journal of Personality Assessment, 62, 45–57.
Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22, 287–293.
Nevo, B. (1992). Examinee feedback: Practical guidelines. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view. Palo Alto, CA: Consulting Psychologists Press.
Newland, T. E. (1971). Blind Learning Aptitude Test. Champaign: University of Illinois Press.
Newsome, S., Day, A., & Catano, V. (2000). Assessing the predictive validity of emotional intelligence. Personality and Individual Differences, 29, 1005–1016.
Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and student achievement: Problems for the No Child Left Behind Act. Tempe, AZ: Education Policies Study Laboratory.
Nieuwenhuis-Mark, R. E. (2010). The death knoll for the MMSE: Has it outlived its purpose? Journal of Geriatric Psychiatry and Neurology, 23, 151–157.
Nihira, K., Leland, H., & Lambert, N. (1993). Adaptive Behavior Scale-Residential and Community (2nd ed.). Washington, DC: American Association on Mental Retardation.
Nijenhuis, J., & van der Flier, H. (1997). Comparability of GATB scores for immigrants and majority group members: Some Dutch findings. Journal of Applied Psychology, 82, 675–687.
Nijenhuis, J., Evers, A., & Mur, J. (2000). Validity of the Differential Aptitude Test for the assessment of immigrant children. Educational Psychology, 20, 99–115.
Nisan, M., & Kohlberg, L. (1982). Universality and cross-cultural variation in moral development: A longitudinal and cross-sectional study in Turkey. Child Development, 53, 865–876.
Nisbett, R. E., Aronson, J., Blair, C., & others. (2012). Intelligence: New findings and theoretical developments. American Psychologist, 67, 130–139.
Norris, G., & Tate, R. (2000). The Behavioural Assessment of the Dysexecutive Syndrome (BADS): Ecological, concurrent and construct validity. Neuropsychological Rehabilitation, 10, 33–45.
Nottingham, E. J., & Mattson, R. E. (1981). A validation study of the Competency Screening Test. Law and Human Behavior, 5, 329–335.
Nunnally, J. (1967). Psychometric theory. New York: McGraw-Hill.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
O'Neill, J., Jacobson, S., & Jacobson, J. (1994). Evidence of observer reliability for the Fagan Test of Infant Intelligence (FTII). Infant Behavior and Development, 17, 465–469.
Oakes, L. M. (2009). The “Humpty Dumpty Problem” in the study of early cognitive development: Putting the infant back together again. Perspectives on Psychological Science, 4, 352–358.
Ochse, R. (1990). Before the gates of excellence. Cambridge, England: Cambridge University Press.
Oei, T., Evans, L., & Crook, G. M. (1990). Utility and validity of the STAI with anxiety disorder patients. British Journal of Clinical Psychology, 29, 429–432.
Offer, D., & Sabshin, M. (1966). Normality: Theoretical and clinical concepts of mental health. New York: Basic Books.
Ogg, Brinkman, T. M., Dedrick, R. F., & Carlson, J. S. (2010). Factor structure and invariance across gender of the Devereux Early Childhood Assessment Protective Factor Scale. School Psychology Quarterly, 25, 107–118.
Ogloff, J. R., Wong, S., & Greenwood, A. (1990). Treating criminal psychopaths in a therapeutic community program. Behavioral Science and the Law, 8, 181–190.
Oles, H. J., & Davis, G. D. (1977). Publishers violate APA standards on test distribution. Psychological Reports, 41, 713–714.
Ollendick, T. H. (1983). Reliability and validity of the Revised Fear Survey Schedule for Children (FSSC-R). Behavior Research and Therapy, 21, 685–692.
Olson, H. C. (1994). Fetal alcohol syndrome. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Olson-Buchanan, J., Drasgow, F., Moberg, P., Mead, A., Keenan, P., & Donovan, M. (1998). Interactive video assessment of conflict resolution skills. Personnel Psychology, 51, 1–24.
Ornberg, B., & Zalewski, C. (1994). Assessment of adolescents with the Rorschach: A critical review. Assessment, 1, 209–217.
Ortner, T. (2008). Effects of changed item order: A cautionary note to practitioners on jumping to computerized adaptive testing for personality assessment. International Journal of Selection and Assessment, 16, 249–257.
Ortner, T. M., & Caspers, J. (2011). Consequences of test anxiety on adaptive versus fixed item testing. European Journal of Psychological Assessment, 27, 157–163.
OSS Assessment Staff. (1948). Assessment of men: Selection of personnel for the Office of Strategic Services. New York: Rinehart.
Otis, A. S. (1918). An absolute point scale for the group measure of intelligence. Journal of Educational Psychology, 9, 238–261, 333–348.
Ottinger, R., & Kurzon, C. (2007, May 21). Biodata: The measure of an applicant? New York Law Journal, online publication (3 pp.).
Owens, W. A. (1976). Background data. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology. Chicago: Rand McNally.
Ownby, R. L. (1991). Psychological reports: A guide to report writing in professional psychology (2nd ed.). Brandon, VT: Clinical Psychology Publishing Co.
Paloutzian, R. F., & Ellison, C. W. (1982). Loneliness, spiritual well-being and the quality of life. In L. A. Peplau & D. Perlman (Eds.), Loneliness: A sourcebook of current theory, research and therapy. New York: Wiley.
Paniagua, F. (1994). Assessing and treating culturally diverse clients: A practical guide. Thousand Oaks, CA: Sage.
Park, N., & Peterson, C. (2009). Achieving and sustaining a good life. Perspectives on Psychological Science, 4, 422–428.
Parsons, F. (1909). Choosing a vocation. Boston: Houghton Mifflin.
Parsons, T. D., Rizzo, A. A., Brennan, J., Bittman, M., & Zelinski, E. (2008). Assessment of executive functioning using virtual reality: Virtual Environment Grocery Store. Gerontechnology, 7, 189–189.
Patterson, C. (1980). An alternative perspective—lead pollution in the human environment. In Lead in the human environment. Washington, DC: National Academy of Sciences.
Patton, J. R., Payne, J. S., & Beirne-Smith, M. (1986). Mental retardation (2nd ed.). Columbus, OH: Merrill.
Paul, A. M. (2004). The cult of personality. New York: Free Press.
Paul, L. K., Brown, W. S., Adolphs, R., & others. (2007). Agenesis of the corpus callosum: Genetic, developmental and functional aspects of connectivity. Nature Reviews Neuroscience, 8, 287–299.
Paulhus, D., Fridhandler, B., & Hayes, S. (1997). Psychological defense: Contemporary theory and research. In R. Hogan, J. Johnson, & S. Briggs (Eds.), Handbook of personality psychology. San Diego: Academic Press.
Paulman, R. G., & Kennelly, K. J. (1984). Test anxiety and ineffective test taking: Different names, same construct? Journal of Educational Psychology, 76, 279–288.
Payne, A. F. (1928). Sentence completions. New York: New York Guidance Clinic.
Pearson, K. (1914, 1924, 1930ab). The life, letters, and labours of Francis Galton (Volumes I, II, III, IIIb). Cambridge: Cambridge University Press.
Pedersen, N. L., Plomin, R., Nesselroade, J., & McClearn, G. (1992). A quantitative genetic analysis of cognitive abilities during the second half of the life span. Psychological Science, 3, 346–353.
Penfield, W. (1958). Functional localization in temporal and deep sylvian areas. Research Publication, Association of Nervous and Mental Disease, 36, 210–217.
Penfield, W., & Evans, J. (1935). The frontal lobe in man: A clinical study of maximum removals. Brain, 58, 115–133.
Penfield, W., & Jasper, H. (1959). Epilepsy and the functional anatomy of the human brain. Boston: Little, Brown.
Peretz, H., & Fried, Y. (2012). National cultures, performance appraisal practices, and organizational absenteeism and turnover: A study across 21 countries. Journal of Applied Psychology, 97, 448–459.
Perry, J. C. (1990). The Defense Mechanism Rating Scales (5th ed.). Cambridge, MA: J. C. Perry.
Perry, J. C., & Henry, M. (2004). Studying defense mechanisms in psychotherapy using the defense mechanism rating scales. In U. Hentschel, G. Smith, J. Draguns, & W. Ehlers (Eds.), Defense mechanisms: Theoretical, research and clinical perspectives (pp. 165–192). Oxford, England: Elsevier.
Perry, J. C., Beck, S. M., Constantinides, P., & Foley, J. (2009). Studying change in defensive functioning in psychotherapy using the defense mechanism rating scales: Four hypotheses, four cases. In R. A. Levy & J. S. Ablon (Eds.), Handbook of evidence-based psychodynamic psychotherapy: Bridging the gap between science and practice (pp. 121–153). Totowa, NJ, US: Humana Press.
Pervin, L. A. (1993). Personality: Theory and research (6th ed.). New York: Wiley.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Peterson, C. (2000). Optimistic explanatory style and health. In J. Gillham (Ed.), The science and optimism of hope (pp. 145–162). Philadelphia: Templeton Foundation Press.
Pfeiffer, E. (1975). A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23, 433–441.
Phelps, L., & Ensor, A. (1986). Concurrent validity of the WISC-R using deaf norms and the Hiskey-Nebraska. Psychology in the Schools, 23, 138–141.
Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7, 93–120.
Piaget, J. (1932). The moral judgment of the child. London: Kegan Paul.
Piaget, J. (1972). The psychology of intelligence. Totowa, NJ: Littlefield Adams.
Piedmont, R. L. (1999). Does spirituality represent the sixth factor of personality? Spiritual transcendence and the Five-Factor Model. Journal of Personality, 67, 985–1013.
Piedmont, R. L. (2001). Spiritual transcendence and the scientific study of spirituality. Journal of Rehabilitation, 67, 4–14.
Piedmont, R. L. (2004). Spiritual transcendence as a predictor of psychosocial outcome from an outpatient substance abuse program. Psychology of Addictive Behaviors, 18, 213–222.
Piedmont, R. L. (2010). Assessment of Spirituality and Religious Sentiments (ASPIRES): Technical manual (2nd ed.). Timonium, MD: Author.
Piedmont, R. L., & Weinstein, H. P. (1993). A psychometric evaluation of the new NEO-PIR Facet Scales for Agreeableness and Conscientiousness. Journal of Personality Assessment, 60, 302–318.
Piedmont, R. L., Werdel, M., & Fernando, M. (2009). The utility of the Assessment of Spirituality and Religious Sentiments (ASPIRES) scale with Christians and Buddhists in Sri Lanka. Research in the Social Scientific Study of Religion, 20, 131–143.
Piersma, H., & Boes, J. (1997). MCMI-III as a treatment outcome measure for psychiatric inpatients. Journal of Clinical Psychology, 53, 825–832.
Piirto, J. (1998). Understanding those who create. Scottsdale, AZ: Gifted Psychology Press.
Pinals, D., Tillbrook, C., & Mumley, D. (2006). Practical application of the MacArthur Competence Assessment Tool—Criminal Adjudication (MacCAT-CA) in a public sector forensic setting. Journal of the American Academy of Psychiatry and Law, 34, 179–188.
Pintner, R. (1917). The mentality of the dependent child. Journal of Educational Psychology, 8, 220–238.
Pintner, R. (1921). Intelligence. In E. L. Thorndike (Ed.), Intelligence and its measurement: A symposium. Journal of Educational Psychology, 12, 123–147, 195–216.
Piotrowski, C. (1996). The status of Exner's Comprehensive System in contemporary research. Perceptual and Motor Skills, 82, 1341–1342.
Piotrowski, Z. A. (1964). A digital computer administration of inkblot test data. Psychiatric Quarterly, 38, 1–26.
Pirozzolo, F. J., Hansch, E., Mortimer, J., Webster, D., & Kuskowski, A. (1982). Dementia in Parkinson disease: A neuropsychological analysis. Brain and Cognition, 1, 71–83.
Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57, 210–221.
Plaisted, J. R., & Golden, C. J. (1982). Test-retest reliability of the clinical, factor and localization scales of the Luria-Nebraska Neuropsychological Battery. International Journal of Neuroscience, 17, 163–167.
Plaud, J. J., & Eifert, G. (Eds.). (1998). From behavior theory to behavior therapy. Boston: Allyn and Bacon.
Polivy, J., & Herman, C. P. (1993). Etiology of binge eating: Psychological mechanisms. In C. G. Fairburn & G. T. Wilson (Eds.), Binge eating: Nature, assessment, and treatment (pp. 173–205). New York: Guilford Press.
Pollack, R. H. (1971). Binet on perceptual-cognitive development or Piaget-come-lately. Journal of the History of the Behavioral Sciences, 7, 370–374.
Pollens, R., McBratnie, B., & Burton, P. (1988). Beyond cognition: Executive functions. Cognitive Rehabilitation, 6, 26–33.
Poortinga, Y. H., & Van de Vijver, F. J. R. (2004). Cultures and cognition: Performance differences and invariant structures. In R. J. Sternberg & E. L. Grigorenko (Eds.), Culture and competence: Contexts of life success (pp. 139–162). Washington, DC: American Psychological Association.
Pope, K. S. (1992). Responsibilities in providing psychological test feedback to clients. Psychological Assessment, 4, 268–271.
Popham, W. J. (1978). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice Hall.
Porch, B. (2001). Porch Index of Communicative Ability—2001 Revision. Austin, TX: Pro-Ed.
Porteus, S. D. (1931). The psychology of a primitive people: A study of the Australian aborigine. London: Edward Arnold & Co.
Porteus, S. D. (1965). Porteus Maze Test. Fifty years' application. Palo Alto, CA: Pacific Books.
Powers, D. (2004). Validity of Graduate Record Examinations (GRE) General Test scores for admissions to colleges of Veterinary Medicine. Journal of Applied Psychology, 89, 208–219.
Powers, K., & Hagans-Murillo, K. (2004). Twenty-five years after Larry P.: The California response to overrepresentation of African-Americans in special education. California School Psychologist, 9, 145–158.
Poythress, N., Monahan, J., Bonnie, R., Otto, R., & Hoge, S. (2002). Adjudicative competence: The MacArthur studies. New York: Kluwer/Plenum.
Prentky, R. (2001). Mental illness and roots of genius. Creativity Research Journal, 13, 95–104.
Prewett, P. N. (1995). A comparison of two screening tests (the Matrix Analogies Test-Short Form and the Kaufman Brief Intelligence Test) with the WISC-III. Psychological Assessment, 7, 69–72.
Prout, H., & Schwartz, J. (1984). Validity of the PPVT-R with mentally retarded adults. Journal of Clinical Psychology, 40, 584–587.
Psychological Corporation. (1994). WISC-III Writer manual. San Antonio, TX: Author.
Purish, A. (2001). Misconceptions about the Luria-Nebraska Neuropsychological Battery. Neurorehabilitation, 16, 275–280.
Pyle, W. H. (1913). The examination of school children. New York: Reppermund, S., Brodaty, H., Crawford, J. D., & others. (2011). The
Macmillan. relationship of current depressive symptoms and past depression
with cognitive impairment and instrumental activities of daily
Qu, C. (1997). Reliability and validity of the Hiskey-Nebraska Test
living in an elderly population: The Sydney Memory and Ageing
of Learning Aptitude (H-NTLA) in testing China'.s deaf children.
Study. Journal of Psychiatric Research, 45, 1600–1607.
Chinese Mental Health Journal, 11, 70–72.
Reschly, D., Myers, T., & Hartel, C. (2002). Mental retardation:
Quek, K. F., Low, W. Y., Razack, A. H., Loh, C. S., & Chuak, C. B.
Determining eligibility for Social Security benefits. Washington, DC:
(2004). Reliability and validity of the Spielberger State-Trait Anxiety
National Academies Press.
Inventory (STAI) among urological patients: A Malaysian study.
Medical Journal of Malaysia, 59, 258–267. Rest, J. R. (1979). The Defining Issues Test: Manual. Minneapolis:
University of Minnesota Press.
Ramey, C. T., & Ramey, S. (1998). Early intervention and early
experience. American Psychologist, 53, 109–10. Rest, J. R. (1986). Moral research methodology. In S. Modgil &
C. Modgil (Eds.), Lawrence Kohlberg: Consensus and controversy.
Ramos, E., Alfonso, V. C., & Schermerhorn, S. M. (2009). Graduate
Philadelphia: Taylor & Francis.
students'. administration and scoring errors on the Woodcock-
Johnson III Tests of Cognitive Abilities. Psychology in the Schools, 46, Rest, J. R., & Thoma, S. J. (1985). Relation of moral judgment to
650–657. formal education. Developmental Psychology, 21, 709–714.
Ranseen, J., Campbell, D., & Baer, R. (1998). NEO PI-R profiles of Rest, J. R., Thoma, S., Narvaez, D., & Bebeau, M. (1997). Alchemy and
adults with attention deficit disorder. Assessment, 5, 19–24. beyond: Indexing the Defining Issues Test. Journal of Educational
Psychology, 89, 498–507.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment
tests. Copenhagen: Denmarks Paedagogiske Institut. Rey, A. (1964). L'.examen clinique en psychologie. Paris: Presses
Raven, J. (2000). The Raven'.s Progressive Matrices: Change and Universitaires de France.
stability over culture and time. Cognitive Psychology, 41, 1–48. Reynolds, C. R. (1994). Bias in testing. In R. J. Sternberg (Ed.),
Raven, J. C. (1938). Progressive Matrices. London: Lewis. Encyclopedia of human intelligence. New York: Macmillan.

Raven, J. C. (1965). The Coloured Progressive Matrices Test. London: Lewis. Reynolds, C. R. (1998). Cultural bias in testing of intelligence and
personality. In A. Bellack & M. Hersen (Series Eds.) & C. Belar (Vol.
Raven, J. C., & Summers, B. (1986). Manual for Raven'.s Progressive Ed.), Comprehensive clinical psychology: Sociocultural and individual
Matrices and Vocabulary Scales—research supplement no. 3. London: differences. New York: Elsevier Science.
Lewis.
Reynolds, C. R., & Brown, R. T. (1984a). Bias in mental testing: An
Raven, J. C., Court, J. H., & Raven, J. (1983). Manual for Raven'.s introduction to the issues. In Reynolds, C. R., & Brown, R. T. (Eds.),
Progressive Matrices and Vocabulary Scales (Section 3)—Standard Perspectives on bias in mental testing. New York: Plenum Press.
Progressive Matrices (1983 edition). London: Lewis.
Reynolds, C. R., & Brown, R. T. (Eds.). (1984b). Perspectives on bias in
Raven, J. C., Court, J. H., & Raven, J. (1986). Manual for Raven'.s mental testing. New York: Plenum Press.
Progressive Matrices and Vocabulary Scales (Section 2)—Coloured
Progressive Matrices (1986 edition, with U.S. norms). London: Lewis. Reynolds, C. R., Chastain, R. L., Kaufman, A. S., & McLean, J.
E. (1987). Demographic characteristics and IQ among adults:
Raven, J. C., Court, J. H., & Raven, J. (1992). Standard Progressive Analysis of the WAIS-R standardization sample as a function
Matrices. 1992 Edition. Oxford: Oxford Psychologists Press. of the stratification variables. Journal of School Psychology, 25,
Reddon, J. R., & Jackson, D. N. (1989). Readability of three adult 323–342.
personality tests: Basic Personality Inventory, Jackson Personality Reynolds, C. R., Lowe, P. A., & Saenz, A. L. (1999). The problem of
Inventory, and Personality Research Form-E. Journal of Personality bias in psychological assessment. In C. R. Reynolds & T. B. Gutkin
Assessment, 53, 180–183. (Eds.), The handbook of school psychology (3rd ed.). New York: Wiley.
Reeves, D., & Wedding, D. (1994). The clinical assessment of memory: Riccio, C., Reynolds, C., & Lowe, P. (2001). Clinical applications of
A practical guide. New York: Springer. continuous performance tests: Measuring attention and impulsive
Regenwetter, M. (2009). Perspectives on preference aggregation. responding in children and adults. New York: Wiley.
Perspectives on Psychological Science, 4, 403–407. Richards, P. S. (1991). The relation between conservative religious
Rehm, L. P. (1984). Self-management therapy for depression. Advances ideology and principled moral reasoning: A review. Review of
in Behavior Research and Therapy, 6, 83–98. Religious Research, 32, 359–368.
Rehm, L. P., Kornblith, S. J., O'Hara, M. W., & others. (1981). An Richards, P. S., & Bergin, A. E. (2005). Religious and spiritual
evaluation of major components in a self-control therapy program assessment. In P. S. Richards & A. E. Bergin (Eds.), A spiritual
for depression. Behavior Modification, 5, 459–490. strategy for counseling and psychotherapy (2nd ed., pp. 219–249).
Washington, DC: American Psychological Association.
Reid-Arndt, S. A., Nehl, C., & Hinkebein, J. (2007). The Frontal Systems
Behavior Scale (FrSBe) as a predictor of community integration Richards, P. S., & Davison, M. L. (1992). Religious bias in moral
following a traumatic brain injury. Brain Injury, 21, 1361–1369. development research: A psychometric investigation. Journal for the
Scientific Study of Religion, 31, 467–485.
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some
alternative employee selection procedures. Personnel Psychology, Rieber, R. W. (Ed.). (1980). Wilhelm Wundt and the making of a scientific
35, 1–63. psychology. New York: Plenum Press.
Reise, S., Ainsworth, A., & Haviland, M. (2005). Item response theory: Rinas, J., & Clyne-Jackson, S. (1988). Professional conduct and legal
Fundamentals, applications, and promise in psychological research. concerns in mental health practice. Norwalk, CT: Appleton & Lange.
Current Directions in Psychological Science, 14, 95–101. Ritter, N., Kilinc, E., Navruz, B., & Bae, Y. (2011). Test review: Test
Reitan, R. M. (1984). Aphasia and sensory perceptual deficits in adults. of Nonverbal Intelligence-4 (TONI-4). Journal of Psychoeducational
Tucson, AZ: Neuropsychology Press. Assessment, 29, 384–388.
Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan Ritzler, B. A., Sharkey, K. J., & Chudy, J. (1980). A comprehensive
Neuropsychological Test Battery: Theory and clinical interpretation projective alternative to the TAT. Journal of Personality Assessment,
(2nd ed.). Tucson, AZ: Neuropsychology Press. 44, 358–362.
Ritzler, B., Erard, R., & Pettigrew, G. (2002). Protecting the integrity & D. L. Segal (Eds.), Comprehensive handbook of psychological
of Rorschach expert witnesses: A reply to Grove and Barden (1999) assessment (vol. 2). New York: John Wiley.
Re: The admissibility of testimony under Daubert/Kumho analyses.
Rogers, R., Sewell, K., & Goldstein, A. (1994). Explanation models of
Psychology, Public Policy, and Law, 8, 201–215.
malingering: A prototypical analysis. Law and Human Behavior, 18,
Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of 543–552.
mean-level change in personality traits across the life course: A meta-
Rogoff, B. (1984). What are the interrelations among the three
analysis of longitudinal studies. Psychological Bulletin, 132, 1–25.
subtheories of Sternberg's triarchic theory of intelligence?
Robertson, I. H., Ward, T., Ridgeway, V., & Nimmo-Smith, I. Behavioral and Brain Sciences, 7, 300–301.
(1994). Test of Everyday Attention (TEA). Gaylord, MI: National
Roid, G. (2002, August). New Stanford-Binet Intelligence Scales,
Rehabilitation Services.
Fifth Edition: Author's Overview. Paper presented at the Annual
Robertson, I. H., Ward, T., Ridgeway, V., & Nimmo-Smith, I. (1996). Convention of the American Psychological Association, Chicago.
The structure of normal human attention: The Test of Everyday
Roid, G. (2003). Stanford-Binet Intelligence Scales (5th ed.). Itasca, IL:
Attention. Journal of the International Neuropsychological Society, 2,
Riverside Publishing.
525–534.
Roid, G. (2005). Stanford-Binet Intelligence Scales for Early Childhood
Robertson, I., & Smith, M. (2001). Personnel selection. Journal of
(5th ed.). Itasca, IL: Riverside Publishing.
Occupational and Organizational Psychology, 74, 441–472.
Roid, G. H., & Johnson, W. B. (1998). Computer assisted
Robins, D. L. (2008). Screening for autism in primary care settings.
psychological assessment. In A. S. Bellack & M. Hersen (Eds.),
Autism, 12, 537–556.
Comprehensive clinical psychology (vol. 4). Amsterdam: Elsevier.
Robins, D. L., & Dumont-Mathieu, T. (2006). The Modified Checklist
Roid, G., & Miller, L. (1997). Leiter-R Manual. Wood Dale, IL:
for Autism in Toddlers (M-CHAT): A review of current findings and
Stoelting Co.
future directions. Journal of Developmental and Behavioral Pediatrics,
27, S111–S119. Roldán-Tapia, L., Parrón, T., & Sánchez-Santed, F. (2005).
Neuropsychological effects of long-term exposure to organophosphate
Robins, D. L., Fein, D., & Barton, M. (1999). The Modified Checklist for
pesticides. Neurotoxicology and Teratology, 27, 259–266.
Autism in Toddlers (M-CHAT). Storrs, CT: University of Connecticut.
Roese, N. J., & Amir, E. (2009). Human-android interaction in the near Rorschach, H. (1921). Psychodiagnostik. Berne: Bircher.
and distant future. Perspectives on Psychological Science, 4, 429–434. Rosenberg, S., Ryan, J., & Prifitera, A. (1984). Rey Auditory-Verbal
Rogers, B. (1989). Review of Metropolitan Achievement Test, Sixth Learning Test performance of patients with and without memory
Edition. The tenth mental measurements yearbook. Lincoln: University impairment. Journal of Clinical Psychology, 40, 785–787.
of Nebraska Press. Ross, S. M., Gottfredson, D. K., Christensen, P., & Weaver, R. (1986).
Rogers, B. G. (1992). Review of GED. The eleventh mental measurements Cognitive self-statements in depression: Findings across clinical
yearbook. Lincoln: University of Nebraska Press. populations. Cognitive Therapy and Research, 10, 159–166.

Rogers, C. R. (1951). Client-centered therapy: Its current practice, Rossier, J., de Stadelhofen, F., & Berthoud, S. (2004). The hierarchical
implications, and theory. Boston: Houghton Mifflin. structures of the NEO-PI-R and the 16 PF 5. European Journal of
Psychological Assessment, 20, 27–38.
Rogers, C. R. (1961). On becoming a person: A therapist's view of
psychotherapy. Boston: Houghton Mifflin. Rosvold, H. E., Mirsky, A. E., Sarason, I., & others. (1956). A
continuous performance test of brain damage. Journal of Consulting
Rogers, C. R. (1980). A way of being. Boston: Houghton Mifflin. Psychology, 20, 343–350.
Rogers, C. R., & Dymond, R. F. (Eds.). (1954). Psychotherapy and
Rotter, J. B. (1966). Generalized expectancies for internal versus
personality change: Co-ordinated research studies in the client-centered
external control of reinforcement. Psychological Monographs, 80
approach. Chicago: University of Chicago Press.
(Whole No. 609).
Rogers, R. (1984). Rogers Criminal Responsibility Assessment Scales.
Rotter, J. B. (1972). Beliefs, social attitudes, and behavior: A social
Odessa, FL: Psychological Assessment Resources.
learning analysis. In J. B. Rotter, J. Chances, & E. J. Phares (Eds.),
Rogers, R. (1986). Conducting insanity evaluations. New York: Van Applications of a social learning theory of personality. New York: Holt,
Nostrand Reinhold. Rinehart and Winston.
Rogers, R. (1986). Conducting insanity evaluations. Odessa, FL: Rotter, J. B., & Rafferty, J. E. (1950). Manual for the Rotter Incomplete
Psychological Assessment Resources. Sentences Blank: College Form. New York: The Psychological
Rogers, R. (2001). Schedule of Affective Disorders and Schizophrenia Corporation.
(SADS). In R. Rogers (Ed.), Handbook of diagnostic and structured Rotter, J. B., Lah, M., & Rafferty, J. (1992). Manual—Rotter Incomplete
interviewing. New York: Guilford. Sentences Blank (2nd ed.). Orlando, FL: The Psychological
Rogers, R. (Ed.). (2008). Clinical assessment of malingering and deception Corporation.
(3rd ed.). New York: Guilford. Rotter, J. B., Rafferty, J. E., & Schachtitz, E. (1965). Validation of the
Rogers, R., & Johansson-Love, J. (2009). Evaluating competency to Rotter Incomplete Sentences Test. In B. I. Murstein (Ed.), Handbook
stand trial with evidence-based practice. Journal of the American of projective techniques. New York: Basic Books.
Academy of Psychiatry & Law, 37, 450–460. Rozin, P. (2009). What kind of empirical research should we
Rogers, R., & Sewell, K. (1999). The R-CRAS and insanity evaluations: publish, fund, and reward? A different perspective. Perspectives on
A re-examination of construct validity. Behavioral Sciences and the Psychological Science, 4, 435.
Law, 17, 181–194. Rubenzer, S., Faschingbauer, T., & Ones, D. (2000). Assessing the
Rogers, R., Bagby, M., & Dickens, S. (1992). Structured Interview U.S. presidents using the Revised NEO Personality Inventory.
of Reported Symptoms (SIRS) manual. Odessa, FL: Psychological Assessment, 7, 403–420.
Assessment Resources. Rubin, M. (1999). Emotional intelligence and its role in mitigating
Rogers, R., Jackson, R., & Cashel, M. (2004). The Schedule for aggression. Unpublished doctoral dissertation, Immaculata College,
Affective Disorders and Schizophrenia (SADS). In M. J. Hilsenroth Immaculata, Pennsylvania.
Rule, W. R., & Traver, M. D. (1983). Test-retest reliabilities of State- Saulle, M., & Greenwald, B. D. (2012). Chronic Traumatic
Trait Anxiety Inventory in a stressful social analogue situation. Encephalopathy: A review. Rehabilitation Research and Practice,
Journal of Personality Assessment, 47, 276–277. online journal, Article ID 816069, 9 pages.
Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race Savickas, M., Taber, B., & Spokane, A. (2002). Convergent and
differences in cognitive ability. Psychology, Public Policy, and Law, 11, discriminant validity of five interest inventories. Journal of
235–294. Vocational Behavior, 61, 139–184.
Russell, M., Martier, S., Sokol, R., & others. (1994). Screening for Scarr, S. (1981). Testing for children: Assessment and the many
pregnancy risk-drinking. Alcoholism: Clinical and Experimental determinants of intellectual competence. American Psychologist, 36,
Research, 18, 1156–1161. 1159–1168.
Russo, J. (1994). Thurstone's scaling model applied to the assessment Scarr, S. (1987). Foreword. In R. Elliott (Ed.), Litigating intelligence: IQ
of self-reported depressive severity. Psychological Assessment, 6, tests, special education and social science in the courtroom. Dover, MA:
159–171. Auburn House.
Rust, J., & Lindstrom, A. (1996). Concurrent validity of the WISC-III Scarr, S. (1994). Culture-Fair and Culture-Free tests. In R. J. Sternberg
and Stanford-Binet-IV. Psychological Reports, 79, 618–620. (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Ryan, A. M., & Sackett, P. R. (1987). Pre-employment honesty testing: Scarr, S., & Weinberg, R. A. (1976). IQ test performance of black children
Fakability, reactions of test takers, and company image. Journal of adopted by white families. American Psychologist, 31, 726–739.
Business and Psychology, 1, 248–256.
Scarr, S., & Weinberg, R. A. (1983). The Minnesota Adoption Studies:
Ryan, J. J., Sattler, J. M., & Tree, H. A. (2009, August). Exploratory factor Genetic differences and malleability. Child Development, 54, 260–267.
analysis of the WAIS-IV. Paper presented at the Annual Convention
of the American Psychological Association, Toronto, Canada. Scarr-Salapatek, S. (1971). Unknowns in the IQ equation. Science, 174,
1223–1228.
Ryan, M. (1985). Review of the Minnesota Clerical Test. The ninth
mental measurements yearbook (vol. I). Lincoln: University of Schaie, K. W. (1958). Rigidity-flexibility and intelligence: A cross-
Nebraska Press. sectional study of the adult life span from 20–70. Psychological
Monographs, 72, no. 9 (Whole No. 462).
Ryan, R. M. (1987). Thematic Apperception Test. In D. J. Keyser &
R. C. Sweetland (Eds.), Test critiques compendium. Kansas City, MO: Schaie, K. W. (1977). Quasi-experimental designs in the psychology
Test Corporation of America. of aging. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the
psychology of aging. New York: Van Nostrand Reinhold.
Saccuzzo, D. P., & Johnson, N. E. (1995). Traditional psychometric
tests and proportionate representation: An intervention and Schaie, K. W. (1978). Review of Senior Apperception Techniques. The
program evaluation study. Psychological Assessment, 7, 183–194. eighth mental measurements yearbook. Lincoln: University of Nebraska
Press.
Sackett, P. R., Borneman, M. J., & Connelly, B. S. (2008). High stakes
testing in higher education and employment: Appraising the Schaie, K. W. (1980). Cognitive development in aging. In L. K. Obler
evidence for validity and fairness. American Psychologist, 63, 215–227. & M. Alpert (Eds.), Language and communication in the elderly.
Lexington, MA: Heath.
Sadock, B., & Sadock, V. (2004). Kaplan and Sadock's comprehensive
textbook of psychiatry (8th ed.). Philadelphia: Lippincott, Williams Schaie, K. W. (1985). Manual for the Schaie-Thurstone Adult Mental
and Wilkins. Abilities Test (STAMAT). Palo Alto, CA: Consulting Psychologists
Press.
Sala, F. (2002). Emotional Competence Inventory: Technical manual.
Philadelphia: McClelland Center for Research, HayGroup. Schaie, K. W. (1996). Intellectual development in adulthood: The Seattle
Longitudinal Study. New York: Cambridge University Press.
Salovey, P., & Mayer, J. (1989–1990). Emotional intelligence.
Imagination, Cognition, and Personality, 9, 185–211. Schaie, K. W. (2005). Developmental influences on adult intelligence: The
Seattle longitudinal study. New York: Oxford University Press.
Salter, D., Forney, D., & Evans, N. (2005). Two approaches to examining
the stability of Myers-Briggs Type Indicator scores. Measurement and Schaie, K. W. (2011). Historical influences on aging and behavior. In
Evaluation in Counseling and Development, 37, 208–219. K. W. Schaie & S. L. Willis (Eds.), Handbook of the psychology of aging
(7th ed., pp. 41–55). San Diego, CA: Elsevier.
Salvia, J., & Ysseldyke, J. (2001). Assessment (8th ed.). Boston:
Houghton Mifflin. Schaie, K. W., & Willis, S. L. (1986). Adult development and aging.
Boston: Little, Brown.
Samelson, F. (1977). World War I intelligence testing and the
development of psychology. Journal of the History of the Behavioral Schaie, K. W., Caskie, G., Revell, A., & others. (2005). Extending
Sciences, 13, 274–282. neuropsychological assessments in the Primary Mental Ability
space. Aging, Neuropsychology, and Cognition, 12, 245–277.
Sandford, J. A., & Turner, A. (1997). Intermediate Visual and
Auditory Continuous Performance Test (IVA). Los Angeles: Western Schalock, R. L., Borthwick-Duffy, S. A., Buntinx, W., & others. (2010).
Psychological Services. Intellectual disability: Definition, classification, and systems of supports
(11th ed.). Washington, DC: American Association on Intellectual
Sarason, I. G. (1961). Test anxiety, experimental instructions, and
and Developmental Disabilities.
verbal learning. American Psychologist, 16, 374.
Schalock, R., Luckasson, R., Shogren, K., & others. (2007). The
Sashidharan, T., Pawlow, L. A., & Pettibone, J. C. (2012). An
renaming of Mental Retardation: Understanding the change to the
examination of racial bias in the Beck Depression Inventory-II.
term Intellectual Disability. Intellectual and Developmental Disabilities,
Cultural Diversity and Ethnic Minority Psychology, 18, 203–209.
45, 116–124.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA:
Schatz, P., Pardini, J., Lovell, M. R., Collins, M. W., & Podell, K. (2006).
Jerome M. Sattler, Publisher.
Sensitivity and specificity of the ImPACT test battery for concussion
Sattler, J. M. (2001). Assessment of children: Cognitive applications. San in athletes. Archives of Clinical Neuropsychology, 21, 91–99.
Diego, CA: Jerome M. Sattler, Publisher.
Schear, J. M., & Craft, R. B. (1989). Examination of the concurrent
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th validity of the California Verbal Learning Test. Clinical
ed.). La Mesa, CA: Jerome M. Sattler, Publisher. Neuropsychologist, 3, 162–168.
Scheier, M., Carver, C., & Bridges, M. (1994). Distinguishing Shapiro, E. S. (1996). Academic skills problems workbook. New York:
optimism from neuroticism (and trait anxiety, self-mastery, and Guilford.
self-esteem): A reevaluation of the Life Orientation Test. Journal of Sharkey, K. J., & Ritzler, B. A. (1985). Comparing diagnostic validity
Personality and Social Psychology, 67, 1063–1078. of the TAT and a new Picture Projective Test. Journal of Personality
Scheuneman, J. D. (1987). An argument opposing Jensen on test bias: Assessment, 49, 406–412.
The psychological aspects. In S. Modgil & C. Modgil (Eds.), Arthur Shaughnessy, M., & Moore, J. (1994). The KAIT with developmental
Jensen: Consensus and controversy. New York: Falmer Press. students, honor students, and freshmen. Psychology in the Schools,
Schmidt, F. (2002). The role of general cognitive ability and job 31, 286–287.
performance: Why there cannot be a debate. Human Performance, 15, Shaw, S., Cullen, J., McGuire, J., & Brinckerhoff, L. (1995).
187–211. Operationalizing a definition of learning disabilities. Journal of
Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. Learning Disabilities, 28, 586–597.
(1979). Impact of valid selection procedures on work-force Shayer, M., Ginsburg, D., & Coe, R. (2007). Thirty years on—a large
productivity. Journal of Applied Psychology, 64, 609–626. anti-Flynn effect? The Piagetian test Volume & Heaviness norms
Schmidt, F., & Zimmerman, R. (2004). A counterintuitive hypothesis 1975–2003. British Journal of Educational Psychology, 77, 25–41.
about employment interview validity and some supporting Sheldon, W., & Stevens, S. (1942). The varieties of temperament: A
evidence. Journal of Applied Psychology, 89, 553–561. psychology of constitutional differences. New York: Harper & Brothers.
Schmidt, K. S., & Gallo, J. L. (2007). Behavioral and Psychological Shen, H., & Comrey, A. (1997). Predicting medical students'
Assessment of Dementia (BPAD). Lutz, FL: Psychological Assessment academic performances by their cognitive abilities and personality
Resources. characteristics. Academic Medicine, 72, 781–786.
Schmitt, N. (1995). Review of the Differential Aptitude Tests, Sheslow, D., & Adams, W. (2006). Wide Range Assessment of Memory
Fifth Edition. The twelfth mental measurements yearbook. Lincoln: and Learning (2nd ed.). Lutz, FL: Psychological Assessment
University of Nebraska Press. Resources.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Shiffman, S., & Hufford, M. (2001). Ecological momentary
Assessment, 8, 350–353. assessment. Applied Clinical Trials, 10, 42–48.
Schmitt, N., & Kunce, C. (2002). The effects of required elaboration of Shiffman, S., Hufford, M., & Paty, J. (2001). The patient experience
answers to biodata questions. Personnel Psychology, 55, 569–587. movement. Applied Clinical Trials, 10, 48–56.
Schmitt, N., & Robertson, I. (1990). Personnel selection. Annual Shiffman, S., Hufford, M., Hickcox, M., & others. (1997). Remember
Review of Psychology, 41, 289–320. that? A comparison of real-time versus retrospective recall of
smoking lapses. Journal of Consulting and Clinical Psychology, 65,
Schoenberg, M., Dawson, K., Duff, K., & others. (2006). Test
292–300.
performance and classification statistics for the Rey Auditory
Verbal Learning Test in selected clinical samples. Archives of Clinical Shurrager, H. C. (1961). A haptic intelligence scale for adult blind.
Neuropsychology, 21, 693–703. Chicago: Illinois Institute of Technology.
Schroeder, M. L., Schroeder, K. G., & Hare, R. D. (1983). Shurrager, H. C., & Shurrager, P. S. (1964). Manual for the Haptic
Generalizability of a checklist for assessment of psychopathy. Intelligence Scale for the Blind. Chicago: Psychology Research
Journal of Consulting and Clinical Psychology, 51, 511–516. Technology Center, Illinois Institute of Technology.

Schroffel, A. (2012). The use of in-basket exercises for the recruitment Siegman, A. W. (1956). The effect of manifest anxiety on a concept
of advanced social service workers. Public Personnel Management, formation task, a nondirected learning task, and on timed and
41, 151–160. untimed intelligence tests. Journal of Consulting Psychology, 20,
176–178.
Schuldberg, D. (1988). The MMPI is less sensitive to the automated
testing format than it is to repeated testing: Item and scale effects. Silver, J. M., McAllister, T. W., & Yudofsky, S. C. (Eds.). (2011).
Computers in Human Behavior, 4, 285–298. Textbook of traumatic brain injury (2nd ed.). Washington, DC:
American Psychiatric Association.
Schuler, M. (1999). Brief report: Frequency of maternal cocaine use
during pregnancy and infant neurobehavioral outcome. Journal of Silverstein, A. B. (1986). Organization and Structure of the Detroit
Pediatric Psychology, 24, 511–514. Tests of Learning Aptitude (DTLA-2). Educational and Psychological
Measurement, 46, 1061–1066.
Schwab, L. O. (1979). The Nebraska assessment for independent living
(Project 93–013). Lincoln: Department of Human Development and Silvia, P. J., Wigert, B., Reiter-Palmon, R., & Kaufman, J. C. (2012).
the Family, University of Nebraska. Assessing creativity with self-report scales: A review and empirical
evaluation. Psychology of Aesthetics, Creativity, and the Arts, 6, 19–34.
Seashore, C. E. (1938). The psychology of musical talent. Boston: Silver,
Simpson, J. A., Rholes, W. S., & Nelligan, J. S. (1992). Support seeking
Burdett.
and support giving within couples in an anxiety-provoking
Segal, N. (2012). Born together—Reared apart: The landmark Minnesota situation: The role of attachment styles. Journal of Personality and
Twin Study. Cambridge, MA: Harvard University Press. Social Psychology, 62, 434–446.
Seligman, M. E. P., & Csikszentmihalyi, M. (2000). Positive Sipps, G. J., Berry, G. W., & Lynch, E. M. (1987). WAIS-R and social
psychology: An introduction. American Psychologist, 55, 5–14. intelligence: A test of established assumptions that uses the CPI.
Seligman, M. E. P., & Kahana, M. (2009). Unpacking intuition: A Journal of Clinical Psychology, 43, 499–504.
conjecture. Perspectives on Psychological Science, 4, 399–402. Sisson, E. D. (1948). Forced-choice: The new Army rating. Personnel
Seligman, M. E. P., Abramson, L. Y., Semmel, A., & Von Baeyer, C. Psychology, 1, 365–381.
(1979). Depressive attributional style. Journal of Abnormal Psychology, Sivan, A. B. (1991). Revised Visual Retention Test: Clinical and
88, 242–247. experimental applications (5th ed.). San Antonio, TX: The
Psychological Corporation.
Sellbom, M., Fishler, G., & Ben-Porath, Y. (2007). Identifying MMPI-2
predictors of police officer integrity and misconduct. Criminal Skinner, B. F. (1953). Science and human behavior. New York:
Justice and Behavior, 34, 985–1004. Macmillan.
Skinner, B. F. (1974). About behaviorism. New York: Knopf. Spearman, C. (1923). The nature of ‘intelligence’ and the principles of
cognition. London: Macmillan.
Smith, A. (1960). Changes in Porteus Maze scores of brain-operated
schizophrenics after an eight year interval. Journal of Mental Science, Spearman, C. (1927). The abilities of man. New York: Macmillan.
106, 967–978.
Specht, J., Egloff, B., & Schmukle, S. C. (2011). Stability and change
Smith, A. (1973). Symbol Digit Modalities Test. Manual. Los Angeles: of personality across the life course: The impact of age and major
Western Psychological Services. life events on mean-level and rank-order stability of the big five.
Journal of Personality and Social Psychology, 101, 862–882.
Smith, A., & Kinder, E. (1959). Changes in psychological test
performances of brain-operated subjects after eight years. Science, Special Education Today. (1985). ACALD definition of learning
129, 149–150. disabilities. 2, 1–20.
Smith, G. T. (2009). Why do different individuals progress along Sperry, R. W. (1964). The great cerebral commissure. Scientific
different life trajectories? Perspectives on Psychological Science, 4, American, 210, 42–52.
415–421.
Spielberger, C. D. (1973). Manual for the State-Trait Anxiety Inventory
Smith, J. (2001). Detroit Tests of Learning Aptitude, Fourth Edition. for children. Palo Alto: Consulting Psychologists Press.
Fourteenth mental measurements yearbook. Lincoln: University of
Spielberger, C. D. (1983). Manual for the State-Trait Anxiety Inventory
Nebraska Press.
(Form Y). Menlo Park, CA: Mind Garden.
Smith, M., Delves, T., Lansdown, R., Clayton, B., & Graham, P. (1983).
Spielberger, C. D. (1989). State-Trait Anxiety Inventory (STAI): A
The effects of lead exposure on urban children: The Institute of
comprehensive bibliography (Revised). Menlo Park, CA: Mind Garden.
Child Health/
Spielberger, C. D., & Vagg, P. R. (Eds.). (1995). Test anxiety: Theory,
Southampton Study. Developmental Medicine and Child Neurology, 25,
assessment, and treatment. Philadelphia: Taylor & Francis.
1–54.
Spielberger, C. D., Gonzalez, H. P., Taylor, C. J., & others. (1980). Test
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations:
Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
An approach to the construction of unambiguous anchors for rating
scales. Journal of Applied Psychology, 47, 149–155. Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). The State-
Trait Anxiety Inventory: Test manual. Palo Alto, CA: Consulting
Smith, T. W., Follick, M. J., Ahern, D. K., & Adams, A. (1986). Cognitive
Psychologists Press.
distortion and disability in chronic low back pain. Cognitive Therapy
and Research, 10, 201–210. Spitzer, R., & Endicott, J. (1978). Research diagnostic criteria:
Rationale and reliability. Archives of General Psychiatry, 35, 773–782.
Smyth, J., Wonderlich, S., Crosby, R., & others. (2001). The use of
ecological momentary assessment approaches in eating disorder Spohr, H., & Steinhausen, H. (Eds.). (1996). Alcohol, pregnancy, and the
research. International Journal of Eating Disorders, 30, 83–95. developing child. Cambridge: Cambridge University Press.
Snow, J. H. (1992). Review of Luria-Nebraska Neuropsychological Spreen, O. (2001). Learning disabilities and their neurological
Battery: Forms I and II. The eleventh mental measurements yearbook. foundations, theories, and subtypes. In A. Kaufman & N. Kaufman
Lincoln: University of Nebraska Press. (Eds.), Specific learning disabilities and difficulties in children and
adolescents. Cambridge, England: Cambridge University Press.
Snyder, C. R., & Lopez, S. (2007). Positive psychology: The scientific and
practical explorations of human strengths. Thousand Oaks, CA: Sage. Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological
tests: Administration, norms, and commentary (2nd ed.). New York:
Snyder, D. K., Lachar, D., & Wills, R. M. (1988). Computer-based
Oxford University Press.
interpretation of the Marital Satisfaction Inventory: Use in treatment
planning. Journal of Marital and Family Therapy, 14, 397–409. Springer, S., & Deutsch, G. (1997). Left brain, right brain (5th ed.). San
Francisco: W. H. Freeman.
Society for Industrial and Organizational Psychology, Inc. (1987).
Principles for the validation and use of personnel selection procedures (3rd Sreenivasan, S., Walker, S., Weinberger, L., Kirkish, P., & Garrick, T.
ed.). College Park, MD: Author. (2008). Four-facet PCL-R structure and cognitive functioning among
high violent criminal offenders. Journal of Personality Assessment, 90,
Society for Research in Child Development. (2010). Social policy
197–200.
report brief: Protecting children from lead exposure. Sharing Youth
and Child Development Knowledge, 24(1). Stafford-Clark, D. (1971). What Freud really said. New York: Schocken
Books.
Sokol, R. J., & Clarren, S. K. (1989). Guidelines for use of terminology
describing the impact of prenatal alcohol on the offspring. Stanley, J. C. (1971). Reliability. In R. L. Thorndike (Ed.), Educational
Alcoholism: Clinical and Experimental Research, 13, 597–598. measurement. Washington, DC: American Council on Education.
Sonne, J. L. (2012). Mental status examination. In J. L. Sonne (Ed.), Steele, C. M. (1997). A threat in the air: How stereotypes shape
PsycEssentials: A pocket resource for mental health practitioners (pp. intellectual identity and performance. American Psychologist, 52,
47–56). Washington, DC: American Psychological Association. 613–629.
Sontag, L. W., Baker, C., & Nelson, V. (1958). Mental growth and Steele, C. M., & Aronson, J. (1995). Stereotype threat and the
personality development: A longitudinal study. Monographs of the intellectual test performance of African Americans. Journal of
Society for Research in Child Development, 23 (Whole No. 68). Personality and Social Psychology, 69, 797–811.
Sotile, W. M., Julian, A., Henry, S. E., & Sotile, M. O. (1988). Family Steer, R. A., Beck, A. T., & Brown, G. (1989). Sex differences on the
Apperception Test manual. Los Angeles: Western Psychological Revised Beck Depression Inventory for outpatients with affective
Services. disorders. Journal of Personality Assessment, 53, 693–702.
Soto, C. J., John, O. P., Gosling, S. D., & Potter, J. (2011). Age Steers, R. M., & Rhodes, S. R. (1978). Major influences on employee
differences in personality traits from 10 to 65: Big Five domains and attendance: A process model. Journal of Applied Psychology, 63,
facets in a large cross-sectional sample. Journal of Personality and 391–407.
Social Psychology, 100, 330–348.
Stefan, S. (2001). Unequal rights: Discrimination against people with
Spearman, C. (1904). “General intelligence,” objectively determined mental disabilities and the Americans with Disabilities Act. Washington,
and measured. American Journal of Psychology, 15, 201–293. DC: American Psychological Association.
Stehouwer, R. S. (1987). Beck Depression Inventory. In D. J. Keyser & Sternberg, R., & Lubart, T. (1992). Buy low and sell high: An
R. C. Sweetland (Eds.), Test critiques compendium. Kansas City, MO: investment approach to creativity. Current Directions in Psychological
Test Corporation of America. Science, 1, 1–5.
Steinweg, D. L., & Worth, H. (1993). Alcoholism: The keys to the Stevens, S. S. (1946). On the theory of scales and measurement.
CAGE. American Journal of Medicine, 94, 520–523. Science, 103, 677–680.
Stenner, A. J. (2001). The Lexile Framework: A common metric for Stewart, G., Dustin, S., Barrick, M., & Darnold, T. (2008). Exploring
matching readers and text. California School Library Association the handshake in employment interviews. Journal of Applied
Journal, 25, 41–42. Psychology, 93, 1139–1146.
Stephenson, W. (1953). The study of behavior: Q-technique and its Stewart, P., Reihman, J., Lonky, E., Darvill, T., & Pagano, J. (1999).
methodology. Chicago: University of Chicago Press. Prenatal PCB exposure and neonatal behavioral assessment scale
Steptoe, A., Wright, C., Kunz-Ebrecht, S., & Iliffe, S. (2006). (NBAS) performance. Neurotoxicology and Teratology, 22, 21–29.
Dispositional optimism and health behaviour in community- Stockwell, S., Schaeffer, B., & Lowenstein, J. (1991). The SAT coaching
dwelling older people: Associations with healthy ageing. British coverup. Cambridge, MA: Fairtest.
Journal of Health Psychology, 11, 71–84.
Stokes, G., & Cooper, L. (2001). Content/construct approaches in
Stern, R., & White, T. (2003a). Neuropsychological Assessment life history form development for selection. International Journal of
Battery: Administration, scoring, and interpretive manual. Lutz, FL: Selection and Assessment, 9, 138–151.
Psychological Assessment Resources.
Stokes, G., & Cooper, L. (2004). Biodata. In J. Thomas (Ed.),
Stern, R., & White, T. (2003a). Neuropsychological Assessment Comprehensive handbook of psychological assessment, Vol. 4: Industrial
Battery: Psychometric and technical manual. Lutz, FL: Psychological and organizational assessment (pp. 243–268). Hoboken, NJ: John Wiley.
Assessment Resources.
Stokes, G., Mumford, M., & Owens (Eds.). (1994). Biodata handbook:
Stern, W. L. (1912). Uber die psychologischen Methoden der Theory, research, and use of biographical information in selection and
Intelligenzprufung. American translation by G. M. Whipple (1914). performance prediction. Palo Alto, CA: Consulting Psychologists
The psychological methods of testing intelligence. Educational Press.
Psychology Monographs, no. 13, Baltimore: Warwick & York.
Stone, B. J. (1994). Group ability test versus teachers'. ratings for
Sternberg, R. J. (1981). Intelligence and nonentrenchment. Journal of predicting achievement. Psychological Reports, 75, 1487–1490.
Educational Psychology, 73, 1–16.
Storandt, M., & Hill, R. D. (1989). Very mild senile dementia of the
Sternberg, R. J. (1985a). Componential analysis: A recipe. In D. Alzheimer type: 2. Psychometric test performance. Archives of
K. Detterman (Ed.), Current topics in human intelligence (vol. 1). Neurology, 46, 383–386.
Norwood, NJ: Ablex.
Stout, J. C., Ready, R. E., Grace, J., Malloy, P. F., & Paulsen, J. S.
Sternberg, R. J. (1985b). Beyond IQ: A triarchic theory of human (2006). Factor Analysis of Frontal Systems Behavior Scale (frSBe).
intelligence. Cambridge: Cambridge University Press. Assessment, 10, 79–85.
Sternberg, R. J. (1986). Intelligence applied: Understanding and increasing Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of
your intellectual skills. San Diego, CA: Harcourt Brace Jovanovich. neuropsychological tests: Administration, norms, and commentary (3rd
Sternberg, R. J. (1993). Sternberg Triarchic Abilities Test (Level H). ed.). New York: Oxford University Press.
Unpublished test. Strauss, E., Sherman, E., & Spreen, O. (2006). A compendium of
Sternberg, R. J. (1994). The triarchic theory of intelligence. In R. neuropsychological tests: Administration, norms, and commentary (3rd
J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: ed.). New York: Oxford University Press.
Macmillan. Strayhorn, J. C., & Strayhorn, J. M. (2012). Lead exposure and the
Sternberg, R. J. (1996). Successful intelligence. New York: Simon & 2010 achievement test scores of children in New York counties.
Schuster. Child and Adolescent Psychiatry and Mental Health, 6, 4.
Sternberg, R. J. (2002). Creativity as a decision. American Psychologist, Streiner, D. L., Goldberg, J. O., & Miller, H. R. (1993). MCMI-II item
57, 376. weights: Their lack of effectiveness. Journal of Personality Assessment,
60, 471–476.
Sternberg, R. J. (Ed.). (1994). Encyclopedia of human intelligence (vols. 1,
2). New York: Macmillan. Streissguth, A., Bookstein, F., & Barr, H. (1996). A dose-response
study of the enduring effects of prenatal alcohol exposure: birth to
Sternberg, R. J., & Detterman, D. K. (Eds.). (1986). What is intelligence?
14 years. In H. Spohr & H. Steinhausen (Eds.), Alcohol, pregnancy,
Contemporary viewpoints on its nature and definition. Norwood, NJ:
and the developing child. Cambridge: Cambridge University Press.
Ablex.
Streissguth, A., Bookstein, F., Barr, H., & others. (2004). Risk factors
Sternberg, R. J., & Kaufman, J. C. (1998). Human abilities. Annual
for adverse life outcomes in fetal alcohol syndrome and fetal
Review of Psychology, 49, 479–502.
alcohol effects. Developmental and Behavioral Pediatrics, 25, 226–238.
Sternberg, R. J., & Williams, W. (1997). Does the Graduate Record
Streissguth, A., Martin, D., Barr, H., & Sandman, B. (1984).
Examination predict meaningful success in the graduate training of
Intrauterine alcohol and nicotine exposure: Attention and reaction
psychologists? A case study. American Psychologist, 52, 630–641.
time in 4-year-old children. Developmental Psychology, 20, 533–541.
Sternberg, R. J., & Zhang, L. (1995). What do we mean by giftedness?
Strong, E. K. (1927). Vocational Interest Blank. Stanford, CA: Stanford
A pentagonal implicit theory. Gifted Child Quarterly, 39, 88–94.
University Press.
Sternberg, R. J., Castejon, J., Prieto, M., Hautamaki, J., & Grigorenko,
Strong, E. K. (1955). Vocational interests 18 years after college.
E. (2001). Confirmatory factor analysis of the Sternberg Triarchic
Minneapolis: University of Minnesota Press.
Abilities Test in three international samples. European Journal of
Psychological Assessment, 17, 1–16. Strong, E. K., Hansen, J., & Campbell, D. (1994). Strong Interest
Inventory. Palo Alto, CA: Consulting Psychologists Press.
Sternberg, R. J., Conway, B. E., Ketron, J. L., & Bernstein, M. (1981).
People'.s conceptions of intelligence. Journal of Personality and Social Stroop, J. R. (1935). Studies of interference in serial verbal reaction.
Psychology, 41, 37–55. Journal of Experimental Psychology, 18, 643–662.

Strub, R. L., & Black, F. W. (2000). The mental status examination in neurology (5th ed.). Philadelphia: F. A. Davis.
Strutt, A. M., Scott, B. M., Lozano, V. J., Tieu, P. G., & Peery, S. (2012). Assessing sub-optimal performance with the Test of Memory Malingering in Spanish speaking patients with TBI. Brain Injury, 26, 853–863.
Sumi, K. (2006). Correlations between optimism and social relationships. Psychological Reports, 99, 938–940.
Sundet, J., Barlaug, D., & Torjussen, T. (2004). The end of the Flynn effect? A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century. Intelligence, 32, 349–362.
Sundet, J., Borren, I., & Tambs, K. (2008). The Flynn effect is partly caused by changing fertility patterns. Intelligence, 36, 183–191.
Super, D. E. (1953). A theory of vocational development. American Psychologist, 8(5), 185–190.
Super, D. E. (1990). Career choice and development: Applying contemporary theories to practice. San Francisco: Jossey-Bass.
Super, D. E. (1994). A life-span, life-space perspective on convergence. In M. L. Savickas & R. W. Lent (Eds.), Convergence in career development theories: Implications for science and practice (pp. 63–74). Palo Alto, CA: Consulting Psychologists Press.
Super, D. E., Savickas, M. L., & Super, C. M. (1996). The life-span, life-space approach to careers. In D. Brown, L. Brooks, & Associates (Eds.), Career choice and development (3rd ed., pp. 121–177). San Francisco: Jossey-Bass.
Sweeney, J., Slade, H., Ivins, R., & others. (2007). Scientific investigation of brain-behavior relationships using the Halstead-Reitan Battery. Applied Neuropsychology, 14, 65–72.
Swenson, W. M., Rome, H., Pearson, J., & Brannick, T. (1965). A totally automated psychological test: Experience in a medical center. Journal of the American Medical Association, 191, 925–927.
Tabachnick, B. G., & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New York: Harper & Row.
Tallent, N. (1993). Psychological report writing (4th ed.). Englewood Cliffs, NJ: Prentice Hall.
Tamkin, A. S., & Scherer, I. W. (1957). What is measured by the “Cannot Say” scale of the group MMPI? Journal of Consulting Psychology, 21, 413–417.
Tan, J. E., Hultsch, D. F., Hunter, M. A., & Strauss, E. (2010). Psychometric investigation of the modified Scales of Independent Behavior-Revised in an elderly sample. Clinical Gerontologist: The Journal of Aging and Mental Health, 33, 69–83.
Tanner, B. A. (1992). Computer-aided reporting of the results of neuropsychological evaluations of traumatic brain injury. Computers in Human Behavior, 9, 51–56.
Tasbihsazan, R., Nettelbeck, T., & Kirby, N. (2003). Predictive validity of the Fagan Test of Infant Intelligence. British Journal of Developmental Psychology, 21, 585–597.
Tasto, D. L., Hickson, R., & Rubin, S. E. (1971). Scaled profile analysis of fear survey schedule factors. Behavior Therapy, 2, 543–549.
Tate, R. L. (2010). A compendium of tests, scales, and questionnaires: The practitioner's guide to measuring outcomes after acquired brain impairment. Hove, UK: Psychology Press.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29, 24–54.
Taylor, F. S. (1942). The origin of the thermometer. Annals of Science, 5, 129–156.
te Nijenhuis, J., Cho, S. H., Murphy, R., & Lee, K. H. (2012). The Flynn effect in Korea: Large gains. Personality and Individual Differences, 53, 147–151.
Teacher, Administrator, and Counselor Manual: Iowa Tests of Educational Development. Forms X-8 and Y-8. 1988.
Teare, J. F., & Thompson, R. W. (1982). Concurrent validity of the Perkins-Binet tests of intelligence for the blind. Journal of Visual Impairment and Blindness, 76, 279–280.
Teasdale, G., & Jennett, B. (1974). The Glasgow Coma Scale. Lancet, 2, 81.
Teasdale, T., & Owen, D. (2005). A long-term rise and recent decline in intelligence test performance: The Flynn effect in reverse. Personality and Individual Differences, 39, 837–843.
Teichner, G., Golden, C., Bradley, J., & Crum, T. (1999). Internal consistency and discriminant validity of the Luria Nebraska Neuropsychological Battery-III. International Journal of Neuroscience, 98, 141–152.
Tellegen, A., & Ben-Porath, Y. (1992). The new uniform T scores for the MMPI-2: Rationale, derivation, and appraisal. Psychological Assessment, 4, 145–155.
Tellegen, A., & Ben-Porath, Y. S. (2008). MMPI-2-RF (Minnesota Multiphasic Personality Inventory-2 Restructured Form): Technical manual. Minneapolis: University of Minnesota Press.
Temple, R., & Zgaljardic, D. (2009). Ecological validity of the Neuropsychological Assessment Battery Screening Module in post-acute brain injury rehabilitation. Brain Injury, 23, 45–50.
Templeton, A. R. (2002). The genetic and evolutionary significance of human races. In J. Fish (Ed.), Race and intelligence: Separating science from myth. Mahwah, NJ: Erlbaum.
Tendler, A. D. (1930). A preliminary report on a test for emotional insight. Journal of Applied Psychology, 14, 123–126.
Teng, S. (1942–43). Chinese influence on the western examination system. Harvard Journal of Asiatic Studies, 7, 267–312.
Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.
Terman, L. M., & Oden, M. H. (1959). Genetic studies of genius: The gifted group at mid-life. Stanford, CA: Stanford University Press.
Terrell, F., Terrell, S., & Taylor, J. (1981). Effect of race of examiner and cultural mistrust on the WAIS performance of Black students. Journal of Consulting and Clinical Psychology, 49, 750–751.
Thoma, S. (2006). Research on the defining issues test. In M. Killen & J. Smetana (Eds.), Handbook of moral development (pp. 67–91). Mahwah, NJ: Erlbaum.
Thomas, M., & Watkins, P. (2003, May). Measuring the grateful trait: Development of revised GRAT. Paper presented at the Annual Convention of the Western Psychological Association, Vancouver, BC.
Thompson, C. (1949). The Thompson modification of the Thematic Apperception Test. Journal of Projective Techniques, 13, 469–478.
Thorndike, E. L. (1912). The permanence of interests and their relation to abilities. Popular Science Monthly, 81, 449–456.
Thorndike, E. L. (1918). The seventeenth yearbook of the National Society for the Study of Education. Pt. II. Bloomington, IL: Public School Publishing Co.
Thorndike, E. L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227–235.
Thorndike, E. L. (1920a). A constant error in psychological ratings. Journal of Applied Psychology, 4, 25–29.
Thorndike, E. L. (1920b). Intelligence and its use. Harper's Magazine, 140, 227–235.
Thorndike, E. L. (Ed.). (1921). Intelligence and Its Measurement: A Symposium. Journal of Educational Psychology, 12, 123–147, 195–216.
Thorndike, R. L., & Stein, S. (1937). An evaluation of the attempts to measure social intelligence. Psychological Bulletin, 34, 275–285.

Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). The Stanford-Binet Intelligence Scale: Fourth Edition, Guide for administering and scoring. Chicago: Riverside.
Thurstone, L. L. (1921). Intelligence. In E. L. Thorndike (Ed.), Intelligence and Its Measurement: A Symposium. Journal of Educational Psychology, 12, 123–147, 195–216.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16, 433–451.
Thurstone, L. L. (1929). Theory of attitude measurement. Psychological Review, 36, 222–241.
Thurstone, L. L. (1931). Multiple factor analysis. Psychological Review, 38, 406–427.
Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monographs, no. 1. Chicago: University of Chicago Press.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Thurstone, L. L., & Thurstone, T. (1930). A neurotic inventory. Journal of Social Psychology, 1, 3–30.
Thurstone, L. L., & Thurstone, T. (1941). Factorial studies in intelligence. Psychometric Monographs, No. 2. Chicago: University of Chicago Press.
Tiffin, J. (1968). Purdue Pegboard Examiner's Manual. Chicago: Science Research Associates.
Tinius, T. (2003). The Intermediate Visual and Auditory Continuous Performance Test as a neuropsychological measure. Archives of Clinical Neuropsychology, 18, 199–214.
Tombaugh, T. (1997). The test of memory malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 9, 260–268.
Tombaugh, T., McDowell, I., Kristjansson, B., & Hubley, A. (1996). Mini-Mental State Examination (MMSE) and the Modified MMSE (3MS): A psychometric comparison and normative data. Psychological Assessment, 8, 48–59.
Tomkins, S. S. (1947). The Thematic Apperception Test. New York: Grune & Stratton.
Tong, E., Bishop, G., Enkelmann, H., & others. (2005). The use of ecological momentary assessment to test appraisal theories of emotion. Emotion, 5, 508–512.
Torgerson, J. (2009). The response to intervention instructional model: Some outcomes from a large-scale implementation in Reading First schools. Child Development Perspectives, 3, 38–40.
Torrance, E. P. (1966). The Torrance Tests of Creative Thinking: Norms—Technical Manual (Research Edition). Princeton, NJ: Personnel Press.
Torrance, E. P. (1974). The Torrance Tests of Creative Thinking Norms—Technical Manual Research Edition—Verbal Tests, Forms A & B. Princeton, NJ: Personnel Press.
Torrance, E. P. (1998). The Torrance Tests of Creative Thinking: Norms—Technical Manual Figural (Streamlined) Forms A & B. Bensenville, IL: Scholastic Testing Service.
Totsika, V., & Sylva, K. (2004). The Home Observation for Measurement of the Environment revisited. Child and Adolescent Mental Health, 9, 25–35.
Traxler, A. E. (1951). Administering and scoring the objective test. In E. F. Lindquist (Ed.), Educational measurement. Washington, DC: American Council on Education.
Treffert, D. A. (1989). Extraordinary people. London: Bantam Press.
Trefflinger, D. (1985). Review of the Torrance Tests of Creative Thinking. In J. V. Mitchell, Jr., (Ed.), The ninth mental measurements yearbook (pp. 1632–1634).
Trevisan, M. S. (1992). Review of GED. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Trinidad, D., & Johnson, C. (2002). The association between emotional intelligence and early adolescent tobacco and alcohol use. Personality and Individual Differences, 32, 95–105.
Tröster, A. (2012). Understanding Parkinson's: Cognition and Parkinson's. New York: Parkinson's Disease Foundation.
Trull, T. J., Useda, J., Costa, Jr., P., & McCrae, R. (1995). Comparison of the MMPI-2 Personality Psychopathology Five (PSY-5), the NEO-PI, and the NEO-PI-R. Psychological Assessment, 7, 508–516.
Trull, T. J., Widiger, T., Useda, J., & others. (1998). A structured interview for the assessment of the five-factor model of personality. Psychological Assessment, 10, 229–240.
Tsai, L., & Tsuang, M. (1979). The Mini-Mental State Test and computerized tomography. American Journal of Psychiatry, 136, 436–439.
Turk, A. A., Brown, W. S., Symington, M., & Paul, L. K. (2010). Social narratives in agenesis of the corpus callosum: Linguistic analysis of the Thematic Apperception Test. Neuropsychologia, 48, 43–50.
Turkheimer, E., Haley, A., Waldron, M., D'Onofrio, B., & Gottesman, I. I. (2003). Socioeconomic status modifies heritability of IQ in young children. Psychological Science, 4, 623–628.
Tzeng, O. C. S. (1987). Strong-Campbell Interest Inventory. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques compendium. Kansas City, MO: Test Corporation of America.
Tzeng, O., Ware, R., & Chen, J. (1989). Measurement and utility of continuous unipolar ratings for the Myers-Briggs Type Indicator. Journal of Personality Assessment, 53, 727–738.
U.S. Department of Education. (1977). Definition and criteria for defining students as learning disabled. Federal Register, 42(250), 65083.
U.S. Department of Education. (1992). Fourteenth Annual Report to Congress on the Implementation of the Individuals with Disabilities Education Act. Washington, DC: Author.
Uematsu, S., Lesser, R., Fisher, R. S., & others. (1992). Motor and sensory cortex in humans: Topography studied with chronic subdural stimulation. Neurosurgery, 31(1), 59–71.
Ulrich, L., & Trumbo, D. (1965). The selection interview since 1949. Psychological Bulletin, 63, 100–116.
United States Employment Service. (1970). Manual for the USES General Aptitude Test Battery. Washington, DC: United States Department of Labor.
Urquhart Hagie, M., Gallipo, P., & Svien, L. (2003). Traditional culture versus traditional assessment for American Indian Students: An investigation of potential test item bias. Assessment for Effective Intervention, 29, 15–25.
Vaillant, G. (1971). Theoretical hierarchy of adaptive ego mechanisms. Archives of General Psychiatry, 24, 107–118.
Vaillant, G. (1977). Adaptation to life: How the best and the brightest came of age. Boston: Little, Brown.
Vaillant, G. (1992). Ego mechanisms of defense: A guide for clinicians and researchers. Washington, DC: American Psychiatric Press.
Vaillant, G., & Vaillant, C. (1990). Natural history of male psychosocial health, XII: A 45-year study of predictors of successful aging at age 65. American Journal of Psychiatry, 147, 31–37.
Van de Vijver, F., & Harsveld, M. (1994). The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery. Journal of Applied Psychology, 79, 852–859.
Van Gorp, W. (1992). Review of Luria-Nebraska Neuropsychological Battery: Forms I and II. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97, 499–530.

Vance, B., Kitson, D., & Singer, M. (1985). Relationship between the standard scores of PPVT-R and Wide Range Achievement Test. Journal of Clinical Psychology, 41, 691–693.
VanderVeer, B., & Schweid, E. (1974). Infant assessment: Stability of mental functioning in young retarded children. American Journal of Mental Deficiency, 79, 1–4.
Varma, A., DeNisi, A., & Peters, L. (1996). Interpersonal affect and performance appraisal: A field study. Personnel Psychology, 49, 341–360.
Vaughn, S., & Haager, D. (1994). The measurement and assessment of social skills. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues. Baltimore: Brookes Publishing.
Vautier, S., & Pohl, S. (2009). Do balanced scales assess bipolar construct? The case of the STAI scales. Psychological Assessment, 21, 187–193.
Vernon, M. C., & Alles, B. F. (1986). Psychoeducational assessment of deaf and hard-of-hearing children and adolescents. In P. J. Lazarus & S. S. Strichart (Eds.), Psychoeducational evaluation of children and adolescents with low-incidence handicaps. New York: Grune & Stratton.
Vernon, M. C., & Brown, D. W. (1964). A guide to psychological tests and testing procedures in the evaluation of deaf and hard-of-hearing children. Journal of Speech and Hearing Disorders, 29, 414–423.
Vernon, P. A. (2000). Recent studies of intelligence and personality using Jackson's Multidimensional Aptitude Battery and Personality Research Form. In R. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy. New York: Kluwer Academic/Plenum Publishers.
Vernon, P. A., Martin, R., Schermer, J., & Mackie, A. (2008). A behavioral genetic investigation of humor styles and their correlations with the Big-5 personality dimensions. Personality and Individual Differences, 44, 1116–1125.
Vernon, P. E. (1950). The structure of human abilities. London: Methuen.
Vernon, P. E. (1979). Intelligence: Heredity and environment. San Francisco: Freeman.
Viglione, D. J., Blume-Marcovici, A. C., Miller, H. L., Giromini, L., & Meyer, G. (2012). An inter-rater reliability study for the Rorschach Performance Assessment System. Journal of Personality Assessment, 94, 607–612.
Vince, J. (2004). Introduction to virtual reality. New York: Springer Publishing.
Vincent, A., Roebuck-Spencer, T., Gilleland, K., & Schlegel, R. (2012). Automated Neuropsychological Assessment Metrics (v4) Traumatic Brain Injury Battery: Military normative data. Military Medicine, 177, 256–269.
Viswesvaran, C., Ones, D., & Schmidt, F. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.
Wagner, R. (1949). The employment interview: A critical review. Personnel Psychology, 2, 17–46.
Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Erlbaum.
Walker, C. (2006). Cognitive improvement and alcoholism recovery [fact sheet]. Center City, MN: Hazelden Publishing.
Wallas, G. (1926). The art of thought. New York: Harcourt, Brace.
Wallbrown, F. H., Carmin, C. N., & Barnett, R. W. (1988). Investigating the construct validity of the Multidimensional Aptitude Battery. Psychological Reports, 62, 871–878.
Walls, R. T., Zane, T., & Thvedt, J. E. (1979). The Independent Living Behavior Checklist. Dunbar: West Virginia Research and Training Center.
Walsh, B. D. (1996, March). The psychometric characteristics of the Career Beliefs Inventory. Dissertation Abstracts International, Section A: Humanities and Social Sciences, 56(9-A), 3516.
Walsh, B. D., Thompson, T., & Kapes, J. (1997). The construct validity of scores on the Career Beliefs Inventory. Journal of Career Assessment, 5, 31–46.
Walsh, W. B., & Holland, J. L. (1992). A theory of personality types and work environments. In W. Walsh, R. Price, & K. Craik (Eds.), Person-environment psychology: Models and perspectives. Hillsdale, NJ: Erlbaum.
Wanek, J. (1999). Integrity and honesty testing: What do we know? How do we use it? International Journal of Selection and Assessment, 7, 183–195.
Wang, J., & Kaufman, A. (1993). Changes in fluid and crystallized intelligence across the 20- to 90-year age range on the K-BIT. Journal of Psychoeducational Assessment, 11, 29–37.
Wang, L. (1995). Differential Aptitude Tests. Measurement and Evaluation in Counseling and Development, 28, 168–171.
Wang, L., Beckett, G. H., & Brown, L. (2006). Controversies of standardized assessment in school accountability reform: A critical synthesis of multidisciplinary research evidence. Applied Measurement in Education, 19, 306–328.
Wang, M. C., Haertel, G. D., & Walberg, H. J. (1990). What influences learning? A content analysis of review literature. Journal of Educational Research, 84, 30–43.
Washington, J., & Craig, H. (1999). Performance of at-risk, African American preschoolers on the Peabody Picture Vocabulary Test-III. Language, Speech, & Hearing Services in Schools, 30, 75–82.
Wasylkiw, L., & Fekken, G. (2002). Personality and self-reported health: Matching predictors and criteria. Personality and Individual Differences, 33, 607–620.
Watkins, C., Campbell, V., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology: Research and Practice, 26, 54–60.
Watkins, P., Woodward, K., Stone, T., & Kolts, R. (2003). Gratitude and happiness: Development of a measure of gratitude and relationships with subjective well-being. Social Behavior and Personality, 31, 431–452.
Watson, B. (1983). Test-retest stability of the Hiskey-Nebraska Test of Learning Aptitude in a sample of hearing-impaired children and adolescents. Journal of Speech and Hearing Disorders, 48, 145–149.
Watson, B. U., & Goldgar, D. E. (1985). A note on the use of the Hiskey-Nebraska Test of Learning Aptitude with deaf children. Language, Speech, and Hearing Services in the Schools, 16, 53–57.
Watson, C. G., Thomas, D., & Anderson, P. (1992). Do computer-administered Minnesota Multiphasic Personality Inventories underestimate booklet-based scores? Journal of Clinical Psychology, 48, 744–748.
Wechsler, D. (1932). Analytic use of the Army Alpha examination. Journal of Applied Psychology, 16, 254–256.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams & Wilkins.
Wechsler, D. (1941). The measurement of adult intelligence (2nd ed.). Baltimore: Williams & Wilkins.
Wechsler, D. (1944). Measurement of adult intelligence (3rd ed.). Baltimore: Williams & Wilkins.
Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. New York: The Psychological Corporation.
Wechsler, D. (1952). The range of human capacities (2nd ed.). Baltimore: Williams & Wilkins.
Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: The Psychological Corporation.

Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1981). Manual for the Wechsler Adult Intelligence Scale-Revised. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1989). Manual for the Wechsler Preschool and Primary Scale of Intelligence-Revised. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1991). Manual for the Wechsler Intelligence Scale for Children-III. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1997). Manual for the Wechsler Adult Intelligence Scale-III. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). WISC-IV: Technical and interpretive manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (2008). Manual for the Wechsler Adult Intelligence Scale—Fourth Edition. San Antonio, TX: Pearson.
Wechsler, D., Coalson, D., & Raiford, S. (2008). WAIS-IV technical and interpretive manual. San Antonio, TX: Pearson.
Weekes, N. Y. (1994). Sex differences in the brain. In D. W. Zaidel (Ed.), Neuropsychology (2nd ed.). San Diego, CA: Academic Press.
Weiner, I. B. (1994). The Rorschach Inkblot Method (RIM) is not a test: Implications for theory and practice. Journal of Personality Assessment, 62, 498–504.
Weiner, I. B. (1996). Some observations on the validity of the Rorschach inkblot method. Psychological Assessment, 8, 206–213.
Weiner, I. B., & Kuehnle, K. (1998). Projective assessment of children and adolescents. In A. S. Bellack, & M. Hersen (Eds.), Comprehensive clinical psychology, (vol. 4). Amsterdam: Elsevier.
Weiss, D. J. (1985). Adaptive testing by computer. Journal of Consulting and Clinical Psychology, 53, 774–789.
Weiss, D. J. (Ed.). (1983). New horizons in testing: Latent trait theory and computerized adaptive testing. New York: Academic Press.
Weiss, D. J., & Vale, C. D. (1987). Computerized adaptive testing for measuring abilities and other psychological variables. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner's guide. New York: Basic Books.
Weiss, D. S., Zilberg, N. J., & Genevro, J. L. (1989). Psychometric properties of Loevinger's Sentence Completion Test in an adult psychiatric outpatient sample. Journal of Personality Assessment, 53, 478–486.
Weiss, R. A., Rosenfeld, B., & Farkas, M. R. (2011). The utility of the Structured Interview of Reported Symptoms in a sample of individuals with intellectual disabilities. Assessment, 18, 284–290.
Weller, C. E., & Fields, J. (2011). The Black and White labor gap in America: Why African Americans struggle to find jobs and remain employed compared to Whites. Washington, DC: Center for American Progress.
Wertheimer, M. (1945). Productive thinking. New York: Harper & Row.
Wesman, A. G. (1971). Writing the test item. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Westbrook, B. W., & Bane, K. D. (1992). Review of Defining Issues Test. Eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Whipple, G. M. (1910). Manual of mental and physical tests. Baltimore: Warwick and York.
Whitney, D. R., Malizio, A. G., & Patience, W. M. (1985). The reliability and validity of the GED Tests. American Council on Education GED Research Brief, May, No. 6.
Whyte, J., Polansky, M., Cavallucci, C., Fleming, M., Lhulier, J., & Coslett, H. (1996). Inattentive behaviour after traumatic brain injury. Journal of the International Neuropsychological Society, 2, 274–281.
Wiesner, W. H., & Cronshaw, S. F. (1988). A meta-analytic investigation of the impact of interview format and degree of structure on the validity of the employment interview. Journal of Occupational Psychology, 61, 275–290.
Wiggins, J. (1997). In defense of traits. In R. Hogan, J. Johnson, & S. Briggs (Eds.), Handbook of personality psychology. San Diego, CA: Academic Press.
Wilkinson, G. S. (1993). Wide Range Achievement Test-III: Administration manual. Wilmington, DE: Wide Range.
Wilkinson, G., & Robertson, G. (2006). Wide Range Achievement Test—Fourth Edition. Lutz, FL: Psychological Assessment Resources.
Williams, M. (1979). Brain damage, behaviour, and the mind. New York: Wiley.
Williams, R. L. (1970). Danger: Testing and dehumanizing Black children. Clinical Child Psychology Newsletter, 9, 5–6.
Williamson, L., Campion, J., Malo, S., & others. (1997). Employment interview on trial: Linking interview structure with litigation outcomes. Journal of Applied Psychology, 82, 900–912.
Willingham, W. W., Ragosta, M., Bennett, R., & others. (1988). Testing handicapped people. Boston: Allyn and Bacon.
Wilson, B. A., Cockburn, J., & Baddeley, A. (1991). The Rivermead Behavioral Memory Test (2nd ed.). Suffolk, UK: Thames Valley Test Company.
Wilson, B., Alderman, N., Burgess, P., Emslie, H., & Evans, J. (1996). Behavioral Assessment of the Dysexecutive Syndrome. Bury St. Edmunds, England: Thames Valley Test Company.
Wilson, M. N. (1994). African Americans. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence. New York: Macmillan.
Wilson, M., & Reschly, D. (1996). Assessment in school psychology training and practice. School Psychology Review, 25, 9–23.
Wilson, R. S. (1983). The Louisville Twin Study: Developmental synchronies in behavior. Child Development, 54, 298–316.
Wilson, T. D. (2009). Know thyself. Perspectives on Psychological Science, 4, 384–389.
Wing, H. (1992). Review of the Bennett Mechanical Comprehension Test. The eleventh mental measurements yearbook. Lincoln: University of Nebraska Press.
Winter, D. G., & Stewart, A. J. (1977). Power motive reliability as a function of retest instructions. Journal of Consulting and Clinical Psychology, 45, 436–440.
Wirt, R. D., & Broen, W. E., Jr. (1958). Booklet for the Personality Inventory for Children. Minneapolis, MN: Authors.
Wirt, R. D., Lachar, D., Klinedinst, J. K., & Seat, P. D. (1984). Multidimensional description of child personality: A manual for the Personality Inventory for Children, Revised 1984. Los Angeles: Western Psychological Services.
Wisniewski, J. J., & Naglieri, J. A. (1989). Validity of the Draw A Person: A Quantitative Scoring System with the WISC-R. Journal of Psychoeducational Assessment, 7, 346–351.
Wissler, C. (1901). The correlation of mental and physical tests. The Psychological Review, Monograph Supplement 3(6).
Witchalls, C. (2012, September 27). James R. Flynn: Are we really getting smarter every year? The Independent.
Witelson, S. (2007). Sex and the single hemisphere: Specialization of the right hemisphere for spatial processing. In G. Einstein (Ed.), Sex and the brain (pp. 541–544). Cambridge, MA: MIT Press.
Wolf, A. W., Schubert, D., Patterson, M., Grande, T., & Pendleton, L. (1990). The use of the MacAndrew Alcoholism Scale in detecting substance abuse and antisocial personality. Journal of Personality Assessment, 54, 747–755.
Wolf, T. H. (1973). Alfred Binet. Chicago: The University of Illinois Press.
Wolff, K. C., & Gregory, R. J. (1992). The effects of a temporary dysphoric mood upon selected WAIS-R subtests. Journal of Psychoeducational Assessment, 9, 340–344.
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford, CA: Stanford University Press.
Wolpe, J. (1973). The practice of behavior therapy (2nd ed.). New York: Pergamon.
Wolpe, J., & Lang, P. J. (1977). Manual for the Fear Survey Schedule (revised). San Diego, CA: Educational and Industrial Testing Service.
Wonderlic, E. F. (1983). Wonderlic Personnel Test manual. Northfield, IL: E. F. Wonderlic & Associates.
Wood, J. M., Nezworski, M., & Stejskal, W. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3–10.
Wood, J., Garb, H., & Nezworski, M. T. (2007). Psychometrics: Better measurement makes better clinicians. In S. O. Lilienfeld & W. T. O'Donohue (Eds.), The great ideas of clinical science: 17 principles that every mental health professional should understand. New York: Routledge.
Woodcock, R. W., McGrew, K. S., & Werder, J. K. (1994). Mini-Battery of Achievement: Examiner's manual. Chicago: Riverside.
Woodcock, R., McGrew, K., & Mather, N. (2001). Woodcock-Johnson III Tests of Achievement. Itasca, IL: Riverside.
Woodworth, R. S. (1919). Examination of emotional fitness for warfare. Psychological Bulletin, 16, 59–60.
Wortman, J., Lucas, R. E., & Donnellan, M. B. (2012, July 9). Stability and change in the Big Five personality domains: Evidence from a longitudinal study of Australians. Psychology and Aging, online publication.
Wrightsman, L., Nietzel, M., Fortune, W., & Greene, E. (2002). Psychology and the legal system (5th ed.). Pacific Grove, CA: Brooks/Cole.
Wulff, D. M. (1996). The psychology of religion: An overview. In E. P. Shafranske (Ed.), Religion and the clinical practice of psychology. Washington, DC: American Psychological Association.
Wundt, W. (1862). Die Geschwindigkeit des Gedankens. Gartenlaube, 263–265.
Yalisove, D. (2004). Introduction to alcohol research: Implications for treatment, prevention, and policy. Boston: Allyn & Bacon.
Yama, M. (1990). The usefulness of human figure drawings as an index of overall adjustment. Journal of Personality Assessment, 54, 78–86.
Yerkes, R. M. (1919). Report of the Psychology Committee of the National Research Council. Psychological Review, 26, 83–149.
Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, vol. 15.
Yuan, Y. (2002). Development of the norm for the Fagan Test of Infant Intelligence in a town near Changsha. Chinese Mental Health Journal, 16, 320–322.
Zapf, P., & Roesch, R. (1997). Assessing fitness to stand trial: A comparison of institution-based evaluations and a brief screening interview. Canadian Journal of Community Mental Health, 16, 53–66.
Zapf, P., & Roesch, R. (2009). Evaluation of competence to stand trial. New York: Oxford University Press.
Zapf, P., Skeem, J., & Golding, S. (2005). Factor structure and validity of the MacArthur Competence Assessment Tool–Criminal Adjudication. Psychological Assessment, 17, 433–445.
Zavala, A. (1965). Development of the forced-choice rating scale technique. Psychological Bulletin, 63, 117–124.
Zeidner, M., Roberts, R., & Matthews, G. (2008). The science of emotional intelligence: Current consensus and controversies. European Psychologist, 13, 64–78.
Zhai, F., Brooks-Gunn, J., & Waldfogel, J. (2011). Head Start and urban children's school readiness: A birth cohort study in 18 cities. Developmental Psychology, 47, 134–152.
Name Index
A Asnaani, A., 258
Aamodt, M. G., 330 Assel, M., 192
Abel, G., 171 Atkins-Burnett, S., 191
Abell, S. C., 197 Atkinson, L., 129
Achenbach, T. M., 171 Atlis, M., 381
Adams, G. A., 319 Austin, J. T., 329
Adams, W., 301 Axelrod, B. N., 306
Agbenyega, S., 352 Aylward, G., 186
Aiken, L., 224, 226, 227, 232
Ainsworth, A., 66, 67, 68 B
Ainsworth, M., 256 Bach, P., 308
Albers, C., 183 Backer, T., 232
Albert, M., 292 Baddeley, A., 128, 301, 302
Albert, S., 222 Bae, Y., 199
Alderman, N., 307 Baer, D., 246
Alfonso, V. C., 11 Baer, J., 268
Alkhadher, O., 148 Baer, R., 255
Allen, M. J., 94, 95 Bagby, M., 366, 370
Alles, B. F., 11 Bagby, R., 234
Allport, G., 217, 261, 262 Bailey, D., 197, 342
Allred, E., 172 Bailey, J., 86
Altepeter, T. S., 200 Baker, C., 187, 201
Amabile, T., 167 Balboni, G., 206
Ambrosini, P., 244 Balla, D., 204, 206
Amir, E., 386 Baltes, P., 176, 177
Ammirati, R., 386 Bandalos, D., 272
Anastasi, A., 74, 197 Bandura, A., 216, 239
Andersen, P., 333, 334 Bane, K. D., 261
Anderson, N., 148, 320 Baranek, G., 197
Anderson, P., 382 Barbee, A., 325
Andersson, H. W., 189 Barber, J., 334
Andreasen, N., 243, 283, 285, 286 Barden, C., 364
Andrew, D. M., 323, 323 Bardos, A., 198, 227
Andrews, F., 269 Barlaug, D., 179
Anglin, D., 180 Barnett, R. W., 142
Anhalt, R., 332 Barnett, W. S., 170
Ansorge, C. J., 143 Bar-On, R., 128, 275
Anstey, K. J., 291 Barr, H., 171
Anthony, J. C., 314 Barresi, B., 303
Anthony, J., 192 Barrett, L., 386
Aquilino, S. A., 132 Barrick, M., 317, 319, 320
Archer, P., 193 Barron, F., 267, 269
Aristotle, 29 Barry, A. E., 256
Armendariz, G., 246 Bartok, J., 306
Arnau, R. C., 242 Barton, M., 207
Arnkoff, D., 239 Bartrum, D., 276
Aronson, J., 23 Bartsch, A. J., 292
Aronson, M., 300 Bate, A., 123, 298
Arvey, R. D., 318, 319, 332, 360 Batey, M., 268, 269
Ash, S., 287 Batson, C. D., 263, 264
Asher, J., 326 Bauer, C. R., 171
Bausell, R. B., 74, 88, 92 Black, F. W., 303
Bayless, J. D., 206 Black, J. E., 168
Bayley, N., 182, 188 Blackwell, J., 129
Bazerman, M., 386 Blake, R. J., 324
Bebeau, M., 260 Blau, A. D., 300
Beck, A. T., 86, 90, 240, 242 Blau, T. H., 364, 367, 369
Beck, S. M., 213 Blin, Dr., 35
Behling, O., 323 Bloch, A., 334
Beirne-Smith, M., 115, 195 Block, J., 215
Belcher, M. J., 322 Blume-Marcovici, A. C., 220
Bell, L., 334 Blustein, D. L., 334
Bell, N., 322 Boake, C., 37, 300
Bell, S., 200, 201 Boccaccini, M., 372
Bellack, A. S., 238 Boden, M., 267
Bellak, L., 226 Boes, J., 237
Bellak, S. S., 226 Boggs, D. H., 13
Bellinger, C., 189 Boggs, K., 346
Bellinger, D., 172 Boisjoli, J. A., 270
Belsky, J., 386 Bolen, L. M., 164
Benbow, C., 343 Bond, L., 57
Bender, L., 304 Bonner, M. F., 287
Bennett, G. K., 147 Bonnie, R., 371
Bennett, R., 358 Boodoo, G., 175
Bennett, T., 296 Bookstein, F., 171
Ben-Porath, Y. S., 234, 236, 324, 383 Borgen, F., 342, 343
Benson, D. F., 287, 288 Boring, E., 30, 31, 32, 100
Benson, P., 265 Borman, W. C., 315, 318, 319, 321, 331, 332, 333
Benton, A., 297, 306 Borneman, M. J., 165
Beran, T., 185 Bornstein, M. H., 221
Berg, E. A., 308 Borren, I., 179
Berger, S. G., 382 Borthwick-Duffy, S. A., 203
Bergin, A. E., 262 Bos, C. S., 114
Bergstrom, B., 382 Boter, R., 202
Berk, L. E., 171 Bouchard, T. J., Jr., 167, 168, 175
Berk, R. A., 4, 70 Bowden, E., 269
Bernard, P., 188 Bowlby, J., 256
Bernreuter, R. G., 42 Bowling, A., 45
Bernstein, D., 386 Bowman, M., 37
Bernstein, I. H., 33, 70, 86, 192 Boyd-Wickizer, J., 313
Bernstein, M., 102 Bracken, B. A., 101, 197
Berry, C., 326 Brackett, M., 274
Berry, D. J., 184 Braden, J., 202
Berry, G. W., 121 Bradley, J., 308
Bersoff, D. N., 362 Bradley, K., 313
Bertrand, J., 171 Bradley, P., 91, 349, 352, 253
Bertua, C., 320 Bradley, R. H., 193, 194, 195
Best, K. M., 183 Bradley-Johnson, S., 201
Bevc, I., 129 Bradshaw, J. L., 288
Bialik, C., 334 Bramson, R., 242
Bickley, P. G., 111 Brannick, M. T., 327
Bigler, E., 295 Brannigan, G., 304
Bigley, S. E., 343 Brass, D. J., 327
Bilker, W. B., 146 Brauer, B., 202
Biller, A., 291 Brazelton, T. B., 181
Binet, A., 35 Brennan, J., 380
Bishop, G., 247 Brennan, R. L., 59, 66, 70
Bittman, M., 380 Brensinger, C. M., 146
Black, D., 243 Breslau, N., 169
Breuer, J., 210 Capraro, M., 251
Brewin, C., 172 Capraro, R., 251
Bridges, L. J., 182 Carlson, C. F., 145, 222
Bridges, M., 275 Carlson, J. S., 184
Briesen, P., 197 Carmin, C. N., 142
Brigham, C. C., 41 Carpenter, M. B., 284
Briskin, G. J., 304 Carroll, J. B., 110, 128
Britt, G., 181 Carroll, J. L., 224
Brodaty, H., 166 Carson, A., 186, 187
Brody, E. B., 109 Carson, S., 268
Brody, G. H., 109 Carter, C., 284
Broen, W. E., Jr., 237 Caruso, D., 274, 275
Bromberg, W., 34 Carver, C., 275, 276
Brooks, B., 300 Cascio, W. F., 316, 317, 359, 360
Brooks, M., 313 Casebourne, J., 334
Brooks-Gunn, J., 171, 174, 188, 195 Cashel, M., 244
Brown, A., 113 Caskie, G., 178
Brown, B. K., 318 Caspers, J., 14
Brown, D. W., 11 Caspi, A., 102, 257
Brown, G., 86, 205, 242 Castejon, J., 116
Brown, J., 45 Castelli, W. P., 213
Brown, L., 198 Cathers-Schiffman, T., 197
Brown, R. T., 161, 173 Cattell, J. McK., 28, 31
Brown, S. D., 338 Cattell, R., 44, 110, 177, 217
Brown, W. S., 225, 281 Cautela, J. B., 240
Bruininks, R., 204 Ceci, S., 169, 179
Buck, J., 43, 228 Chaffee, J. W., 29
Bufford, R., 265 Chamberlin, S., 156
Buis, T., 234 Chan, R., 298
Buntinx, W., 203 Chao, G. T., 317, 321
Burgess, P., 307 Charcot, J. M., 35
Burke, H. R., 146 Chase, C., 272
Burton, P., 305 Chastain, R. L., 173
Buschke, H., 300 Chelune, G., 133, 306
Buss, D. M., 217, 386 Chen, J., 250
Butcher, J. N., 44, 233, 234, 235, 236, 374, 381 Cherpitel, C., 313
Buxbaum, L. J., 380 Chiaravalloti, N. D., 295
Byrne, K. E., 193 Chibnall, J., 324, 382
Chilcoat, H., 169
C Chin, C., 133
Caldwell, B. M., 193, 195 Cho, S. H., 179
Callahan, L. A., 369 Choca, J. P., 383
Camara, W. J., 325 Choi, H., 10
Camilli, G., 170 Chudy, J., 226
Campbell, D. P., 342, 345 Chugh, D., 386
Campbell, D. T., 82 Chung, J., 300, 301
Campbell, D., 255 Cicchetti, D., 197, 204, 206
Campbell, J. P., 77, 89, 97, 329 Cirincione, C., 369
Campbell, J., 121, 155, 200, 313 Cirino, P., 133
Campbell, V., 227 Clark, D. A., 241
Campion, J. E., 319, 326 Clark, S., 298
Campion, M. A., 317, 318 Clarke, D., 148
Campione, J., 113 Clarkin, J. F., 255
Canfield, A. A., 54 Clarren, S. K., 172
Canivez, G., 133 Claud, D., 45
Cannell, J. J., 26 Clayton, B., 172
Cantor, J., 255 Cleary, T., 162
Cantu, R. C., 312 Cleckley, H., 80, 372
Clemans, W. V., 10, 12 Cunningham, M., 325
Clemence, A., 221 Cureton, E. E., 97
Cleveland, J. N., 328 Curran, H. V., 172
Clyne-Jackson, S., 20
Cockburn, J., 301 D
Coe, R., 179 Dahlstrom, L. E., 233, 236
Cohen, J., 284 Dahlstrom, W. G., 233, 236
Cohen, M., 128 Daley, T., 178
Cohen, S., 386 Damaye, M., 35
Colarelli, S., 319 Dana, R. H., 224
Colby, A., 259 Darnold, T., 319
Coldwell, J., 376 Dartnall, N., 197
Cole, J. C., 268 Darvill, T., 182
Cole, N. S., 84, 160 Das, J. P., 113, 131, 132
Collins, M. W., 311 Davidshofer, C. O., 317
Colom, R., 148 Davis, C., 201
Comrey, A. L., 107, 232 Davis, G. D., 18
Conn, H. O., 292 Davis, R., 236
Connelly, B. S., 165 Davison, M. L., 149, 261
Conners, C. K., 298 Dawes, R. M., 376
Conoley, C. W., 242, 374 Dawis, R. V., 337
Constantinides, P., 213 Dawson, A., 380
Constantino, G., 227 Dayan, K., 228
Conte, J., 275 de Bildt, A., 206
Conway, B. E., 102 de Oliveira, C., 288
Conway, J. M., 318, 319 Dean, D., 300
Cooper, L., 317 DeBusk, R., 216
Corkin, S., 286 Decker, S. L., 304, 305
Cornelius, S. W., 102 Delaney, H. D., 307
Coronado, V. G., 279 Delis, D. C., 303, 308
Corwyn, R., 195 DeLuca, J., 295
Cosden, M., 198 Delves, T., 172
Costa, P. T., Jr., 44, 218, 254, 255, 257, 323 Dembroski, T., 213
Costenbader, V., 146 DeNisi, A., 332
Court, J., 62, 145 Denison, F. C., 230
Cowdery, K. M., 44 DeRaad, B., 218
Coyle, J., 207 Deri, S., 43
Craig, H., 200 Detrick, P., 324
Craig, R. J., 237 Detterman, D. K., 101, 116, 152
Cramond, B., 272 Deutsch, G., 113, 288
Crandall, J. E., 81 DeVoy, J., 334
Crawford, J. D., 166, 298 Dey, A., 279
Crawford, J., 123 di Guiseppe, R., 227
Crawford, K., 330 Diamond, S., 30
Creed, P., 276 Dickens, S., 129, 366
Cripe, L., 305 Dickens, W., 175
Critchley, M., 305 Diego, M., 115
Cronbach, L. J., 64, 65, 74, 80, 85 Diener, E., 385
Cronshaw, S. F., 319 DiLalla, L. F., 189
Crook, G. M., 231 Dillon, R. F., 145
Crosby, R., 247 Ding, S., 149
Crum, T., 308 Dixon, C. E., 291
Crump, J., 251 Dixon, D. J., 184
Crystal, H., 300 Dixon, D., 366
Csikszentmihalyi, M., 250, 266 Dodds, J., 188
Cullen, M., 324 Dodge, K., 386
Cummings, N. A., 384 Dodrill, C., 307, 322
Cummings, R., 261 Doll, E., 204
Dolliver, R. H., 343 Esquirol, J. E. D., 34
Donahue, E. M., 258 Estes, W. K., 123
Donahue, M., 265 Evans, D. A., 292
Donders, J., 121, 122, 123, 133, 310 Evans, J. J., 111
Donlon, T. F., 90, 152 Evans, J., 290
Donnay, D., 342, 343 Evans, L., 231
Donnellan, M. B., 258 Evers, A., 148
Donovan, H., 323 Ewart, C. K., 216
Dowd, E. T., 382 Ewing, J., 313
Drakeley, R. J., 317 Ewing-Cobbs, L., 306
Drasgow, F., 379 Exner, J. E., Jr., 219
Drebing, C., 314 Eyde, L. D., 16, 18, 22
Drotar, D., 188 Eysenck, H. J., 231
DuBois, P. E., 22 Eysenck, M. W., 231
DuBois, P. H., 38, 44
Duker, J., 378 F
Dumont, R., 185 Factor, S., 284
Dunai, F., 150 Fagan, J. F., 188, 189
Duncan, B. L., 45 Fagan, T. K., 101
Duncan, G., 174 Fagiolini, A., 383
Dunn, J. A., 200 Faley, R. H., 360
Dunn, L. M., 200 Fancher, R., 31, 35, 37
Dunnette, M. D., 323 Faraone, S., 322
Durieux-Smith, A., 188 Farkas, M. R., 367
Dustin, S., 319 Farr, J. L., 328, 329
Dworkin, R., 121 Farrell, M., 197
Dyas, L., 261 Faschingbauer, T., 255
Dymond, R. F., 214 Faul, M., 279
Faust, D., 375
E Fein, D., 207
Eaker, E., 213 Feist, G., 267, 269
Eaton, N., 323 Fekken, G., 218
Ebbinghaus, H., 285 Feldman, R., 116
Eblin, J. J., 220 Feldstein, S., 314
Eccles, J., 280 Feldt, L. S., 59, 66, 70
Edwards, A. E., 93, 230 Fernando, M., 65
Eggerth, D. D., 337 Ferris, G., 332
Eifert, G., 239 Ferris, S. H., 314
Eisenstein, N., 133 Fidell, L. S., 104, 107
Ekstrom, R., 201 Field, T., 182
Elacqua, T., 319 Fields, J., 335
Elder, G., 256 Fineman, R., 172
Elfenbein, D., 259 Finholt, T., 255
Elliott, C. D., 184 Finn, S. E., 21, 86
Ellis, A., 240 Fiorello, C., 111, 185
Ellison, C., 264, 265 First, M., 244
Embretson, S., 59, 68, 69, 95, 129 Fischer, J., 265
Emmons, R., 276, 277 Fish, J. M., 174
Endicott, J., 244 Fisher, R. S., 282
Engelhart, C., 133 Fisher, S., 210
Enkelmann, H., 247 Fiske, D. W., 82, 216, 218, 318
Ensor, A., 198 Fitzgibbons, D., 332
Erard, R. E., 219, 364 Flaherty, B., 177
Erbaugh, J., 241 Flanagan, J. C., 330
Erdberg, P., 219 Flanagan, R., 227
Erickson, J., 265 Flavell, J., 113
Erlenmeyer-Kimling, L., 121 Florio, C., 222
Espinosa, M., 178 Floyd, R. G., 111
Floyd, R., 171 Garrick, T., 372
Flynn, J. R., 175, 178, 179 Gasser, C., 342
Fodstad, J. C., 208 Gasser, M., 149, 329
Foley, J., 213 Gast, J., 367
Folstein, M., 314 Gaudry, E., 230
Folstein, S., 314 Gazzaniga, M., 281
Fonseca, R., 288 Gdowski, C. L., 377
Forbey, J., 383 Geary, D. C., 164
Forrest, D. W., 43 Gelb, S., 38
Forster, A., 138 Genevro, J. L., 223
Fortune, W. H., 18, 363 George, C., 256
Fowler, R. D., 235 Georges, M., 260
Fox, H. M., 222 Gernert, C., 47, 81
Fox, S., 328 Geschwind, N., 287
Fradenburg, L., 246 Getz, I. R., 260
Frank, E., 242, 383 Gfeller, J., 382
Frank, G., 118 Ghez, C., 284
Frank, L. K., 218, 220, 249 Ghiselli, E. E., 77, 89, 97, 316, 321, 322
Franke, W., 29 Gibbon, M., 244
Frankenburg, W. K., 188, 193 Gibbons, R., 383
Frankl, V., 261 Gibbs, J., 259
Franklin, M. E., 240 Gifford, R., 328
Frauenheim, J. G., 138 Gilberstadt, H., 378
Frechtling, J. A., 57 Gill, N., 334
Frederickson, L. C., 197 Gilleland, K., 311
Frederiksen, N., 327 Ginsburg, D., 179
Fremer, J., 381 Giromini, L., 220
Freud, S., 210, 261 Glaesmer, H., 276
Frey, M. C., 152 Glascoe, F. P., 190, 193
Fridhandler, B., 211 Glassman, M., 195
Fried, Y., 333 Goddard, H. H., 22, 37, 273
Friedman, A. F., 231 Goffin, R. D., 230, 328
Friedman, M., 213 Goldberg, L. R., 44, 217, 257, 376
Friedman, T. L., 334, 338 Goldberg, P., 224
Friis, S., 240 Goldberger, A. S., 168
Fruchter, B., 54, 70 Golden, C., 308
Fuchs, D., 137 Goldenberg, D. S., 191, 192
Fuchs, L., 137 Golding, S., 371
Fuld, P. A., 300 Goldman, R., 306
Fuller, G. B., 224 Goldstein, A., 366
Funder, D., 386 Goldstein, I. L., 332
Funkenstein, H., 292 Goldstein, S., 131, 132
Fuqua, D. R., 340 Goleman, D., 275
Furnham, A., 251, 268, 269 Gonzalez, H. P., 14
Goodenough, F., 22, 31, 35, 39, 43, 197, 227
G Goodglass, H., 303
Gagliardi, C., 269 Goodman, D., 318, 319
Gall, F. J., 30 Goodman, J., 187, 188
Gallipo, P., 164 Gordon, M., 298, 258
Gallo, J. L., 294 Gorsuch, R. L., 103, 230
Galton, F., 28, 31, 42, 107 Goslin, D. A., 41, 42
Garb, H. N., 86, 219, 222, 364, 375 Gosling, S. D., 258
Garbin, M. G., 90 Gossage, J. P., 172
Garcia Coll, C., 195 Gothard, S., 366
Gardner, H., 114 Gottfredson, G. D., 343
Gardner, J., 224 Gottfredson, L., 321, 338
Gardner, R., 120 Gottman, J., 247
Garmezy, N., 183 Gough, H., 44, 91, 249, 252, 253, 324
Gould, S., 35, 38 Haney, T., 213
Gow, A. J., 177 Hannah, J., 202
Grace, J., 295 Hansch, E., 293
Graham, J., 233, 235, 236 Hansen, J. A., 146
Graham, P., 172 Hansen, J. C., 342, 346
Grande, T., 235 Hanson, G. A., 326
Gray, B., 314 Hanson, M., 318, 319
Gray, C. D., 103 Hardy-Braz, S., 202
Gray, J., 278 Hare, R., 372
Green, D., 366 Hargrave, G. E., 19, 85, 236, 254
Greenberg, R., 210 Harlow, S., 261
Greene, E., 18, 363 Harmon, L. W., 56
Greene, J., 92 Harowski, K., 308
Greenough, W. T., 168 Harpur, T., 372
Greenwald, B. D., 312 Harrington, D. M., 269
Greenwood, A., 372 Harris, B., 212, 367
Gregory, R. J., 6, 20, 22, 47, 60, 81, 117, 140, 147, 155, 165, 293, 307, 384 Harris, D., 197
Harris, M. M., 329
Greif, E. B., 259 Harrison, D. A., 329
Greve, K., 306 Harrison, P. L., 198
Grieve, A., 183 Harrison, R., 246
Grigorenko, E., 116 Harsveld, M., 149
Grossman, M., 287 Hart, K. J., 367
Grossman, S., 180 Hartel, C., 204
Groth-Marnat, G., 224, 254 Hass, S., 247
Grove, W., 222, 364, 377 Hastings, R., 376
Gruber, C., 237 Hathaway, S., 42, 44, 92, 233
Guaiana, G., 301 Hatton, D., 197
Guilford, J. P., 33, 54, 70, 111, 112, 216, 269, 271 Hautamaki, J., 116
Guion, R. M., 316, 318, 319, 321, 326, 329, 331 Haviland, M., 68, 67
Gulliksen, H., 87 Hawkins, K., 300, 322
Gunning, M. D., 230 Hawthorne, J., 181
Gutkin, R. B., 164 Hayes, P. A., 165
Guttman, L., 90 Hayes, S., 211
Gynther, M. D., 230 Haynes, S. N., 246
Gynther, R. A., 230 Heaton, R. K., 306, 307
Hebb, D., 290
H Heckerl, J. R., 138
Haager, D., 138 Hedge, J., 318, 319
Haaland, K. Y., 307 Heilbrun, A. B., Jr., 260
Hachinski, V. C., 293 Heilbrun, K., 369, 371
Hack, M., 188 Helms, J. E., 161
Hackett, G., 338 Helson, R., 256, 258
Haedt-Matt, A. A., 248 Henry, M., 211
Haertel, G. D., 114 Herbst, J. H., 257
Hagans-Murillo, K., 355 Herman, C. P., 248
Hagen, E., 143, 173 Hernandez-Reif, M., 182
Haiken-Vasen, J., 188 Herrnstein, R., 174
Hain, J., 304 Hersen, M., 238
Hakstian, R., 372 Hershberger, S., 59
Haladyna, T. M., 237 Herzberg, P., 276
Hale, J., 185 Hess, J. A., 208
Hallmark, R., 227 Hesse, E., 256
Hambleton, R. K., 57, 75, 93, 95 Hezlett, S., 155, 320
Hammeke, T., 308 Hiatt, D., 19, 85, 236, 254
Hammill, D., 130 Hickcox, M., 247
Hamsher, K., 297 Hickson, R., 240
Handler, L., 221 Higgins, D. M., 268
Higgs, M., 251 Hutchinson, M., 201
Highhouse, S., 327 Hutson, H., 180
Hill, B. K., 204, 205 Hutt, M., 304
Hill, P. C., 262 Hyne, S., 345
Hill, R., 267
Hilliard, A. G., 161 I
Hintze, J., 245 Iacono, W., 167
Hiskey, M. S., 198 Ilgen, D., 315
Ho, W., 301 Iliff, L., 293
Hoekstra-Vrolik, S., 202 Iliffe, S., 276
Hoepfner, R., 270 Inwald, R., 324
Hofer, S., 177 Irvin, J. A., 343
Hoffart, A., 240 Itard, J., 34
Hoffman, F. J., 138 Ivcevic, Z., 268
Hofmann, S. G., 239, 258 Iverson, G., 310, 311
Hogan, A. E., 171
Hogan, J., 324 J
Hogan, R., 324 Jaberg, P. E., 184
Hoge, C. W., 291 Jackson, A., 195
Hoge, D. R., 261 Jackson, D. N., 141, 163, 230, 270, 272, 374
Hoge, S., 371 Jackson, R., 244
Holdnack, J., 300 Jacobson, J., 189
Holland, J. L., 335, 343 Jacobson, S., 189
Hollander, E., 207 Jako, R., 318, 319, 332
Hollingshead, A., 193 James, W., 262
Hollingworth, H., 39 Janicki-Deverts, D., 386
Hollingworth, L., 39 Jankowski, D., 237
Holmes, T., 76 Jarman, R. F., 131
Holtzman, W. H., 221 Jennett, B., 88, 289
Holzinger, K. J., 103 Jensen, A. R., 92, 99, 108, 120, 145, 159, 161, 165, 168, 170, 173, 175
Homola, G., 291
Hood, R. W., 262, 263 Jessell, T. M., 282, 283
Hooker, S. A., 262 Jiggetts, J., 352
Hooper, S., 197 John, O. P., 258
Hoover, H. D., 4, 55 Johnson, C., 275
Hope, D. A., 239 Johnson, K. A., 200
Horn, J., 110, 111, 177 Johnson, R. C., 31
Horton, A., 296 Johnson, W., 177, 381
Hough, L. M., 323 Johnston, D. W., 239
Howell, R. J., 370 Johnston, M. H., 164, 221
Howieson, D., 126, 288, 295 Johnston, N., 164, 230, 328
Hoyer, J., 276 Jones, K. L., 171
Huang, C., 195 Jones, K., 334
Hubley, A., 295, 314 Jorm, A. F., 291
Huffcutt, A. I., 318, 319 Juan-Espinosa, M., 148
Hufford, M., 247 Judge, T., 332
Hughes, J. L., 79 Julian, E., 155
Hulin, C. L., 329 Jung, C. G., 43
Hull, J., 255 Jung-Beeman, M., 269
Hultsch, D. F., 205
Humphreys, L., 162 K
Hunsberger, B., 263 Kaemmer, B., 236
Hunsley, J., 86, 384 Kahana, M., 386
Hunter, J. E., 79, 149, 166, 316, 321 Kahn, M. W., 222
Hunter, M. A., 205 Kaiser, H. F., 64
Hunter, R. F., 316, 321 Kalat, J., 286
Hurtz, G., 323 Kalberg, W. O., 172
Hutaff-Lee, C. F., 367 Kalemba, V., 234
Kamin, L. J., 168 Klimoski, R. B., 97, 315, 358
Kamp, J., 323 Kline, P., 59, 81, 86, 320, 321
Kanaya, T., 179 Klinedinst, J. K., 237
Kandel, E. R., 282, 283, 287, 288 Klove, H., 307
Kane, R. L., 296 Koenig, A., 201
Kapes, J., 340 Kohlberg, L., 259, 260
Kaplan, E., 303, 308 Kolb, B., 138, 286, 289
Kapuscinski, A. N., 262 Kolen, M. J., 4, 55
Karr, C., 236 Kolevzon, A., 207
Kasten, R., 328 Kolts, R., 277
Kaufman, A. S., 87, 115, 120, 124, 126, 132, 173, 173, 177 Koluchova, J., 169
Kaufman, J. C., 101, 113, 132, 268, 270 Koppitz, E., 304
Kaufman, N., 87 Kornblith, S. J., 242
Kausler, D., 176 Koss, E., 292, 306
Keel, P. K., 248 Kostrubala, C., 202
Keiser, S., 358 Kraft, R., 382
Keith, L., 200 Kraijer, D., 206
Keith, T. Z., 111 Kramer, J., 414
Keller, R., 330 Kramer, R., 259
Kelley, T. L., 42 Krikorian, R., 303
Kelly, E. L., 318 Kristjansson, B., 295, 314
Kendall, L., 330 Krohn, E., 129
Kendrick, S., 162 Krokoff, L., 247
Kenna, A., 334 Krug, S., 18
Kennedy, C., 311 Krugman, M., 228
Kennedy, W. A., 173 Krumboltz, J. D., 338, 339, 340
Kennelly, K. J., 15 Kuder, G. F., 44, 64
Kent, G. H., 43 Kuehnle, K., 224
Kentle, R. L., 258 Kula, M., 222
Keene, R., 205 Kunce, C., 317
Kerr, B., 269 Kuncel, N., 155, 165, 320
Ketron, J. L., 102 Kunz-Ebrecht, S., 276
Khaleefa, O., 146 Kupfer, D., 383
Kiecolt-Glaser, J., 386 Kupfermann, I., 286
Kifer, E., 153 Kurtines, W., 259
Kilinc, E., 199 Kurzon, C., 318
Killian, G. A., 228 Kuskowski, A., 293
Kim, K. H., 271, 272 Kwate, N., 161
Kim, L. I., 165
Kim, W. J., 165 L
Kim, Y., 242 La Rue, A., 285, 292, 293
Kimbrough, W., 330 Laatsch, L., 383
Kinder, E., 306 LaBarbera, D., 344
King, K., 246 Lachar, D., 235, 237, 377
Kinicki, A., 334 Lacks, P., 304
Kinnear, P. R., 103 Lah, M. I., 223
Kinsbourne, M., 297 Lai, T. C., 298
Kirby, J., 131 Lambert, N. M., 207
Kirby, K., 308 Lamp, R., 129
Kirby, N., 189 Landfield, K., 386
Kirk, J. W., 367 Landy, F., 318, 328, 329
Kirkish, P., 372 Lane, S., 157
Kirkpatrick, L., 263 Lang, P. J., 240
Kite, E., 36 Lansdown, R., 172
Kitson, D., 200 LaPiana, W. P., 156
Klebanov, P., 174 Larrabee, G., 296
Kleiman, L. S., 361 Larsen, G., 278
Klieger, D. M., 240 Larson, G. E., 150, 321
Larson, L., 342 Low, W. Y., 231
Lassiter, K., 198, 201, 322 Lowe, P., 14, 162
Latham, G. P., 319 Lozano, V. J., 367
Lau, B. C., 311 Lubart, T., 267
Lazowski, L., 313 Lubinski, D., 343
Lebow, B., 376 Lucas, R. E., 258
LeBuffe, P. A., 183 Luckasson, R., 303
Ledbetter, M., 124, 265 Ludwig, K., 33
Ledesma, H., 133 Lukin, M. E., 382
LeDoux, J. E., 281 Lunz, M., 382
Lee, K. H., 179 Luria, A., 112, 288, 308
Lee, M. S., 142 Lushene, R. E., 230
Lee, S. W., 14 Lykken, D., 168
Lefcourt, H. M., 216, 278 Lynch, E. M., 121
Lehman, R. A., 76, 307 Lynn, R. L., 146, 175, 179
Leiter, R. G., 124
Leland, H., 207 M
Lent, R. W., 338 MacDougall, J., 213
LeResche, L., 314 Machover, K., 43, 227
Lesser, R., 282 Mack, J., 306
Lester, B. M., 181 Mackenzie Ross, S. J., 172
Levashina, J., 317 MacMurray, B., 188
Leverett, J., 322 MacPhillamy, D. J., 243
Levin, H., 306 Maddi, S. R., 214
Leviton, A., 172 Maddux, C., 261
Levitt, T., 310 Magoun, H. W., 284
Levy, D., 221 Mahoney, M., 239
Lewinsohn, P. M., 227, 242 Main, M., 256
Lewis, J. F., 56 Majnemer, A., 182
Lewis, M., 188 Mak, M., 301
Lezak, M., 123, 126, 289, 291, 295, 300, 303, 305, 306, 307 Malgady, R. G., 227
Licht, E., 295 Malizia, K., 188
Lichtenberg, P., 300 Malizio, A., 158
Lichtenberger, E., 118, 124 Malloy, P. F., 295
Lieberman, M., 259 Malo, S., 319
Lien, M. T., 184 Maloney, M., 195
Likert, R., 90 Man, D., 301
Lilienfeld, S., 222, 364, 386 Manly, T., 298
Lin, Y., 14 Manning, W. H., 163
Lindal, E., 314 Manto, M., 284
Lindenberger, U., 244 Marcus, D. K., 172
Lindzey, G., 219 Mardell, C., 191, 193
Linn, R., 157 Marnic, L. R., 305
Lipsitz, J. D., 121 Martell, D. A., 369
Lishman, W. A., 172 Martin, J. C., 171
Liskow, B., 313 Martin, R., 278, 386
Little, S., 227 Martin, S., 379
Litz, B. T., 258 Martin, T., 44, 257
Lofquist, L. H., 337 Martuza, V. R., 74
Loftus, E., 386 Mash, E. J., 384
Loh, C. S., 231 Masling, J., 221
Lohman, D., 143, 145 Masten, A. S., 183
Longstaff, H. P., 323 Masters, K. S., 262
Lonky, E., 182 Masur, D. M., 300
Lopez, S., 266, 276, 385 Matarazzo, J., 18, 126, 383
Lord, F. M., 59, 68, 95 Mathias, J., 123, 298
Loring, D., 123 Matson, J. L., 207, 208
Lovell, M. R., 311, 312 Matthews, G., 275
Matthews, T., 201 Meloy, J. R., 366
Matthews-Morgan, J., 272 Melton, G. B., 20, 350
Mattingley, J. B., 288 Mendelsohn, M., 241
Matto, H. C., 132 Mendez, M., 295
Maurer, S. D., 319 Mendoza-Denton, R., 218
Mausbach, B. T., 220 Menzies, G., 165
May, P. A., 172 Mercer, J. R., 56
Mayer, J. D., 209, 268 Merenda, P. F., 232
Mayer, J., 251, 273, 275 Messiah, A., 313
Mayers, L., 311 Messick, S., 74, 80, 85, 270, 272
Mayeux, R., 287 Mettelman, B. B., 298
Mazer, B., 182 Mevarech, Z., 114
McAllister, T. W., 252, 290 Meyer, G. J., 219, 220
McBratnie, B., 305 Michael, W. B., 64
McCall, R., 3, 187 Middleton, H., 205
McCallum, R. S., 197 Miele, F., 164
McCaulley, M. H., 44, 249, 250 Mihura, J. L., 219
McClearn, G. E., 168 Milkman, K., 386
McCloy, R., 323 Milla, S., 246
McCord, D., 121 Miller, F., 314
McCoy, B., 30 Miller, G., 385
McCrae, R. R., 44, 218, 254, 255, 257, 260, 323 Miller, H. L., 220
McCullough, M., 276 Miller, I., 196
McDonald, A., 370 Miller, L., 115
McDonald, R. P., 86 Miller, N. M., 216
McDowell, I., 298, 314 Miller, S. D., 45
McGee, R., 298 Miller, W., 314
McGlynn, F. D., 240 Millman, J., 92
McGrath, R., 222 Millon, T., 236
McGrath, S. K., 188, 189 Mills, C., 146, 381
McGreevy, M., 369 Milner, B., 285, 289
McGrew, K. S., 110, 111 Minderaa, R., 206
McGue, M., 168 Mintun, M., 284
McGurk, D., 291 Mirsky, A., 298
McGurk, F., 161 Mischel, W., 218
McHugh, P., 314 Mitchell, T. W., 97
McKeachie, W., 14 Mitchell, V., 256, 257
McKee, A. C., 312 Moberg, D., 264
McKee-Ryan, F. M., 334 Moberg, P., 379
McKenzie, R., 79 Mock, J., 241
McKey, R. H., 9 Molteni, M., 206
McKinley, J. C., 42, 44, 92, 234 Monahan, J., 371
McLean, C. P., 258 Montague, M., 114
McLean, J. E., 118, 120, 173 Montie, J., 189
McMillan, D., 376 Moore, E., 23
McNamara, W. J., 79 Moore, J., 311
McNeish, T., 227 Moore, R. C., 220
McNulty, J., 236 Moore, W. P., 26
McPherson, L., 372 Moreland, K. L., 374, 378, 381
McReynolds, P., 33 Moreno, K. E., 149, 151
Mead, A. D., 44 Morgan, C. D., 43, 224
Meagher, M. W., 242 Morgeson, F. P., 317
Mednick, M., 267 Mori, L., 246
Mednick, S., 267, 268 Morris, M., 341
Meehl, P., 80, 233, 375, 376 Morrison, M. W., 154, 307
Meichenbaum, D., 239 Morrison, M., 63
Meier, V. J., 239 Morrison, T., 154
Meisels, S., 191 Morrow, C., 182
Mortimer, A., 301 Nihira, K., 206
Mortimer, J., 293 Nijenhuis, J., 148, 149
Moruzzi, G., 284 Nilsen, D., 345
Moss, P. A., 84, 160 Nimmo-Smith, I., 297
Motowidlo, S. J., 319 Nisan, M., 260
Motta, R., 227 Nisbett, R. E., 168
Mount, M., 317 Nolan, K. P., 327
Mountain, M., 306 Nolan, R., 213
Moutafi, J., 251 Norris, G., 307
Muchinsky, P. M., 323, 326, 330, 331 Norris, M. P., 242
Muldrow, T., 79 Novick, M. R., 59, 64, 68, 95
Mulick, J., 203 Nowinski, C. J., 312
Mumford, M. D., 316 Nugent, J., 181
Mundfrom, D., 195 Nunnally, J., 33, 60, 70, 86, 93, 95, 192
Mungas, D. M., 230 Nussbaum, D., 370
Munoz, R. F., 242
Mur, J., 148 O
Murphy, K. R., 150, 317, 328, 331, 332 O’Neill, J., 189
Murphy, R., 179 Oakes, L., 386
Murray, C., 174, 224, 230 O’Brien, K., 291
Murray, H., 43 Ochse, R., 267, 269
Murrie, D., 372 Odle-Dusseau, H. N., 325
Myers, B., 181 Oei, T., 231
Myers, I. B., 44, 249, 250 Offer, D., 249
Myers, T., 204 Ogard, E., 236
Myrtek, M., 13 Ogg, J. A., 183
Ogloff, J., 372
N O’Hara, M. W., 242
Naglieri, J., 113, 131, 132, 183, 198, 200, 227 Oldham, G., 327
Nagy, G., 258 Ollendick, T. H., 240
Narvaez, D., 260 Olsen, B., 240
Naugle, R. I., 133 Olson, G., 255
Naumann, L. P., 258 Olson-Buchanan, J., 379
Naveh-Benjamin, M., 14 Ones, D. S., 155, 255, 325, 330
Needleman, H., 9, 172 Ortner, T. M., 383
Neisser, U., 175 Ortner, T., 14
Nelson, C., 376 Osborne, R. T., 120
Nelson, V., 187 Oswald, F., 329
Nesselroade, J. R., 168, 176 Otis, A. S., 40
Nester, M. A., 358 Ottinger, R., 318
Nestor, P. G., 256 Otto, R., 371
Nettelbeck, T., 178, 189 Owen, D., 179
Netter, B., 222 Owens, W. A., 316
Neumann, C., 372 Ownby, R. L., 21
Nevo, B., 75, 98
Newburger, J., 189 P
Newcomer, P., 134 Pagano, J., 182
Newland, T., 201 Palmer, S., 358
Newman, J. L., 340 Paloutzian, R., 264, 265
Newsome, S., 275 Pandolfo, M., 284
Nezworski, M., 86, 222 Panigua, F., 22
Ngari, S., 146 Paolitto, A., 132
Niaz, U., 314 Pardaffy, V. A., 331
Nichols, T., 284 Pardini, J., 311
Nickel, E., 313 Pargament, K. I., 262
Nieberding, R., 227 Park, N., 386
Nietzel, M. T., 18, 363 Parker, J. D., 128
Nieuwenhuis-Mark, R. E., 314 Parmelee, W. M., 224
Parsons, F., 335 Pluess, M., 386
Parsons, T. D., 380 Pogge, D., 222
Patience, W., 158 Pohlmann, J. T., 145
Patterson, C., 172 Polivy, J., 248
Patterson, M., 235, 306 Pollack, R. H., 35
Patterson, T. L., 220 Pollard, R., 202
Pattie, A., 177 Pollens, R., 305
Patton, J., 115, 195 Poortinga, Y. H., 165
Patton, W., 276 Pope, K., 21
Paty, J., 247 Popham, W. J., 57
Paul, J., 63, 307 Porter, R., 150
Paul, L. K., 225, 281 Porteus, S., 22, 116, 306
Paulhus, D., 211 Potenza, M., 381
Paulman, R. G., 15 Potter, E., 324
Paulsen, J. S., 295 Potter, J., 258
Pawlow, L. A., 242 Powell, B., 313
Payne, A. F., 43 Powell, S., 313
Payne, J., 115, 195 Powers, D., 155
Pearlson, G., 300 Powers, K., 255
Pearson, K., 31 Poythress, N., 20, 371
Pedersen, N. L., 168 Prentky, R., 267
Pedrabissi, L., 206 Prewett, N., 133
Peery, S., 367 Prieto, M., 116
Penfield, W., 282, 290 Prifitera, A., 300
Pennebaker, J. W., 225 Primerano, D., 111
Pepple, J., 322 Primhoff, E. S., 16
Peretz, H., 333 Proctor, T., 10
Perry, J. C., 211, 212 Prout, H., 201
Perry, J., 381 Puhlik-Doris, P., 278
Perugini, M., 218 Purish, A., 308
Pervin, L. A., 217 Pursell, E. D., 318
Peters, L., 332 Pyle, W. H., 39
Petersen, D., 246
Petersen, N. S., 55 Q
Peterson, C., 275, 386 Qu, P., 198
Peterson, D., 323 Quek, K. F., 231
Peterson, J. B., 268 Quiroga, M., 148
Peterson, P., 308
Petrila, J., 20 R
Pettibone, J. C., 242 Rafferty, J. E., 223
Pettigrew, G., 364 Ragosta, M., 358
Pfeiffer, S. J., 200, 227 Rahe, R., 76
Phelps, L., 197, 198 Ramey, C. T., 170
Phillips, S. E., 358 Ramey, S., 170
Piaget, J., 259 Ramos, E., 11
Piedmont, R. L., 254, 265 Randels, S., 172
Piersma, H., 237 Ranseen, J., 255
Piirto, J., 269 Rappoport, L., 189
Pilkonis, P. A., 242 Rasch, G., 68, 69, 95
Pintner, R., 39 Raven, J. C., 62, 110, 145
Piotrowski, C., 304 Raven, J., 62, 145
Pipes McAdoo, H., 195 Raymark, P. H., 325
Pirozzolo, F., 293 Razack, A. H., 231
Pittenger, D., 252 Ready, R. E., 295
Plaisted, J., 308 Reddon, J. R., 230
Plake, B., 374, 382 Redick, T. S., 311
Plaud, J. J., 239 Redlich, F., 193
Plomin, R., 168, 189 Ree, M. J., 321
Reese, H., 176 Roth, P. L., 319


Reglade-Méslin, C., 291 Roth, P., 325
Rehm, L. P., 242 Rothstein, M., 230, 328
Reihman, J., 182 Rotter, J. B., 215, 223
Reilly, K., 310 Rounds, J. B., Jr., 341
Reilly, R. R., 317, 321 Rowland, K., 332
Reinecke, M. A., 239 Rubenzer, S., 255
Reise, S., 66, 67, 95 Rubin, M., 275
Reitan, R., 62, 284, 288, 291, 296 Rubin, S. E., 240
Reiter-Palmon, R., 270 Ruddock, M., 182
Reppermund, S., 166 Rue, D. S., 166
Reschly, D. J., 204, 245 Rule, W. R., 231
Rescorla, L. A., 171 Rundmo, T., 213
Rest, J. R., 259, 260 Rushton, P., 173, 175
Revell, A., 178 Russo, J., 89
Rey, A., 300 Ryan, A., 325
Reynolds, C. F., 242 Ryan, J. J., 83
Reynolds, C. R., 120, 161, 162, 164, 165, 173 Ryan, J., 300, 342
Rhodes, L., 329 Ryan, M., 323
Richards, L., 370 Ryan, R. M., 225
Richards, P. S., 180, 261, 262
Richardson, M. W., 65 S
Richmond, J., 193 Sabshin, M., 249
Ridgeway, V., 297 Saccuzzo, D. P., 146
Rinas, J., 20 Sackett, P. R., 165, 324, 325, 326
Ritter, N., 199 Sadock, B., 249
Ritzler, B. A., 225, 364 Sadock, V., 249
Rizzo, A. A., 380 Saenz, A. L., 162
Roberson, G., 306 Sala, F., 275
Roberts, B. W., 257, 258 Salgado, J., 320
Roberts, J., 197, 313 Salovey, P., 251, 273, 274, 275
Roberts, R. J., 306 Salvia, J., 135, 197, 356
Roberts, R., 275 Samelson, F., 41
Robertson, G., 18, 134 Sanderson, C., 255
Robertson, I., 298, 316, 319, 420 Sanderson, M., 172
Robins, D. L., 207 Sandford, J., 298
Rock, S. L., 194 Sarason, I., 15, 298
Roebuck-Spencer, T., 311 Sashidharan, T., 242
Roesch, R., 371 Sattler, J., 10, 12, 15, 21, 22, 72, 83, 133, 173, 185, 197, 245
Roese, N., 386 Saul, R. E., 295
Rogers, B., 159 Saulle, M., 312
Rogers, C. R., 214 Savickas, M. L., 338, 346
Rogers, R., 234, 244, 366, 369, 370 Scarr, S., 146, 170, 184, 352
Rogler, L. H., 227 Schachtitz, E., 323
Roid, G., 128, 129, 186, 196, 381 Schaie, K. W., 82, 109, 176, 178, 256
Rojahn, J., 132, 203 Schalock, R. L., 203
Ropacki, M., 310 Schatz, P., 311
Rorschach, H., 43, 219 Schaubhut, N., 341
Rosanoff, A. J., 43 Schaubroeck, J., 329
Rose, M. P., 240 Scheier, M., 275, 276
Rosenberg, S., 300 Schell, A., 172
Rosenfarb, I. S., 220 Scherer, I. W., 234
Rosenfeld, B., 366 Scherer, L., 288
Rosenman, R., 213 Schermerhorn, S. M., 11
Ross, J., 262 Scheuneman, J. D., 164
Ross, T., 300 Schiebel, D., 232
Rossi-Casé, L., 179 Schiller, J., 279
Rosvold, H., 298 Schlegel, R., 311
Schmidt, F. L., 80, 166, 318, 321, 330 Skarlicki, D., 319
Schmidt, K. S., 294 Skeels, H. M., 169
Schmitt, N., 64, 316, 317, 319 Skeem, J., 371
Schneider, D. L., 325 Skinner, B. F., 215, 239
Schock, H., 198 Sliwak, R., 324
Schoenberg, M., 300 Sliwinski, M., 177
Schoenrade, P., 263 Slobogin, C., 20
Schroeder, K., 372 Smedslund, G., 213
Schroeder, M., 372 Smith, A., 306
Schroffel, A., 327 Smith, D. W., 171
Schubert, D., 235 Smith, G., 386
Schuldberg, D., 382 Smith, H., 307
Schulein, M., 308 Smith, J., 265
Schuler, M., 182 Smith, L., 265
Schutt, R. K., 256 Smith, M., 172, 320
Schwartz, J. H., 282, 283 Smith, P. C., 330
Schwartz, J., 201 Smither, R. D., 329, 332, 333
Schweid, E., 188 Smyth, J., 247
Sciarrino, J. A., 327 Smyth, K., 306
Scott, B. M., 367 Snitz, B., 376
Scott, K. G., 171 Snow, W., 306, 308
Scullin, M., 179 Snyder, C. R., 266, 267, 276, 385
Seashore, H., 6, 147 Solomon, J., 256
Segal, N., 168 Song, J., 306
Segall, D. O., 149, 151 Song, J., 306
Seguin, E., 34 Song, Z., 334
Seidman, L., 322 Sonne, J. L., 293
Seligman, M., 250, 266, 386 Sontag, L. W., 187
Sewell, K., 366, 370 Soto, C. J., 256, 258
Shaffer, M., 329 Sparks, J., 45
Shapiro, E., 245, 246 Sparrow, S., 204
Shapiro, H., 190 Spearman, C., 42, 59, 108, 109
Sharkey, K. J., 225 Specht, J., 258
Shaw, S., 136 Sperry, R., 281
Shayer, M., 179 Spielberger, C. D., 231
Sheldon, K., 138 Spielberger, C., 14
Sheldon, W., 213 Spitzer, R., 244
Shepherd, P. A., 189 Spokane, A., 342
Sherbenou, R. J., 198 Spreen, O., 45, 138, 297, 300, 303
Sherman, E., 45, 300, 303, 366 Springer, S. P., 113, 288
Sheslow, D., 301 Spurzheim, J., 30
Shiffman, S., 247 Sreenivasan, S., 372
Shoda, Y., 218 St. Laurent, C., 222
Shogren, K., 203 Stafford-Clark, D., 210
Shurrager, H. C., 201 Stanley, J. C., 59
Shurrager, P. S., 201 Steadman, H., 369
Siegler, I. C., 257 Steele, C. M., 23
Siegman, A. W., 15 Steer, R. A., 86, 90, 242
Sigman, M., 178 Steers, R. M., 329
Silver, J. M., 290 Stefansson, J., 314
Silverstein, A., 206 Stein, L., 236
Silvia, P. J., 270 Stein, S., 273
Simon, J. R., 13 Stein, T. D., 312
Singer, M., 189, 200 Steinweg, D. L., 313
Sipps, G. J., 121 Stejskal, W., 222
Sisson, E. D., 331 Stenner, A. J., 158
Sitarenios, G., 274, 275 Stephenson, W., 214
Sivan, A., 297, 303 Steptoe, A., 276
Stern, R., 310 te Nijenhuis, J., 179


Sternberg, R. J., 101, 102, 115, 116, 154, 267, 268, 269 Teare, J. F., 201
Stevens, S. S., 87, 213 Teasdale, G., 88, 289
Stewart, A. J., 225 Teasdale, T., 179
Stewart, G., 319, 320 Teichner, G., 308
Stewart, P., 182 Tellegen, A., 168, 233, 236
Stockley, C. J., 230 Temple, R., 310
Stokes, G., 316, 317 Templeton, A. R., 174
Stokes, J., 222 Teng, S., 29
Stone, B. J., 145 Terman, L. M., 39, 101
Stone, T., 277 Terrell, F., 14, 23
Storandt, M., 293 Terrell, S., 14, 23
Stout, J. C., 295 Thase, M. E., 242
Strand, J., 240 Thoma, S. J., 259, 260
Strauss, E., 45, 205, 300, 303 Thomas, D., 382
Strayhorn, J. M., 172 Thomas, J. L., 291
Strayhorn, J. C., 172 Thomas, M., 277
Streissguth, A., 171 Thompson, C., 226
Strong, E. K., 32, 44, 341 Thompson, L. A., 189
Stroop, J. R., 131 Thompson, M., 197
Strub, R. L., 303 Thompson, R. W., 201
Strutt, A. M., 367 Thompson, R., 341
Stuck, A., 314 Thompson, T., 340
Sullivan, M. W., 189 Thorndike, E. L., 3, 32, 44, 101, 273, 331
Summers, B., 146 Thorndike, R. L., 173
Sundet, J., 179 Thurstone, L. L., 42, 90, 109, 147
Super, C. M., 338 Thurstone, T. G., 42, 109
Super, D. E., 338 Tieu, P. G., 367
Susser, E., 169 Tissot, S., 146
Svien, L., 164 Tobin, M., 227
Sweeney, J., 296 Tombaugh, T., 295, 311, 367
Swider, B. W., 320 Tomkins, S. S., 224
Swineford, F., 103 Tong, E., 247
Symons, D., 298 Tonsager, M. E., 21, 86
Sytema, S., 206 Torjussen, T., 179
Szondi, L., 43 Torrance, E. P., 271
Tranel, D., 295
T Trautwein, U., 258
Tabachnick, B. G., 104, 107 Traver, M. D., 231
Taber, B., 342 Traxler, A. E., 12, 13
Taddei, S., 132 Tree, H. A., 83
Tai, D., 279 Treffert, D. A., 115
Talkington, J., 242 Trefflinger, D., 272
Tallent, N., 20 Trinidad, D., 275
Talley, J., 306 Trontel, E. H., 230
Tambs, K., 179 Tröster, A., 285
Tamkin, A. S., 234 Trull, T. J., 218, 254
Tan, J. E., 205 Trumbo, D., 318
Tanner, B. A., 381 Tsai, L., 314
Tasbihsazan, R., 189 Tsang, J., 276
Tasto, D. L., 240 Tsatsanis, K., 197
Tate, R. L., 294 Tsuang, M., 314, 322
Tate, R., 307 Tucker, G., 133
Tausczik, Y. R., 225 Tulsky, D., 121, 124
Taylor, C. B., 216 Tureck, K., 208
Taylor, C. J., 14 Turk, A. A., 225
Taylor, G., 188 Turkheimer, E., 168
Taylor, J., 14, 23 Turner, A., 298
Turner, D., 372 Walsh, B. D., 340


Tyson, P., 301 Walsh, W. B., 344
Tzeng, O., 250, 242 Walters, R. H., 216
Walton, K. E., 257
U Wanberg, C., 334
Uematsu, S., 282 Wanek, J., 324
Ulleland, C. N., 171 Wang, J., 147, 177
Ulmer, D., 213 Wang, M. C., 114
Ulrich, L., 318 Ward, C. H., 241
Urquhart-Hagie, M., 164 Ward, M., 195
Useda, J., 218, 254 Ward, T., 297
Ward, W., 381
V Ware, R., 250
Vagg, P. A., 14 Warner, M. H., 322
Vagg, P. R., 230 Washington, J., 200
Vaillant, G., 211 Wasserman, J., 197
Vale, C. D., 382 Wasylkiw, L., 218
Van de Riet, V., 173 Watkins, C., 227
Van de Vijver, F., 149, 165 Watkins, P., 277
Van der Flier, H., 149 Watson, B., 198
Van Gorp, W., 308 Watson, C. B., 382
Van Iddekinge, C. H., 325 Watson, J. B., 267
Vance, B., 200 Watson, P., 298
Vandehey, M. A., 333, 337 Watz, L., 197
VanderVeer, B., 188 Weatherman, R., 204
Vangel, S., 300 Weber, K., 171
Varma, A., 332 Webster, D., 293
Varney, N., 297, 306 Wechsler, D., 11, 14, 39, 65, 72, 118, 120, 125, 176, 185, 299
Vaughn, S., 138 Weinberg, R. A., 170
Vautier, S., 230 Weinberger, L., 372
Ventis, W., 263 Weiner, I., 219, 224, 315
Vernon, M. C., 11 Weiner, W., 284
Vernon, P. A., 107, 230, 386 Weingardner, J., 230
Vernon, P. E., 109, 169 Weinstein, H. P., 254
Viechtbauer, W., 257 Weir, K., 278
Viglione, D. J., 219, 222, 366 Weis, G. M., 184
Villa, S., 206 Weiss, D. J., 382
Villanova, P., 329 Weiss, D. S., 223
Vince, J., 380 Weiss, R. A., 366
Vincent, A., 311 Weller, C. E., 335
Viswesvaran, C., 330 Welsh, G. S., 233, 236
Vogt, A., 307 Werdel, M., 265
Volpe, R., 245 Werder, J., 134
Von Korff, M., 314 Wertheimer, M., 268
Vosler-Hunter, W., 265 Wesman, A. G., 92, 147, 162
Vosvick, M. A., 339 Westbrook, B. W., 261
Whaley, S., 178
W Whipple, G. M., 141
Wagner, R., 318 Whishaw, I. Q., 138, 286
Wainer, H., 282 White, J. C., 173
Walberg, H. J., 114 White, T., 309
Wald, M. M., 279 Whitehouse, P., 306
Waldfogel, J., 171 Whiteside, L., 195
Walker, C., 292 Whitney, D., 158
Walker, S., 372 Whitworth, R. H., 164
Wallace, C. S., 168 Widiger, T., 218
Wallas, G., 268 Wielgosz, A., 213
Wallbrown, F. H., 142 Wiemann, S., 326
Wiesner, W. H., 319 Woodward, K., 277


Wigert, B., 270 Woodworth, R. S., 32, 42
Wiggins, J., 217 Worth, H., 313
Wilkins, J., 207 Wortman, J., 258
Wilkinson, G. S., 134 Wright, B., 382
Williams, B., 213 Wright, L., 213
Williams, C. L., 235 Wrightsman, L. S., 18, 363, 368
Williams, K. M., 132 Wulff, D. M., 262, 263
Williams, M., 287 Wundt, W., 30, 31
Williams, R. E., 328 Wypij, D., 189
Williams, W., 154
Williamson, D., 319 X
Williamson, L., 319 Xu, L., 279
Willingham, W. W., 358
Willis, S., 82, 176 Y
Wilson, B., 301, 306 Yama, M., 227
Wilson, C., 178 Yazzie, C., 200
Wilson, M., 175, 245 Yen, W. M., 94, 95
Wilson, R. S., 187 Yerkes, R. M., 40, 41, 118, 140
Wilson, T., 386 Youngren, M. A., 242
Wing, H., 322 Ysseldyke, S., 135, 197, 356
Winter, D. G., 225 Yuan, Y., 189
Wirt, R. D., 237 Yudofsky, S. C., 290
Wisniewski, J. J., 198
Wissler, C., 32 Z
Witelson, S., 288 Zald, D., 376
Witt, L., 317 Zapf, P., 371
Witteborg, K. M., 14 Zaslow, M. J., 184
Wolf, A. W., 235 Zedeck, S., 77, 89, 97
Wolf, T. H., 35 Zeidner, M., 275
Wolfe, J., 321 Zeiss, A. M., 242
Wolfe, L. M., 111 Zelinski, E., 380
Wolff, K. C., 22 Zgaljardic, D., 310
Wolfson, D., 62, 284, 288, 291, 296 Zhai, F., 171
Wolpe, J., 239, 240 Zhu, J., 121, 124
Wonderlic, E. F., 322 Zilberg, N. J., 223
Wong, D., 325 Zilha, E., 293
Wong, S., 372 Zimmerman, R., 318
Wood, J. M., 86, 219, 222 Zuo, L., 272
Woodcock, R., 134, 204
Subject Index
A Career development stage theories, 338
Actuarial judgment, 375–376 Cattell-Horn-Carroll theory, 110–111
Albemarle v. Moody, 361 Cerebellum, 284
Alcohol abuse, 291–292 Cerebral cortex, 281–282
Alcohol dependence, 312 Cheating on tests, 25
Alzheimer’s disease, 292–293 Children’s Apperception Test, 226
American College Test (ACT), 153 Classical theory of measurement error, 59–61
Americans with Disabilities Act, 358 Clinical judgment, 375
Analogue behavioral assessment, 246–247 Coding (Wechsler subtest), 122–123
Aphasia, 303 Coefficient alpha, 64
Aptitude tests, 6, 41–42 Cognitive Abilities Test (CogAT), 143–145
Armed Services Vocational Aptitude Battery (ASVAB), Cognitive Assessment System-II (CAS-II), 131–132
149–150 College Entrance Examination Board (CEEB), 41
Arithmetic (Wechsler subtest), 119 Competency to stand trial, 370–371
Army Alpha and Beta tests, 40–41 Comprehension (Wechsler subtest), 121
Assessment, 5 Computer-assisted psychological assessment (CAPA), 373
Assessment center, 327–328 Computer-based test interpretation (CBTI), 373–378
Assessment of Spiritual and Religious Sentiments (ASPIRES), Computerized adaptive testing (CAT), 382–383
264–265 Comrey Personality Scales (CPS), 232
Attentional systems, 283–284 Concurrent validity, 77
Autism Spectrum Disorders, 207–208 Construct validity, 80–84
Autobiographical data, 316–318 Constructional dyspraxia, 288
Automated Neuropsychological Assessment Metrics-4 Content validity, 74–75
(ANAM4), 310 Convergent thinking, 270
Corpus callosum, 281
B Correction for guessing, 13
Basal ganglia, 284 Correlation coefficient, 62–63
Bayley Scales of Infant and Toddler Development-III, 182–183 Cranial nerves, 283
Beck Depression Inventory (BDI), 241 Crawford v. Honig, 355
Behavioral Assessment of the Dysexecutive System (BADS), Creativity tests, 267–273
306–307 Criterion contamination, 332
Behavioral assessment, 228–231 Criterion-referenced tests, 57–58
Bender Gestalt Test-II (BGT-II), 304 Cultural and linguistic minorities, 22–24
Bennett Mechanical Comprehension Test, 322
Big Five Inventory (BFI), 258 D
Big Five personality factors, 324 Debra P. v. Turlington, 353–354
Binet-Simon 1905 Scale, 36 Decision theory, 78–80
Binet-Simon 1908 and 1911 Scales, 37 Defense mechanisms, 211
Goddard’s translation of, 37 Defining Issues Test, 260–261
and immigration testing, 37–39 Denver-II, 189
Biodata, 316 Developmental Indicators for the Assessment of Learning-4
Block Design (Wechsler subtest), 122 (DIAL-4), 191–193
Brass instruments era, 30–33 Devereux Early Childhood Assessment-Clinical Form
(DECA-C), 183–184
C Diana v. State Board of Education, 353
CAGE questionnaire, 313 Differential Ability Scales-II (DAS-II), 184–185
California Psychological Inventory (CPI), 252–254 Differential Aptitude Test (DAT), 147–148
Campbell Interest and Skills Survey (CISS), 345–348 Digit Span (Wechsler subtest), 120
Career Beliefs Inventory, 340 Divergent production, 270–271
Career development, 333–335 Draw-A-Person (DAP), 198

Durham rule, 368 Histogram, 48


Duty to warn, 18–19 History of psychological testing, 28–45
Hobson v. Hansen, 353
E Home Observation for the Measurement of the Environment
Ecological momentary assessment, 247–248 (HOME), 193–195
Emotional intelligence, 273 House-Tree-Person Test (H-T-P), 227–228
Employment interview, 318–320
Equal Employment Opportunity Commission (EEOC), I
359, 360 Immediate Post-concussion Assessment and Cognitive
Evidence-based assessment, 45 Testing (ImPACT), 311–312
Examinee Feedback Questionnaire (EFeQ), 98 In-Basket Test, 327
Executive functions, 288–290, 305–307 Information function, 66–67
Expectancy table, 56–57 Information (Wechsler subtest), 119–120
Expert rankings, 88–89 Informed consent, 19–20
Expert witness, 363–364 Integrity tests, 324–326
Eysenck Personality Questionnaire (EPQ), 231–232 Intellectual Disability, 202–207
Intelligence
F age changes and, 175–178
Face validity, 75 definitions of, 100–102
Factor analysis, 100–107 environmental effects on, 169–171
Fagan’s Test of Infant Intelligence (FTII), 188–189 genetic contributions to, 167–168
Faith Maturity Scale, 265–266 infant capacities, 180–184
Fetal alcohol effect, 171 race differences on, 173–175
Fetal alcohol syndrome, 171 simultaneous and successive processing in, 112–113
Figure Weights (Wechsler subtest), 124 structure-of-intellect model in, 111–112
Finger Localization Test, 297 teratogenic effects on, 171–172
Finger Tapping Test, 307 theory of multiple, 114–115
Five-factor model of personality, 217–218 triarchic theory of, 115–117
Flynn effect, 178, 179 Intelligence test(s), 117–139
Forensic assessment, 364–365 predictive validity of infant, 187–188
Frequency distribution, 47 Interactive video in assessment, 379–380
Frequency polygon, 49 Interest inventories, 44
Freudian theories of personality, 209–213 Interval scale, 89
Fuld Object Memory Evaluation, 300–301 Inventory for Client and Agency Planning (ICAP),
205–206
G Iowa Tests of Basic Skills (ITBS), 157
General Aptitude Test Battery (GATB), 148–149 Item-characteristic curve, 94–95
Generational changes in intelligence, 178–179 Item-difficulty index, 93–94
Georgia NAACP v. Georgia, 355 Item-discrimination index, 95–96
GI Forum v. Texas Education Agency, 355–356 Item-reliability index, 94
Goodenough-Harris Drawing Test, 197 Item-response function, 68–69
Graduate Record Exam (GRE), 154–155 Item response theory, 68–69
Graphic rating scales, 330 Item-validity index, 94
Gratitude, assessment of, 276–277
Griggs v. Duke Power, 360 K
Group tests, 140–159 Kaufman Brief Intelligence Test-2 (KBIT-2), 132–133
Guilty but mentally ill (GBMI), 368 Kaufman Test of Educational Achievement-II (KTEA-II),
Guttman scales, 90 134–135
Kuder-Richardson formula, 65
H
Halo effect, 331–332 L
Halstead-Reitan Test Battery, 297 Lake Wobegon effect, 25
Happenstance Learning Theory, 338–339 Larry P. v. Riles, 354–355
Haptic Intelligence Scale for the Adult Blind (HISAB), 201 Law, sources of, 349–352
High-stakes testing, 24–26 Law School Admission Test (LSAT), 155–157
Hindbrain, 282–283 Learning disabilities, 135–139
Hiskey-Nebraska Test of Learning Aptitude, 198 Left hemisphere language functions, 286–288
Leiter International Performance Scale-Revised, 196–197 Origins of projective testing, 42–43


Letter-Number Sequencing (Wechsler subtest), 121 Origins of rating scales, 33–34
Levels of measurement, 87–88
Lexile measures, 158 P
Likert scales, 90 Parents in Action on Special Education v. Joseph P. Hannon,
Limbic system, 286 355
Locus of control, 215–216 Parkinson’s Disease, 284, 293
Peabody Picture Vocabulary Test-IV (PPVT-IV), 199–201
M Percentile, 51
Malingering, 365–366 Percentile rank, 51
Matrix Reasoning (Wechsler subtest), 122 Personal injury, 371–372
Mayer-Salovey-Caruso Emotional Intelligence Test Personality theories, 213–218
(MSCEIT), 273–275 Personality coefficient, 218
Medical College Admission Test (MCAT), 155 Personality Inventory for Children-2(PIC-2), 237–238
Luria-Nebraska Neuropsychological Battery (LNNB), Personality Inventory for Children-2 (PIC-2), 237–238
308–309 Personality tests, 42
Mean, 48 Person-environment fit, 335–337
Median, 48 Phrenology, 30
Memory systems, 285–286 Physiognomy, 29–30
Mental retardation (early views), 34 Picture Completion (Wechsler subtest), 121–122
Mental status exam, 293–294 Picture Concepts (Wechsler subtest), 122
Method of absolute scaling, 89–90 Picture Projective Test, 225–226
Method of empirical keying, 90 Pleasant Events Schedule (PES), 243
Method of equal-appearing intervals, 89 Porteus Maze Test, 116, 306
Method of rational scaling, 91 Positive psychological assessment, 266–278
Metropolitan Achievement Test (MAT), 157–158 Positive psychology, 266
Midbrain, 283 Primary mental abilities, 109–110
Millon Clinical Multiaxial Inventory-III (MCMI-III), Professional testing standards, 16–17
236–237 Projective hypothesis, 218–219
Mini-Mental State Exam (MMSE), 314 Psychograph, 29–30
Minnesota Clerical Test, 322–323 Psychometrician, 2
Minnesota Multiphasic Personality Inventory-2 (MMPI-2), Psychopathy Checklist-Revised (PCL-R), 372–373
233–236 Public Law 94–142, 356–357
M’Naughten rule, 368 Public Law 99–457, 357
Mode, 48
Moral Judgment Scale, 259–261 Q
Multidimensional Aptitude Battery-II (MAB-II), 141–143 Q-technique, 214–215
Multitrait-multimethod matrix, 82–83
Myart v. Motorola, 360 R
Myers-Briggs Type Indicator (MBTI), 250–252 Random sampling, 55
Rapport, 13–14
N Rasch model, 69
NEO Personality Inventory-Revised (NEO-PI-R), 254–255 Rater bias, 332
Neonatal Behavioral Assessment Scale (NBAS), 181–182 Ratio scale, 88
Neuropsychological Assessment Battery (NAB), 309–310 Raven’s Progressive Matrices (RPM), 145–146
No Child Left Behind Act, 357–358 Reliability, 58–72
Nominal scale, 87 alternate-forms, 63
Normal distribution, 49–50 coefficient alpha in, 64
Normal pressure hydrocephalus, 292 measurement error and, 60–61
Norm group, 46, 55 restriction of range and, 69
Not guilty by reason of insanity (NGRI), 367 speed and powers tests and, 70
split-half, 64
O standard error of measurement and, 70–72
Object Assembly (Wechsler subtest), 122 test-retest, 63
Occupational Information Network (O*NET), 339 unstable characteristics and, 70
Optimism, assessment of, 275–276 Reliability coefficient, 61–62
Ordinal scale, 87 Religion as quest, 263–264
Responsible test use, 26–27 Test Bias, 159–165


Responsibilities of test publishers, 17–18 content validity and, 161–162
Responsibilities of test users, 18–22 construct validity and, 163–165
Rey Auditory Verbal Learning Test, 300 definition of, 159
Right hemisphere functions, 288 predictive validity and, 162–163
Rivermead Behavioral Memory Test (RBMT), 301 Test fairness, 166–167
Rogers’ Criminal Responsibility Assessment Scales qualified individualism and, 167
(R-CRAS), 369–370 quotas and, 167
Rorschach Inkblot Technique, 219–222 unqualified individualism and, 166–167
Rotter Incomplete Sentences Blank (RISB), 223–224 Test(s),
consequences of, 1–2
S definition of, 2
Scales of Independent Behavior-Revised (SIB-R), 204–205 group vs. individual, 5
Scholastic Assessment Tests (SAT), 151–153 norm-referenced vs. criterion-referenced, 4
Screening for school readiness, 189–190 standardized procedure in, 3, 9–10
Self-Directed Search, 344–345 types of, 5–7
Self-monitoring, 242–243 uses of, 7–9
Sense of humor, assessment of, 277–278 Test administration,
Sensitivity, 84 influence of the examiner, 13–14
Sensory-perceptual exam, 296–297 group testing, 12–13
Sentence Completion Series, 222–223 sensitivity to disabilities in, 11–12
Similarities (Wechsler subtest), 121 Test anxiety, 14–15
Skewness, 50 Test of Everyday Attention, 297–298
Smartphone revolution, 385 Test of General Educational Development (GED), 158–159
Source traits, 217 Test of Memory Malingering (TOMM), 367
Spearman-Brown formula, 63–64 Test of Nonverbal Intelligence-4 (TONI-4), 198–199
Specificity, 84 Test utility, 86
Spiritual Well-Being Scale, 264 Thematic Apperception Test (TAT), 224–225
Stability and change in personality, 255–258 Theory of multiple intelligences, 114–115
Standard scores, 51–53 Tinkertoy Test, 306
Standard deviation, 49 Torrance Tests of Creative Thinking (TTCT), 271–272
Standard error of measurement, 71 Traumatic brain injury, 290–291
Standard error of the difference, 72 Triarchic theory of intelligence (Sternberg), 115–117
Standard error of the estimate, 78 Type A coronary-prone behavior pattern, 213–214
Standard of care, 20 TWEAK questionnaire, 313
Standardization sample, 4
Stanford-Binet, 39 U
Stanford-Binet: Fifth Edition (SB5), 128–130 Uniform Guidelines on Employee Selection, 361–362
Stanford-Binet Intelligence Scales for Early Childhood, United States v. Georgia Power, 361
186–187 User’s manual, 98
Stanine scale, 54
State-Trait Anxiety Inventory (STAI), 230–231 V
Sten scale, 54 Validity, 73–86
Stereotype threat, 23–24 concurrent, 77
Strong Interest Inventory-Revised (SII-R), 341–343 construct, 80–84
Strong Vocational Interest Blank (SVIB), 341 content, 74–75
Structured Clinical Interview for DSM-IV (SCID), 244–245 criterion-related, 76–77
Structured Interview of Reported Symptoms (SIRS), 366–367 predictive, 77
Substance Abuse Subtle Screening Inventory-3 (SASSI-3), Validity coefficient, 76
313 Validity shrinkage, 97–98
Surface trait, 217 Variance, 49
Symbol Search (Wechsler subtest), 123 Vineland Adaptive Behavior Scales-II (VABS-II), 206–207
Szondi test, 43 Virtual reality in assessment, 380
Visual Puzzles (Wechsler subtest), 123–124
T Visual system, 288
Table of specifications, 92 Vocabulary (Wechsler subtest), 120
Technical manual, 98–99 Vocational Preference Inventory (VPI), 343–344
W Wechsler Preschool and Primary Scale of Intelligence-IV


Watson v. Fort Worth Bank and Trust, 362 (WPPSI-IV), 185–186
Wechsler Adult Intelligence Scale-IV (WAIS-IV), Wide Range Assessment of Memory and Learning-2
124–125 (WRAML-2), 301–302
Wechsler Intelligence Scale for Children-IV (WISC-IV), Wisconsin Card Sorting Test, 306
126–128 Wonderlic Personnel Test-Revised, 321–322
Wechsler Memory Scale-IV, 299–300 Work sample, 326
Behavioral Assessment
The study of behavior is the raison d'être of psychology as a science and
objective assessment of behavior a necessity.

Chapter Outline
Assessing Behavior
Response Sets
Assessment of Behavior in the Schools
Behavioral Interviewing
Behavior Rating Scales
Direct Observation
Continuous Performance Tests (CPTs)
Psychophysiological Assessments
Summary

Learning Objectives
After reading and studying this chapter, students should be able to:
1. Define and describe behavioral assessment and explain how it is similar to as well as
different from other forms of assessment, especially personality testing.
2. Explain how response sets can impact behavioral assessments.
3. Describe how behavioral assessments are used in public schools to assess emotional and
behavioral disorders.
4. Explain the difference between behavioral interviewing and traditional clinical interviews.
5. Explain how validity scales can be used to guard against response sets and give examples.
6. Compare and contrast categorical and dimensional diagnostic models.
7. Compare and contrast omnibus and single-domain (syndrome-specific) rating scales. Explain
how each can facilitate diagnosis and treatment planning and give examples.
8. Describe the strengths and limitations of behavior rating scales.
9. Describe and evaluate the major behavior rating scales.
10. Describe the history and use of direct observations.
11. Describe continuous performance tests and their application.
12. Describe psychophysiological assessments, their use, and their current status.

From Chapter 11 of Mastering Modern Psychological Testing: Theory & Methods, First Edition. Cecil R. Reynolds,
Ronald B. Livingston. Copyright © 2012 by Pearson Education, Inc. All rights reserved.
Behavioral assessment has a storied history in psychology and has evolved from a sim-
ple counting of behavioral occurrences to more sophisticated rating scales and observational
schemes. Although professional psychologists traditionally have focused primarily on assess-
ment of cognitive abilities and personality characteristics, the drive for less inferential measures
of behavior as well as a movement toward establishing diagnoses based on observable behavior
has led to renewed interest in, and use of, behavioral assessment methods. Additionally, federal
laws mandate that schools provide special education and related services to students with
emotional disorders. Before these services can be provided, the schools must be able to identify
children with these disorders. The process of identifying these children often involves a psy-
chological evaluation completed by a school psychologist or other clinician wherein behaviors
consistent with the federal definition of emotional disturbance must be documented clearly. This
has also led to the derivation of increasingly sophisticated measures of actual behavior.

Behavioral assessment emphasizes what a person does. Most methods of personality
assessment emphasize what a person has (e.g., attributes, character, or other latent traits
such as anxiety).

When describing the different types of tests, we noted that tests typically can be classified
as measures of either maximum performance or typical response. Maximum performance tests are
often referred to as ability tests. On these tests, items are usually scored as either correct
or incorrect, and examinees are encouraged to demonstrate the best performance possible.
Achievement and aptitude tests are common examples of maximum performance tests. In contrast,
typical response tests attempt to measure the typical behavior and characteristics of examinees.
Typical response tests typically assess constructs such as personality, behavior, attitudes, or interests (Cronbach,
1990). Behavioral assessment as most commonly conducted is a measure of typical responding
(i.e., what a person does on a regular basis) as are personality scales. Behavioral assessments
can be constructed that measure maximum levels of performance, but such measures are better
conceptualized as aptitude, ability, or achievement measures.
Behavioral assessment also differs from traditional personality assessment in several ways.
Behavioral assessment emphasizes what a person does and in this context emphasizes observ-
able behavior as opposed to covert thoughts and feelings. Behavioral assessment attempts then to
define how a person behaves overtly on a day-to-day basis using observable expression and acts
as the primary means of evaluating behavior. Most methods of personality assessment emphasize
what a person has (e.g., his or her character, attributes, or reported feelings and thoughts). How
we say we think and feel is not always congruent with what we do. Behavioral assessment is
generally seen as more objective than personality assessment as well because most behavioral
assessment scales do not ask for interpretations of behavior, only observations of the presence
and frequency of a specified behavior. Thus, many see behavioral assessment as having a lower
level of inference for its interpretations than traditional personality assessment, where the level
of inference between scores and predictions of behavior can be quite high.
Early conceptualizations of behavioral assessment dealt only with overtly observable
behavior. Typically, behavioral assessment in its formative years (from the 1930s into the late
1970s) relied on observation and counting of specific behaviors of concern. For example, in the
1960s and early 1970s many school psychologists were issued behavior counters (often referred
to as clickers because of the clicking sound they made each time a behavior was “counted”) along
with other standardized test materials. Behavior was also seen as being highly contextually or
setting specific—this perception is still characteristic of behavioral assessment, but is viewed
with less rigidity today. For example, there are clear tendencies for children to behave similarly
in the presence of their father and their mother, but it is not unusual for some clear differences
to emerge in how a child behaves with each parent. The same is true across classrooms with
different teachers: a child will tend to respond in similar ways in all classrooms, but
there will be clear differences depending on the teacher present, the teacher’s approach to
classroom management, and the teacher’s personality and experience in working with children.
As the field of behavioral assessment has matured, practitioners of behavioral assessment have
come to recognize the importance of chronic characteristics such as anxiety and depression, locus
of control, impulsivity, and other latent traits that do generalize across many settings to a signifi-
cant extent, though far from perfectly. However, in assessing these traits, behavioral assessment
scales ask about observable behaviors that are related to anxiety or depression as opposed to asking
about thoughts and feelings. Examples might be, “Says, I have no friends,” and “Says, no one likes
to be with me,” on a behavioral assessment scale, whereas on a self-report personality scale, an item
asking about a similar construct might be worded, “I feel lonely much of the time.”
Many traditional scales of personality assessment (though certainly not all) have come to
be used in conjunction with behavioral assessment. However, these scales, such as the BASC-2
Self-Report of Personality (Reynolds & Kamphaus, 2004) discussed in the preceding chapter,
do rely less on high-inference items and constructs and instead focus on more behavioral ques-
tions (e.g., “My parents blame me for things I do not do” as opposed to an item stem such as
“People are out to get me”). This distinction may seem subtle, but behavioral assessment pro-
fessionals argue in favor of the use of terms and items that describe actual behavior as opposed
to states.
Thus the lines between behavioral assessment and some forms of traditional personality
assessment do blur at points. The key difference in our minds is the level of inference involved
in the interpretations of the test scores obtained under these two approaches. They are also quite
complementary—it is important to know both how people typically behave and how they think
and feel about themselves and others. Behavioral assessment also is not a specific or entirely
unique set of measuring devices, but rather more of a paradigm, a way of thinking about and
obtaining assessment information. Even responses to projective tests, which most psychologists
would view as the antithesis of behavioral assessment, can be reconceptualized and interpreted
as behavioral in nature (e.g., see the very interesting chapter by Teglasi, 1998) by altering the
method of interpretation, moving toward low-inference interpretations of responses as samples
of actual behavior. It is common practice now for clinicians to use a multimethod, multimodal
approach to behavioral assessment. Practitioners will collect data or assessment information via
behavioral interviewing, direct observation, and impressionistic behavior rating scales, as well
as self-report “personality scales” designed to reduce the level of inference involved in their
interpretation.

ASSESSING BEHAVIOR
Although we might not be consciously aware of it, we all engage in the assessment and interpretation
of behavior on a regular basis. When you note that “Tommy is a difficult child” or “Tamiqua
is extroverted,” you are making an assessment (albeit a crude and general one) and then forming
a judgment about their behavior. We use these informal evaluations, among many other purposes, to
determine with whom we want to associate and whom we want to avoid. Clinicians use behavioral
assessment to produce far more objective determinations about the behavior of individuals. By using
standardized behavioral assessment methods such as behavior rating scales, practitioners can also
determine the degree to which behaviors cluster together to reflect broader behavioral dimensions.
For example, most behavior rating scales produce scores that reflect dimensions such as distract-
ibility, aggression, hyperactivity, depression, or anxiety. In addition to telling us if clients behave in
particular ways, behavior rating scales also indicate how common or rare these behaviors are in the
general population. In other words, is the tendency to behave in particular ways strong or severe
enough to warrant clinical intervention, or is it at a level similar to that of other people? With
clients referred for an evaluation, this information helps the clinician determine whether there are
real behavioral problems requiring psychological intervention or whether the behavior is within
normal limits. At times one determines that the problem lies with a caregiver or teacher who simply
has a very low tolerance for what is a common, normal set of behaviors, and thus a different
intervention is required, with a different target! Additionally, because repeated behavioral
assessments are quite efficient and show no so-called practice effect, one can monitor treatment
effects regularly and accurately when behavioral change is the goal of intervention.

RESPONSE SETS
Response sets occur in behavioral assessments. Response biases or response sets are test responses
that misrepresent a person’s true characteristics. For example, an individual completing an
employment-screening assessment that asks about behaviors on the
job might attempt to present an overly positive image by answering all of the questions in the
most socially appropriate manner possible, even if these responses do not accurately represent
the person. On the other hand, a teacher who is hoping to have a disruptive student transferred
from his or her class might be inclined to exaggerate the student’s misbehavior when complet-
ing a behavior rating scale to hasten that student’s removal. A parent may be completing a rating
scale of his or her child’s behavior and not want to be seen as a poorly skilled parent and so may
deny common behavior problems that are present with the child being rated. In each of these
situations the individual completing the test or scale responded in a manner that systematically
distorted reality. This is often referred to as dissimulation, that is, making another person (or
yourself) appear dissimilar from how he or she really is or behaves. When response sets are
present, the validity of the test results may be compromised because they introduce construct-
irrelevant error to test scores (e.g., AERA et al., 1999). That is, the test results do not accu-
rately reflect the construct the test was designed to measure. To combat this, many behavioral
assessment scales incorporate several types of validity scales designed to detect the presence
of response sets. Validity scales take different forms, but the general principle is that they are
designed to detect individuals who are not responding in an accurate manner. Special Interest
Topic 1 provides an example of a fake bad response set that might appear on a behavior rating
scale completed by parents about their child. Why would parents want the behavior ratings on
their child to represent behavior that is worse than what actually is occurring? There are many
reasons, but the two most common are a plea for immediate help with the child, in which overwrought
ratings are provided to draw the clinician’s attention to the desperate plight of the parents, and
an attempt to obtain a diagnosis for which services or disability payments might be received. We
will talk more about detecting response sets later in the chapter.

SPECIAL INTEREST TOPIC 1

An Example of a “Fake Bad” Response Set on a Parent Behavior Rating Scale
Typical response measures, despite the efforts of test developers, always remain susceptible to response
sets. The following case is an authentic example. In this case the Behavior Assessment System for
Children, Second Edition (BASC-2) was used.
Susan was brought to a private psychologist’s office on referral from the State Department of
Rehabilitation Services. Her mother is applying for disability benefits for Susan and says Susan has
Bipolar Disorder. She is 14 years old and repeating the seventh grade this school year because she failed
to attend school regularly last year. When skipping school, she spent time roaming the local shopping mall
or engaging in other relatively unstructured activities. She failed all of her classes in both semesters
of the past school year. Her mother says this is because she is totally out of control behaviorally and
“has Bipolar Disorder.” Susan’s father expressed concern about her education, especially her lack of
interest and unwillingness to do homework, but did not describe the hyperirritability and depressive
attributes that are common with Pediatric Bipolar Disorder (PBD).
Susan’s responses to the diagnostic interview suggested that she was not interested in school and
wanted to spend time with friends and engage in social activities. She complained about having trouble
keeping up in school as well, noting that reading was especially difficult for her. She acknowledged
some attentional problems that she attributed to lack of interest in academic work, but did not note any
other behavioral issues commonly associated with PBD.
Susan’s Parent Rating Scale, Adolescent version (PRS-A), completed by the mother, indicates evidence
of a “fake bad” response set. All of her clinical scale scores were above the normative T-score
mean of 50 and all of her adaptive scale scores were below the normative mean of 50. In other words,
the PRS-A results suggest that Susan is severely maladjusted in all behavioral domains, which, although
possible, is not likely.
The mother’s response set was identified by the Infrequency or F scale, on which she obtained a
score of 16. This score falls in the Extreme Caution range, indicating a high probability that the
overall ratings provide a far more negative picture of the child’s behavior than is actually the case.
The following table shows her full complement of PRS-A scores based on the mother’s ratings.

Clinical Scales       T-Score    Adaptive Scales               T-Score
Aggression               81      Activities of Daily Living       44
Anxiety                  73      Adaptability                     35
Attention Problems       70      Functional Communication         28
Atypicality              66      Leadership                       39
Conduct Problems         65      Social Skills                    49
Depression               91
Hyperactivity            67
Somatization             85
Withdrawal               59
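
The profile pattern described in Special Interest Topic 1 (every clinical scale above the normative mean and every adaptive scale below it) can be checked mechanically. The following Python sketch is only an illustration of that pattern using the T-scores reported for Susan. It is not how the BASC-2 F scale is scored, and the function name and the use of 50 as a cutoff are assumptions made for this example.

```python
# T-scores from the mother's PRS-A ratings, as reported in Special Interest Topic 1.
CLINICAL_T = {
    "Aggression": 81, "Anxiety": 73, "Attention Problems": 70,
    "Atypicality": 66, "Conduct Problems": 65, "Depression": 91,
    "Hyperactivity": 67, "Somatization": 85, "Withdrawal": 59,
}
ADAPTIVE_T = {
    "Activities of Daily Living": 44, "Adaptability": 35,
    "Functional Communication": 28, "Leadership": 39, "Social Skills": 49,
}
T_MEAN = 50  # normative T-score mean cited in the text


def uniformly_negative(clinical, adaptive, mean=T_MEAN):
    """Return True when every clinical score is above the mean and
    every adaptive score is below it (hypothetical helper, not a BASC-2 rule)."""
    return (all(t > mean for t in clinical.values())
            and all(t < mean for t in adaptive.values()))


flag = uniformly_negative(CLINICAL_T, ADAPTIVE_T)
print("Uniformly negative profile:", flag)
```

A check like this would at most prompt a closer look at the validity scales; the interpretation itself still rests on the F scale and the clinician's judgment.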

ASSESSMENT OF BEHAVIOR IN THE SCHOOLS


Public Law 94–142 (IDEA) and its most current reauthorization, the Individuals with Disabilities
Education Improvement Act of 2004 (IDEA 2004), mandate that schools provide special education and
related services to students with emotional disorders. These laws compel schools to identify students
with emotional disorders and, as a result, expand school assessment practices, previously focused
primarily on cognitive abilities, to include the evaluation of personality, behavior, and related
constructs. We emphasize schools as a
setting and children as a group in this chapter because these are the setting and the population
in which behavioral assessments are most common, although behavioral assessments are also often
used in clinics, private psychology practices, and other settings. A small number of behavioral
assessment devices are available for assessment of adults.
The instruments used to assess behavior and personality in the schools can usually be clas-
sified as behavior rating scales, self-report measures, or projective techniques. The results of a
national survey of school psychologists indicated that 5 of the top 10 instruments were behavior
rating scales, 4 were projective techniques, and 1 was a self-report measure (Livingston et al.,
2003; see Table 1 for a listing of these assessment instruments). These are representative of the
instruments school psychologists use to assess children suspected of having an emotional, behavioral,
or other type of disorder. The distribution is quite interesting to observe. The field of
psychology has moved strongly, as have medicine and some other fields, toward what is often termed
evidence-based practice, referring to engagement in professional practices that have clear support
in the science that underlies the profession. Traditionally, stemming from work from the late
1800s into the 1960s, projective assessment dominated assessment and diagnosis of emotional
and behavioral disorders. Projective testing is now the subject of a polemical, emotionally charged
controversy. Nowhere is the division of opinion more evident than in these survey results. Half
of the most frequently used tests in this assessment area in the schools are behavior rating
scales (the most objective of behavior assessments and those with the strongest scientific evidence
to support their use), whereas 40% are projective tests, clearly the most subjective of our
assessment devices and the class of assessments with the least scientific support!

Table 1 10 Most Popular Tests of Child Personality and Behavior

Name of Test                                  Type of Test
 1. BASC Teacher Rating Scale                 Behavior rating scale
 2. BASC Parent Rating Scale                  Behavior rating scale
 3. BASC Self-Report of Personality           Self-report measure
 4. Draw-A-Person                             Projective technique
 5. Conners Rating Scales—Revised             Behavior rating scale
 6. Sentence Completion Tests                 Projective technique
 7. House-Tree-Person                         Projective technique
 8. Kinetic Family Drawing                    Projective technique
 9. Teacher Report Form (Achenbach)           Behavior rating scale
10. Child Behavior Checklist (Achenbach)      Behavior rating scale

Note: BASC = Behavior Assessment System for Children. The Conners Rating Scales—Revised and
Sentence Completion Tests actually were tied. Based on a national sample of school psychologists
(Livingston et al., 2003).

When behavioral assessments are conducted in schools, psychologists or other behavior
specialists will often conduct an observational assessment in a classroom or perhaps even on the
playground, counting the frequency of specified behaviors. However, teachers are often called
on to provide relevant information on students’ behavior. Classroom teachers are often asked to
help with the assessment of students in their classrooms—for example, by completing behavior
rating scales on students in their class. This practice provides invaluable data to school
psychologists and other clinicians because teachers have a unique opportunity to observe children
in their classrooms. Teachers can provide information on how the child behaves in different
contexts, both academic and social. Those who do behavioral assessments understand the need for
information on how children behave in different contexts (school, home, and community being the
most important) and are interested in the consistencies as well as inconsistencies in behavior
across settings.

BEHAVIORAL INTERVIEWING
Most assessments begin with a review of the referral information and statement of the referral
questions. Next comes a form of interview with the person to be evaluated, or in the case of a
child or adolescent, an interview with a parent or caregiver may occur first. The traditional clinical
interview usually begins with broad, sweeping questions such as “Why are you here?”, “How can I
help you?”, or perhaps with a child, “Why do you think
you were asked to come here?” The clinician brings out the presenting problem in this way and then
solicits a detailed history and attempts to understand the current mood states of the interviewee
as well as any relevant traits of interest and seeks to understand the psychodynamics of the
behaviors or states of concern. Behavioral interviewing has a different emphasis.
When conducting a behavioral interview, once the issue to be addressed has been established, the
clinician focuses on the antecedents and consequences
of behaviors of concern as well as what attempts at change have been made. An attempt to look at
the relevant reinforcement history is made as well (i.e., what has sustained the behavior and why
has it not responded to efforts to create change). Problem-solving strategies are then introduced
that are intended to lead to an intervention. Ramsay, Reynolds, and Kamphaus (2002) described
six steps in behavioral interviewing that can be summarized as follows:
1. Identify the presenting problem and define it in behavioral terms.
2. Identify and evaluate environmental contingencies supporting the behaviors.
3. Develop a plan to alter these contingencies and reinforcers to modify the behavior.
4. Implement the plan.
5. Evaluate the outcomes of treatment or intervention. (This often involves having done a
behavioral assessment, using standardized rating scales, for example, to establish a baseline
rate of behaviors of concern and then reassessing with the same scales later to look for
changes from baseline.)
6. Modify the intervention plan if the behavior is not responding and evaluate the outcome of
these changes.
The first three steps are the heart of the interview process in a behavioral interview, where-
as the follow-up steps are conducted on a continuing basis in a behavioral paradigm. One of
the key goals of the behavioral interview, contrasted with a traditional clinical interview, is to
minimize the levels of inference used to obtain and interpret information. By stressing behavior
as opposed to subjective states, a more definitive plan can be derived, clear goals can be set, and
the progress of the individual monitored more clearly.

BEHAVIOR RATING SCALES


A behavior rating scale is essentially an inventory that asks a knowledgeable informant to rate an
individual on a number of dimensions. When working with children and adolescents the informants
are typically parents or teachers. On behavior rating scales designed for adults the informants might
be a spouse, adult child, or health care worker. The instructions of the behavior rating scale typically
ask an informant to rate a person by indicating how often he or she observes the behavior described:
0 = rarely or never
1 = occasionally
2 = often
3 = almost always
The scale will then present a series of item stems for which the informant rates the indi-
vidual. For example:

Reacts to minor noises from outside the classroom.     0  1  2  3
Tells lies.                                            0  1  2  3
Interacts well with peers.                             0  1  2  3
Is irritable.                                          0  1  2  3
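
To make the response format concrete, the short Python sketch below records one informant's ratings on the four sample items and translates each number back into its verbal anchor. The ratings are invented for illustration, and the function name is hypothetical. Real instruments convert item ratings to scale scores using the publisher's keys and norms, which are not reproduced here.

```python
from typing import Dict

# Response anchors exactly as listed in the instructions above.
RATING_ANCHORS = {
    0: "rarely or never",
    1: "occasionally",
    2: "often",
    3: "almost always",
}


def label_ratings(ratings: Dict[str, int]) -> Dict[str, str]:
    """Map each 0-3 rating to its verbal anchor, rejecting out-of-range values."""
    labeled = {}
    for stem, value in ratings.items():
        if value not in RATING_ANCHORS:
            raise ValueError(f"Rating for {stem!r} must be 0-3, got {value!r}")
        labeled[stem] = RATING_ANCHORS[value]
    return labeled


# Invented ratings from a hypothetical teacher informant.
example_ratings = {
    "Reacts to minor noises from outside the classroom.": 2,
    "Tells lies.": 0,
    "Interacts well with peers.": 3,
    "Is irritable.": 1,
}

labeled = label_ratings(example_ratings)
for stem, anchor in labeled.items():
    print(f"{stem} -> {anchor}")
```

Note that the items are not summed here: "Interacts well with peers" describes a positive behavior, so adding its rating to the problem-behavior items would be misleading without the instrument's own scoring key.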

As we have noted, behavior rating scales have been used most often with children and
adolescents, but there is growing interest in using behavior rating scales with adults. The follow-
ing discussion will initially focus on some major behavior rating scales used with children and
adolescents, but we will also provide an example of a scale used with adults.
Behavior rating scales have a number of positive characteristics (e.g., Kamphaus & Frick, 2002;
Piacentini, 1993; Ramsay et al., 2002; Witt, Heffer, & Pfeiffer, 1990). For example, children may
have difficulty accurately reporting their own feelings and behaviors due to a number of factors
such as limited insight or verbal abilities or, in the context of self-report tests, limited
reading ability. However, when using
behavior rating scales, information is solicited from the important adults in a child’s life. Ideally
these adult informants will have had adequate opportunities to observe the child in a variety of
settings over an extended period of time. Behavior rating scales also represent a cost-effective
and time-efficient method of collecting assessment information. For example, a clinician may be
able to collect information from both parents and one or more teachers with a minimal invest-
ment of time or money. Most popular child behavior rating scales have separate inventories for
parents and teachers. This allows the clinician to collect information from multiple informants
who observe the child from different perspectives and in various settings. Behavior rating scales
can also help clinicians assess the presence of rare behaviors. Although any responsible clinician
will interview the client and other people close to the client, it is still possible to miss important
indicators of behavioral problems. The use of well-designed behavior rating scales may help
detect the presence of rare behaviors, such as fire setting and animal cruelty, that might be missed
in a clinical interview.
There are some limitations associated with the use of behavior rating scales. Even though the
use of adult informants to rate children provides some degree of objectivity, as we noted these scales
are still subject to response sets that may distort the true characteristics of the child. For example, as
a “cry for help” a teacher may exaggerate the degree of a student’s problematic behavior in hopes of
hastening a referral for special education services or even in the hope the child will be removed from
the classroom to a different placement. Conversely, parents might not be willing or able to acknowledge
their child has significant emotional or behavioral problems and tend to underrate the degree
and nature of problem behaviors. Although behavior rating scales are particularly useful in diagnos-
ing “externalizing” problems such as aggression and hyperactivity, which are easily observed by
adults, they may be less helpful when assessing “internalizing” problems such as depression and
anxiety, which are not as apparent to observers.
Ratings of behavior on such omnibus behavior rating scales are impressionistic (i.e., based
on the impressions of the person completing the scale) to a large extent. Test authors do not
ask or expect the person completing the rating to count behaviors and typically items that ask
for a specific count are avoided (e.g., one would rarely see an item such as “Gets out of seat
without permission or at an inappropriate time 1 time per day”). Rather, behavior rating scales
ask “Gets out of seat without permission or at an inappropriate time” with a range of responses
such as rarely, sometimes, often, almost always. Not everyone will interpret such terms as rarely,
sometimes, often, and so on in the same way and this does introduce some error into the ratings.
However, the research on carefully developed behavior rating scales generally demonstrates their
scores to be very reliable and also shows them to differentiate better among various groups of
diagnostic conditions in the emotional and behavioral domain than any other single form of
assessment available to us. Behavior rating scale scores, despite their impressionistic basis,
predict diagnosis accurately, predict future behavior and learning problems, help us detect
changes in behavior, and can even predict what types of interventions are most likely to work to
change a behavior (e.g., see Vannest, Reynolds, & Kamphaus, 2009).
It is no surprise, then, that over the past two
decades behavior rating scales have gained popularity
and become increasingly important in the psychological assessment of children and adolescents
(Livingston et al., 2003). It is common for a clinician to have both parents and teachers complete
behavior rating scales for one child. This is desirable because parents and teachers have the op-
portunity to observe the child in different settings and can contribute unique yet complementary
information to the assessment process. The consistencies as well as inconsistencies of a child’s
behavior in different settings and with different adults are also quite informative. Next, we will
briefly review some of the most popular scales.

Behavior Assessment System for Children—Second Edition—
Teacher Rating Scale and Parent Rating Scale (TRS and PRS)
The Behavior Assessment System for Children (BASC) is an integrated set of instruments that in-
cludes a Teacher Rating Scale (TRS), a Parent Rating Scale (PRS), self-report scales, a classroom
observation system, a scale that assesses the parent–child relationship (the Parenting Relationship
Questionnaire), and a structured developmental history (Reynolds & Kamphaus, 1992). Although
the BASC is a relatively new set of instruments, a 2003 national survey of school psychologists
indicates that the TRS and PRS are the most frequently used behavior rating scales in the public
schools today (Livingston et al., 2003). Information obtained from the publisher estimates the
BASC was used with more than 1 million children in the United States alone in 2003. By 2006,
following the release of the second edition of the BASC, known as the BASC-2, this estimate had
grown to 2 million children per year. The TRS and PRS are appropriate for children from 2 to
21 years. Both the TRS and PRS provide item stems describing a behavior to which the informant
responds never, sometimes, often, or almost always. The TRS is designed to provide a thorough
examination of school-related behavior whereas the PRS is aimed at the home and community
environment (Ramsay et al., 2002). In 2004, Reynolds and Kamphaus released the Behavior As-
sessment System for Children—Second Edition (BASC-2), with updated scales and normative
samples. Table 2 depicts the 5 composite scales, 16 primary scales, and 7 content scales for all the
preschool, child, and adolescent versions of both instruments. Reynolds and Kamphaus (2004)
described the individual primary subscales of the TRS and PRS as follows:
● Adaptability: ability to adapt to changes in one’s environment
● Activities of Daily Living: skills associated with performing everyday tasks
● Aggression: acting in a verbally or physically hostile manner that threatens others
● Anxiety: being nervous or fearful about actual or imagined problems or situations
● Attention Problems: inclination to be easily distracted or have difficulty concentrating
● Atypicality: reflects behavior that is immature, bizarre, or suggestive of psychotic processes
(e.g., hallucinations)
● Conduct Problems: inclination to display antisocial behavior (e.g., cruelty, destructiveness)
● Depression: reflects feelings of sadness and unhappiness
● Functional Communication: expression of ideas and communication in any way others can
understand
● Hyperactivity: inclination to be overactive and impulsive
● Leadership: reflects ability to achieve academic and social goals, particularly the ability to
work with others
● Learning Problems: reflects the presence of academic difficulties (only on the TRS)
● Social Skills: reflects the ability to interact well with peers and adults in a variety of settings
● Somatization: reflects the tendency to complain about minor physical problems
● Study Skills: reflects skills that are associated with academic success, for example, study
habits, organization skills (only on the TRS)
● Withdrawal: the inclination to avoid social contact
New to the BASC-2 are the content scales, so-called because their interpretation is driven
more by item content than actuarial or predictive methods. These scales are intended for use by
advanced-level clinicians to help clarify the meaning of the primary scales and as an additional
aid to diagnosis.
In addition to these individual scales, the TRS and PRS provide several different composite
scores. The composite scores for the BASC and the subsequent BASC-2 were derived from a
series of exploratory and confirmatory factor analyses, supplemented by a technique called struc-
tural equation modeling and are thus empirically derived composite scores. Table 2 summarizes
the structure and organization of scores produced by the BASC-2.
The authors recommend that interpretation follow a “top-down” approach, by which the clinician
starts at the most global level and progresses to more specific levels (e.g., Reynolds & Kamphaus,
2004). The most global measure is the Behavioral Symptoms Index (BSI), which is a composite of the
Aggression, Attention Problems, Anxiety, Atypicality, Depression, and Somatization scales. The BSI
reflects the overall level of behavioral problems and provides the clinician with a reliable but
nonspecific index of pathology. For more specific information about the nature of the problem
behavior, the clinician proceeds to the four lower order composite scores:
● Internalizing Problems. This is a composite of the Anxiety, Depression, and Somatization
scales. Some authors refer to internalizing problems as “overcontrolled” behavior. Students
with internalizing problems experience subjective or internal discomfort or distress, but
they do not typically display severe acting-out or disruptive behaviors (e.g., aggression,
impulsiveness). As a result, these children may go unnoticed by teachers and school-based
clinicians. There are some notable exceptions. Children with depression, especially boys,
are often irritable and have attentional difficulties, and can be misdiagnosed as having
attention deficit hyperactivity disorder (ADHD) if one looks only at these symptoms and
does not obtain a full picture of the child’s behavior.
● Externalizing Problems. This is a composite of the Aggression, Conduct Problems, and
Hyperactivity scales. Relative to the behaviors and symptoms associated with internalizing
problems, the behaviors associated with externalizing problems are clearly apparent to observers.
Children with high scores on this composite are typically disruptive to both peers
and adults, and usually will be noticed by teachers and other adults.
● School Problems. This composite consists of the Attention Problems and Learning Problems
scales. High scores on this composite suggest academic motivation, attention, and learning
difficulties that are likely to hamper academic progress. This composite is available only
for the BASC-TRS.
● Adaptive Skills. This is a composite of Activities of Daily Living, Adaptability, Leadership,
Social Skills, and Study Skills scales. It reflects a combination of social, academic,
and other positive skills (Reynolds & Kamphaus, 2004).

Table 2 Composites, Primary Scales, and Content Scales in the TRS and PRS

                                Teacher Rating Scales    Parent Rating Scales
                                P       C       A        P       C       A
                                2–5     6–11    12–21    2–5     6–11    12–21

COMPOSITE
Adaptive Skills                 •       •       •        •       •       •
Behavioral Symptoms Index       •       •       •        •       •       •
Externalizing Problems          •       •       •        •       •       •
Internalizing Problems          •       •       •        •       •       •
School Problems                 -       •       •        -       -       -

PRIMARY SCALE
Adaptability                    •       •       •        •       •       •
Activities of Daily Living      -       -       -        •       •       •
Aggression                      •       •       •        •       •       •
Anxiety                         •       •       •        •       •       •
Attention Problems              •       •       •        •       •       •
Atypicality                     •       •       •        •       •       •
Conduct Problems                -       •       •        -       •       •
Depression                      •       •       •        •       •       •
Functional Communication        •       •       •        •       •       •
Hyperactivity                   •       •       •        •       •       •
Leadership                      -       •       •        -       •       •
Learning Problems               -       •       •        -       -       -
Social Skills                   •       •       •        •       •       •
Somatization                    •       •       •        •       •       •
Study Skills                    -       •       •        -       -       -
Withdrawal                      •       •       •        •       •       •

CONTENT SCALE
Anger Control                   •       •       •        •       •       •
Bullying                        •       •       •        •       •       •
Developmental Social Disorders  •       •       •        •       •       •
Emotional Self-Control          •       •       •        •       •       •
Executive Functioning           •       •       •        •       •       •
Negative Emotionality           •       •       •        •       •       •
Resiliency                      •       •       •        •       •       •

NUMBER OF ITEMS                 100     139     139      134     160     150

Note: Shaded cells represent new scales added to the BASC-2. P = preschool version; C = child version;
A = adolescent version.
Source: Behavior Assessment System for Children, Second Edition (BASC-2). Copyright © 2004 NCS Pearson,
Inc. Reproduced with permission. All rights reserved. “BASC” is a trademark, in the US and/or other
countries, of Pearson Education, Inc. or its affiliates.

The third level of analysis involves examining the 16 clinical (e.g., Hyperactivity,
Depression) and adaptive scales (e.g., Leadership, Social Skills). Finally, clinicians will often
examine the individual items. Although individual items are often unreliable, when interpreted
cautiously they may provide clinically important information. This is particularly true of what is
often referred to as “critical items.” Critical items, when coded in a certain way, suggest possible
danger to self or others or reflect an unusual behavior that may be innocuous, but also may not,
and requires questioning by the clinician for clarification. For example, if a parent or teacher
reports that a child often “threatens to harm self or others,” the clinician would want to determine
whether these statements indicate imminent danger to the child or others.
When interpreting the Clinical Composites and Scale scores, high scores reflect abnormal-
ity or pathology. The authors provide the following classifications: T-score ≥70 is Clinically
Significant; 60–69 is At-Risk; 41–59 is Average; 31–40 is Low; and ≤30 is Very Low. Scores on
the adaptive composite and scales are interpreted differently, with high scores reflecting adaptive
or positive behaviors. The authors provide the following classifications: T-score ≥70 is Very
High; 60–69 is High; 41–59 is Average; 31–40 is At-Risk; and ≤30 is Clinically Significant.
Computer software is available to facilitate scoring and interpretation, and the use of this soft-
ware is recommended because hand scoring can be challenging for new users. An example of a
completed TRS profile is depicted in Figure 1.
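The two classification schemes just described can be written as a small lookup. The following is only a minimal sketch: the function name `classify_t_score` and the `adaptive` flag are illustrative and are not part of the BASC-2 scoring software, and whole-number T-scores are assumed so that the published bands (for example, 41 to 59 and 60 to 69) leave no gaps.

```python
def classify_t_score(t, adaptive=False):
    """Return the descriptive label for a whole-number T-score.

    Uses the cut points quoted in the text. Clinical scales: high scores
    indicate problems. Adaptive scales: high scores indicate strengths.
    """
    clinical_labels = ["Very Low", "Low", "Average", "At-Risk", "Clinically Significant"]
    adaptive_labels = ["Clinically Significant", "At-Risk", "Average", "High", "Very High"]
    labels = adaptive_labels if adaptive else clinical_labels

    # Bands from lowest to highest: <=30, 31-40, 41-59, 60-69, >=70
    if t <= 30:
        band = 0
    elif t <= 40:
        band = 1
    elif t <= 59:
        band = 2
    elif t <= 69:
        band = 3
    else:
        band = 4
    return labels[band]


# The same T-score means opposite things on the two kinds of scale.
print(classify_t_score(72))                 # clinical scale
print(classify_t_score(72, adaptive=True))  # adaptive scale
```

The point of the sketch is that the direction of interpretation, not the cut points, is what differs between the clinical and adaptive scales.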
The TRS and PRS have several unique features that promote their use. First, they contain
a validity scale that helps the clinician detect the presence of response sets. As noted previously,
validity scales are specially developed and incorporated in the test for the purpose of detecting re-
sponse sets. Both the parent and teacher scales contain a fake bad (F) index that is elevated when
an informant excessively rates maladaptive items as almost always and adaptive items as never. If
this index is elevated, the clinician should consider the possibility that a negative response set has
skewed the results. Another unique feature of these scales is that they assess both negative and
adaptive behaviors. Before the advent of the BASC, behavior rating scales were often criticized
for focusing only on negative behaviors and pathology. Both the TRS and PRS address this criti-
cism by assessing a broad spectrum of behaviors, both positive and negative. The identification
of positive characteristics can facilitate treatment by helping identify strengths to build on. Still


[Figure 1 shows a completed TRS profile form with two charts, a Clinical Profile and an Adaptive Profile, each plotting scale and composite T scores. A note on the form states that high scores on the adaptive scales indicate high levels of adaptive skills.]

FIGURE 1 An example of a completed TRS profile

Source: Behavior Assessment System for Children, Second Edition (BASC-2). Copyright © 2004 NCS Pearson, Inc.
Reproduced with permission. All rights reserved. “BASC” is a trademark, in the US and/or other countries, of
Pearson Education, Inc. or its affiliates.

another unique feature is that the TRS and PRS provide three norm-referenced comparisons that
can be selected depending on the clinical focus. The child’s ratings can be compared to a general
national sample, a gender-specific national sample, or a national clinical sample composed of
children who have a clinical diagnosis and are receiving treatment. In summary, the BASC-2
PRS and BASC-2 TRS are psychometrically sound instruments that have gained considerable
support in recent years.
Currently there is an interesting discussion under way regarding the relative merits of
categorical diagnostic systems (such as that employed in the DSM–IV–TR) versus dimensional
models of diagnosis. Special Interest Topic 2 presents a brief introduction to this topic.

Achenbach System of Empirically Based Assessment: Child Behavior Checklist (CBCL) and Teacher Report Form (TRF)
The Child Behavior Checklist (CBCL) and the Teacher Report Form (TRF) (Achenbach, 1991a,
1991b) are two components of the Achenbach System of Empirically Based Assessment (ASEBA)
that also includes a self-report scale and a direct observation system. There are two forms of the


SPECIAL INTEREST TOPIC 2


Categorical Versus Dimensional Diagnosis
There are many approaches to grouping individuals as well as objects. Whenever we engage in diagno-
sis, we are engaged in grouping via the assignment of a label or designation to a person as having or
not having a disorder or disease—and, having a disorder or not having a disorder typically are mutually
exclusive decisions. In the traditional medical approach to diagnosis, categorical systems and methods
are used. Typically, categorical approaches to diagnosis of mental and developmental disorders rely
heavily on observation and interview methods designed to detect the presence of particular symptoms
or behaviors, both overt and covert. The degree or severity of the symptom is rarely considered except
that it must interfere in some way with normal functioning in some aspect of one’s life (i.e., it must have
a negative impact on the patient). A symptom is either present or absent. A dichotomous decision is
then reached on a diagnosis based on a declaration of presence or absence of a set of symptoms known
to cluster into a pattern designated as a disorder or syndrome.
In dimensional approaches to diagnosis, the clinician recognizes that many traits and states exist
that contribute to a diagnosis and that all of these exist at all times to some greater or lesser extent (i.e.,
they are present on a continuum). Psychologists, the primary practitioners of dimensional diagnosis, then
measure each of the relevant constructs using psychological tests of various types. The relative relation-
ship of each of the constructs to one another and their overall levels are used to derive a diagnosis or clas-
sification. Typically, a mathematical algorithm is used such as discriminant analysis, cluster analysis, latent
profile analysis, configural frequency analysis, logistic regression, or some other multivariate classification
approach in order to arrive at a correct diagnosis or classification. More often than not, psychologists will
refer to a diagnosis made using such a dimensional and actuarial approach as a classification as opposed
to traditional diagnosis to assist in making the distinction in the methods applied.
Dimensional approaches can at times blur the lines between “normality” and “psychopathol-
ogy;” however, this is not necessarily a negative outcome. Dimensional approaches can allow individu-
als who may not meet a strict symptom count to receive services when the combination of behavioral
and emotional issues they are experiencing results in clear impairment but a count of symptoms might
deny a diagnosis. There is also considerable evidence to show that mathematical or actuarial models of
diagnosis and classification tend to be more accurate and objective overall than are traditional methods.
The math algorithms are not swayed by subjective impression—however, some see this as a criticism as
well, arguing that diagnosis is as much or more an art than a science and that good clinicians should
be swayed by subjective information. For this reason, dimensional classification and diagnosis has been
very slow to catch on and is particularly resisted by the medical community, although the current trend
toward the practice of evidence-based medicine that has moved into many professional health care
fields has invited greater acceptance of dimensional approaches to diagnosis and classification.
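The contrast between a symptom count and an actuarial rule can be made concrete with a short sketch. Everything numeric below is hypothetical and chosen only for illustration: the symptom checklist, the required count of four, the scale names, the weights, and the intercept do not come from any published instrument or diagnostic manual. The dimensional rule uses a logistic regression form, one of the multivariate approaches named above.

```python
import math


def categorical_decision(symptoms_present, required_count):
    """Categorical rule: each symptom is present or absent; count them."""
    return sum(symptoms_present) >= required_count


def dimensional_probability(t_scores, weights, intercept):
    """Logistic-regression style rule: weight continuous scores into a probability."""
    z = intercept + sum(weights[name] * t_scores[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical child: only 3 of 6 checklist symptoms are marked present,
# but several related scales are moderately elevated at the same time.
symptoms = [True, True, True, False, False, False]
scores = {"Anxiety": 68, "Depression": 66, "Withdrawal": 64}

# Hypothetical model parameters (not estimated from real data).
weights = {"Anxiety": 0.05, "Depression": 0.05, "Withdrawal": 0.05}
intercept = -9.0

meets_count = categorical_decision(symptoms, required_count=4)
probability = dimensional_probability(scores, weights, intercept)

print("Meets symptom count:", meets_count)
print("Estimated probability of classification:", round(probability, 2))
```

In this made-up case the symptom count falls short while the combined elevation pushes the estimated probability above one half, which is the situation the box describes: a child who misses a strict count but shows clear, measurable impairment.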
The use of dimensional models continues to grow more so in psychology than elsewhere, but we
see growth in other areas of health care as well. The issues are complex, but the data are compelling. If
you want to know more about these approaches, we suggest the following two sources:
Grove, W., & Meehl, P. (1996). Comparative efficiency of the informal (subjective, impressionistic)
and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy.
Psychology, Public Policy, and Law, 2, 293–323.
Kamphaus, R., & Campbell, J. (Eds.). (2006). Psychodiagnostic assessment of children: Dimensional and
categorical approaches. New York: Wiley.

CBCL, one for children 2 to 3 years and one for children 4 to 18 years. The TRF is appropriate for
children from 5 to 18 years. The CBCL and TRF have long played an important role in the assess-
ment of children and adolescents and continue to be among the most frequently used psychological
tests in schools today. The scales contain two basic sections. The first section collects information
about the child’s activities and competencies in areas such as recreation (e.g., hobbies and sports),


social functioning (e.g., clubs and organizations), and schooling (e.g., grades). The second section
assesses problem behaviors and contains item stems describing problem behaviors. On these items
the informant records a response of not true, somewhat true/sometimes true, or very true/often true.
The clinical subscales of the CBCL and TRF are as follows:
● Withdrawn: reflects withdrawn behavior, shyness, and a preference to be alone
● Somatic Complaints: a tendency to report numerous physical complaints (e.g., headaches,
fatigue)
● Anxious/Depressed: reflects a combination of depressive (e.g., lonely, crying, unhappy)
and anxious (nervous, fearful, worried) symptoms
● Social Problems: reflects peer problems and feelings of rejection
● Thought Problems: evidence of obsessions/compulsions, hallucinations, or other “strange”
behaviors
● Attention Problems: reflects difficulty concentrating, attention problems, and hyperactivity
● Delinquent Behavior: evidence of behaviors such as stealing, lying, vandalism, and arson
● Aggressive Behavior: reflects destructive, aggressive, and disruptive behaviors
The CBCL and TRF provide three composite scores:
● Total Problems: overall level of behavioral problems
● Externalizing: a combination of the Delinquent Behavior and Aggressive Behavior scales
● Internalizing: a combination of the Withdrawn, Somatic Complaints, and Anxious/
Depressed scales
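The relationship between the subscales and the two narrower composites listed above can be held in a simple mapping. This is a sketch of scale membership only, as stated in the text; the names `CBCL_COMPOSITES` and `composites_for` are illustrative, and the code makes no attempt to reproduce how the published forms actually compute or norm composite scores.

```python
# Composite membership as described in the text (membership only, no scoring).
CBCL_COMPOSITES = {
    "Externalizing": ["Delinquent Behavior", "Aggressive Behavior"],
    "Internalizing": ["Withdrawn", "Somatic Complaints", "Anxious/Depressed"],
}


def composites_for(scale):
    """Return the narrower composites (if any) that a subscale feeds into."""
    return [name for name, members in CBCL_COMPOSITES.items() if scale in members]


print(composites_for("Withdrawn"))           # ['Internalizing']
print(composites_for("Attention Problems"))  # []
```

The empty result for Attention Problems reflects the list above: Social Problems, Thought Problems, and Attention Problems are not named as part of either the Externalizing or the Internalizing composite.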
Computer-scoring software is available for the CBCL and TRF and is recommended
because hand scoring is a fairly laborious and time-consuming process. The CBCL and TRF
have numerous strengths that continue to make them popular among school psychologists and
other clinicians. They are relatively easy to use, are time efficient (when using the computer-
scoring program), and have a rich history of clinical and research applications (Kamphaus &
Frick, 2002).
The BASC-2 TRS and PRS, the CBCL and TRF, and similar rating scales are typically referred
to as omnibus rating scales. This indicates that they measure a wide range of symptoms and
behaviors that are associated with different emotional and behavioral disorders. Ideally an
omnibus rating scale should
be sensitive to symptoms of both internalizing (e.g.,
anxiety, depression) and externalizing (e.g., ADHD, Oppositional Defiant Disorder) disorders
to ensure that the clinician is not missing important indicators of psychopathology. This is par-
ticularly important when assessing children and adolescents because there is a high degree of
comorbidity with this population. Comorbidity refers to the presence of two or more disorders
occurring simultaneously in the same individual. For example, a child might meet the criteria
for both an externalizing disorder (e.g., conduct disorder) and an internalizing disorder (e.g., de-
pressive disorder). However, if a clinician did not adequately screen for internalizing symptoms,
the more obvious externalizing symptoms might mask the internalizing symptoms and result in
an inaccurate or incomplete diagnosis. Inaccurate diagnosis typically leads to inadequate treat-
ment.


Single-Domain Rating Scales


Although omnibus rating scales play a central role in the assessment of psychopathology,
there are also a number of single-domain (or syndrome-specific) rating scales. These rating
scales resemble the omnibus scales in format, but they focus on a single disorder (e.g., ADHD)
or behavioral dimension (e.g., social skills). Although they are narrow in scope, they often
provide a more thorough assessment of the specific domain they are designed to assess than
the omnibus scales. As a result, they can be useful in supplementing more comprehensive assessment
techniques (e.g., Kamphaus & Frick, 2002). Single-domain scales include measures limited to
ADHD, depression, or obsessive-compulsive disorder, for example. Following are some brief
descriptions of some contemporary syndrome-specific behavior rating scales.

CHILDHOOD AUTISM RATING SCALE. The CARS (Schopler, Reichler, & Renner, 1988) is a 15-
item scale that is designed to help identify autism in children over 2 years of age. The individual
items are summed to form a total score that is used to rate a child on a continuum from nonautis-
tic, to mild-to-moderate autism, to severe autism. The CARS can be completed by a professional
such as a psychologist, pediatrician, or teacher based on observations performed in a variety of
settings (e.g., classrooms, clinics). In the manual the authors report results of psychometric stud-
ies that suggest adequate reliability and validity, and a training video is available that shows how
to use and score the instrument.

BASC MONITOR FOR ADHD. The BASC Monitor (Kamphaus & Reynolds, 1998) contains two
45-item rating scales, one for teachers and one for parents. It is designed for use with
children and adolescents 4 to 18 years with ADHD. This instrument is designed to facilitate
the treatment of ADHD by assessing the primary symptoms
of the disorder in a repeated assessment format. This allows the treatment team to monitor the
effectiveness of the treatment program and make adjustments when indicated (e.g., changes in
the medication regimen). Both the parent and teacher rating forms produce four scales: Attention
Problems, Hyperactivity, Internalizing Problems, and Adaptive Skills. The authors report results
of multiple psychometric studies that indicate good reliability and validity. There is also BASC
Monitor software that helps the clinician collect and track the results of repeated assessments and
any changes in pharmacological and behavioral interventions.

PEDIATRIC BEHAVIOR RATING SCALE. The PBRS (Marshall & Wilkinson, 2008) contains
two rating scales, one for teachers (95 items) and one for parents (102 items). This
instrument is for children and adolescents between 3 and 18 years and is intended to help identify
early onset bipolar disorder and help distinguish it from other disorders with similar presen-
tations. Both forms produce nine scales: Atypical, Irritability, Grandiosity, Hyperactivity/
Impulsivity, Aggression, Inattention, Affect, Social Interactions, and a Total Bipolar Index.
The authors report results of preliminary psychometric studies that indicate adequate reli-
ability and validity.


These are just a few examples of the many single-domain or syndrome-specific behavior rating
scales. Many of these scales are available for a number of psychological disorders and
behavioral dimensions. They are particularly helpful in the assessment of externalizing
disorders such as ADHD and conduct disorder in children and adolescents. Note that they are
intended to supplement the omnibus scales such as the BASC-2 and the CBCL, which should
always be used over single-domain scales for initial screening and assessment.

ADAPTIVE BEHAVIOR RATING SCALES. A special type of syndrome-specific scale is one de-
signed to assess adaptive behavior. The American Association on Intellectual and Developmen-
tal Disabilities, an influential organization that advocates for the rights of individuals with dis-
abilities, describes adaptive behavior as a collection of skills in three broad areas:
● Conceptual skills: includes literacy, quantitative skills such as telling time and using mon-
ey, and the ability for self-direction
● Practical skills: includes activities of daily living (e.g., getting dressed, adequate hygiene),
health care, using transportation, preparing meals, and house cleaning
● Social skills: includes general interpersonal and social skills and the ability to follow rules
and obey laws
The measurement of adaptive behaviors is particularly important in the assessment of
individuals with developmental and intellectual disabilities. For example, when diagnosing Men-
tal Retardation it is necessary to document deficits in adaptive skills in addition to deficits in
intellectual abilities. The assessment of adaptive behaviors can also facilitate treatment planning
for individuals with a wide range of disabilities.
The Vineland Adaptive Behavior Scales—Second Edition (Vineland-II) is an example of
a scale designed to assess adaptive behavior. There are a number of forms available for the
Vineland-II. These are:
● Survey Interview Form: This form is administered to a parent or other caregiver as a
semistructured interview. That is, the survey provides a set of questions that the clinician
presents to the respondent. It includes open-ended questions that may allow the clinician
to gather more in-depth information than that acquired using standard behavior rating
scales. There is also an Expanded Interview Form that provides a more detailed assessment
than the standard Survey Interview Form and is recommended for low-functioning and young
clients.
● Parent/Caregiver Rating Form: This behavior rating scale covers essentially the same
behaviors as the Survey Interview Form but uses an objective rating scale format. This
format is recommended when time limitations prevent


using the more comprehensive interview form and for periodic monitoring of client
progress during treatment.
● Teacher Rating Form: This behavioral questionnaire is designed to be completed by a
teacher who has experience with a child in a school or preschool setting. It assesses the
same behavioral domains as those measured by the Survey Interview Form and Parent/
Caregiver Rating Form, but focuses on behaviors likely to be observed in a classroom or
structured daycare setting.

Adult Behavior Rating Scales


We have thus far emphasized behavior rating scales that are used with children and adolescents.
Behavior rating scales at these ages are far more common in clinical and school practice than in
the adult age range. Nevertheless, there are adult behavior rating scales and we expect that their
use will grow in the future. The Clinical Assessment Scales for the Elderly (CASE) by Reynolds
and Bigler (2001) is an example of such a scale. It is an omnibus behavior rating scale for per-
sons aged 55 through 90 years designed to be completed by a knowledgeable caregiver such as
a spouse or adult child, or a health care worker who has nearly daily contact with the examinee.
The CASE also has a separate self-report scale for cognitively intact seniors to complete, but here
we will focus on the behavior rating scale.
The various clinical scales of the CASE focus on diagnosis and evaluation of the presence
primarily of Axis I or clinical disorders in this age group. The complete self-report scales and the
behavior rating scale of the CASE contain 13 scales each, 10 clinical scales and 3 validity scales.
Table 3 lists and describes the scales of the CASE. As you can see, there is much overlap with
rating scales designed for children and adolescents, in terms of the constructs being assessed,
but some key differences as well. For example, the CASE contains a Fear of Aging scale that
is often useful in evaluating the source of anxieties as well as depressive symptoms in this age
group. A Cognitive Competency screening scale is included to alert clinicians to when a more
careful or thorough evaluation of intelligence and related neuropsychological skills might be
advised. A Substance Abuse scale is also included to alert clinicians to issues in this domain.
Abuse of common substances and of prescription medications is covered on this scale
because it is far more common in this population than many clinicians perceive and is
thus often overlooked. You might wonder why most behavior rating scales for adolescents do
not include a substance abuse scale. Although this information is certainly valuable and no one
denies that substance abuse is a problem in the 13-to-18-year-old group, most behavior rating
scales do not include this for several practical reasons. Adolescents who abuse psychoactive
substances usually do so in secret, so raters are most likely unaware of the problem and, even
if aware or suspicious, have no opportunity to observe the use. If such a scale then came up
within the normal range or indicated “no problem,” clinicians might be overconfident that they
had effectively ruled substance abuse out. More importantly, however, most of these scales are commonly used
in the public schools, which often have prohibitions against psychologists asking students about
substance abuse issues. The 10 clinical scales were designed to assist in the process of differen-
tial diagnosis of the primary Axis I clinical disorders that occur in the population over 55 years
of age. Scales such as study skills, conduct problems, and hyperactivity are of limited value, if
any, in this age group so they were not included.


Table 3 Clinical Assessment Scales for the Elderly—Clinical Scales and Descriptions
Clinical Scales Description
Anxiety (ANX) Items assess a generalized sense of apprehension and fears that tend toward being
irrational and nonspecific, including observable and subjective symptoms and worry
states.
Cognitive Competence (COG) Items assess impaired thought processes commonly associated with higher cognitive
deficits in such areas as attention, memory, reason, and logical thought.
Depression (DEP) Items assess indications of depressed mood, general dysthymia, sadness, fatigue, mel-
ancholy, and some cognitive symptoms associated with major depressive episodes.
Fear of Aging (FOA) Items assess a sense of apprehension about aging and overconcern with the natural
processes of aging and its effects on oneself and one’s family.
Mania (MAN) Items assess characteristics of manic states including pressured speech, grandiose
thought, agitation, distractibility, flight of ideas, and related phenomena.
Obsessive-Compulsive (OCD) Items assess nonproductive, ruminative thought patterns; excessive, targeted worry;
and related phenomena.
Paranoia (PAR) Items assess the presence of ideas of reference, nonbizarre delusions, suspicions of
others’ motives, a preoccupation with doubts about others, and related ideas.
Psychoticism (PSY) Items assess disorders of thought, bizarre delusions, confusion, negative symptoms,
and associated problems.
Somatization (SOM) Items assess hypersensitivity to health concerns and physical symptoms not fully
explained by medical problems or excessive numbers of physical complaints.
Substance Abuse (SUB) Items assess overuse of mood-altering substances of a variety of forms, including
common consumer products such as coffee/caffeine, alcohol, and illicit substances,
and the tendency toward dependency on such substances.
Infrequency (F) Items assess a tendency to overreport symptoms across a broad range of disorders
not commonly endorsed in concert and potentially reflecting acute stress, frank
psychosis, malingering, or a very negative response set.
Lie (L) Items assess the tendency to deny common problems or difficulties, to respond in a
socially desirable manner, or an attempt to present oneself in an overly positive light.
Validity (V) Items on this scale reflect highly unrealistic responses typically endorsed at high
levels only by a failure to read and comprehend the items, a failure to take the test
seriously, or by random responding.

Source: From Reynolds, C. R., & Bigler, E. D. (2001). Clinical Assessment Scales for the Elderly. Odessa, FL: Psy-
chological Assessment Resources. Reprinted with permission of PAR.

Three validity scales are provided with the full-length scales: a Lie (L) or social desir-
ability scale, an Infrequency (F) scale, and a Validity (V) scale composed of nonsensical items
designed to detect random or insincere marking. Screening versions of the CASE are also avail-
able and are significantly shortened versions of the same scales noted in Table 3, including two
of the three validity scales. The Infrequency (F) scale does not appear on the CASE screening
scales. Otherwise, the clinical and validity scales are common across the four versions of the
CASE, although individual items that make up the scales vary somewhat from scale to scale.
The full-length CASE rating scales are typically completed in 30 minutes or less, and may be
used in a clinician’s office, nursing home, and rehabilitation setting as well as in a general or


gerontological medical practice. The CASE screening versions are half or less of the length
of the full scales and require proportionately less time. The four forms of the CASE (the self-
report, the rating form, and the corresponding short versions of each) are designed to be used
independently or in combination.

DIRECT OBSERVATION
Direct observation and recording of behavior counts is the oldest method of behavioral
assessment and is still widely used. As Ramsay et al. (2002) noted, some believe this
approach to be the true hallmark of what constitutes a behavioral assessment. In direct
observation, an observer travels to some natural environment
of the individual (a school, a nursing home, etc.) and observes the subject, typically without the
person knowing he or she is the target of the observation, although the latter is not always possi-
ble or even ethical. In reality it is very difficult to get an accurate sample of typical behavior from
a person who knows he or she is being observed—observing the behavior will nearly always
change it. This is in some ways analogous to the Heisenberg principle of uncertainty in physics:
We can never observe something in its unobserved state (observing something changes it!).
In a direct observation, a set of behaviors is specified, then recorded and counted as they
occur. In such an instance it is crucial that the observer-recorder be as impartial and objective as
possible and that the behaviors to be recorded are described in clear, crisp terms so there is the
least amount of inference possible for the observer. Direct observation adds another dimension to
the behavioral assessment—rather than being impressionistic, as are behavior ratings, it provides
true ratio scale data that are actual counts of behavior. It also adds another dimension by being a
different method of assessment that allows triangulation or checking of results from other meth-
ods and allowing the observer to note antecedent events as well as consequences assigned to the
observed behaviors.
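Because direct observation yields counts, the basic bookkeeping is easy to illustrate. The sketch below assumes two observers who each record one behavior code per observation interval. The behavior codes and data are made up for illustration, and simple percent agreement is used as one common, basic index of interobserver consistency; it is not presented as the index used by any particular published observation system.

```python
from collections import Counter


def count_behaviors(records):
    """Tally how many intervals each coded behavior was recorded in."""
    return Counter(records)


def percent_agreement(observer_a, observer_b):
    """Percentage of intervals in which two observers recorded the same code."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must code the same number of intervals")
    if not observer_a:
        raise ValueError("At least one interval is required")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * matches / len(observer_a)


# Hypothetical codes for five consecutive observation intervals.
observer_a = ["on_task", "out_of_seat", "on_task", "talking_out", "on_task"]
observer_b = ["on_task", "out_of_seat", "on_task", "on_task", "on_task"]

counts_a = count_behaviors(observer_a)
agreement = percent_agreement(observer_a, observer_b)

print(counts_a["on_task"])  # 3
print(agreement)            # 80.0
```

Clear behavioral definitions matter here for the reason given above: the less inference a code requires, the more often two observers will record the same code for the same interval.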
This form of traditional behavioral assessment can occur with or without a standardized
recording scheme. Often, observers will develop a form to aid them in coding and counting be-
haviors that is specific to the individual circumstance of any one observation period or simply
devise one they are comfortable using with all of their observations. However, this can introduce
a variety of biases and increase the subjectivity of the observations. It can also increase the
error rate in recording behaviors because of the cognitive demand on the observer. Standardized
observation forms are available for many different settings and enhance observer training, objectivity,
consistency, and accuracy, but do limit the flexibility of direct observation, which is one of its
key strengths. Nevertheless, we view the advantages of using a standardized observational or
recording system as outweighing the limitations of such systems.
The most widely used system is the Student Observation System (SOS) which is a com-
ponent of the BASC-2. The SOS is a standardized, objective observational recording system
that allows for the observation of 14 dimensions of behavior (some positive dimensions and
some negative dimensions), and is designed to be useful in any structured setting that has edu-
cational goals. It is most commonly used in classrooms. The 14 categories of behavior assessed
are listed in Table 4. Each of these categories or behavioral dimensions is defined specifi-
cally and clearly for the observer and research indicates high levels of interobserver agreement


Table 4 Behavioral Categories of the BASC-2 Student Observation System (SOS)


Category/Definition (with Specific Behavior Examples)
1. Response to Teacher/Lesson: This category describes the student’s appropriate academic behaviors involving the teacher or class. This category does not include working on school subjects (see Category 3).
   Examples: Raising hand to ask/answer a question; contributing to class discussion; waiting for help or for an assignment or task
2. Peer Interaction: This category assesses positive or appropriate interactions with other students.
   Examples: Conversing with others in small group or class discussion; lightly touching another student in a friendly or encouraging manner; giving a pat on the back or shaking hands
3. Work on School Subjects: This category includes appropriate academic behaviors that the student engages in alone, without interacting with others.
   Examples: Working on a school subject either at the student’s own desk or in a learning center
4. Transition Movement: This category is for appropriate and nondisruptive behaviors of children while moving from one activity or place to another. Most are out-of-seat behaviors and may be infrequent during a classroom observation period.
   Examples: Walking to the blackboard; getting a book; sharpening a pencil; lining up; taking a water/bathroom break; performing an errand; following others in line
5. Inappropriate Movement: This category is intended for inappropriate motor behaviors that are unrelated to classroom work.
   Examples: Playing at blackboard inappropriately; being asked to leave the room or being physically removed from the room; hitting others with a classroom-related object (e.g., a musical instrument); refusing to leave a teacher’s side to participate in school activities
6. Inattention: This category includes inattentive behaviors that are not disruptive.
   Examples: Scribbling on paper or desks; looking at objects unrelated to classroom activity while not paying attention
7. Inappropriate Vocalization: This category includes disruptive vocal behaviors. Only vocal behavior should be checked.
   Examples: Criticizing another harshly; picking on another student; making disruptive noises such as screaming, belching, moaning, grinding teeth, or “shhh” sounds; refusing to do schoolwork or participate in an activity; talking out of turn, during a quiet time, or without permission
8. Somatization: This category includes behaviors regardless of inferred reason (e.g., a student may be sleeping because of medication, boredom, or poor achievement motivation).
   Examples: Complaining that stomach hurts; complaining that head hurts
9. Repetitive Motor Movement: This category includes repetitive behaviors (both disruptive and nondisruptive) that appear to have no external reward. Generally, the behaviors should be of 15-second duration or longer to be checked, and may be more likely to be checked on Part A than on Part B because of their repetitive nature. They may, however, be checked during either part.
   Examples: Rapping finger(s)/pencil on desk; tapping foot on floor; swinging foot in the air; twirling or spinning a pencil or toy; moving body back and forth or from side to side while sitting; walking back and forth or in a circle in one area; sucking on back of hand; staring fixedly at moving hand; hair-twisting
10. Aggression: This category includes harmful Intentionally tearing, ripping, or breaking own or
behaviors directed at another student, the teacher, another’s work, belongings, or property
or property. The student must attempt to hurt
another or destroy property for the behavior to be
checked in this category. Aggressive play would not
be included here.
Continued

390
BEHAVIORAL ASSESSMENT

Category/Definition Specific Behavior Examples


11. Self-Injurious Behavior: This category includes Pulling own hair with enough force to pull it out; slapping
severe behaviors that attempt to injure one’s self. or punching self with enough force to cause a bruise or
These behaviors should not be confused with self- laceration; banging head on a wall, floor, or object with
stimulatory behaviors. This category is intended to enough force to bruise or injure; scratching or poking
capture behaviors of children with severe disabilities at own eyes with enough force to cause injury; placing
who are being served in special classes in schools paper, dirt, or grass in mouth and attempting to ingest it
and institutions.
12. Inappropriate Sexual Behavior: This category “Petting” self or others, any form of sexual touch-
includes behaviors that are explicitly sexual in nature. ing—hugging another student quickly as a brief hello or
The student could be seeking sexual gratification. good-bye would not be coded unless it involves sexual
Behaviors that are not flagrant and specifically sexual touching as well
(such as hitting others) are not included here.
13. Bowel/Bladder Problems: This category includes Urinating in his or her pants; having a bowel movement
urination and defecation. outside the toilet; soiling or smearing in pants
14. Other: This category includes behaviors that do not
seem to fit in any other categories. It should be used
infrequently.

Source: Behavior Assessment System for Children, Second Edition (BASC-2). Copyright © 2004 NCS Pearson, Inc.
Reproduced with permission. All rights reserved. “BASC” is a trademark, in the US and/or other countries, of
Pearson Education, Inc. or its affiliates.

on the ratings with as few as two in vivo training sessions (Reynolds & Kamphaus, 2004). The
SOS uses a momentary time sampling (MTS) procedure to ensure that it adequately samples
the full range of a child’s behavior in the classroom (Reynolds & Kamphaus, 1992). Several
characteristics of the SOS exemplify this effort:
● Both adaptive and maladaptive behaviors are observed (see Table 4).
● Multiple methods are used including clinician rating, time sampling, and qualitative
recording of classroom functional contingencies.
● A generous time interval is allocated for recording the results of each time sampling inter-
val (27 seconds).
● Operational definitions of behaviors and time sampling categories are included in the
BASC-2 manual (Reynolds & Kamphaus, 2004).
● Inter-rater reliabilities for the time sampling portion are high, which lends confidence that
independent observers are likely to observe the same trends in a child’s classroom behavior.
These characteristics of the SOS have contributed to its popularity as a functional behavioral
assessment tool. It is crucial, for example, to have adequate operational definitions of behaviors,
because these in turn contribute to good inter-rater reliability. Without such reliability,
clinicians cannot know whether their observations are idiosyncratic, shaped by their own biases or
personal definitions of behavior. MTS is also important in making direct observation practical as
well as accurate. The observer watches the target individual for a specified brief period, turns to
the recording sheet and marks the relevant behaviors seen during a second specified period, and
then observes the target individual again. A complete BASC-2 SOS observation lasts a total of
15 minutes. With this time frame, an observer can observe several children in a classroom or
efficiently observe the same child in multiple settings to assess the generalizability of the behavior.
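
The momentary time sampling routine just described can be sketched in a few lines of Python. This is only an illustration, not the BASC-2 scoring software. The 15-minute session and the 27-second recording period come from the text; the 3-second observation window, the function names, and the data layout (one set of checked category names per interval) are assumptions made for the example. The last function computes simple interval-by-interval agreement between two observers, one common way of expressing the inter-rater reliability discussed above.

```python
from collections import Counter

SESSION_SECONDS = 15 * 60   # total observation length stated in the text
RECORD_SECONDS = 27         # recording time per interval stated in the text
OBSERVE_SECONDS = 3         # ASSUMPTION: length of each brief observation window
INTERVAL_SECONDS = OBSERVE_SECONDS + RECORD_SECONDS


def build_schedule():
    """Return (observe_start, observe_end) in seconds for every interval."""
    return [(start, start + OBSERVE_SECONDS)
            for start in range(0, SESSION_SECONDS, INTERVAL_SECONDS)]


def proportion_of_intervals(records):
    """Proportion of intervals in which each category was checked.

    records -- list with one set of checked category names per interval
    """
    counts = Counter(category for checked in records for category in checked)
    return {category: n / len(records) for category, n in counts.items()}


def interval_agreement(records_a, records_b, category):
    """Share of intervals on which two observers agree about one category."""
    if len(records_a) != len(records_b):
        raise ValueError("both observers must code the same number of intervals")
    matches = sum((category in a) == (category in b)
                  for a, b in zip(records_a, records_b))
    return matches / len(records_a)
```

Under these assumptions the schedule contains 30 observation windows. Percent agreement does not correct for chance agreement, so it should be read as a rough index only.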


Data from the direct observation of behavior are useful in initial diagnosis, treatment planning,
and monitoring changes and treatment effectiveness. Direct observation gives the clinician a unique
look at the immediate antecedents and consequences of behavior in a relevant context in a way that
no other method can document. An electronic version of the BASC-2 SOS, the BASC Portable
Observation Program (BASC POP), is available for use on a laptop computer.

CONTINUOUS PERFORMANCE TESTS (CPTs)


Continuous performance tests (CPTs) are a specific type of behavioral test originally designed to
measure vigilance, sustained and selective attention, and, more generally, executive control. Many
different CPT paradigms have been devised since the original CPT of Rosvold, Mirsky, Sarason,
Bransome, and Beck in 1956, but the basic CPT paradigms remained similar until just recently.
Typically, a CPT requires an examinee to view a computer screen, to respond when a specific but
very simple stimulus or sequence of stimuli appears on the screen, and to inhibit responding at all
other times. For example, in the first CPT, the examinee pressed a lever whenever the letter X
appeared on a screen but was to resist pressing the lever whenever any other letter or a number
appeared. Gradually, CPTs became more complex, and an examinee might be required to respond only
when the letter X is immediately preceded by the letter A and to inhibit responding when any other
letter appears or when an X appears that was not immediately preceded by an A. CPTs can be made
more complex by using sequences that mix color, numbers, letters, and even geometric or nonsense
figures. CPTs can also be auditory, wherein examinees respond to a target sound but only when it
is preceded by a designated or preparatory sound. The patterns used have always been kept simple
in order to minimize the effects of short-term memory and to maximize attention and inhibition as
the key variables being assessed. The tasks are intended to be simple so that the influence of
factors such as general intelligence is minimized, but they do require intense concentration, and
over a period of even 15 or 20 minutes many people will make mistakes on even such simple tasks.
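
The two response rules described above (respond to every X, or respond to an X only when it immediately follows an A) can be made concrete with a short scoring sketch in Python. It is an illustration only and does not reproduce any published CPT. The function names and the data layout are hypothetical, and the labels "omission" (failing to respond to a target) and "commission" (responding to a non-target) are the conventional names for the two kinds of error, although the passage itself does not use them.

```python
def is_target(stimuli, i, rule):
    """Return True when the stimulus at position i calls for a response."""
    if rule == "X":
        return stimuli[i] == "X"
    if rule == "AX":
        return stimuli[i] == "X" and i > 0 and stimuli[i - 1] == "A"
    raise ValueError(f"unknown rule: {rule}")


def score_cpt(stimuli, responses, rule="X"):
    """Count hits, omissions, commissions, and correct rejections.

    stimuli   -- sequence of presented characters
    responses -- sequence of booleans, True where the examinee responded
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    scores = {"hits": 0, "omissions": 0,
              "commissions": 0, "correct_rejections": 0}
    for i, responded in enumerate(responses):
        target = is_target(stimuli, i, rule)
        if target and responded:
            scores["hits"] += 1
        elif target:
            scores["omissions"] += 1       # missed a target
        elif responded:
            scores["commissions"] += 1     # responded when inhibition was required
        else:
            scores["correct_rejections"] += 1
    return scores
```

The same response record can be scored under either rule. A response to an X that follows a K, for instance, counts as a hit under the X rule but as a commission error under the A-X rule.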
Decades of research have found CPTs to be highly sensitive in detecting disorders of
self-regulation in which attention, concentration, and response inhibition systems are impaired.
Such impairments are often key indicators of disorders such as ADHD and are frequent symptoms
following traumatic brain injury and many central nervous system diseases. Attempts have been made
to use CPT results as the so-called gold standard for diagnosis of ADHD. However, the disturbances
in attention, concentration, and response inhibition apparent on CPTs are not specific to even a
small subset of disorders. In fact, not only do individuals with ADHD show abnormal results on
CPTs; individuals with bipolar disorder, borderline personality disorder, chronic fatigue syndrome,
nearly all forms of dementia, mental retardation, schizophrenia, seizure disorder, and a host of
neurodevelopmental disorders that are genetic in origin also demonstrate abnormal CPT results.
Nevertheless, CPTs remain widely used and are highly sensitive to symptoms associated with
abnormalities of the self-regulatory and executive control systems
of the brain.
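
The distinction between a test that is sensitive and one that is specific can be illustrated with a small worked example in Python. All counts below are invented for illustration and are not drawn from any study. They describe a hypothetical clinic sample of 200 children in which an abnormal CPT result is common in ADHD but also common in other disorders.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people with the disorder whom the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people without the disorder whom the test clears."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos, false_pos):
    """Proportion of flagged people who actually have the disorder."""
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL counts, invented for this example only.
adhd_abnormal, adhd_normal = 45, 5        # 50 children with ADHD
other_abnormal, other_normal = 60, 40     # 100 children with other disorders
none_abnormal, none_normal = 5, 45        # 50 children with no disorder

true_pos = adhd_abnormal
false_neg = adhd_normal
false_pos = other_abnormal + none_abnormal
true_neg = other_normal + none_normal

print(f"sensitivity: {sensitivity(true_pos, false_neg):.2f}")
print(f"specificity: {specificity(true_neg, false_pos):.2f}")
print(f"positive predictive value: {positive_predictive_value(true_pos, false_pos):.2f}")
```

With these invented numbers the test detects most children with ADHD, yet fewer than half of the abnormal results belong to children with ADHD. That pattern is why an abnormal CPT score alone cannot serve as a diagnostic gold standard.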


Based on research indicating that working memory is also associated with the executive control
systems of the brain, a recent CPT has been devised to assess the executive system of the brain
more broadly. It adds working memory assessment to the standard CPT paradigms, which also assess
inhibitory control, sustained attention, and vigilance (Isquith, Roth, & Gioia, 2010). Known as
the Tasks of Executive Control (TEC), it consists of a set of six different tasks that manipulate
working memory load as a component of attention, vigilance, and response inhibition. It yields a
wide range of scores associated with each of these tasks, some of which are common to the
traditional CPT paradigms and some of which are relatively new. It is too early to determine how
well these new approaches to the traditional CPT paradigm, particularly the addition of working
memory demands, will fare in the clinical and research communities.
CPTs in general do not correlate highly with behavior rating scale data based on observa-
tions of children and adolescents in routine aspects of daily life or when performing academic
tasks. It seems clear that CPTs do provide unique forms of performance-based information about
the executive control systems of the brain and their continued evolution should provide addi-
tional insights into brain function as well as diagnostic issues related to central nervous system
problems.

PSYCHOPHYSIOLOGICAL ASSESSMENTS
Psychophysiological assessment is another powerful method of behavioral assessment that typically
involves recording physical changes in the body during some specific event. The so-called lie
detector or polygraph is perhaps the best-known example. It records a variety of changes in the
body of a person while answering yes–no questions, some of which are relevant to what the examiner
wants to know and some of which are not. Heart rate, respiration, and the galvanic skin response
(the ability of the skin to conduct an electric charge, which changes if you start to sweat even a
little bit) are commonly monitored by such devices. There are many examples of
psychophysiological assessment, including electroencephalographs (EEGs), which monitor brain wave
activity; electromyographs, which monitor activation of muscle tissue; and one of the most
controversial, the penile plethysmograph, which monitors blood flow changes in the penis during
exposure to different stimuli. The latter device has been used to conduct evaluations of male sex
offenders for some years, and its proponents claim to be able to diagnose pedophilia and other
sexual disorders involving fetishes with high degrees of accuracy. Having looked at this
literature, we remain skeptical of many of these claims.
All devices in the psychophysiological assessment domain are highly sensitive and require
careful calibration along with standardized protocols for their use. However, too many of them
do not have adequate standardization or reference samples to make them as useful in clinical
diagnosis as they might become. Others, however, such as the EEG, are very common, well-
validated applications that are immensely useful in the right hands. We believe this form of as-
sessment holds great promise for the future of psychological assessment.


Summary
Behavioral assessment is not simply a specific set of measuring devices, but more of a para-
digm, a way of thinking about and obtaining assessment information. Behavioral assessment
differs from traditional personality assessment in that behavioral assessments emphasize what
an individual actually does, whereas most personality assessments emphasize characteristics or
traits of the individual. Many contemporary clinicians use a multimethod, multimodal approach
to assessment. That is, they collect data or assessment information using multiple techniques,
including behavioral interviewing, direct observation, and impressionistic behavior rating scales,
as well as traditional self-report “personality scales.” This approach is designed to reduce the
level of inference involved in interpretation.
Although a behavioral approach to assessment is best considered a broad paradigm it does
typically involve common techniques. For example, it is common for the clinician to conduct a
behavioral interview. In a behavioral interview the clinician focuses on the antecedents and con-
sequences of behaviors of concern as well as what interventions have been used. In contrast with
traditional clinical interviews a key goal of the behavioral interview is to minimize the level of
inference used to obtain and interpret information. By stressing behavior as opposed to subjec-
tive states, a more definitive plan can be derived, clear goals can be set, and the progress of the
individual monitored more clearly.
Another popular behavioral approach is the use of behavior rating scales. A behavior
rating scale is an objective inventory that asks a knowledgeable informant to rate an indi-
vidual on a number of dimensions. These ratings of behavior are largely impressionistic in
nature (i.e., based on the informant’s impression rather than actually counting behaviors), but
research has shown they predict diagnosis accurately, predict future behavioral and learning
problems, help detect changes in behavior, and can even predict what types of interventions
are most likely to work to change a behavior. As a result, behavior rating scales have gained
considerable popularity in recent years. Many of the most popular behavior rating scales are
referred to as omnibus rating scales. In this context omnibus indicates that they measure a
wide range of symptoms and behaviors that are associated with different emotional and
behavioral disorders. Ideally an omnibus rating scale should be sensitive to symptoms of both
internalizing (e.g., anxiety, depression) and externalizing (e.g., ADHD, Oppositional Defiant
Disorder) disorders to ensure that the clinician is not missing important indicators of
psychopathology. We provided detailed descriptions of two popular omnibus behavior rating scales:
the Behavior Assessment System for Children (BASC), which includes a Teacher Rating Scale
(TRS) and a Parent Rating Scale (PRS), and the Achenbach System of Empirically Based
Assessment (ASEBA), which includes the Child Behavior Checklist (CBCL) and the Teacher Report
Form (TRF).
Although omnibus rating scales play a central role in the assessment of psychopathology,
there are also a number of single-domain or syndrome-specific rating scales. Single-domain rat-
ing scales resemble the omnibus scales in format, but focus on a single disorder or behavioral
dimension. Although they are narrow in scope, they often provide a more thorough assessment
of the specific domain they are designed to assess than the omnibus scales. As a result, they can
be useful in supplementing more comprehensive assessment techniques.


Behavior rating scales have been used most often with children and adolescents, but there
are behavior rating scales for adults. As an example we discussed the Clinical Assessment Scales
for the Elderly (CASE) which is an omnibus behavior rating scale for individuals aged 55 through
90 years designed to be completed by a knowledgeable caregiver, such as a spouse, adult child, or
a health care worker who has frequent contact with the examinee. It is likely that behavior rating
scales will be designed and used with adults more frequently in the future.
Direct observation and recording of behavior constitutes one of the oldest approaches to
behavioral assessment and is still commonly used. In direct observation, an observer travels to
some natural environment of the individual and observes the subject, typically without the person
knowing he or she is the target of the observation. Direct observation adds another dimension to
the behavioral assessment: rather than being impressionistic, as are behavior ratings, it provides
true ratio scale data that are actual counts of behavior. Because it is a different method of
assessment, it also allows triangulation, or checking, of results from other methods, and it
allows the observer to note antecedent events as well as the consequences that follow
the observed behaviors. As an example of an approach to direct observation, we described the
Student Observation System (SOS) which is a component of the BASC-2.
Continuous performance tests (CPTs) are another type of behavioral assessment designed
to measure vigilance, sustained and selective attention, and executive control. They have been
found to be highly sensitive in detecting disorders of self-regulation in which attention,
concentration, and response inhibition systems are impaired. Although often considered essential
techniques in the assessment of ADHD, the constructs they measure are also commonly impaired
in individuals with a number of other psychological and neuropsychological disorders. Research
indicates that CPTs provide performance-based information about executive control systems of
the brain and can facilitate both diagnosis and treatment.
The final behavioral approach we discussed was psychophysiological assessment. Psy-
chophysiological assessments typically involve recording physical changes in the body during
specific events. The polygraph or so-called lie detector is perhaps the best-known example of
psychophysiological assessment. It records a variety of changes in the body of a person while
answering yes–no questions, some of which are relevant to what the examiner wants to know and
some of which are not. Psychophysiological assessment devices are highly sensitive and require
careful calibration along with standardized protocols to produce valid and reliable results. Many
of these instruments lack the standardization and normative data needed to make them clinically
useful, but this approach holds considerable potential.

Key Terms and Concepts


Behavior Assessment System for Children, Second Edition (BASC-2)
Behavior rating scale
Behavioral assessment
Behavioral interview
Continuous performance tests (CPTs)
Direct observation
Impressionistic
Individuals with Disabilities Education Improvement Act of 2004 (IDEA 2004)
Omnibus rating scales
Psychophysiological assessment
Public Law 94–142 (IDEA)
Single-domain rating scales


Recommended Readings
Kamphaus, R. W., & Frick, P. J. (2002). Clinical assessment of child and adolescent personality and be-
havior. Boston: Allyn & Bacon. This text provides comprehensive coverage of the major personality and
behavioral assessment techniques used with children and adolescents. It also provides a good discussion
of the history and current use of projective techniques.
Reynolds, C. R., & Kamphaus, R. W. (2003). Handbook of psychological and educational assessment of
children: Personality, behavior, and context. New York: Guilford Press. This is another excellent source
providing thorough coverage of the major behavioral and personality assessment techniques used with
children. Particularly good for those interested in a more advanced discussion of these instruments and
techniques.
Riccio, C., Reynolds, C. R., & Lowe, P. A. (2001). Clinical applications of continuous performance tests:
Measuring attention and impulsive responding in children and adults. New York: Wiley. A good
source on CPTs.
