
Comprehensive

Clinical
Psychology

Editor
Cecil R. Reynolds
Texas A&M University, College Station, TX, USA

Comprehensive Clinical Psychology Editors-in-Chief

Alan S. Bellack
The University of Maryland at Baltimore, MD, USA
Michel Hersen
Pacific University, Forest Grove, OR, USA

Assessment
Volume 4

2001
AN IMPRINT OF ELSEVIER SCIENCE
AMSTERDAM - LONDON - NEW YORK - OXFORD - PARIS - SHANNON - TOKYO
Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK

Copyright © 2001 Elsevier Science Ltd.

All rights reserved. No part of this publication may be reproduced, stored in any retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publishers.

First edition 1998
Paperback edition 2001

Library of Congress Cataloging In Publication Data

Comprehensive clinical psychology / editors-in-chief Alan S. Bellack, Michel Hersen. -- 1st ed.
p. cm.
Includes indexes.
Contents: v. 1. Foundations / volume editor, C. Eugene Walker -- v. 2. Professional issues / volume editor, Arthur N. Wiens -- v. 3. Research and methods / volume editor, Nina R. Schooler -- v. 4. Assessment / volume editor, Cecil R. Reynolds -- v. 5. Children & adolescents / volume editor, Thomas Ollendick -- v. 6. Adults / volume editor, Paul Salkovskis -- v. 7. Clinical geropsychology / volume editor, Barry Edelstein -- v. 8. Health psychology / volume editor, Derek W. Johnston and Marie Johnston -- v. 9. Applications in diverse populations / volume editor, Nirbhay N. Singh -- v. 10. Sociocultural and individual differences / volume editor, Cynthia D. Belar -- v. 11. Indexes.
1. Clinical psychology I. Bellack, Alan S. II. Hersen, Michel.
[DNLM: 1. Psychology, Clinical. WM 105 C737 1998]
RC467.C597 1998
616.89--dc21
DNLM/DLC
for Library of Congress 97-50185
CIP

British Library Cataloguing In Publication Data

Comprehensive clinical psychology
I. Clinical psychology
II. Bellack, Alan S. (Alan Scott), 1944- II Hersen, Michel
616.8'9

ISBN 0-08-042707-3 (set : alk. paper)
ISBN 0-08-43143-7 (Volume 4)
ISBN 0-08-044069-X (Volume 7 paperback)

Typeset by Bibliocraft, Dundee, UK.
Printed and bound in The Netherlands by Giethoorn Media Group

Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.01
The Role of Assessment in Clinical
Psychology
LEE SECHREST, TIMOTHY R. STICKLE, and MICHELLE STEWART
University of Arizona, Tucson, AZ, USA

4.01.1 INTRODUCTION
4.01.1.1 Useful Clinical Assessment is Difficult but not Impossible
4.01.2 WHY ARE ASSESSMENTS DONE?
4.01.2.1 Bounded vs. Unbounded Inference and Prediction
4.01.2.2 Prevalence and Incidence of Assessment
4.01.2.3 Proliferation of Assessment Devices
4.01.2.4 Over-reliance on Self-report
4.01.3 PSYCHOMETRIC ISSUES WITH RESPECT TO CURRENT MEASURES
4.01.3.1 Reliability
4.01.3.2 Validity
4.01.3.3 Item Response Theory
4.01.3.4 Scores on Tests
4.01.3.5 Calibration of Measures
4.01.4 WHY HAVE WE MADE SO LITTLE PROGRESS?
4.01.4.1 The Absence of the Autopsy
4.01.5 FATEFUL EVENTS CONTRIBUTING TO THE HISTORY OF CLINICAL ASSESSMENT
4.01.5.1 The Invention of the Significance Test
4.01.5.2 Ignoring Decision Making
4.01.5.3 Seizing on Construct Validity
4.01.5.4 Adoption of the Projective Hypothesis
4.01.5.5 The Invention of the Objective Test
4.01.5.6 Disinterest in Basic Psychological Processes
4.01.6 MISSED SIGNALS
4.01.6.1 The Scientist-Practitioner Model
4.01.6.2 Construct Validity
4.01.6.3 Assumptions Underlying Assessment Procedures
4.01.6.4 Antecedent Probabilities
4.01.6.5 Need for Integration of Information
4.01.6.6 Method Variance
4.01.6.7 Multiple Measures
4.01.7 THE ORIGINS OF CLINICAL ASSESSMENT
4.01.7.1 The Tradition of Assessment in Psychology
4.01.7.1.1 Witmer
4.01.7.1.2 Army Alpha
4.01.8 THE RORSCHACH INKBLOT TECHNIQUE AND CLINICAL PSYCHOLOGY
4.01.8.1 The Social and Philosophical Context for the Appearance of the Rorschach
4.01.8.2 The Birth of the Rorschach
4.01.8.3 Clinical vs. Statistical Prediction
4.01.8.4 Old Tests Never Die, They Just Fade Away
4.01.9 OTHER MEASURES USED IN CLINICAL PSYCHOLOGY
4.01.9.1 The Thematic Apperception Test
4.01.9.2 Sentence Completion Tests
4.01.9.3 Objective Testing
4.01.9.4 The Clinician as a Clinical Instrument
4.01.9.5 Structured Interviews
4.01.10 CONCLUSIONS
4.01.11 REFERENCES

4.01.1 INTRODUCTION

In this chapter we will describe the current state of affairs with respect to assessment in clinical psychology and then we will attempt to show how clinical psychology got to that state, both in terms of positive influences on the directions that efforts in assessment have taken and in terms of missed opportunities for alternative developments that might have been more productive psychology. For one thing, we really do not think the history is particularly interesting in its own right. The account and views that we will give here are our own; we are not taking a neutral, and innocuous, position. Readers will not find a great deal of equivocation, not much in the way of "a glass half-empty is, after all, half-full" type of placation. By assessment in this chapter, we refer to formal assessment procedures, activities that can be named, described, delimited, and so on. We assume that all clinical psychologists are more or less continuously engaged in informal assessment of clients with whom they work. Informal assessment, however, does not follow any particular pattern, involves no rules for its conduct, and is not set off in any way from other clinical activities. We have in mind assessment procedures that would be readily defined as such, that can be studied systematically, and whose value can be quantified. We will not be taking account of neuropsychological assessment nor of behavioral assessment, both of which are covered in other chapters in this volume. It will help, we think, if we begin by noting the limits within which our critique of clinical assessment is meant to apply. We, ourselves, are regularly engaged in assessment activities, including development of new measures, and we are clinicians, too.

4.01.1.1 Useful Clinical Assessment is Difficult but not Impossible

Many of the comments about clinical assessment that follow may seem to some readers to be pessimistic and at odds with the experiences of professional clinicians. We think our views are quite in accord with both research and the theoretical underpinnings for assessment activities, but in at least some respects we are not so negative in our outlook as we may seem. Let us explain. In general, tests and related instruments are devised to measure constructs, for example, intelligence, ego strength, anxiety, antisocial tendencies. In that context, it is reasonable to focus on the construct validity of the test at hand: how well does the test measure the construct it is intended to measure? Generally speaking, evaluations of tests for construct validity do not produce single quantitated indexes. Rather, evidence for construct validity consists of a "web of evidence" that fits together at least reasonably well and that persuades a test user that the test does, in fact, measure the construct at least passably well. The clinician examiner, especially if he or she is acquainted in other ways with the examinee, may form impressions, perhaps compelling, of the validity of test results. The situation may be something like the following:

Test <-- construct

That is, the clinician uses a test that is a measure of a construct. The path coefficient relating the test to the construct (in the convention of structural equations modeling, the construct causes the test performance) may well be substantial. A more concrete example is provided by the following diagram:

IQ Test <--0.80-- intelligence

This diagram indicates that the construct of intelligence causes performance on an IQ test. We believe that IQ tests may actually be quite good measures of the construct of "intelligence." Probably clinicians who give intelligence tests believe that in most instances the test gives them a pretty good estimate of what we mean by intelligence, for example, 0.80 in this example. To use a term that will be invoked later, the clinician is "enlightened" by the results from the test.

As long as the clinical use of tests is confined to enlightenment about constructs, many tests may have reasonably good, maybe even very good "validity." The tests are good measures of the constructs. In many, if not most, clinical uses of tests, however, the tests are used in order to make decisions. Tests are used, for example, to
decide whether a parent should have custody of a child, to decide whether a patient is likely to benefit from some form of therapy, to decide whether a child "should" be placed in a social classroom, or to decide whether a patient should be put on some particular medication. Using our IQ test example, we get a diagram of the following sort:

IQ Test <--0.80-- intelligence --0.50--> School grades

This diagram, which represents prediction rather than simply enlightenment, has two paths, and the second path is almost certain to have a far lower validity coefficient than the first one. Intelligence has a stronger relationship to performance on an IQ test than to performance in school. If an IQ test had construct validity of 0.80, and if intelligence as a construct were correlated 0.50 with school grades, which means that intelligence would account for 25% of the total variance in school grades, then the correlation between the IQ test and school grades would be only 0.80 × 0.50 = 0.40 (which is about what is generally found to be the case).

IQ Test <--0.40--> School grades

A very good measure of ego strength may not be a terribly good predictor of resistance to stress in some particular set of circumstances. Epstein (1983) pointed out some time ago that tests cannot be expected to be related especially well to specific behaviors, but it is in relation to specific behaviors that tests are likely to be used in clinical settings.

It could be argued, and has been (e.g., Meyer & Handler, 1997), that even modest validities like 0.40 are important. Measures with a validity of 0.40, for example, can improve one's prediction from the prediction that 50% of a group of persons will succeed at some task to the prediction that 70% will succeed. If the provider of a service cannot serve all eligible or needy persons, that improvement in prediction may be quite useful. In clinical settings, however, decisions are made about individuals, not groups. To recommend that one person should not receive a service because the chances of benefit from the service are only 30% instead of the 50% that would be predicted without a test could be regarded as a rather bold decision for a clinician to make about a person in need of help. Hunter and Schmidt (1990) have developed very useful approaches to validity generalization that usually result in estimates of test validity well above the correlations reported in actual use, but their estimates apply at the level of theory, construct validity, rather than at the level of specific application as in clinical settings.

A recommendation to improve the clinical uses of tests can actually be made: test for more things. Think of the determinants of performance in school, say college, as an example. College grades depend on motivation, persistence, physical health, mental health, study habits, and so on. If clinical psychologists are serious about predicting performance in college, then they probably will need to measure several quite different constructs and then combine all those measures into a prediction equation. The measurement task may seem onerous, but it is worth remembering Cronbach's (1960) bandwidth vs. fidelity argument: it is often better to measure more things less well than to measure one thing extraordinarily well. A lot of measurement could be squeezed into the times usually allotted to low bandwidth tests. The genius of the profession will come in the determination of what to measure and how to measure it. The combination of all the information, however, is likely best to be done by a statistical algorithm for reasons that we will show later.

We are not negative toward psychological testing, but we think it is a lot more difficult and complicated than it is generally taken to be in practice. An illustrative case is provided by the differential diagnosis of attention deficit hyperactivity disorder (ADHD). There might be an ADHD scale somewhere but a more responsible clinical study would recognize that the diagnosis can be difficult, and that the validity and certainty of the diagnosis of ADHD is greatly improved by using multiple measures and multiple reporting agents across multiple contexts. For example, one authority recommended beginning with an initial screening interview, in which the possibility of an ADHD diagnosis is ruled in, followed by an extensive assessment battery addressing multiple domains and usually including (depending upon age): a Wechsler Intelligence Scale for Children (WISC-III; McCraken & McCallum, 1993), a behavior checklist (e.g., Youth Self-Report [YSR]; Achenbach & Edelbrock, 1987), an academic achievement battery (e.g., Kaufmann Assessment Battery for Children; Kaufmann & Kaufmann, 1985), a personality inventory (e.g., Millon Adolescent Personality Inventory [MAPI]; Millon & Davis, 1993), a computerized sustained attention and distractibility test (Gordon Diagnostic System [GDS]; McClure & Gordon, 1984), and a semistructured or a structured clinical interview (e.g., Diagnostic Interview Schedule for Children [DISC]; Costello, Edelbrock, Kalas, Kessler, & Klaric, 1982).

The results from the diagnostic assessment may be used to further rule in or rule out ADHD as a diagnosis, in conjunction with child behavior checklists (e.g., CBCL, Achenbach & Edelbrock, 1983; Teacher Rating Scales, Goyette, Conners, & Ulrich, 1978), completed by the parent(s) and teacher, and additional
school performance information. The parent and teacher complete both a historical list and then a daily behavior checklist for a period of two weeks in order to adequately sample behaviors. The information from home and school domains may be collected concurrently with evaluation of the diagnostic assessment battery, or the battery may be used initially to continue to rule in the diagnosis as a possibility, and then proceed with collateral data collection.

We are impressed with the recommended ADHD diagnostic process, but we do recognize that it would involve a very extensive clinical process that would probably not be reimbursable under most health insurance plans. We would also note, however, that the overall diagnostic approach is not based on any decision-theoretic approach that might guide the choice of instruments corresponding to a process of decision making. Or alternatively, the process is not guided by any algorithm for combining information so as to produce a decision. Our belief is that assessment in clinical psychology needs the same sort of attention and systematic study as is occurring in medical areas through such organizations as the Society for Medical Decision Making.

In summary, we think the above scenario, or similar procedures using similar instruments (e.g., Atkins, Pelham, & White, 1990; Hoza, Vollano, & Pelham, 1995), represent an exemplar of assessment practice. It should be noted, however, that the development of such multimodal batteries is an iterative process. One will soon reach the point of diminishing returns in the development of such batteries, and the incremental validity (Sechrest, 1963) of instruments should be assessed. ADHD is an example in which the important domains of functioning are understood, and thus can be assessed. We know of no examples other than ADHD of such systematic approaches to assessment for decision making. Although approaches such as described here and by Pelham and his colleagues appear to be far from standard practice in the diagnosis of ADHD, we think they ought to be.

The outlined procedure is modeled after a procedure developed by Gerald Peterson, Ph.D., Institute for Motivational Development, Bellevue, WA.

4.01.2 WHY ARE ASSESSMENTS DONE?

Why do we "test" in the first place? It is worth thinking about all the instances in which we do not test. For example, we usually do not test our own children, nor our spouses. That is because we have ample opportunities to observe the "performances" in which we are interested. That may be one reason that psychotherapists are disinclined to test their own clients: they have many opportunities to observe the behaviors in which they are interested, that is, if not the actual behaviors then reasonably good indicators of them. As we see it, testing is done primarily for one or more of three reasons: efficiency of observation, revealing cryptic conditions, and quantitative tagging.

Testing may provide for more efficient observation than most alternatives. For example, "tailing" a person, that method so dear to detective story writers, would prove definitive for many dispositions, but it would be expensive and often impractical or even unethical (Webb, Campbell, Schwartz, Sechrest, & Grove, 1981). It seems unlikely that any teacher would not have quite a good idea of the intelligence and personality of any of her pupils after at most a few weeks of a school year, but appropriate tests might provide useful information from the very first day. Probably clinicians involved in treating patients do not anticipate much gain in useful information after having held a few sessions with a patient. In fact, they may not anticipate much gain under most circumstances, which could account for the apparent infrequent use of assessment procedures in connection with psychological treatment.

Testing is also done in order to uncover "cryptic" conditions, that is, characteristics that are hidden from view or otherwise difficult to discern. In medicine, for example, a great many conditions are cryptic, blood pressure being one example. It can be made visible only by some device. Cryptic conditions have always been of great interest in clinical psychology, although their importance may have been exaggerated considerably. The Rorschach, a prime example of a putative decrypter, was hailed upon its introduction as "providing a window on the mind," and it was widely assumed that in skillful hands the Rorschach would make visible a wide range of hidden dispositions, even those unknown to the respondent (i.e., in "the unconscious"). Similarly, the Thematic Apperception Test was said to "expose underlying inhibited tendencies" of which the subject is unaware and to permit the subject to leave the test "happily unaware that he has presented the psychologist with what amounts to an X-ray picture of his inner self" (Murray, 1943, p. 1).

Finally, testing may be done, is often done, in order to provide a quantitative "tag" for some disposition or other characteristic. In foot races, to take a mundane example, no necessity exists to time the races; it is sufficient to determine simply the order of the finish.
Nonetheless, races are timed so that each one may be quantitatively tagged for sorting and other uses, for example, making comparisons between races. Similarly, there is scarcely ever any need for more than a crude indicator of a child's intelligence, for example, "well above average," such as a teacher might provide. Nonetheless, the urge to seemingly precise quantification is strong, even if the precision is specious, and tests are used regularly to provide such estimates as "at the 78th percentile in aggression" or "IQ = 118." Although quantitative tags are used, and may be necessary, for some decision-making, for example, the awarding of scholarships based on SAT scores, it is to be doubted that such tags are ever of much use in clinical settings.

4.01.2.1 Bounded vs. Unbounded Inference and Prediction

Bounded prediction is the use of a test or measure to make some limited inference or prediction about an individual, couple, or family, a prediction that might be limited in time, situation, or range of behavior (Levy, 1963; Sechrest, 1968). Some familiar examples of bounded prediction are that of a college student's grade point average based on their SAT score, assessing the likely response of an individual to psychotherapy for depression based on MMPI scores and a SCID interview, or prognosticating outcome for a couple in marital therapy given their history. These predictions are bounded because they are using particular measures to predict a specified outcome in a given context. Limits to bounded predictions are primarily based on knowledge of two areas. First, the reliability of the information, that is, interview or test, for the population from which the individual is drawn. Second, and most important, these predictions are based on the relationship between the predictor and the outcome. That is to say, they are limited by the validity of the predictor for the particular context in question.

Unbounded inference or prediction, which is common in clinical practice, is the practice of making general assessment of an individual's tendencies, dispositions, and behavior, and inferring prognosis for situations that may not have been specified at the time of assessment. These are general statements made about individuals, couples, and families based on interviews, diagnostic tests, response to projective stimuli, and so forth that indicate how these people are likely to behave across situations. Some unbounded predictions are simply descriptive statements, for example, with respect to personality, from which at some future time the clinician or another person might make an inference about a behavior not even imagined at the time of the original assessment. A clinician might be asked to apply previously obtained assessment information to an individual's ability to work, ability as a parent, likelihood of behaving violently, or even the probability that an individual might have behaved in some way in the past (e.g., abused a spouse or child). Thus, they are unbounded in context. Since reliability and validity require context, that is, a measure is reliable in particular circumstances, one cannot readily estimate the reliability and validity of a measure for unspecified circumstances.

To the extent that the same measures are used repeatedly to make the same type of prediction or judgment about individuals, the more the prediction becomes of a bounded nature. Thus, an initially unbounded prediction becomes bounded by the consistency of circumstances of repeated use. Under these circumstances, reliability, utility, and validity can be assessed in a standard manner (Sechrest, 1968). Without empirical data, unbounded predictions rest solely upon the judgment of the clinician, which has proven problematic (see Dawes, Faust, & Meehl, 1989; Grove & Meehl, 1996; Meehl, 1954). Again, the contrast with medical testing is instructive. In medicine, tests are generally associated with gathering additional information about specific problems or systems. Although one might have a "wellness" visit to detect level of functioning and signs of potential problems, it would be scandalous to have a battery of medical tests to "see how your health might be" under an unspecified set of circumstances. Medical tests are bounded. They are for specific purposes at specific times.

4.01.2.2 Prevalence and Incidence of Assessment

It is interesting to speculate about how much assessment is actually done in clinical psychology today. It is equally interesting to realize how little is known about how much assessment is done in clinical psychology today. What little is known has to do with "incidence" of assessment, and that only from the standpoint of the clinician and only in summary form. Clinical psychologists report that a modest amount of their time is taken up by assessment activities. The American Psychological Association's (APA's) Committee for the Advancement of Professional Practice (1996) conducted a survey in 1995 of licensed APA members. With a response rate of 33.8%, the survey suggested that psychologists spend about 14% of their time conducting assessments, roughly six or seven hours per week. The low response rate, which ought to be considered disgraceful in a
profession that claims to survive by science, is indicative of the difficulties involved in getting useful information about the practice of psychology in almost any area. The response rate was described as "excellent" in the report of the survey. Other estimates converge on about the same proportion of time devoted to assessment (Wade & Baker, 1977; Watkins, 1991; Watkins, Campbell, Nieberding, & Hallmark, 1995). Using data across a sizable number of surveys over a considerable period of time, Watkins (1991) concludes that about 50–75% of clinical psychologists provide at least some assessment services. We will say more later about the relative frequency of use of specific assessment procedures, but Watkins et al. (1995) did not find much difference in relative use across seven diverse work settings.

Think about what appears not to be known: the number of psychologists who do assessments in any period of time; the number of assessments that psychologists who do them actually do; the number or proportion of assessments that use particular assessment devices; the proportion of patients who are subjected to assessments; the problems for which assessments are done. And that does not exhaust the possible questions that might be asked. If, however, we take seriously the estimate that psychologists spend six or seven hours per week on assessment, then it is unlikely that those psychologists who do assessments could manage more than one or two per week; hence, only a very small minority of patients being seen by psychologists could be undergoing assessment. Wade and Baker (1977) found that psychologists claimed to be doing an average of about six objective tests and three projective tests per week, and that about a third of their clients were given at least one or the other of the tests, some maybe both. Those estimates do not make much sense in light of the overall estimate of only 15% of time (6–8 hours) spent in testing.

It is almost certain that those assessment activities in which psychologists do engage are carried out on persons who are referred by some other professional person or agency specifically for assessment. What evidence exists indicates that very little assessment is carried out by clinical psychologists on their own clients, either for diagnosis or for planning of treatment. Nor is there any likelihood that clinical psychologists refer their own clients to some other clinician for assessment. Some years ago, one of us (L. S.) began a study, never completed, of referrals made by clinical psychologists to other mental health professionals. The study was never completed in part because referrals were, apparently, very infrequent, mostly having to do with troublesome patients. A total of about 40 clinicians were queried, and in no instance did any of those clinical psychologists refer any client for psychological assessment.

Thus, we conclude that only a small minority of clients or patients of psychologists are subjected to any formal assessment procedures, a conclusion supported by Wade and Baker (1977) who found that relatively few clinicians appear to use standard methods of administration and scoring. Despite Wade and Baker's findings, it also seems likely that clinical psychologists do very little assessment on their own clients. Most assessments are almost certainly on referral. Now contrast that state of affairs with the practice of medicine: assessment is at the heart of medical practice. Scarcely a medical patient ever gets any substantial treatment without at least some assessment. Merely walking into a medical clinic virtually guarantees that body temperature and blood pressure will be measured. Any indication of a problem that is not completely obvious will result in further medical tests, including referral of patients from the primary care physician to other specialists.

The available evidence also suggests that psychologists do very little in the way of formal assessment of clients prior to therapy or other forms of intervention. For example, books on psychological assessment even in clinical psychology may not even mention psychotherapy or other interventions (e.g., see Maloney & Ward, 1976), and the venerated and authoritative Handbook of psychotherapy and behavior change (Bergen & Garfield, 1994) does not deal with assessment except in relation to diagnosis and the prediction of response to therapy and to determining the outcomes of therapy, that is, there is no mention of assessment for planning therapy at any stage in the process. That is, we think, anomalous, especially when one contemplates the assessment activities of other professions. It is almost impossible even to get to speak to a physician without at least having one's temperature and blood pressure measured, and once in the hands of a physician, almost all patients are likely to undergo further explicit assessment procedures, for example, auscultation of the lungs, heart, and carotid arteries. Unless the problem is completely obvious, patients are likely to undergo blood or other body-fluid tests, imaging procedures, assessments of functioning, and so on. The same contrast could be made for chiropractors, speech and hearing specialists, optometrists, and, probably, nearly all other clinical specialists. Clinical psychology appears to have no standard procedures, not much interest in them, and no instruments for carrying them out in any case. Why is that?
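
The survey figures quoted earlier in this section can be checked with a little arithmetic. The sketch below is only an illustration: the inputs are the numbers reported above (about 14% of time, six to eight hours per week, and roughly six objective plus three projective tests per week), while the function names and the idea of dividing hours evenly across tests are assumptions made for the example, not part of any cited survey.

```python
# Rough consistency check of the survey figures quoted in this section.
# Function names are hypothetical; the inputs are the chapter's own numbers.

def implied_work_week(assessment_hours: float, share_of_time: float) -> float:
    """Total weekly hours implied by assessment hours and their share of work time."""
    return assessment_hours / share_of_time


def minutes_per_test(assessment_hours: float, tests_per_week: float) -> float:
    """Average minutes available per test if hours are split evenly (an assumption)."""
    return assessment_hours * 60 / tests_per_week


if __name__ == "__main__":
    # APA survey: about 14% of time, roughly six or seven hours per week.
    for hours in (6, 7):
        total = implied_work_week(hours, 0.14)
        print(f"{hours} h at 14% of time implies a work week of about {total:.1f} h")

    # Wade and Baker (1977): about six objective and three projective tests per week.
    tests = 6 + 3
    for hours in (6, 8):
        per_test = minutes_per_test(hours, tests)
        print(f"{hours} h spread over {tests} tests leaves about {per_test:.0f} min each")
```

Under these figures each test would get well under an hour for administration, scoring, and reporting combined, which may be part of why the two sets of estimates are hard to reconcile.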

One reason, we suspect, is that clinical psychology has never shown much interest in normal functioning and, consequently, does not have very good capacity to identify normal responses or functioning. A competent specialist in internal medicine can usefully palpate a patient's liver, an organ he or she cannot see, because that specialist has been taught what a normal liver should feel like and what its dimensions should (approximately) be. A physician knows what normal respiratory sounds are. An optometrist certainly knows what constitutes normal vision and a normal eye. Presumably, a chiropractor knows a normal spine when he or she sees one.

Clinical psychology has no measures equivalent to body temperature and blood pressure, that is, quick, inexpensive screeners (vital signs) that can yield "normal" as a conclusion just as well as "abnormal." Moreover, clinical psychologists appear to have a substantial bias toward detection of psychopathology. The consequence is that clinical psychological assessment is not likely to provide a basis for a conclusion that a given person is "normal," and that no intervention is required. Obviously, the case is different for "intelligence," for which the conclusion of "average" or some such is quite common.

By their nature, psychological tests are not likely to offer many surprises. A medical test may reveal a completely unexpected condition of considerable clinical importance, for example, even in a person merely being subjected to a routine examination. Most persons who come to the attention of psychologists and other mental health professionals are there because their behavior has already betrayed important anomalies, either to themselves or to others. A clinical psychologist would be quite unlikely to administer an intelligence test to a successful business man and discover, completely unexpectedly, that the man was really "stupid." Tests are likely to be used only for further exploration or verification of problems already evident. If they are already evident, then the clinician managing the case may not see any particular need for further assessment.

A related reason that clinical psychologists appear to show so little inclination to do assessment of their own patients probably has to do with the countering inclination of clinical psychologists, and other similarly placed clinicians, to arrive at early judgments of patients based on initial impressions. Meehl (1960) noted that phenomenon many years ago, and it likely has not changed. Under those circumstances, testing of clients would have very little incremental value (Sechrest, 1963) and would seem unnecessary. At this point, it may be worth repeating that apparently no information is available on the specific questions for which psychologists make assessments when they do so.

Finally, we do believe that current limitations on practice imposed by managed care organizations are likely to limit even further the use of assessment procedures by psychologists. Pressures are toward very brief interventions, and that probably means even briefer assessments.

4.01.2.3 Proliferation of Assessment Devices

Clinical psychology has experienced an enormous proliferation of tests since the 1960s. We are referring here to commercially published tests, available for sale and for use in relation to clinical problems. For example, inspection of four current test catalogs indicates that there are at least a dozen different tests (scales, inventories, checklists, etc.) related to attention deficit disorder (ADD) alone, including forms of ADD that may not even exist, for example, adult ADD. One of the test catalogs is 100 pages, two are 176 pages, and the fourth is an enormous 276 pages. Even allowing for the fact that some catalog pages are taken up with advertisements for books and other such, the amount of test material available is astonishing. These are only four of perhaps a dozen or so catalogs we have in our files.

In the mid-1930s Buros published the first listings of psychological tests to help guide users in a variety of fields in choosing an appropriate assessment instrument. These early uncritical listings of tests developed into the Mental Measurements Yearbook and by 1937 the listings had expanded to include published test reviews. The Yearbook, which includes tests and reviews of new and revised tests published for commercial use, has continued to grow and is now in its 12th edition (1995). The most recent edition reviewed 418 tests available for use in education, psychology, business, and psychiatry. Buros Mental Measurements Yearbook is a valuable resource for testers, but it also charts the growth of assessment instruments. In addition to instruments published for commercial use, there are scores of other tests developed yearly for noncommercial use that are never reviewed by Buros. Currently, there are thousands of assessment instruments available for researchers and practitioners to choose from.

The burgeoning growth in the number of tests has been accompanied by increasing commercialization as well. The monthly Monitor published by the APA is replete with ads for test instruments for a wide spectrum of purposes. Likewise, APA conference attendees are inundated with preconference mailings advertising tests and detailing the location of
8 The Role of Assessment in Clinical Psychology

the test publisher's booth at the conference site. Once at the conference, attendees are often struck by the slick presentation of the booths and hawking of the tests. Catalogs put out by test publishers are now also slick, in more ways than one. They are printed in color on coated paper and include a lot of messages about how convenient and useful the tests are with almost no information at all about reliability and validity beyond assurances that one can count on them.

The proliferation of assessment instruments and commercial development are not inherently detrimental to the field of clinical psychology. They simply make it more difficult to choose an appropriate test that is psychometrically sound, as glib ads can be used as a substitute for the presentation of sound psychometric properties and critical reviews. This is further complicated by the availability of computer scoring and software that can generate assessment reports. The ease of computer-based applications such as these can lead to their uncritical application by clinicians. Intense marketing of tests may contribute to their misuse, for example, by persuading clinical psychologists that the tests are remarkably simple and by convincing those same psychologists that they know more than they actually do about tests and their appropriate uses.

Multiple tests, even several tests for every construct, might not necessarily be a bad idea in and of itself, but we believe that the resources in psychology are simply not sufficient to support the proper development of so many tests. Few of the many tests available can possibly be used on more than a very few thousand cases per year, and perhaps not even that. The consequence is that profit margins are not sufficient to support really adequate test development programs. Tests are put on the market and remain there with small normative samples, with limited evidence for validity, which is much more expensive to produce than evidence for reliability, and with almost no prospect for systematic exploration of the other psychometric properties of the items, for example, discrimination functions or tests of their calibration (Sechrest, McKnight, & McKnight, 1996).

One of us (L. S.) happens to have been a close spectator of the development of the SF-36, a now firmly established and highly valued measure of health and functional status (Ware & Sherbourne, 1992). The SF-36 took 15–20 years for its development, having begun as an item pool of more than 300 items. Over the years literally millions of dollars were invested in the development of the test, and it was subjected, often repeatedly, to the most sophisticated psychometric analyses and to detailed scrutiny of every individual item. The SF-36 has now been translated into at least 37 languages and is being used in an extraordinarily wide variety of research projects. More important, however, the SF-36 is also being employed routinely in evaluating outcomes of clinical medical care. Plans are well advanced for use of the SF-36 that will result in its administration to 300 000 patients in managed care every year. It is possible that over the years the Wechsler intelligence tests might have a comparable history of development, and the Minnesota Multiphasic Personality Inventory (MMPI) has been the focus of a great many investigations, as has the Rorschach. Neither of the latter, however, has been the object of systematic development efforts funded centrally, and scarcely any of the many other tests now available are likely to be subjected to anything like the same level of development effort (e.g., consider that in its more than 70-year history, the Rorschach has never been subjected to any sort of revision of its original items).

Several factors undoubtedly contribute to the proliferation of psychological tests (not the least, we suspect, being their eponymous designation and the resultant claim to fame), but surely one of the most important would be the fragmentation of psychological theory, or what passes for theory. In 1995 a taskforce was assembled under the auspices of the APA to try to devise a uniform test (core) battery that would be used in all psychotherapy research studies (Strupp, Horowitz, & Lambert, 1997). The effort failed, in large part because of the many points of view that seemingly had to be represented and the inability of the conferees to agree even on any outcomes that should be common to all therapies. Again, the contrast with medicine and the nearly uniform acceptance of the SF-36 is stark.

Another reason for the proliferation of tests in psychology is, unquestionably, the seeming ease with which they may be "constructed." Almost anyone with a reasonable "construct" can write eight or 10 self-report items to "measure" it, and most likely the new little scale will have "acceptable" reliability. A correlation or two with some other measure will establish its "construct validity," and the rest will eventually be history. All that is required to establish a new projective test, it seems, is to find a set of stimuli that have not, according to the published literature, been used before and then show that responses to the stimuli are suitably strange, perhaps stranger for some folks than others. For example, Sharkey and Ritzler (1985) noted a new Picture Projective Test that was created by using photographs from a photo essay. The pictures
were apparently selected based on the authors' opinions about their ability to elicit "meaningful projective material," meaning responses with affective content and activity themes. No information was given pertaining to comparison of various pictures and their responses nor relationships to other measures of the target constructs; no comparisons were made to pictures that were deemed inappropriate. The "validation" procedure simply compared diagnoses to those in charts and results of the TAT. Although rater agreement was assessed, there was no formal measurement of reliability.

New tests are cheap, it seems. One concern is that so many new tests appear also to imply new constructs, and one wonders whether clinical psychology can support anywhere near as many constructs as are implied by the existence of so many measures of them. Craik (1986) made the eminently sensible suggestion that every "new" or infrequently used measure used in a research project should be accompanied by at least one well-known and widely used measure from the same or a closely related domain. New measures should be admitted only if it is clear that they measure something of interest and are not redundant, that is, have discriminant validity. That recommendation would likely have the effect of reducing the array of measures in clinical psychology by remarkable degrees if it were followed.

The number of tests that are taught in graduate school for clinical psychology is far lower than the number available for use. The standard stock-in-trade are IQ tests such as the Wechsler Adult Intelligence Scale (WAIS), personality profiles such as the MMPI, diagnostic instruments (Structured Clinical Interview for DSM-III-R [SCID]), and at some schools, the Rorschach as a projective test. This list is rounded out by a smattering of other tests like the Beck Depression Inventory and Millon. Recent standard application forms for clinical internships developed by the Association of Psychology Postdoctoral and Internship Centers (APPIC) asked applicants to report on their experience with 47 different tests and procedures used for adult assessment and 78 additional tests used with children! It is very doubtful that training programs actually provide training in more than a handful of the possible devices.

Training in testing (assessment) is not at all the same as training in measurement and psychometrics. Understanding how to administer a test is useful but cannot substitute for evaluating the psychometric soundness of tests. Without grounding in such principles, it is easy to fall prey to glib ads and ease of computer administration without questioning the quality of the test. Psychology programs appear, unfortunately, to be abandoning training in basic measurement and its theory (Aiken, West, Sechrest, & Reno, 1990).

4.01.2.4 Over-reliance on Self-report

"Where does it hurt?" is a question often heard in physicians' offices. The physician is asking the patient to self-report on the subjective experience of pain. Depending on the answer, the physician may prescribe some remedy, or may order tests to examine the pain more thoroughly and obtain objective evidence about the nature of the affliction before pursuing a course of treatment. The analog heard in psychologists' offices is "How do you feel?" Again, the inquiry calls forth self-report on a subjective experience and like the physician, the psychologist may determine that tests are in order to better understand what is happening with the client.

When the medical patient goes for testing, she or he is likely to be poked, prodded, or pricked so that blood samples and X-rays can be taken. The therapy client, in contrast, will most likely be responding to a series of questions in an interview or answering a pencil-and-paper questionnaire. The basic difference between these is that the client in clinical psychology will continue to use self-report in providing a sample, whereas the medical patient will provide objective evidence.

Despite the proliferation of tests in recent years, few rely on evidence other than the client's self-report for assessing behavior, symptoms, or mood state. Often assessment reports remark that the information gleaned from testing was corroborated by interview data, or vice versa, without recognizing that both rely on self-report alone. The problems with self-report are well documented: poor recall of past events, motivational differences in responding, social desirability bias, and malingering, for example. Over-reliance on self-report is a major criticism of psychological assessment as it is currently conducted and was the topic of a recent conference sponsored by the National Institute of Mental Health.

What alternatives are there to self-report? Methods of obtaining data on a client's behavior that do not rely on self-report do exist. Behavioral observation with rating by judges can permit the assessment of behavior, often without the client's awareness or outside the confines of an office setting. Use of other informants such as family members or co-workers to provide data can yield valuable information about a client. Yet, all too often these alternatives are not pursued because they involve time or resources; in short, they are
demanding approaches. Compared with asking a client about his or her mood state over the last week, organizing field work or contacting informants involves a great deal more work and time.

Instruments are available to facilitate collection of data not relying so strongly on self-report and for collection of data outside the office setting, for example, the Child Behavior Checklist (CBCL; Achenbach & Edelbrock, 1983). The CBCL is meant to assist in diagnosing a range of psychological and behavior problems in children, and it relies on parent, teacher, and self-reports of behavior. Likewise, neuropsychological tests utilize functional performance measures much more than self-report. However, as Craik (1986) noted with respect to personality research, methods such as field studies are not widely used as alternatives to self-report. This problem of over-reliance on self-report is not new (see Webb, Campbell, Schwartz, & Sechrest, 1966).

4.01.3 PSYCHOMETRIC ISSUES WITH RESPECT TO CURRENT MEASURES

Consideration of the history and current status of clinical assessment must deal with some fundamental psychometric issues and practices. Although psychometrics is usually taken to refer to reliability and validity of measures, matters are much more complicated than that, particularly in light of developments in psychometric theory and method since the 1960s, which seem scarcely to have penetrated clinical assessment as an area. Specifically, generalizability theory and Item Response Theory (IRT) offer powerful tools with which to explore and develop clinical assessment procedures, but they have seen scant use in that respect.

4.01.3.1 Reliability

The need for "reliable" measures is by now well accepted in all of psychology, including clinical assessment. What is not so widespread is the necessary understanding of what constitutes reliability and the various uses of that term. In their now classic presentation of generalizability theory, Cronbach and his associates (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) used the term "dependability" in a way that is close to what is meant by reliability, but they made especially clear, as classical test theory had not, that measures are dependable (generalizable) in very specific ways, that is, that they are dependable across some particular conditions of use (facets), and assessments of dependability are not at all interchangeable. For example, a given assessment may be highly dependable across particular items but not necessarily across time. An example might be a measure of mood, which ought to have high internal consistency (i.e., across items) but that might not, in fact, should not, have high dependability over time, else the measure would be better seen as a trait rather than as a mood measure.

An assessment procedure might be highly dependable in terms of internal consistency and across time but not satisfactorily dependable across users, for example, being susceptible to a variety of biases characteristic of individual clinicians. Or an assessment procedure might not be adequately dependable across conditions of its use, as might be the case when a measure is taken from a research to a clinical setting. Or an assessment procedure might not be dependable across populations, for example, a projective instrument useful with mental patients might be misleading if used with imaginative and playful college students.

Issues of dependability are starkly critical when one notes the regrettably common practice of justifying the use of a measure on the ground that it is "reliable," often without even minimal specification of the facet(s) across which that reliability was established. The practice is even more regrettable when, as is often the case, only a single value for reliability is given when many are available and when one suspects that the figure reported was not chosen randomly from those available. Moreover, it is all too frequently the case that the reliability estimate reported is not directly relevant to the decisions to be made. Internal consistency, for example, may not be as important as generalizability over time when one is using a screening instrument. That is, if one is screening in a population for psychopathology, it may not be of great interest that two persons with the same scores are different in terms of their manifestations of pathology, but it is of great interest whether, if one retested them a day or so later, the scores would be roughly consistent.

In short, clinical assessment in psychology is unfortunately casual in its use of reliability estimates, and it is shamefully behind the curve in its attention to the advantages provided by generalizability theory, originally proposed in 1963 (Cronbach, Rajaratnam, & Gleser, 1963).

4.01.3.2 Validity

It is customary to treat validity of measures as a topic separate from reliability, but we think that is not only unnecessary but undesirable. In our view, the validity of measures is simply an extension of generalizability theory to the question of what other performances aside from
those involved in the test is the score generalizable. A test score that is generalizable to another very similar performance, say on the same set of test items or over a short period of time, is said to be reliable. A test score that is generalizable to a score on another similar test is sometimes said to be "valid," but we think that a little reflection will show that unless the tests demand very different kinds of performances, generalizability from one test to another is not much beyond the issues usually regarded as having to do with reliability. When, however, a test produces a score that is informative about another very different kind of performance, we gradually move over into the realm termed validity, such as when a paper-and-pencil test of "readiness for change" (Prochaska, DiClemente, & Norcross, 1992) predicts whether a client will benefit from treatment or even just stay in treatment.

We will say more later about construct validity, but a test or other assessment procedure may be said to have construct validity if it produces generalizable information and if that information relates to performances that are conceptually similar to those implied by the name or label given to the test. Essentially, however, any measure that does not produce scores by some random process is by that definition generalizable to some other performance and, hence, to that extent may be said to be valid. What a given measure is valid for, that is, generalizable to, however, is a matter of discovery as much as of plan. All instruments used in clinical assessment should be subjected to comprehensive and continuing investigation in order to determine the sources of variance in scores. An instrument that has good generalizability over time and across raters may turn out to be, among other things, a very good measure of some response style or other bias. The MMPI includes a number of "validity" scales designed to assess various biases in performance on it, and it has been subjected to many investigations of bias. The same cannot be said of some other widely used clinical assessment instruments and procedures. To take the most notable example, of the more than 1000 articles on the Rorschach that are in the current PsycINFO database, only a handful, about 1%, appear to deal with issues of response bias, and virtually all of those are on malingering and most of them are unpublished dissertations.

4.01.3.3 Item Response Theory

Although Item Response Theory (IRT) is a potentially powerful tool for the development and study of measures of many kinds, its use to date has not been extensive beyond the area of ability testing. The origins of IRT go back at least to the early 1950s and the publication of Lord's (1952) monograph, A theory of test scores, but it has had little impact on measurement outside the arena of ability testing (Meier, 1994). Certainly it has had almost no impact on clinical assessment. The current PsycINFO database includes only two references to IRT in relation to the MMPI and only one to the Rorschach, and the latter one, now 10 years old, is an entirely speculative mention of a potential application of IRT (Samejima, 1988).

IRT, perhaps to some extent narrowly imagined to be relevant only to test construction, can be of great value in exploring the nature of measures and improving their interpretation. For example, IRT can be useful in understanding just when scores may be interpreted as unidimensional and then in determining the size of gaps in underlying traits represented by adjacent scores. An example could be the interpretation of Whole responses on the Rorschach. Is the W score a unidimensional score, and, if so, is each increment in that score to be interpreted as an equal increment? Some cards are almost certainly more difficult stimuli to which to produce a W response, and IRT could calibrate that aspect of the cards. IRT would be even more easily used for standard paper-and-pencil inventory measures, but the total number of applications to date is small, and one can only conclude that clinical assessment is being short-changed in its development.

4.01.3.4 Scores on Tests

Lord's (1952) monograph was aimed at tests with identifiable underlying dimensions such as ability. Clinical assessment appears never to have had any theory of scores on instruments included under that rubric. That is, there seems never to have been proposed or adapted any unifying theory about how test scores on clinical instruments come about. Rather there seems to have been a passive, but not at all systematic, adoption of general test theory, that is, the idea that test scores are in some manner generated by responses representing some underlying trait. That casual approach cannot forward the development of the field.

Fiske (1971) has come about as close as anyone to formulating a theory of test scores for clinical assessment, although his ideas pertain more to how such tests are scored than to how they come about, and his presentation was directed toward personality measurement rather than clinical assessment. He suggested several models for scoring test, or otherwise observed, responses. The simplest model is what we may call the cumulative frequency model,
which simply increments the score by 1 for every observed response. This is the model that underlies many Rorschach indices. It assumes that every response is equivalent to every other one, and it ignores the total number of opportunities for observation. Thus, each Rorschach W response counts as 1 for that index, and the index is not adjusted to take account of the total number of responses. A second model is the relative frequency model, which forms an index by dividing the number of observed critical responses by some indicator of opportunities to form a rate of responding, for example, as would be accomplished by counting W responses and dividing by the total number of responses or by counting W responses only for the first response to each card. Most paper-and-pencil inventories are scored implicitly in that way, that is, they count the number of critical responses in relation to the total number possible.

A long story must be made short here, but Fiske describes other models, and still more are possible. One may weight responses according to the inverse of their frequency in a population on the grounds that common responses should count for less than rare responses. Or one may weight responses according to the judgments of experts. One can assign the average weight across a set of responses, a common practice, but one can also assign as the score the weight of the most extreme response, for example, as runners are often rated on the basis of their fastest time for any given distance. Pathology is often scored in that way, for example, a pathognomonic response may outweigh many mundane, ordinary responses.

The point is that clinical assessment instruments and procedures only infrequently have any explicit basis in a theory of responses. For the most part, scores appear to be derived in some standard way without much thought having been given to the process. It is not clear how much improvement in measures might be achieved by more attention to the development of a theory of scores, but it surely could not hurt to do so.

4.01.3.5 Calibration of Measures

A critical limitation on the utility of psychological measures of any kind, but certainly in their clinical application, is the fact that the measures do not produce scores in any directly interpretable metric. We refer to this as the calibration problem (Sechrest, McKnight, & McKnight, 1996). The fact is that we have only a very general knowledge of how test scores may be related to any behavior of real interest. We may know in general that a score of 70, let us say, on an MMPI scale is "high," but we do not know very well what might be expected in the behavior of a person with such a score. We would know even less about what difference it might make if the score were reduced to 60 or increased to 80 except that in one case we might expect some diminution in problems and in the other some increase. In part the lack of calibration of measures in clinical psychology stems from lack of any specific interest and diligence in accomplishing the task. Clinical psychology has been satisfied with "loose calibration," and that stems in part, as we will assert later, from adoption of the uninformative model of significance testing as a standard for validation of measures.

4.01.4 WHY HAVE WE MADE SO LITTLE PROGRESS?

It is difficult to be persuaded that progress in assessment in clinical psychology has been substantial in the past 75 years, that is, since the introduction of the Rorschach. Several arguments may be adduced in support of that statement, even though we recognize that it will be met with protests. We will summarize what we think are telling arguments in terms of theory, formats, and validities of tests.

First, we do not discern any particular improvements in theories of clinical testing and assessments over the past 75 years. The Rorschach, and the subsequent formulation of the projective hypothesis, may be regarded as having been to some extent innovations; they are virtually the last ones in the modern history of assessment. As noted, clinical assessment lags well behind the field in terms of any theory of either the stimuli or responses with which it deals, let alone the connections between them. No theory of assessment exists that would guide selection of stimuli to be presented to subjects, and certainly none pertains to the specific format of the stimuli nor to the nature of the responses required. Just to point to two simple examples of the deficiency in understanding of response options, we note that there is no theory to suggest whether in the case of a projective test responses should be followed by any sort of inquiry about their origins, and there is no theory to suggest in the case of self-report inventories whether items should be formulated so as to produce endorsements of the "this is true of me" nature or so as to produce descriptions such as "this is what I do."

Given the lack of any gains in theory about the assessment enterprise, it is not surprising that there have also not been any changes in test formats since the introduction of the Rorschach.
Projective tests based on the same simple (and The main "advance" in assessment over the
inadequate) hypothesis are still being devised, past 75 years is not that we do anything really
but not one has proven itself in any way better better but that we do it much more widely. We
than anything that has come before. Item writers have many more scales than existed in the past,
may be a bit more sophisticated than those in the and we can at least assess more things than ever
days of the Bernreuter, but items are still before, even if we can do that assessment only,
constructed in the same way, and response at best, passably well.
formats are the same as ever, "agree–disagree," Woodworth (1937/1992) wrote in his article
"true–false," and so on. on the future of clinical psychology that,
Even worse, however, is the fact that "There can be no doubt that it will advance,
absolutely no evidence exists to suggest that and in its advance throw into the discard
there have been any mean gains in the validities much guesswork and half-knowledge that now
of tests over the past 75 years. Even for tests of finds baleful application in the treatment of
intellectual functioning, typical correlations children, adolescents and adults" (p. 16). It
with any external criterion appear to average appears to us that the opposite has occurred.
around 0.40, and for clinical and personality Not only have we failed to discard guesswork
tests the typical correlations are still in the range and half-knowledge, that is, tests and treat-
of 0.30, the so-called ªpersonality coefficient.º ments with years of research indicating little
This latter point, that validities have remained effect or utility, we have continued to generate
constant, may, of course, be related to the lack procedures based on the same flawed assump-
of development of theory and to the fact that the tions with the misguided notion that if we just
same test formats are still in place. make a bit of a change here and there, we will
Perhaps some psychologists may take excep- finally get it right. Projective assessments that
tion to the foregoing and cite considerable tell us, for example, that a patient is psychotic
advances. Such claims are made for the Exner are of little value. Psychologists have more
(1986) improvements on the Rorschach, known reliable and less expensive ways of determining
as the ªcomprehensive system,º and for the this. More direct methods have higher validity
MMPI-2, but although both claims are super- in the majority of cases. The widespread use
ficially true, there is absolutely no evidence for of these procedures at high actual and op-
either claim from the standpoint of validity of portunity cost is not justified by the occasional
either test. The Exner comprehensive system addition of information. It is not possible to
seems to have ªcleaned upº some aspects of know ahead of time which individuals might
Rorschach scoring, but the improvements are give more information via an indirect method,
marginal, for example, it is not as if inter-rater and most of the time it is not even possible
reliability increased from 0.0 to 0.8, and no to know afterwards whether indirectly ob-
improvements in validity have been established. tained ªinformationº is correct unless the
Even the improvements in scoring have been information has also been obtained in some
demonstrated for only a portion of the many other way, that is, asking the person, asking a
indexes. The MMPI-2 was only a cosmetic relative, or doing a structured interview. It is
improvement over the original, for example, unlikely that projective test responses will alter
getting rid of some politically incorrect items, clinical intervention in most cases, nor should
and no increase in the validity of any score or it.
index seems to have been demonstrated, nor is Is it fair to say that clinical psychology has no
any likely. standards (see Sechrest, 1992)? Clinical psy-
An additional element in the lack of evident chology gives the appearance of standards with
ªprogressº in the validity of test scores may be accreditation of programs, internships, licen-
lack of reliability (and validity!) of people being sure, ethical standards, and so forth. It is our
predicted. (One wise observer suggested that we observation, however, that there is little to no
would not really like it at all if behavior were monitoring of the purported standards. For
90% predictable! Especially our own.) We may example, in reviewing recent literature as
just have reached the limits of our ability to background to this chapter, we found articles
predict what is going to happen with and to published in peer-reviewed journals using
people, especially with our simple-minded and projective tests as outcome measures for
limited assessment efforts. As long as we limit treatment. The APA ethical code of conduct
our assessment efforts to the dispositions of the states that psychologists ª. . . use psychological
individuals who are clients and ignore their assessment . . . for purposes that are appropriate
social milieus, their real environmental circum- in light of the research on or evidence of
stances, their genetic possibilities, and so on, we the. . . proper application of the techniques.º
may not be able to get beyond correlations of The APA document, Standards for educational
0.3 or 0.4. and psychological testing, states:

    . . . Validity however, is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself. (APA, 1985, p. 9)

Further, the section titled, Professional standards for test use (APA, 1985, p. 42, Standard 6.3) states:

    When a test is to be used for a purpose for which it has not been previously validated, or for which there is no supported claim for validity, the user is responsible for providing evidence of validity.

No body of research exists to support the validity of any projective instrument as the sole outcome measure for treatment, or as the sole measure of anything. So not only do questionable practices go unchecked, they can result in publication.

4.01.4.1 The Absence of the Autopsy

Medicine has always been disciplined by the regular occurrence of the autopsy. A physician makes a diagnosis and treats a patient, and if the patient dies, an autopsy will be done, and the physician will receive feedback on the correctness of his or her diagnosis. If the diagnosis were wrong, the physician would to some extent be called to account for that error; at least the error would be known, and the physician could not simply shrug it off. We know that the foregoing is idealized, that autopsies are not done in more than a fraction of cases, but the model makes our point. Physicians make predictions, and they get feedback, often quickly, on the correctness of those predictions. Surgeons send tissue to be biopsied by pathologists who are disinterested; internists make diagnoses based on various signs and symptoms and then order laboratory procedures that will inform them about the correctness of their diagnosis; family practitioners make diagnoses and prescribe treatment, which, if it does not work, they are virtually certain to hear about.

Clinical psychology has no counterpart to the autopsy, no systematic provision for checking on the correctness of a conclusion and then providing feedback to the clinician. Without some form of systematic checking and feedback, it is difficult to see how either instruments or clinicians' use of them could be regularly and incrementally improved. Psychologist clinicians have been allowed the slack involved in making unbounded predictions and then not getting any sort of feedback on the potential accuracy of even those loose predictions. We are not sure how much improvement in clinical assessment might be possible even with exact and fairly immediate feedback, but we are reasonably sure that very little improvement can occur without it.

4.01.5 FATEFUL EVENTS CONTRIBUTING TO THE HISTORY OF CLINICAL ASSESSMENT

The history of assessment in clinical psychology is somewhat like the story of the evolution of an organism in that at critical junctures, when the development of assessment might well have gone one way, it went another. We want to review here several points that we consider to be critical in the way clinical assessment developed within the broader field of psychology.

4.01.5.1 The Invention of the Significance Test

The advent of hypothesis testing in psychology had fateful consequences for the development of clinical assessment, as well as for the rest of psychology (Gigerenzer, 1993). Hypothesis testing encouraged a focus on the question whether any predictions or other consequences of assessment were "better than chance," a distinctly loose and undemanding criterion of "validity" of assessment. The typical validity study for a clinical instrument would identify two groups that would be expected to differ in some "score" derived from the instrument and then ask the question whether the two groups did in fact (i.e., to a statistically significant degree) differ in that score. It scarcely mattered by how much they differed or in what specific way, for example, an overall mean difference vs. a difference in proportions of individuals scoring beyond some extreme or otherwise critical value. The existence of any "significant" difference was enough to justify triumphant claims of validity.

4.01.5.2 Ignoring Decision Making

One juncture had to do with bifurcation of the development of clinical psychology from other streams of assessment development. Specifically, intellectual assessment and assessment of various capacities and propensities relevant to performance in work settings veered in the direction of assessment for decision-making (although not terribly sharply nor completely), while assessment in clinical psychology went in the direction of assessment for enlightenment. What eventually happened is that clinical psychology failed to adopt any rigorous
criterion of correctness of decisions made on the basis of assessed performance, but adopted instead a conception of assessments as generally informative or "correct."

Simply to make the alternative clear, the examples provided by medical assessment are instructive. The model followed in psychology would have resulted in medical research of some such nature as showing that two groups that "should" have differed in blood pressure, for example, persons having just engaged in vigorous exercise vs. persons having just experienced a rest period, differed significantly in blood pressure readings obtained by a sphygmomanometer. Never mind by how much they differed or what the overlap between the groups. The very existence of a "significant" difference would have been taken as evidence for the "validity" of the sphygmomanometer.

Instead, however, medicine focused more sharply on the accuracy of decisions made on the basis of assessment procedures. The aspect of biomedical assessment that most clearly distinguishes it from clinical psychological assessment is its concern for sensitivity and specificity of measures (instruments) (Kraemer, 1992). Kraemer's book, Evaluating medical tests: Objective and quantitative guidelines, has not even a close counterpart in psychology, which is, itself, revealing. These two characteristics of measures are radically different from the concepts of validity used in psychology, although "criterion validity" (now largely abandoned) would seem to require such concepts.

Sensitivity refers to the proportion of cases having a critical characteristic that are identified by the test. For example, if a test were devised to select persons likely to benefit from some form of therapy, sensitivity would refer to the proportion of cases that would actually benefit which would be identified correctly by the test. These cases would be referred to as "true positives." Any cases that would benefit from the treatment but that could not be identified by the test would be "false-negatives" in this example. Conversely, a good test should have high specificity, which would be avoiding "false-positives," or incorrectly identifying as good candidates for therapy persons who would not actually benefit. The "true negative" group would be those persons who would not benefit from treatment, and a good test should correctly identify a large proportion of them.

As Kraemer (1992) points out, sensitivity and specificity as test requirements are nearly always in opposition to each other, and are reciprocal. Maximizing one requirement reduces the other. Perfect sensitivity can be attained by, in our example, a test that identifies every case as suitable for therapy; no amenable cases are missed. Unfortunately, that maneuver would also maximize the number of false-positives, that is, many cases would be identified as suitable for therapy who, in fact, were not. Obviously, the specificity of the test could be maximized by declaring all cases as unsuitable for therapy, thus ensuring that the number of false-positives would be zero, while at the same time ensuring that the number of false-negatives would be maximal, and no one would be treated.

We go into these issues in some detail in order to make clear how very different such thinking is from usual practices in clinical psychological assessment. The requirements for receiver operating characteristic (ROC) curves, which are the way issues of sensitivity and specificity of measures are often labeled and portrayed, are stringent. They are not satisfied by simple demonstrations that measures, for example, suitability for treatment, are "significantly related to" other measures of interest, for example, response to treatment. The development of ROC statistics almost always occurs in the context of the use of tests for decision-making: treat–not treat, hire–not hire, do further tests–no further tests. Those kinds of uses of tests in clinical psychological assessment appear to be rare.

Issues of sensitivity-specificity require the existence of some reasonably well-defined criterion, for example, the definition of what is meant by favorable response to treatment and a way of measuring it. In biomedical research, ROC statistics are often developed in the context of a "gold standard," a definitive criterion. For example, an X ray might serve as a gold standard for a clinical judgment about the existence of a fracture, or a pathologist's report on a cytological analysis might serve as a gold standard for a screening test designed to detect cancer. Clinical psychology has never had anything like a gold standard against which its various tests might have been validated. Psychiatric diagnosis has sometimes been of interest as a criterion, and tests of different types have been examined to determine the extent to which they produce a conclusion in agreement with diagnosis (e.g., Somoza, Steer, Beck, & Clark, 1994), but in that case the gold standard is suspect, and it is by no means clear that disagreement means that the test is wrong.

The result is that for virtually no psychological instrument is it possible to produce a useful quantitative estimate of its accuracy. Tests and other assessment devices in clinical psychology have been used for the most part to produce general enlightenment about a target of interest rather than to make a specific prediction of some outcome. People who have been tested are described as "high in anxiety," "clinically
depressed," or "of average intelligence." Statements of that sort, which we have referred to previously as unbounded predictions, are possibly enlightening about the nature of a person's functioning or about the general range within which problems fall, but they are not specific predictions, and are difficult to refute.

4.01.5.3 Seizing on Construct Validity

In 1955, Cronbach and Meehl published what is arguably the most influential article in the field of measurement: Construct validity in psychological tests (Cronbach & Meehl, 1955). This is the same year as the publication of Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores (Meehl & Rosen, 1955). It is safe to say that no two more important articles about measurement were ever published in the same year. The propositions set forth by Cronbach and Meehl about the validity of tests were provocative and rich with implications and opportunities. In particular, the idea of construct validity required that measures be incorporated into elaborated theoretical structure, which was labeled the "nomological net." Unfortunately, the fairly daunting requirements for embedding measures in theory were mostly ignored in clinical assessment (the same could probably be said about most other areas of psychology, but it is not our place here to say so), and the idea of construct validity was trivialized.

The trivialization of construct validity reflects in part the fact that no standards for construct validity exist (and probably none can be written) and the general failure to distinguish between necessary and sufficient conditions for the inference of construct validity. In their presentation of construct validity, Cronbach and Meehl did not specify any particular criteria for sufficiency of evidence, and it would be difficult to do so. Construct validity exists when everything fits together, but trying to specify the number and nature of the specific pieces of evidence would be difficult and, perhaps, antithetical to the idea itself. It is also not possible to quantify level or degree of construct validity other than in a very rough way and such quantifications are, in our experience, rare. It is difficult to think of an instance of a measure described as having "moderate" or "low" construct validity, although "high" construct validity is often implied.

It is possible to imagine what some of the necessary conditions for construct validity might be, one notable requirement being convergent validity (Campbell & Fiske, 1959). In some manner that we have not tried to trace, conditions necessary for construct validity came to be viewed as sufficient. Thus, for example, construct validity usually requires that one measure of a construct correlates with another. Such a correlation is not, however, a sufficient condition for construct validity, but, nonetheless, a simple zero-order correlation between two tests is often cited as "evidence" for the construct validity of one measure or the other. Even worse, under the pernicious influence of the significance testing paradigm, any statistically significant correlation may be taken as evidence of "good construct validity." Or, for another example, construct validity usually requires a particular factor structure for a measure, but the verification of the required factor structure is not sufficient evidence for construct validity of the measure involved. The fact that a construct is conceived as unidimensional does not mean that a measure alleged to represent the construct does so simply because it appears to form a single factor.

The net result of the dependence on significance testing and the poor implementation of the ideas represented by construct validity has been that the standards of evidence for the validity of psychological measures have been distressingly low.

4.01.5.4 Adoption of the Projective Hypothesis

The projective hypothesis (Frank, 1939) is a general proposition stating that whatever an individual does when exposed to an ambiguous stimulus will reveal important aspects of his or her personality. Further, the projective hypothesis suggests that indirect responses, that is, those to ambiguous stimuli, are more valid than direct responses, that is, those to interviews or questionnaires. There is little doubt that indirect responses reveal something about people, although whether that which is revealed is, in fact, important is more doubtful. Moreover, what one eats, wears, listens to, reads, and so on are rightly considered to reveal something about that individual. While the general proposition about responses to ambiguous stimuli appears quite reasonable, the use of such stimuli in the form of projective tests has proven problematic and of limited utility.

The course of development of clinical assessment might have been different and more useful had it been realized that projection was the wrong term for the link between ambiguous stimuli and personality. A better term would have been the "expressive hypothesis," the notion that an individual's personality may be manifest (expressed) in response to a wide range of stimuli, including ambiguous stimuli. Personality style might have come to be of greater concern, and unconscious determinants
of behavior, implied by projection, might have received less emphasis.

In any case, when clinical psychology adopted the projective hypothesis and bought wholesale into the idea of unconscious determinants of behavior, that set the field on a course that has been minimally productive but that still affects an extraordinarily wide range of clinical activities. Observable behaviors have been downplayed and objective measures treated with disdain or dismissed altogether. The idea of peering into the unconscious appealed both to psychological voyeurs and to those bent on achieving the glamour attributed to the psychoanalyst.

Research on projective stimuli indicates that highly structured stimuli which limit the dispositions tapped increase the reliability of such tests (e.g., Kagan, 1959). In achieving acceptable reliability, the nature of the test is altered in such a way that the stimulus is less ambiguous and the likelihood of an individual "projecting" some aspect of their personality in an unusual way becomes reduced. Thus, the dependability of responses to projective techniques probably depends to an important degree on sacrificing their projective nature. In part, projective tests seem to have failed to add to assessment information because most of the variance in responses to projective stimuli is accounted for by the stimuli themselves. For example, "popular" responses on the Rorschach are popular because the stimulus is the strongest determinant of the response (Murstein, 1963).

Thorndike (Thorndike & Hagen, 1955, p. 418), in describing the state of affairs with projective tests some 40 years ago, stated:

    A great many of the procedures have received very little by way of rigorous and critical test and are supported only by the faith and enthusiasm of their backers. In those few cases, most notable that of the Rorschach, where a good deal of critical work has been done, results are varied and there is much inconsistency in the research picture. Modest reliability is usually found, but consistent evidence of validity is harder to come by.

The picture has not changed substantially in the ensuing 40 years and we doubt that it is likely to change much in the next 40. As Adcock (1965, cited in Anastasi, 1988) noted, "There are still enthusiastic clinicians and doubting statisticians." As noted previously (Sechrest, 1963, 1968), these expensive and time-consuming projective procedures add little if anything to the information gained by other methods and their abandonment by clinical psychology would not be a great loss. Despite lack of incremental validity after decades of research, not only do tests such as the Rorschach and TAT continue to be used, but new projective tests continue to be developed. That could be considered a pseudoscientific enterprise that, at best, yields procedures telling clinical psychologists what they at least should already know or have obtained in some other manner, and that, at worst, wastes time and money and further damages the credibility of clinical psychology.

4.01.5.5 The Invention of the Objective Test

At one time we had rather supposed without thinking about it too much that objective tests had always been around in some form or other. Samelson (1987), however, has shown that at least the multiple-choice test was invented in the early part of the twentieth century, and it seems likely that the true–false test had been devised not too long before then. The objective test revolutionized education in ways that Samelson makes clear, and it was not long before that form of testing infiltrated into psychology. Bernreuter (1933) is given credit for devising the first multiphasic (multidimensional) personality inventory, only 10 years after the introduction of the Rorschach into psychology.

Since 1933, objective tests have flourished. In fact, they are now much more widely used than projective tests and are addressed toward almost every imaginable problem and aspect of human behavior. The Minnesota Multiphasic Personality Inventory (1945) was the truly landmark event in the course of development of paper-and-pencil instruments for assessing clinical aspects of psychological functioning. "Paper-and-pencil" is often used synonymously with "objective" in relation to personality. From that time on, other measures flourished, of late in great profusion.

Paper-and-pencil tests freed clinicians from the drudgery of test administration, and in that way they also made testing relatively inexpensive as a clinical enterprise. They also made tests readily available to psychologists not specifically trained on them, including psychologists at subdoctoral levels. Paper-and-pencil measures also seemed so easy to administer, score, and interpret. As we have noted previously, the ease of creation of new measures had very substantial effects on the field, including clinical assessment.

4.01.5.6 Disinterest in Basic Psychological Processes

Somewhere along the way in its development, clinical assessment became detached from the mainstream of psychology and, therefore, from
the many developments in basic psychological theory and knowledge. The Rorschach was conceived not as a test of personality per se but in part as an instrument for studying perception and Rorschach referred to it as his "experiment" (Hunt, 1956). Unfortunately, the connections of the Rorschach to perception and related mental processes were lost, and clinical psychology became preoccupied not with explaining how Rorschach responses come to be made but with explaining how Rorschach responses reflect back on a narrow range of potential determinants: the personality characteristics of respondents, and primarily their pathological characteristics at that.

It is testimony to the stasis of clinical assessment that three-quarters of a century after the introduction of the Rorschach, a period of time marked by stunning (relatively) advances in understanding of such basic psychological processes as perception, cognition, learning, and motivation and by equivalent or even greater advances in understanding of the biological structures and processes that underlie human behavior, the Rorschach continues, virtually unchanged, to be the favorite instrument for clinical assessment. The Exner System, although a revision of the scoring system, in no way reflects any basic changes in our advancement of understanding of the psychological knowledge base in which the Rorschach is, or should be, embedded. Take, just for one instance, the great increase of interest in and understanding of "priming" effects in cognition; those effects would clearly be relevant to the understanding of Rorschach responses, but there is no indication at all of any awareness on the part of those who write about the Rorschach that any such effect even exists. It was known a good many years ago that Rorschach responses could be affected by the context of their administration (Sechrest, 1968), but without any notable effect on their use in assessment.

Nor do any other psychological instruments show any particular evidence of any relationship to the rest of the field of psychology. Clinical assessment could have benefited greatly from a close and sensitive connection to basic research in psychology. Such a connection might have fostered interest in clinical assessment in the development of instruments for the assessment of basic psychological processes.

Clinical psychology has (is afflicted with, we might say) an extraordinary number of different tests, instruments, procedures, and so on. It is instructive to consider the nature of all these tests; they are quite diverse. (We use the term "test" in a somewhat generic way to refer to the wide range of mechanisms by which psychologists carry out assessments.) Whether the great diversity is a curse or a blessing depends on one's point of view. We think that a useful perspective is provided by contrasting psychological measures with those typically used in medicine, although, obviously, a great many differences exist between the two enterprises. Succinctly, however, we can say that most medical tests are very narrow in their intent, and they are devised to tap basic states or processes. A screening test for tuberculosis, for example, involves subcutaneous injection of tuberculin which, in an infected person, causes an inflammation at the point of injection. The occurrence of the inflammation then leads to further narrowly focused tests. The inflammation is not tuberculosis but a sign of its potential existence. A creatinine clearance test is a test of renal function based on the rate of clearance of ingested creatinine from the blood. A creatinine clearance test can indicate abnormal renal functioning, but it is a measure of a fundamental physiological process, not a state, a problem, a disease, or anything of that sort. A physician who is faced with the task of diagnosing some disease process involving renal malfunction will use a variety of tests, not necessarily specified by a protocol (battery) to build an information base that will ultimately lead to a diagnosis.

By contrast, psychological assessment is, by and large, not based on measurement of basic psychological processes, with few exceptions. Memory is one function that is of interest to neuropsychologists, and occasionally to others, and instruments to measure memory functions do exist. Memory can be measured independently of any other functions and without regard to any specific causes of deficiencies. Reaction time is another basic psychological process. It is currently used by cognitive psychologists as a proxy for mental processing time, and since the 1970s, interest in reaction time as a marker for intelligence has grown and become an active research area.

For the most part, however, clinical assessment has not been based on tests of basic psychological functions, although the Wechsler intelligence scales might be regarded as an exception to that assertion. A very large number of psychological instruments and procedures are aimed at assessing syndromes or diagnostic conditions, whole complexes of problems. Scales for assessing attention deficit disorder (ADD), suicide probability, or premenstrual syndrome (PMS) are instances. Those instruments are the equivalent of a medical "Test for Diabetes," which does not exist. The Conners' Rating Scales (teachers) for ADD, for example, has subscales for Conduct Problem, Hyperactivity, Emotional Overindulgent, Asocial,
Anxious-Passive, and Daydream-Attendance. Several of the very same problems might well be represented on other instruments for entirely different disorders. But if they were, they would involve a different set of items, perhaps with a slightly different twist, to be integrated in a different way. Psychology has no standard ways of assessing even such fundamental dispositions as “asocial.”

One advantage of the medical way of doing things is that tests like creatinine clearance have been used on millions of persons, are highly standardized, have extremely well-established norms, and so on. Another set of ADD scales, the Brown, assesses “ability to activate and organize work tasks.” That sounds like an important characteristic of children, so important that one might think it would be widely used and useful. Probably, however, it appears only on the Brown ADD Scales, and it is probably little understood otherwise.

Clinical assessment has also not had the benefit of careful study from the standpoint of basic psychological processes that affect the clinician and his or her use and interpretation of psychological tests. Achenbach (1985), to cite a useful perspective, discusses clinical assessment in relation to the common sources of error in human judgment. Achenbach refers to such problems as illusory correlation, inability to assess covariation, and the representativeness and availability heuristics and confirmatory bias described by Kahneman, Slovic, and Tversky (1982). Consideration of these sources of human, that is, general, error in judgment would be more likely if clinical assessment were more attuned to and integrated into the mainstream developments of psychology.

We do not suppose that clinical assessment should be limited to basic psychological processes; there may well be a need for syndrome-oriented or condition-oriented instruments. Without any doubt, however, clinical assessment would be on a much firmer footing if from the beginning psychologists had tried to define and measure well a set of fundamental psychological processes that could be tapped by clinicians faced with diagnostic or planning problems.

Unfortunately, measurement has never been taken seriously in psychology, and it is still lightly regarded. One powerful indicator of the casual way in which measurement problems are met in clinical assessment is the emphasis placed on brevity of measures. “. . . entire exam can be completed . . . in just 20 to 30 minutes” (for head injury), “completed in just 15–20 minutes” (childhood depression), “39 items” (to measure six factors involved in ADD) are just a few of the notations concerning tests that are brought to the attention of clinician-assessors by advertisers. It would be astonishing to think of a medical test advertised as “diagnoses brain tumors in only 15 minutes,” or “complete diabetes workup in only 30 minutes.” An MRI examination for a patient may take up to several hours from start to finish, and no one suggests a “short form” of one. Is it imaginable that one could get more than the crudest notion of childhood depression in 15–20 minutes?

4.01.6 MISSED SIGNALS

At various times in the development of clinical psychology, opportunities existed to guide, or even redirect, assessment activities in one way or another. Clinical psychology might very well have taken quite a different direction than it has (Sechrest, 1992). Unfortunately, in our view, a substantial number of critical “signals” to the field were missed, and entailed in missing them was failure to redirect the field in what would have been highly constructive ways.

4.01.6.1 The Scientist–Practitioner Model

We do not have the space to go into the intricacies of the scientist–practitioner model of training and practice, but it appears to be an idea whose time has come and gone. Suffice it to say here that full adoption of the model would not have required every clinical practitioner to be a researcher, but it would have fostered the idea that to some extent every practitioner is responsible for the scientific integrity of his or her own practice, including the validity of assessment procedures. The scientist–practitioner model might have helped clinical psychologists to be involved in research, even if only as contributors rather than as independent investigators.

That involvement could have been of vital importance to the field. The development of psychological procedures will never be supported commercially to any appreciable extent, and if they are to be adequately developed, it will have to be with the voluntary – and enthusiastic – participation of large numbers of practitioners who will have to contribute data, be involved in the identification of problems, and so on. That participation would have been far more likely had clinical psychology stuck to its original views of itself (Sechrest, 1992).

4.01.6.2 Construct Validity

We have already discussed construct validity at some length, and we have explained our view
that the idea has been trivialized, in essence abandoned. That is another lost opportunity, because the power of the original formulation by Cronbach and Meehl (1955) was great. Had their work been better understood and honestly adopted, clinical psychology would by this time almost certainly have had a set of well-understood and dependable measures and procedures. The number and variety of such measures would have been far less than exists now, and the dependability of them would have been circumscribed, but surely it would have been better to have good than simply many measures.

4.01.6.3 Assumptions Underlying Assessment Procedures

In 1952, Lindzey published a systematic analysis of assumptions underlying the use of projective techniques (Lindzey, 1952). His paper was a remarkable achievement, or would have been had anyone paid any attention to it. The Lindzey paper could have served as a model and stimulus for further formulations leading to a theory, comprehensive and integrated, of performance on clinical instruments. A brief listing of several of the assumptions must suffice to illustrate what he was up to:

IV. The particular response alternatives emitted are determined not only by characteristic response tendencies (enduring dispositions) but also by intervening defenses and his cognitive style.

XI. The subject's characteristic response tendencies are sometimes reflected indirectly or symbolically in the response alternatives selected or created in the test situation.

XIII. Those responses that are elicited or produced under a variety of different stimulus conditions are particularly likely to mirror important aspects of the subject.

XV. Responses that deviate from those typically made by other subjects to this situation are more likely to reveal important characteristics of the subject than modal responses which are more like those made by most other subjects.

These and other assumptions listed by Lindzey could have provided a template for systematic development of both theory and programs of research aimed at supporting the empirical base for projective – and other – testing. Assumption XI, for example, would lead rather naturally to the development of explicit theory, buttressed by empirical data, which would indicate just when responses probably should and should not be interpreted as symbolic.

Unfortunately, Lindzey's paper appears to have been only infrequently cited and to have been substantially ignored by those who were engaged in turning out all those projective tests, inventories, scales, and so on. At this point we know virtually nothing more about the performance of persons on clinical instruments than was known by Lindzey in 1952. Perhaps even less.

4.01.6.4 Antecedent Probabilities

In 1955 Meehl and Rosen published an exceptional article on antecedent probabilities and the problem of base rates. The article was, perhaps, a bit mathematical for clinical psychology, but it was not really difficult to understand, and its implications were clear. Whenever one is trying to predict (or diagnose) a characteristic that is quite unevenly distributed in a population, the difficulty in beating the accuracy of the simple base rates is formidable, sometimes awesomely so. For example, even in a population considered at high risk for suicide, only a very few persons will actually commit suicide. Therefore, unless a predictive measure is extremely precise, the attempt to identify those persons who will commit suicide will identify as suicidal a relatively large number of “false-positives,” that is, if one wishes to be sure not to miss any truly suicidal people, one will include in the “predicted suicide” group a substantial number of people not so destined. That problem is a serious to severe limitation when the cost of missing a true-positive is high, but so, relatively, is the cost of having to deal with a false-positive.

More attention to the difficulties described by Meehl and Rosen (1955) would have moved psychological assessment in the direction taken by medicine, that is, the use of ROCs. Although ROCs do not make the problem go away, they keep it in the forefront of attention and require that those involved, whether researchers or clinicians, deal with it. That signal was missed in clinical psychology, and it is scarcely mentioned in the field today. Many indications exist that a large proportion of clinical psychologists are quite unaware that the problem even exists, let alone that they have an understanding of it.

4.01.6.5 Need for Integration of Information

Many trends over the years converge on the conclusion that psychology will make substantial progress only to the extent that it is able to integrate its theories and knowledge base with those developing in other fields. We can address this issue only on the basis of personal experience; we can find no evidence for our
view. Our belief is that clinical assessment in psychology rarely results in a report in which information related to a subject's genetic disposition, family structure, social environment, and so on is integrated in a systematic and effective way.

For example, we have seen many reports on patients evaluated for alcoholism without any attention, let alone systematic attention, to a potential genetic basis for their difficulty. At most a report might include a note to the effect that the patient has one or more relatives with similar problems. Never was any attempt made to construct a genealogy that would include other conditions likely to exist in the families of alcoholics. The same may be said for depressed patients. It might be objected that the responsibilities of the psychologist do not extend into such realms as genetics and family and social structure, but surely that is not true if the psychologist aspires to be more than a sheer technician, for example, serving the same function as a laboratory technician who provides a number for the creatinine clearance rate and leaves it to someone else, “the doctor,” to put it all together.

That integration of psychological and other information is of great importance has been implicitly known for a very long time. That knowledge has simply never penetrated training programs and clinical practice. That missed opportunity is to the detriment of the field.

4.01.6.6 Method Variance

The explicit formulation of the concept of method variance was an important development in the history of assessment, but one whose import was missed or largely ignored. The concept is quite simple: to some extent, the value obtained for the measurement of any variable depends in part on the characteristics of the method used to obtain the estimate. (A key idea is the understanding that any specific value is, in fact, an estimate.) The first explicit formulation of the idea of method variance was the seminal Campbell and Fiske paper on the “multitrait-multimethod matrix” (Campbell & Fiske, 1959). (That paper also introduced the very important concepts of “convergent” and “discriminant” validity, now widely employed but, unfortunately, not always very well understood.) There had been precursors of the idea of method variance. In fact, much of the interest in projective techniques stemmed from the idea that they would reveal aspects of personality that would not be discernible from, for example, self-report measures. The MMPI, first published in 1943 (Hathaway & McKinley), included “validity” scales that were meant to detect, and, in the case of the K-scale, even correct for, methods effects such as lying, random responding, faking, and so on. By 1960 or so, Jackson and Messick had begun to publish their work on response styles in objective tests, including the MMPI (e.g., Jackson & Messick, 1962). At about the same time, Berg (1961) was describing the “deviant response tendency,” which was the hypothesis that systematic variance in test scores could be attributed to general tendencies on the part of some respondents to respond in deviant ways. Nonetheless, it was the Campbell and Fiske (1959) paper that brought the idea of method variance to the attention of the field.

Unfortunately, the cautions expressed by Campbell and Fiske, as well as by others working on response styles and other method effects, appear to have had little effect on developments in clinical assessment. For the most part, the problems raised by methods effects and response styles appear to have been pretty much ignored in the literature on clinical assessment. A search of a current electronic database in psychology turned up, for example, only one article over the past 30 years or so linking the Rorschach to any discussion of method effects (Meyer, 1996). When one considers the hundreds of articles having to do with the Rorschach that were published during that period of time, the conclusion that method effects have not got through to the attention of the clinical assessment community is unavoidable. The consequence almost surely is that clinical assessments are not being corrected, at least not in any systematic way, for method effects and response biases.

4.01.6.7 Multiple Measures

At least a partial response to the problem of method effects in assessment is the use of multiple measures, particularly measures that do not appear to share sources of probable error or bias. That recommendation was explicit in Campbell and Fiske (1959), and it was echoed and elaborated upon in 1966 (Webb et al., 1966), and again in 1981 (Webb et al., 1981). Moreover, Webb and his colleagues warned specifically against the very heavy reliance on self-report measures in psychology (and other social sciences). That warning, too, appears to have made very little difference in practice. Examination of catalogs of instruments meant to be used in clinical assessment will show that a very large proportion of them depend upon self-reports of individual subjects about their own dispositions, and measures that do not rely
directly on self-reports nonetheless do nearly all rely solely on the verbal responses of subjects. Aside from rating scales to be used with parents, teachers, or other observers of behavior, characteristics of interest such as personality and psychopathology almost never require anything of a subject other than a verbal report. By contrast, ability tests almost always require subjects to do something, solve a problem, complete a task, or whatever. Wallace (1966) suggested that it might be useful to think of traits as abilities, and following that lead might very well have expanded the views of those interested in furthering clinical assessment.

4.01.7 THE ORIGINS OF CLINICAL ASSESSMENT

The earliest interest in clinical assessment was probably that used for the classification of the “insane” and mentally retarded in the early 1800s. Because there was growing interest in understanding and implementing the humane treatment of these individuals, it was first necessary to distinguish between the two types of problems. Esquirol (1838), a French physician, published a two-volume document outlining a continuum of retardation based primarily upon language (Anastasi, 1988).

Assessment in one form or another has been part of clinical psychology from its beginnings. The establishment of Wundt's psychological laboratory at Leipzig in 1879 is considered by many to represent the birth of psychology. Wundt and the early experimental psychologists were interested in uniformity rather than assessment of the individual. In the Leipzig lab, experiments investigated psychological processes affected by perception, in which Wundt considered individual differences to be error. Accordingly, he believed that since sensitivity to stimuli differs, using a standard stimulus would compensate and thus eliminate individual differences (Wundt, Creighton, & Titchener, 1894/1896).

4.01.7.1 The Tradition of Assessment in Psychology

Sir Francis Galton's efforts in intelligence and heritability pioneered both the formal testing movement and field testing of ideas. Through his Anthropometric Laboratory at the International Exposition in 1884, and later at the South Kensington Museum in London, Galton gathered a large database on individual differences in vision, hearing, reaction time, other sensorimotor functions, and physical characteristics. It is interesting to note that Galton's proposition that sensory discrimination is indicative of intelligence continues to be promoted and investigated (e.g., Jensen, 1992). Galton also used questionnaire, rating scale, and free association techniques to gather data.

James McKeen Cattell, the first American student of Wundt, is credited with initiating the individual differences movement. Cattell, an important figure in American psychology (fourth president of the American Psychological Association and the first psychologist elected to the National Academy of Sciences), became interested in whether individual differences in reaction time might shed light on consciousness and, despite Wundt's opposition, completed his dissertation on the topic. He wondered if, for example, some individuals might be observed to have fast reaction time across situations and supposed that the differences may have been lost in the averaging techniques used by Wundt and other experimental psychologists (Wiggins, 1973). Cattell later became interested in the work of Galton and extended his work by applying reaction time and other physiological processes as measures of intelligence. Cattell is credited with the first published reference to a mental test in the psychological literature (Cattell, 1890).

Cattell remained influenced by Wundt in his emphasis on psychophysical processes. Although physiological functions could be easily and accurately measured, attempts to relate them to other criteria, such as teacher ratings of intelligence and grades, yielded poor results (Anastasi, 1988).

Alfred Binet conducted extensive and varied research on the measurement of intelligence. His many approaches included measurements of cranial, facial, and hand form, handwriting analysis, and inkblot tests. Binet is best known for his work in the development of intelligence scales for children. The earliest form of the scale, the Binet–Simon, was developed following Binet's appointment to a governmental commission to study the education of retarded children (Binet & Simon, 1905). The scale assessed a range of abilities with emphasis on comprehension, reasoning, and judgment. Sensorimotor and perceptual abilities were relatively less prominent, as Binet considered the broader process, for example, comprehension, to be central to intelligence. The Binet–Simon scale consisted of 30 problems arranged in order of difficulty. These problems were normed using 50 normal children aged 3–11 years and a few retarded children and adults.

A second iteration, the 1908 scale, was developed. The 1908 scale was somewhat longer and normed on approximately 300 normal children aged 3–13 years. Performance was grouped
by age according to the level at which 80–90% of the normal children passed, giving rise to the term “mental age.”

The Binet–Simon has been revised, translated, and adapted in numerous languages. Perhaps the best-known revision was directed by Lewis Terman (1916) at Stanford University; this test is known as the Stanford–Binet. The Stanford–Binet was the origin of the intelligence quotient (IQ), the ratio of mental age to chronological age.

4.01.7.1.1 Witmer

Lightner Witmer, who studied with both Cattell and Wundt, established the first American psychological clinic at the University of Pennsylvania in 1896. This event is considered by many as the beginning of clinical psychology (Garfield, 1965; McReynolds, 1987, 1996). Witmer's approach to assessment was focused on determining the causes of children's problems and then making recommendations for treatment. Diagnoses, per se, were not considered important; however, Witmer did make use of the Stanford–Binet and other formal assessment tools. McReynolds (1996) noted that Witmer strongly emphasized both direct observation and extensive background data as especially important for assessment.

Although Witmer characterized his work as practical, he remained committed to a scientific basis for psychology (McReynolds, 1996). It seems reasonable to conclude that Witmer was interested in assessment for bounded inference and prediction. That is, he wanted information as it might relate to specific problems for the express purpose of treating those problems (Witmer, 1996/1907).

4.01.7.1.2 Army Alpha

Robert M. Yerkes initiated and administered a program to test 1.75 million army recruits during World War I. This program, which Yerkes developed in conjunction with Terman and H. H. Goddard, administered the Army Alpha written mental test to recruits. Illiterate recruits and those failing the Alpha were given a picture-based test called the Army Beta.

Yerkes hoped that the army could be “engineered” by classifying the intelligence and capabilities of all recruits. To that end, recruits were graded from A through E and Yerkes recommended that they be assigned rank and tasks according to their tested ability. Although the army did not use the results uniformly, in many instances recruits for officer training were required to have an A or B grade on the Alpha. The test results were later used in controversial ways by both Yerkes and E. G. Boring to assess average American intelligence levels (see Yerkes, 1921, 1941). Despite whatever controversy may have arisen over the years, the army continues to use testing to assess aptitudes (Jensen, 1985).

4.01.8 THE RORSCHACH INKBLOT TECHNIQUE AND CLINICAL PSYCHOLOGY

The history of the Rorschach Inkblot Technique is in many ways a reflection of the history of clinical psychology in America. Clinical psychology continues to struggle with competing world views focusing on the nature of reality, the mind, and human behavior. In clinical psychology the debate about how to view the mind and behavior is usually expressed, broadly speaking, as poles of a dimension anchored by only observable behavior at one end, the influences of conscious mental processes (i.e., cognition) more in the center, and unconscious mental processes anchoring the other end. The relative importance of observable behavior and unconscious mental processes alternates with the intellectual fashions of the times.

The role of the clinical psychologist as scientist, diagnostician, and therapist continues to change, with a growing fracture between the scientifically and the clinically oriented. A central focus of debate has to do with molar vs. molecular views of personality and the ways in which personality is assessed. Conflict over the use of the Rorschach is characteristic of the debate and perturbing in light of long-standing doubts about the psychometric adequacy and the clinical usefulness of the instrument. An additional factor in the ongoing conflict in psychology seems to be that in psychology, alas, like old soldiers, theories never die. Even if refuted, they are not replaced, they only very gradually fade away (Meehl, cited by Lykken, 1991).

4.01.8.1 The Social and Philosophical Context for the Appearance of the Rorschach

Although the Rorschach was first introduced in the United States in 1925, it was during the 1940s and 1950s that the Rorschach rose to prominence in clinical psychology. The prevailing theoretical views in American academic psychology during the early years of the Rorschach were Gestalt and behaviorism. In many ways the interest and devotion of Rorschach proponents to the technique seem to have been a reaction against what they saw as reductionist and positivistic approaches to
personality assessment on the part of behaviorists and often atheoretical psychometricians. Additionally, behaviorists focused on environmental determinants of behavior at the same time that psychoanalytic theory, in spite of its rejection in much of academia, was beginning to flourish in clinical psychology. Moreover, by the late 1940s, many psychologists were interested in reviving the notion of the self, which had been rejected by behaviorism and psychoanalysis (Reisman, 1991).

Proponents of the Rorschach believed that underlying dimensions of “true” personality could be elicited only by indirect, projective methods; defense mechanisms, repression, and perhaps other unconscious processes prevented an individual from having access to critical information about him- or herself. Direct assessment of personality was narrow and incomplete, but the ambiguity of the inkblot stimulus material would elicit true responses. Because during the 1940s and 1950s testing was virtually the only applied professional activity performed by clinical psychologists (Millon, 1984), it is not surprising that the Rorschach would generate a great deal of interest and activity. What is surprising is that a test criticized even then and continuously until now as being too subjective in administration, scoring, and interpretation, of questionable reliability, and of dubious validity, would be continually used for 70 years.

Rorschach proponents did claim to view the technique as scientific, and there were attempts to establish norms and to approach the Rorschach scientifically, but we view the Rorschach ultimately as what Richard Feynman (1986) refers to as “Cargo Cult Science”:

In the South Seas there is a cargo cult of people. During the war, they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they've arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas – he's the controller – and they wait for the airplanes to land. They're doing everything right. The form is perfect. It looks just the way it looked before. But it doesn't work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they're missing something essential, because the planes don't land.

The Rorschach technique is missing something essential. Although, as we stated earlier, people almost certainly “project” aspects of their personality onto ambiguous stimuli, use of the Rorschach has failed to demonstrate convincing evidence of validity in decades of attempts to find it. The planes still don't land.

4.01.8.2 The Birth of the Rorschach

Whether and how to use the Rorschach has been a source of controversy since its introduction. Perhaps much of the controversy and dissent about scoring and interpretation of responses to the inkblots among advocates of the technique was a result of its founder's death a few months after the publication of his initial monograph detailing 10 years of studies with inkblots, leaving a nascent method open to various interpretations. The original notions of using the technique tentatively and experimentally began fading with its founder's death, being replaced by an overriding concern for clinical uses.

Hermann Rorschach, the son of a Swiss art teacher, began experimenting with inkblots in various psychopathic hospitals in 1911, the year after completing his medical training (Klopfer & Kelley, 1942). The Rorschach method was introduced in the United States in 1925 by David Levy, a psychologist and psychiatrist (Hertz, 1986; Klopfer & Kelley, 1942), who had been a student of Emil Oberholzer, Rorschach's closest medical colleague, who continued Rorschach's work after his death. Levy taught the technique to Samuel Beck, who wrote his dissertation on the technique and published the first manual on the Rorschach in 1937 (Exner, 1969; Hertz, 1986; Klopfer & Kelley, 1942).

Beck and Bruno Klopfer were probably the most influential individuals in terms of widening the use of the technique, as well as in fomenting debate about how to score and interpret Rorschach responses. Beck was more behavioral and experimental in his approach and strongly advocated establishing norms and testing the validity of responses. Klopfer, a German who had studied with Jung in Switzerland after fleeing Hitler and before coming to the United States, was much more inferential in his interpretation and scoring. Rorschach himself was considerably more tentative about his findings than subsequent proponents of the technique were, or than they seem to be to this day.

It is likely that dissemination of the Rorschach was actually helped by the controversy and dissent within the ranks of Rorschach adherents, as well as by the fight against perceived rigid standards of psychometrics and nomothetic personality theories. The internal debate among adherents of various systems of scoring and interpretation seemed to foster beliefs that the findings finally proving them right were just around the corner. This
belief of imminent justification seems to characterize even present-day Rorschach proponents.

Another faction of Rorschach adherents with more interest in applying the Rorschach to clinical cases took the view that assessment and prediction based on clinical judgment and acumen are inherently superior to psychometric and statistical assessment and prediction. During the 1950s and 1960s, the emphasis shifted from scores and scoring systems to the utilization of clinical acumen and sensitivity, and attempts to understand subtle aspects of the entire testing situation (Sarason, 1954). As the role of the clinical psychologist expanded into more applied clinical activity, practitioners' attention to the experimental scientific roots of the discipline began fading. With this movement further from a scientific basis for theories and techniques, the theories promoted by academic psychologists were considered mechanistic by most practitioners. As a result, the academics' criticisms about projectives such as the Rorschach were increasingly viewed as invalid (or, perhaps worse, as irrelevant). In our reading of the literature, it appears that even those Rorschach supporters who believe science is important cling to the “Cargo Cult Science” of ratios and scoring systems lacking in empirical support but with redemption expected almost momentarily.

This shift in the 1950s and 1960s to a focus on clinical skills was in the context of the emergence of psychotherapy as a primary professional activity for psychologists. Erikson's theory of psychosocial development was embraced, psychodynamic theory in various forms (Adler, Rank, Horney, Sullivan) was popular with clinicians, and Rogerian humanistic psychology emerged along with behavior modification and systematic desensitization (Reisman, 1991). In psychiatry there were rapid advances in psychotropic medications. These changes in the field seemed to steel the resolve of clinicians who believed that human psychology could not be reduced to biology, classification, and statistical formulas. Despite the lack of any demonstrated validity of the Rorschach from research studies, clinicians focused on the feedback they received, or thought they received, from clients, and believed the Rorschach helped them to better understand their clients. At about the same time as these developments, Paul Meehl (1954) published an analysis of the general problem of clinical vs. statistical prediction.

4.01.8.3 Clinical vs. Statistical Prediction

The central issue in relation to comparisons of clinical and statistical (actuarial) prediction is simple: when there is a database of cases with known outcome, can a skilled clinician use his or her judgment to combine the relevant information about a client into the correct formulations (predictions) as well as or better than a statistical formula that uses the same information? The answer, based on numerous studies in which clinicians had as much or more information as was entered into the statistical prediction, is no. Clinicians occasionally equal but never exceed statistical predictions of behavior, diagnoses, psychotherapy outcome, and like events of interest. The preponderance of evidence favors statistical prediction.

Even when statistical models are based upon the information used by clinicians, the models outperform the clinicians on whom they are based (Dawes et al., 1989). Exceptions do occur in circumstances of events that reverse the actuarial formula or of judgments mediated by theories that are, therefore, difficult or even impossible to duplicate statistically (Dawes et al., 1989). When such information is available to clinicians, and those circumstances may be infrequent, they are likely to outperform statistical models. Meehl (1954) referred to these rare events as the broken leg phenomenon. That name was derived from an illustration in which a statistical formula is highly successful in predicting an individual's weekly attendance at a movie, but should be discarded upon discovering that the subject is in a cast with a fractured femur.

One reason for the superiority of statistical prediction is that clinicians tend to think that too many cases are exceptions to ordinary rules and, even in the case of rare events, they ultimately perform better when they rely strictly on statistical conclusions (Goldberg, 1968). The human mind is a poor computer and does not do a good job at quantifying and weighting observations, the very things that regression equations were invented for (Goldberg, 1991). We do not mean to suggest that statistical formulas can be used to perform psychotherapy, or that the predictions could be made without first gathering the relevant observations from clinicians. We also bear in mind that a great many clinical decisions are made in circumstances in which there is no known outcome.

The debate about clinical vs. statistical prediction has been characterized by ad hominem attacks, and Meehl (1954) started his book, Clinical versus statistical prediction, with lists of invective from both sides of the argument. Briefly, opponents of statistical prediction have suggested that the approach is atomistic, inhuman, arbitrary, and oversimplified, while its proponents suggest that it is objective, reliable,
rigorous, and scientific. Conversely, negative appraisals of clinical prediction suggest the method is sloppy, muddleheaded, unscientific, and vague, while its proponents suggest that the method is dynamic, sensitive, meaningful, and holistic (Meehl, 1954).

The case for the use of psychodiagnostic tests such as the Rorschach and the validity of clinical observation of relationships between thoughts, behavior, and personality characteristics becomes questionable considering the findings about the questionable validity of clinical judgments. Further, it has been known for a long while that statements from clinical assessments and psychological reports are often of universal applicability (Forer, 1949). When previously prepared statements representative of those in psychological evaluations are presented to a variety of individuals, the individuals enthusiastically agree that the statements uniquely apply to them. Therefore, it seems that the very evidence often used by clinicians, that their clients believe assessments to be accurate and that they are helped by assessment and treatment, affords no reassurance. Much information provided by typical psychodiagnostic feedback is general and applies to almost anyone. The associations with various personality characteristics, signs, and indicators may be more related to what any astute observer has learned to associate with them through observation, folklore, and literature, that is, “illusory correlations” (Chapman & Chapman, 1967, 1969; Reisman, 1991). It is likely that such illusory correlations are involved in accounts of individuals known as “Rorschach Savants,” who are purported anecdotally to see phenomenal amounts of information in Rorschach responses.

It is astonishing that the Rorschach continues to be not only very popular, but in many states is required as part of forensic psychological assessment in child custody disputes (Robyn Dawes, personal communication). Reisman (1991) suggests that the failure of clinical psychologists to modify their behavior no matter how much aversive stimulation is applied is less a refutation of Skinner's theory than evidence of a great capacity to distort information. Many clinicians and even some researchers continue to believe in the validity of the Rorschach (and other projective tests) in spite of overwhelming evidence to the contrary and almost universal agreement among the scientific community that the central assumption on which the Rorschach is based is faulty.

    The entire Rorschach is based on a fallacious assumption, namely that indirect (projective) methods are more valid than direct (self-rating or questionnaire) methods because people are so repressed that they cannot describe their real emotions and impulses. A large body of literature indicates the fallacy of this assumption. Even within self-report items, more content obvious items prove to be more valid than subtle ones. Why give an hour test, with another hour to score, to get a crude estimate of anxiety or depression which is usually less reliable and valid than a short true–false scale which takes a few minutes and where there is no unreliability of scoring? I have compared direct and indirect (Rorschach and TAT) measures of dependency, anxiety, depression, and hostility using peer ratings as criteria. The most indirect methods have zero validity, the most direct methods have low to moderate validity, and methods which are intermediate in directness (e.g., sentence completion) are intermediate in validity. A great deal of effort was expended in scoring content from TAT and Rorschach and consensus agreement was obtained where disagreement in scoring occurred. All this was to no avail because the two projectives did not correlate with each other let alone with the criteria or any of the direct methods. (Marvin Zuckerman, SSCPNET, April 22, 1996)

Although yet another scoring system for the Rorschach has been used and researched for the past 20 years (Exner, 1974, 1993) with a greater emphasis on standardization of scoring and interpretation, it has yielded no apparent improvement on the predictive or incremental validity of the technique. Criticisms of the research are nearly identical to those expressed in the 1940s and 1950s. Disturbingly, in spite of overwhelming evidence of their invalidity, clinicians tend to continue to rely on their impressions and interpretations of the content of Rorschach responses (Reisman, 1991). It is not precisely fair to say that the Rorschach is unrelated to anything, but its validity is so limited as to leave virtually no real utility for its use. Most problematic, it is inferior to and more time-consuming than instruments with better reliability and validity and the Rorschach appears to have zero incremental validity (Sechrest, 1963).

4.01.8.4 Old Tests Never Die, They Just Fade Away

The continued drift of psychology away from its scientific roots does not appear to be slowing. This drift seems additionally fueled by economic and employment concerns and continued training of too many practitioners. The current conflict is unlikely to slow as managed health care and cutbacks in federal funding lessen job opportunities, and the future of psychology is uncertain. Clinical psychology, even in the halcyon days of the scientist–practitioner model,
was never resolute in its commitment to science. For example, students coming into the field were generally not required to have any particular prior training in science, or its principal handmaiden, mathematics, and they needed only to declare a personal fealty to the idea of research. That situation has almost certainly become much worse over the past two or three decades of drift toward practitioner–scientist, then practitioner–scholar, and then frankly practitioner programs. The net result is that clinical psychology has a huge number of practitioners who are not only ill-equipped to handle the demands of evaluating the scientific basis for practice, but they are ill-disposed even to doing so. Economic pressures and their own incapacities make scientific evidence, which is at best likely to be disappointing, a threat. Anecdotes, “clinical experience,” and so on are far more reassuring and, hence, attractive. Better to believe in an unproven instrument or procedure than to be deprived of any basis for pride and survival.

Lykken (1991) noted that present knowledge in psychology is very broad but very shallow. Most recently trained clinical psychologists probably have little acquaintance with the philosophy of science and not much knowledge of the clinical vs. statistical prediction literature; certainly they have inadequate training in measurement, statistics, and probability. This ignorance of the roots of psychological theory and scientific psychology contributes to the continued use of a completely unjustifiable procedure such as the Rorschach. It is difficult to refute disproven techniques and theories when a class of the profession bases its identity and livelihood on them. The problem of theories fading away and reviving as suggested by Meehl's “old soldiers” simile is not restricted to clinical psychology; psychology as a whole operates in this way.

In other sciences, each generation builds on the foundations of the discipline's previous scientists. Psychology seems to view its predecessors as “intrepid explorers who came back empty-handed” (Lykken, 1991). To be fair, establishing a psychological science is extremely difficult because it is difficult to operationalize psychological constructs and because there is notable measurement error. The profession and practice of clinical psychology would be helped immensely, however, if we could better educate graduate students in philosophy of science, measurement, and statistics, in addition to psychological theory.

The Rorschach did not come into prominence originally because of evidence for its superiority over existing measures, for example, questionnaires and checklists. It was adopted eagerly, we think, more because of discontent with the obvious inadequacies of existing alternatives. We suspect that whatever its own inadequacies, the Rorschach will not die but will only fade away when some alternative instrument or procedure becomes available and seems potentially to be a better one.

4.01.9 OTHER MEASURES USED IN CLINICAL PSYCHOLOGY

The list of measures that have been used in clinical psychology is very long, and many appear simply to have faded away. For example, two projective tests that once had a spate of popularity are the Blacky Test and the Make-a-Picture Story Test (MAPS) (Shneidman, 1986). The Blacky Test seems to have disappeared altogether, and the MAPS is rarely encountered in the literature. Neither was ever demonstrated to be less reliable or less valid than other tests; each simply appears to have faded away, the Blacky probably because its version of psychoanalytic theory has also faded somewhat and the MAPS because it was cumbersome and slow to administer. There is not much point in recounting the histories of the many now deservedly (even if not uniquely deserved) forgotten tests.

4.01.9.1 The Thematic Apperception Test

Morgan and Murray (1935) introduced the Thematic Apperception Test (TAT) based on what they termed the “well-recognized fact” that when presented with ambiguous stimuli people reveal their own personality. The TAT consists of a series of pictures of ambiguous social situations in which the examinee describes the social situation as they see it. The TAT was originally designed to be interpreted in light of psychoanalytic theory, the theory driving its design. There were subsequently a variety of scoring systems from different perspectives, although none has improved on the recurrent problem of inconsistency in use from clinician to clinician.

The TAT, as one might imagine, can be scored more or less reliably, depending on the nature of the variable involved and the adequacy of its definition. The major problem is what the scores may be related to and how they may be interpreted. Over the many years of its existence, TAT scores have been related to many different phenomena, sometimes with moderate success. The literature would show that achievement has been extensively studied by way of the TAT (see Keiser & Prather, 1990) as have other needs or motives. Although the
research is reasonably consistent in showing some evidence for validity of some TAT scores and the instrument has proven to be of some value in research, the evidence was never strong enough to justify use of the TAT for individual decision-making in clinical settings. The TAT, like most other clinical measures, can at best be considered enlightening.

4.01.9.2 Sentence Completion Tests

Another variety of quasiprojective instruments is the sentence completion test, which consists of a stem, for example, “When I was a child,” that the respondent is supposed to make into a complete sentence by writing down his or her own thoughts. The sentence completion test, of which the Rotter Incomplete Sentences Blank (Rotter & Rafferty, 1950) is the best known version, probably evolved from word association tests, which go back to Galton, Cattell, and Kraepelin in the latter part of the nineteenth century (Anastasi, 1988). The Rotter ISB was considered to be a measure of psychological conflict and, therefore, adjustment, and like so many other measures, under the right circumstances, it could be scored in a reasonably dependable way and could result in “significant” validity coefficients. That is to say, the ISB could be shown variously and not invariably to be correlated around 0.30 with criteria thought by someone to be of interest. Those correlations might be useful for some research purposes, but they were not grounds for much confidence in clinical settings. They may, however, in the minds of many clinicians have inspired more confidence and, therefore, more use than was warranted.

4.01.9.3 Objective Testing

The term “objective test” usually refers to a self-report measure that presents a stimulus item to a respondent and that requires a constrained response such as “True/False,” “Agree/Disagree,” and so forth. There are many, many objective tests, but the dominant one is, and virtually always has been, the MMPI (Hathaway & McKinley, 1943). We have already discussed various aspects of the MMPI under other topics, but it is worth noting here that the durability of the MMPI has been impressive. Its clinical utility has not. It yields profiles that seem impressive, and it certainly can, in general, serve as a screening instrument for psychopathology: people who get really high scores on one or more of the MMPI scales probably have something awry in their lives. No relationships have ever been consistently demonstrated between the MMPI and functional capacities or incapacities that would justify clinical decisions other than to seek further information about the client or patient.

The MMPI more than other available instruments has been automated, to the extent of producing computer-based interpretations of test profiles. An unfortunate limitation of computer-based interpretations is that, because of their proprietary nature, the algorithms underlying them are not available. Consequently, one cannot know which interpretations are based on empirical evidence and which, perhaps, on clinical lore, let alone how good the evidence might be. Such interpretations must be accepted on faith. When the MMPI is used in a fully automatic mode, it is questionable whether it even should be considered a clinical assessment.

4.01.9.4 The Clinician as a Clinical Instrument

Clinical psychology has never been completely clear about whether it wishes to distinguish between the test (a tool) and the test-in-the-hands-of-a-user. The perspective of standardized testing implies that the test is a tool that, in the hands of any properly trained user, should produce the same results for any given examinee. Many clinical instruments, however, cannot be considered to be so tightly standardized, and it is to be expected that results might differ, perhaps even substantially, from one examiner to another, even for the same examinee. Within reason, at least, an examinee's performance on a vocabulary test or a trail-making test should be little affected by the characteristics of the examiner, nor should the scoring and interpretation of the performance. By contrast, an examinee's responses might be affected to a considerable degree by the characteristics of an examiner administering a Rorschach or a TAT, let alone the interpretation of those responses.

The field of clinical psychology abounds in tales of diagnostic acumen of marvelous proportions manifested by legendary clinicians able to use the Rorschach, an MMPI profile, or some other instrument as a stimulus. Unfortunately, no such tales have advanced beyond the bounds of anecdote, and none of these legendary clinicians appears to have been able to pass along his or her acumen to a group of students, let alone passing it along across several generations. Consequently, if clinicians are to be part of the clinical assessment equation, then it seems inevitable that individual clinicians will have to be validated individually, that is, individual clinicians will
have to be shown to be reliable and valid instruments. That will not further progress in the field.

4.01.9.5 Structured Interviews

A fairly recent development in clinical assessment is the structured interview schedule. These schedules are intended to produce a diagnostic judgment related to the DSM (American Psychiatric Association, 1994), a narrow, bounded purpose. There are several such interview schedules currently available, but we will discuss the Structured Clinical Interview for DSM-IV (SCID) as an example and because it is probably the one most widely used.

As noted earlier, most psychological assessment appears to be done for purposes of enlightenment rather than for decision-making. Nevertheless, diagnoses are often required for reimbursement, medication referrals, custody evaluations, and forensic assessments. The SCID (Spitzer, Gibbon, & Williams, 1997) appears to be used quite infrequently in other than research settings; for example, it is not mentioned on any list of instruments used by clinicians. That neglect is interesting in view of the attention that was paid to the development of the SCID and its established dependability.

Use of the SCID in clinical practice would probably contribute to improved assessment (and presumably to more appropriate treatment), whether for specific DSM diagnostic purposes or simply for gathering pertinent information. The SCID was designed to capitalize on clinical skills and to be more “clinician-friendly” than other structured interviews (Spitzer, Williams, Gibbon, & First, 1992). The SCID is meant to be used by precisely those people who can already conduct an interview, and although the SCID is somewhat time-consuming (probably less so than, say, the Rorschach), psychologists interview all patients, and for most clinicians to do so in a structured manner would not be a significant departure. That is, the time would be spent interviewing the patient and the SCID would not add much if anything in terms of time or cost to standard practice. The SCID demonstrates good reliability (test–retest and inter-rater) for most disorders, with kappa coefficients averaging 0.60–0.80 or greater (Segal, Hersen, & Van Hasselt, 1994; Williams, Gibbon, First, & Spitzer, 1992). Agreement between diagnoses obtained by SCID and by traditional clinical interviews is poor to moderate with average kappa coefficients of 0.25 (Steiner, Tebes, Sledge, & Walker, 1995), suggesting strongly that reliance on unstructured clinical interviews is unwise.

Perhaps the SCID is not used because it takes some training and practice to become proficient in its use. That requirement is certainly different from the typical assessment instruments advertised in psychological publications, which boast their quick and easy use and say nothing about their reliability and validity. It may also be that beliefs about the superiority of clinical judgment over other more structured practices, for example, the use of projective tests, contribute strongly as well. Whatever the reasons for lack of clinical use of the SCID, and we suspect that it is both training time and beliefs about clinical skill, it is an unfortunate omission from assessment practice.

4.01.10 CONCLUSIONS

Progress in psychological assessment, at least for clinical applications, has been disappointing over the century since the field started. Conceptual and theoretical developments have been minimal, although we might except some observational methods used primarily in behavioral work and some research settings. The field continues to move away from its scientific roots in psychology, and clinical assessment has no other base on which to build any conceptual structure. Moreover, clinical assessment has never been more than minimally guided by psychometric theory and analysis, for example, scarcely beyond superficial concern with “reliability” of measures, and graduate education and training in research methods and measurement are at an ebb and may still be decreasing. Overall, clinical assessment as an enterprise seems to be cut adrift from any important sources of rigor, and almost anything goes. Perhaps it is fortunate, then, that despite the frequent insistence on assessment as a cornerstone of the practice of clinical psychology, there is much less evidence for its importance and prevalence than would be expected.

4.01.11 REFERENCES

Achenbach, T. M. (1985). Assessment and taxonomy of child and adolescent psychopathology. Beverly Hills, CA: Sage.
Achenbach, T. M., & Edelbrock, C. S. (1983). Manual for the Child Behavior Checklist and Revised Child Behavior Profile. Burlington, VT: Department of Psychiatry, University of Vermont.
Achenbach, T. M., & Edelbrock, C. S. (1986). Manual for the Teachers Report Form and Teacher Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. S. (1987). Manual for the Youth Self-Report Form and Youth Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Aiken, L., West, S. G., Sechrest, L., & Reno, R. (1990). Graduate training in statistics, methodology, and
measurement in psychology: A survey of Ph.D. programs in North America. American Psychologist, 45, 721–734.
American Psychiatric Association (1994). Diagnostic and statistical manual for mental disorders (4th ed.). Washington, DC: Author.
American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
APA Practice Directorate (1996). Practitioner survey results offer comprehensive view of psychological practice. Practitioner Update, 4(2).
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Atkins, M. S., Pelham, W. E., & White, K. J. (1990). Hyperactivity and attention deficit disorders. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and physical disabilities: A casebook. Newbury Park, CA: Sage.
Berg, I. A. (1961). Measuring deviant behavior by means of deviant response sets. New York: Harpers.
Bergen, A. E., & Garfield, S. L. (1994). Handbook of psychotherapy and behavior change. New York: Wiley.
Bernreuter, R. G. (1933). Validity of the personality inventory. Personality Journal, 11, 383–386.
Binet, A., & Simon, T. H. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. Année Psychologique, 11, 191–244.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–380.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 72, 193–204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271–280.
Costello, A., Edelbrock, C. S., Kalas, R., Kessler, M., & Klaric, S. A. (1982). Diagnostic Interview Schedule for Children (DISC). Bethesda, MD: National Institute for Mental Health.
Craik, K. H. (1986). Personality research methods: An historical perspective. Journal of Personality, 54(1), 18–51.
Cronbach, L. J. (1960). Essentials of psychological testing (2nd ed.). New York: Harper and Row.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Epstein, S. (1983). Aggregation and beyond: Some basic issues in the prediction of behavior. Journal of Personality, 51, 360–392.
Esquirol, J. E. D. (1838). Des maladies mentales considérées sous les rapports médical, hygiénique, et médico-légal (2 Vols.). Paris: Baillière.
Exner, J. E. (1969). The Rorschach systems. New York: Grune & Stratton.
Exner, J. E. (1974). The Rorschach systems. New York: Grune & Stratton.
Exner, J. E. (1986). The Rorschach: A comprehensive system. New York: Wiley.
Exner, J. E. (1993). The Rorschach: A comprehensive system. New York: Wiley.
Feynman, R. (1986). Surely you're joking, Mr. Feynman! New York: Bantam Books.
Fiske, D. W. (1971). Measuring the concepts of personality. Chicago: Aldine Press.
Forer, B. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44, 118–123.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389–413.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–338). Hillsdale, NJ: Erlbaum.
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgments. American Psychologist, 23, 483–496.
Goldberg, L. R. (1991). Human mind versus regression equation: Five contrasts. In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology: Vol. 1. Matters of public interest: essays in honor of Paul E. Meehl (pp. 173–184). Minneapolis, MN: University of Minnesota Press.
Goyette, C. H., Conners, C. K., & Ulrich, R. E. (1978). Normative data on the Conner's parent and teacher rating scales. Journal of Abnormal Child Psychology, 6(2), 221–236.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, & Law, 2(2), 293–323.
Hathaway, S. R., & McKinley, M. N. (1943). The Minnesota Multiphasic Personality Inventory (Rev. ed.). Minneapolis, MN: University of Minnesota Press.
Hertz, M. R. (1986). Rorschachbound: A 50-year memoir. Journal of Personality Assessment, 50(3), 396–416.
Hoza, B., Vallano, G., & Pelham, W. E. (1995). Attention-deficit/hyperactivity disorder. In R. T. Ammerman & M. Hersen (Eds.), Handbook of child behavior therapy in psychiatric setting. New York: Wiley.
Hunt, W. C. (1956). The clinical psychologist. Springfield, IL: Thomas.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Jackson, D. N., & Messick, S. (1962). Response styles on the MMPI: Comparison of clinical and normal samples. Journal of Abnormal and Social Psychology, 65, 285–299.
Jensen, A. R. (1985). Description & utility of Armed Services Vocational Aptitude Battery-14. Measurement & Evaluation in Counseling & Development, 18(1), 32–37.
Jensen, A. R. (1992). The importance of intraindividual variation in reaction time. Personality & Individual Differences, 13(8), 869–881.
Kaufman, A. S., & Kaufman, N. L. (1985). Kaufman Test of Educational Achievement (K-TEA). Circle Pines, MN: American Guidance Service.
Kagan, J. (1959). The stability of TAT fantasy and stimulus ambiguity. Journal of Consulting Psychology, 23, 266–271.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press.
Keiser, R. E., & Prather, E. N. (1990). What is the TAT? A review of ten years of research. Journal of Personality Assessment, 55(3–4), 800–803.
Klopfer, B., & Kelley, D. M. (1942). The Rorschach technique. Yonkers-on-Hudson, NY: World Book Company.
Kraemer, H. C. (1992). Evaluating medical tests: Objective and quantitative guidelines. Newbury Park, CA: Sage.
Levy, L. H. (1963). Psychological interpretation. New York: Holt, Rinehart, and Winston.
Lindzey, G. (1952). Thematic Apperception Test: Interpretive assumptions and related empirical evidence. Psychological Bulletin.
Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, No. 7.
Lykken, D. T. (1991). What's wrong with psychology anyway? In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology (pp. 3–39). Minneapolis, MN: University of Minnesota Press.
Maloney, M. P., & Ward, M. P. (1976). Psychological assessment: A conceptual approach. New York: Oxford University Press.
McClure, D. G., & Gordon, M. (1984). Performance of disturbed hyperactive and nonhyperactive children on an objective measure of hyperactivity. Journal of Abnormal Child Psychology, 12(4), 561–571.
McCraken, B. A., & McCallum, S. R. (1993). Wechsler Intelligence Scale for Children (3rd ed.). Brandon, VT: Clinical Psychology Publishing.
McReynolds, P. (1987). Lightner Witmer: Little known founder of clinical psychology. American Psychologist, 42, 849–858.
McReynolds, P. (1996). Lightner Witmer: A centennial tribute. American Psychologist, 51(3), 237–240.
Meehl, P. E. (1954). Clinical versus statistical prediction. Minneapolis, MN: University of Minnesota Press.
Meehl, P. E. (1960). The cognitive activity of the clinician. The American Psychologist, 15, 19–27.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Meier, S. L. (1994). The chronic crisis in psychological measurement and assessment: A historical survey. New York: Academic Press.
Meyer, G. J. (1996). The Rorschach and MMPI: Toward a more scientific differential understanding of cross-method assessment. Journal of Personality Assessment, 67, 558–578.
Meyer, G. J., & Handler, L. (1997). The ability of the Rorschach to predict subsequent outcome: a meta-analysis of the Rorschach Prognostic Rating Scale. Journal of Personality Assessment, 69, 1–38.
Millon, T. (1984). On the renaissance of personality assessment and personality theory. Journal of Personality Assessment, 48(5), 450–466.
democracy, (d) the origin of multiple-choice exams, (e) none of the above? (Mark the RIGHT answer). In M. M. Sokal (Ed.) Psychological testing and American society 1890–1930 (pp. 113–127). New Brunswick, NJ: Rutgers University Press.
Sarason, S. B. (1954). The clinical interaction, with special reference to the Rorschach. New York: Harper.
Sechrest, L. (1963). Incremental validity: A recommendation. Educational and Psychological Measurement, 33(1), 153–158.
Sechrest, L. (1968). Testing, measuring, and assessing people. In W. W. Lambert & E. G. Borgatta (Eds.), Handbook of personality theory and research. Chicago: Rand McNally.
Sechrest, L. (1992). The past future of clinical psychology: A reflection on Woodworth (1937). Journal of Consulting and Clinical Psychology, 60(1), 18–23.
Sechrest, L., McKnight, P. E., & McKnight, K. M. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.
Segal, D. L., Hersen, M., & Van Hasselt, V. B. (1994). Reliability of the structured clinical interview for DSM-III-R: An evaluative review. Comprehensive Psychiatry, 35(4), 316–327.
Sharkey, K. J., & Ritzler, B. A. (1985). Comparing the diagnostic validity of the TAT and a New Picture Projective Test. Journal of Personality Assessment, 49, 406–412.
Shneidman, E. S. (1986). MAPS of the Harvard Yard. Journal of Personality Assessment, 50(3), 436–447.
Somoza, E., Steer, R. A., Beck, A. T., & Clark, D. A. (1994). Differentiating major depression and panic disorders by self-report and clinical rating scales: ROC analysis and information theory. Behaviour Research and Therapy, 32, 771–782.
Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (1997). Structured Clinical Interview for DSM-IV Disorders (SCID-I)-Clinician Version. Washington, DC: American Psychiatric Press.
Spitzer, R. L., Williams, J. B. W., Gibbon, M., & First, M. B. (1992). The Structured Clinical Interview for DSM-III-R (SCID): I. History, rationale, and description. Archives of General Psychiatry, 49(8), 624–629.
Steiner, J. L., Tebes, J. K., Sledge, W. H., & Walker, M. L. (1995). A comparison of the Structured Clinical Interview for DSM-III-R and clinical diagnoses. Journal of Nervous & Mental Disease, 183(6), 365–369.
Millon, T., & Davis, R. D. (1993). The Millon Adolescent Strupp, H. H., Horowitz, L. M., & Lambert, M. J. (1997).
Personality Inventory and the Millon Adolescent Clin- Measuring patient changes in mood, anxiety, and person-
ical Inventory. Journal of Counseling and Development. ality disorders: Toward a core battery. Washington, DC:
Mitchell, J. V., Jr. (Ed.) (1985). The mental measurements American Psychological Association.
yearbook. Lincoln, NE: Buros Institute of Mental Terman, L. M. (1916). The measurement of intelligence.
Measurements, University of Nebraska. Boston: Houghton Mifflin.
Morgan, C. D., & Murray, H. A. (1935). A method for Thorndike, R., & Hagen, E. (1955). Measurement and
investigating fantasies. Archives of Neurological Psychia- evaluation in psychology and education. New York:
try, 35, 289±306. Wiley.
Murray, H. A. (1943). Manual for the Thematic Appercep- Wade, T. C., & Baker, T. B. (1977). Opinions and use of
tion Test. Cambridge, MA: Harvard University Press. psychological tests: A survey of clinical psychologists.
Murstein, B. I. (1963). Theory and research in projective American Psychologist, 32, 874±882.
techniques. New York: Wiley. Wallace, J. (1966). An abilities conception of personality:
Prochaska, J. O., DiClemente, C. C., & Norcross, J. C. Some implications for personality measurement. Amer-
(1992). In search of how people change: Applications to ican Psychologist, 21(2), 132±138.
addictive behaviors. American Psychologist, 47(9), Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item
1102±1114. short-form health survey (SF-36): 1. Conceptual
Rotter, J. B., & Rafferty, J. E. (1950). Manual: The Rotter framework and item selection. Medical Care, 30(6),
Incomplete Sentences Blank. San Antonio, TX: Psycho- 473±483.
logical Corporation. Watkins, C. E. (1991). What have surveys taught us about
Reisman, J. M. (1991). A history of clinical psychology (2nd the teaching and practice of psychological assessment?
ed.). New York: Hemisphere. Journal of Personality Assessment, 56, 426±437.
Samejima, F. (1988). Comprehensive latent trait theory. Watkins, C. E., Campbell, V. L., Nieberding, R., &
Behaviormetrika, 24, 1±24. Hallmark, R. (1995). Contemporary practice of psycho-
Samelson, F. (1987). Was early mental testing (a) racist logical assessment by clinical psychologists. Professional
inspired, (b) objective science, (c) a technology for Psychology: Research and Practice, 26, 54±60.
32 The Role of Assessment in Clinical Psychology

Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, Woodworth, R. S. (1992). The future of clinical psychol-
L. (1966). Unobtrusive measures: Nonreactive research in ogy. Journal of Consulting and Clinical Psychology, 60,
the social sciences. Chicago: Rand McNally. 16±17. (Original work published 1937.)
Webb, E. J., Campbell, D. T., Schwartz, R. D., Sechrest, Wundt, W., Creighton, J. E., & Titchener, E. B. (1894/
L., & Grove, J. B. (1981). Nonreactive measures in the 1896). Lectures on human and animal psychology.
social sciences. Boston: Houghton Mifflin. London: Swan Sonnenschein.
Wiggins, J. S. (1973). Personality and prediction: Principles Yerkes, R. M. (Ed.) (1921). Psychological examining in the
of personality assessment. Reading, MA: Addison- United States army. Memoirs of the National Academy of
Wesley. Sciences, 15.
Williams, J. B. W., Gibbon, M., First, M. B., & Spitzer, R. Yerkes, R. M. (1941). Man power and military effective-
L (1992). The Structured Clinical Interview for DSM-III- ness: The case for human engineering. Journal of
R (SCID): II. Multisite test±retest reliability. Archives of Consulting Psychology, 5, 205±209.
General Psychiatry, 49(8), 630±636. Zuckerman, M. (1996, April 22). Society for a Science of
Witmer, L. (1996). Clinical Psychology. American Psychol- Clinical Psychology Network (SSCPNET; electonic net-
ogist, 51(3), 248±251. (Original work published 1907.) work).
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.02
Fundamentals of Measurement
and Assessment in Psychology
CECIL R. REYNOLDS
Texas A&M University, College Station, TX, USA

4.02.1 INTRODUCTION
4.02.2 NORMS AND SCALES OF MEASUREMENT
4.02.2.1 Scales of Measurement
4.02.2.1.1 Nominal scales
4.02.2.1.2 Ordinal scales
4.02.2.1.3 Interval scales
4.02.2.1.4 Ratio scales
4.02.2.2 Norms and Reference Groups
4.02.3 UNITS OF MEASUREMENT
4.02.4 ACCURACY OF TEST SCORES
4.02.4.1 True Score Theory
4.02.4.2 Generalizability Theory
4.02.5 VALIDITY
4.02.6 THE ASSESSMENT PROCESS
4.02.7 MODELS AND METHODS OF ASSESSMENT
4.02.7.1 Traditional Norm-referenced Assessment
4.02.7.1.1 Intelligence, achievement, and special abilities
4.02.7.2 Norm-referenced, Objective Personality Measures
4.02.7.3 Projective Assessment
4.02.7.4 Behavioral Assessment
4.02.7.5 Neuropsychological Assessment
4.02.8 CLINICAL VS. STATISTICAL PREDICTION
4.02.9 ACCESSING CRITICAL COMMENTARY ON STANDARDIZED PSYCHOLOGICAL TESTS
4.02.10 CONCLUDING REMARKS
4.02.11 REFERENCES

4.02.1 INTRODUCTION

Measurement is a set of rules for assigning numbers to objects or entities. A psychological measuring device (typically a test), then, is a set of rules (the test questions, directions for administration, scoring criteria, etc.) for assigning numbers to an individual that are believed to represent a level of some particular psychological trait, attribute, or behavior of the individual. These characteristics may be observable directly or may be inferred or observed indirectly through changes in behavior or responses to a set or a variable stimulus.

Assessment is a more comprehensive process of deriving meaning from test scores and clinical
information in order to describe the individual both broadly and in depth. Psychological tests are the nonexclusive tools of assessment. A proper assessment must also consider the background and current cultural milieu of the individual and actual observed behavior. This chapter does not attempt to deal with all aspects of the assessment process. An introduction to basic measurement technology and theory will be provided along with material concerning different methods of measurement intended to enhance understanding of other chapters in this work.

There are many problems and controversial issues in psychological and educational assessment and, obviously, all cannot be treated in this work. As one example, assessment and the testing that accompanies it occur within a particular situation or context. The results that are obtained may thus be strongly influenced by situational factors in the case of some individuals but less so or not at all for others. The question of the generalizability of test results obtained under a specified set of conditions takes on major importance in interpreting test scores. Not all variables that influence generalizability are known and few that are have been well researched. Test anxiety is one factor thought to influence strongly the generalizability of results across settings and has been researched extensively, yet the complete articulation of the relationship among test anxiety, test performance, and the validity of test-score interpretations across settings is far from complete. The assessment of children, in particular, poses special problems because of their rapid growth and development as well as their susceptibility to external environmental factors. Many of these factors are treated at length in Anastasi (1981), Cronbach (1983), Kaufman (1994), Reynolds (1985), and Reynolds and Kamphaus (1990a, 1990b), and the interested reader is referred to these sources for further reading on the problems, issues, and limitations of educational and psychological testing, as well as to the other chapters in this volume and to Volume 10.

4.02.2 NORMS AND SCALES OF MEASUREMENT

Many pieces of information are necessary before one can attach the proper meaning to a test score. Among the most basic are knowledge of what scale of measurement has been employed and with what sort of reference group the individual is being compared, if any. Different scales have different properties and convey different levels and types of information just as they do in other arenas; for example, four inches of water conveys a very different meaning than a reference to four gallons of water. The four basic scales of measurement are nominal, ordinal, interval, and ratio scales. As one moves from nominal scales toward ratio scales, increasingly sophisticated levels of measurement are possible.

4.02.2.1 Scales of Measurement

4.02.2.1.1 Nominal scales

A nominal scale is a qualitative system of categorizing people (or objects, traits, or other variables) or individual observations regarding people typically into mutually exclusive classes or sets. Sex is an example of a nominal scale; one is either male or female. Diagnostic categories such as hyperactivity, learning disabled, aphasia, severely emotionally disturbed, or major depressive disorder represent nominal scaling categories that are not mutually exclusive. Nominal scales provide so little quantitative information about members of categories that some writers prefer to exclude nominal scales from the general rubric of measurement. As Hays (1973) points out, the term measurement typically is reserved for a situation where each individual is assigned a relational number. Because the quantitative relationship among nominal categories is unknown, many common statistical tests cannot be employed with nominal scale data. However, since nominal scales do allow for the classification of an event into a discrete category, many writers (e.g., Nunnally, 1978) do include them as one type of measurement.

4.02.2.1.2 Ordinal scales

Ordinal scales provide considerably more quantitative information regarding an observation than nominal scales. Ordinal scales allow one to rank objects or people according to the amount of a particular attribute displayed. Ordering usually takes the form of the “most” to the “least” amount of the attribute in question. If children in a classroom were weighed and then ranked from heaviest to lightest with the heaviest child assigned the rank of 1, the next heaviest a 2, and so on, until all children had been assigned a number, the resulting measurement would be on an ordinal scale. Although an ordinal scale provides certain quantitative information about each individual, it does not tell how far apart each observation is from the next one. Between adjacent pairs of ranks there may be a different degree of difference. The difference in weight between child 1 and child 2
may be 10 pounds, but the difference between child 2 and child 3 may be one pound or even less. Ordinal scales thus designate relative positions among individuals, an advance over nominal scaling, but are still crude with regard both to describing individuals and to the possible statistical treatments that can be meaningfully applied. Means and standard deviations are usually without meaning when applied to ordinal scales, although the median and mode can be determined and used meaningfully. Age and grade equivalents are examples of common ordinal scales.

4.02.2.1.3 Interval scales

Interval scales afford far more information about observations and can be mathematically manipulated with far greater confidence and precision than nominal or ordinal scales. To have an interval scale of measurement, one must have an ordinal scale on which the difference between any two adjacent points on the scale is equal. Most of the measurement scales and tests used in psychology and education assume an interval scale. Intelligence tests are one good example of an interval scale and can also illustrate the distinction between interval and the highest level of measurement, ratio scales. Although nearly all statistical methods can be applied to measurements on an interval scale, the interval scale has no true zero point, where zero designates total absence of an attribute. If one were to earn an IQ of zero on an intelligence test (by failing to answer a single question correctly), this would not indicate the absence of intelligence, for without intelligence no human could remain alive (it is not possible on most tests of intelligence to earn an IQ of zero even if all test questions are answered incorrectly).

4.02.2.1.4 Ratio scales

Ratio scales possess the attributes of ordinal and interval scales but also have a true zero point: a score of zero indicates the complete absence of the attribute under consideration. Length and width are ratio scales. There are few instances of ratio scales in psychology outside of measurement of simple sensory and motor functions. Ratio scales have useful quantitative features; in particular, as indicated by the name, ratios are meaningful: six feet is twice three feet. Ratios are not meaningful with interval scales. A person with an IQ of 100 cannot be said to be twice as intelligent as a person with an IQ of 50. Fortunately, it is not necessary to have ratio scales to attack the vast majority of problems in psychological assessment.

This discussion of scales of measurement has necessarily been limited to the most basic elements and distinctions among scales. The reader who desires to explore this topic from a technical perspective will find an excellent and extensive mathematical presentation of scales of measurement in Hays (1973).

4.02.2.2 Norms and Reference Groups

To understand the individual's performance as represented by a score on a psychological measurement device, it is necessary, except with certain very specific tests, to evaluate the individual's performance relative to the performance of some preselected group. To know simply that an individual answers 60 out of 100 questions correctly on a history test, and 75 out of 100 questions correctly on a biology test, conveys very little information. On which test did this individual earn the better score? Without knowledge of how a comparable or other relevant group of persons would perform on these tests, the question of which score is better cannot be answered.

Raw scores on a test, such as the number or percentage of correct responses, take on meaning only when evaluated against the performance of a normative or reference group of individuals. For convenience, raw scores are typically converted to a standard or scaled score and then compared against a set of norms. The reference group from which the norms are derived is defined prior to the standardization of the test. Once the appropriate reference population has been defined, a random sample is tested, with each individual tested under as nearly identical procedures as possible. Many factors must be considered when developing norms for test interpretation. Ebel (1972), Angoff (1971), and Petersen, Kolen, and Hoover (1989) have provided especially good discussions of the necessary conditions for appropriate development and use of normative reference data. The following points are taken principally from these three sources, with some elaboration by the present author. Some of these conditions place requirements on the test being normed, some on the psychological trait being measured, and others on the test user.

(i) The psychological trait being assessed must allow the ranking of individuals along a continuum from high to low, that is, it must be amenable to at least ordinal scaling. If a nominal scale was employed, only the presence or absence of the trait would be of interest and relative amounts of the trait could not be determined; norms, under this unusual condition, would be superfluous if not distracting or misleading.
(ii) The content of the test must provide an adequate operational definition of the psychological trait under consideration. With a proper operational definition, other tests can be constructed to measure the same trait and should yield comparable scores for individuals taking both tests.

(iii) The test should assess the same psychological construct throughout the entire range of performance.

(iv) The normative reference group should consist of a large random sample representative of the population on whom the test is to be administered later.

(v) The normative sample of examinees from the population should “have been tested under standard conditions, and . . . take the test as seriously, but no more so, than other(s) to be tested later for whom the norms are needed” (Ebel, 1972, p. 488).

(vi) The population sampled to provide normative data must be appropriate to the test and to the purpose for which the test is to be employed. The latter point is often misinterpreted, especially with regard to evaluation of exceptional children. Many adequately normed psychological tests are inappropriately maligned for failure to include significant numbers of handicapped children in their normative sample. The major intelligence scales designed for use with children (i.e., the various Wechsler scales and the McCarthy Scales of Children's Abilities) have been normed on stratified random samples of children representative of children in the United States. Some authors (e.g., Salvia & Ysseldyke, 1981) criticize tests such as the Wechsler scales as inappropriate for measuring the intellectual level of various categories of children with disabilities because large numbers of these children were not included in the test's standardization sample. Whether this is a valid criticism depends on the purpose to which the test is applied. If knowledge of an emotionally disturbed child's level of intellectual functioning relative to age mates in the United States is desired, comparing the child's performance to that of other similarly emotionally disturbed children would be inappropriate; if instead the child's standing relative to other emotionally disturbed children is of interest, then a reference group of emotionally disturbed children would be appropriate. The latter information is not sought frequently nor has it been shown to be more useful in the diagnosis or development of appropriate intervention strategies. Salvia and Ysseldyke (1981) contend that it would be inappropriate to base predictions of future intellectual or academic performance on test scores for an exceptional child that have been derived through comparison with the larger, normal population's performance. To make predictions, they would first require that the reference group from which scores are derived be a group of similar sociocultural background, experience, and handicapping condition. Although this may be an appropriate, if not noble, hypothesis for research, implementation must await empirical verification, especially since it runs counter to traditional psychological practice. Indeed, all interpretations of test scores should be guided principally by empirical evidence. Once norms have been established for a specific reference group, the generalizability of the norms becomes a matter of actuarial research; just as norms based on one group may be inappropriate for use with another group, the norms may also be appropriate and a priori acceptance of either hypothesis would be incorrect (Reynolds & Brown, 1984). A large, cumulative body of evidence demonstrates clearly that test scores predict most accurately (and equally well for a variety of subgroups) when based on a large, representative random sample of the population, rather than on highly specific subgroups within a population (e.g., Hunter, Schmidt, & Rauschenberger, 1984; Jensen, 1980; Reynolds, 1982, 1995, in press-a, in press-b).

(vii) Normative data should be provided for as many different groups as it may be useful for an individual to be compared against. Although this may at first glance seem contradictory to the foregoing conclusions, there are instances when it is useful to know how a patient compares to members of other specific subgroups. The more good reference groups available for evaluating a patient's performance on a test, the potentially more useful the test may become.

The normative or reference group most often used to derive scores is the standardization sample, a sample of the target population drawn using a set plan. The best tests, and most publishers and developers of tests, aspire to a standardization sample that is drawn using population proportionate stratified random sampling. This means that samples of people are selected based on subgroups of a larger group to ensure that the population as a whole is represented. In the USA, for example, tests are typically standardized via a sampling plan that stratifies the sample by gender, age, ethnicity, socioeconomic background, region of residence, and community size based on population statistics provided by the US Bureau of the Census. If the Census Bureau data were to indicate, for example, that 1% of the US population consisted of African-American males in the middle range of socioeconomic status residing in urban centers of the south region, then 1% of the standardization sample of the test would be drawn to meet this same set of characteristics.

Once the normative reference group has been obtained and tested, tables of standardized or scaled scores are developed. These tables are based on the responses of the standardization sample and are frequently referred to as norms tables. There are many types of scaled scores or other units of measurement that may be reported in the “norms tables” and just which unit of measurement has been chosen may greatly influence score interpretation.

4.02.3 UNITS OF MEASUREMENT

Raw scores such as number correct are tedious to work with and to interpret properly. Raw scores are thus typically transformed to another unit of measurement. Scaled scores are preferred, but other units such as age and grade equivalents are common. Making raw scores into scaled scores involves creating a set of scores with a predetermined mean and standard deviation that remain constant across some preselected variable such as age.

The mean is simply the sum of the scores obtained by individuals in the standardization sample divided by the number of people in the sample ($\sum X_i / N$). In a normal distribution of scores (to be described in the next paragraph), the mean breaks performance on the test into two equal parts, with half of those taking the test scoring above the mean and half scoring below the mean, though the median is formally defined as the score point which breaks a distribution into two equal parts; in a normal distribution, the mean and median are the same score.

The standard deviation (SD) is an extremely useful statistic in describing and interpreting a test score. The SD is a measure of the dispersion of scores about the mean. If a test has a mean of 100 and an individual earns a score of 110 on the test, we still have very little information except that the individual performed above average. Once the SD is known, one can determine how far from the mean the score of 110 falls. A score of 110 takes on far different meaning depending on whether the SD of the scores is 5, 15, or 30. The SD is relatively easy to calculate once the mean is known; it is determined by first subtracting each score from the mean, squaring the result, and summing across individuals. This sum of squared deviations from the mean is then divided by the number of persons in the standardization sample. The result is the variance of the test scores; the square root of the variance is the SD.

Once the mean and SD of test scores are known, an individual's standing relative to others on the attribute in question can be determined. The normal distribution or normal curve is most helpful in making these interpretations. Figure 1 shows the normal curve and its relationship to various standard score systems. A person whose score falls 1 SD above the mean performs at a level exceeding about 84% of the population of test-takers. A score 2 SDs above the mean exceeds about 98% of the group. The relationship is mirrored below the mean. A score of 1 SD below the mean indicates that the individual exceeds only about 16% of the population on the attribute in question. Approximately two-thirds (68%) of the population will score within 1 SD of the mean on any psychological test whose scores are normally distributed.

Standard scores such as those shown in Figure 1 (z scores, T scores, etc.) are developed for ease of interpretation. Though standard scores are typically linear transformations of raw scores to a desired scale with a predetermined mean and SD, normalized scaled scores can also be developed. In a linear transformation of test scores to a predetermined mean and SD, equation (1) must be applied to each score:

\[ \text{scaled score} = \bar{X}_{ss} + SD_{ss}\,\frac{X_i - \bar{X}}{SD_x} \tag{1} \]

where $X_i$ = raw score of any individual $i$, $\bar{X}$ = mean of the raw scores, $SD_x$ = standard deviation of the raw scores, $SD_{ss}$ = standard deviation scaled scores are to have, and $\bar{X}_{ss}$ = mean scaled scores are to have.

Virtually all tests designed for use with children along with most adult tests standardize scores and then normalize them within age groups so that a scaled score at one age has the same meaning and percentile rank at all other ages. Thus a person age 10 who earns a scaled score of 105 on the test has the same percentile rank within his or her age group as a 12-year-old with the same score has in his or her age group. That is, the score of 105 will fall at the same point on the normal curve in each case.

Not all scores have this property. Grade and age equivalents are very popular types of scores that are much abused because they are assumed to have scaled score properties when in fact they represent only an ordinal scale. Grade equivalents ignore the dispersion of scores about the mean although the dispersion changes from age to age and grade to grade. Under no circumstances do such equivalent scores qualify as standard scores. Consider the calculation of a grade equivalent. When a test is administered to a group of children, the mean raw score is calculated at each grade level and this mean raw score then is called the “grade equivalent” score for a raw score of that magnitude. If the mean raw score for beginning fourth graders (grade 4.0) on a reading test is 37, then any person
Percentage of cases under the normal curve, by SD band:
below -3 SD: 0.13%; -3 to -2 SD: 2.14%; -2 to -1 SD: 13.59%; -1 SD to mean: 34.13%; mean to +1 SD: 34.13%; +1 to +2 SD: 13.59%; +2 to +3 SD: 2.14%; above +3 SD: 0.13%

z score  T score  Wechsler IQ   Wechsler  Binet IQ  Binet  SAT/GRE  Percentile
                  (and others)  scale               scale  scores   ranks
-3.33    17       50            --        47        23     --       0.04
-3       20       55            1         52        26     200      0.13
-2.67    23       60            2         57        29     233      0.38
-2.33    27       65            3         63        31     267      1
-2       30       70            4         68        34     300      2
-1.67    33       75            5         73        37     333      5
-1.33    37       80            6         79        39     367      9
-1       40       85            7         84        42     400      16
-0.67    43       90            8         89        45     433      25
-0.33    47       95            9         95        47     467      37
0        50       100           10        100       50     500      50
0.33     53       105           11        105       53     533      63
0.67     57       110           12        111       55     567      75
1        60       115           13        116       58     600      84
1.33     63       120           14        121       61     633      91
1.67     67       125           15        127       63     667      95
2        70       130           16        132       66     700      98
2.33     73       135           17        137       69     733      99
2.67     77       140           18        143       71     767      99.62
3        80       145           19        148       74     800      99.87
3.33     83       150           --        153       77     --       99.96

(-- : no value given in the original figure)

Stanines:             1    2    3    4    5    6    7    8    9
Percentage of cases:  4%   7%   12%  17%  20%  17%  12%  7%   4%

Figure 1 Relationships among the normal curve, relative standing expressed in percentiles, and various systems of derived scores.
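
As an aside to Figure 1, the calculations described in this section (the mean, the SD computed by dividing by N, the linear transformation of equation (1), and percentile ranks read from the normal curve) can be sketched in Python. This is a minimal illustration under stated assumptions: the function names and the small set of raw scores are hypothetical, and the percentile function assumes normally distributed scores.

```python
import math


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def standard_deviation(scores):
    """Square root of the mean squared deviation (dividing by N, as in the text)."""
    m = mean(scores)
    variance = sum((m - x) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)


def scaled_score(raw, raw_mean, raw_sd, new_mean, new_sd):
    """Equation (1): linear transformation to a chosen mean and SD."""
    return new_mean + new_sd * (raw - raw_mean) / raw_sd


def percentile_rank(z):
    """Percentage of a normal distribution falling below a given z score."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical raw scores from a tiny "standardization sample".
raw_scores = [30, 35, 40, 45, 50]
m, sd = mean(raw_scores), standard_deviation(raw_scores)

# Re-express a raw score of 45 on two of the metrics shown in Figure 1.
t_score = scaled_score(45, m, sd, new_mean=50, new_sd=10)
deviation_iq = scaled_score(45, m, sd, new_mean=100, new_sd=15)
print(round(t_score, 1), round(deviation_iq, 1))

# The same score of 110 (mean 100) under three different SDs.
for test_sd in (5, 15, 30):
    z = (110 - 100) / test_sd
    print(test_sd, round(z, 2), round(percentile_rank(z), 1))
```

With SDs of 5, 15, and 30, a score of 110 lies 2.0, 0.67, and 0.33 SDs above the mean, which corresponds to roughly the 98th, 75th, and 63rd percentiles, in line with the percentile ranks listed in Figure 1.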

earning a score of 37 on the test is assigned a grade equivalent score of 4.0 regardless of the person's age. If the mean raw score of fifth graders (grade 5.0) is 38, then a score of 38 would receive a grade equivalent of 5.0. A raw score of 37 could represent a grade equivalent of 4.0, 38 could be 5.0, 39 could be 5.1, 40 be 5.3, and 41, 6.0. Thus, differences of one raw score point can cause dramatic differences in the grade equivalents received, and the differences will be inconsistent across grades with regard to the magnitude of the difference in grade equivalents produced by constant changes in raw scores.

Table 1 illustrates the problems of using grade equivalents to evaluate a patient's academic standing relative to his or her peers. Frequently in both research and clinical practice, children of normal intellectual capacity are diagnosed as learning disabled through the use of grade equivalents such as “two years below grade level for age” on a test of academic attainment. The use of this criterion for diagnosing learning disabilities or other academic disorders is clearly inappropriate (Reynolds, 1981a, 1985). As seen in Table 1, a child with a grade equivalent score in reading two years below the appropriate grade placement for age may or may not have a reading problem. At some ages this is within the average range, whereas at others a severe reading problem may be indicated.

Grade equivalents tend to become standards of performance as well, which they clearly are not. Contrary to popular belief, grade equivalent scores on a test do not indicate what level of reading text a child should be using. Grade equivalent scores on tests simply do not have a one-to-one correspondence with reading series placement or the various formulas for determining readability levels.

Grade equivalents are also inappropriate for use in any sort of discrepancy analysis of an individual's test performance, diagnosis of a learning disability or developmental disorder, or for use in many statistical procedures for the following reasons (Reynolds, 1981a).

(i) The growth curve between age and achievement in basic academic subjects flattens at upper grade levels. This can be seen in Table 1 where there is very little change in standard score values corresponding to two years below grade level for age after about grade 7 or 8. In fact, grade equivalents have almost no meaning

but what happens to the 5 feet, 10 inch tall 14-year-old female since at no age does the mean height of females equal 5 feet, 10 inches? Since the average reading level in the population changes very little after junior high school, grade equivalents at these ages become virtually nonsensical with large fluctuations resulting from a raw score difference of two or three points on a 100-item test.

(ii) Grade equivalents assume that the rate of learning is constant throughout the school year and that there is no gain or loss during summer vacation.

(iii) Grade equivalents involve an excess of extrapolation, especially at the upper and lower ends of the scale. However, since tests are not administered during every month of the school year, scores between the testing intervals (often a full year) must be interpolated on the assumption of constant growth rates. Interpolation between sometimes extrapolated values on an assumption of constant growth rates is a somewhat ludicrous activity.

(iv) Different academic subjects are acquired at different rates and the variation in performance varies across content areas so that “two years below grade level for age” may be a much more serious deficiency in math than in reading comprehension.

(v) Grade equivalents exaggerate small differences in performance between individuals and for a single individual across tests. Some test authors even provide a caution on record forms that standard scores only, and not grade equivalents, should be used for comparisons.

Age equivalents have many of the same problems. The standard deviation of age equivalents varies substantially across tests, subtests, abilities, or skills assessed, and age equivalents exist on an ordinal, not interval, scale. It is inappropriate to add, subtract, multiply, or divide age or grade equivalents or any other form of ordinal score. Nevertheless, the use of such equivalent scores in ipsative analysis of test performance remains a common mistake in clinical, educational, and neuropsychological assessment.

The principal advantage of standardized or scaled scores lies in the comparability of score interpretation across age. By standard scores, of course, I refer to scores scaled to a constant mean and SD such as the Wechsler Deviation IQ and not to ratio IQ types of scales employed by the early Binet and original Slosson
at this level since reading instruction typically Intelligence Test, which give the false appear-
stops by high school. Consider the following ance of being scaled scores. Ratio IQs or other
analogy with height as an age equivalent. types of quotients have many of the same
Height can be expressed in age equivalents just problems as grade equivalents and should be
as reading can be expressed as grade equiva- avoided for many of these same reasons.
lents. It might be helpful to describe a tall first Standard scores of the deviation IQ type have
grader as having the height of an 8‰ year old, the same percentile rank across age since they
40 Fundamentals of Measurement and Assessment in Psychology

Table 1  Standard scores and percentile ranks corresponding to performance “two years below
grade level for age” on three reading tests.

                              Wide Range          Woodcock Reading    Stanford Diagnostic
                              Achievement Test    Mastery Test(a)     Reading Test(a)
Grade        Two years
placement    below placement  SS(b)   %R(c)       SS      %R          SS      %R
2.5          K.5              67      1
3.5          1.5              69      2           64      1           64      1
4.5          2.5              73      4           77      6           64      1
5.5          3.5              84      14          85      16          77      6
6.5          4.5              88      21          91      27          91      27
7.5          5.5              86      18          94      34          92      30
8.5          6.5              87      19          94      34          93      32
9.5          7.5              90      25          96      39          95      34
10.5         8.5              85      16          95      37          95      37
11.5         9.5              85      16          95      37          92      30

(a) Total test. (b) All standard scores in this table have been converted for ease of
comparison to a common scale having a mean of 100 and an SD of 15. (c) Percentile rank.
Source: Adapted from Reynolds (1981a).

are based not only on the mean but also on the variability in scores about the mean at each
age level. For example, a score that falls two-thirds of a SD below the mean has a percentile
rank of 25 at every age. A score falling two-thirds of a grade level below the average grade
level, or an age equivalent six months below chronological age, has a different percentile
rank at every age.

Standard scores are more accurate and precise. When constructing tables for the conversion of
raw scores into standard scores, interpolation of scores to arrive at an exact score point is
typically not necessary, whereas the opposite is true of age and grade equivalents.
Extrapolation is also typically not necessary for scores within 3 SDs of the mean, which
accounts for more than 99% of all scores encountered.

Scaled scores can be set to any desired mean and standard deviation, with the fancy of the
test author frequently the sole determining factor. Fortunately, a few scales can account for
the vast majority of standardized tests in psychology and education. Table 2 illustrates the
relationship between various scaled score systems. If reference groups are comparable, Table 2
can also be used to equate scores across tests to aid in the comparison of a patient's
performance on tests of different attributes, provided that normalized scores are available.

What has been said thus far about scaled scores and their equivalency applies primarily to
scores that have been forced to take the shape of the Gaussian or bell curve. When test-score
distributions derived from a standardization sample are examined, the scores frequently
deviate significantly from normal. Often, test developers will then transform scores, using
one of a variety of statistical methods (e.g., see Lord & Novick, 1968, for a mathematical
review and explication), to take a normal distribution. Despite what is often taught in early
courses in psychological statistics and measurement, this is not always appropriate. It is
commonplace to read that psychological variables, like most others, are normally distributed
within the population; many are. Variables such as intelligence, memory skill, and academic
achievement will closely approximate the normal distribution when well measured. However, many
psychological variables, especially behavioral ones such as aggression, attention, and
hyperactivity, deviate substantially from the normal curve within the population of humans.

When a score distribution deviates from normality, the test developer is faced with the
decision of whether to create normalized scores via some transformation or to allow the
distribution to retain its shape, with perhaps some smoothing to remove irregularities due to
sampling error. In the latter case, a linear transformation of scores is most likely to be
chosen. To make this determination, the test developer must ascertain whether the underlying
construct measured by the test is normally distributed or not and whether the extant sample is
adequate to estimate the distribution, whatever its shape. For applied, clinical devices, the
purpose of score transformations that result in normalization of the distribution is to
correct for sampling error, and it presumes that the underlying construct is, in fact,
normally or near normally distributed. Normalization of the score distribution then produces a
more

Table 2  Conversion of standard scores based on several scales to a commonly expressed
metric.(a)

                                        Scales
X̄ = 0    X̄ = 10   X̄ = 36   X̄ = 50   X̄ = 50   X̄ = 100  X̄ = 100  X̄ = 100  X̄ = 500  Percentile
SD = 1   SD = 3   SD = 6   SD = 10  SD = 15  SD = 15  SD = 16  SD = 20  SD = 100 rank

 2.6     18       52       76       89       139      142      152      760      >99
 2.4     17       51       74       86       136      138      148      740      99
 2.2     17       49       72       83       133      135      144      720      99
 2.0     16       48       70       80       130      132      140      700      98
 1.8     15       47       68       77       127      129      136      680      96
 1.6     15       46       66       74       124      126      132      660      95
 1.4     14       44       64       71       121      122      128      640      92
 1.2     14       43       62       68       118      119      124      620      88
 1.0     13       42       60       65       115      116      120      600      84
 0.8     12       41       58       62       112      113      116      580      79
 0.6     12       40       56       59       109      110      112      560      73
 0.4     11       38       54       56       106      106      108      540      66
 0.2     11       37       52       53       103      103      104      520      58
 0.0     10       36       50       50       100      100      100      500      50
−0.2     9        35       48       47       97       97       96       480      42
−0.4     9        34       46       44       94       94       92       460      34
−0.6     8        33       44       41       91       90       88       440      27
−0.8     8        31       42       38       88       87       84       420      21
−1.0     7        30       40       35       85       84       80       400      16
−1.2     6        29       38       32       82       81       76       380      12
−1.4     6        28       36       29       79       78       72       360      8
−1.6     5        26       34       26       76       74       68       340      5
−1.8     5        25       32       23       73       71       64       320      4
−2.0     4        24       30       20       70       68       60       300      2
−2.2     3        23       28       17       67       65       56       280      1
−2.4     3        21       26       14       64       62       52       260      1
−2.6     2        20       24       13       61       58       48       240      1

(a) X̄ = mean; SD = standard deviation.
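The conversions tabulated in Table 2 amount to expressing a score as a distance from the mean in SD units and re-expressing that distance on another scale. The following is a minimal sketch of that arithmetic, not a procedure taken from any test manual; the function names are illustrative, and the percentile step assumes the scores have been normalized (for linearly transformed scores from a non-normal distribution it would not hold).

```python
from statistics import NormalDist


def to_z(score, mean, sd):
    """Express a scaled score as a distance from the mean in SD units."""
    return (score - mean) / sd


def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Re-express a score from one scaled-score system on another."""
    return to_mean + to_sd * to_z(score, from_mean, from_sd)


def percentile_rank(score, mean, sd):
    """Percentile rank under a normal curve (valid only for normalized scores)."""
    return 100.0 * NormalDist().cdf(to_z(score, mean, sd))


if __name__ == "__main__":
    # A subtest scaled score of 13 on a mean-10, SD-3 scale is 1 SD above the mean.
    print(rescale(13, 10, 3, 100, 15))        # same standing on a mean-100, SD-15 scale
    print(rescale(13, 10, 3, 50, 10))         # same standing on a mean-50, SD-10 scale
    print(round(percentile_rank(13, 10, 3)))  # percentile rank, assuming normality
```

As the text notes, equating scores across tests in this way is only defensible when the reference groups are comparable.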

accurate rendition of the population distribution and improves the utility of the standardized
scaled scores provided. If the population distribution of the construct in question is not
normal, for example, aggressive behavior (see Reynolds & Kamphaus, 1992), then a different
form of transformation, typically linear, is required to be accurate. This decision affects
how clinicians best interpret the resulting scaled scores.

If score distributions have been normalized for a battery of tests or subtests of a common
test, for example, the Wechsler scales, the same scaled score on any part-test will have the
same percentile rank. On the Wechsler Intelligence Scale for Children-III (WISC-III; Wechsler,
1992), for example, a subtest scaled score of 13 is 1 SD above the mean and, for all 13
subtests of the WISC-III, will have a percentile rank of approximately 84. If the scores had
not been transformed through the nonlinear methods necessary to approximate a normal
distribution, this would not be true. For a linear transformation, a scaled score of 13 could
still be 1 SD above the mean on all of the subtests, but the percentile rank could vary, and
would vary more the further the underlying distribution deviates from the normal curve. It is
thus important for clinicians to review test manuals and ascertain the methods of scaling that
have been applied to the raw score distributions. This becomes increasingly important as
scores are to be compared across different tests or batteries of tests. This effect is
magnified as the distance from the mean increases.

4.02.4 ACCURACY OF TEST SCORES
4.02.4.1 True Score Theory

When evaluating test scores, it is also necessary to know just how accurately the score
reflects the individual's true score on the test. Tests typically do not ask every possible
question that could be asked or evaluate every possible relevant behavior. Rather, a domain of
possible questions or test items is defined and a

sample taken to form the test. Whenever less than the total number of possible behaviors
within a domain is sampled, sampling error occurs. Psychological and educational tests are
thus destined to be less than perfect in their accuracy. Certainly, psychological tests
contain errors produced from a variety of other sources, most of which are situational. Error
resulting from domain sampling is the largest contributor to the degree of error in a test
score, however (Feldt & Brennan, 1989; Nunnally, 1978), and is the type of error about which
measurement theory has the greatest concern. Fortunately, this type of error is also the
easiest and most accurately estimated.

Error caused by domain sampling is determined from an analysis of the degree of homogeneity of
the items in the test, that is, how well the various items correlate with one another and with
an individual's true standing on the trait being assessed. The relative accuracy of a test is
represented by a reliability coefficient symbolized as rxx. Since it is based on the
homogeneity or consistency of the individual items of a test and no outside criteria or
information are necessary for its calculation, rxx is frequently referred to as internal
consistency reliability or as an estimate of item homogeneity. Error caused by domain sampling
is also sometimes estimated by determining the correlation between two parallel forms of a
test (forms of a test that are designed to measure the same variable with items sampled from
the same item domain and believed to be equivalent). The correlation between the two
equivalent or alternate forms is then taken as the reliability estimate and is usually
symbolized as rxx, rab, or rxy (although rxy is generally used to represent a validity
coefficient).

Split-half reliability estimates can also be determined on any specific test as a measure of
internal consistency. Split-half reliability is typically determined by correlating each
person's score on one half of the items with his or her score on the other half of the test,
with a correction for the original length of the test, since length will affect reliability.
Predetermined or planned split-half comparisons, such as correlating scores on odd numbered
items with scores on the even numbered items, may take advantage of chance or other factors,
resulting in spuriously high estimates of reliability. A reliability coefficient called alpha
is a better method for estimating reliability since it is the mean of all possible split-half
comparisons, thus expunging any sampling error resulting from the method of dividing the test
for the purposes of calculating a correlation between each half.

As noted earlier, a number of techniques exist for estimating reliability. Throughout this
chapter, reliability has been referred to as estimated. This is because the absolute or “true”
reliability of a psychological test can never be determined. Alpha and all other methods of
determining reliability are, however, considered to be lower bound estimates of the true
reliability of the test. One can be certain that the reliability of a test is at least as high
as the calculated estimate and possibly even higher.

Once the reliability of a test has been estimated, it is possible to calculate a sometimes
more useful statistic known as the standard error of measurement. Since there is always some
error involved in the score a person obtains on a psychological test, the obtained score (Xi)
does not truly represent the individual's standing with regard to the trait in question.
Obtained scores estimate an individual's true score on the test (the score that would be
obtained if there were no error involved in the measurement). Since this is not possible, the
true score (X∞) is defined as the mean score of an individual if administered an infinite
number of equivalent forms of a test and there were no practice effects or other intervening
factors. The standard error of measurement (Sem) is the SD of the individual's distribution of
scores about his or her true score. To determine the Sem it is necessary to know only the SD
and the reliability (preferably an internal consistency estimate) of the test in question. The
calculated values of X∞ and Sem are only estimates, however, since the conditions for
determining a true score never actually exist.

Since the distribution of obtained scores about the true score is considered to be normal, one
can establish a degree of confidence in test results by banding the estimated true score by a
specified number of Sems. A table of values associated with the normal curve (pictured in
Figure 1) quickly tells us how many Sems are necessary for a given level of confidence. In a
normal distribution, about 68% of all scores fall within 1 SD of the mean, and about 95% of
all scores fall within 2 SDs of the mean. Therefore, if one wanted to be 68% certain that a
range of scores contained a person's true score, X∞ would be banded by ±1 Sem. To be 95%
certain that a range of scores contained the true score, a range of X∞ ± 2 Sems would be
necessary.

When evaluating a test or performance on a test, it is important to ascertain just what type
of reliability estimate is being reported. Sems should typically be calculated from an
internal consistency estimate. Comparisons of reliability estimates across tests should be
based on the same type of estimate. For example, one should not compare the reliability of two
tests based on alternate form correlations for one test and estimation of the alpha
coefficient for the other.
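The reliability and standard error calculations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure of any particular test manual: the text names the ingredients of the Sem (the SD and the reliability) without giving a formula, so the sketch assumes the usual classical expression Sem = SD × (1 − rxx)^(1/2); alpha is computed with the common variance-based formula; and the length correction applied to a split-half correlation is assumed to be the Spearman-Brown formula. All function names are illustrative.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha from a list of examinee rows, each a list of item scores.

    Assumes the usual variance form: k/(k-1) * (1 - sum(item variances) / total variance).
    """
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def spearman_brown(half_test_r):
    """Correct a half-test correlation to full test length (assumed correction)."""
    return 2 * half_test_r / (1 + half_test_r)


def standard_error_of_measurement(sd, rxx):
    """Sem from the test SD and a reliability estimate (assumed classical formula)."""
    return sd * sqrt(1 - rxx)


def score_band(center, sem, n_sems):
    """Band a score estimate by a chosen number of Sems (1 for ~68%, 2 for ~95%)."""
    return (center - n_sems * sem, center + n_sems * sem)


if __name__ == "__main__":
    # Hypothetical test with SD = 15 and an internal consistency estimate of .91.
    sem = standard_error_of_measurement(15, 0.91)
    print(sem)
    print(score_band(100, sem, 1))  # roughly 68% band
    print(score_band(100, sem, 2))  # roughly 95% band
```

The band is centered on whatever estimate of the true score the examiner supplies; the value of 100 above is a placeholder, not a recommendation about how that estimate should be obtained.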

Test–retest correlations, also frequently referred to as reliability coefficients, should not
be confused with measures of the accuracy or precision of a test at a given point in time.
Test–retest “reliability” is one of the most often confused concepts of psychometric theory.
Even Anastasi (1976), in introducing reliability, refers to reliability as a measure of the
degree to which a person would obtain the same score if tested again at a later time. In the
earlier stages of development of psychology, when traits were considered unchanging,
test–retest reliability was properly considered to be a characteristic of the test and indeed
was believed to be an indication of the degree to which a person would obtain the same score
if tested again. Test–retest reliability speaks principally of the stability of the trait
being measured and has little to do with the accuracy or precision of measurement unless the
psychological construct in question is considered to be totally unchangeable. Given that
traits such as anxiety and even intelligence do in fact change over time and that testing from
one time to the next is positively correlated, it is still possible to use the test–retest
correlation to determine estimates of what score a person would obtain upon retesting.
Internal consistency estimates, however, should not be interpreted in such a manner. When
psychological constructs are not highly labile and are believed to change only over long
periods of time, test–retest correlations may be considered to reflect the accuracy of a test
if the two testings occur at close points in time during which the trait under consideration
is believed to be stable.

4.02.4.2 Generalizability Theory

Generalizability theory is an extension of true score theory (also known as classical test
theory) that is achieved principally through use of analysis of variance (ANOVA) procedures.
Often, more than one type of error is acting on a reliability coefficient. For example, in
true score theory, errors due to domain sampling (e.g., not asking about every possible
symptom of depression), errors due to faulty administration, scoring errors by the examiner,
and errors associated with time sampling may all act to lower the average interitem
correlation, which will reduce the internal consistency reliability of the test score. Under
true score theory, it is impossible to partition the relative contributions, that is, to
determine how much error is contributed by each subset of error to the total amount of
unreliability. Even test–retest or stability coefficients are confounded by internal
consistency errors. The maximum test–retest correlation (r12) is equal to the square root of
rxx, or max r12 = (rxx)^(1/2).

Generalizability theory takes advantage of the capabilities of ANOVA in partitioning variance
components to develop a model of unreliability (as opposed to concentrating on statistical
significance). Through ANOVA, generalizability theory is able to partition the error variance
of a set of scores into the components listed above, such as domain sampling error and the
like, along with some additional components not considered in true score theory.
Generalizability theory is no more difficult mathematically than true score theory.
Generalizability theory is surprisingly absent from the measurement repertoire of most
clinicians but is becoming increasingly popular among measurement scientists. However, the
understanding and application of generalizability theory does require an understanding of
methods and designs for partitioning variance components in ANOVA, a skill that is perhaps on
the decline in clinical training programs in favor of statistical methods more aligned with
structural equation modeling.

The basic foundations of generalizability theory can be found in Cronbach, Rajaratnam, and
Gleser (1963). A current, detailed explanation appears in Feldt and Brennan (1989), along with
the mathematical models necessary to apply generalizability theory to the concept of error in
test scores of groups and individuals.

4.02.5 VALIDITY

Reliability refers to the precision or accuracy of test scores. Validity refers to the
appropriateness of the interpretations of test scores and not to the test or the score itself.
“Validity is an integrated evaluative judgment of the degree to which empirical evidence and
theoretical rationales support the adequacy and the appropriateness of inferences and actions
based on test scores or other modes of assessment” (Messick, 1989, p. 13). Like reliability,
validity is a matter of degree and not an all or none concept. Reliability will, however,
enter into evaluation of the validity of an inference drawn from a test score. Reliability is
a necessary but insufficient condition for validity. As reliability approaches zero, the
amount of random error in a test score increases. The greater the relative proportion of
random error present, the less confidence one can have in any interpretation of a score since,
by definition, random error is unrelated to anything meaningful. Validation is not static but
is an ongoing process, not just of corroborating a particular meaning, but of developing
sounder and better interpretations of observations that are expressed as scores on a
psychological test.

Although it is often done as a matter of convenience or as simple shorthand, it should be
obvious by now that it is not correct technically to refer to the validity of a test. Validity
is a characteristic of the interpretation given to performance on a test. It makes no sense,
for example, to ask a question such as “Is this Wechsler scale a valid test?” Rather, one
might pose the superior question “Is the interpretation of performance on this Wechsler scale
as reflecting intelligence or intellectual level valid?” This is more than a game of
semantics, as such subtle differences in language affect the way we think about our methods
and our devices. The difference in language and its implications are considered powerful
enough that Educational and Psychological Measurement, one of the oldest and most respected
journals in psychometrics, founded originally by Frederic Kuder, no longer allows authors in
its pages to refer to the validity of a test or the reliability of a test. Reviewers for this
journal are asked routinely to screen manuscripts for improper or imprecise use of such
terminology.

Just as reliability may take on a number of variations, so may validity. Quite a bit of
divergent nomenclature has been applied to validity. Messick (1980) identified 17 “different”
types of validity that are referred to in the technical literature! Traditionally, validity
has been broken into three major categories: content, construct, and predictive or
criterion-related validity. These are the three types of validity distinguished and discussed
in the joint Standards for Educational and Psychological Testing (American Psychological
Association, 1985). Construct validity cuts across all categories, and criterion-related
validity is definitely a question of the relationship of test performance to other methods of
evaluating behavior.

Content validity is determined by how well the test items and their specific content sample
the set of behaviors or subject matter area about which inferences are to be drawn on the
basis of the test scores. Criterion-related or predictive validity refers to either
comparisons of test scores with performance on accepted criteria of the construct in question
taken in close temporal relationship to the test or the level of prediction of performance at
some future time. Criterion-related validity is determined by the degree of correspondence
between the test score and the individual's performance on the criterion. If the correlation
between these two variables is high, no further evidence may be considered necessary
(Nunnally, 1978). Here, reliability has a direct, and known, limiting effect on validity. A
correlation between a predictor (x) and a criterion (y), a validity coefficient, typically
expressed as rxy, is restricted in magnitude. Its maximum true value is equal to the square
root of the product of the internal consistency reliability coefficients of the scores being
compared: max rxy = (rxx ryy)^(1/2).

Construct validity of the interpretations given to psychological tests is one of the most
complex issues facing the psychometrician and permeates all aspects of test development and
test use. Psychology for the most part deals with intangible constructs. Intelligence is one
of the most intensely studied constructs in the field of psychology, yet it cannot be directly
observed or evaluated. Intelligence can only be inferred from the observation and
quantification of what has been agreed upon as “intelligent” behavior. Personality variables
such as dependence, anxiety, need achievement, mania, and on through the seemingly endless
list of personality traits that psychologists have “identified” also cannot be observed
directly. Their existence is only inferred from the observation of behavior. Construct
validity thus involves considerable inference on the part of the test developer and the
researcher; construct validity is evaluated by investigating just what psychological
properties a test measures.

Prior to being used for other than research purposes, interpretations given to a test must be
shown clearly to demonstrate an acceptable level of validity. For use with various categories
of psychopathology, validation with normally functioning individuals should be considered
insufficient. The validity of an interpretation needs to be demonstrated for each group with
whom it is used. This can be a long and laborious process but is nevertheless a necessary one.
There are many subtle characteristics of various classes of exceptional children, for example,
that may cause an otherwise appropriate interpretation of a test to lack validity with special
groups (e.g., see Newland, 1980).

As has been noted by Cronbach (1971) and others, the term “test validation” can cause some
confusion. In thinking about and evaluating validity, we must always keep in mind that one
does not ever actually validate a test but only the interpretation that is given to the score
on the test. Any test may have many applications, and a test originally designed for a single
purpose may prove promising for other applications. Each application of a test or
interpretation of a test score must undergo validation. Whenever we hear or read that a test
has been validated, we need to know for what purpose it has been validated, and what
interpretations of scores from the instrument in question have been shown empirically to be
justifiable and accurate.
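The limiting effect of reliability on a validity coefficient, stated earlier in this section as max rxy = (rxx ryy)^(1/2), is easy to work through numerically. The sketch below simply evaluates that expression; the function name and the reliability values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def max_validity(rxx, ryy):
    """Upper limit on the predictor-criterion correlation given both reliabilities."""
    if not (0.0 <= rxx <= 1.0 and 0.0 <= ryy <= 1.0):
        raise ValueError("reliability coefficients must lie between 0 and 1")
    return sqrt(rxx * ryy)


if __name__ == "__main__":
    # Hypothetical predictor reliability of .81 and criterion reliability of .64.
    print(max_validity(0.81, 0.64))
    # A perfectly reliable criterion still leaves the predictor's reliability as a cap.
    print(max_validity(0.81, 1.0))
```

With these illustrative values, the observed validity coefficient could not be expected to exceed about .72, however strong the underlying relationship between the constructs.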

4.02.6 THE ASSESSMENT PROCESS

As noted at the opening of this chapter, assessment is an involved, comprehensive process of
deriving meaning from test scores to achieve a broad but detailed description and
understanding of the individual. The description here of assessment as a process is important.
Assessment, properly carried out, is not a static collection of information, but an ongoing
dynamic synthesis and evaluation of data, reliably obtained, from multiple sources relevant to
the current, and possibly future, status of the individual. Assessment is open ended: new
information can arise daily that can properly alter one's perception of the ecological
validity of prior impressions and recommendations.

Crucial to the assessment process, and far too frequently neglected or overlooked, is
follow-up evaluation that should occur after more formal diagnostic assessments have been made
and habilitative recommendations implemented. There are no absolutes in psychological and
educational testing; no profile of assessment information is inexorably linked with a single
method of treatment, remediation, or intervention that will always be successful. Currently,
the opposite is the case; the search for the aptitude × treatment interaction is nearly as
elusive as that for the neural engram. The follow-up component of the assessment process is
crucial to the fine-tuning of existing intervention procedures and in some cases more massive
overhauling of an intervention.

Psychological and educational testing and assessment are far from exact, just as are the
clinical assessment procedures of medicine and related specialties. When used in diagnosis,
assessment allows one simply to narrow the number of disorders under serious consideration.
Similarly, when used in the search for an appropriate method of habilitation for a handicapped
youngster, the assessment process allows the psychologist to narrow the number of strategies
(i.e., hypotheses) from which to choose one that is believed to be most effective. There are
no guarantees that the first strategy adopted will be the most effective program of treatment
(or be effective at all for that matter). Kaufman (1994) described the proper attitude of the
psychologist involved in assessment to be that of a “detective” who evaluates, synthesizes,
and integrates data gleaned from the assessment process with knowledge of psychological
theories of development and the psychology of individual differences (also see Reynolds,
1981b; Reynolds & Clark, 1982). As described here, the assessment process is a major component
in psychological problem-solving. Individuals are not randomly selected for an expensive,
time-consuming psychological evaluation. They are referred to a psychologist for some more or
less specific reason; a problem of some kind exists. The assessment process then plays a major
role in accurately identifying and describing the problem, suggesting solutions, and, properly
carried through, providing ideas for modifying the initially proposed interventions.

It is necessary in the assessment process to entertain and evaluate information from a variety
of sources if the assessment is to be ecologically valid. Each situation will dictate the
relevance and appropriate weighting of each piece of information. Age and physical condition
are two obvious factors that influence the gathering of information regarding child and adult
patients. Palmer (1980), Newland (1980), Salvia and Ysseldyke (1981), and Sattler (1988) have
discussed factors to be included in the assessment process when evaluating exceptional
children in the schools. The following are generally accepted to be important aspects of
assessment: medical condition, sensory and motor skills, school performance and behavior
(e.g., group achievement tests, grades, teacher checklists), individual intelligence test
scores, special aptitude and achievement test performance, affective characteristics (e.g.,
personality tests), teacher reports on behavior and peer interaction, the child–school
interaction, characteristics of the classroom, parent reports on behavior, the social and
cultural milieu of the home, and the child's developmental history. Each of these factors
takes on more or less importance for individual patients. With adult patients, many of the
same types of information will be relevant, with a conceptual shift toward adulthood (Anastasi
& Urbina, 1997). The patient's vocational functioning and relationships, including those with
parents, spouse, and children, will all need to be considered when designing the assessment
and later when interpreting the results. More specialized types of knowledge may be required
for any given case. For example, in certain genetically-based disorders, a complete family
history may be necessary to achieve a good understanding of the nature of the patient's
difficulty.

Numerous methods of psychological testing can be used in the assessment process. Each will
have its own strengths and weaknesses. There are frequent debates in the psychological
literature over the relative merits of one category of assessment over another, with some
respondents carrying on with nearly religious fervor. However, these arguments can be resolved
quickly by recalling that tests are tools of assessment and most certainly not an end in
themselves. Different methods and techniques of testing are best seen and used as
complementary in

assessment, which is a problem-solving process requiring much information. With these admonitions in mind, it is time to turn to a discussion of various methods of testing and their role in the assessment process.

4.02.7 MODELS AND METHODS OF ASSESSMENT

A variety of assessment methods are available for evaluating adults and exceptional children. Some of these methods grew directly from specific schools of psychological thought, such as the psychoanalytic view of Freud (projective assessment techniques) or the behavioral schools of Watson, Skinner, and Bandura (applied behavior analysis). Other methods have grown out of controversies in and between existing academic disciplines such as personality theory and social psychology. New and refined methods have come about with new developments in medicine and related fields, whereas other new testing methods stem from advances in the theory and technology of the science of psychological measurement. Unfortunately, still other new techniques stem from psychological and educational faddism with little basis in psychological theory and little if any empirical basis. Any attempt to group tests by characteristics such as norm-referenced vs. criterion-referenced, traditional vs. behavioral, maximum vs. typical performance, and so on, is doomed to criticism. As will be seen in the pages that follow, the demarcations between assessment methods and models are not so clear as many would contend. In many cases, the greatest distinctions lie in the philosophical orientation and intent of the user. As one prominent example, many “behavioral” assessment techniques are as bound by norms and other traditional psychometric concepts as are traditional intelligence tests (Cone, 1977). Even trait measures of personality end up being labeled by some as behavioral assessment devices (e.g., Barrios, Hartmann, & Shigetomi, 1981). The division of models and methods of assessment to follow is based in some part on convenience and clarity of discussion but also with an eye toward maintaining the most important conceptual distinctions among these assessment methods.

4.02.7.1 Traditional Norm-referenced Assessment

4.02.7.1.1 Intelligence, achievement, and special abilities

These assessment techniques have been grouped together primarily because of their similarity of content and, in some cases, their similarity of purpose. There are, however, some basic distinctions among these measures. Intelligence tests tend to be broad in terms of content; items sample a variety of behaviors that are considered to be intellectual in nature. Intelligence tests are used to evaluate the current intellectual status of the individual, to predict future behavior on intellectually demanding tasks, and to help achieve a better understanding of past behavior and performance in an intellectual setting. Achievement tests measure relatively narrowly defined content, sampled from a specific subject matter domain that typically has been the focus of purposeful study and learning by the population for whom the test is intended. Intelligence tests by contrast are oriented more toward testing intellectual processes and use items that are more related to incidental learning and not as likely to have been specifically studied as are achievement test items. Tests of special abilities, such as memory, mechanical aptitude, and auditory perception, are narrow in scope as are achievement tests but focus on process rather than content. The same test question may appear on an intelligence, achievement, or special ability test, however, and closely related questions frequently do. Tests of intelligence and special abilities also focus more on the application of previously acquired knowledge, whereas achievement tests focus on testing just what knowledge has been acquired. One should not focus on single items; it is the collection of items and the use and evaluation of the individual's score on the test that are the differentiating factors.

(i) Intelligence tests

Intelligence tests are among the oldest devices in the psychometric arsenal of the psychologist and are likely the most frequently used category of tests in the evaluation of exceptional children, especially in the cases of mental retardation, learning disabilities, and intellectual giftedness. Intelligence and aptitude tests are used frequently in adult assessment as well and are essential diagnostic tools when examining for the various dementias. They are used with adults in predicting a variety of other cognitive disorders and in the vocational arena. Since the translation and modification of Alfred Binet's intelligence test for French schoolchildren was introduced in the United States by Lewis Terman (of Stanford University, hence the Stanford–Binet Intelligence Scale), a substantial proliferation of such tests has occurred. Many of these tests measure very limited aspects of intelligence (e.g., Peabody Picture Vocabulary Test, Columbia Mental Maturity Scale, Ammons and Ammons Quick Test), whereas others
give a much broader view of a person's intellectual skills, measuring general intelligence as well as more specific cognitive skills (e.g., the various Wechsler scales). Unfortunately, while intelligence is a hypothetical psychological construct, most intelligence tests were developed from a primarily empirical basis, with little if any attention given to theories of the human intellect. Empiricism is of major importance in all aspects of psychology, especially psychological testing, but is insufficient in itself. It is important to have a good theory underlying the assessment of any theoretical construct such as intelligence.

Intelligence tests in use today are for the most part individually administered (i.e., a psychologist administers the test to an individual in a closed setting with no other individuals present). For a long time, group intelligence tests were used throughout the public schools and in the military. Group tests of intelligence are used more sparingly today because of their many abuses in the past and the limited amount of information they offer about the individual. There is little of utility to the schools, for example, that can be gleaned from a group intelligence test that cannot be obtained better from group achievement tests. Individual intelligence tests are far more expensive to use but offer considerably more and better information. Much of the additional information, however, comes from having a highly trained observer (the psychologist) interacting with the person for more than an hour in a quite structured setting, with a variety of tasks of varying levels of difficulty. The most widely used individually administered intelligence scales today are the Wechsler scales, the Kaufman scales, and the Stanford–Binet Intelligence Scale (Fourth Edition). Though the oldest and best known of intelligence tests, the Binet has lost much of its popularity and is now a distant third.

Intelligence testing, which can be very useful in clinical and vocational settings, is also a controversial activity, especially with regard to the diagnosis of mild mental retardation among minority cultures in the United States. Used with care and compassion, as a tool toward understanding, such tests can prove invaluable. Used recklessly and with rigidity, they can cause irreparable harm. Extensive technical training is required to master properly the administration of an individual intelligence test (or any individual test for that matter). Even greater sensitivity and training are required to interpret these powerful and controversial devices. Extensive knowledge of statistics, measurement theory, and the existing research literature concerning testing is a prerequisite to using intelligence tests. To use them well requires mastery of the broader field of psychology, especially differential psychology, the psychological science that focuses on the psychological study and analysis of human individual differences and theories of cognitive development. Extensive discussions of the clinical evaluation of intelligence can be found in Kaufman (1990, 1994) and Kaufman and Reynolds (1983).

(ii) Achievement tests

Various types of achievement tests are used throughout the public schools with regular classroom and exceptional children. Most achievement tests are group tests administered with some regularity to all students in a school or system. Some of the more prominent group tests include the Iowa Test of Basic Skills, the Metropolitan Achievement Test, the Stanford Achievement Test, and the California Achievement Test. These batteries of achievement tests typically do not report an overall index of achievement but rather report separately on achievement in such academic areas as English grammar and punctuation, spelling, map reading, mathematical calculations, reading comprehension, social studies, and general science. The tests change every few grade levels to accommodate changes in curriculum emphasis. Group achievement tests provide schools with information concerning how their children are achieving in these various subject areas relative to other school systems throughout the country and relative to other schools in the same district. They also provide information about the progress of individual children and can serve as good screening measures in attempting to identify children at the upper and lower ends of the achievement continuum. Group administered achievement tests help in achieving a good understanding of the academic performance of these individuals but do not provide sufficiently detailed or sensitive information on which to base major decisions. When decision-making is called for or an in-depth understanding of a child's academic needs is required, individual testing is needed.

Psychologists use achievement measures with adult clients as well. With the elderly, acquired academic skills tend to be well preserved in the early stages of most dementias and provide a good baseline of premorbid skills. Academic skills can also be important in recommending job placements, as a component of child custody evaluations, in rehabilitation planning, and in the diagnosis of adult learning disorders and adult forms of attention deficit hyperactivity disorder.
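
The screening use of group achievement tests described above can be illustrated with a brief sketch. It converts a raw score to a percentile rank relative to a norm group and flags scores falling at either end of the continuum. Everything numeric here is an assumption made for illustration only: the norm-group mean and standard deviation, the percentile cutoffs, and the premise that norm-group scores are approximately normally distributed. None of these values is taken from any published battery, and the function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical norm-group parameters for one subtest (illustrative only).
NORM_MEAN = 50.0
NORM_SD = 10.0

# Hypothetical screening cutoffs, expressed as percentile ranks.
LOWER_CUTOFF = 10.0
UPPER_CUTOFF = 90.0


def percentile_rank(raw_score, mean=NORM_MEAN, sd=NORM_SD):
    """Percentile rank of a raw score, assuming normally distributed norm-group scores."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(raw_score)


def screen(scores):
    """Map each student to 'lower', 'upper', or None (no follow-up suggested)."""
    flags = {}
    for student, raw in scores.items():
        pr = percentile_rank(raw)
        if pr <= LOWER_CUTOFF:
            flags[student] = "lower"
        elif pr >= UPPER_CUTOFF:
            flags[student] = "upper"
        else:
            flags[student] = None
    return flags


if __name__ == "__main__":
    example = {"A": 35.0, "B": 52.0, "C": 66.0}
    for student, flag in screen(example).items():
        print(student, flag)
```

A flag in such a scheme would serve only as a prompt for individual testing, in keeping with the point that group scores are not detailed or sensitive enough to support major decisions on their own.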
(iii) Tests of special abilities

These are specialized methods for assessing thin slices of the spectrum of abilities for any single individual. These measures can be helpful in further narrowing the field of hypotheses about an individual's learning or behavior difficulties when used in conjunction with intelligence, achievement, and personality measures. The number of special abilities that can be assessed is quite large. Some examples of these abilities include visual–motor integration skills, auditory perception, visual closure, figure–ground distinction, oral expression, tactile form recognition, and psychomotor speed. While these measures can be useful, depending on the questions to be answered, one must be particularly careful in choosing an appropriate, valid, and reliable measure of a special ability. The use of and demand for these tests are significantly less than those for the most popular individual intelligence tests and widely used achievement tests. This in turn places some economic constraints on development and standardization procedures, which are very costly enterprises when properly conducted. One should always be wary of the “quick and dirty” entry into the ability testing market. There are some very good tests of special abilities available, although special caution is needed. For example, simply because an ability is named in the test title is no guarantee that the test measures that particular ability. As with all tests, what is actually measured by any collection of test items is a matter for empirical investigation and is subject to the process of validation.

To summarize, norm-referenced tests of intelligence, achievement, and special abilities provide potentially important information in the assessment process. Yet each supplies only a piece of the required data. Equally important are observations of how the patient behaves during testing and in other settings, and performance on other measures.

4.02.7.2 Norm-referenced, Objective Personality Measures

Whereas tests of aptitude and achievement can be described as maximum performance measures, tests of personality can be described as typical performance measures. When taking a personality test, one is normally asked to respond according to one's typical actions and attitudes and not in a manner that would present the “best” possible performance (i.e., most socially desirable). The “faking” or deliberate distortion of responses is certainly possible, to a greater extent on some scales than others (e.g., Jean & Reynolds, 1982; Reynolds, 1998b), and is a more significant problem with personality scales than cognitive scales. Papers have even been published providing details on how to distort responses on personality tests in the desired direction (e.g., Whyte, 1967). Although there is no direct solution to this problem, many personality measures have built-in “lie” scales or social desirability scales to help detect deliberate faking to make one look as good as possible and F or infrequency scales to detect the faking of the presence of psychopathology.

The use and interpretation of scores from objective personality scales also has implications for this problem. Properly assessed and evaluated from an empirical basis, response to the personality scale is treated as the behavior of immediate interest and the actual content conveyed by the item becomes nearly irrelevant. As one example, there is an item on the Revised Children's Manifest Anxiety Scale (RCMAS; Reynolds & Richmond, 1978, 1985), a test designed to measure chronic anxiety levels in children, that states “My hands feel sweaty.” Whether the child's hands actually do feel sweaty is irrelevant. The salient question is whether children who respond “true” to this question are in reality more anxious than children who respond “false” to such a query. Children who respond more often in the keyed direction on the RCMAS display greater general anxiety and exhibit more observed behavior problems associated with anxiety than children who respond in the opposite manner. Although face validity of a personality or other test is a desirable quality, it is not always a necessary one. It is the actuarial implications of the behavioral response of choosing to respond in a certain manner that hold the greatest interest for the practitioner. Scales developed using such an approach are empirical scales.

Another approach is to devise content scales. As the name implies, the item content of such scales is considered more salient than its purely empirical relationship to the construct. Individuals with depression, especially men, may be edgy and irritable at times. Thus, the item “I sometimes lash out at others for no good reason” might show up on an empirically derived scale assessing depression, but is unlikely to find its way onto a content scale. “I am most often sad” would be a content-scale item assessing depression. Content scales are typically derived via expert judgments, but from an item pool that has passed muster at some empirical level already.

The emphasis on inner psychological constructs typical of personality scales poses special problems for their development and validation. A reasonable treatment of these issues can be
found in most basic psychological measurement texts (e.g., Anastasi & Urbina, 1997; Cronbach, 1983).

Objective personality scales are probably the most commonly used of all types of tests by clinical psychologists. They provide key evidence for the differentiation of various forms of psychopathology including clinical disorders and especially personality disorders. Omnibus scales such as the MMPI-2 and the Millon Clinical Multiaxial Inventory-3 are common with adult populations and also have adolescent versions. Omnibus scales directed specifically at children and adolescents, however, may be more appropriate for these younger age ranges. Among the many available, the Personality Inventory for Children and the Self-report of Personality from the Behavior Assessment System for Children (BASC; Reynolds & Kamphaus, 1992) are the most commonly used. Omnibus scales that are multidimensional in their construction are important to differential diagnosis in the early stages of contact with a patient. As diagnosis is established and treatment is in place, narrower scales that coincide with the clinical diagnosis become more useful for follow-up assessment and monitoring of treatment. In this regard, scales such as the Beck Depression Inventory or the State-Trait Anxiety Inventory are common examples. The tremendous cultural diversity of the world and how culture influences perceptions of items about the self and about one's family place special demands of cultural competence and cultural sensitivity on psychologists interpreting personality scales outside of their own cultural or ethnic heritage (e.g., see Dana, 1996; Hambleton, 1994; Moreland, 1995).

4.02.7.3 Projective Assessment

Projective assessment of personality has a long, rich, but very controversial history in the evaluation of clinical disorders and the description of normal personality. This controversy stems largely from the subjective nature of the tests used and the lack of good evidence of predictive validity coupled with sometimes fierce testimonial and anecdotal evidence of their utility in individual cases by devoted clinicians.

The subjectiveness of projective testing necessarily results in disagreement concerning the scoring and interpretation of responses to the test materials. For any given response by any given individual, competent professionals would each be likely to interpret differently the meaning and significance of the response. It is primarily the agreement on scoring that differentiates objective from subjective tests. If trained examiners agree on how a particular answer is scored, tests are considered objective; if not, they are considered subjective. Projective is not synonymous with subjective in this context but most projective tests are closer to the subjective than objective end of the continuum of agreement on scoring of responses. Projective tests are sets of ambiguous stimuli, such as ink blots or incomplete sentences, and the individual responds with the first thought or series of thoughts that come to mind or tells a story about each stimulus. Typically no restrictions are placed on individuals' response options. They may choose to respond with anything desired; in contrast, on an objective scale, individuals must choose between a set of answers provided by the test or at least listed out for the examiner in a scoring manual. The major hypothesis underlying projective testing is taken from Freud (Exner, 1976). When responding to an ambiguous stimulus, individuals are influenced by their needs, interests, and psychological organization and tend to respond in ways that reveal, to the trained observer, their motivations and true emotions, with little interference from the conscious control of the ego. Various psychodynamic theories are applied to evaluating test responses, however, and herein too lie problems of subjectivity. Depending on the theoretical orientation of the psychologist administering the test, very different interpretations may be given. Despite the controversy surrounding these tests, they remain very popular.

Projective methods can be divided roughly into three categories according to the type of stimulus presented and the method of response called for by the examiner. The first category calls for the interpretation of ambiguous visual stimuli by the patient with an oral response. Tests in this category include such well-known techniques as the Rorschach and the Thematic Apperception Test (TAT). The second category includes completion methods, whereby the patient is asked to finish a sentence when given an ambiguous stem or to complete a story begun by the examiner. This includes the Despert Fables and a number of sentence completion tests. The third category includes projective art, primarily drawing techniques, although sculpture and related art forms have been used. In these tasks, the child is provided with materials to complete an artwork (or simple drawing) and given instructions for a topic, some more specific than others. Techniques such as the Kinetic Family Drawing, the Draw-A-Person, and the Bender–Gestalt Test fall in this category.

Criterion-related and predictive validity have proven especially tricky for advocates of projective testing. Although techniques such as the TAT are not as amenable to study and
validation through the application of traditional statistical and psychometric methods as objective tests may be, many clinical researchers have made such attempts with less than heartening results. None of the so-called objective scoring systems for projective devices has proved to be very valuable in predicting behavior, nor has the use of normative standards been especially fruitful. This should not be considered so surprising, however; it is indeed the nearly complete idiographic nature of projective techniques that can make them useful in the evaluation of a specific patient. It allows for any possible response to occur, without restriction, and can reveal many of a patient's current reasons for behaving in a specific manner. When used as part of a complete assessment, as defined in this chapter, projective techniques can be quite valuable. When applied rigidly and without proper knowledge and consideration of the patient's ecology, they can, as with other tests, be detrimental to our understanding of the patient. For a more extensive review of the debates over projective testing, the reader is referred to Exner (1976), Jackson and Messick (1967, Part 6), O'Leary and Johnson (1979), and Prevatt (in press), as well as to other chapters in this volume, especially Chapter 15.

4.02.7.4 Behavioral Assessment

The rapid growth of behavior therapy and applied behavior analysis has led to the need for tests that are consistent with the theoretical and practical requirements of these approaches to the modification of human behavior. Thus, the field of behavioral assessment has developed and grown at an intense pace. Book-length treatments of the topic became commonplace in the 1970s and 1980s (e.g., Haynes & Wilson, 1979; Hersen & Bellack, 1976; Mash & Terdal, 1981) and entire journals are now devoted to research regarding behavioral assessment (e.g., Behavioral Assessment). The general term “behavioral assessment” has come to be used to describe a broad set of methods including some traditional objective personality scales, certain methods of interviewing, physiological response measures, naturalistic observation, norm-referenced behavior checklists, frequency counts of behavior, and a host of informal techniques requiring the observation of a behavior with recording of specific responses. Behavioral assessment will be discussed here in its more restricted sense to include the rating (by self or others) of observable behavioral events, primarily taking the form of behavior checklists and rating forms that may or may not be normed. Although I would certainly include psychophysiological assessment within this category, the scope of the work simply will not allow us to address this aspect of behavioral assessment except to say that it is indeed a most useful one in the treatment of a variety of clinical disorders.

The impetus for behavioral assessment comes not only from the field of behavior therapy but also from a general revolt against the high level of inference involved in such methods of assessing behavior as the Rorschach and the TAT. The greatest distinguishing characteristic between the behavioral assessment of psychopathological disorders and most other techniques is the level of inference involved in moving from the data provided by the assessment instrument to an accurate description of the patient and the development of an appropriate intervention strategy. This is a most useful strength for behavioral assessment strategies but is their greatest weakness when it is necessary to understand what underlies the observed behaviors.

Many of the early conceptual and methodological issues have been resolved in this area of assessment, for example, the importance of norms and other traditional psychometric concepts such as reliability and validity (Cone, 1977; Nelson, Hay, & Hay, 1977). Problems of interobserver reliability and observer drift remain but are well on their way to being resolved. Unquestionably, behavioral assessment is an exciting and valuable part of the assessment process. Behavioral assessment grew from a need to quantify observations of a patient's current behavior and its immediate antecedents and consequences, and this is the context within which it remains most useful today.

There are a number of formal behavior rating scales or behavior checklists now available. These instruments typically list behaviors of interest in clearly specified terms and have a trained observer or an informant indicate the frequency of occurrence of these behaviors. Interpretation can then take on a normative or a criterion-referenced nature depending on the purpose of the assessment and the availability of norms. Clusters of behaviors that define certain clinical syndromes, such as attention deficit hyperactivity disorder, may be of interest. On the other hand, individual behaviors may be the focus (e.g., hitting other children). More frequently, behavioral assessment occurs as an “informal” method of collecting data on specific behaviors being exhibited by a patient and is dictated by the existing situation into which the psychologist is invited. An informal nature is dictated by the nature of behavioral assessment in many instances. Part of the low level of inference in behavioral assessment lies in not generalizing observations of behavior across
settings without first collecting data in multiple settings. In this regard, behavioral assessment may for the most part be said to be psychosituational. Behavior is observed and evaluated under existing circumstances, and no attempt is made to infer that the observed behaviors will occur under other circumstances. Comprehensive systems that are multimethod, multidimensional, and that assess behavior in more than one setting have been devised and provide a good model for the future (Reynolds & Kamphaus, 1992).

Another area of assessment that stems from behavioral psychology and is considered by many to be a subset of behavioral assessment is task analysis. Whereas behavioral assessment typically finds its greatest applicability in dealing with emotional and behavioral difficulties, task analysis is most useful in evaluating and correcting learning problems. In task analysis, the task to be learned (e.g., brushing one's teeth or multiplying two-digit numbers) is broken down into its most basic component parts. The learner's skill at each component is observed and those skills specifically lacking are targeted for teaching to the child. In some cases, hierarchies of subskills can be developed, but these have not held up well under cross-validation. Task analysis can thus be a powerful tool in specifying a learner's existing (and needed) skills for a given learning problem. Task analysis could, for example, form an integral part of any behavioral intervention for a child with specific learning problems. The proper use of these procedures requires a creative and well-trained individual conversant with both assessment technology and behavioral theories of learning, since there are no standardized task analysis procedures. Those involved in task analysis need to be sensitive to the reliability and validity of their methods. As with other behavioral assessment techniques, some contend that behavioral assessment techniques need only demonstrate that multiple observers can agree on a description of the behavior and when it has been observed. Though not having to demonstrate a relationship with a hypothetical construct, behavioral techniques must demonstrate that the behavior observed is consistent and relevant to the learning problems. For behavior checklists and more formal behavioral assessment techniques, most traditional psychometric concepts apply and must be evaluated with regard to the behavioral scale in question.

4.02.7.5 Neuropsychological Assessment

Along with behavioral assessment, perhaps the most rapidly growing area in patient evaluation is neuropsychological assessment. Many view neuropsychological assessment as the application of a specific set of tests or battery of tests. Far from being a set of techniques, the major contribution of neuropsychology to the assessment process is the provision of a strong paradigm from which to view assessment data (Reynolds, 1981b, 1981c, 1997). Without a strong theoretical guide to test score interpretation, one quickly comes to rely upon past experience and illusory relationships and trial-and-error procedures when encountering a patient with unique test performance. As with most areas of psychology, there are competing neuropsychological models of cognitive functioning, any one of which may be most appropriate for a given patient. Thus considerable knowledge of neuropsychological theory is required to evaluate properly the results of neuropsychological testing.

Since the 1950s, clinical testing in neuropsychology has been dominated by the Halstead–Reitan Neuropsychological Test Battery (HRNTB), although the Luria–Nebraska Neuropsychological Battery and the Boston process approach have made significant inroads. The prevalence of use of the HRNTB is partly responsible for perceptions of clinical neuropsychology as primarily a set of testing techniques. However, a brief examination of the HRNTB should quickly dispel such ideas. The HRNTB consists of a large battery of tests taking a full day to administer. There is little that can be said to be psychologically or psychometrically unique about any of these tests. They are all more or less similar to tests that psychologists have been using for the past 50 years. The HRNTB also typically includes a traditional intelligence test such as one of the Wechsler scales. The HRNTB is unique in the particular collection of tests involved and the method of evaluating and interpreting performance. While supported by actuarial studies, HRNTB performance is evaluated by the clinician in light of existing neuropsychological theories of cognitive function, giving the battery considerable explanatory and predictive power.

Neuropsychological approaches to clinical assessment are rapidly growing and can be most helpful in defining areas of cognitive-neuropsychological integrity and not just in evaluating deficits in neurological function. Neuropsychological techniques can also make an important contribution by ruling out specific neurological problems and pointing toward environmental determinants of behavior. The well-trained neuropsychologist is aware that the brain does not operate in a vacuum but is an integral part of the ecosystem of the patient. As with other methods of assessment, neuropsychological assessment has much to offer the assessment process when used wisely; poorly or carelessly
implemented, it can create seriously false impressions, lessen expectations, and precipitate a disastrous state of affairs for the patient it is designed to serve. Clinicians who use neuropsychological approaches to their work or make neuropsychological interpretations of tests or behaviors are in high demand but require specialized training that takes years to acquire.

4.02.8 CLINICAL VS. STATISTICAL PREDICTION

Given a set of test results and/or additional interview or historical data on a patient, there are two fundamental approaches a clinician can apply to diagnostic decision-making or to the prediction of future behavior. The first, and likely most common, is the simple human judgment of the clinician who reviews the data in conjunction with prior experiences and knowledge of diagnostic criteria, psychopathology, and other psychological literature. As a result of the application of this clinical method, a diagnosis or prediction is made. Alternatively, the clinician may apply a formal algorithm or set of mathematical equations to predict membership in a class (a diagnosis) or the probability of some future behavior (e.g., whether a defendant will or will not molest children again if placed on probation). The use of such mechanistic approaches constitutes the statistical method.

The ability of clinicians to use test data in optimal ways and to combine multiple sources of data optimally to arrive at correct diagnoses and to make predictions about future behavior has been the subject of much scrutiny. Meehl (1954) addressed this problem early on and concluded that formula-based techniques, derived by mathematical models that search for optimal combinations of data, are superior to clinicians in decision-making. This has been difficult for clinicians to accept even as more and more actuarial systems for test interpretation find their way into our office computers. In 70+ years of research on this topic, actuarial modeling continues to be superior (Faust & Ackley, 1998; Grove & Meehl, 1996), yet I listened to a clinical psychologist testify in February of 1998 that clinical judgment was always better and that actuarial predictions were used only when you had nothing else to use. In 136 studies since 1928, over a wide range of predictions, actuarial modeling is invariably equal to or superior to clinical methods (Grove & Meehl, 1996). Grove and Meehl (1996) have addressed this reluctance (or perhaps ignorance) about actuarial modeling in clinical decision-making and clinicians' objections to acceptance of more than 70 years of consistent research findings, as has Kleinmuntz (1990), who did seminal research on developing expert systems for MMPI interpretation in the 1960s.

Grove and Meehl (1996) state the most common objection of clinicians to statistical modeling is that they (the clinicians) use a combination of clinical and statistical methods, obviating the issue since they are used in a complementary model. This is a spurious argument because, as research shows, the clinical method and the statistical method often disagree and there is no known way to combine the two methods to improve predictions; we simply do not know under what conditions to conclude the statistical model will be wrong (also see Faust & Ackley, 1998, and Reynolds, 1998a). Grove and Meehl (1996) illustrate this quandary by examining the actions of an admissions committee. Suppose the applicant's test scores, grade point average, and rank in class predict successful college performance but the admissions committee believes, perhaps based on an interview and letters of recommendation, that the applicant will not be successful. The two methods cannot be combined when they specify different outcomes. One is right and one is wrong. Consider a crucial prediction in a forensic case. A psychologist treating an offender on parole for aggravated sexual assault of a child is asked whether the offender might molest a child of the same age and gender as a prior victim if the child is placed in the offender's home (he recently married the child's mother). Actuarial tables with rearrest rates for offenders with many similar salient characteristics and in treatment indicate that 10–11% of those individuals will be arrested again for the same offense. The psychologist concludes that the offender is virtually a zero percent risk, however, because “he has worked so hard in treatment.” The two methods again cannot be resolved; yet, as clinicians we persist in believing we can do better than the statistical models even in the face of a substantial body of contradictory evidence. Grove and Meehl (1996) review some 16 other objections clinicians make to statistical methods in diagnosis and other clinical predictions. Most resolve to questions of validity, reliability, or cost. Cost and inconvenience are rapidly becoming nonissues as computerized models are widely available and some of these cost less than one dollar per application for a patient (e.g., Stanton, Reynolds, & Kamphaus, 1993).

Statistical models work better, consistently, and for known reasons. Clinicians cannot assign optimal weights to variables they use to make decisions, do not apply their own rules
consistently, and are influenced by relatively unreliable data (e.g., see Dawes, 1988; Faust, 1984; Grove & Meehl, 1996; Meehl, 1954). As the reliability of a predictor goes down, the relative accuracy of any prediction will be reduced and, consequently, the probability of being wrong increases. The decisions being made are not trivial. Every day, thousands of diagnostic decisions are being made that influence treatment choices, freedom for parolees and probationers, large monetary awards in personal injury cases, custody of children, and others. What clinical psychologists do is important and the increased use of statistical models based on sound principles of data collection that includes data from well-standardized, objective psychological tests seems imperative from an ethical standpoint and from the standpoint of the survival of the profession as accountability models are increasingly applied to reviews of the need for and effectiveness of our services.

4.02.9 ACCESSING CRITICAL COMMENTARY ON STANDARDIZED PSYCHOLOGICAL TESTS

Not every practitioner can or should be an expert on the technical adequacy and measurement science underlying each and every standardized test that might be useful to their clinical practice. With a clear understanding of the fundamentals of measurement, however, clinicians can make intelligent choices about test selection and test use based on the test manuals and accompanying materials in most cases. However, when additional expertise or commentary is required, critical reviews of nearly every published test can be accessed with relative ease.

Many journals in psychology routinely publish reviews of new or newly revised tests including such publications as the Archives of Clinical Neuropsychology, Journal of Psychoeducational Assessment, Journal of Learning Disabilities, Journal of School Psychology, Child Assessment News, and the Journal of Personality Assessment. However, the premier source of critical information on published tests is the publications of the Buros Institute of Mental Measurement.

In the late 1920s, Oscar Krisen Buros began to publish a series of monographs reviewing statistics texts. He noted the rapid proliferation of psychological and educational tests beginning to occur at the same time and rapidly turned his attention to obtaining high-quality reviews of these tests and publishing them in bound volumes known as the Mental Measurements Yearbook (MMY). The first MMY was published by Buros in 1938 and the series continues today. Buros died in 1978, during the final stages of production of the Eighth MMY (though “Yearbooks,” they are not published annually), and his spouse, art director, and assistant Luella Buros saw the Eighth MMY to completion. Subsequently, she opened a competition for proposals to adopt the Institute and continue the work of her late husband. A proposal written by this author (then on the faculty of the University of Nebraska-Lincoln) was chosen and the Buros Institute of Mental Measurement was established in 1979 at the University of Nebraska-Lincoln, where it remains permanently due to a generous endowment from Luella Buros.

The Institute continues to seek out competent reviewers to evaluate and provide critical commentary on all educational and psychological tests published in the English language. These reviews are collected in an ongoing process, as tests are published or revised, under a strict set of rules designed to ensure fair reviews and avoid conflicts of interest. The collected reviews are published on an unscheduled basis approximately every five to eight years. However, as reviews are written and accepted for publication, they are added quickly to the Buros Institute database, which may be searched on-line by subscription to the master database or through most major university libraries. Information on how to access current reviews in the Buros database can be obtained through nearly any reference librarian or through a visit to the Buros Institute website. The Institute established a sterling reputation as the “consumer reports” of the psychological testing industry under the 50-year leadership of Oscar Buros and this reputation and service have been continued at the University of Nebraska-Lincoln. Consumers of tests are encouraged to read the Buros reviews of the tests they choose to use.

4.02.10 CONCLUDING REMARKS

Knowledge of measurement science and psychometrics is critical to good practice in clinical assessment. Hopefully, this review, targeted at practice, has provided a better foundation for reading about psychological testing and for developing better skills in the application of methods of testing and assessment. Old tests continue to be revised and updated, and many new tests are published yearly. There is a large, rapidly growing body of literature on test interpretation that is too often
ignored in practice (e.g., Reynolds & Whitaker, in press) but must be accessed in practice. It is difficult but necessary to do so. Measurement science also progresses and practitioners are encouraged to revisit their mathematical backgrounds every few years as part of what has become a nearly universal requirement for continuing education to continue in practice. New paradigms will always emerge as they have in the past.

It is from basic psychological research into the nature of human information processing and behavior that advances in psychological assessment must come. While some of these advances will be technological, the more fruitful area for movement is in the development of new paradigms of test interpretations. With each advance, with each “new” test that appears, we must proceed with caution and guard against jumping on an insufficiently researched bandwagon. Fruitful techniques may be lost if implemented too soon to be fully understood and appreciated; patients may also be harmed by the careless or impulsive use of assessment materials that are poorly designed (but attractively packaged) or without the necessary theoretical and empirical grounding. When evaluating new psychological assessment methods, surely caveat emptor must serve as the guard over our enthusiasm and our eagerness to provide helpful information about patients in the design of successful intervention programs.

4.02.11 REFERENCES

American Psychological Association (1985). Standards for educational and psychological tests. Washington, DC: Author.
Anastasi, A. (1976). Psychological testing (4th ed.). New York: Macmillan.
Anastasi, A. (1981). Psychological testing (5th ed.). New York: Macmillan.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice-Hall.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Barrios, B. A., Hartmann, D. P., & Shigetomi, C. (1981). Fears and anxieties in children. In E. J. Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders. New York: Guilford.
Cone, J. D. (1977). The relevance of reliability and validity for behavioral assessment. Behavior Therapy, 8, 411–426.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Cronbach, L. J. (1983). Essentials of psychological testing (4th ed.). New York: Harper & Row.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Dana, R. H. (1996). Culturally competent assessment practices in the United States. Journal of Personality Assessment, 66, 472–487.
Dawes, R. M. (1988). Rational choice in an uncertain world. Chicago, IL: Harcourt Brace Jovanovich.
Ebel, R. L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
Exner, J. E. (1976). Projective techniques. In I. B. Weiner (Ed.), Clinical methods in psychology. New York: Wiley.
Faust, D. (1984). The limits of scientific reasoning. Minneapolis, MN: University of Minnesota Press.
Faust, D., & Ackley, M. A. (1998). Did you think it was going to be easy? Some methodological suggestions for the investigation and development of malingering detection techniques. In C. R. Reynolds (Ed.), Detection of malingering during head injury litigation (pp. 1–54). New York: Plenum.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: Macmillan.
Grove, W. M., & Meehl, P. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
Hambleton, R. K. (1994). Guidance for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10, 229–244.
Haynes, S. N., & Wilson, C. C. (1979). Behavioral assessment. San Francisco: Jossey-Bass.
Hays, W. L. (1973). Statistics for the social sciences. New York: Holt, Rinehart & Winston.
Hersen, M., & Bellack, A. S. (1976). Behavioral assessment: A practical handbook. New York: Pergamon.
Hunter, J. E., Schmidt, F. L., & Rauschenberger, J. (1984). Methodological and statistical issues in the study of bias in mental testing. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing (pp. 41–99). New York: Plenum.
Jackson, D. N., & Messick, S. (Eds.) (1967). Problems in human assessment. New York: McGraw-Hill.
Jean, P. J., & Reynolds, C. R. (1982). Sex and attitude distortions: The faking of liberal and traditional attitudes about changing sex roles. Paper presented to the annual meeting of the American Educational Research Association, New York, March.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Kaufman, A. S. (1990). Assessment of adolescent and adult intelligence. Boston: Allyn & Bacon.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Kaufman, A. S., & Reynolds, C. R. (1983). Clinical evaluation of intellectual function. In I. Weiner (Ed.), Clinical methods in psychology (2nd ed.). New York: Wiley.
Kleinmuntz, B. (1990). Why we still use our heads instead of the formulas: Toward an integrative approach. Psychological Bulletin, 107, 296–310.
Lord, F. M., & Novick, M. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Mash, E. J., & Terdal, L. G. (1981). Behavioral assessment of childhood disorders. New York: Guilford.
Meehl, P. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–1027.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–104). New York: Macmillan.
Moreland, K. L. (1995). Persistent issues in multicultural assessment of social and emotional functioning. In L. A. Suzuki, P. J. Meller, & J. G. Ponterotto (Eds.), Handbook of multicultural assessment: Clinical, psychological, and educational applications. San Francisco: Jossey-Bass.
Nelson, R. O., Hay, L. R., & Hay, W. M. (1977). Comments on Cone's “The relevance of reliability and validity for behavioral assessment.” Behavior Therapy, 8, 427–430.
Newland, T. E. (1980). Psychological assessment of exceptional children and youth. In W. M. Cruickshank (Ed.), Psychology of exceptional children and youth (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill.
O'Leary, K. D., & Johnson, S. B. (1979). Psychological assessment. In H. C. Quay & J. S. Werry (Eds.), Psychopathological disorders of childhood. New York: Wiley.
Palmer, D. J. (1980). Factors to be considered in placing handicapped children in regular classes. Journal of School Psychology, 18, 163–171.
Petersen, N. S., Kolen, M., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). New York: Macmillan.
Prevatt, F. (in press). Personality assessment in the schools. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (3rd ed.). New York: Wiley.
Reynolds, C. R. (1981a). The fallacy of “two years below grade level for age” as a diagnostic criterion for reading disorders. Journal of School Psychology, 19, 350–358.
Reynolds, C. R. (1981b). Neuropsychological assessment and the habilitation of learning: Considerations in the search for the aptitude × treatment interaction. School Psychology Review, 10, 343–349.
Reynolds, C. R. (1981c). The neuropsychological basis of intelligence. In G. Hynd & J. Obrzut (Eds.), Neuropsychological assessment of the school-aged child. New York: Grune and Stratton.
Reynolds, C. R. (1982). The problem of bias in psychological assessment. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology. New York: Wiley.
Reynolds, C. R. (1985). Critical measurement issues in learning disabilities. Journal of Special Education, 18, 451–476.
Reynolds, C. R. (1997). Measurement and statistical problems in neuropsychological assessment of children. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), The handbook of clinical child neuropsychology (2nd ed., pp. 182–203). New York: Plenum.
Reynolds, C. R. (1998a). Common sense, clinicians, and actuarialism in the detection of malingering during head injury litigation. In C. R. Reynolds (Ed.), Detection of malingering during head injury litigation (pp. 261–282). New York: Plenum.
Reynolds, C. R. (Ed.) (1998b). Detection of malingering during head injury litigation. New York: Plenum.
Reynolds, C. R. (in press-a). Need we measure anxiety separately for males and females? Journal of Personality Assessment.
Reynolds, C. R. (in press-b). Why is psychometric research on bias in mental testing so often ignored? Psychology, Public Policy and Law.
Reynolds, C. R., & Brown, R. T. (1984). Bias in mental testing: An introduction to the issues. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing. New York: Plenum.
Reynolds, C. R., & Clark, J. (1982). Cognitive assessment of the preschool child. In K. D. Paget & B. Bracken (Eds.), Psychoeducational assessment of the preschool and primary aged child. New York: Grune & Stratton.
Reynolds, C. R., & Kamphaus, R. W. (Eds.) (1990a). Handbook of psychological and educational assessment of children: Vol I. Intelligence and achievement. New York: Guilford.
Reynolds, C. R., & Kamphaus, R. W. (Eds.) (1990b). Handbook of psychological and educational assessment of children: Vol II. Personality, behavior, and context. New York: Guilford.
Reynolds, C. R., & Kamphaus, R. W. (1992). Behavior assessment system for children. Circle Pines, MN: American Guidance Service.
Reynolds, C. R., & Richmond, B. O. (1978). What I think and feel: A revised measure of children's manifest anxiety. Journal of Abnormal Child Psychology, 6, 271–280.
Reynolds, C. R., & Richmond, B. O. (1985). Revised children's manifest anxiety scale. Los Angeles: Western Psychological Services.
Reynolds, C. R., & Whitaker, J. S. (in press). Bias in mental testing since Jensen's “Bias in mental testing.” School Psychology Quarterly.
Salvia, J., & Ysseldyke, J. E. (1981). Assessment in special and remedial education (2nd ed.). Boston: Houghton Mifflin.
Sattler, J. (1988). Assessment of children's intelligence and special aptitudes. San Diego, CA: Author.
Stanton, H., Reynolds, C. R., & Kamphaus, R. W. (1993). BASC plus software scoring program for the Behavior Assessment System for Children. Circle Pines, MN: American Guidance Service.
Wechsler, D. (1992). Wechsler Intelligence Scale for Children-III. San Antonio, TX: Psychological Corporation.
Whyte, W. H. (1967). How to cheat on personality tests. In D. Jackson & S. Messick (Eds.), Problems in human assessment. New York: McGraw-Hill.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.03
Diagnostic Models and Systems
ROGER K. BLASHFIELD
Auburn University, AL, USA

4.03.1 INTRODUCTION
4.03.2 PURPOSES OF CLASSIFICATION
4.03.3 DEVELOPMENT OF CLASSIFICATION SYSTEMS IN THE USA
4.03.4 KRAEPELIN
4.03.5 EARLY DSMS AND ICDS
4.03.6 NEO-KRAEPELINIANS
4.03.7 DSM-III, DSM-III-R, AND DSM-IV
4.03.8 ICD-9, ICD-9-CM, AND ICD-10
4.03.9 CONTROVERSIES AND ISSUES
4.03.9.1 Organizational Models of Classification
4.03.9.2 Concept of Disease
4.03.9.3 Two Views of a Hierarchical System of Diagnostic Concepts
4.03.9.4 Problem of Diagnostic Overlap/Comorbidity
4.03.10 CONCLUDING COMMENTS
4.03.11 REFERENCES

4.03.1 INTRODUCTION

To all the People and Inhabitants of the United States and all the outlying Countries, Greetings:

I, John Michler, King of Tuskaroras, and of all the Islands of the Sea, and of the Mountains and Valleys and Deserts; Emperor of the Diamond Caverns, and Lord High General of the Armies thereof; First Archduke of the Beautiful Isles of the Emerald Sea; Lord High Priest of the Grand Lama, etc., etc., etc.: Do issue this my proclamation. Stand by and hear, for the Lord High Shepherd speaks. No sheep have I to lead me around, no man have I to till me the ground, but the sweet little cottage is all of my store, and my neat little cottage has ground for the floor. No children have I to play me around, no dog have I to bark me around, but the three-legged stool is the chief of my store, and my neat little cottage has ground for the floor.

Yea, verily, I am the Mighty King, Lord Archduke, Pope, and Grand Sanhedrim, John Michler. None with me compare, none fit to comb my hair, but the three-legged stool is the chief of my store, and my neat little cottage has ground for the floor. John Michler is my name. Selah!

I am the Great Hell-Bending Rip-Roaring Chief of the Aborigines! Hear me and obey! My breath overthrows mountains; my mighty arms crush the everlasting forests into kindling wood; I am the owner of the Ebony Plantations; I am the owner of all the mahogany groves and of all the satin-wood; I am the owner of all the granite; I am the owner of all the marble; I am the owner of all the owners of Everything. Hear me and obey! I, John Michler, stand forth in the presence of the Sun and of all the Lord Suns and Lord Planets of the Universe, and I say, Hear me and obey! I, John Michler, on this eighteenth day of August, 1881, do say, Hear me and obey! for with me none can equal, no, not one, for the three-legged stool is the chief of my store,


and my neat little cottage has ground for the floor. Hear me and obey! Hear me and obey! John Michler is my name.

John Michler, First Consul and Dictator of the World, Emperor, Pope, King and Lord High Admiral, Grand Liconthropon forever! (Hammond, 1883, pp. 603–604)

A physician in private practice in New York City reported that a man brought his brother, John Michler, to see him. John was acting strangely, and his brother wanted to know what to do. The brother gave the physician a proclamation that John Michler had written (see above). Clearly, to most observers, there would be no question that John Michler was “crazy.” However, what is the diagnosis of John Michler? When this proclamation was shown to mental health professionals, the most common diagnostic possibilities mentioned were schizophrenia and bipolar disorder (manic episode). What do these various diagnoses mean? Why did clinicians not assign a diagnosis of narcissistic personality disorder to this patient? Certainly this man would fit the vernacular meaning of self-centered and self-aggrandizing that is often associated with a narcissistic personality. How is a manic episode differentiated from schizophrenia? What does it mean to say that Michler appears to be psychotic? Does that diagnosis mean that he has some type of disease that has affected part of his brain, or does it suggest that his childhood was so unusual and abnormal that he has developed a strange and unusual way of coping with life?

4.03.2 PURPOSES OF CLASSIFICATION

The vernacular word that was used to describe John Michler was “crazy.” This word is frequently used in descriptions of persons who have mental disorders. The reason for the applicability of this word is that one common feature of most psychiatric patients is that their behaviors are statistically abnormal. That is, psychiatric patients behave in ways that are deviant. Their interpersonal actions are not expected within the cultural context of our society.

Classification is a fundamental human activity that people use to understand their world. For instance, a classification of animals is helpful when understanding the variations among diverse forms of living organisms. In forming a classification, the classifier is attempting to use observed similarities and differences among the things being classified, so as to find some order among those things. In psychopathology, the general goal of classification is an attempt to use similarities and differences among people who behave in deviant and socially abnormal ways in order to understand their behaviors.

More specifically, there are five purposes to the classification of mental disorders: (i) forming a nomenclature so that mental health professionals have a common language; (ii) serving as a basis of information retrieval; (iii) providing a short-hand description of the clinical picture of the patient; (iv) stimulating useful predictions about what treatment approach will be best; and (v) serving as a concept formation system for a theory (or theories) of psychopathology.

The first reason to have a classification system, providing a nomenclature, is the most fundamental (World Health Organization, 1957). At a minimum, a classification system provides a set of nouns for clinicians to use to discuss their objects of interest: people. Thus, a nomenclature is a set of terms that refer to groups of people that mental health professionals see in their various professional roles.

The second reason, information retrieval, has a pragmatic focus on how well a classification organizes a research literature, so that clinicians and scientists can search for information that they need (Feinstein, 1967). In biology, there is a dictum: “the name of a plant is the key to its literature” (Sneath & Sokal, 1973). The same is true in the area of mental disorders. If a student clinician is assigned a patient who attempts to control weight by inducing vomiting, the name bulimia becomes important for helping the student locate the literature about this disorder in books, journal articles, and even on the internet.

The third reason for having a classification is description. There are many ways of creating classifications that could satisfy the first two purposes. For instance, clinicians could decide to classify all of their patients on eye color. Using eye color would allow clinicians to have a language to discuss patients (“I have seen 17 brown-eyed, eight blue-eyed, and four mixed-eye-color patients in the last month.”). Also, eye color categories could be used as names to store information about patients. However, using eye color to classify patients would not be a satisfactory solution for either researchers or clinicians because patients with similar eye colors are unlikely to have similar symptom patterns. To meet the purpose of description, patients are grouped on the basis of similarity. Patients who have the same diagnosis are expected to be relatively similar in terms of symptoms (Lorr, 1966). In addition, these patients should be dissimilar when compared to patients with different diagnoses (Everitt, 1974). In the case of John Michler, if he is having a manic episode of a bipolar disorder, then we would expect that Michler's brother would
report that John had been spending large sums of money that he did not have, that his speech was extremely rapid, and that his sleep pattern was markedly disturbed. In contrast, his brother is unlikely to report that Michler usually sat around the house with an emotionless, cold, detached interpersonal style and that he would tell his brother about voices in his head that were in discussion about Michler's behaviors. The latter symptoms typically occur in individuals with schizophrenia. Thus, diagnoses become descriptive short-hand names for clusters of co-occurring symptoms.

The fourth purpose is prediction. This purpose typically involves two types of inferences: (i) predicting the course of the patient's condition if there is no treatment or intervention (“Diagnosis is prognosis” as stated by Woodruff, Goodwin, & Guze, 1974); and (ii) predicting which treatment approach will be most effective with the patient (Roth & Fonagy, 1996). In the field of psychopathology, prediction has proved to be an elusive goal. Recently, there was an important multisite study that was performed in the USA in which three different treatment approaches to alcoholism were compared. An attempt was made to see whether particular groups of patients improved with specific treatments. The initial results have been negative. Except for differences related to the severity of patient symptomatology, other patient characteristics did not predict which treatments were most effective (Project MATCH Research Group, 1997).

The final goal of a classification is concept formation (Hempel, 1965). This goal is probably the most distant. In biological classification, Linnaeus and his contemporaries made notable gains in the classification of living organisms by creating classifications that served to describe most of the known information about these organisms. Almost a century later, Darwin formulated a theory of evolution which explained many of the organizational phenomena that Linnaeus had used (Hull, 1988). In particular, Darwin's evolutionary theory provided a basis for understanding why there is a hierarchical arrangement of the categories in biological classification. The field of biological classification has continued to change. During the twentieth century, three different, competing approaches to biological classification have appeared (Hull, 1988). In oversimplified terms, one of these approaches emphasized the nomenclature and information retrieval purposes,

4.03.3 DEVELOPMENT OF CLASSIFICATION SYSTEMS IN THE USA

The classification of mental disorders has an extensive history that can be traced back to some of the earliest writings known to man. A nineteenth-century BC Egyptian writer discussed a disorder in women in which they would report vague and inconsistent physical symptoms that appeared to shift in body location over time (Veith, 1965). Psalm 102 provides an excellent clinical description of depression. However, like many other areas of modern science, the first major commentaries on mental disorders were found in the writings of the Greeks. The Greek medical writers introduced four terms, all of which are still used today: “melancholia,” “hysteria,” “mania,” and “paranoia.” Melancholia was a Greek term that referred to a condition that now would be described by the word depression. Hippocrates believed that the sadness and the slowed bodily movements associated with this disorder were caused by an abundance of black bile, which he considered to be one of the four main ingredients in the human body. Hence, he named this disorder melan (black) + cholia (bile). The second term, hysteria, was the Greek word for the condition originally described by the Egyptians in which women had multiple, inconsistent, and changing somatic complaints. The Hippocratic writers used the name of hysteria, which means pertaining to the uterus, because they believed that this disorder was caused by a dislodged, floating uterus. The last two terms were mania and paranoia. Mania, to the Greeks, referred to persons who were delusional. During the nineteenth century, individuals who were delusional but had few other symptoms were diagnosed with monomania (Berrios, 1993). Mania became an umbrella term for almost any type of psychotic state (Spitzka, 1883). The meaning of mania, however, changed again in the twentieth century to its contemporary denotation of grandiosity, excitement, expansiveness, and elation. The final Greek term, paranoia, has undergone a similar transformation. Paranoia initially meant craziness (para = abnormal + nous = mind). Now the term refers to people who are suspicious and often have delusions in which others are plotting against them.

After the Greeks, psychopathology did not attract much scientific interest until the nine-
the second focused on description, and the third teenth century. During the Middle Ages, mental
was based on a theoretical view. The third view, disorders were associated with evilness. Thus,
the one based on theory, has become the mental disorders were under the domain of
dominant approach to biological classification religious authorities, not physicians or scien-
(Nelson & Platnick, 1981). tists. This approach to mental disorders began
60 Diagnostic Models and Systems

to change in the late 1700s, as exemplified by the fact that King George III of England, who was psychotic for most of the last decade of his reign, received medical care rather than religious counseling.

The first major American physician to be interested in mental disorders was Benjamin Rush who, as a signer of the Declaration of Independence, was one of the prominent American physicians of the late eighteenth century. He was very interested in the forms of insanity. Rush also published a book on the topic, which he titled Medical inquiries and observations upon the diseases of the mind (Alexander & Selesnick, 1966). He believed in a theory of neurosis. According to this theory, mental disorders were caused by overstimulation of the nervous system. Thus, environmental phenomena such as the pace of urban living, overuse of alcohol, excessive sexual behavior, masturbation, and smoking were all seen as causal factors in the development of mental disorders. As a result, asylums were seen as the appropriate way to treat insane patients. Asylums could provide the quiet and tranquility that was necessary to allow the nervous system to heal and to repair itself.

About the same time that Rush was writing about psychopathology in the USA, there was an important discovery in France that was to markedly influence thinking about mental disorders. In 1822, a physician named Bayle (Quetel, 1990) performed autopsies on a number of patients who presented with grandiose delusions and dementia (i.e., who had lost their mind, from the French de- (not) + ment (mind)). Bayle discovered that all of the patients in his study had marked changes in their brains. In addition to their dementia, all of these patients developed motor paralysis before they died. The brains of these patients had shrunk to almost half the weight of a normal brain, the skin of the brain (i.e., the meninges and the arachnoid) was thickened, and the color of the brain was strikingly different from that of normal brains. Bayle's name for this disorder was chronic arachnitis since he believed that this disorder was caused by a chronic infection of arachnoid tissue (Quetel, 1990). Later, the common name for this disorder was changed to dementia paralytica, a descriptive name that referred to the combined presence of a psychosis together with the progressive paralysis of the patient's limbs.

The discovery of dementia paralytica was the first instance in which a mental disorder had been shown to be associated with demonstrable changes in the brain. A number of autopsy studies appeared in the medical journals. These studies attempted to identify the exact neuropathology of this disorder as well as to understand its cause. Dementia paralytica was also a clinically serious disorder because it accounted for about one-sixth of all admissions to insane asylums during the nineteenth century. The prognosis of the disorder was very poor because death typically would occur within three years of the diagnosis (Austin, 1859/1976). For most of the nineteenth century, many different etiologies were proposed as potential causes of this disorder. Austin (1859/1976), for instance, listed the following as moral causes of dementia paralytica: death of a son, sudden loss of two good situations, wife's insanity, worry, and commercial ruin.

With the increasing interest in psychopathology during the nineteenth century, a number of classifications for mental disorders appeared. One example of these classifications was published by William A. Hammond (Blustein, 1991). Hammond, like Freud, was a nineteenth-century neurologist. As a young physician, Hammond had published a set of interesting experimental studies on human physiology. At the age of 34, he became Surgeon General for the US Army during the Civil War and was credited with a number of important innovations at the time, including hospital design, the development of an ambulance corps, and the removal of a mercury compound from the medical formulary. His political clashes with the Secretary of War, however, led to his court-martial and dismissal. After the Civil War, he moved to New York City and developed a lucrative private practice as a neurologist, a remarkable accomplishment during a time when most physicians were generalists. His interests extended to psychiatry and to writing novels as well as to physiology, studies of sleep, hypnosis, and the use of animal hormonal extracts. Hammond wrote extensively in scientific journals. He was one of the founders of the Journal of Nervous and Mental Disease, which is still published. In addition, he authored important textbooks of neurology and psychiatry.

In Hammond's textbook of mental disorders, he argued that there were six possible principles that could be used to organize a classification system:
(i) anatomical (organized by the part of the brain that is affected);
(ii) physiological (organized by the physiological system in the brain);
(iii) etiological (supposed causes);
(iv) psychological (based upon a functional view of the mind);
(v) pathological (observable, morbid alterations in the brain); and
(vi) clinical (descriptive, based upon clusters of symptoms).
Of these six principles, Hammond said that the anatomical, the physiological, and the pathological were the best, but he could not use them because the science of his time was insufficient. Hammond also rejected the etiological organization of categories, because he felt that an etiological classification, given nineteenth-century knowledge, would be too speculative. Thus, the main choice was between the clinical (descriptive) approach and the psychological (mentalistic) approach. Hammond preferred the latter because he thought that a classification which did not have a strong theoretical basis would fail.

Hammond adopted a functional view of psychology that was common in his day. He believed that mental functioning could be organized into four areas: perception, cognition, affect, and will. Hence, he organized his classification of mental disorders into six major headings:
(i) perceptual insanities (e.g., hallucinations);
(ii) intellectual insanities (e.g., delusional disorders);
(iii) emotional insanities (e.g., melancholia);
(iv) volitional insanities (e.g., abulomania);
(v) compound insanities (i.e., disorders affecting more than one area of the mind); and
(vi) constitutional insanities (i.e., disorders with specific causes such as choreic insanity).

There were a total of 47 specific categories of mental disorders that were organized under these six major headings. Most names of these specific categories would not be recognized by modern clinicians. The descriptions of these disorders, together with case histories that he included in this textbook, do, however, suggest that many of the disorders he was discussing would have modern counterparts. For instance, under the heading "intellectual insanities," Hammond classified four disorders whose names seem odd by modern standards: intellectual monomania with exaltation, chronic intellectual monomania, reasoning mania, and intellectual subjective morbid impulses. In modern terms, these disorders probably would be called bipolar I disorder (manic episode), schizophrenia (continuous), narcissistic personality disorder, and obsessive-compulsive disorder.

In Hammond's textbook, the lengthiest discussion was devoted to general paralysis, for which Hammond's name was dementia paralytica. As part of this discussion, Hammond included the proclamation by John Michler (quoted at the beginning of this chapter). In his discussion of general paralysis, Hammond emphasized the many medical symptoms associated with this disorder.

By the end of the nineteenth century, there were three broad theories about the etiology of this disorder. First, one school of thought believed that it was caused by alcoholism because the disorder primarily affected men, the age of onset was typically during the 30s and 40s (which is the same time of onset for severe alcoholism), and most men with dementia paralytica had substantial drinking histories. Second was the theory that the disorder was caused by syphilis. Epidemiological surveys had found that over 80% of men with dementia paralytica had had syphilis. However, since no survey had documented 100% with a history of syphilis, many investigators suggested that syphilis was an important precondition to the development of dementia paralytica but was not the single cause. Hammond, for instance, was clear that syphilis was not the etiology because syphilis was associated with other forms of insanity. The third theory was more psychological in that the disorder was believed to be caused by moral depravity because persons who drank, who frequented prostitutes, and who were in the military were more likely to have the disorder. As additional evidence, dementia paralytica was known to be rare among priests and Quakers.

Research was performed in an attempt to provide evidence for or against these theories. For instance, a famous German psychiatrist named Krafft-Ebing performed a study in which he injected serum from patients with syphilis into the bloodstreams of patients with dementia paralytica. Since it was believed at the time that a person could develop a syphilitic infection only once, if any of the patients with dementia paralytica developed syphilis, it would prove that they had not had syphilis previously. In that case, syphilis could not be the cause of the disorder. None of the 32 patients with dementia paralytica developed syphilis. Krafft-Ebing concluded that syphilis was the cause of this disorder.

The conclusive evidence regarding the etiology of dementia paralytica emerged in the early twentieth century. In 1906, the bacillus that causes syphilis was isolated, and the Wassermann test for syphilis was developed. Plaut (1911) demonstrated that patients with dementia paralytica had positive Wassermann tests from blood samples and also from samples of cerebrospinal fluid. In 1913, two Americans, Noguchi and Moore, were able to identify the presence of the syphilitic bacillus in the brains of patients who had died from dementia paralytica (Quetel, 1990). The name of this disorder was changed again to reflect the new understanding of this disorder. It was called paresis or general paralysis associated with tertiary syphilis of the
central nervous system. However, even after the discovery of the cause of dementia paralytica, the development of antibiotics to treat the disorder was another 30 years in the future. Thus, many patients were treated by inoculating them with malaria (Braslow, 1995).

4.03.4 KRAEPELIN

Another important development at the turn of the century was the international focus on the classificatory ideas of a German psychiatrist named Emil Kraepelin (Berrios & Hauser, 1988). Kraepelin was a researcher who initially learned his approach to research in a laboratory organized by Wundt, one of the founders of modern experimental psychology. After completing his medical degree, Kraepelin became the medical director for an insane asylum in East Prussia. While in that setting, Kraepelin published a number of experimental psychology studies on persons with mental disorders. He also began to write textbooks about psychopathology for German medical students. Like most textbook authors, Kraepelin organized his chapters around the major categories of mental disorders that he recognized. The sixth edition of Kraepelin's textbook (Kraepelin, 1902/1896) attracted major international attention because of two of its chapters.

One chapter was about a disorder that Kraepelin described as dementia praecox (praecox = early), which was a form of psychosis that had a typical age of onset in adolescence. Kraepelin recognized three subtypes of this disorder: hebephrenic, catatonic, and paranoid. Kraepelin's chapter on dementia praecox paralleled the immediately preceding chapter on dementia paralytica. Just as dementia praecox had three descriptive subtypes, dementia paralytica also had three descriptive subtypes: a depressed form, a grandiose form, and a paranoid form.

The second chapter to attract attention was what Kraepelin named manic-depressive insanity. Kraepelin's observations of patients in asylums had led him to believe that mania and depression (= melancholia) had the same type of course when these patients were observed over time. Both were episodic disorders. Moreover, nineteenth-century clinicians had recognized that there were some patients who went from episodes of mania to episodes of depression and vice versa. These observations led Kraepelin to hypothesize that mania and depression were essentially flip sides of the same coin. Hence, he combined what had been recognized since the times of the ancient Greeks as two disorders into one mental disorder.

In 1917, the newly formed American Psychiatric Association (APA) adopted a classification system that was quite similar to the classification contained in Kraepelin's sixth edition of his textbook. This early twentieth-century American classification included the concepts of dementia praecox and manic-depressive disorder. The classification also adopted the fundamental Kraepelinian distinction between the organic disorders, the functional psychoses, and the neurotic/character disorders.

In 1932, the APA officially adopted a new classification system as part of the Standard Classified Nomenclature of Diseases (APA, 1933). This new classification, however, did not attract much attention (Menninger, 1963).

4.03.5 EARLY DSMS AND ICDS

World War II led to a renewed emphasis on classification. During the war, nearly 10% of all military discharges were for psychiatric reasons. By the time the war ended, there were four major competing classification systems in use in the USA: (i) the standard system adopted by the APA in 1932; (ii) the US Army classification; (iii) the US Navy classification; and (iv) the Veterans Administration system (Raines, 1952). In response to this disorganization, the APA formed a task force to create a system that would become standard in the USA. The result was the Diagnostic and statistical manual of mental disorders (DSM; APA, 1952). This classification is usually known by its abbreviated name: DSM-I. The DSM-I contained 108 different diagnostic categories.

The DSM-I was important for a number of reasons. First, the major rationale behind the DSM-I was to create a classification system that represented a consensus of contemporary thinking. Care was taken to include all diagnostic concepts that were popular in American psychiatry during the 1940s and 1950s. Thus, the DSM-I emphasized communication among professionals as the major purpose of classification and emphasized the need for psychiatric classification to be an accepted nomenclature that members of a profession can use to discuss their clinical cases. Consistent with this emphasis on communication, early versions of the DSM-I were revised, based on comments elicited in a questionnaire sent to 10% of the members of the APA. The DSM-I was finally adopted by a vote of the membership of the APA (Raines, 1952).

The emphasis on communication in the DSM-I led to a similar organizing movement at an international level. The international
psychiatric community had adopted a classification of mental disorders that was part of the International statistical classification of diseases, injuries and causes of death (6th ed.) (ICD-6; World Health Organization (WHO), 1948). The first ICD had been created in 1900 and was a medical classification of causes of death. The ICD-6 was the first edition to include all diseases, whether they led to death or not.

The classification of mental disorders in the ICD-6 did not gain broad acceptance. A committee, chaired by the British psychiatrist Stengel, was formed to review the classification systems used by various countries and to make any necessary recommendations for changes to the WHO. What Stengel (1959) found was a hodgepodge of diagnostic systems between, and sometimes within, different countries. Stengel despaired over the confused state of international classification and said that the ICD-6 did not serve as a useful nomenclature. A positive note in his review, however, concerned the DSM-I, which Stengel considered an advance over the other national classifications because of its emphasis on representing a well-organized nomenclature for a country.

As a result of Stengel's review, there was an international movement to create a consensual system that would be adopted by the WHO. The final product was the mental disorders section of the ICD-8. In the USA, the APA revised its DSM classification to correspond with the ICD-8. The US version of the ICD-8 was known as the DSM-II (APA, 1968). The DSM-II had 185 categories. These categories were subdivided by a hierarchical organizational system. First, there was a distinction between psychotic and nonpsychotic disorders. The psychotic disorders were further subdivided into organic and nonorganic disorders. The classification of the organic disorders was in terms of etiology (e.g., tumors, infections, heredity, etc.). The nonorganic psychotic disorders primarily contained the Kraepelinian categories of schizophrenia and manic-depressive insanity. The nonpsychotic disorders were subdivided into eight subheadings including the neuroses (now called anxiety disorders), personality disorders, mental retardation, etc.

4.03.6 NEO-KRAEPELINIANS

After the publication of the DSM-II, psychiatric classification became a very unpopular topic. There were three general lines of criticism that were aimed at classification. First, the diagnosis of mental disorders was unreliable as shown by empirical research. Second, a number of critics attacked the implicit medical model approach to psychopathology that was associated with the DSM-I and DSM-II. Third, sociologists became interested in a theory of labeling that suggested that the process of classification stigmatized human beings who adopted unusual patterns of behavior and that the act of diagnosis could lead to self-fulfilling prophecies.

The first of these criticisms was summarized in three different review articles by Kreitman (1961), Zubin (1967), and Spitzer and Fleiss (1974). All discussed various problems associated with the classification of mental disorders and why poor reliability was a significant issue. Zubin (1967) made an excellent case that the lack of uniform statistical procedures for estimating reliability was a serious methodological problem. Spitzer and Fleiss (1974) suggested a solution to this problem and showed how this solution could be applied retrospectively to earlier data. Kreitman (1961) probably had the most far-reaching analysis of the issue because he said that the unreliability problem had been overemphasized and that the more serious issue was the unexplored validity of diagnostic concepts.

The second criticism of the early DSMs and ICDs was the implicit acceptance of a medical model. Despite the dramatic etiological solution to dementia paralytica, most of the twentieth-century research has been disappointing to those who believed that mental disorders are caused by underlying biological processes. Large amounts of research have attempted to discover the etiology of disorders such as schizophrenia, yet, despite interesting advances, a clear understanding of the cause of this disorder is not available. A psychiatrist named Thomas Szasz published a book titled The myth of mental illness (Szasz, 1961). He argued that mental disorders are not diseases, but instead are better conceptualized as "problems in living." He argued that psychiatrists had placed themselves into the role of moral policemen to control individuals with deviant behavior patterns. Szasz is now considered one of a group of critics of classification known by the title "antipsychiatrists." Others in this group are the British psychiatrist Laing (1967), the French psychoanalyst Lacan (1977), and recent authors such as Sarbin (1997) and Kirk and Kutchins (1992).

The third criticism of classification that became popular in the 1960s and 1970s was the labeling criticism. Sociologists such as Matza (1969) and Goffman (1961) suggested that the act of psychiatric diagnosis could lead to self-fulfilling prophecies in which the behavior of deviant individuals was constrained to become even more deviant. A dramatic demonstration of
the labeling criticism was contained in a controversial paper published by Rosenhan (1973). In this study, Rosenhan and his colleagues gained admission to mental hospitals even though they reported everything about themselves factually except their names and one auditory hallucination. All but one of these pseudopatients were admitted with a diagnosis of schizophrenia and all were released with a diagnosis of schizophrenia in remission. The pseudopatients commented that most of the patients were aware that they did not belong there, even though the hospital staff never figured that out. In addition, the experiences of the pseudopatients supported the labeling concern. For instance, one pseudopatient reported being bored on a ward and walking around. A nurse noticed him pacing and asked if he was feeling anxious. Following the publication of the Rosenhan paper, an issue of the Journal of Abnormal Psychology in 1975 was devoted to commentaries on this controversial paper.

Partially in reaction to these criticisms of classification, a new school of thought, called the neo-Kraepelinians, was formed in psychiatry (Klerman, 1978). This group of psychiatrists, initially an active collection of psychiatric researchers at Washington University in St. Louis, believed that psychiatry, with its psychoanalytic emphasis, had drifted too far from its roots in medicine. The neo-Kraepelinians emphasized that psychiatry should be concerned with medical diseases, that extensive research was needed on the biological bases of psychopathology, and that much more emphasis needed to be placed upon classification if knowledge about psychopathology was to grow. Klerman (1978) summarized the perspective implicit in the neo-Kraepelinian approach to psychiatry by listing the following tenets:
(i) Psychiatry is a branch of medicine.
(ii) Psychiatry should utilize modern scientific methodologies and base its practice on scientific knowledge.
(iii) Psychiatry treats people who are sick and who require treatment.
(iv) There is a boundary between the normal and the sick.
(v) There are discrete mental illnesses. Mental illnesses are not myths. There is not one, but many mental illnesses. It is the task of scientific psychiatry, as of other medical specialties, to investigate the causes, diagnosis, and treatment of these mental illnesses.
(vi) The focus of psychiatric physicians should be particularly on the biological aspects of mental illness.
(vii) There should be an explicit and intentional concern with diagnosis and classification.
(viii) Diagnostic criteria should be codified, and a legitimate and valued area of research should be to validate such criteria by various techniques. Further, departments of psychiatry in medical schools should teach these criteria and not depreciate them, as has been the case for many years.
(ix) In research efforts directed at improving the reliability and validity of diagnosis and classification, statistical techniques should be utilized.

In 1972, a group of psychiatric researchers at Washington University published a paper entitled "Diagnostic criteria for use in psychiatric research" (Feighner, Robins, Guze, Woodruff, Winokur, & Munoz, 1972). This paper listed 15 mental disorders that the authors believed had sufficient empirical evidence to establish their validity, and listed a set of diagnostic criteria for defining these disorders. The authors argued that a major problem in research about these disorders had stemmed from the lack of uniform definitions of the disorders across researchers. They suggested that future research on any of these disorders should utilize the diagnostic criteria proposed in their paper.

The paper by Feighner et al. had a dramatic impact on American psychiatry. It was a heavily cited paper, probably the most frequently referenced journal article of the 1970s in psychiatry. The diagnostic criteria were immediately adopted and became the standard for psychiatric research. Moreover, the 15 categories described by Feighner et al. were expanded into a much larger set of categories, focusing primarily on the schizophrenic and affective disorders (Spitzer, Endicott, & Robins, 1975). This new classification was called the Research Diagnostic Criteria (RDC) and had an associated structured interview known as the SADS. Since the lead author of the RDC, Robert Spitzer, had been appointed as the new chairperson responsible for organizing the DSM-III, the RDC became the initial foundation from which the DSM-III developed.

4.03.7 DSM-III, DSM-III-R, AND DSM-IV

The DSM-III (APA, 1980) was a revolutionary classification. First, unlike the DSM-I and DSM-II, which had emphasized using consensus as the major organizing principle, the DSM-III attempted to be a classification based on scientific evidence rather than clinical consensus. For instance, the classification of depression was very different from the view of depression in the DSM-I and DSM-II, largely because of family history data gathered in research studies performed by the neo-Kraepelinians. In the earlier
DSMs, the primary separation of affective disorders was in terms of a psychotic vs. neurotic distinction. The DSM-III dropped this differentiation and, instead, emphasized a separation of bipolar vs. unipolar mood disorders. Second, the DSM-III discontinued the use of prose definitions of the mental disorders. The neo-Kraepelinians were impressed by the research data suggesting that the reliability of psychiatric classification, as represented in the DSM-I and DSM-II, was less than optimal (Spitzer & Fleiss, 1974). To help improve diagnostic reliability, virtually all mental disorders in the DSM-III were defined using diagnostic criteria stimulated by the innovative system used in the Feighner et al. paper. Third, the DSM-III was a multiaxial classification. Because the DSM-I and DSM-II were "committee products," the subsections of these classifications had different implicit organizing principles. For instance, in the DSM-I/DSM-II, the organic brain syndromes were organized by etiology, the psychotic disorders were organized by syndromes, and the neurotic disorders were organized according to ideas from psychoanalytic theory. In order to avoid the confusion inherent in the use of multiple organizing principles, the DSM-III adopted a multiaxial system that permitted the separate description of the patient's psychopathology syndrome (Axis I), personality style (Axis II), medical etiology (Axis III), environmental factors (Axis IV), and role disturbances (Axis V). The DSM-III was published in 1980 and contained 265 mental disorders. Moreover, the size of the manuscript for the DSM-III was 482 pages, a huge increase over the 92 pages of the DSM-II.

The revolutionary impact of the DSM-III led to changes in many areas of the mental health professions. One area of impact was in terms of research. As soon as versions of the DSM-III began to be disseminated to researchers interested in mental disorders, new studies began to appear that explored the adequacy of the diagnostic criteria used in this classification. The DSM-III was a marked stimulus for descriptive research in psychiatry and in the other mental health professions. Another area of impact was political. There was a major controversy that erupted in the late 1970s over the issue of whether the term "neurosis" should appear in the DSM-III. Spitzer and the neo-Kraepelinians had exorcized this term from the classification because of its psychoanalytic associations. The psychoanalysts lobbied intensely within the APA to have the term reintroduced. Although a compromise was achieved (Bayer, 1981), the psychoanalysts lost ground in this struggle, and their influence in organized psychiatry has continued to wane since that time. A third area of impact for the DSM-III was economic. The DSM-III became very popular, sold well in the USA, and even became a surprisingly large seller to the international community as translations appeared. The sizeable revenues that accrued to the APA led to the formation of the American Psychiatric Press, which published subsequent versions of the DSM as well as many other books of clinical interest.

Despite its innovations and generally positive acceptance by mental health professionals, a number of criticisms were leveled at the DSM-III. One focus of criticism concerned the diagnostic criteria. Despite the intention to make decisions about the classificatory categories using scientific evidence, most diagnostic criteria were based on the intuitions of experts in the field. In addition, even though the goal when formulating diagnostic criteria was to make them as behavioral and explicit as possible, not all criteria met this goal. Consider part of the DSM-III criteria for histrionic personality disorder, for instance.

    Characteristic disturbances in interpersonal relationships as indicated by at least two of the following:
    1) perceived by others as shallow and lacking genuineness, even if superficially warm and charming
    2) egocentric, self-indulgent and inconsiderate of others
    3) vain and demanding
    4) dependent, helpless, constantly seeking reassurance
    5) prone to manipulative suicidal threats, gestures or attempts

Note that common language terms such as "shallow" are highly subjective. In addition, a criterion such as 3) requires an inference about motivations and reasons for behavior, rather than direct observation of behaviors. Finally, subsequent research showed that criterion 5) above actually was observed more frequently in borderline rather than histrionic patients (Stangl, Pfohl, Zimmerman, Bowers, & Corenthal, 1985).

A second major criticism of the DSM-III concerned the multiaxial system. First, diagnosing multiple axes required increased time and effort by clinicians, an exercise they were unlikely to do unless they were certain that the gain in information was significant. Second, the relative emphasis on these five axes in the DSM-III differed considerably. In the DSM-III manual, almost 300 pages were devoted to defining the Axis I disorders, another 39 pages were spent on Axis II disorders, whereas only two pages each were devoted to Axes IV and V.

Moreover, Axes I and II were assessed using diagnostic categories, whereas Axes IV and V were measured using relatively crude, ordinal rating scales. Third, the choice of particular axes was also criticized. For instance, Rutter, Shaffer, and Shepard (1975) had advocated the use of one axis for the clinical syndrome in childhood disorders with a second focusing on intellectual level. Instead, both clinical syndromes and mental retardation were Axis I disorders in the DSM-III. A group of psychoanalysts argued that defense mechanisms should have been included as a sixth axis. Psychiatric nurses advocated an additional axis for a nursing diagnosis, relevant to the level of care required by a patient.

Only seven years later, a revision to the DSM-III was published. This version, known as the DSM-III-R (APA, 1987), was intended primarily to update diagnostic criteria using the research that had been stimulated by the DSM-III. It was called a revision because the goal was to keep the changes small. However, the differences between the DSM-III and the DSM-III-R were substantial. Changes included renaming some categories (e.g., paranoid disorder was renamed delusional disorder), changes in specific criteria for disorders (e.g., the criteria for schizoaffective disorder), and reorganization of categories (e.g., panic disorder and agoraphobia were explicitly linked). In addition, six diagnostic categories originally in the DSM-III were deleted (e.g., egodystonic homosexuality and attention deficit disorder without hyperactivity) while a number of new specific disorders were added (e.g., body dysmorphic disorder and trichotillomania). As a result, the DSM-III-R contained 297 categories compared to the 264 categories in the DSM-III.

Associated with the DSM-III-R was the development of a major controversy that had political overtones. Among the changes proposed for the DSM-III-R was the addition of three new disorders: premenstrual syndrome (PMS), masochistic personality disorder and paraphilic rapism. These additions raised the ire of a number of groups, especially feminists. Concerning PMS, feminists argued that the inclusion of this disorder in the DSM would carry the implicit assumption that the emotional state of women can be blamed on their biology. If it were to be a disorder, the feminists argued that PMS should be classified as a gynecological disorder rather than a psychiatric disorder. Masochistic personality disorder had been suggested for inclusion by psychoanalysts who pointed to the extensive literature on this category. Feminists, however, believed that this diagnosis would be assigned to women who had been physically or sexually abused. Thus, this diagnosis would have the unfortunate consequence of blaming these women for their roles as victims. Finally, the proposal to include paraphilic rapism was also attacked. The critics argued that this diagnosis would allow chronic rapists to escape punishment for their crimes because their behaviors could be attributed to a mental disorder. Thus, these men would not be held responsible for their behaviors.

Because of the ensuing controversy, a compromise somewhat similar to the earlier compromises about homosexuality and neurosis was attempted. The authors of the DSM-III-R revised the names of the first two disorders (PMS and masochistic personality disorder) to periluteal phase dysphoric disorder and self-defeating personality disorder. They also deleted the proposal to add paraphilic rapism. In addition, another disorder, sadistic personality disorder, was added, presumably to blame abusers as well as victims, thereby balancing the potential antifeminine connotations of self-defeating/masochistic personality disorder. This compromise was not successful. As a result, the executive committee for the American Psychiatric Classification decided not to include these categories in the body of the DSM-III-R. Instead, they were placed in an appendix as disorders needing more research (Walker, 1987).

The DSM-IV was published in 1994, contained 354 categories, and was 886 pages in length, a 60% increase over the DSM-III and almost seven times longer than the DSM-II (APA, 1994). There are 17 major categories in the DSM-IV:

disorders usually first diagnosed in childhood
cognitive disorders
mental disorders due to a general medical condition
substance-related disorders
schizophrenia and other psychotic disorders
mood disorders
anxiety disorders
somatoform disorders
factitious disorders
dissociative disorders
sexual disorders
eating disorders
sleep disorders
impulse control disorders
adjustment disorders
personality disorders
other conditions that may be a focus of clinical attention

The DSM-IV retained a multiaxial system and recognized five axes (dimensions) along which patient conditions should be coded:

Axis I: clinical disorders
Axis II: personality disorders/mental retardation
Axis III: general medical conditions
Axis IV: psychosocial and environmental problems
Axis V: global assessment of functioning

As with the DSM-III-R, a major focus in the DSM-IV revision concerned diagnostic criteria. A total of 201 specific diagnoses in the DSM-IV were defined using diagnostic criteria. The average number of criteria per diagnosis was almost eight. Using this estimate, the DSM-IV contains slightly over 1500 diagnostic criteria for the 201 diagnoses.

To give the reader a glimpse of how the diagnostic criteria have changed from the DSM-III to the DSM-IV, the criteria for histrionic personality disorder are listed below:

A pervasive pattern of excessive emotionality and attention seeking, beginning by early adulthood and present in a variety of contexts, as indicated by five (or more) of the following:
(1) is uncomfortable in situations in which he or she is not the center of attention
(2) interaction with others is often characterized by inappropriate sexually seductive or provocative behavior
(3) displays rapidly shifting and shallow expression of emotions
(4) consistently uses physical appearance to draw attention to oneself
(5) has a style of speech that is excessively impressionistic and lacking in detail
(6) shows self-dramatization, theatricality, and exaggerated expression of emotion
(7) is suggestible, i.e., easily influenced by others or circumstances
(8) considers relationships to be more intimate than they actually are

In addition to presenting diagnostic criteria, the DSM-IV contains supplementary information about the mental disorders in its system. For instance, there are three pages of information about histrionic personality disorder including

diagnostic features (prose description of symptoms)
associated features and disorders (mental disorders that are likely to co-occur)
specific culture, age and gender features
prevalence
differential diagnosis (how to differentiate the disorder from others with which it is likely to be confused)

In order to help ensure that the DSM-IV would be the best possible classification of mental disorders, the steering committee for this classification contained 27 members, including four psychologists. Reporting to this committee were 13 work groups composed of 5–16 members. Each work group had a large number of advisors (typically over 20 per work group). There were three major steps associated with the activities of each work group. First, all work groups performed extensive literature reviews of the disorders under their responsibility. Many of these literature reviews were published in the journal literature. Second, the work groups solicited contributions of descriptive data from researchers in the field. Using these data, the work groups reanalyzed the data to decide which diagnostic criteria needed revision. Third, a series of field trials was performed on specific topics. For instance, the personality disorders work group performed a multicenter study on antisocial personality disorder which led to a significant alteration in the diagnostic criteria for that disorder.

The DSM-IV was not without controversy. For instance, the issues that had been raised in the DSM-III-R regarding premenstrual syndrome, masochistic personality disorder, and sadistic personality disorder continued in the DSM-IV. Interestingly, none of these three disorders were included in the DSM-IV. In fact, two (masochistic and sadistic personality disorders) completely disappeared from the classification. PMS remained in an appendix as a disorder "for further study." Interestingly, 17 other disorders joined PMS in this appendix as did three possible new axes for the multiaxial system (defense mechanisms, interpersonal functioning, and occupational functioning).

Earlier editions of the DSM had few, if any, references to document the sources for any factual claims in these classifications. The DSM-IV attempted to overcome this problem by publishing a five-volume companion set of sourcebooks. These sourcebooks are edited papers by members of the work groups. The intent of the sourcebooks is to provide a scholarly basis for understanding the specific decisions that were made by the work groups.

4.03.8 ICD-9, ICD-9-CM, AND ICD-10

Earlier in this chapter, the point was made that the DSM-II and ICD-8 were virtually identical because the American psychiatric community had joined an international movement to create a consensual classification. With the revolutionary DSM-III, American psychiatry reversed itself and created a radically new classification based upon the purpose of

description, rather than emphasizing a system that would be acceptable world-wide.

The editions of the ICDs were intended to be revised every 10 years. The ICD-8 was published in 1966; the ICD-9 came out in 1977. The mental disorders section of the ICD-9 was very similar to the ICD-8/DSM-II (WHO, 1978). The psychotic/nonpsychotic distinction was the primary hierarchical distinction among categories. The psychotic disorders were further subdivided into organic and functional psychoses. There were 215 categories in this system, and the ICD-9 was published in a monograph that was 95 pages in length.

The USA has signed an international treaty that obliges it to use the ICD as the official medical classification. Thus, when the DSM-III was created, an odd numeric coding scheme was adopted so that the DSM-III categories could be incorporated within the ICD-9 framework. To understand this, below is an overview of the specific diagnostic categories under the general heading of anxiety disorders in the DSM-III:

Phobic disorders
  300.21 Agoraphobia with panic attacks
  300.22 Agoraphobia without panic attacks
  300.23 Social phobia
  300.29 Simple phobia
Anxiety states
  300.01 Panic disorder
  300.02 Generalized anxiety disorder
  300.30 Obsessive compulsive disorder
Post-traumatic stress disorder
  308.30 Acute
  309.81 Chronic or delayed
300.00 Atypical anxiety disorder

Notice that the coding scheme for the anxiety disorders in the DSM-III is not what one might expect. Most of the anxiety disorders are coded with 300.xx numbers. However, the two forms of post-traumatic stress disorder are coded 308.30 and 309.81. Notice also that the first number after the decimal point is somewhat irregular. The phobic disorders, listed first in the DSM-III, are given 300.2x codes whereas the anxiety states are coded 300.0x or 300.30.

To understand why the DSM-III codes appear this way, below is a listing of the specific neurotic disorders in the ICD-9:

Mental disorders (290–319)
  Nonpsychotic mental disorders (300–316)
    Neurotic disorders (300)
      300.0 Anxiety states
      300.1 Hysteria
      300.2 Phobic states
      300.3 Obsessive-compulsive disorders
      300.4 Neurotic depression
      300.5 Neurasthenia
      300.6 Depersonalization syndrome
      300.7 Hypochondriasis
      300.8 Other neurotic disorders
      300.9 Unspecified

In the ICD-9 system, all diagnoses have four-digit codes. The codes for all mental disorders range from 290 to 319. The 29x disorders are the psychotic disorders; 300–315 are reserved for nonpsychotic disorders; and 316–318 are codes for classifying mental retardation. The first subheading under the nonpsychotic disorders is the neurotic disorders. Notice that this subheading includes what the DSM-III recognizes as anxiety disorders but it also includes categories that the DSM-III placed under other headings (e.g., neurotic depression = dysthymic disorder and depersonalization syndrome). Because the DSM-III anxiety disorders were mostly found under the ICD-9 neurotic disorders, these categories were given 300.xx codes. However, post-traumatic stress disorder (chronic; PTSD) was given a code number of 309.89 because it was included under the general ICD-9 heading of adjustment reactions. Notice also that PTSD has an xxx.8 coding. In the ICD-9 coding system, all diagnoses with an xxx.8 code represent country-specific categories that are not generally recognized by the international psychiatric community. Thus, PTSD (309.89) is a US category that has no clear equivalent in the international diagnostic system. Another DSM-III category with a coding that reflects a similar status is borderline personality disorder (301.83).

In order to blend the DSM-III with the ICD-9 so that a consistent coding system would be used, a US version of the ICD-9 was created. This new version was the ICD-9-CM (where CM stands for "clinical modification"). The ICD-9-CM is the official coding system that all physicians and mental health professionals must use when assigning diagnostic codes. However, American clinicians do not need to refer to the ICD-9-CM because the applicable codes are listed in the printed versions of the DSM-III, DSM-III-R and DSM-IV.

As noted earlier, the DSM-III and its successors (DSM-III-R and DSM-IV) were resounding successes. Not only did these systems become dominant in the USA, but they also achieved substantial popularity among European mental health professionals (Mezzich, Fabrega, Mezzich, & Coffman, 1985). The proponents of the ICD were somewhat resentful (Kendell, 1991). Thus, when the ICD-10 was created, substantial changes were made that utilized innovations from the DSM-III.
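As an aside on the ICD-9 coding rules described earlier in this section, the numeric conventions can be illustrated with a short sketch. It is only a sketch under stated assumptions: the three ranges and the xxx.8 convention are taken from the prose of this section rather than from the ICD-9 manual itself, and the names `ICD9_RANGES_FROM_TEXT`, `CodeInfo` and `describe` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass

# Ranges exactly as stated in the text of this section (an assumption of the
# sketch; they are not checked against the ICD-9 manual).
ICD9_RANGES_FROM_TEXT = [
    (290, 299, "psychotic disorders"),
    (300, 315, "nonpsychotic disorders"),
    (316, 318, "mental retardation"),
]


@dataclass
class CodeInfo:
    code: str        # the code as written, e.g. "309.81"
    category: int    # the three digits before the decimal point
    group: str       # broad grouping according to the ranges above
    is_xxx8: bool    # True when the first digit after the decimal point is 8


def describe(code: str) -> CodeInfo:
    """Split a code such as '300.21' and look up its broad grouping."""
    head, _, tail = code.partition(".")
    category = int(head)
    group = "not covered by the ranges given in the text"
    for low, high, label in ICD9_RANGES_FROM_TEXT:
        if low <= category <= high:
            group = label
            break
    return CodeInfo(code, category, group, tail.startswith("8"))


if __name__ == "__main__":
    # Codes quoted in this section for DSM-III categories.
    for code in ["300.21", "300.01", "308.30", "309.81", "301.83"]:
        print(describe(code))
```

Under these assumptions, the chronic PTSD code and the borderline personality disorder code are the ones flagged by `is_xxx8`, which matches the point made in the text about categories without a clear international equivalent.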

First, like the DSM-III, the ICD-10 went through extensive field testing (Regier, Klaeber, Roper, Rae, & Sartorius, 1994). There were two major goals in the field testing: (i) to ensure that the ICD-10 could be used in a reliable way across clinicians, and (ii) to examine the acceptability of the mental disorder categories contained in this system. The data reported regarding both of these goals have given a favorable view of the ICD-10.

The second important innovation of the ICD-10 is that the mental disorders section is published in two forms. One form, the blue manual, is subtitled Clinical descriptions and diagnostic guidelines (WHO, 1992). The blue manual contains prose definitions of categories and is primarily intended for clinical use. The other, the green manual, is like the DSM-III in that the categories are defined using explicit diagnostic criteria with rules regarding how many criteria must be met in order for a diagnosis to be made (WHO, 1993). The green manual is intended for research use.

The complete ICD-10 is organized into a series of 21 chapters, one of which is Chapter V (labeled with the prefix F) about "Mental and behavioural disorders." Other chapters in the ICD-10 are:

Chapter I    A–B  Infections and parasitic diseases
Chapter II   C–D  Neoplasms
Chapter X    J    Diseases of the respiratory system
Chapter XXI  Z    Factors influencing health status and contact with health services

In terms of classificatory size, the mental disorders section of the ICD-10 and the DSM-IV are reasonably similar. The DSM-IV contains 354 categories organized under 17 major headings. The ICD-10 has 389 categories that are structured under 10 major headings. One ironic feature of the ICD-10 is that it did not adopt a multiaxial system of classification. This decision is ironic because the idea originated in Europe, most prominently by a Swedish psychiatrist, Essen-Moller (1971).

4.03.9 CONTROVERSIES AND ISSUES

Although the DSM-III and its successors are usually viewed as substantial improvements in the classification of mental disorders, a number of controversies and issues still remain concerning the classification of mental disorders. The remainder of this chapter attempts to provide an overview of some of these issues.

4.03.9.1 Organizational Models of Classification

There are four organizational models of classification that have been frequently discussed in the literature. Often these models are seen as competing and incompatible. For instance, most discussions of the first two models, the categorical and the dimensional, are presented as if the mental health professions must choose between them. However, hybrid models are possible (Skinner, 1986) and perhaps even the most probable.

Mental disorders are usually discussed as if they are categories. Thus, a categorical model is often implicitly assumed to be the structural model for psychopathology (Kendell, 1975). The tenets of this model are listed below:

1.1 The basic objects of a psychiatric classification are patients
1.2 Categories should be discrete, in the sense that the conditions for membership should be able to be clearly ascertained
1.3 Patients either belong or do not belong to specified classes (categories)
1.4 The members of a category should be relatively homogeneous
1.5 Categories may or may not overlap
1.6 In the borderline areas where categories may overlap, the number of overlapping patients should be relatively small
1.7 Cluster analytic methods can be used to initially identify categories
1.8 Discriminant analysis is used to validate categories. (Blashfield, 1991, p. 14)

The DSMs, particularly the DSM-III and its successors, are seen as fitting a categorical model. According to the categorical model, the unit of analysis is the patient. Diagnoses refer to classes of patients. Patients either are or are not the members of the categories. The categorical model assumes that some type of definitional rule exists by which the membership in a category can be determined. Moreover, membership in a category is considered to be a discrete, all-or-nothing event. An animal is either a cat or not a cat. A patient is either a schizophrenic or not a schizophrenic.

An important assumption of the categorical model is that members of a category are relatively homogeneous. All animals that belong to the class of "birds" should be reasonably similar morphologically. This is not to say that all birds must be alike. Certainly a robin and sparrow have a number of obvious differences. Yet they are more like each other than either is to a lynx, for instance. In the same way, two schizophrenic patients should be relatively similar. Both may have different symptom

pictures, but their symptom pictures should be more similar to each other than either is to an antisocial patient (Lorr, 1966).

Classes in a categorical model may or may not overlap. Most uses of the categorical model typically assume that nonoverlapping categories are an ideal condition, but recognize that this condition may not always happen. Thus, overlap among categories is often treated like measurement error, a condition to be tolerated, but minimized. However, there are categorical models that have been developed in which the categories are assumed to overlap (Jardine & Sibson, 1971; Zadeh, 1965). In these models, overlap is not error. Categories are fuzzy sets whose boundaries of membership do not need to result in mutually exclusive groupings.

According to the assumption of relative homogeneity, the number of patients who clearly belong to one and only one category should be relatively frequent, whereas patients who fall in the overlapping, borderline areas between categories should be relatively infrequent. A sparrow–lynx should not occur if categories are to have the necessary homogeneity that allows them to be separable constructs (Grinker, Werble, & Drye, 1968).

Finally, the methods that have been developed to find the boundaries among categories are called cluster analytic methods (Everitt, 1974). Generally, these methods analyze a large matrix of descriptive data on patients, and attempt to form clusters (categories) of relatively homogeneous patients in terms of the descriptive variables that were gathered on the patients. Cluster analysis was used in the 1960s and 1970s to create new descriptive classifications. In the last decade, most researchers have abandoned the use of these methods because of pragmatic difficulties in their application and because of unsolved statistical issues. Meehl, however, has developed a related method for isolating categories that he believes has promise (Meehl, 1995).

Although the categorical model, as presented above, seems to be a common sense model of psychiatric classification, the recent DSMs clearly do not adhere to this model. First, as noted above, a categorical model assumes that the unit of analysis is the patient and that groups of patients are members of similar sets called mental disorders. The authors of the DSM-III, DSM-III-R and DSM-IV explicitly reject this approach. They state that these classifications are not intended to classify individual patients (Spitzer & Williams, 1987). Instead, these recent DSMs state that they are classifying disorders (rather than patients).

A second structural model is the dimensional model. The tenets for this model are:

2.1 The basic unit of the dimensional model is a descriptive variable (e.g., a symptom, a scale from a self-report test, a laboratory value, etc.)
2.2 Dimensions refer to higher-order, abstract variables
2.3 A dimension refers to a set of correlated descriptive variables
2.4 There are a relatively small number of dimensions compared to the number of descriptive variables, yet the dimensions account for almost as much reliable variance as do the larger number of descriptive variables
2.5 Dimensions themselves may be correlated or independent
2.6 The methods used to identify dimensions are exploratory factor analysis and multidimensional scaling. Confirmatory factor analysis can be used to test a specific dimensional model. (Blashfield, 1991, p. 15)

For the dimensional model, the basic units of analysis are the descriptive variables. Thus, the dimensional model focuses on symptoms, behaviors, diagnostic criteria, scales from self-report tests, and the like. The dimensional model summarizes these descriptive variables by forming higher-order abstract variables that can serve to represent the original measurement variables. Each of these higher-order abstract variables constitutes a dimension through its conceptualization as occurring on a continuum. Patients can have scores anywhere along these dimensions.

A major test of a dimensional model is parsimony. The specific dimensions in such a system should account for most of the systematic, reliable variance that exists within the original set of descriptive variables. If the dimensions do not account for the reliable variance in the original descriptive variables, then using the dimensions will sacrifice a great deal of information and the original variables should be used rather than the smaller set of dimensions.

A third structural model that is often discussed regarding the classification of mental disorders is a disease model. The basic assumption in this model is that all diagnostic categories refer to medical diseases (Wing, 1978). In effect, this model is a modern extension of Griesinger's famous nineteenth-century dictum that all mental disorders are diseases of the brain (Stromgren, 1991). The tenets of the disease model are:

3.1 The fundamental units are biological diseases of individual patients (essentialism)
3.2 Each diagnosis refers to a discrete disease
3.3 Diagnostic algorithms specify objective rules for combining symptoms to reach a diagnosis
3.4 Adequate reliability is necessary before any type of validity can be established

3.5 Good internal validity will show that the category refers to clearly described patterns of symptoms
3.6 Good external validity will mean that the diagnosis can be used to predict prognosis, course and treatment response. (Blashfield, 1991, p. 8)

Some authors have assumed that a categorical model and a disease model are the same. These models are not identical. A categorical model is neutral about the existential status of the categories in its system. A disease model adopts a stronger view. Diseases do have an existential status. Diseases are real. The goal of medical research is to identify, describe, understand and eventually treat these diseases. This belief in the reality of diseases is associated with a broader view about the status of scientific concepts known as essentialism.

Notice also that diseases are not necessarily categorical, at least as this model was described above. For instance, more than one disease can occur in the same patient. In fact, some diseases are very likely to co-occur (e.g., certain sarcomas have high frequency in patients with AIDS). Thus, diseases do not refer to mutually exclusive sets. In addition, there are diseases that are conceptualized as dimensional constructs. Hypertension is the most common example. Patients with hypertension vary along a continuum. A categorical scaling of this continuum is possible, but imposing a categorical separation on this continuum is arbitrary.

The fourth model of classification is the prototype model. Cantor, Smith, French, and Mezzich (1980) have suggested that this model is superior to the implicit categorical model of psychiatric classification. For those readers who do not know what the prototype model is, the easiest way to conceptualize this model is through an example.

According to the prototype model, if a mother wanted to teach a child what "anger" means, she would not say "Steven, you need to understand that anger is an emotion that many of us feel when we are frustrated or upset." Instead, at some point when little Steven is upset because he has to go to bed and he tries to hit his mother, she would say "I know that you are angry, Steven, but it is time to go to bed." And on another day, when Steven is upset because another child took one of his toys, his mother might say, "You are feeling angry, Steven. Being angry is natural, but you should not hit the other child. Maybe if you ask Carol she will return your toy."

In effect, a child learns a concept by being presented with instances of the concept. Once the child is able to associate the instances with a verbal label (i.e., the word "angry"), then the child will have a grasp of the concept. Later, the child will learn to abstract the essential features of the concept. This occurs by making observations about similarities that occur among instances of the concept (e.g., internal feelings, interpersonal context, etc.). Russell and Fehr (1994) provide an interesting and more complete discussion of the concept of anger from a prototype perspective.

Another important aspect of the prototype model is the idea that not all instances of a concept are equally good representatives of the concept. A robin, for instance, is a good exemplar of a bird. Robins are about the same size as most birds; they have feathers; they can fly; etc. Penguins, however, are not a good exemplar. Penguins are larger than most birds; they cannot fly; they do have feathers, although, to a child, their covering probably seems more like fur than feathers; etc.

The above presentation of the prototype model is easy to understand and seems like a common-sense view of classification. However, advocates of the prototype model argue that this model is radically different from a categorical model (Barasalou, 1992; Russell, 1991). According to the categorical model, classificatory concepts are defined by listing the features that are sufficient for making a diagnosis. If a given instance has a sufficient number of these features, then that instance is a member of the classificatory concept. Moreover, all members of a concept are equal. A square is a square. One square is not squarer than another square. In contrast, the prototype model does stipulate that some instances of a concept are better exemplars than others. The Glenn Close character in Fatal Attraction is a better representation of borderline personality disorder than the Jessica Walter character in Play Misty for Me.

The basic tenets of the prototype model are presented below:

4.1 Diagnoses are concepts that mental health professionals use (nominalism)
4.2 Categories are not discrete
4.3 There is a graded membership among different instances of a concept
4.4 Categories are defined by exemplars
4.5 Features (symptoms) are neither necessary nor sufficient to determine membership
4.6 Membership in a category is correlated with the number of features (symptoms) that a patient has. (Blashfield, 1991, p. 11)

The major difference between the disease and the prototype model is that the latter is associated with nominalism. Nominalism is the position that the names of diseases are just convenient fictions that clinicians use to organize information. Diagnostic terms do not have

some underlying reality of their own. Diagnostic concepts are names. These concepts exist simply to meet some functional goal.

The preceding discussion of organizational models of the classification of psychopathology is overly simplistic. Each of these models, when examined more closely, can become quite complex, and the apparent distinctions between the models blur. Two instances of this complexity will be described. First, although the categorical and dimensional models are usually presented as if they are competing and antagonistic, Skinner (1986) has suggested that these models are actually complementary. He suggested that the measurement model associated with a dimensional perspective is the more fundamental of the two models. The dimensional model only assumes that, in order to assess a patient, a clinician should gather information on specific descriptive variables that are correlated and that can be summarized by higher-order variables (dimensions). The categorical model also assumes that descriptive variables can be sorted into dimensions. But, in addition, the categorical model asserts that the patients themselves “cluster” into groups that are relatively homogeneous on these dimensions. Thus, from Skinner's hybrid perspective, a pure categorical model makes stronger assumptions about descriptive data than does a dimensional model. However, psychological models of human social cognition suggest that categorical models are more basic than are dimensional models (Wyer & Srull, 1989).

A second example of the complexity of these models is associated with Barsalou's distinction between “prototype” and “exemplar” models. There are two types of approach that can be used to define a concept: intensional definitions and extensional definitions. An intensional definition lists the features that can be used to separate the concept from related concepts (e.g., a square is a four-sided closed figure whose sides are equally long and occur at right angles to each other). In contrast, an extensional definition is a definition by listing the members of the category (e.g., the 1957 New York Yankees included Roger Maris, Yogi Berra, Mickey Mantle, etc.). Barsalou says that a prototype model uses an intensional definition for categories in which the prototype represents the average (centroid) of the concept. Using the example of a child learning about birds, Barsalou suggests that the reason that a robin is a prototype for bird, whereas a penguin is not, is that robins have the average features of most birds (small size, bright coloring, migration, food choices, etc.). Penguins, in contrast, are statistically unusual on these dimensions. An exemplar model, according to Barsalou, uses a particular type of extensional definition in which a concept is defined by an outstanding or exemplary instance of the concept. Thus, Mickey Mantle might be a good exemplar of the 1957 New York Yankees even though Mantle's batting prowess was hardly average. In the same way, Abraham Lincoln might be seen as an exemplar of American presidents, even though he was not average for this set of individuals.

4.03.9.2 Concept of Disease

The discussion of the disease model vs. the prototype model led to a brief introduction regarding the dualism of essentialism vs. nominalism. This dualism is associated with a complicated problem that has bothered writers about classification throughout the last two centuries. What is a disease? Do the two concepts of “mental disease” and “mental disorder” have the same meaning? To address the issues associated with the meaning of “disease,” the writings of a British internist named Scadding will be discussed. At the end of this section, other approaches to the concepts of “disease” and “disorder” are briefly introduced.

Scadding's (1959) first attempt to discuss the meaning of disease occurred in a short essay. This essay offered his first general definition of disease, which read:

The term “a disease” refers to those abnormal phenomena which are common to a group of living organisms with disturbed structure or function, the group being defined in the same way.

In effect, Scadding was saying that a disease was associated with a cluster of signs and symptoms (i.e., abnormal phenomena) that are associated with some functional or structural disturbance in the human body. Scadding went on to argue that a disease had (i) defining characteristics and (ii) diagnostic criteria. The defining characteristics refer to the indications that prove the presence of the disease (e.g., locating syphilitic bacilli in the brains of individuals with paresis). In contrast, the diagnostic criteria are signs and symptoms of the disease that may or may not be present (e.g., motor paralysis, grandiose delusions, and sluggish pupillary response to light would be possible diagnostic criteria for paresis).

Ten years later, Scadding (1969) revised his definition of disease to read as follows:

A disease is the sum of the abnormal phenomena displayed by a group of living organisms in association with a specified common characteristic or set of characteristics by which they differ from the norm for their species in such a way as to place them at biological disadvantage.

There are four important points to note about this second definition of disease. First, the emphasis is on abnormal phenomena. Scadding wanted to be quite clear that the name of the disease does not refer to the etiologic agent causing the disease. That is, tuberculosis is not simply defined by the presence of a particular bacterium, Mycobacterium tuberculosis. To have tuberculosis a patient must manifest the symptoms of the disease as well as the anatomical changes (i.e., the formation of characteristic lesions called tubercles in the lung) associated with this disease. This distinction is important because there are other bacilli, besides Mycobacterium tuberculosis, which can cause these lesions and the symptom pattern of tuberculosis.

Second, the definition contains the rather vague phrase “common characteristic.” Scadding argued that there are three general ways of characterizing any individual disease: (i) a clinical-descriptive approach, (ii) a morbid anatomical approach, and (iii) an etiological approach. (Note that these approaches were presented almost a century earlier by Hammond (1883), as previously discussed.) The clinical-descriptive approach is simply the description of the “syndrome.” That is, the clinical-descriptive approach outlines a loose cluster of signs and symptoms that are correlated in their appearance in patients. For instance, the clinical-descriptive approach to defining diabetes focuses on frequent urination, an intense thirst, and rapid loss of weight as indications of this disorder. The clinical-descriptive approach dominated when the DSM-III and its successors were created. The second approach concerns morbid anatomy. This refers to the anatomical changes in the body's structure associated with the disease. For diabetes mellitus, a morbid anatomy view might define this disease in terms of the destruction of insulin-producing β-cells in the pancreas. Finally, the etiological approach would be to define a disease in terms of the syndrome caused by a known and specifiable etiological process. For Type I diabetes mellitus, this might be an autoimmune process whose exact details have yet to be specified. For paresis, the etiological agent is the effect of the syphilitic bacillus on the central nervous system of the affected individual.

Scadding commented that, historically, knowledge about diseases typically proceeds from clinical description to morbid anatomy to etiology. Certainly his observation seems to be correct when applied to the history of paresis. He argued that any of these approaches to characterizing a disease is appropriate. That is, a disease can be defined in terms of a clinical syndrome; or it can be defined by some associated morbid anatomy; or it can be defined through a recognition of its etiological cause.

The third point to note about Scadding's definition is its emphasis on norms, in that disease refers to an abnormality. To be a disease, the condition must refer to phenomena that are statistically deviant. For instance, most of us have various types of bacteria that normally reside in our intestines and which are important in the digestive process. The presence of these bacteria does not define a disease. The effects of these bacteria are normative. In fact, individuals without these bacteria have abnormal digestive processes.

Finally, the definition ends with the term “biological disadvantage.” Scadding introduced this term because he recognized that not all nonnormative aspects of human structure and functioning should be called diseases. For instance, some individuals produce an abnormal amount of ear wax. However, this should not define a disease unless there is some biological disadvantage associated with this condition. Although the term biological disadvantage is not more precisely specified, its general meaning seems clear: syphilis and diabetes place an individual at biological disadvantage since both can lead to death if untreated.

In 1979, Scadding and two Canadian authors (Campbell, Scadding and Roberts, 1979) extended their ideas about disease by studying what physicians and nonphysicians meant by the concept of disease. They published a survey that they had conducted regarding the meaning of disease. To conduct their survey, these authors read a list of possible diseases to four groups of individuals: (i) a group of medical faculty, (ii) a group of nonmedical faculty, (iii) a sample of general practice physicians, and (iv) a sample of youth in British and Canadian schools. The subjects in this study were asked to note whether the terms being read aloud referred to diseases or not. In addition, the subjects were asked to assign degree of confidence ratings to their decisions.

At the top of the list of conditions that were viewed as diseases were infections (malaria, tuberculosis, syphilis, measles). Virtually everyone in the four groups, whether physicians or nonphysicians, agreed that these terms referred to diseases. Syphilis, for instance, was considered a disease by over 90% of the subjects in all groups.

At the bottom of the list were concepts that were not considered diseases by these subjects. Two terms that were seen as referring to diseases by less than 30% of all four groups were drowning and starvation. Many of the terms at the bottom of Scadding's list might be described
as injuries, i.e., traumas that affect bodily functioning and that were caused by identifiable external events such as a car accident (e.g., skull fracture) or ingestion of a substance (e.g., barbiturate overdose, poisoning).

The psychiatric concepts in the list (schizophrenia, depression, and alcoholism) were ranked in the middle. There was considerable variance among the four groups regarding whether these concepts referred to diseases. For instance, faculty of medical schools rated these three concepts in the following order: schizophrenia (78%), depression (65%), and alcoholism (62%). Children in secondary schools had quite different impressions of what is considered to be a disease: alcoholism (76%), schizophrenia (51%) and depression (23%).

One factor that had a large influence on whether a term referred to a disease concerned the role of a physician in the diagnosis or treatment of the disorder. Malaria and syphilis require a doctor to diagnose and treat. In contrast, starvation can be identified and treated by nonmedical individuals. The latter is also true of acne and hemorrhoids, although the intervention of physicians can prove useful for both. Consistent with this view, acne and hemorrhoids were ranked in the middle of the list. The potential role of nonphysicians in the treatment of mental disorders may also account for the occurrence of schizophrenia, depression and alcoholism in the middle of the same list.

Scadding et al. (1979, p. 760) concluded their paper with the following interesting comment:

Most people without medical training seem to think of a disease as an agent causing illness. The common concept of “disease” is essentialist: diseases exist, each causing a particular sort of illness. Doctors tend to adopt a more nominalist position, but they obviously retain remnants of belief in the real existence of diseases.

When viewed from this dualism of an essentialist vs. a nominalist perspective, Scadding had started his definitional attempts from an essentialist perspective but, by the time of his last writings on the topic, he was suggesting that a nominalist view was preferred. Interestingly, the writings of a prominent British psychiatrist, Kendell, who has also tried to solve this issue, have followed the same progression. His ideas shifted from a paper trying to settle on an essentialist meaning of disease (Kendell, 1976) to a skeptical discussion of how the disease model fails to explain alcoholism (Kendell, 1979) to a nominalist view (Kendell, 1986).

Because this nominalism vs. essentialism dualism is so important, the approach of Wulff, Pedersen, and Rosenberg (1986) is discussed briefly. To understand the dualism, Wulff et al. discussed possible ways to classify defective grandfather clocks as an example.

Suppose that one examines how people who work in a repair shop might classify clocks. The receptionist, who knows very little about the workings of the clocks, might classify them descriptively. Thus, some clocks would be placed together because they do not work after being wound; others have broken faces, arms or other parts; and still others do not keep time accurately. Another person who might classify the grandfather clocks would be the bookkeeper of the shop. This individual might classify the clocks according to the manufacturer and cost of the clock. A third person who might classify the clocks is the repairman. He might organize clocks anatomically into those with accumulated dirt impeding their normal functioning, those needing replacement parts, and those with weighting mechanisms that have become unbalanced. Finally, the owner of the repair shop, when reporting back to various manufacturers about the causes of clock malfunctions, might classify the clocks etiologically. That is, she might report about clocks that have had little care, clocks that become worn over various time intervals of ownership, and clocks that developed problems after being moved or damaged.

Which of these classifications (descriptive, cost-oriented, anatomical, or etiological) is the true or best classification of defective grandfather clocks? From the nominalist perspective, none of these classifications is inherently the best. Each of these classifications is imposed by the needs of the particular individual using the classification. Each classification serves a function. For any particular function, one classificatory system may be preferable. But none of these is the true classification of defective clocks.

Notice that this apocryphal classification of defective clocks is analogous to the approaches to defining disease suggested by Scadding: clinical-descriptive, morbid anatomical and etiological. The cost-oriented classification was simply added as an analogy to how medical classifications are used by the insurance industry in the USA.

Wulff's defective clock analogy was borrowed from the British philosopher, John Locke. Locke had argued that classificatory systems are inherently nominalist, even though the ultimate goal is often essentialist:

Therefore we in vain pretend to rank things in sorts, and dispose them into certain classes, under names, by their real essences, that are so far from our discovery or comprehension. (Wulff et al., 1986, p. 75)

In this regard, it is interesting to contrast Locke's view of classification with that of his physician friend, Thomas Sydenham. Believing in an essentialist view of disease, Sydenham made the following statement which has been quoted repeatedly since then:

Nature, in the production of disease, is uniform and consistent . . . The selfsame phenomena that you observe in the sickness of a Socrates you would observe in the sickness of a simpleton.

In other words, diseases do exist. They do have an essence. It is the business of medical research to discover what these essences are. It was the belief in this essentialist perspective that led nineteenth-century researchers to solve the etiological issues associated with dementia paralytica.

Scadding and Wulff et al. warned about the dangers of essentialist thinking when applied to disease. For instance, Scadding noted that often we mistake the disease for the cause of the disease. Noguchi and Moore (Quetel, 1990) discovered the syphilitic bacilli in the brains of individuals with paresis. Hence, we might say that paresis occurs when syphilis invades the central nervous system. But the last sentence is misleading. Syphilis is not an organism. The bacterium, Treponema pallidum, is an organism and it could be said to invade the central nervous system. But even if this bacterium were present in the brain of an individual, that presence does not mean that the individual has paresis. To have paresis the individual must manifest the characteristic symptoms and anatomical changes associated with paresis.

Wulff et al. end their discussion of nominalism vs. essentialism with the following statement:

The philosophical problem which underlies the discussion in this chapter is the age-old dispute about universals, and we have tried to navigate between the Scylla of essentialism (or Platonism) and the Charybdis of extreme nominalism. Essentialism underlines correctly that any classification of natural phenomena must reflect the realities of nature, but it ignores the fact that classifications also depend on our choice of criteria and that this choice reflects our practical interests and the extent of our knowledge. Nominalism, on the other hand, stresses correctly the human factor, but the extreme nominalist overlooks that classifications are not arbitrary but must be molded on reality as it is. (Wulff et al., 1986, pp. 87–88)

As mentioned earlier, defining the concepts of “disease” and/or “disorder” raises complicated issues, and the preceding discussion does not adequately cover the literature. One American author, in particular, has attracted substantial attention in the 1990s for his writings on this issue. Wakefield (1992, 1993) initially addressed this issue by providing a detailed critique of the definition of mental disorder that appeared in the DSM-III. Following this seminal paper, other theoretical papers (Wakefield, 1997a, 1997b) proposed a “harmful dysfunction” view of how to define mental disorders. A special section of the Journal of Abnormal Psychology has been devoted to a discussion of Wakefield's ideas.

Besides Wakefield's writings, there are other important discussions of this definitional issue including an overview by Reznek (1987), a book by Wing (1978), and an edited book on philosophical issues associated with classification (Sadler, Wiggins & Schwartz, 1994).

4.03.9.3 Two Views of a Hierarchical System of Diagnostic Concepts

Categories in the classification of mental disorders are organized hierarchically. This structural arrangement is commonly recognized but, since the publication of the DSM-III, two different views about this hierarchical structure have been discussed. Since these two views are often confused, this section briefly discusses them.

The first approach to the meaning of hierarchy is the nested set approach. Consider, for instance, the DSM-II classification of mental disorders. In this system, there are two broad categories of disorders: (I) psychotic disorders and (II) nonpsychotic disorders. The psychotic disorders are further subdivided into (I.A) the organic disorders and (I.B) the nonorganic disorders. The nonorganic psychotic disorders in the DSM-II were subdivided into three categories: (I.B.1) schizophrenic disorders, (I.B.2) major affective disorders, and (I.B.3) paranoid disorders. Then the schizophrenic disorders were subdivided into various subtypes: (I.B.1.a) simple type, (I.B.1.b) hebephrenic type, (I.B.1.c) catatonic type, and so on. Notice that this organization of mental disorders has a similar outline to the organization of categories in the biological classification. Any patient, for instance, who would be diagnosed as being hebephrenic, would also be considered as having a schizophrenic disorder (the next highest, inclusive category) as well as having a psychotic disorder (an even higher level, inclusive category). Thus, this approach to hierarchy is called a nested set approach because the categories low in the system refer to sets of patients that are included (nested) in higher order categories. This approach parallels the
classification of biological organisms in which any animal who is a member of the species Felis catus (housecat) is also a member of the genus Felis (cats) and a member of an even higher order category of Mammalia (mammal).

The other approach to hierarchy is called a pecking order view. This view can be best understood by making an analogy to the hierarchical organization of rank among military officers. A colonel is higher in rank than a major who in turn is higher in rank than a lieutenant. In this pecking order structure, a colonel can give orders to a lieutenant, but a lieutenant cannot issue orders to a colonel. Thus, the pecking order in military rank concerns lines of authority. A colonel has authority over a lieutenant. Notice, however, that there is no membership nesting in these categories. If a particular individual is a lieutenant, then that individual is not a colonel even though a colonel is higher in the hierarchy than a lieutenant.

To understand how this analogy to the hierarchical arrangement of military rank can be applied to psychiatric classification, consider the following order of general mental disorders:

organic mental disorders
schizophrenic disorders
affective disorders
anxiety disorders
personality disorders.

In terms of the pecking order meaning of hierarchy, this order means that disorders higher in this order should be diagnosed over disorders lower in the hierarchy. Thus, in terms of standard diagnostic practice, the presence of organic mental disorders should be ruled out first, then the schizophrenic disorders, then affective disorders, etc. This principle of diagnostic precedence is analogous to the authority relationship among different levels of rank in the military.

Notice that the pecking order relationship among these five general mental disorders also carries another implication. If a patient has an organic mental disorder, the patient can (and often does) have the symptoms associated with disorders that are lower in the hierarchy. Thus, a patient with Alzheimer's disease can develop hallucinations like a schizophrenic, can have marked sleep disturbance like someone who is depressed, can develop a fear of leaving the house like someone with anxiety disorder, and show the rigidity and need to be controlled like someone with an obsessive-compulsive personality disorder. However, a patient with anxiety disorder such as agoraphobia should not show the disturbed memory patterns and disorientation of a patient with an organic mental disorder. The important point to note is that disorders placed higher in this pecking order view of hierarchy can explain any of the symptoms of disorders lower in the hierarchy; however, the reverse should not occur. There should be symptoms that will be manifest in patients with schizophrenia that will not occur in patients with personality disorders.

The pecking order approach to the hierarchical arrangement of mental disorder categories was popularized by Foulds and Bedford (1975). The specifics of their approach to the classification of mental disorders differ from that presented above, but the general outline is the same. An important corollary to this pecking order view of the hierarchical arrangement among mental disorder categories is that this view suggests that there will be a strong severity dimension in any descriptive approach to the classification of these disorders. Mental disorders higher in this system will be associated with many more symptoms than are mental disorders lower in this system. Descriptive studies of psychopathology have repeatedly found a strong severity dimension that the pecking order view would predict.

4.03.9.4 Problem of Diagnostic Overlap/Comorbidity

When discussing the categorical model of classification, one of the tenets that was attributed to the model stated: “In the borderline areas where categories may overlap, the number of overlapping patients should be relatively small.” Diagnostic overlap refers to the relative percentage of patients with one diagnosis who also meet the criteria for another diagnosis. As the tenet above states, some diagnostic overlap is expected. But the relative amount of overlap should be small.

One terminological note should be made before proceeding. The literature on this issue is grouped under the general heading of comorbidity. This term is from the medical literature, because it is well recognized that some medical disorders tend to go together. For instance, individuals who develop AIDS are relatively likely to develop yeast infections, sarcomas, and other disorders because of their compromised immune system. The term comorbidity refers to the pattern of co-occurrences of these medical disorders. However, because the concept of comorbidity implies the acceptance of a disease model, the preferred term in this chapter will be “diagnostic overlap.”

One of the earliest studies that focused on diagnostic overlap was by Barlow, DiNardo, Vermilyea, Vermilyea, and Blanchard (1986).
These investigators reported on 126 patients who were referred for the treatment of anxiety. These patients were administered structured interviews. Of the 126 patients interviewed, 108 were assigned one of seven diagnoses that fit within the anxiety/affective disorder spectrum. Of these 108 patients, 65% were given at least one additional diagnosis. This is a high level of diagnostic overlap and apparently was much higher than these researchers had expected a priori.

A large number of other empirical studies have confirmed the high levels of diagnostic overlap using the DSM-III and subsequent classifications. Many of these studies are discussed in an excellent review by Clark, Watson, and Reynolds (1995). Examples of the results found in their review follow. These reviewers noted one study of personality disorder diagnoses in a state hospital population that found that these patients met the diagnostic criteria for an average of 3.75 Axis II disorders. In addition to the personality disorders, the depressive disorders also have striking overlap with many other disorders. For instance, over half of the patients with major depressive disorder as well as patients with dysthymic disorder were found to have at least one co-occurring mental disorder. Depression shows significant overlap even with disorders with which one might not expect overlap. For instance, a sample of antisocial personality disorder patients showed that one-third of these individuals also had a depressive diagnosis (Dinwiddie & Reich, 1993). These antisocial patients, less surprisingly, also had high rates of alcoholism (76%) and other substance use (63%). Even broad epidemiological studies on normal community samples show high rates of diagnostic overlap. Clark et al. reported that, in two national studies, over half of the individuals who had one mental disorder diagnosis had at least one more diagnosis. In some samples, the rate of diagnostic overlap is even higher. For instance, a study of suicidal patients showed that these individuals averaged about four mental disorders (Rudd, Dahm, & Rajab, 1993).

Together, these and many other studies suggest that the number of overlapping patients among mental disorder diagnoses is not small at all. Instead, diagnostic overlap is a routinely occurring phenomenon. Blashfield, McElroy, Pfohl, and Blum (1994) studied 151 patients who had been administered a structured interview to assess personality disorders. In this sample, only 24% of the patients met the criteria for one and only one personality disorder. Exactly the same percentage met the diagnostic criteria for at least four personality disorders, with 6% meeting the criteria for six of the eleven DSM-III-R personality disorders! When Blashfield et al. attempted to identify prototypic patients (i.e., individuals who met at least eight of the diagnostic criteria for a specific disorder), they found that only 15% of the patients would qualify as prototypic. However, most of these individuals also satisfied the diagnostic criteria for other disorders. When a prototypic patient was defined as an individual with eight or more criteria for a personality disorder and the lack of an additional personality diagnosis, only 1% of the patients were prototypes. In effect, patients with mixed symptom patterns are much more typical than are patients who represent relatively pure forms of a disorder.

Clark, Watson and Reynolds (1995) suggested that there are three related issues associated with the problem of diagnostic overlap. The first issue concerns the hierarchical organization of categories. As discussed earlier, one view of hierarchy is a pecking order approach. This view was implicitly adopted by the DSM-III because a number of exclusion rules were included in the diagnostic criteria for different disorders, so that a lower diagnosis would not be made if a higher order diagnosis was possible. However, research on the exclusionary criteria suggested that these criteria were arbitrary. Exclusionary criteria were mostly deleted in the DSM-III-R and the DSM-IV.

The second issue associated with diagnostic overlap concerns the heterogeneity within diagnostic categories. In effect, the extensive diagnostic overlap suggests that the definitions of the various mental disorders are too broad and inclusive. Evidence of excessive heterogeneity comes from other sources, according to Clark et al. For instance, direct studies of the variability in symptom patterns have shown high levels of variability within disorders such as schizophrenia, depression, eating disorders and anxiety disorders. Another line of evidence of heterogeneity is the frequency with which mixed or atypical diagnoses are used. For instance, Mezzich, Fabrega, Coffman, and Haley (1989) found that the majority of patients with a dissociative disorder fit the criteria for an atypical dissociative disorder.

The third issue associated with the comorbidity finding is the increasing support for replacing the categorical approach to classification with a dimensional view. In the discussion of these two models, it was noted that the dimensional model is the simpler of the two. Unless there is clear evidence of the existence of discrete categories, a dimensional approach is the more parsimonious.
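
The overlap figures quoted in this section are simple proportions, and a small worked sketch may make the arithmetic and the effect of exclusion rules concrete. The Python code below is an illustration only: the five patient records are invented, the function names are hypothetical, and the precedence list simply reuses the order of general mental disorders given earlier in this chapter. It computes the share of patients with a given diagnosis who meet criteria for at least one other diagnosis, the mean number of diagnoses per patient, and the result of applying a pecking order rule in which only the highest-ranked diagnosis is kept.

```python
# Toy illustration only: these patient records are invented and are not
# data from any study cited in this chapter.
PATIENTS = {
    "p1": {"anxiety disorders", "affective disorders"},
    "p2": {"anxiety disorders"},
    "p3": {"affective disorders", "anxiety disorders", "personality disorders"},
    "p4": {"personality disorders"},
    "p5": {"schizophrenic disorders", "personality disorders"},
}

# Order of general mental disorders as listed in the text, highest first.
PECKING_ORDER = [
    "organic mental disorders",
    "schizophrenic disorders",
    "affective disorders",
    "anxiety disorders",
    "personality disorders",
]


def overlap_rate(records, diagnosis):
    """Share of patients with `diagnosis` who also meet criteria for another."""
    with_dx = [dx for dx in records.values() if diagnosis in dx]
    if not with_dx:
        return 0.0
    return sum(len(dx) > 1 for dx in with_dx) / len(with_dx)


def mean_diagnoses(records):
    """Average number of diagnoses per patient."""
    return sum(len(dx) for dx in records.values()) / len(records)


def apply_precedence(diagnoses):
    """Keep only the highest-ranked diagnosis (a pecking order exclusion rule)."""
    for category in PECKING_ORDER:
        if category in diagnoses:
            return category
    return None  # no listed diagnosis applies


def reduce_by_precedence(records):
    """Apply the exclusion rule to every patient who has at least one diagnosis."""
    return {pid: {apply_precedence(dx)} for pid, dx in records.items() if dx}


if __name__ == "__main__":
    print("overlap for anxiety disorders:", overlap_rate(PATIENTS, "anxiety disorders"))
    print("mean diagnoses per patient:", mean_diagnoses(PATIENTS))
    reduced = reduce_by_precedence(PATIENTS)
    print("mean diagnoses after exclusion rules:", mean_diagnoses(reduced))
```

Under these assumptions, the exclusion rule removes overlap by construction, since every patient ends up with exactly one diagnosis. That is one way to see why the choice of exclusionary criteria matters so much for reported overlap rates.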

A number of researchers, when confronted with high rates of diagnostic overlap, have suggested that a dimensional model should be used. For instance, in the area of the personality disorders, where some of the highest rates of diagnostic overlap have been found, interest in a "Big Five" dimensional approach to the personality disorders has been attracting increasing support. Another example in which a dimensional model is gaining popularity is in the subclassification of schizophrenia. The classic Kraepelinian subtypes do not generate sufficiently homogeneous categories. Various dimensional schemes (process–reactive, paranoid–nonparanoid, positive vs. negative symptoms) have more empirical support than the use of categorical subtypes.

4.03.10 CONCLUDING COMMENTS

This chapter provides a rather simplified overview of the issues associated with the classification of psychopathology. It attempts to help the reader gain a better understanding of classification by starting with a reasonably detailed history of classificatory systems, and thereby to give some idea of how many of the features of contemporary classificatory systems have evolved. The text also offers a succinct and readable presentation of four issues that currently attract a reasonable degree of attention in the literature. However, justice has not been done to many of the other complex issues that face both clinicians and scientists interested in classification, such as whether to use semistructured interviews as the "gold standard" for measurement (Mezzich, 1984), the role of values in the diagnostic practice of mental health professionals, whether or not certain mental disorders are sexually or racially biased (Busfield, 1996; Nuckolls, 1992; Widom, 1984), the relevance of life-span measures of psychopathology (Roff & Ricks, 1970), and the problem of focusing on the individual patient as the basic unit of psychopathology (as opposed to families or systems or interpersonal relationship patterns) (Clarkin & Miklowitz, 1997; Francis, Clarkin & Ross, 1997; Williams, 1997).

Like any general topic, the classification of psychopathology becomes a very complex topic when analyzed in detail.

Perhaps I believe that the world can get forward most by clearer and clearer definitions of fundamentals. Accordingly I propose to stick to the tasks of nomenclature and terminology, unpopular and ridicule-provoking though they may be. (Southard, as quoted by Menninger, 1963, p. 3)

4.03.11 REFERENCES

American Psychiatric Association (1933). Notes and comments: Revised classified nomenclature of mental disorders. American Journal of Psychiatry, 90, 1369–1376.
American Psychiatric Association (1952). Diagnostic and statistical manual of mental disorders (1st ed.). Washington, DC: Author.
American Psychiatric Association (1968). Diagnostic and statistical manual of mental disorders (2nd ed.). Washington, DC: Author.
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association (1987). Diagnostic and statistical manual of mental disorders (3rd ed. Rev.). Washington, DC: American Psychiatric Press.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Press.
Alexander, F. G., & Selesnick, S. T. (1966). The history of psychiatry. New York: Harper and Row.
Austin, T. J. (1859/1976). A practical account of general paralysis. New York: Arno Press.
Barlow, D. H., DiNardo, P. A., Vermilyea, B. B., Vermilyea, J., & Blanchard, E. B. (1986). Co-morbidity and depression among the anxiety disorders: Issues in diagnosis and classification. Journal of Nervous and Mental Disease, 174, 63–72.
Barsalou, L. W. (1992). Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Erlbaum.
Bayer, R. (1981). Homosexuality and American psychiatry. New York: Basic Books.
Berrios, G. E. (1993). Personality disorders: A conceptual history. In P. Tyrer & G. Stein (Eds.), Personality disorder reviewed (pp. 17–41). London: Gaskell.
Berrios, G. E., & Hauser, R. (1988). The early development of Kraepelin's ideas on classification: A conceptual history. Psychological Medicine, 18, 813–821.
Blashfield, R. K. (1991). Models of psychiatric classification. In M. Hersen & S. M. Turner (Eds.), Adult psychopathology and diagnosis (2nd ed., pp. 3–22). New York: Wiley.
Blashfield, R. K., McElroy, R. A., Pfohl, B., & Blum, N. (1994). Comorbidity and the prototype model. Clinical Psychology: Science and Practice, 1, 96–99.
Blustein, B. E. (1991). Preserve your love for science: Life of William A. Hammond, American neurologist. Cambridge, UK: Cambridge University Press.
Braslow, J. T. (1995). Effect of therapeutic innovation on perception of disease and the doctor–patient relationship: A history of general paralysis of the insane and malaria fever therapy, 1910–1950. American Journal of Psychiatry, 152, 660–665.
Busfield, J. (1996). Men, women and madness: Understanding gender and mental disorder. New York: New York University Press.
Campbell, E. J. M., Scadding, J. G., & Roberts, R. J. (1979). The concept of disease. British Medical Journal, 2, 757–762.
Cantor, N., Smith, E. E., French, R. D., & Mezzich, J. (1980). Psychiatric diagnosis as a prototype categorization. Journal of Abnormal Psychology, 89, 181–193.
Clark, L. A., Watson, D., & Reynolds, S. (1995). Diagnosis and classification of psychopathology: Challenges to the current system and future directions. Annual Review of Psychology, 46, 121–153.
Clarkin, J. F., & Miklowitz, D. J. (1997). Marital and family communication difficulties. In T. A. Widiger, A. J. Francis, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM-IV sourcebook (Vol. 3, pp. 631–672). Washington, DC: American Psychiatric Press.
Dinwiddie, S. H., & Reich, T. (1993). Attribution of antisocial symptoms in coexistent antisocial personality disorder and substance abuse. Comprehensive Psychiatry, 34, 235–242.
Essen-Moller, E. (1971). Suggestions for further improvement of the international classification of mental disorders. Psychological Medicine, 1, 308–311.
Everitt, B. S. (1974). Cluster analysis. New York: Halstead Press.
Feighner, J. P., Robins, E., Guze, S., Woodruff, R. A., Winokur, G., & Munoz, R. (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry, 143, 57–63.
Feinstein, A. R. (1967). Clinical judgment. Huntington, VA: Krieger.
Foulds, G. A., & Bedford, A. (1975). Hierarchy of classes of personal illness. Psychological Medicine, 5, 181–192.
Francis, A. J., Clarkin, J. F., & Ross, R. (1997). Family/relational problems. In T. A. Widiger, A. J. Francis, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM-IV sourcebook (Vol. 3, pp. 521–530). Washington, DC: American Psychiatric Press.
Goffman, E. (1961). Asylums. London: Penguin.
Grinker, R. R., Werble, B., & Drye, R. C. (1968). The borderline syndrome. New York: Basic Books.
Hammond, W. A. (1883). A treatise on insanity in its medical relations. New York: Appleton.
Hempel, C. G. (1965). Aspects of scientific explanation. New York: Free Press.
Hull, D. L. (1988). Science as a process. Chicago: University of Chicago Press.
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. New York: Wiley.
Kendell, R. E. (1975). The role of diagnosis in psychiatry. Oxford, UK: Blackwell.
Kendell, R. E. (1976). The concept of disease. British Journal of Psychiatry, 128, 508–509.
Kendell, R. E. (1979). Alcoholism: A medical or a political problem. British Medical Journal, 1, 367–381.
Kendell, R. E. (1986). What are mental disorders? In A. M. Freedman, R. Brotman, I. Silverman, & D. Huston (Eds.), Issues in psychiatric classification (pp. 23–45). New York: Human Sciences Press.
Kendell, R. E. (1991). Relationship between the DSM-IV and ICD-10. Journal of Abnormal Psychology, 100, 297–301.
Kirk, S. A., & Kutchins, H. (1992). The selling of DSM: The rhetoric of science in psychiatry. Hawthorne, NY: Walter deGruyter.
Klerman, G. L. (1978). The evolution of a scientific nosology. In J. C. Shershow (Ed.), Schizophrenia: Science and practice (pp. 99–121). Cambridge, MA: Harvard University Press.
Kraepelin, E. (1902/1896). Clinical psychiatry: A text-book for students and physicians (6th ed., translated by A. R. Diefendorf). London: Macmillan.
Kreitman, N. (1961). The reliability of psychiatric diagnosis. Journal of Mental Science, 107, 878–886.
Lacan, J. (1977). Écrits: A selection. New York: Norton.
Laing, R. D. (1967). The politics of experience. London: Penguin.
Lorr, M. (1966). Explorations in typing psychotics. New York: Pergamon.
Matza, D. (1969). Becoming deviant. Englewood Cliffs, NJ: Prentice-Hall.
Meehl, P. E. (1995). Bootstraps taxometrics: Solving the classification problem in psychopathology. American Psychologist, 50, 266–275.
Menninger, K. (1963). The vital balance. New York: Viking.
Mezzich, J. E. (1984). Diagnosis and classification. In S. M. Turner & M. Hersen (Eds.), Adult psychopathology and diagnosis (pp. 3–36). New York: Wiley.
Mezzich, J. E., Fabrega, H., Coffman, G. A., & Haley, R. (1989). DSM-III disorders in a large sample of psychiatric patients: Frequency and specificity of diagnosis. American Journal of Psychiatry, 146, 212–219.
Mezzich, J. E., Fabrega, H., Mezzich, A. C., & Coffman, G. A. (1985). International experience with DSM-III. Journal of Nervous and Mental Disease, 173, 738–741.
Nelson, G., & Platnick, N. (1981). Systematics and biogeography: Cladistics and vicariance. New York: Columbia University Press.
Nuckolls, C. W. (1992). Toward a cultural history of the personality disorders. Social Science and Medicine, 35, 37–47.
Plaut, F. (1911). The Wasserman sero-diagnosis of syphilis in its application to psychiatry. New York: Journal of Nervous and Mental Disease Publishing Company.
Project Match Research Group (1997). Matching alcoholism treatments to client heterogeneity: Project MATCH posttreatment drinking outcomes. Journal of Studies on Alcohol, 58, 7–29.
Quetel, C. (1990). History of syphilis. Baltimore: Johns Hopkins University Press.
Raines, G. N. (1952). Foreword. In American Psychiatric Association, Diagnostic and statistical manual of mental disorders (1st ed., pp. v–xi). Washington, DC: American Psychiatric Association.
Regier, D. A., Kaelber, C. T., Roper, M. T., Rae, D. S., & Sartorius, N. (1994). The ICD-10 clinical field trial for mental and behavioral disorders: Results in Canada and the United States. American Journal of Psychiatry, 151, 1340–1350.
Reznek, L. (1987). The nature of disease. London: Routledge & Kegan Paul.
Roff, M., & Ricks, D. F. (Eds.) (1970). Life history research in psychopathology. Minneapolis, MN: University of Minnesota Press.
Rosenhan, D. L. (1973). On being sane in insane places. Science, 179, 250–258.
Roth, A., & Fonagy, P. (1996). What works for whom: A critical review of psychotherapy research. New York: Guilford.
Rudd, M. D., Dahm, P. F., & Rajab, M. H. (1993). Diagnostic comorbidity in persons with suicidal ideation and behavior. American Journal of Psychiatry, 147, 1025–1028.
Russell, J. A. (1991). In defense of a prototype approach to emotion concepts. Journal of Personality and Social Psychology, 60, 37–47.
Russell, J. A., & Fehr, B. (1994). Fuzzy concepts in a fuzzy hierarchy: Varieties of anger. Journal of Personality and Social Psychology, 67, 186–205.
Rutter, M., Shaffer, D., & Shepard, M. (1975). A multiaxial classification of child psychiatry disorders. Geneva, Switzerland: World Health Organization.
Sadler, J. Z., Wiggins, O. P., & Schwartz, M. A. (1994). Philosophical perspectives on psychiatric diagnostic classification. Baltimore: Johns Hopkins University Press.
Sarbin, T. R. (1997). On the futility of psychiatric diagnostic manuals (DSMs) and the return of personal agency. Applied and Preventive Psychology, 6, 233–243.
Scadding, J. G. (1959). Principles of definition in medicine with special reference to chronic bronchitis and emphysema. Lancet, 1, 323–325.
Scadding, J. G. (1969). Diagnosis: The clinician and the computer. Lancet, 2, 877–882.
Skinner, H. A. (1986). Construct validation approach to psychiatric classification. In T. Millon & G. L. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV (pp. 307–331). New York: Guilford Press.
Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: Freeman.
Spitzer, R. L., Endicott, J., & Robins, E. (1975). Research diagnostic criteria. Archives of General Psychiatry, 35, 773–782.
Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.
Spitzer, R. L., & Williams, J. B. W. (1987). Revising DSM-III: The process and major issues. In G. L. Tischler (Ed.), Diagnosis and classification in psychiatry (pp. 425–434). New York: Cambridge University Press.
Spitzka, E. C. (1883). Insanity: Its classification, diagnosis and treatment. New York: Bermingham.
Stangl, D., Pfohl, B., Zimmerman, M., Bowers, W., & Corenthal, C. (1985). Structured interview for the DSM-III personality disorders. Archives of General Psychiatry, 42, 519–596.
Stengel, E. (1959). Classification of mental disorders. Bulletin of the World Health Organization, 21, 601–663.
Stromgren, E. (1991). A European perspective on the conceptual approaches to psychopathology. In A. Kerr & H. McClelland (Eds.), Concepts of mental disorders: A continuing debate (pp. 84–90). London: Gaskell.
Szasz, T. (1961). The myth of mental illness. New York: Hoeber-Harper.
Veith, I. (1965). Hysteria: The history of a disease. Chicago: University of Chicago Press.
Wakefield, J. C. (1992). The concept of mental disorder: On the boundary between biological facts and social values. American Psychologist, 47, 373–388.
Wakefield, J. C. (1993). Limits of operationalization: A critique of Spitzer and Endicott's (1978) proposed operational criteria for mental disorder. Journal of Abnormal Psychology, 102, 160–172.
Wakefield, J. C. (1997a). Diagnosing DSM-IV, Part I: DSM-IV and the concept of disorder. Behavioral Research and Therapy, 35, 633–649.
Wakefield, J. C. (1997b). Diagnosing DSM-IV, Part II: Eysenck (1986) and the essentialist fallacy. Behavioral Research and Therapy, 35, 651–665.
Walker, L. (1987). Inadequacies of the masochistic personality disorder diagnosis for women. Journal of Personality Disorders, 1, 183–189.
Widom, C. (Ed.) (1984). Sex roles and psychopathology. New York: Plenum.
Williams, J. B. W. (1997). The DSM-IV multiaxial system. In T. A. Widiger, A. J. Francis, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM-IV sourcebook (Vol. 3). Washington, DC: American Psychiatric Press.
Wing, J. K. (1978). Reasoning about madness. Oxford, UK: Oxford University Press.
Woodruff, R. A., Goodwin, D. W., & Guze, S. B. (1974). Psychiatric diagnosis. New York: Oxford University Press.
World Health Organization (1948). Manual of the international statistical classification of diseases, injuries, and causes of death. Geneva, Switzerland: Author.
World Health Organization (1957). Introduction to Manual of the international statistical classification of diseases, injuries, and causes of death (7th ed.). Geneva, Switzerland: Author.
World Health Organization (1978). Mental disorders: Glossary and guide to their classification in accordance with the ninth revision to the International Classification of Diseases. Geneva, Switzerland: Author.
World Health Organization (1992). The ICD-10 classification of mental and behavioral disorders: Clinical descriptions and diagnostic guidelines. Geneva, Switzerland: Author.
World Health Organization (1993). The ICD-10 classification of mental and behavioural disorders: Diagnostic criteria for research. Geneva, Switzerland: Author.
Wulff, H. R., Pedersen, S. A., & Rosenberg, R. (1986). Philosophy of medicine. Boston: Blackwell Scientific.
Wyer, R. S., & Srull, T. K. (1989). Memory and cognition in a social context. Hillsdale, NJ: Erlbaum.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zubin, J. (1967). Classification of behavior disorders. Annual Review of Psychology, 28, 373–406.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.04
Clinical Interviewing
EDWARD L. COYLE
Oklahoma State Department of Health, Oklahoma City, OK, USA
and
DIANE J. WILLIS, WILLIAM R. LEBER, and JAN L. CULBERTSON
University of Oklahoma Health Sciences Center, Oklahoma City,
OK, USA

4.04.1 PURPOSE OF THE CLINICAL INTERVIEW
4.04.1.1 Gathering Information for Assessment and Treatment
4.04.1.2 Establishing Rapport for Assessment and Treatment
4.04.1.3 Interpersonal Style/Skills of the Interviewer
4.04.1.4 Structuring the Interview
4.04.1.4.1 Setting variables
4.04.1.4.2 Preparing for the patient
4.04.1.5 Introductory Remarks
4.04.1.6 How to Open the Interview
4.04.1.7 The Central Portion of the Interview
4.04.1.8 Closing the Interview
4.04.1.9 The Collateral Interview
4.04.2 DEVELOPMENTAL CONSIDERATIONS IN INTERVIEWING
4.04.2.1 Interviewing Children (Preschool Age through Older Elementary)
4.04.2.2 Interviewing Parents
4.04.2.3 Social Context
4.04.2.4 Developmental Context
4.04.2.5 Direct Interview of Children
4.04.2.6 Adolescents
4.04.2.6.1 Separation–individuation
4.04.2.6.2 Resolving conflict with authority figures
4.04.2.6.3 Peer group identification
4.04.2.6.4 Realistic appraisal and evaluation of self-qualities
4.04.2.7 Interviewing Young Adults (18–40 Years)
4.04.2.8 Interviewing Adults in Middle Adulthood (40–60 Years)
4.04.2.9 Interviewing Older Adults (60–70 Years)
4.04.2.10 Interviewing In Late Adulthood (70 Years–End of Life)
4.04.3 INTERVIEWING SPECIAL POPULATIONS OR SPECIFIC DISORDERS
4.04.3.1 Interviewing Depressed Patients
4.04.3.2 Interviewing Anxious Patients
4.04.4 SUMMARY
4.04.5 REFERENCES

4.04.1 PURPOSE OF THE CLINICAL INTERVIEW

The clinical interview is extremely important as a diagnostic tool in the assessment and treatment of patients. Clinicians who do thorough and competent interviews have a much better understanding of the developmental course of symptoms presented by the patient. Indeed, before there were any personality inventories, before the Rorschach and one-way mirror behavioral observation, there was the clinical interview. The purpose of the clinical interview is to gain sufficient information from the informant or informants to formulate a diagnosis, assess the individual's strengths and liabilities, assess the developmental and contextual factors that influence the presenting concerns, and allow planning for any interventions to follow. The interview is in many instances the ultimate clinical tool, and effective interviewing must be an integral part of any clinician's professional abilities. Although the clinical interview is used primarily to gather information for clinical evaluation or psychotherapeutic treatment, it can also serve the purpose of preparing the patient for therapy, and less frequently the interview process itself provides some relief from psychological distress. The interview may be performed in many different settings: outpatient private practice, community mental health center, psychiatric hospital, prison, emergency or medical hospital room, school, and others.

While the amount of time devoted to the interview, the setting, and the purposes may vary, the features of an effective clinical interview remain the same. When completed, the interviewer has created a relatively comprehensive portrait of the patient that can be communicated readily to others and will provide the basis for making important judgments about the subject of the interview. The relative importance of various symptoms or concerns should be established, along with an estimate of the individual's overall functioning, and some estimate of the individual's responses in a variety of settings can be made with an acceptable degree of validity. These features can be said to be a part of any clinical interview. Some specific purposes of the interview are described next, along with suggestions about different approaches and emphases the clinician may be required to take.

4.04.1.1 Gathering Information for Assessment and Treatment

The most common purposes of the clinical interview are to gather information to establish a diagnosis, evaluate mental status and historical data that impact upon the individual, and provide a full understanding of the important personality, biological, and environmental variables that have brought the patient to this point. All treatment planning begins with some type of formal or informal evaluation. The clinical interview is the most effective way to gain an understanding of the current functioning and the difficulties faced by the patient, and is a necessary adjunct to the data gathered from other assessment approaches. In the clinical interview, the clinician inquires directly and in a focused manner about the patient's development, adaptation, and current difficulties. When the interview is part of a comprehensive evaluation, some features may be emphasized as a result of the specific referral reason that would not be as prominent in the interview conducted for psychotherapy treatment planning. As an example, the interview conducted for an initial psychoeducational evaluation of an elementary grade school child to determine reasons for school failure is likely to entail considerable emphasis upon academic and learning history and the involvement of one or more of the child's teachers as collateral informants. If the same child were to present later for psychotherapeutic interventions to address the depression and oppositional behavior problems identified by previous evaluation to be the cause of his academic failure, the interviewer would likely spend more time and effort in determining family interactions, parenting skills, and social supports.

4.04.1.2 Establishing Rapport for Assessment and Treatment

One function of the clinical interview is to prepare the patient for the clinical interventions that follow, including additional formal assessment procedures. In order to obtain valid psychometric data, the patient must be adequately cooperative and invested in the testing process. The interview can help the clinician achieve this end by providing a sense of professional intimacy and a feeling of compassion and interest in the patient's well-being. Thus prepared, respondents are more willing to give themselves over to the process, and to perceive it as being something that will provide them with some beneficial outcome.

4.04.1.3 Interpersonal Style/Skills of the Interviewer

While the basic purpose of gathering relatively concrete information may be accomplished by individuals with a minimum of
training and sensitivity, there are a number of personal qualities that tend to improve the quality of information gained and to result in a more helpful and pleasant experience on the part of the informant. Chief among these is the quality of empathy, which must be readily recognized by the informant through subtle and overt communications from the interviewer. Empathy means identifying with and understanding someone else's feelings, motives, or world view. It means entering the private perceptual world of another person and being at home in it: to be sensitive to what they feel (Egan, 1994; Luborsky, 1996). An intellectual understanding of empathy, however, does not provide one with the interpersonal skills and experience that result in the ability to truly resonate to the informant's experience and to respond in ways that will ease the flow of information of a personal and often sensitive nature. The skill and art of attuning oneself not only to the overt communications of the patient, but also to the underlying feelings and meanings, must become a continuing focus of attention for the interviewing clinician. While much of this process is not fully accessible to conscious awareness, there are some components that lend themselves readily to examination and modification. For example, the interviewer's responses that communicate to the informant negative value judgments are perhaps more easily modified.

Although the mental health fields and their practitioners have often been vilified for their purported moral relativism, no reasonable clinician would believe himself or herself to be free of individual prejudices and deeply held convictions regarding right and wrong. These values are a part of each person, and to truly expunge them would result in an insipid and ineffective shell of a human being. The relevance to a discussion of clinical interviewing is this: the effective interviewer takes care to be aware of his or her own expectations and biases regarding human behavior and strives to avoid making explicit negative judgments of the informant in order to provide a comfortable and supportive environment during the interview. This skill can be and is developed and improved through careful attention to the process and to internal changes within the interviewer during the interaction, and by effective supervision and review of actual interviews with other clinicians.

Often such judgments can be communicated to the respondent with no more than a change in facial expression or a shift in questioning. Specific wording and follow-up questions sometimes can have the effect of casting a chill upon the interview process. For example, the interviewer who learns the informant is homosexual and then avoids asking questions about sexual functioning and current relationships readily communicates their discomfort with an important aspect of the informant's personality. The effective interviewer does not perfect an unexpressive mask, but does develop the ability to decrease the immediate translation of visceral responses into explicit behaviors during the interview. Introspection about areas that increase the clinician's anxiety and honest confrontation about discriminatory beliefs are necessary if one is to perform clinical tasks competently and ethically.

4.04.1.4 Structuring the Interview

As with any therapeutic or evaluative intervention, the setting and structure of the interview have a significant effect on the outcome of the interaction. Because the actual face-to-face time spent with patients must be as productive and positive as possible, the clinician should take care to prepare for the clinical interview prior to contact. While the goals of the interview may vary somewhat as discussed previously, many factors common to all clinical interviews should at least be in one's mind prior to and during the interview.

4.04.1.4.1 Setting variables

Some basic attention should be given to simple environmental factors when preparing for the interview. Although many fruitful interviews have been conducted with patients, families, and other sources under conditions that might be charitably described as less than optimal, doing so in a comfortable and soothing environment will often add to an informant's ease in discussing delicate and/or emotionally charged matters. Seating accommodations should be given consideration, as hard, uncomfortable, or rickety, precarious seats can add a tinge of anxiety or discomfort to the gestalt of the interview process and thus to the evaluation and/or treatment that follows the interview. It should go without saying that the space being used for the clinical interview should be held relatively inviolate from intrusions, including external noises and conversation to the degree possible. While most people are able to tolerate minor interruptions such as a ringing telephone, having another clinician open the door while your patient is tearfully recounting a past trauma is likely to be somewhat harmful to the tentative alliance you are developing. Therefore, if you work in a setting with multiple users you will do well to take precautions to avoid such disruptions. A white-sound generator can help
decrease the penetration of external sounds and add somewhat to the intimacy of the interaction.

Throughout the interview the clinician will be carefully observing the behaviors of the subject, noting congruencies and incongruencies, attending to shifts in voice and posture. One sometimes overlooked source of information that may add to the interview process is that of behavioral observations made while the patient and collaterals are in the waiting area of the clinic or office. Often it is possible to observe interactions and general demeanor while you organize paperwork or make other preparations before formally introducing yourself. It may then be helpful to comment during the interview on some salient interaction or response. Of course, as has been particularly noted by various custody and forensic evaluators (Bricklin, 1990), the behavior in the waiting room must not be taken as necessarily representative of the person's usual response outside of the clinic. However, one often can observe telling interactional patterns, particularly between parents and children, and this may provide opportunity for addressing problematic areas during the subsequent interview.

4.04.1.4.2 Preparing for the patient

It is common practice now to present the patient with relatively extensive questionnaires or personal history forms prior to the first clinic visit. With this information in hand, the clinician may be able to focus in quickly on the salient symptomatology and current concerns. When available, this information should be used to tailor the interview, allotting time during the face-to-face interview in the most effective manner. If the clinician does choose to utilize such instruments, he or she would be well served to take the time necessary to review the data prior to entering the room with the informant. Watching the professional read and murmur over the forms for minutes while the informant is ignored or asked disconnected questions can be expected to result in a sense of devaluation for the informant. It also gives the impression of disorganization and lack of preparation on the part of the clinician. Neither of these will be helpful in the ensuing interview process.

4.04.1.5 Introductory Remarks

It is helpful to develop a standard approach to the clinical interview, including the introduction and beginning of the interview. One should introduce oneself, and give the informant information about the purpose of the interview and the expected duration. The interviewer's role and title should be clarified, and any supervisory or other training relationship must be disclosed prior to beginning the interview. It is essential that issues of confidentiality be fully addressed, and the informant be given opportunity and encouragement to ask questions about disclosure of information. If any of the data obtained will be shared with other individuals, this must be explained clearly. This is of particular importance in forensic or custody evaluations. When interviewing children and parents, keep in mind the fact that in many jurisdictions the noncustodial parent may retain full rights to examine medical records, including data from the clinical interview. Even if the informant has signed a general disclosure or consent for treatment, it is the clinician's ethical responsibility to review duties to warn and the possible limits on confidentiality. The legal definition of informed consent in many jurisdictions is not necessarily satisfied by the presence of a signature on a form, but rather is established by questioning the informant about their understanding at the time the information was given. The best practice is for the clinician to do their best to make certain that the person with whom they are communicating for professional purposes is fully informed of such issues. In illustration, imagine for a moment being 30 minutes into an interview with a man who informs you very clearly that he intends to use the pistol in his car to shoot his wife when he returns home. If you have fully informed him of the limits of confidentiality, you are in a very distressing situation. If you have not done so, your position is much worse.

The growth of managed care and its attendant prospective treatment review process may complicate the ethical duties involved in the clinical interview. As Corcoran and Vandiver (1996) point out, "There can be no doubt that managed care has restricted clients autonomy and interferes with the confidential relationship" (p. 198). During the initial interview and prospective utilization review of a patient whose care is managed by a third (or possibly fourth) party, the clinician may find himself or herself in the uncomfortable position of being more the agent of the managed-care organization than the advocate of the patient. In such relationships, it is imperative that the patient be made fully aware at the outset of the interview of the additional limits of confidentiality imposed by the managed-care entity. This may include multiple reviews of the data gained during the interview and any subsequent treatment sessions. An additional ethical concern arises in the clinical interview with regard to the establishment of a professional relationship
and responsibility for the clinical care of the patient. Does performing the clinical interview and prospective review obligate the clinician to provide service even if the managed-care entity denies authorization? Again, the only way to avoid difficulties, misunderstandings, and possible litigation or board complaints is to be absolutely clear with interviewees and any involved third-party payer about these issues prior to the professional contact. If it is possible that the interviewing clinician will not receive reimbursement from the managed-care company for services, any alternative financial arrangements should also be discussed with the prospective patient before any formal clinical contact. If there are inherent limitations to the number of sessions or type of interventions that are covered by the third-party payer, the potential client should also be made aware of these before ending the interview. Of course, it is possible that no treatment will be necessary; thus it seems sensible to leave discussing the mechanics of paying for it until it is determined to be needed.

4.04.1.6 How to Open the Interview

The best way to open the interview is with a very general, open-ended question about the circumstances that have brought the patient to the interview. Morrison (1995) recommends taking approximately eight to 10 minutes in the typical one-hour interview to allow the respondent to explain in their own words their needs and history. Morrison points out that, among other things, this provides the clinician an opportunity to obtain a true flavor for the respondent's personality and communication style, and to make general observations of behavior, affect, and thought process relatively free from the clinician's direction. An example of an opening question might be "Please tell me about the things that are concerning you most right now" or "I would like for you to tell me what you need some assistance with now" or even "Please give me an idea of how you came to be here today." The amount of information gathered during this portion of the interview will be to some degree dependent upon the respondent's intellectual ability and verbal facility. Many people are characterologically unwilling to self-disclose, even within the confines of the clinical interview, and may require additional urging. The clinician should generally respond to hesitations with supportive restatement of the opening question, or with gentle encouragement and reflection of any apprehension that is detected. If hesitation or lack of content appears to be due to cognitive limitations, disorientation, or distractibility, it may be helpful to ask more direct and closed-ended questions based upon previously obtained information or the patient's brief verbalizations. It is generally not desirable to lead the patient any more than necessary, as the more you query the less likely you will be able to distinguish between accurate responses and those that are colored by the demands experienced by the patient. However, in some cases the clinician must take a more directive approach to complete the interview successfully.

The topics to be included in every interview are:
(i) Introduction, purpose of interview, credentials/role of interviewer;
(ii) Confidentiality and exceptions;
(iii) Presenting problems (preferably phrased in general, open-ended manner);
(iv) Mood/Anxiety symptoms;
(v) Impulse control and direct inquiry of suicidal ideation/history;
(vi) Current social, academic, and vocational functioning;
(vii) System of social support;
(viii) Environmental factors, including current basic needs/shelter;
(ix) Developmental factors (especially for children) that may influence symptom presentation;
(x) Medical history, including family health history and previous treatment/hospitalization;
(xi) Substance use;
(xii) Legal involvement and history; and
(xiii) Vegetative symptoms.

4.04.1.7 The Central Portion of the Interview

After the initial introduction, housekeeping, and rapport-building, it is time to focus upon the most salient features of the person being evaluated, and the circumstances that maintain the current dysfunction. Once the presenting problems have been identified and an adequate alliance with the respondent established, the clinician must utilize their knowledge of psychopathology and diagnostic criteria to fully understand and classify the presenting problems, as well as to identify the primary strengths and resources that will be drawn upon by the patient and the professionals involved in subsequent interventions. The central portion of the interview is dedicated to adding to the framework established by queries about the presenting problem. One mistake made by novice (as well as by some more seasoned but overly concrete) interviewers is to rigidly adhere to an interviewing framework, disregarding the natural flow of conversation.
If one is unable to recognize the more subtle verbal and nonverbal messages that should be probed and instead forces one's way forward, the clinician will end up with less information than they should. Thus, it is essential to attend carefully to shifts in mood during the interview, both within the patient and the interviewer.

Luborsky (1996) details the utilization of momentary shifts in mood during a therapy session to focus upon vital underlying thoughts that are salient to the therapeutic issues. The interviewing clinician can also benefit by noticing changes in voice tone, volume, and content of speech. During the central portion of the interview the clinician continues to focus on the problems and possible explanations for present distress. When possible, avoid becoming involved in digressive topics, as some respondents may prefer to spend most of the available time presenting problems that are not central to the services being sought. By the same token, it is the clinician's responsibility to follow any significant leads in the interview, and to be aware of any tendencies on their own part to avoid distressing topics. Experience shows that clinicians tend to be vulnerable to this type of error in particular with regard to sexual functioning, substance use, and racial/ethnic discrimination issues. It may be helpful to keep in mind that while the interview shares many commonalities with social conversation, it is by definition not a run-of-the-mill social interaction. Thus, inhibitions that prevent the interviewer from querying these admittedly uncomfortable topics must be dealt with.

Because many clinicians may find themselves having completed much of their formal training without ever overcoming the discomfort experienced when such topics are broached, it may be necessary to practice on colleagues in role-play activity designed to help the clinician become adept at obtaining the necessary information despite initial resistance from within as well as from the respondent. As one of the primary purposes of the clinical interview is accurate diagnosis according to current syndromal criteria from the Diagnostic and statistical manual of mental disorders (4th ed., DSM-IV), the clinician must have a solid working knowledge of the criteria for major disorders. Many of the diagnostic categories require precise time qualifiers, so any reports of significant symptoms should be followed by the clinician's efforts to establish their time of onset, duration, and severity. The respondent should be encouraged by the clinician to employ descriptive terms, and to indicate in some way the intensity of the symptoms with a numerical scaling or comparative descriptors.

4.04.1.8 Closing the Interview

As the time for the interview draws to a close, the clinician should consolidate the information gained. It is helpful to review one's notes so that any lingering questions may be answered, and to clarify any dates, names, or other details that may have become confused during the course of the interview. An additional responsibility of the clinician conducting the interview is to assist the informant in achieving closure. Many times the clinical interview results in emotional dilation and some level of cognitive disorganization as distressing events are recalled and exposed to another person. The skilled clinician will continue to structure the interview with reminders about the amount of time remaining, summarizing the information provided, and giving appropriate feedback to the informant regarding what to expect next. Avoid rushing the informant out of the room, but be prepared to set limits about the closing of the interview. When possible, it is beneficial to give the informant a good idea of your diagnostic formulation, and to outline possible intervention strategies. If this is not possible or appropriate at the close of interview, convey to the informant what steps will be taken to complete the evaluation, or provide the informant with an idea of how the information provided will be utilized. If possible the informant should leave the interview with the feeling that they have been heard, understood, and will be benefiting in some way from having participated.

4.04.1.9 The Collateral Interview

Collateral interviewing refers to any direct interviewing done with persons other than the identified patient. Common collateral individuals who are interviewed in the clinical setting include parents, spouses, siblings, and other close relatives. In the case of children and adolescents, school teachers, administrators, and counselors are also often interviewed directly about the behavior and adaptive functioning of the patient. The same skills used in interviewing the identified patient will be employed in these interviews. Empathy, a lack of criticism, and an appropriate use of humor are just as indispensable in talking with a spouse or school principal as they are with the individual presenting for assessment and/or treatment. In many cases, the collateral interview is conducted because the patient is unable to provide the needed information on their own because of disorganizing pathology or other limiting factors, making a collateral information source
even more important. In conducting the collateral interview, one must also determine, to the extent possible, the degree of reliability or weight to place upon the information thus gathered.

The clinician should consider the amount of time the informant has known the patient, and the circumstances under which the patient has been observed by the informant. In the case of the school teacher, beginning with questions regarding the amount of time spent with the patient and the subjects taught provides an opportunity to gather useful information about the school setting. In addition, this allows the clinician to evaluate to some extent the affective responses of the informant toward the patient; for example, excessive anger or frustration on the part of the teacher may point to possible distortions in reporting. It is helpful to probe gently for the teacher's experience level, to avoid being unduly influenced by the observations of one who has relatively little comparative knowledge of normative classroom behavior. Begin by asking how long they have been in this particular school, or teaching this particular grade. If the teacher is special education certified, ask how long they have been certified and in what areas. Usually these queries will be sufficient to obtain a good estimate of the experience base of the teacher, and most will actually respond to the general probes with more than enough information to make appropriate judgments.

In the case of the parent interview, take care to establish current custody arrangements and responsibilities as clearly as possible. Depending upon the jurisdiction in which the clinician works, noncustodial parents may not have the right to seek mental health services for the child. It is the clinician's responsibility to be aware of all the legal constraints on service, as well as the ethical duties peculiar to working with children or others who are unable to consent to treatment legally. Be cautious about taking the word of one parent involved in a visitation or custody dispute who reports that the other parent has no interest in the child, or that the other parent would be completely uninterested in assisting in assessment or treatment for the child. Experience indicates that while this may be true in some cases, this attempt to shut the other parent out of clinical work may result in significant distortion of the presenting facts, and can hamper effective work with the child. Thus, if the parent bringing the child for services indicates that their (ex)spouse will not participate in the interview, go ahead and obtain consent from the present parent to contact the reportedly uninvolved parent. This action would only be contraindicated if the clinician has convincing evidence that contacting the other parent would present a significant danger to the patient.

4.04.2 DEVELOPMENTAL CONSIDERATIONS IN INTERVIEWING

4.04.2.1 Interviewing Children (Preschool Age through Older Elementary)

Because children are usually brought into the clinic setting by their parents, clinicians typically schedule an interview with the parents to obtain information about current concerns and past history. Parents are in a unique position to provide a chronology of significant events in the child's life, leading up to the present concerns and reasons for referral. Often collateral interviews will be scheduled with others who play a significant role in the child's life, such as grandparents, teachers, day care providers, etc. Indeed, some diagnoses (such as attention deficit hyperactivity disorder [ADHD]) require that symptoms be documented across at least two settings, and it is helpful to have informants from settings such as school to add to the history provided by parents. One should not limit interviewing to only the adults who are significant in the child's life, however. To do so would create a risk of overlooking important information that could be obtained from the child directly about the child's perceived fears, anxieties, mood, and critical events in the child's world. The child's perspective is often overlooked in situations where the child is not articulate about feelings or is immature in language development. It is necessary for the clinician to develop skill in obtaining interview information from children even in these circumstances. An excellent resource for interviewing or observing children, including infants, can be found in Sattler (1998, pp. 96–132).

4.04.2.2 Interviewing Parents

The purpose of the interview with parents is similar to that discussed earlier in the chapter, in that the clinician attempts to clarify the reasons for concern, identify strengths and weaknesses that moderate the presenting problems in the child, and obtain information that could assist with treatment planning. However, there are important ecological variables that are salient for children and should be addressed in the interview. These include placing the child's current and past problems into a social and developmental context, assessing possible risk and resilience factors that may relate to the
child's problems, and assessing the consequences or developmental impact of the child's problems on their future development.

4.04.2.3 Social Context

Schroeder and Gordon (1993) outlined several steps in assessing the problems of young children, including clarifying the referral questions and determining the social context of the problem. Parents often present to clinicians feeling anxiety and/or frustration about their child's problems. This may lead to emotionally laden, imprecise descriptions of behavior (e.g., "He never minds!" or "She is always disrespectful to her parents!"). The first task in the interview is to help parents define the specific behaviors that cause concern, and to obtain information about the frequency, intensity, and nature of the problem. For instance, a three-year-old child who displays temper tantrums once per week may be of mild concern, but one who has tantrums three to five times per day would be of much greater concern. The intensity of the child's problems might be gauged by the degree of distress caused to the child or the disruption to typical family activities. For instance, tantrums that occur occasionally at home may cause less distress than if they occur with regularity at church, school, or in other public places. Finally, the nature of the child's problems will be an indicator of severity. Children who engage in cruelty to animals or other people, who are destructive, or who engage in a pattern of fire-setting behavior with the intent to destroy property are of more concern than those who have less serious oppositional and defiant symptoms. As clinicians interview parents about the specific behaviors of concern, important information about the frequency, nature, and severity of the problems can be assessed.

The social context is best assessed by asking simple questions such as "Who is concerned about the child?", "Why is this person concerned?", and "Why is this person concerned now vs. some other time?" (Schroeder & Gordon, 1993). Although parents or teachers may refer children for assessment or treatment, this does not mean that the child necessarily has a problem that needs treatment. A teacher who refers several active children from a first grade class may be feeling overwhelmed by the sheer number of active children in the class at one time, although a given child's behavior may not be severe enough to warrant a diagnosis of ADHD. Rutter and Schroeder (1981) provided a case example of a mother who presented with concerns about her daughter occasionally masturbating while watching television. In an attempt to determine why this mother was concerned and the best approach to intervention, the clinician asked about the mother's perception of what this behavior means. The mother responded by saying that she knew her daughter was at a developmental age when exploring her body was normal, and she knew that nothing bad would happen (e.g., growing hair on the palms of her hands) as a result of masturbation. The additional question ("Why is the mother concerned now vs. any other time?") yielded the most salient information about the mother's concerns. The mother revealed that her mother-in-law was coming for a visit the next week, and she was concerned that this relative would have a negative reaction to seeing her granddaughter masturbate. The intervention was simplified by understanding the true reason for the mother's concern. The clinician recommended that the mother provide rules about when and where it was acceptable to masturbate (e.g., when her daughter was alone, in her bedroom, or in the bathroom) and institute a behavioral reward system for remembering not to masturbate while watching television.

Other social contextual information can be obtained about family status (who is living in the home), recent transitions (moves, job changes, births, recent deaths, or illnesses of significant family members), and other family stresses (marital problems, financial stresses, etc.). The presence of persons who are supportive to the child, or who may provide a buffer in the face of other stresses, is important. The literature on resilience is replete with examples of children who have lived with adversity but have developed and functioned normally due to protective factors in their social history (Routh, 1985). The interview with parents also provides an opportunity for assessing possible psychopathology in the parents, such as significant depressive or anxiety symptoms; problems with anger management and self-control, as is often seen in abusive parents; substance abuse problems that may lead to parental instability or neglect; or problems with reality testing, as in schizophrenia. One mother, for example, described her 14-year-old son as being afraid of the dark and reporting seeing ghosts at night. This was viewed by the clinician as an example of a fear that was developmentally inappropriate for a 14-year-old; it also raised questions about possible hallucinations. The context became more clear when the mother revealed that she saw nothing inappropriate about this behavior. The mother reported that she, too, needed to sleep with a light on due to her fear of the dark, and that she also imagined seeing ghosts in her bedroom. This mother reported that she and her
son had many discussions about their mutual fears. The context of the son's fears was changed by the mother's revelation, and the clinician decided to include a more thorough interview regarding the mother's mental status in this case. Even when no concerns about parental psychopathology exist, parental stress levels and affect must be considered when interpreting their reports about child behavior. A parent who is calm and rational in providing a history of their child's behavior may be viewed as more objective than a parent who is extremely upset, tearful, or angry and uses exaggerated descriptors of the child's behavior.

4.04.2.4 Developmental Context

Developmental context provides an essential lens through which to view children's behavior, allowing the clinician to evaluate the child's behavior relative to that of other children of the same chronological and/or mental age. For instance, enuresis may not be unusual in a four-year-old, but would be of concern in a 14-year-old. Likewise, enuresis may not be unusual in a six-year-old youngster with a moderate degree of mental retardation. Some behavioral problems of young children are transient, reflecting their responses to normative developmental challenges (e.g., a five-year-old girl who displays a regression to thumb-sucking and infantile speech patterns following birth of a new sibling). Other problems are more serious and persistent, and suggest risk for later maladjustment. Familiarity with developmental theory and the rich empirical literature in clinical child psychology and developmental psychopathology can provide the clinician with guidance in making these discriminations.

Knowledge of the sequence and transitions in social/emotional development is helpful to the clinician in judging the appropriateness of children's behavior at various ages. For instance, a toddler who has never displayed a strong attachment to a primary caregiver (usually a parent or parents) and who seems to form attachments indiscriminately with others would raise concerns about possible attachment relational problems. A seven-year-old child who cannot delay gratification or consider the feelings of others would be of concern for not having learned appropriate self-control and capacity for emotional empathy that would be expected at that age. Critical developmental tasks for the preschool age child include establishing effective peer relations (e.g., learning to share material resources and adult attention with peers, establishing reciprocal play relationships) and developing flexible self-regulatory skills (e.g., adjusting to the authority of preschool or daycare teachers and classroom routines). In contrast, children in middle to late elementary years (seven to 12 years of age) encounter developmental tasks related to mastery of knowledge and intellectual skills, leading to feelings of productivity and competence. Children with learning disorders or other developmental problems that interfere with academic progress may be at risk for secondary behavioral or emotional problems related to their primary problems with learning during this developmental period. The clinician must tailor the interview to exploration of the child's strengths and weaknesses in the context of appropriate developmental expectations for particular ages.

The newly emerging field of developmental psychopathology has provided a theoretical and empirical base for better understanding the developmental precursors of psychopathology in children, and the impact of this psychopathology on subsequent functioning (cf. Cicchetti & Cohen, 1995a, 1995b). There is a growing body of research addressing risk factors for the onset and continuity of various childhood disorders. For example, Loeber and colleagues have made important contributions to understanding the developmental pathways to childhood disruptive behavior disorders, in which different constellations of risk factors lead to different outcomes. In their longitudinal study of inner city boys at ages seven, 10, and 13, they found that initiation into antisocial behavior was predicted by some factors (e.g., poor parent–child relations, symptoms of physical aggression) that were present across all three ages, while others (e.g., shyness at age seven, depression at age 10) were age specific (Loeber, Stouthamer-Loeber, Van Kammen, & Farrington, 1991). Further, the environments of children who remained antisocial differed from those whose antisocial behavior dropped out; good supervision was more important in helping older children (age 13 at intake) while attitude toward school was more important for the younger children. Studies such as these illustrate the importance of understanding the contextual variables related to parenting style and parent–child relational issues, as well as specific child behaviors, in determining the significance of presenting problems and their possible trajectory over time.

4.04.2.5 Direct Interview of Children

Perhaps the best and most comprehensive resource guide for interviewing children and adolescents who present with a variety of problems is Sattler's (1998) book on clinical and forensic interviewing of children. Basically,
the goals of the initial interview of the child depend upon the referral questions as well as the age and verbal ability of the child (Sattler, 1998). When interviewing children and their families the information sought often includes the following:
(i) to obtain informed consent to conduct the interview (for older children) or agreement to be at the interview (for younger children);
(ii) to evaluate the children's understanding of why they are at the interview and how they feel about being at the interview;
(iii) to gather information about the children's perception of the situation;
(iv) to identify antecedent and consequent events related to the children's problems;
(v) to estimate the frequency, magnitude, duration, intensity, and pervasiveness of the children's problems;
(vi) to identify the circumstances in which the problems are most or least likely to occur;
(vii) to identify potentially reinforcing events related to the problems;
(viii) to identify factors associated with the parents, school, and environment that may contribute to the problems;
(ix) to gather information about the children's perceptions of their parents, teachers, peers, and other significant individuals in their lives;
(x) to assess the children's strengths, motivations, and resources for change;
(xi) to evaluate the children's ability and willingness to participate in formal testing;
(xii) to estimate what the children's level of functioning was before an injury; and
(xiii) to discuss the assessment procedures and possible follow-up procedures (Sattler, 1998, p. 98).

A part of the interview process with children includes observation of parent and child and obtaining collateral information from the schools or others if the presenting problem relates to learning or behavior problems outside the home. Recognizing the developmental tasks that children must master at varying ages helps the clinician understand the child's behavior. Thus, a comprehensive, detailed developmental history of the child and family milieu is an integral part of establishing an appropriate treatment.

Clinicians must also consider interviewing the child at some stage during the evaluation process. Very young children may be observed using a free-play setting and using observational guides during the play. The clinician can learn a great deal about the child's energy level, physical appearance, spontaneity, organization, behavior, affect, and attitude through their play and through a diagnostic play interview. School-aged children are able to share thoughts and feelings with the clinician unless they are unusually shy or oppositional (Sattler, 1998). Obviously establishing rapport and maintaining the child's cooperation during the interview is crucial. Kanfer, Eyberg, and Krahn (1992) identified five basic communication techniques that can aid the clinician in attaining rapport and cooperation. First, the clinician can use descriptive statements to describe the client's ongoing behavior, for example, "You're stacking the toys so nice." Second, using reflective statements to mirror the child's statements can be nonthreatening. For example, if the child says she wants to play with blocks the clinician merely reflects "you want to play with the blocks." Third, labeled praise helps the child feel good and feel that the clinician approves of them. Fourth, the clinician must avoid critical statements that suggest disapproval or make the child feel as though they are bad. Finally, open-ended questions avoid yes or no answers and provide opportunities for children to elaborate on their responses (Kanfer et al., 1992).

4.04.2.6 Adolescents

Interpersonal style may play a greater role in good interviewing with this age group than with any other. Adolescents tend to be intensely attuned to any communications that concern their personal appearance, skills, or competence, and the interviewer must avoid at all costs even the hint of condescension. As numerous authors have pointed out, older clinicians tend to identify readily with the parents of adolescents, while younger ones may easily align themselves with the youth. The clinician who remains unaware of their tendencies in this regard runs the risk of making insensitive or intrusive statements that will inhibit rapport rather than increase it. In the first case, the clinician who approaches the adolescent with a parental attitude may unconsciously interact in a way that increases the informant's anxiety, guilt, and hostility. Questions that presuppose information the adolescent has not provided may mirror intrusive interactions with other adults, resulting in defensive efforts and guardedness. Similarly, clinicians who identify easily with the adolescent may also appear "hokey" and insincere when they misuse popular language, or try too hard to relate their own somewhat misty adolescent experiences to those of the youth they are interviewing. These errors result from incautious use of the same techniques that will be necessary for successful adolescent interviewing. That is, to obtain good information and develop adequate rapport, the adolescent must perceive that the
clinician is clearly on their side within the boundaries of the relationship. Judicious use of self-disclosure can help the adolescent believe that the interviewer is not attempting to take away from the interaction without reciprocating. Earnest discussion of the limits of confidentiality and the purposes of the interview will help allay some of the suspicions the informant may have about the clinician's role, and will serve to make a distinction between the clinician–informant relationship and those the adolescent has with parents, teachers, parole officers, and other adults.

The adolescent patient presents a number of challenges to the interviewer that are often less present or significant in interactions with both older and younger people. Because of the unique developmental pressures and challenges of adolescence, special care must be taken in the interview to ensure adequate cooperation as well as to make the interview process a helpful one to the patient. It is essential that the interviewing clinician possess a basic knowledge of the common demands and urges present in the adolescent and their family to effectively assess the patient's functioning. Listed next are those tasks commonly believed to be operating in the adolescent period of life according to various developmental theorists (Erikson, 1963; Rae, 1983).

4.04.2.6.1 Separation–individuation

Separation–individuation refers to the need of the adolescent to identify those qualities in themselves that set them apart from their family. Many of the issues bringing adolescents to treatment involve conflicts that are direct results of this process. The adolescent during this time begins testing family boundaries and experimenting with beliefs and behaviors that differ from those held by their caretakers. This process often produces considerable anxiety for all family members, and the adolescent's interpersonal relations may become quite variable. Often, the adolescent moves between the poles of autonomy from, and dependence upon, the family. An important portion of the adolescent interview is that of identifying the severity of the stressors resulting from this natural process.

4.04.2.6.2 Resolving conflict with authority figures

Related to the individuation task is the frequent occurrence of conflict with authority figures outside of the family as well as within. For younger adolescents this involves primarily their teachers and other school personnel, and for later adolescents this includes work supervisors as well. Conflicts with authority figures outside the home often have their roots in greater-than-average difficulties in resolving the family relationship struggles. Thus, when interviewing the adolescent, it is helpful to identify both positive and negative relationships with other adults in their life.

Often classroom performance for the adolescent presenting for services is related strongly to the quality of the relationship with the teacher, so discussion of academic performance (usually a relatively nonthreatening issue in the context of the clinical interview) can elicit useful information about this area of functioning as well. Adolescents, as well as younger children, may readily express relational difficulties in response to the question "Is he/she a good teacher?" This often elicits the adolescent's opinion regarding the desirable qualities in an important adult, and allows the interviewer to follow up with questions regarding the adolescent's ability to recognize their own role in any positive or negative interactions.

4.04.2.6.3 Peer group identification

As adolescence is inarguably a time of shifting focus from family relations to peer relations, it is vital to gather information regarding the patient's friendships and any identification with a social subgroup. Some effective ways of eliciting this information include discussion of music topics, such as taste and dress, that will provide clues to the adolescent's social presentation and degree of inclusion or exclusion from social groups.

To effectively interview adolescents regarding social issues, it is necessary for the clinician to maintain a moderate degree of understanding of popular culture. Thus, one would be well served by making an effort to watch television programming, read magazines, and spend time taking in the various electronic media that are aimed at people in this age group. The interviewer should not attempt to present as an authority on the adolescent's culture, but will benefit from being able to recognize specific music groups, current movies, video games and Internet activities, and other elements that are part of the adolescent's milieu. It is often helpful to enlist adolescents' aid in delineating the social groups present in their school, then ask them to identify the group to which they feel they most belong. This question can usually be asked rather directly, and many teens are pleased by the opportunity to display their understanding of the social complexities in their school.

Follow-up inquiry should establish with whom the adolescent spends most time and
how they see themselves as fitting into the groups at school. Many youth social strata include a group delineated primarily by drug/alcohol use as well as different groups for aggressive or delinquent behavior that may be gang-affiliated or gang-emulating. Thus, the social categories to which the adolescent assigns themselves may also point the interviewer toward necessary inquiries into these possible problem areas as well as providing information about the degree of social integration in the adolescent's life.

4.04.2.6.4 Realistic appraisal and evaluation of self-qualities

As the focus of evaluation or treatment is likely to include assessing and modifying self-image, it is necessary to include questions regarding the ways in which the adolescent views themselves. Adolescents generally display both overly optimistic and excessively pessimistic appraisals of personal qualities. One purpose of the interview is to assist in determining when these perceptions are faulty and result in impaired functioning. It is often helpful to present questions about self-image in terms of strengths and liabilities, and to follow up on both. Questions about the adolescent's physical capacities as well as social and emotional abilities are necessary components of the interview. This portion of the interview can be directed toward uncovering problems with perception of body image and behaviors related to physical health. The interviewer should attend carefully to clues that might indicate the need for more focused exploration of possible eating disorders, and to somatic complaints indicative of anxiety or depression.

4.04.2.7 Interviewing Young Adults (18–40 Years)

The psychological distinction between adolescence and young adulthood is frequently blurred, and many of the same traits and problems may be observed in individuals both over and under the chronological age of majority. However, since the age of majority is generally 18 years, a higher proportion of patients over age 18 will be self-referred and hence will present in a more open and cooperative manner than some adolescents. Additionally, young adults are more likely to present with some subjective description of their distress and their situation. Therefore, the client may be more likely to identify a problem area spontaneously. Despite the fact that more patients in this age group may independently seek services, many of the adolescent issues related to establishing an autonomous role-identity may surface in the interactions with the interviewer, especially with the "youngest" adults. Therefore the interviewer may frequently call upon the skills used in interviewing adolescents.

Erikson (1963) identified the primary developmental conflict for the various stages of adulthood, and these stages suggest important interview topics (see Table 1). The primary conflict of young adulthood is intimacy vs. isolation. Consequently, many of the psychological problem areas frequently encountered will revolve around commitment to interpersonal relationships and establishing trust. Establishment of a working relationship with the patient is also affected by these issues.

A relatively greater amount of the interview might be devoted to exploration of existing relationships or those the patient wishes existed. One type of relationship to consider is that with parents and family of origin. Establishing the degree of desired independence continues to be an issue with some young adults. Issues relevant to these ties might be financial (e.g., parents may be paying college expenses), or they may be more interpersonal in nature (e.g., parents controlling social relationships or defining goals for the patient).

Intimate relationships with individuals of the same or opposite sex may also be a source of psychological discomfort and play a part in the development of anxiety disorders or depression. Inquiry about social functioning should include peer relationships, such as partners in love relationships, friends, and acquaintances.

Individuals in the young adult age group generally will have established some degree of independence, and the relative importance of work and employment will be much greater than at younger ages. The interview should therefore include specific inquiry into current job status, job satisfaction, goals, and relationships with co-workers. The further one progresses into this stage, the greater is the importance of establishment of a stable intimate relationship and mutual trust, and the higher the probability that the issue of procreation will arise. Therefore inquiry should include questions about intentions and concerns associated with having children and child rearing and any differences with one's partner about children.

Finally, the initial episodes of many severe psychiatric disorders are most likely to occur within the young adult period. Initial episodes of depression, and post-partum depression, are likely to occur in those affected before they pass through this period (Kaelber, Moul, & Farmer, 1995). Therefore screening for affective disorders should be included in the interview. A

Table 1 Interview topics for each developmental stage.

Young adult Independence from family, relationships with peers, stable intimate relationships,
trust in relationships, establishment of a family, issues related to having and
rearing children, education, and career goals.
Middle adult Achievement of work and family goals, career or family role changes, responsibility
for aging parents, death of grandparents and parents, reducing responsibility for
children, changes in physical appearance and characteristics, and anticipating
retirement.
Older adult Accepting status of family and career, developing identity as grandparent or "elder
advisor," coping with reduced physical capability and/or health changes, specific
plans for retirement, loss of siblings, spouse, and friends. Increased reliance on
children or caretakers.
Late adult Coping with deteriorating health, decreased mobility, dependence on caretakers, and
anticipation of death.

later section of this chapter deals with interviewing depressed and anxious patients. Additionally, first episodes of schizophrenia or bipolar disorder generally take place in adolescence or young adulthood and the interviewer should be sensitive to symptoms of these disorders.

4.04.2.8 Interviewing Adults in Middle Adulthood (40–60 Years)

Interview techniques need not differ with this age group, but the relevant topics from a developmental perspective are somewhat different (see Table 1). This period encompasses much of the creative and productive portion of the life span in western culture. The emphasis is not on starting, but on completing tasks begun in young adulthood. The focus of individuals at this stage of life is much less on goal setting than on goal attainment. The growth and nurturing of an established family, the attainment of successive career goals, and nurturing of one's parents and grandparents occur in this time span. One's children come into adulthood and begin to establish their identities and families. Inquiry into the relationships with the former and succeeding generations should be made.

Towards the middle of this period, individuals are able to anticipate the likelihood of reaching family and career goals, and become aware of the fact that certain goals for themselves and their children may not be met. Biological changes associated with mid-life, which are well-defined for women, but also may be present for men, should be queried since they may be associated with depression or anxiety. Possible mid-life existential crises related to loss should also be assessed. The losses may result from death of parents or grandparents, or changes in roles as parent, spouse, or worker.

4.04.2.9 Interviewing Older Adults (60–70 Years)

For many adults in this age range, the predominant life circumstance deals with additional impending changes in the area of life roles. Retirement usually occurs within this time frame, and inquiries might reveal difficulties in psychological adjustment to one's own retirement or the retirement of a significant other.

The frequency of death in the patient's social circle gradually increases, and may include a spouse, close friends, or even an adult child.

Due to the possibility of some early decline in cognitive capacity in this age group, the response to inquiry may be defensiveness and denial. The patient with some early impairment may deny the need for the evaluation, object to questions, and become resentful if the interview serves to demonstrate difficulties with memory. Therefore, it becomes more important to interview a collateral person or include a collateral person in the patient interview. In addition to a spouse or family member, a collateral person to be considered with older adults is an adult caretaker, who may or may not be related to the patient. This may give rise to some special issues of confidentiality.

Attention to the collateral person's nonverbal behavior may sometimes suggest that they are uncomfortable reporting the patient's difficulties, especially in the patient's presence. In such circumstances a separate collateral interview is desirable.

4.04.2.10 Interviewing In Late Adulthood (70 Years–End of Life)

Adults in the latest stages of life have their own unique set of circumstances of which the interviewer must be aware. The losses that may have begun earlier may become more frequent.
Physical changes, often represented by medical problems, may interfere with some life activities, and there may be a need to accept reduced independence. At some point anticipation of the end of life is common. The combination of these forces often leads the elderly to have a perspective on life and the situation giving rise to the interview that differs considerably from that of younger adults, in that they may be unconcerned and see no need for the evaluation. Often the reasons for the interview are more important to someone else than to the patient. As with children and adolescents, it is more likely that someone other than the client identified the need for and arranged for the mental health contact.

It is also common for the oldest adults to answer questions more slowly, either because of difficulty accessing information or because a more tangential and elaborate route is taken to reach a point in conversation. Patience on the part of the examiner in these situations is important, both for maintaining rapport and to show the proper respect due the patient.

It has been estimated that the incidence of cognitive decline in people over age 65 is 10–20% (Brody, 1982). Estimates are as high as 25% of those 80 years and older (Hooper, 1992). Thus, the likelihood of cognitive impairment is even greater in this age group than in those discussed previously. For those with cognitive dysfunction, cooperation may be minimal and denial, and even belligerence, may be present. Again, the availability of a collateral person for interview may be very important, as the patient may not cooperate or may be impaired in their ability to provide information.

4.04.3 INTERVIEWING SPECIAL POPULATIONS OR SPECIFIC DISORDERS

4.04.3.1 Interviewing Depressed Patients

Interviewing depressed adults may require some adjustment in the tempo and the goals of the interview. Due to low energy and psychomotor retardation, it may not be possible to gather all the desired information within the time available. Hence, some prioritization of information is necessary, so that issues such as suicidality, need for hospitalization, and need for referral for medication may be addressed. Beck (1967) and, later, Katz, Shaw, Vallis, and Kaiser (1995) pointed out that the interpersonal interaction with the depressed patient may be frustrating for the interviewer, not only due to the slowness mentioned above, but also because of the negative affect and negative tone of information provided.

It is also particularly important with depressed patients, who are prone to hopelessness, to provide encouragement and attempt to impart hope to the patient during the interview. This may be done by recognizing areas of strength, either in terms of personal qualities or successful areas of functioning.

Specific inquiry is necessary to diagnose depression appropriately, and a variety of sources are available to guide this inquiry. Diagnostic criteria for depression are clearly delineated in the DSM-IV (American Psychiatric Association [APA], 1994). A number of structured interviews have been developed that may serve as guides for inquiry or provide sample questions. Formal training is required for the reliable use of these interviews for diagnostic purposes. The Schedule for Affective Disorders and Schizophrenia (SADS; Endicott & Spitzer, 1978) is a relatively early forerunner of current interviews that slightly preceded the DSM-III (APA, 1980), and includes probe questions for depressive symptoms as well as other disorders.

The Structured Clinical Interview for DSM-III-R (SCID; Spitzer, Williams, Gibbon, & First, 1992) is a more current instrument with a modular format so that sections for each disorder may be used independently. Table 2 also lists sample questions that might be used to probe for the presence of various depressive symptoms.

4.04.3.2 Interviewing Anxious Patients

The anxious patient may also present some special difficulties during the interview. If the patient is acutely distressed at the time of the interview, as might be true of someone with a generalized anxiety disorder, they may provide a rush of disorganized information so that it may be difficult to obtain a coherent history. Anxiety interferes with attention and concentration, so that repetition may be necessary. Experience has shown that in such a situation, some initial intervention using brief relaxation techniques is helpful before proceeding with the interview.

Anxious patients also frequently seek reassurance that treatment will be effective in reducing their anxiety. It is appropriate to indicate that treatment techniques have been helpful to other anxious patients, and that these techniques will be available to them.

The diagnostic symptoms of various anxiety disorders are identified in DSM-IV, and the structured interviews mentioned earlier also provide some guidance for the inquiry for specific anxiety symptoms. In addition to the diagnostic information, it is important to

Table 2 Sample questions for depressive symptoms.

Mood (depressed) How would you describe your mood?
Have you been feeling down or sad much of the time?
How much of the time do you feel down or sad?
Mood (irritable) Have you been more short-tempered than usual for you?
Do others say you are more irritable or lose your temper more easily
than usual?
Interest and pleasure Are you as interested as ever in things like your work, hobbies, or sex?
Do you continue to enjoy the things you usually like to do, like
hobbies, doing things with friends, or your work?
Has your interest declined in things which used to be really
interesting for you?
Energy/fatigue Do you have enough energy to do the things you want to do or need
to do?
Do you have the energy to do the things you find interesting?
Do you tire out more easily than usual for you?
Weight loss/gain Have you gained or lost weight since . . . (specify a time period)?
(If the patient does not know, you may inquire about whether clothes
fit properly, or what others may have said about weight.)
Insomnia/hypersomnia How well are you sleeping?
Do you have difficulty getting to sleep? (initial insomnia)
Do you awaken frequently during the night and have trouble getting
back to sleep? (middle insomnia)
Do you awaken too early in the morning? (terminal insomnia)
Psychomotor agitation/retardation Have other people commented on your being too active or being very
slowed down?
Are there times when you just can't sit still, when you have to be
active, like pacing the floor or something similar?
Are there times when you are very slowed down, and can't move as
quickly as usual?
Worthlessness/guilt How do you feel about yourself?
Do you think of yourself as worthwhile?
Do you often feel guilty or have thoughts of being guilty for
something?
Is guilt a problem for you?
Concentration/decisiveness Is it difficult for you to keep your attention on things you are doing?
Do you lose track of things, like conversations or things you are
working on?
Is there a problem with making decisions?
Does it seem that your thoughts are slowed down, so it takes a long
time to make a decision?
Thoughts of death/suicide Do you frequently have thoughts of death?
Do you think a lot about friends or loved ones who have died?
(Inquire if someone close to the patient has recently died or is near
death.)
Do you sometimes think it would be better if you were dead?
Have you thought about hurting yourself or killing yourself?
Have you planned a particular way in which you would kill yourself?
What would keep you from killing yourself?

inquire about ways the patient has attempted to cope with the anxiety, and to provide some reinforcement for such efforts.

4.04.4 SUMMARY

The clinical interview provides rich diagnostic information that can aid in the assessment and treatment of patients. Interpersonal style of the clinician interview, structuring the interview, the setting in which the interview takes place, preparing the patient, and the beginning, middle, and ending phases of the interview are discussed. Developmental considerations and suggestions are offered in interviewing children, adolescents, and adults. Sample questions are primarily for interviewing depressed patients.
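
For clinicians who keep probe questions in an electronic checklist, the prioritization advice in Section 4.04.3.1 (when time is short, make sure issues such as suicidality are covered) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the questions are copied from Table 2 in abridged form, while the `PRIORITY` ranks, the function names, and the `max_questions` limit are hypothetical choices made for the sketch and are not drawn from any published instrument.

```python
# Probe questions copied from Table 2 (abridged to three symptom domains).
PROBES = {
    "Mood (depressed)": [
        "How would you describe your mood?",
        "Have you been feeling down or sad much of the time?",
    ],
    "Energy/fatigue": [
        "Do you tire out more easily than usual for you?",
    ],
    "Thoughts of death/suicide": [
        "Do you frequently have thoughts of death?",
        "Have you thought about hurting yourself or killing yourself?",
    ],
}

# Hypothetical ranks (lower = covered earlier); unlisted domains keep table order.
PRIORITY = {"Thoughts of death/suicide": 0}


def ordered_domains(probes, priority):
    """Return domain names, prioritized domains first, the rest in table order."""
    # Any domain without an explicit rank sorts after every ranked domain.
    default_rank = max(priority.values(), default=-1) + 1
    # sorted() is stable, so tied domains stay in the order Table 2 lists them.
    return sorted(probes, key=lambda domain: priority.get(domain, default_rank))


def interview_plan(probes, priority, max_questions):
    """Build a list of (domain, question) pairs, cut off at max_questions."""
    plan = []
    for domain in ordered_domains(probes, priority):
        for question in probes[domain]:
            if len(plan) == max_questions:
                return plan
            plan.append((domain, question))
    return plan


if __name__ == "__main__":
    for domain, question in interview_plan(PROBES, PRIORITY, max_questions=3):
        print(f"{domain}: {question}")
```

With a limit of three questions, the plan holds both suicide-related probes followed by the first mood probe. The ordering here only demonstrates how a time limit interacts with priorities; the actual sequence and wording of questions remain matters of clinical judgment.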

4.04.5 REFERENCES

American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Beck, A. T. (1967). Depression: Clinical, experimental and therapeutic aspects. New York: Harper and Row.
Bricklin, B. (1990). The custody evaluation handbook: Research-based solutions and applications. New York: Brunner-Mazel.
Brody, J. A. (1982). An epidemiologist views senile dementia: Facts and fragments. American Journal of Epidemiology, 115, 155–160.
Cicchetti, D., & Cohen, D. J. (Eds.) (1995a). Developmental psychopathology. Vol. 1: Theory and methods. New York: Wiley.
Cicchetti, D., & Cohen, D. J. (Eds.) (1995b). Developmental psychopathology. Vol. 2: Risk, disorder, and adaptation. New York: Wiley.
Corcoran, K., & Vandiver, V. (1996). Maneuvering the maze of managed care: Skills for mental health professionals. New York: Simon & Schuster.
Egan, G. (1994). The skilled helper: A problem management approach to helping. Pacific Grove, CA: Brooks/Cole Publishing.
Endicott, J., & Spitzer, R. (1978). A diagnostic interview: The Schedule for Affective Disorders and Schizophrenia. Archives of General Psychiatry, 35, 837–844.
Erikson, E. H. (1963). Childhood and society (2nd ed.). New York: Norton.
Hooper, C. (1992). Encircling a mechanism in Alzheimer's disease. Journal of National Institutes of Health Research, 4, 48–54.
Kaelber, C. T., Moul, D. E., & Farmer, M. E. (1995). Epidemiology of depression. In E. E. Beckham & W. R. Leber (Eds.), Handbook of depression (2nd ed., pp. 3–35). New York: Guilford Press.
Kanfer, R., Eyberg, S., & Krahn, G. L. (1992). Interviewing strategies in child assessment. In M. Roberts & C. E. Walker (Eds.), Handbook of clinical child psychology (pp. 49–62). New York: Wiley.
Katz, R., Shaw, B., Vallis, M., & Kaiser, A. (1995). The assessment of the severity and symptom patterns in depression. In E. E. Beckham & W. R. Leber (Eds.), Handbook of depression (2nd ed., pp. 61–85). New York: Guilford Press.
Loeber, R., Stouthamer-Loeber, M., Van Kammen, W., & Farrington, D. P. (1991). Initiation, escalation and desistance in juvenile offending and their correlates. The Journal of Criminal Law and Criminology, 82, 36–82.
Luborsky, L. (1996). The symptom–context method. Washington, DC: APA Publications.
Morrison, J. (1995). The first interview. New York: Guilford Press.
Rae, W. A. (1992). Teen–parent problems. In M. C. Roberts & C. E. Walker (Eds.), Handbook of clinical child psychology (pp. 555–564). New York: Wiley.
Routh, M. (1985). Masturbation and other sexual behaviors. In S. Gabel (Ed.), Behavioral problems in childhood (pp. 387–392). New York: Grune & Stratton.
Rutter, D. K., & Schroeder, C. S. (1981). Resilience in the face of adversity: Protective factors and resistance to psychiatric disorder. British Journal of Psychiatry, 147, 598–611.
Sattler, J. (1998). Clinical and forensic interviewing of children and families (pp. 96–132). San Diego, CA: J. M. Sattler.
Schroeder, C. S., & Gordon, B. N. (1993). Assessment of behavior problems in young children. In J. L. Culbertson & D. J. Willis (Eds.), Testing young children (pp. 101–127). Austin, TX: ProEd.
Spitzer, R., Williams, J. B. W., Gibbon, M., & First, M. (1992). The Structured Clinical Interview for DSM-III-R (SCID): I. History, rationale and description. Archives of General Psychiatry, 49, 624–636.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.05 Structured Diagnostic Interview Schedules
JACK J. BLANCHARD and SETH B. BROWN
University of New Mexico, Albuquerque, NM, USA

4.05.1 INTRODUCTION
4.05.1.1 Evaluating Structured Interviews
4.05.1.1.1 Reliability
4.05.1.1.2 Validity
4.05.1.2 Overview
4.05.2 ADULT DISORDERS
4.05.2.1 Schedule for Affective Disorders and Schizophrenia
4.05.2.2 Reliability
4.05.2.2.1 Summary
4.05.2.3 Present State Examination
4.05.2.3.1 Reliability
4.05.2.3.2 Supplements to the PSE
4.05.2.3.3 Summary
4.05.2.4 Structured Clinical Interview for DSM-IV/Axis I Disorders
4.05.2.4.1 Reliability
4.05.2.4.2 Summary
4.05.2.5 Comprehensive Assessment of Symptoms and History
4.05.2.5.1 Reliability
4.05.2.5.2 Summary
4.05.2.6 Diagnostic Interview for Genetic Studies
4.05.2.6.1 Reliability
4.05.2.6.2 Summary
4.05.2.7 Diagnostic Interview Schedule
4.05.2.7.1 Reliability
4.05.2.7.2 Summary
4.05.2.8 Composite International Diagnostic Interview
4.05.2.8.1 Reliability
4.05.2.8.2 Summary
4.05.3 PERSONALITY DISORDERS
4.05.3.1 Structured Interview for DSM-IV Personality Disorders
4.05.3.1.1 Reliability
4.05.3.1.2 Summary
4.05.3.2 International Personality Disorder Examination
4.05.3.2.1 Reliability
4.05.3.2.2 Summary
4.05.3.3 Structured Clinical Interview for DSM-IV Personality Disorders
4.05.3.3.1 Reliability
4.05.3.3.2 Summary
4.05.3.4 Personality Disorder Interview-IV
4.05.3.4.1 Reliability
4.05.3.4.2 Summary
4.05.4 CHILD AND ADOLESCENT DISORDERS
4.05.4.1 Schedule for Affective Disorders and Schizophrenia for School Age Children
4.05.4.1.1 Reliability
4.05.4.1.2 Summary
4.05.4.2 Child Assessment Schedule
4.05.4.2.1 Reliability
4.05.4.2.2 Summary
4.05.4.3 Child and Adolescent Psychiatric Assessment
4.05.4.3.1 Reliability
4.05.4.3.2 Summary
4.05.4.4 Diagnostic Interview Schedule for Children
4.05.4.4.1 Reliability
4.05.4.4.2 Summary
4.05.4.5 Diagnostic Interview for Children and Adolescents
4.05.4.5.1 Reliability
4.05.4.5.2 Summary
4.05.5 SUMMARY
4.05.6 REFERENCES

4.05.1 INTRODUCTION

As early as the 1930s, and certainly by the early 1960s, it was apparent that clinical diagnostic interviews were fallible assessments. Evidence suggested that clinicians frequently arrived at different diagnoses, often agreeing at no more than chance levels (e.g., Beck, Ward, Mendelson, Mock, & Erbaugh, 1962; Matarazzo, 1983; Spitzer & Fleiss, 1974). These findings gave rise to the study of the causes of diagnostic unreliability as well as the development of methods to improve psychiatric diagnosis.

In the first study systematically to examine reasons for diagnostic disagreement, Ward, Beck, Mendelson, and Erbaugh (1962) summarized three major sources of disagreement. These were inconsistency on the part of the patient (5% of the disagreements), inconsistency on the part of the diagnostician (32.5%), and inadequacies of the nosology (62.5%). Thus, nearly one-third of the diagnostic disagreements arose from the diagnosticians. Factors associated with differences between diagnosticians included interviewing techniques that led to differences in information obtained, weighing symptoms differently, and differences in how the symptomatology was interpreted (Ward et al., 1962). It is interesting that these problems arose despite methods of study which included the use of experienced psychiatrists, preliminary meetings to review and clarify diagnostic categories, the elaboration of diagnostic descriptions to minimize differences, and the compilation of a list of instructions to guide diagnosis (Beck et al., 1962).

The study of Ward et al. (1962) identified two major sources of disagreement that have been termed "criterion variance" and "information variance" (Endicott & Spitzer, 1978). Criterion variance refers to the errors in diagnostic assignment attributable to how clinicians summarize patient information into existing definitions of psychiatric diagnoses. Inadequacies of early diagnostic systems (e.g., the Diagnostic and statistical manual of mental disorders [DSM-I and DSM-II]) generally arose from the lack of explicit diagnostic criteria. The development of newer diagnostic schemes such as the Research Diagnostic Criteria (RDC; Spitzer, Endicott, & Robins, 1978), and subsequently the DSM-III (American Psychiatric Association, 1980), provided inclusion and exclusion criteria and specific criteria relating to symptoms, signs, duration and course, and severity of impairment.

In addressing errors that arise from inadequate nosology, clinicians and researchers are still faced with information variance, that is, errors originating from differences in what information is obtained and how that information is used by the interviewers. As reviewed above, Ward et al. (1962) found that nearly a third of diagnostic disagreements were related to the interviewers. Structured interviews were developed in order to address this source of error variance, and the history of structured interviews goes back to the 1950s (Spitzer, 1983). All structured interviews seek to minimize information variance by ensuring that clinicians systematically cover all relevant areas of psychopathology. Although specific methods vary across instruments, common techniques that characterize structured interviews are the specification of questions to be asked to assess domains of psychopathology and the provision of anchors and definitions in order to determine the ratings of symptoms (e.g., do the descriptions obtained within the interview achieve diagnostic threshold or not?). Despite some shared qualities it is also clear that available structured interviews differ markedly on a number of dimensions. The reliability of

diagnoses based on refined diagnostic criteria paired with structured interviews was found to be greatly improved (e.g., Endicott & Spitzer, 1978; Spitzer, Endicott, & Robins, 1978).

4.05.1.1 Evaluating Structured Interviews

The selection of a structured interview will be driven by a number of other concerns. Some of the potential considerations are summarized in Table 1 and are derived from the reviews of Page (1991) and Zimmerman (1994). Questions that should be asked in selecting an instrument will include the diagnoses covered by the interview, the nosological criteria adhered to in generating these diagnoses (e.g., DSM, the International Classification of Diseases [ICD], or other criteria such as the RDC), and the population studied (e.g., adult or child). Additionally, the context in which the interview is conducted may also be relevant. That is, who will be administering the questionnaire and under what circumstances? Some interviews were developed to be used by lay interviewers, as in community epidemiological studies, while other instruments require extensive clinical experience and are to be administered only by mental health professionals. Other concerns will relate to the guidelines and support available for an instrument. Some measures have extensive book-length user's manuals with available training videotapes and workshops; however, other measures have only sparse unpublished manuals with no training materials available. Finally, major concerns arise regarding reliability.

4.05.1.1.1 Reliability

The reliability of a diagnostic interview refers to the replicability of diagnostic distinctions obtained with the interview. Methods for evaluating agreement between two or more raters can take a variety of forms (Grove, Andreasen, McDonald-Scott, Keller, & Shapiro, 1981). Importantly, these differing methods may have an impact on indices of reliability. The most stringent evaluation involves raters conducting separate interviews on different occasions with diagnoses based only on interview data (i.e., no access to medical records or collateral informants such as medical staff or family members). Reliability assessment based on this method is referred to as "test–retest reliability," given the two occasions of interviewing. This methodology ensures a rigorous evaluation of the ability of the interview to limit information variance and to yield adequate information for the determination of diagnoses. However, in addition to interviewing style and methods of diagnosticians, two other factors contribute to rater disagreements in test–retest designs. First, the information provided by the patient may be different during the two interviews. Even with structured interviews this can continue to be a relevant source of variance. In a review of test–retest data using the Schedule for Affective Disorders and Schizophrenia, Lifetime Anxiety, Fyer et al. (1989) found that more than 60% of diagnostic disagreements were due to variation in the information provided by subjects. Second, there may be a true change in the clinical status of the individual who is interviewed. As the test–retest period increases, the potential contribution of changing clinical status will increase.

Other methods sometimes utilize a single interview that is either observed or videotaped and rated by independent (nonparticipating) raters, yielding inter-rater agreement. This method may yield inflated estimates of reliability as information variance is absent; that is, both raters are basing diagnostic decisions on the same subject responses. Also, interviewer behavior may disclose diagnostic decisions to the second rater. For example, during a structured interview an interviewer may determine that a module of the interview is not required based on subject responses and the interviewer's interpretation of rule-out criteria. The observing rater is aware of this diagnostic decision as the interviewer skips questions and moves to another module. Given the importance of methods used in assessing reliability, throughout this chapter we will attempt to indicate clearly the techniques by which reliability was assessed for each instrument.

In addition to considering study designs used in evaluating diagnostic reliability, it is also important to examine the statistics used to compute reliability. One method that has been used as an index of reliability is percent agreement. As outlined by Shrout, Spitzer, and Fleiss (1987), percent agreement is inflated by chance agreement and the base rates with which disorders are diagnosed in a sample. In their example, if two clinicians were randomly to assign DSM diagnoses to 6% of a community sample of 100 persons, chance agreement would produce an overall rate of agreement of about 88.8% (Shrout et al., 1987).

The index of agreement, kappa (κ), was developed to address this statistical limitation (Cohen, 1960). Kappa reflects the proportion of agreement, corrected for chance; it ranges from negative values (agreement below chance), through zero (chance agreement), to positive values (agreement above chance), with 1.0 indicating perfect agreement (Spitzer, Cohen, Fleiss, & Endicott, 1967). The statistic weighted κ was

Table 1 Relevant questions for selecting a diagnostic interview.

Content
Does the interview cover the relevant diagnostic system (e.g., RDC, DSM-IV, ICD-10)?
As an alternative to adhering to a single diagnostic system, does the interview provide polydiagnostic assessment
(i.e., diagnoses for multiple diagnostic systems can be generated)?
Does the interview cover the relevant disorders?
Can irrelevant disorders be omitted?
Does the interview provide a sufficiently detailed assessment? That is, aside from diagnostic nosology is
adequate information in other domains assessed (e.g., course of illness, family environment, social
functioning)?
How are signs and symptoms rated (e.g., dichotomous ratings of presence vs. absence or continuous ratings of
severity)?

Population
Is the interview applicable to the target population?
Relevant population considerations include adult vs. child, patient or nonpatient, general population, or family
members of psychiatrically ill probands and whether the instrument will be used cross-culturally (is it
available in languages for the cultures to be studied?).
Aside from general population considerations are there other exclusionary conditions to be aware of (e.g., age,
education, cognitive functioning, or exclusionary clinical conditions)?

Time period
Does the interview cover the relevant time period (e.g., lifetime occurrence)?
Can the interview be used in longitudinal assessments to measure change?

Logistics of interview
How long does the interview take?
Does interview require or suggest use of informant (e.g., with child interviews)?

Administration/interviewer requirements
Who can administer the interview (e.g., lay interviewers, mental health professionals)?
How much training or experience is required to administer interview?
Does interview provide screening questionnaire to assist in expediting the interview (e.g., in personality disorder
assessments)?
Are computer programs required and compatible with available equipment?

Guidelines and support
What guidelines for administration and scoring are available (e.g., user's manual)?
What training materials are available (videotapes, workshops)?
Is consultation available for training or clarification of questions regarding administration or scoring?

Reliability and validity
Is the interview sufficiently reliable?
Are reliability data available for the diagnoses and populations to be studied?
Are validity data available for the interview (e.g., concordance with other structured interviews, expert-obtained
longitudinal data, and other noninterview measures)?

developed for distinguishing degrees of disagreement, providing partial credit when disagreement is not complete (Spitzer et al., 1967). Standards for interpreting kappa suggest that values greater than 0.75 indicate good reliability, values between 0.50 and 0.75 indicate fair reliability, and values below 0.50 reflect poor reliability (Spitzer, Fleiss, & Endicott, 1978). Although the present review will adhere to the recommendations of Spitzer, Fleiss, & Endicott (1978) that kappas below 0.50 indicate poor agreement or unacceptable reliability, it should be noted that other authors have proposed what appear to be more lenient criteria for evaluating kappa. For example, Landis and Koch (1977) suggest that kappas as low as 0.21–0.40 be considered as indicating "fair" agreement.

It is important to understand that reliability is not a quality inherent within an instrument that is transportable to different investigators or populations. All the measures described herein require interviewer training, and some instruments require an extensive amount of prior clinical experience and professional training.
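
The contrast between percent agreement and kappa discussed above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names and the simulated ratings are hypothetical and are not taken from Shrout et al. (1987) or Cohen (1960). It assumes two raters who each diagnose 6 of 100 persons and never pick the same person, and it labels the result using the interpretive bands attributed above to Spitzer, Fleiss, and Endicott (1978).

```python
from collections import Counter


def chance_agreement(rate_a, rate_b):
    """Agreement expected if two raters diagnose independently at these base rates."""
    return rate_a * rate_b + (1 - rate_a) * (1 - rate_b)


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted kappa: (observed - expected) / (1 - expected)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Expected agreement from each rater's own marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Bands as described in the text: >0.75 good, 0.50-0.75 fair, <0.50 poor."""
    if kappa > 0.75:
        return "good"
    if kappa >= 0.50:
        return "fair"
    return "poor"


# Hypothetical data: each rater diagnoses 6 of 100 persons (1 = diagnosed),
# but the two raters never diagnose the same person.
rater_a = [1] * 6 + [0] * 94
rater_b = [0] * 6 + [1] * 6 + [0] * 88

percent_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)

print(f"chance agreement at a 6% base rate: {chance_agreement(0.06, 0.06):.4f}")
print(f"observed percent agreement: {percent_agreement:.2f}")
print(f"kappa: {kappa:.3f} ({interpret_kappa(kappa)})")
```

In this constructed case the raters agree on 88 of 100 persons without sharing a single positive diagnosis, so percent agreement looks high while kappa falls slightly below zero.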

Ultimately, the reliability of any structured interview will be dependent on the user of that interview. Although the present review will invite comparisons of reliability across studies it should be noted that reliability statistics such as kappa are influenced by a number of factors that constrain such comparisons. Differences in population heterogeneity, population base rates, and study methods will all influence reliability and should be considered in evaluating the literature.

4.05.1.1.2 Validity

In addition to reliability, the validity of a diagnostic assessment can also be evaluated. In the absence of an infallible or ultimate criterion of validity for psychiatric diagnosis, Spitzer (1983) proposed the LEAD standard: longitudinal, expert, and all data. Longitudinal refers to the use of symptoms or information that may emerge following an evaluation in order to determine a diagnosis. Additionally, expert clinicians making independent diagnoses based on all sources of information make a consensus diagnosis that will serve as a criterion measure. These expert clinicians should use all sources of data that have been collected over the longitudinal assessment including direct evaluation of the subject, interviewing of informants, and information from other professionals such as ward nurses and other personnel having contact with the subject. Typically, validity data such as that suggested by Spitzer are rarely available for structured interviews.

4.05.1.2 Overview

Within this chapter we provide an overview of the major structured interviews available for use with adult and child populations. The interviews included in this chapter are listed in Table 2. Due to space limitations we have focused on the review of broad diagnostic instruments and have not reviewed more narrow or specialized instruments that may address a single diagnosis or category of diagnoses. Each instrument will be reviewed with regard to its history and development and description of the instrument including its format and the diagnoses covered. Reliability data available will be presented and reviewed. Finally, a summary will be provided that intends to highlight the advantages and disadvantages inherent in each instrument. Interviews reviewed will address adult disorders, including personality disorders, and interviews for children and adolescents.

4.05.2 ADULT DISORDERS

4.05.2.1 Schedule for Affective Disorders and Schizophrenia

In order to address sources of diagnostic error arising from criterion variance, Spitzer, Endicott, and Robins (1978) developed the RDC. The RDC contains specific inclusion and exclusion criteria for 25 major diagnostic categories as well as subtypes within some categories. Disorders covered include schizophrenia spectrum disorders, mood disorders (depression and bipolar disorders), anxiety disorders (panic, obsessive-compulsive disorder, phobic disorder, and generalized anxiety disorder), alcohol and drug use disorders, some personality disorders (cyclothymic, labile, antisocial), and two broad categories of unspecified functional psychosis and other psychiatric disorder. The major source of data for determining RDC diagnoses is the Schedule for Affective Disorders and Schizophrenia (SADS; Endicott & Spitzer, 1978).

As originally developed there were three versions of the SADS: the regular version (SADS), the lifetime version (SADS-L), and a change version (SADS-C). A lifetime anxiety version of the SADS (SADS-LA; Fyer et al., 1989; Mannuzza et al., 1989) has been developed to assess RDC, DSM-III, and DSM-III-R criteria for almost all anxiety disorder diagnoses, in addition to all the diagnoses covered in the original SADS. The SADS has two parts: Part 1 provides a detailed description of current condition and functioning during the one week preceding the interview; Part 2 assesses past psychiatric disturbance. The SADS-L is similar to Part 2 of the SADS but the SADS-L focuses on both past and current disturbance. It is appropriate for use in populations where there is likely no current episode or when detailed information regarding the current condition is not required. Endicott and Spitzer (1978) estimate that the SADS can be completed in one and one-half to two hours depending on the disturbance of the individual being interviewed.

The SADS provides probe questions for each symptom rating. However, in addition to using the interview guide the rater is instructed to use all sources of information and to use as many supplemental questions as are required to make ratings. Part 1 of the SADS rates severity of symptoms when they were at their most extreme. Many items are rated for severity during the week prior to the interview and for severity at their most extreme during the current episode. Ratings are made on a seven-point scale from zero (no information) to six (e.g., extreme). The SADS provides defined

Table 2 Structured interviews included in review.

Adult
Schedule for Affective Disorders and Schizophrenia
Present State Examination
Structured Clinical Interview for DSM-IV/Axis I Disorders
Comprehensive Assessment of Symptoms and History
Diagnostic Interview for Genetic Studies
Diagnostic Interview Schedule
Composite International Diagnostic Interview

Personality disorders
Structured Interview for DSM-IV Personality
International Personality Disorder Examination
Structured Clinical Interview for DSM-IV Personality Disorders
Personality Disorder Interview-IV

Child and adolescent
Schedule for Affective Disorders and Schizophrenia for School Age Children
Child Assessment Schedule
Child and Adolescent Psychiatric Assessment
Diagnostic Interview Schedule for Children
Diagnostic Interview for Children and Adolescents

levels of severity for each item. For example, in the screening items for manic criteria, ratings for the item "less need for sleep than usual to feel rested" are as follows: 1 = no change or more sleep needed; 2 = up to 1 hour less than usual; 3 = up to 2 hours less than usual; 4 = up to 3 hours less than usual; 5 = up to 4 hours less than usual; 6 = 4 or more hours less than usual.

In addition to the item ratings and the assignment of RDC diagnoses, the SADS can be used to provide eight summary scales: Depressive Mood and Ideation, Endogenous Features, Depressive-Associated Features, Suicidal Ideation and Behavior, Anxiety, Manic Syndrome, Delusions-Hallucinations, and Formal Thought Disorder. These scales were determined by factor-analytic work using similar content scales, and an evaluation of clinical distinctions that are made in research of affective and schizophrenic disorders. The intent of the scales is to provide a meaningful summary of SADS information.

The SADS is intended for use by individuals with experience in clinical interviewing and diagnostic evaluation. Since clinical judgments are required in determining the need for supplemental questioning and in determining ratings, Endicott and Spitzer (1978) suggest that administration of the interview be limited to psychiatrists, clinical psychologists, or psychiatric social workers. However, these authors do note that interviewers with different backgrounds may be used but will require additional training. In one study using highly trained raters (Andreasen et al., 1981), reliability in rating videotaped SADS interviews was not affected by level of education (from medical degrees and doctorates in psychology to master's and other degrees) or years of prior clinical experience (from less than four years to more than 10 years).

Training videotapes and training seminars are available from the developers of the SADS at the New York State Psychiatric Institute, Columbia University. One example of training in the SADS is provided by Mannuzza et al. (1989) in the use of the SADS-LA. Training was conducted in three phases consisting of 50–60 hours over three to four months. In Phase 1 raters spent 20 hours attending lectures covering diagnosis, systems of classification, interviewing technique and the SADS-LA rater manual, reviewed RDC and DSM-III vignettes, and rated videotapes of interviews. In Phase 2 each rater administered the SADS-LA to one patient while an expert trainer and other raters observed. Interviews were subsequently reviewed and discussed. In the final phase, raters independently interviewed three or four patients who had already received the SADS-LA from an expert rater. This final phase allowed for test–retest reliability to be examined and provided an opportunity to discuss discrepancies in ratings during a consensus meeting.

4.05.2.2 Reliability

Initial reliability data for the SADS were reported by Spitzer, Endicott, and Robins

(1978) and Endicott and Spitzer (1978) for both joint interviews (N = 150) and independent test–retest interviews, separated by no more than one week (N = 60). For joint interviews, present and lifetime RDC diagnoses obtained kappas greater than 0.80 (median κ = 0.91), with the exception of minor depressive disorder (κ = 0.68). For test–retest interviews, reliability was somewhat attenuated but remained high with kappas greater than 0.55 for all disorders (median κ = 0.73) with the exception of bipolar I (0.40). Endicott and Spitzer (1978) also reported reliability of the SADS items and eight summary scales using these same samples. For the 120 items of the current section of the SADS, reliability was high for both joint interviews (90% of items with intraclass correlation coefficients [ICCs] equal to or greater than 0.60) and test–retest interviews (82% of items with ICCs greater than or equal to 0.60). Summary scales also yielded high reliability for joint (ICC range = 0.82–0.99, median = 0.96) and test–retest interviews (ICC range = 0.49–0.91, median = 0.83). Spitzer et al. (1978) also examined the reliability of the SADS-L with first-degree relatives of patient probands (N = 49). All kappas were 0.62 or higher with the exception of other psychiatric disorder (0.46), median kappa = 0.86.

Two subsequent studies examined test–retest reliability (separate interviews conducted on the same day) of the SADS (Andreasen et al., 1981; Keller et al., 1981). In a study of 50 patients using the SADS-L, Andreasen et al. (1981) found ICCs equal to or greater than 0.62 for the major RDC diagnoses of bipolar I and II, major depressive disorder, alcoholism, and never mentally ill. The RDC subtypes of major depression also achieved ICCs equal to or greater than 0.60 with the exception of the subtype of incapacitating. Keller et al. (1981), using the SADS-L in a sample of 25 patients, obtained kappas equal to or greater than 0.60 for the RDC diagnoses of schizophrenia, schizoaffective-depressed, manic, major depressive disorder, and alcoholic. The major diagnoses of schizoaffective-manic and hypomanic had low reliability with kappas of 0.47 and 0.26, respectively. Keller et al. (1981) also found high reliability for background information items on social and educational background, and history of hospitalization (kappas greater than 0.73). Finally, individual items from manic, major depressive disorder, psychosis, alcohol and drug abuse, suicidal behavior, and social functioning all achieved kappas above 0.56.

McDonald-Scott and Endicott (1984) evaluated the influence of rater familiarity on diagnostic agreement using the SADS. In this study, modified SADS-C ratings were compared for two raters: one with extensive familiarity with the subject's psychiatric history, course of illness, and prior history and one who was blind to this history and had no prior contact with the subject. Quasi-joint interviews were conducted with the two raters. The nonblind rater was allowed to ask additional questions following the joint SADS interview, in the absence of the blind rater. All four SADS-C summary scale scores achieved ICCs of 0.79 or greater. At the item level, 92% of the 52 items had ICCs of 0.60 or greater. Rater differences in scoring suggested that the blind rater may have been somewhat more sensitive to items relating to dysphoria while the nonblind rater was more likely to identify some symptoms that may have been minimized or missed in the blind rater's interview (e.g., mania). However, these discrepancies were subtle and suggest that the SADS can achieve accurate assessment of current cross-sectional functioning whether or not raters have familiarity with the patient.

The inter-rater reliability of the SADS-derived DSM-III diagnoses in adolescents has been examined by Strober, Green, and Carlson (1981). Joint interviews were conducted with 95 inpatient adolescents and a family member. Raters independently reviewed all available collateral information prior to the SADS interview (e.g., medical and psychiatric records, school records, current nurses' observations). All diagnoses achieved kappas of 0.63 or greater with the exception of anxiety disorders of childhood (0.47) and undiagnosed illness (0.47). Although encouraging, these data should be viewed in the context of the use of joint interviews and the extensive use of collateral information to supplement the SADS.

Mannuzza et al. (1989) examined the reliability of the SADS-LA in a sample of 104 patients with anxiety disorders. Independent interviews were conducted with test–retest periods ranging from the same day to 60 days. Collapsing across RDC, DSM-III, and DSM-III-R anxiety disorder diagnoses, agreement for lifetime disorders achieved kappas of 0.60 or greater, with the exception of simple phobia. Examining lifetime anxiety diagnoses separately for each diagnostic system again suggested adequate reliability for most disorders (κ range = 0.55–0.91), with the exception of RDC and DSM-III-R diagnoses of simple phobia and generalized anxiety disorder (κs less than 0.49). Using this same sample, Fyer et al. (1989) assessed item reliability and factors contributing to disagreements. In general, symptoms were reliably rated with the exception of stimulus-bound panic (typical of simple

phobia), near panic attacks, persistent generalized anxiety, and six social and nine nonsocial irrational fears. Review of narratives and consensus meeting forms by Fyer et al. (1989) suggested that the largest source of disagreement was variation in information provided by the subject (more than 60% of disagreements). Differences in rater interpretation of criteria resulted in 10–20% of the disagreements and rater error accounted for 10% of the disagreements.

The prior studies have examined test–retest reliability of lifetime diagnoses over brief periods. Two studies have examined the long-term test–retest reliability of SADS-L diagnoses. Bromet, Dunn, Connell, Dew, and Schulberg (1986) examined the 18-month test–retest reliability of the SADS-L in diagnosing lifetime major depression in a community sample of 391 women. Whenever possible, interviewers conducted assessments with the same subject at both interviews. Overall, reliability of lifetime diagnoses of RDC episodes of major depression was quite low. Of those women reporting an episode of major depression at either interview for the period preceding the first assessment, only 38% consistently reported these episodes at both interviews (62% reported a lifetime episode on one occasion but not the other). For those women meeting lifetime criteria for a depressive episode at the first interview, fully 52% failed to meet criteria at the time of the second interview.

In a large-scale study of 2226 first-degree relatives of probands participating in the National Institute of Mental Health (NIMH) Collaborative Program on the Psychobiology of Depression study, Rice, Rochberg, Endicott, Lavori, and Miller (1992) examined the stability of SADS-L-derived RDC diagnoses over a six-year period. The rater at the second interview was blind to the initial SADS-L. A large degree of variability in reliability was obtained for RDC diagnoses, with kappas ranging from 0.16 to 0.70. Diagnoses with kappas greater than 0.50 included major depression, mania, schizoaffective-mania, alcoholism, drug use disorder, and schizophrenia. Diagnoses with low reliability as reflected by kappas below 0.50 were hypomania, schizoaffective-depressed, cyclothymia, panic disorder, generalized anxiety disorder, phobic disorder, antisocial personality, and obsessive-compulsive disorder. Rice et al. (1992) suggested that diagnostic reliability increases with symptom severity. In the studies of Bromet et al. (1986) and Rice et al. (1992), results indicated that there may be substantial error in the temporal stability of some SADS-L-derived lifetime diagnoses. This error may be particularly problematic in nonclinical community samples as studied in these investigations.

4.05.2.2.1 Summary

The development of the SADS represented significant progress in clinical assessment. The SADS has been used extensively in a number of research studies and a wealth of reliability data are available. The SADS provides a broad assessment of symptoms as well as severity ratings for many symptoms. However, the range of disorders covered is somewhat narrow (with an emphasis on schizophrenia, mood disorders, and anxiety disorders in the SADS-LA). Additionally, diagnostic criteria are based on the RDC, with the exception of anxiety disorders covered in the SADS-LA, which provides DSM-III-R diagnoses.

4.05.2.3 Present State Examination

The Present State Examination (PSE) grew out of research projects in the UK requiring the standardization of clinical assessment. The PSE was not developed as a diagnostic instrument, as the SCID and SADS were. Rather, the PSE was intended to be descriptive and facilitate investigation of diagnostic rules and practices. At the time of the first publication of this instrument (Wing, Birley, Cooper, Graham, & Isaacs, 1967), the PSE was in its fifth edition. Currently, the ninth edition is widely used (Wing, Cooper, & Sartorius, 1974) and the tenth edition of the PSE is available (Wing et al., 1990). The PSE has been translated into over 40 languages and has been used in two large-scale international studies: the US–UK Diagnostic Project (Cooper et al., 1972) and the International Pilot Study of Schizophrenia (IPSS; World Health Organization, 1973).

The standardization of the PSE is achieved through the provision of a glossary of definitions for the phenomena covered by the interview. Additionally, specific series of questions with optional probes and cut-off points are also provided. Detailed instructions for rating the presence and severity of symptoms are also available. Despite this standardization, the developers have emphasized that it remains a clinical interview. The examiner determines the rating provided, evaluates the need for additional probe questions, and uses a process of cross-examination to address inadequate or inconsistent responses. As the name implies, the PSE was developed to ascertain present symptomatology and focuses on functioning in the month prior to the interview.

The eighth edition of the PSE was comprised of 500 items which were then reduced to 140 symptom scores. The ninth edition of the PSE reduced the number of items by having the 140 symptoms rated directly (the presence or

absence of a symptom can be determined without asking as many questions, although additional probe questions are maintained in the ninth edition). Items receive one of three ratings. A zero indicates that a symptom is not present. If present, a symptom is rated as either one (moderate) or two (severe). Items are grouped into symptom scores based on item content and infrequency (Wing et al., 1974). The eighth edition takes approximately one hour to complete while the ninth edition takes approximately 45 minutes (Wing et al., 1974).

Symptoms can be further reduced to 38 syndrome scores by grouping together symptoms of similar content. For example, in the ninth edition the symptoms of worrying, tiredness, nervous tension, neglect through brooding, and delayed sleep are combined into the syndrome score of "Worrying, etc." These syndrome scores were intended to aid in the process of diagnosis by reducing the information to be considered, provide descriptive profiles, and provide a brief method of summarizing clinical information from other, non-interview, sources such as medical records by using a syndrome checklist.

Following the rating of items, a computer program (CATEGO) can be used to summarize PSE ratings further. For the ninth edition, the CATEGO program provides syndrome scores along with summary data for each syndrome (e.g., scores on constituent items). In the next stage, the program further summarizes the syndrome scores into six descriptive categories. The certainty of each descriptive category is also indicated (three levels of certainty are provided). Finally, a single CATEGO class (of 50) is assigned. Importantly, Wing (1983) has emphasized that the PSE and CATEGO program were not developed as diagnostic instruments per se. The CATEGO category or class assignments should not be considered diagnoses in the sense of DSM or ICD nosology. Rather, these summaries are provided for descriptive purposes. However, data from the US–UK Diagnostic Project and the IPSS have indicated reasonable convergence between CATEGO classes and clinical project diagnoses, especially when clinical history information is added to the PSE (reviewed in Wing et al., 1974).

Although short training courses lasting one week are available at various centers including the Institute of Psychiatry in London, Wing (1983) suggests that more extensive training is necessary. Wing (1983) recommends that at least 20 interviews be conducted under supervision in order to determine competency in administration of the PSE. Luria and Berry (1980) describe the stages of training used by these authors to achieve reliable PSE administration. In this study a general introduction and experience with unstructured symptom assessment was followed by reading and discussion of the PSE manual, the rating and discussion of 13 videotaped PSE interviews, and finally, participation in and observation of 12 live student-conducted PSE interviews followed by discussion.

4.05.2.3.1 Reliability

Early evaluations of the reliability of the PSE indicated promising agreement between raters. In the first reliability study conducted on early versions of the PSE (up to PSE-5), rater agreement was evaluated with both independent interviews and observation of interviews, or listening to audiotapes (Wing et al., 1967). Assignment to main categories suggested reasonable agreement, using percent agreement, of 83.7%. Examining five nonpsychotic symptoms, agreement also seemed satisfactory (range across studies r = 0.53–0.97). Reliability for nine psychotic symptoms, calculated for single interviews (tape recorded or observed), was also adequate (range of r = 0.62–0.97). Kendell, Everitt, Cooper, Sartorius, and David (1968) found a mean kappa for all items to be 0.77.

Luria conducted two reliability studies using the PSE-8 (Luria & McHugh, 1974; Luria & Berry, 1979). Luria and McHugh (1974) examined agreement using six videotaped PSE interviews. The authors examined agreement for 19 profiles of their own design. Patients were ranked on each category based on ratings of examiners. Reliability for these categories was generally adequate with Kendall's W coefficients greater than 0.73 except for behavioral items such as psychomotor retardation (0.66); excitement, agitation (0.47); catatonic, bizarre behavior (0.44); and blunted, inappropriate, incongruous affect (0.60). In a subsequent study, Luria and Berry (1979) examined agreement on 20 symptoms deemed of diagnostic importance, 19 psychopathology profiles, and eight syndromes. Thirteen interviews were rated for reliability on the basis of videotapes; 12 were rated based on joint observational interviews. Reliability for videotape and live symptom ratings was adequate with median ICCs of 0.84 and 0.86, respectively (however, agitation or retardation and bizarre behaviors were judged to have poor reliability). Of the 19 profile ratings, 13 had adequate reliability for videotaped (0.97) and live interviews (0.95). The six behavioral profiles were somewhat lower at 0.72 and 0.66, respectively. Syndrome agreement was high with generalized kappas above 0.91.

Three studies have examined inter-rater agreement for abbreviated versions of the PSE-8 and PSE-9 when used by nonpsychiatric raters (Cooper, Copeland, Brown, Harris, & Gourlay, 1977; Wing, Nixon, Mann, & Leff, 1977; Rodgers & Mann, 1986). Cooper et al. (1977) examined the agreement between ratings of a psychiatrist or psychologist and those obtained by a sociologist or sociology graduate student. Agreement was evaluated for both joint interviews and test–retest over one week. For joint interviews, with the exception of situational anxiety (r = 0.34), inter-rater agreement for the remaining 13 section scores was good, with correlations ranging from 0.65 to 0.96 (mean r = 0.77). Test–retest reliability was lower with five section scores having correlations below 0.40 and the mean for the 14 sections decreasing to 0.49. The correlation between total scores was 0.88 for inter-rater agreement and 0.71 for test–retest. Finally, presence vs. absence decisions for the 150 rated items indicated good reliability with a mean inter-rater kappa of 0.74, and a mean test–retest kappa of 0.54.

Wing et al. (1977) conducted two studies of a brief version of the PSE-9. In the first, 95 patients were interviewed independently (5–84 days between interviews) by a nonmedical interviewer and a psychiatrist. Agreement was examined for 13 syndromes and was unacceptably low with a mean kappa of 0.34 (range 0–0.49). The authors also examined agreement on five symptoms relating to anxiety and depression. Poor agreement was found for these symptoms with kappas below 0.32. In the second study, 28 interviews were conducted by a nonmedical interviewer. Audiotapes of these interviews were rated by a psychiatrist. The mean kappa for syndrome scores was 0.52 (range = 0.25–0.85). Ratings of the five symptoms yielded kappas above 0.62 with the exception of free-floating anxiety (K = 0.34).

In a large population study, Rodgers and Mann (1986) assessed inter-rater agreement between nurses and a psychiatrist's rating of audiotapes. Audiotapes of 526 abbreviated PSE-9 interviews were evaluated. An index of association statistic was used, although the authors report that this measure was highly correlated with kappa. Of 44 symptoms rated, six were considered too infrequent to evaluate. Of the remaining 38 symptoms the median index of association was 0.73 (range 0–0.96); seven items (18%) were unacceptably low in level of agreement (index of association less than 0.45): Expansive Mood, Ideomotor Pressure, Obsessional Checking/Repeating, Obsessional Ideas/Rumination, Hypochondriasis, Suicidal Plans or Acts, and Ideas of Reference. Agreement for thirteen syndrome scores derived from symptom ratings ranged from 0.29 to 0.94 (median = 0.76). Two syndrome scores were unacceptably low in agreement, Ideas of Reference (0.44) and Hypochondriasis (0.29).

In a recent study Okasha, Sadek, Al-Haddad and Abdel-Mawgoud (1993) examined rater agreement for assigning diagnosis based on ICD-9, ICD-10, and DSM-III-R criteria. The Arabic version of the PSE-9 was modified to collect extra data needed to make ICD and DSM-III-R diagnoses. One hundred adult inpatients and outpatients were interviewed by a single rater. An abstract form with PSE scores and other demographic and clinical information was then rated and diagnoses assigned. Overall kappa for nine broad diagnostic categories was acceptable (ICD-9, K = 0.79; ICD-10, K = 0.82; DSM-III-R, K = 0.64). Overall kappa values for 18 more specific diagnoses diminished somewhat but remained adequate (ICD-9, 0.62; ICD-10, 0.80; DSM-III-R, 0.63). Although this study indicates that PSE-9-derived information can be used to assign ICD and DSM diagnoses reliably, it does not address the reliability of PSE-9 interviews themselves, as diagnostic ratings were made from a single PSE abstract.

4.05.2.3.2 Supplements to the PSE

Two supplements to the PSE have been developed to address limitations in this instrument. These supplements address the assessment of lifetime psychopathology (McGuffin, Katz, & Aldrich, 1986) and change ratings (Tress, Bellenis, Brownlow, Livingston, & Leff, 1987). Because of the PSE's focus on the last month, its use in epidemiological studies is somewhat limited as these population-based investigations generally require the assessment of lifetime psychopathology. This concern led McGuffin et al. (1986) to modify the PSE. A Past History Schedule was developed to determine the dates of onset of worst episode of psychopathology, first psychiatric contact, and severest disturbance and recovery. Based on information obtained with the Past History Schedule, the PSE is then administered in three time formats: focusing on the last month, the most serious past episode, and modifying each PSE obligatory question with "have you ever experienced this?" Reliability assessment of this modified PSE using audiotaped interviews (McGuffin et al., 1986) has suggested adequate inter-rater agreement for the PSE CATEGO classes for past month (kappa range = 0.48–0.74), first episode (kappa range = 0.87–1), and ever (kappa range = 0.88–0.92). Rater agreement for dating past episodes was also found to be satisfactory
(rank-order correlation coefficients, median = 0.83, range 0.54–0.99).

Tress et al. (1987) modified the PSE for purposes of obtaining change ratings. The authors suggest that the advantages of the PSE over other instruments available for ratings of clinical change are that the PSE gives data for clinical classification, provides clear definitions of items, and uses a structured interview format. The PSE Change Rating is administered following a standard PSE assessment. Items not rated positively on the initial assessment are discarded (as well as items that were not rated). Subsequent ratings are made only on the remaining items, which are rated on an eight-point scale from zero (Completely Remitted) to seven (Markedly Worsened). Inter-rater agreement based on observed interviews was high for grouped symptom ratings (ICC range = 0.75–0.99) and selected individual symptoms (ICC range = 0.70–1).

4.05.2.3.3 Summary

As the first semistructured clinical interview, the PSE has an extensive history with application in a number of studies. Additionally, the PSE has been translated into over 40 languages and has been employed in cross-cultural studies. A potential advantage of the PSE is that it is not constrained by a particular diagnostic system; however, the PSE-10 was designed to yield ICD-10 and DSM-III-R diagnoses (Wing et al., 1990). The reliability data for the PSE are constrained in that assessments have included a variety of versions and modifications of the PSE using raters with a variety of training with different populations. Caution should be exercised in applying these data to an investigator's own intended use. Furthermore, reliability data for the PSE-10, which has undergone substantial revision, are not yet available, although a multisite investigation has been conducted (Wing et al., 1990). Additionally, examination of diagnostic reliability achieved with the PSE, while encouraging, has been limited to a few diagnoses, and such data are not available for DSM-IV.

4.05.2.4 Structured Clinical Interview for DSM-IV/Axis I Disorders

The Structured Clinical Interview for DSM-IV (SCID-I) is a semistructured interview designed to assist in determining DSM-IV Axis I diagnoses (First, Gibbon, Spitzer, & Williams, 1996). Construction of the interview began in 1983 following the introduction of the DSM-III, which introduced operationalized, specific behavioral criteria. At this time existing clinical structured diagnostic interviews became limited in that they did not conform to the DSM-III criteria (e.g., the SADS and PSE). Although the Diagnostic Interview Schedule (DIS) was developed to yield DSM-III diagnoses, the DIS was designed to be used by lay interviewers in epidemiological studies. It was argued by Spitzer (1983) that the most valid diagnostic assessment required the skills of a clinician so that the interviewer could rephrase questions, ask further questions for clarification, challenge inconsistencies, and use clinical judgment in ultimately assigning a diagnosis. Thus, the SCID was initially developed as a structured, yet flexible, clinical interview for DSM-III, and subsequently DSM-III-R, diagnoses (Spitzer, Williams, Gibbon, & First, 1992).

The SCID-I has been revised several times due to criteria changes and field trials. The interview was primarily developed for use with adults, but may be used with adolescents. It is contraindicated for those with less than an eighth grade education, those with severe cognitive impairments, and those experiencing severe psychotic symptoms (First et al., 1996). The SCID-I is available in Spanish, German, Portuguese, Dutch, and Hebrew, as well as English.

Separate versions of the SCID-I have been developed for research and clinical applications. The clinical version, the SCID-I-CV, is briefer than the research version and focuses primarily on key diagnostic information (excluding the supplementary coverage provided in the research version) and on the most commonly occurring diagnoses (First et al., 1996). Within the research version, three variations of the interview provide differing comprehensive coverage of the disorders, subtypes, severity, course specifiers, and history. The research versions have been used historically for inclusion, exclusion, and data collection of study participants (in over 100 studies), and are distributed in loose page format to allow the investigator to customize the SCID-I to meet the needs of their research. The SCID-P (patient edition) was designed to address psychiatric patients and provides thorough coverage of psychotic disorders and past psychiatric history. The SCID-NP (nonpatient edition) was developed to focus on nonpsychiatric patients, and subsequently screens for psychotic disorders and provides less comprehensive coverage of psychiatric history. The SCID-P with Psychotic Screen was developed for patients where a psychotic disorder is not expected (and therefore only screens for psychotic disorders), but has thorough coverage of psychiatric history.

The SCID-I can usually be administered in 60 to 90 minutes, contingent on the quantity of
symptoms and disorders, and the ability of the interviewee to describe problems succinctly. It begins with an introductory overview followed by nine diagnostic modules. The overview provides open and closed questions that not only gather background information but also allow the interviewer to establish rapport with the interviewee before more detailed (and potentially uncomfortable) diagnostic questions are asked. The overview gathers information on demographics, work history, medical and psychiatric history, current stressors, substance use, and the interviewee's account of current and past problems (First et al., 1996).

There are nine diagnostic modules focusing on both current (usually defined as the past month) and lifetime assessment of diagnostic criteria: Mood Episodes, Psychotic Symptoms, Psychotic Disorders, Mood Disorders, Substance Use Disorders, Anxiety Disorders, Somatoform Disorders, Eating Disorders, and Adjustment Disorders. An optional module covers Acute Stress Disorder, Minor Depressive Disorder, Mixed Anxiety Depressive Disorder, and symptomatic details of past Major Depressive/Manic episodes. Each page of the modules contains questions, reprinted DSM-IV criteria, ratings, and instructions for continuation. Initial questions are closed-ended and followed up with open-ended elaboration questions. If further clarification is needed, the interviewer is encouraged to ask supplementary (their own) questions, give examples, present hypothetical situations, and challenge inconsistencies. In essence, the interviewer is testing diagnostic hypotheses. The ratings are based not on the question response, but on fulfillment of DSM-IV criteria, which are provided alongside the questions. The interviewer is encouraged to use alternate sources of information to assist in rating the criteria, such as observed behavior, medical records, and informants. Each criterion is rated as one of the following: ? = inadequate information, 1 = symptom clearly absent or criteria not met, 2 = subthreshold condition that almost meets criteria, and 3 = threshold for criteria met.

Unlike other diagnostic interviews such as the SADS, PSE, or DIS, where diagnostic algorithms are applied following the interview, the SCID incorporates diagnostic criteria and decision making within the interview. The use of a "decision-tree approach" allows the interviewer to test hypotheses and concentrate on more problematic areas (Spitzer, Williams, Gibbon, & First, 1992). In addition, this approach makes the interview more time efficient, allowing the interviewer to "pass over" areas of no concern. Following the interview, the interviewer is provided with concise summary scoring sheets to indicate the lifetime absence or threshold, and current presence of each disorder.

As a prerequisite for the SCID-I, the interviewer must possess adequate clinical experience and knowledge of psychopathology and diagnostic issues. The test developers recommend the following training: reading the administration sections of the User's guide for the SCID-I, reading the entire test booklet, reading the questions orally, practicing the SCID-I on a colleague/friend, watching a six-hour didactic training videotape titled SCID-I 201, role playing sample cases in the User's guide for the SCID-I, administering the SCID-I to an actual subject, conducting joint interviews (with independent ratings) followed by discussion sections, and examining inter-rater and test–retest reliability among interviewers (First et al., 1996). The following training materials and services are available: User's guide for the SCID-I, SCID-I 201 videotape, videotape samples of interviews, on-site training, off-site monitoring, and SCID-I certification (under development). Following training, interviewers would benefit from ongoing supervision and feedback from an experienced SCID-I interviewer.

4.05.2.4.1 Reliability

Inter-rater agreement for the DSM-III-R version of the SCID-I was examined for 592 patients in five inpatient sites (one in Germany) and two nonpatient sites (Williams et al., 1992). At each site, two clinicians independently interviewed and diagnosed patients at least 24 hours but less than two weeks apart. In order to limit access to other information (e.g., chart review), interviewers were provided with only a brief summary of the hospital admission evaluation (circumstances of admission, number of prior hospitalizations, presenting problems). Diagnostic terms were excluded from the summary. For patients, overall weighted kappa was 0.61 for 18 current and 0.68 for 15 lifetime DSM-III-R diagnoses common to these sites. Disorders with poor agreement (i.e., Ks below 0.50) were current diagnoses of dysthymia, agoraphobia without panic disorder, and social phobia, and the lifetime diagnosis of agoraphobia without panic disorder. Agreement for specific substance dependence diagnoses at a drug and alcohol treatment facility was high with all diagnoses having kappas above 0.61 except cannabis dependence and polydrug dependence (both kappas below 0.35). For nonpatients, overall weighted kappa was 0.37 for five current diagnoses and 0.51 for seven lifetime diagnoses common to these sites. The only diagnoses in nonpatients with a kappa of 0.50 or greater were current panic disorder,
and lifetime diagnoses of alcohol dependence/abuse, other drug dependence/abuse, and panic disorder. Due to low occurrences, data were inconclusive for infrequent diagnoses.

Although generally satisfactory, these findings do indicate low agreement for some diagnoses. Williams et al. (1992) suggest several possible causes for low rater agreement in this study including the restriction of noninterview information, the focus on a broad range of diagnoses, and the flexible nature of the SCID in using clinical judgment. With regard to this last point, a review of a sample of audiotapes indicated that diagnostic disagreements were largely due to one interviewer's acceptance of a yes response without requesting elaboration while the other interviewer asked follow-up questions that ultimately determined that an initial yes response did not meet the diagnostic criterion. As concluded by Williams et al. (1992), maximizing reliability on the SCID requires extensive training in the diagnostic criteria and an emphasis on not taking shortcuts but requiring that descriptions of behavior are elicited to justify each criterion.

Several other studies offer data on the reliability of the SCID-I, but the findings are confounded by small numbers of participants, changing DSM criteria and SCID-I revisions during the study, low base rates of disorders, and limited range of disorders (Segal, Hersen, & Van Hasselt, 1994). However, higher inter-rater agreement was observed in these studies (K = 0.70–1) compared to that obtained by Williams et al. (1992). The differences may have been due to the use of joint interviews (which controls for subject report variance) rather than independent interviews, access to noninterview information such as medical records and reports from other clinical staff, and the focus on a narrower range of diagnoses assessed.

4.05.2.4.2 Summary

The SCID is a well-established structured interview for determining DSM-III-R and DSM-IV diagnoses. Users may find the inclusion of diagnostic algorithms within the SCID and the use of skip-outs to result in a time-efficient interview. Reliability data from multiple sites indicate that the SCID can provide reliable DSM-IV diagnoses. Additionally, the SCID has some of the most extensive training materials and support available for any structured interview. The interview, user's guide, and all training materials have been completely updated for DSM-IV.

There are, however, a few disadvantages of the SCID-I. The interview does not cover a number of disorders, including infant, childhood, adolescence, cognitive, factitious, sexual, sleep, and impulse control disorders. Also, for those individuals interested in other diagnostic nosologies or needing to obtain broader clinical assessments, the restriction of the SCID to DSM-IV might be limiting. As with other structured interviews, there is as yet no information available on the reliability of the SCID-I for the DSM-IV criteria. However, minor changes in the diagnostic criteria should not adversely affect reliability obtained with the DSM-III-R version.

4.05.2.5 Comprehensive Assessment of Symptoms and History

The Comprehensive Assessment of Symptoms and History (CASH; Andreasen, 1987) was developed without adherence to existing diagnostic systems (such as the DSM or ICD). The CASH adopted this approach based on observations that diagnostic criteria change over time and that methods of collecting information that conform to these criteria may be quickly outdated (Andreasen, Flaum, & Arndt, 1992). The CASH was designed for the study of psychosis and affective syndromes and is intended to provide a standardized assessment of psychopathology that will, ideally, yield diagnoses based on multiple criteria (both existing and future).

The CASH consists of nearly 1000 items divided into three sections: present state, past history, and lifetime history. The present state section consists of sociodemographic information intended to establish rapport and, subsequently, items pertaining to present illness. This section includes symptoms relating to the psychotic syndrome, manic syndrome, major depressive syndrome, treatment, cognitive assessment (laterality and a modified Mini-Mental Status Examination), a Global Assessment scale, and a summary of diagnoses for current episode.

Past history includes history of onset and hospitalization, past symptoms of psychosis, characterization of course, and past symptoms of affective disorder. To provide a detailed evaluation of phenomenology over time, for each symptom or sign, interviewers determine whether it was present during the first two years of illness, and whether it has been present for much of the time since onset. Finally, the lifetime history section includes history of somatic therapy, alcohol and drug use, premorbid adjustment, personality (schizotypal and affective characteristics), functioning in the past five years, Global Assessment scale and diagnoses for lifetime. Most items are given detailed definitions with suggested interview
probes. Items are typically rated on a six-point Likert-type scale.

A number of measures are embedded within the CASH. Scales within the CASH include the Scale for Assessment of Negative Symptoms (Andreasen, 1983), Scale for Assessment of Positive Symptoms (Andreasen, 1984), most items for the Hamilton depression scale (Hamilton, 1960) and the Brief Psychiatric Rating Scale (Overall & Gorham, 1962), the Mini-Mental Status Exam (Folstein, Folstein, & McHugh, 1975), and the Global Assessment Scale (Endicott, Spitzer, Fleiss, & Cohen, 1976). These measures make the CASH useful for repeat assessments.

The CASH was intended for use by individuals with experience and training in working with psychiatric patients (e.g., psychologists, psychiatrists, nurses, or social workers). A training program has been developed for its use, which includes training videotapes conducted with patients presenting a range of psychopathology. Narratives and calibrated ratings for the CASH items are available from the authors.

4.05.2.5.1 Reliability

A small reliability study conducted with 30 patients has been reported (Andreasen et al., 1992). Two forms of rater agreement were evaluated. First, patients were interviewed by a primary rater with a second rater observing and asking clarifying questions when necessary. Second, test–retest reliability was evaluated with a third rater interviewing the patient within 24 hours of the initial interview. All raters had access to medical records. Agreement between the two initial raters was generally good for the spectrum diagnoses (Schizophrenia Spectrum, K = 0.86; Affective Spectrum, K = 1). For specific DSM-III-R diagnoses (focusing on diagnoses with more than one case), the results were positive for schizophrenia (K = 0.61), bipolar affective disorder (K = 1), and major depression (K = 0.65). However, reliability for schizoaffective disorder was somewhat low (K = 0.45). Test–retest reliability was similarly positive with kappas above 0.74 for spectrum diagnoses and above 0.64 for specific DSM-III-R diagnoses with the exception of schizoaffective disorder (K = 0.52).

Because of the intent of the CASH to provide a reliable assessment of symptoms and functioning independent of diagnostic classification, it is important to examine the reliability of the individual items. Given the number of items, Andreasen et al. (1992) provide summaries of the intraclass correlation coefficients for the inter-rater and test–retest administrations. For inter-rater agreement, ICCs were generally high with three-quarters of the items having ICCs greater than or equal to 0.65. For the test–retest design, reliability was somewhat lower with approximately one-half of the items demonstrating ICCs greater than or equal to 0.65.

Reliability data have been published for some more critical items or content areas (Andreasen et al., 1992). For history of illness, ICC values for both inter-rater and test–retest designs were quite adequate with values generally above 0.60 (median ICCs above 0.70). Reliability for items relating to manic and depressive syndromes was acceptable (median ICCs = 0.68 and 0.58, respectively). For positive and negative symptoms, inter-rater and test–retest reliability was generally acceptable for "current" and "much of time since onset" time frames (global symptom scores ICCs greater than 0.65). However, test–retest reliability for negative symptoms rated for the "first two years of illness" and "worst ever" was unacceptably low (ICCs = 0 and 0.48, respectively). Test–retest data on premorbid and prodromal symptoms were very low (median ICCs = 0.37 and 0.25, respectively), while residual symptom ratings were somewhat better (median ICC = 0.60).

4.05.2.5.2 Summary

The CASH presents several advantages including its lack of adherence to any diagnostic system. This may afford the opportunity to collect a rich body of information on individuals. The comprehensiveness of the items is intended to allow for diagnoses for DSM and ICD to be generated while not narrowing the collection of information to these systems. Available reliability data are encouraging, with some exceptions as noted above. The availability of training materials including videotapes and consensus ratings is also attractive. The CASH also has companion instruments that are useful in the context of longitudinal assessments, providing baseline and follow-up assessment of psychosocial functioning and symptomatology.

The CASH is limited in several respects. First, because it seeks a full assessment of symptoms and history without regard to diagnostic criteria, the entire CASH must be administered (however, some syndromes can be skipped if the interviewer already knows that a syndrome is not applicable). With nearly 1000 items this ensures lengthy assessment. Second, the CASH is limited to schizophrenia and affective syndromes and alcohol and drug use. Thus, it may not provide the breadth of symptom and diagnostic evaluation that some settings may require. Finally, although intended to be
capable of assigning diagnoses based on extant medical history screening questions. Somatiza-
nosological systems, the CASH may not always tion follows to enhance flow from medical
be capable of achieving this goal. Diagnostic history. An overview section assesses psychia-
criteria may require symptom information that tric history and course of illness and this
does not conform to the information obtained information is summarized graphically in a
with the CASH. Interested users should care- time line to provide chronology of symptoms
fully evaluate the content of the CASH to ensure and episodes of illness. Mood disorders include
that relevant diagnoses can be made. major depression, mania/hypomania, and cy-
clothymic personality disorder. The DIGS also
4.05.2.6 Diagnostic Interview for Genetic provides a detailed assessment of substance use
Studies history. Psychotic symptoms are assessed in
great detail and psychotic syndromes are
The Diagnostic Interview for Genetic Studies distinguished. Additionally, schizotypy is also
(DIGS; Nurnberger et al., 1994) was developed assessed. A unique feature of the DIGS is an
by participants in the NIMH Genetics Initia- assessment for comorbid substance use. The
tive. The need for the DIGS arose from aim of this section is to determine the temporal
perceptions that inconsistent findings in the relationship between affective disorder, psycho-
genetics of psychiatric illnesses might be, in part, sis, and substance use. Suicidal behaviors,
the result of differences in phenotypic assess- major anxiety disorders (except generalized
ment. Problems in assessment are highlighted in anxiety disorder), eating disorders, and socio-
genetics research where family members are pathy are also evaluated. Finally, at the
more likely to evince spectrum disorders and conclusion of the DIGS, the interviewer
subclinical symptomatology. completes a Global Assessment scale (Endicott
The DIGS adopts a polydiagnostic approach et al., 1976), the Scale for the Assessment of
that collects clinical information in sufficient Negative Symptoms (Andreasen, 1983), and the
detail to allow a variety of diagnostic definitions Scale for the Assessment of Postive Symptoms
to be applied including DSM-III-R (a new (Andreasen, 1984).
version for DSM-IV is now available), modified RDC, RDC, Feighner criteria, ICD-10, and the European Operational Criteria (OPCRIT) Checklist (McGuffin & Farmer, 1990). As with the CASH, the advantage of this feature is the collection of a broad data set for diagnostic entities whose definitions are sometimes ambiguous and evolving. However, the DIGS (unlike the CASH) explicitly collects information that conforms to several diagnostic systems including DSM-IV. Items from other interviews have been incorporated into the DIGS including the SADS, CASH, and DIS. Like the SADS, the DIGS provides standard probe questions and criterion-based definitions. Additionally, the DIGS requires clinical judgment for item ratings and in determining the need for further probe questions. Sections of the DIGS begin with one or two screening questions that, if denied, allow the interviewer to skip out of the remainder of the section. Questions are integrated so as to cover the various diagnostic criteria while maintaining an efficient flow of questioning. The interview can take 30 minutes to four hours depending on the complexity of the symptomatology (median time for symptomatic individuals is two and one-half hours).

The DIGS begins with a modified Mini-Mental Status examination in order to determine if the interview should be terminated as a result of cognitive impairment. The Introduction continues with demographics and extensive

Appropriate personnel to administer the DIGS are mental health professionals with clinical experience and familiarity with multiple diagnostic systems. In the study of Nurnberger et al. (1994), all but one interviewer had prior experience administering semistructured clinical interviews. Training as outlined in the reliability studies (Nurnberger et al., 1994) consisted of demonstration interviews by senior clinicians, the administration of a DIGS under supervision, and supplemental training involving three videotaped patient interviews.

4.05.2.6.1 Reliability

Test-retest reliability for the DIGS has been evaluated for major depression, bipolar disorder, schizophrenia, schizoaffective disorder, and an "other" category (Faraone et al., 1996; Nurnberger et al., 1994). Test-retest reliability was evaluated within participating research sites as well as across sites. For the intrasite study, participants were independently interviewed with the DIGS over a period of no more than three weeks. For the intersite study, interviewers traveled to other research centers so that interviewers from different sites could assess the same subjects. These pairs of interviews were conducted within a 10-day period. With the exception of DSM-III-R schizoaffective disorder (kappa less than 0.50), DSM-III-R and RDC target diagnoses showed excellent reliability with kappas above 0.72 across the two studies.
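The kappa values reported in these reliability studies express agreement between two independent interviews after correcting for the agreement expected by chance. The following is a minimal sketch of that calculation, assuming Cohen's unweighted kappa for two interviews that each assign a present/absent diagnosis; the cited studies may have used other variants (for example, weighted kappa), and the function name and the ten-subject data set are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(ratings_a)
    # Observed agreement: proportion of subjects given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each rater's marginal label counts.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data (not from any cited study): 1 = diagnosis present, 0 = absent.
interview_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
interview_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(interview_1, interview_2)
print(round(kappa, 2))  # prints 0.58
```

In this illustration the two interviews agree on 8 of 10 subjects (80%), but because 52% agreement would be expected by chance given the marginal rates, kappa is only 0.58. This is why a study can report high raw concordance alongside modest kappas.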

4.05.2.6.2 Summary

The DIGS appears to be an excellent instrument for the study of the genetics of schizophrenia and affective disorders and other comorbid conditions. It provides an exhaustive assessment of symptomatology that allows for the comparison of findings across a number of diagnostic systems. Furthermore, it targets spectrum disorders and other comorbid conditions that may be relevant in family studies of schizophrenia and affective disorders. Although reliability has been shown to be high across different sites, these data are limited to schizophrenia and the affective disorders (but are not available for bipolar II and schizotypal personality); data are not available for other disorders such as the anxiety and eating disorders. As emphasized by the developers, the DIGS is not designed for routine clinical use. It is designed to be used by highly trained clinical interviewers in research settings.

4.05.2.7 Diagnostic Interview Schedule

The DIS (Robins, Helzer, Croughan, & Ratcliff, 1981) is a highly structured interview developed for use in large-scale epidemiological studies by the NIMH (the Epidemiological Catchment Area, ECA, projects). Because of the logistical constraints in general population studies, the use of traditional structured interviews administered by clinicians is prohibitive. The DIS was developed for use by lay interviewers with relatively brief training (one to two weeks). Thus, unlike structured interviews such as the SADS or SCID, the DIS minimizes the amount of discretion that an interviewer exercises in either the wording of questions or in determining the use of probe questions. Additionally, diagnoses are not made by the interviewer; rather, the data are scored, and diagnoses assigned, by a computer program.

The DIS was designed to provide information that would allow diagnoses to be made according to three diagnostic systems: DSM-III (APA, 1980), the Feighner criteria (Feighner et al., 1972), and the RDC. The interview covers each item or criterion in the form of one or more close-ended questions. Questions assess the presence of symptoms, whether they meet criteria for frequency and clustering in time, and whether they meet the age-at-onset criterion. The use of a Probe Flow Chart provides probes needed to determine severity and address alternative explanations. Rules concerning when and what probe questions to use are explicit in the interview. Nearly all questions can be read by lay interviewers as written. The DIS can take between 45 and 75 minutes to complete.

4.05.2.7.1 Reliability

Robins et al. (1981) addressed the question of whether lay interviewers could obtain psychiatric diagnoses comparable to those obtained by a psychiatrist. In a test-retest design, subjects (mostly current or former psychiatric patients) were separately interviewed by a lay interviewer and a psychiatrist, both using the DIS. With the exception of DSM-III panic disorder (kappa = 0.40), kappas for all lifetime diagnoses across each diagnostic system were 0.50 or greater. Mean kappas for each diagnostic system were quite adequate: DSM-III, kappa = 0.69; Feighner, kappa = 0.70; RDC, kappa = 0.62. Further analysis of these data (Robins, Helzer, Ratcliff, & Seyfried, 1982) suggested that current disorders and severe disorders are more reliably diagnosed with the DIS than disorders in remission or borderline conditions.

Although the findings of Robins et al. (1981, 1982) suggested acceptable concordance between lay interviewers and psychiatrists using the DIS, these data do not address whether the DIS would yield diagnoses similar to those obtained by psychiatrists with a broader clinical assessment than that allowed by the DIS alone. Additionally, this first study was conducted with a largely psychiatric sample and may not be generalizable to the nonclinical populations the DIS was designed for. Anthony et al. (1985), using data from the ECA obtained in Eastern Baltimore, compared DIS-obtained diagnoses with psychiatrist-conducted clinical reappraisal examinations (N = 810). These clinical reappraisals were based on an augmented PSE consisting of 450 signs and symptoms and included all items of the PSE-9. Additionally, psychiatrists reviewed all available records. The two assessments were independent and the majority were completed within four weeks of each other. Diagnostic comparisons were for conditions present at the time of the interview or within one month prior to the interview. Results indicated substantial differences between DIS and psychiatrists' diagnoses. Except for schizophrenia and manic episode, one-month prevalence rates for DSM-III diagnostic categories were significantly different for the two methods. Additionally, there was very low concordance between DIS-based diagnoses and those obtained by psychiatrists, with kappas for selected diagnoses all below 0.36.

In a second major study based on ECA data collected in St. Louis, Helzer et al. (1985) compared lay-interview diagnoses with those obtained by a psychiatrist (N = 370). The psychiatrists re-examined subjects with the

DIS and were also allowed to ask whatever supplemental questions they deemed necessary to resolve diagnostic uncertainties following the DIS. The majority of psychiatrist interviews were conducted within three months of the lay interview. Diagnostic comparisons were made for lifetime diagnoses. Helzer et al.'s summary of the data was somewhat optimistic, indicating that corrected concordance was 0.60 or better for eight of the 11 lifetime diagnoses. However, when kappa statistics are examined, the results are less promising. Only one of the 11 diagnoses obtained a kappa greater than or equal to 0.60, and eight diagnoses had kappas below 0.50. As summarized by Shrout et al. (1987), "for most diagnoses studied, the agreement in community samples between the DIS and clinical diagnoses is poor" (p. 177).

A number of other studies have been conducted comparing lay interviewer-administered DIS diagnoses with clinical diagnoses (e.g., Erdman et al., 1987; Escobar, Randolph, Asamen, & Karno, 1986; Ford, Hillard, Giesler, Lassen, & Thomas, 1989; Spengler & Wittchen, 1988; Wittchen, Semler, & von Zerssen, 1985; also see recent review by Wittchen, 1994). These studies are difficult to summarize and their interpretability is sometimes limited due to the use of a variety of assessment methodologies, diagnostic systems, and populations. Although some diagnoses achieve acceptable concordance levels between the DIS and clinical diagnoses, in total the results of these studies suggest limitations of the DIS consistent with the observations of Shrout et al. (1987). Wittchen (1994) has summarized particular problems apparent in the panic, somatoform, and psychotic sections of the DIS.

4.05.2.7.2 Summary

The DIS marked a significant development in the epidemiological study of psychopathology. The ECA findings based on the DIS have yielded important information about the epidemiology of a variety of disorders. However, studies examining the concordance between the DIS and clinical interviews conducted by psychiatrists suggest that there may be appreciable diagnostic inaccuracy in DIS-based diagnoses. Although the use of the DIS in epidemiological studies may continue to be warranted given the logistical constraints of such studies and the important data the DIS does obtain, the concern with diagnostic reliability should preclude the use of the DIS in settings where other structured diagnostic interviews can be used (e.g., the SADS or SCID).

4.05.2.8 Composite International Diagnostic Interview

The Composite International Diagnostic Interview (CIDI; Robins et al., 1988; World Health Organization [WHO], 1990) was developed at the request of WHO and the United States Alcohol, Drug Abuse, and Mental Health Administration. The CIDI was designed to serve as a diagnostic instrument in cross-cultural epidemiological and comparative studies of psychopathology. The initial version of the CIDI was based on the DIS, to cover DSM-III diagnoses, and initially incorporated aspects of the PSE since the PSE has been used in cross-cultural studies and reflects European diagnostic traditions. Some items from the DIS were altered either to provide further information for the PSE or to address language and content that would allow cross-cultural use. Additional questions were added to provide adequate coverage of the PSE-9. These PSE items were rewritten into the closed-ended format of the DIS.

Initial versions of the CIDI provided DSM-III diagnoses and updated versions, used in Phase II WHO field trials, now provide DSM-III-R and ICD-10 diagnoses (WHO, 1990). The latest version of the CIDI has also eliminated all questions that are not needed for DSM-III-R (deletion of Feighner and RDC criteria) and has added items to assess ICD-10 criteria. Furthermore, the PSE approach was abandoned and only some questions from the PSE were retained because they were relevant to ICD-10 diagnoses. Revisions of the CIDI to meet DSM-IV criteria are in progress. A Substance Abuse Module was developed for the CIDI to be used alone or substituted for the less detailed coverage of drug abuse in the CIDI proper (Cottler, 1991). Other modules that have been developed or are under development include post-traumatic stress disorders, antisocial disorder, conduct disorder, pathological gambling, psychosexual dysfunctions, neurasthenia, and persistent pain disorder (Wittchen, 1994).

Like the DIS, the CIDI was intended to be used by lay interviewers with modest training (one week), and to be capable of rapid administration. In a multicenter study conducted in 15 countries the CIDI was judged appropriate by the majority of interviewers (Wittchen et al., 1991). However, 31% of interviewers rated parts of the CIDI as inappropriate, in particular sections for schizophrenia and depression. The CIDI also takes a long time to administer: one-third of the interviews took one to two hours and another third lasted two to three hours (Wittchen et al., 1991). In this study the duration of the

interviews may have been extended because of the assessment of predominantly clinical populations. One might expect briefer administration times with general population samples.

As with the DIS, training for the CIDI can be conducted in five days. No professional experience is necessary as the CIDI is intended to be used by lay interviewers. Training sites participating in the WHO field trials may be available to conduct courses. However, there is a CIDI user manual, a standardized training manual with item-by-item specifications, and a computer scoring program available from WHO (1990).

4.05.2.8.1 Reliability

In an evaluation of the "prefinal" version of the CIDI (DSM-III and PSE) across 18 clinical centers in 15 countries, Wittchen et al. (1991) found high inter-rater agreement. Kappas for diagnoses were all 0.80 or greater with the exception of somatization (0.67). Wittchen (1994) has summarized test-retest reliability of the CIDI across three studies involving independent interviews conducted within a period of three days. Kappa coefficients for DSM-III-R diagnoses were all above 0.60 with the exception of bipolar II (0.59), dysthymia (0.52), and simple phobia (0.59).

Two studies have also examined the concordance between the CIDI and clinical ratings. Farmer, Katz, McGuffin, and Bebbington (1987) evaluated the test-retest concordance between CIDI PSE-equivalent items obtained by a lay interviewer and PSE interviews conducted by a psychiatrist. Interviews were conducted no more than one week apart. Concordance at the item level was unacceptably low. Of 45 PSE items, 37 (82%) achieved kappas below 0.50. Agreement was somewhat higher at the syndrome level but remained low for the syndromes of behavior, speech, and other syndromes (Spearman r = 0.44) and specific neurotic syndromes (Spearman r = 0.35).

Janca, Robins, Cottler, and Early (1992) examined diagnostic concordance between a clinical interviewer using the CIDI and a psychiatrist in a small sample of patients and nonclinical subjects (N = 32). Psychiatrists asked free-form questions and completed an ICD-10 checklist following either the observation of a lay-administered CIDI interview, or following their own administration of the CIDI. Overall diagnostic agreement appeared adequate with an overall kappa of 0.77. High concordance was also found for the ICD-10 categories of anxiety/phobic disorders (kappa = 0.73), depressive disorders (kappa = 0.78), and psychoactive substance use disorders (kappa = 0.83).

4.05.2.8.2 Summary

The CIDI may be considered the next step beyond the DIS. This instrument incorporated lessons learned from the development of the DIS and has been subject to repeated revisions to enhance its reliability and cross-cultural application. The CIDI appears to have achieved somewhat better reliability than the DIS and it covers the latest diagnostic standards of ICD-10 and DSM-III-R (soon to cover DSM-IV). However, the concordance between CIDI-obtained diagnoses and diagnoses obtained by other structured interviews administered by clinicians (e.g., SADS, SCID) remains unclear.

4.05.3 PERSONALITY DISORDERS

4.05.3.1 Structured Interview for DSM-IV Personality Disorders

Introduced in 1983, the Structured Interview for DSM-III Personality Disorders (SIDP) was the first structured interview to diagnose the full range of personality disorders in DSM-III (Stangl, Pfohl, Zimmerman, Bowers, & Corenthal, 1985). Subsequent versions have addressed Axis II criteria for the DSM-III-R (SIDP-R; Pfohl, Blum, Zimmerman, & Stangl, 1989) and the DSM-IV (SIDP-IV; Pfohl, Blum, & Zimmerman, 1995).

The SIDP-IV is organized into 10 topical sections: Interests and Activities, Work Style, Close Relationships, Social Relationships, Emotions, Observational Criteria, Self-perception, Perception of Others, Stress and Anger, and Social Conformity. This format is intended to provide a more conversational flow and is thought to enhance the collection of information from related questions and facilitate the subsequent scoring of related criteria.

The SIDP-IV can be administered to a patient and to an informant, requiring 60-90 and 20-30 minutes, respectively (Pfohl et al., 1995). In addition, the interview takes 20-30 minutes to score. Each page of the interview provides questions, prompts, diagnostic criteria, and scoring anchors. The informant interview is composed of a subset of questions from the patient interview. Two alternate versions of the SIDP-IV are available. The SIDP-IV Modular Edition is organized by personality disorders rather than by topical sections. This modular form permits the interviewer to focus on disorders of interest and to omit others. The Super SIDP is an expanded version that includes all questions and criteria necessary to assess DSM-III-R, DSM-IV, and ICD-10 personality disorders.

Several instructions for administering the SIDP-IV are noteworthy. First, the SIDP uses a

"five year rule" to operationalize criteria involving an enduring pattern that represents personality. Thus, "behavior, cognitions, and feelings that have predominated for most of the last five years are considered to be representative of the individual's long-term personality functioning" (Pfohl et al., 1995, p. ii). The SIDP-IV is intended to follow assessment of episodic (Axis I) disorders in order to assist the interviewer in ruling out the influence of temporary states of behavior described by the patient. Second, the patient's responses are not given a final rating until after the interview. This is intended to allow for all sources of information to be reviewed before rating. However, unlike previous versions of the SIDP, the SIDP-IV now provides a rater with the opportunity to rate or refer to specific DSM-IV criteria associated with each set of questions. Third, use of an informant is optional and Pfohl et al. (1995) note that while the frequency of personality diagnoses may increase with the use of informants there appears to be little effect on predictive validity.

Each diagnostic criterion is scored as one of the following: 0 = not present or limited to rare isolated examples, 1 = subthreshold: some evidence of the trait but it is not sufficiently pervasive or severe to consider the criterion present, 2 = present: criterion is present for most of the last five years (i.e., present at least 50% of the time during the last five years), and 3 = strongly present: criterion is associated with subjective distress or some impairment in social or occupational functioning or intimate relationships. Unlike other personality interviews (e.g., the IPDE and SCID-II), scores of both 2 and 3 count towards meeting diagnostic criteria (Pilkonis et al., 1995).

The SIDP-IV is an interview requiring clinical skill in determining the need for additional probe questions and in discriminating between personality (Axis II) disorders and episodic (Axis I) disorders. The developers of the SIDP-IV recommend one month of intensive training to administer the interview properly (Pfohl et al., 1995; Standage, 1989). Pfohl et al. (1995) have reported success with interviewers having at least an undergraduate degree in the social sciences and six months of previous experience with psychiatric interviewing. Training videotapes and workshops are available from the developers of the SIDP-IV.

4.05.3.1.1 Reliability

Several investigations of inter-rater reliability reveal poor to good agreement. Using the SIDP-R, Pilkonis et al. (1995) found that inter-rater agreement for continuous scores on either the total SIDP-R score or scores from Clusters A, B, and C was satisfactory (ICCs ranging from 0.82 to 0.90). Inter-rater reliability for presence or absence of any personality disorder with the SIDP-R was moderate with a kappa of 0.53. Due to infrequent diagnoses, mixed diagnoses, and the number of subthreshold protocols, kappas for individual diagnoses were not provided in this study.

Stangl et al. (1985) conducted SIDP interviews on 63 patients (43 interviews were conducted jointly, and 20 interviews were separated by up to one week). The kappa for presence or absence of any personality disorder was 0.66. Only five personality disorders occurred with enough frequency to allow kappa to be calculated: dependent (0.90), borderline (0.85), histrionic (0.75), schizotypal (0.62), and avoidant (0.45). Using the SIDP among a small sample of inpatients, Jackson, Gazis, Rudd, & Edwards (1991) found inter-rater agreement to be adequate to poor for the five specific personality disorders assessed: borderline (kappa = 0.77), histrionic (0.70), schizotypal (0.67), paranoid (0.61), and dependent (0.42).

The impact of informant interviews on the diagnosis of personality disorders and inter-rater agreement for the SIDP was assessed by Zimmerman, Pfohl, Stangl, and Corenthal (1986). Inter-rater agreement (kappa) for the presence or absence of any personality disorder was 0.74 before the informant interview and 0.72 after the informant interview. Kappas for individual personality disorders were all 0.50 or above. Reliability did not appear to benefit or be compromised by the use of informants. However, the informant generally provided additional information on pathology and, following the informant interview, diagnoses that had been established with the subject only were changed in 20% of the cases (Zimmerman et al., 1986).

In an examination of the long-term test-retest reliability of the SIDP, Pfohl, Black, Noyes, Coryell, and Barrash (1990) administered the SIDP to a small sample of depressed inpatients during hospitalization and again 6-12 months later. Information from informants was used in addition to patient interviews. Of the six disorders diagnosed, three had unacceptably low kappas (below 0.50): passive-aggressive (0.16), schizotypal (0.22), and histrionic (0.46). Adequate test-retest reliability was obtained for borderline (0.58), paranoid (0.64), and antisocial (0.84).

4.05.3.1.2 Summary

The SIDP-IV represents the latest version of the first interview to assess the spectrum of

DSM personality disorders. Although originally developed to be administered in a topical format to assess DSM personality disorders, the SIDP-IV now provides alternative formats for assessing specific disorders without administering the entire SIDP-IV and for assessing ICD diagnoses. Reliability data are encouraging for some disorders. However, these data are limited to selected disorders using the SIDP and reliability data have not been presented for specific disorders using the SIDP-R (Pilkonis et al., 1995). No reliability data are available for the SIDP-IV. Few data are available concerning the long-term test-retest reliability of the SIDP. The SIDP-IV does not come with a screening questionnaire to assist in identifying personality disorders that might be a focus of the interview.

4.05.3.2 International Personality Disorder Examination

The International Personality Disorder Examination (IPDE; Loranger et al., 1995), a modified version of the Personality Disorder Examination (PDE), is a semistructured interview designed to assess personality disorders in both the DSM-IV and ICD-10 classification systems. The PDE was initially developed in the early 1980s to assist in the diagnosis of personality disorders. At that time, the only structured interviews that existed focused on Axis I mental disorders. A highly structured lay-administered interview for personality disorders was thought to be inappropriate due to the complexity of diagnostic criteria and level of inference required (Loranger et al., 1994). The first version of the PDE was completed in 1985. Beginning in 1985, international members of the psychiatric community attended several workshops to formulate an international version of the PDE, the IPDE, which was developed under the WHO and the US Alcohol, Drug Abuse, and Mental Health Administration System. The purpose of the IPDE was to assess personality disorders in different languages and cultures.

The IPDE interview surveys behavior and life experiences relevant to 157 criteria and can be used to determine DSM-IV and ICD-10 categorical diagnoses and dimensional scores of personality disorders. The IPDE is not recommended for use on individuals who are under the age of 18, agitated, psychotic, severely depressed, below normal intelligence, or severely cognitively impaired. The interview is available in the following languages: English, Spanish, French, German, Italian, Dutch, Danish, Norwegian, Estonian, Greek, Russian, Japanese, Hindi, Kannada, Swahili, and Tamil (Loranger et al., 1995).

The IPDE contains materials for determining both DSM-IV and ICD-10 diagnoses. However, due to the length of the interviews noted in the field trials (mean interview length was 2 hours, 20 minutes), the interview is distributed in two modules, one for each classification system. Furthermore, clinicians and researchers can easily administer specific personality modules to suit their purpose (Loranger et al., 1994).

A self-administered IPDE screening questionnaire may be administered prior to the interview in order to eliminate subjects who are unlikely to have any or a particular personality disorder. Similar to the SCID-II (described below), a correspondingly low diagnostic threshold (for endorsement) is set for each question. If three or more items are endorsed for a specific personality disorder, then the interview is administered for that personality disorder.

In the attempt to establish reliable diagnoses, the IPDE interview utilizes two distinct guidelines. First, the behavior or trait must be present for at least five years to be considered an expression of personality, with some manifestations (based on the disorder) occurring within the past 12 months (Loranger et al., 1995). This strict criterion is adopted to ensure the enduring nature of behavior and life experiences, and rule out transient or situational behaviors. A second guideline for the IPDE is that one criterion must be met before the age of 25. However, if an individual develops a personality disorder later in life (with no criterion exhibited prior to age 25) the IPDE provides an optional "late onset" diagnosis (Loranger et al., 1995).

The developers constructed the IPDE interview not only to be clearly organized, but to "flow" naturally. As a result, the diagnostic criteria are not ordered by cluster or disorder, but by six sections that assess major life domains: Work, Self, Interpersonal Relationships, Affects, Reality Testing, and Impulse Control. Each section begins with open-ended questions that provide a transition between sections and allow the interviewer to gather general background information. Closed-ended and elaboration questions follow for each criterion (Loranger et al., 1995). Each individual page of the IPDE is designed to optimally assist the interviewer in correctly determining if the diagnostic criterion is met. Each page contains: personality disorder and criterion number, structured questions, reprinted DSM-IV or ICD-10 criteria, notes on determining criteria, descriptions of scoring criteria, and scoring areas for both interview and informants.

The scoring of the IPDE interview is similar to other semistructured interviews. Prior to the interview, the developers recommend that collecting information or conducting interviews

on Axis I disorders be completed to assist in scoring the criteria. Each trait or behavior (i.e., criterion) is scored as one of the following: absent or normal (0), exaggerated or accentuated (1), criterion level or pathological (2), and interviewee is unable to furnish adequate information (?). Some items may not be applicable to all interviewees and are scored not applicable. The IPDE also allows the interviewer to rate each criterion based on informants (Loranger et al., 1995). The IPDE manual provides information on the scope and meaning of each criterion, and provides guidelines and anchor points for scoring. The manual does not recommend challenging the interviewee on inconsistencies with informants during the interview, due to the potential threat to rapport. However, challenging discrepancies occurring within the interview is encouraged. The IPDE may be hand scored or computer scored (program provided by publishers). Hand-scored algorithms and summary sheets are provided to assist in determining categorical diagnoses and dimensional scores.

The IPDE developers recommend that only those with the clinical experience and training to make psychiatric diagnoses independently use the IPDE (Loranger et al., 1994). The IPDE manual strongly discourages the use of the IPDE by clinicians early in their training, research assistants, nurses, medical students, and graduate students. In addition, the interviewer should have familiarity with the DSM-IV and ICD-10 classification systems (Loranger et al., 1995). The test manual recommends the following training: read the interview and manual thoroughly, practice on several participants to get familiar with the interview, administer with an IPDE-experienced interviewer, and discuss any problems in administration or scoring. Before administering the IPDE, the interviewer should have thorough knowledge of the scope and meaning of each criterion and scoring guidelines. IPDE training courses are offered at the WHO training centers.

4.05.3.2.1 Reliability

The IPDE field trial conducted in 11 countries and 14 centers evaluated inter-rater reliability in joint interviews as well as test-retest reliability with an average test-retest interval of six months (the test-retest interviews were conducted by the same interviewer). Results indicated overall weighted kappa for individual definite personality disorders to be 0.57 for the DSM-III-R and 0.65 for the ICD-10 (Loranger et al., 1994). Median kappas for definite or probable personality diagnoses were 0.73 for DSM-III-R and 0.77 for ICD-10. Using broader definite or probable criteria, kappa for any personality disorder increased to 0.70 for DSM-III-R and 0.71 for ICD-10. For temporal stability, kappas for the presence or absence of a personality disorder were 0.62 for DSM-III-R and 0.59 for ICD-10. Inter-rater reliability was higher for dimensional scores with ICCs ranging from 0.79 to 0.94 for the DSM-III-R and 0.86 to 0.93 for the ICD-10. Temporal stability for dimensional scores was also high with ICCs ranging from 0.68 to 0.92 for DSM-III-R and from 0.65 to 0.86 for ICD-10.

Pilkonis et al. (1995) also evaluated the reliability of the third version of the PDE. Intraclass correlations for total scores or cluster scores ranged from 0.85 to 0.92. Inter-rater agreement (kappa) for the presence or absence of any personality disorder was 0.55.

Loranger et al. (1991) examined inter-rater agreement and test-retest reliability of the PDE in a sample of psychiatric inpatients. Second administrations of the PDE were conducted one week to six months later by a separate interviewer blind to the initial assessment. Inter-rater agreement between two raters was assessed at both the initial and repeated interview. At the first interview kappas for inter-rater reliability ranged from 0.81 to 0.92 (median = 0.87). At the repeat interview kappas for inter-rater reliability ranged from 0.70 to 0.95 (median = 0.88). At follow-up there was a significant reduction in the number of criteria met on all disorders except schizoid and antisocial. Stability of the presence or absence of any personality disorder was moderate, with a kappa of 0.55.

O'Boyle and Self (1990) interviewed 20 patients with a depressive disorder for a personality disorder. Eighteen patients were re-interviewed after a mean of 63 days for the presence or absence of personality disorder. Intraclass correlations were 0.89 to 1 and inter-rater agreement (kappa) was 0.63. Depressive disorders did not consistently affect categorical diagnoses, but dimensional scores were higher during depressed periods.

4.05.3.2.2 Summary

The IPDE has a number of strengths to recommend its use. First, the IPDE (and PDE) has demonstrated medium to high inter-rater agreement and temporal reliability for both categorical diagnoses and dimensional scores. In addition, preliminary investigations into the influence of Axis I disorders (e.g., depression) on the assessment of personality disorders indicate no significant influence on PDE categorical diagnoses. Second, a detailed training manual accompanies the interview, which provides

thorough instructions and scoring algorithms. Third, a unique feature of the IPDE is dual coverage of the DSM-IV and ICD-10 criteria. Fourth, in addition to providing categorical diagnoses, the IPDE measures dimensional scores which provide information about accentuated normal traits below the threshold required for a personality disorder. Finally, the IPDE is available in several languages and has been studied in 11 countries.

The IPDE, while quite comprehensive, is flexible enough to permit more economical administration. The DSM-IV and ICD-10 modules can be administered separately. Furthermore, rather than being administered in its thematic organization, the IPDE can be limited to diagnostic modules of interest. A self-administered screening questionnaire is available to assist in identifying personality disorders that might be of focus in the interview.

Inter-rater agreement between the SCID-II and IPDE has led some researchers to conclude that the IPDE (and PDE) presents more stringent guidelines to fulfill personality disorder criteria (Hyler, Skodol, Kellman, Oldham, & Rosnick, 1990; Hyler, Skodol, Oldham, Kellman, & Doidge, 1992; O'Boyle & Self, 1990). This stringent determination is probably due to the consistent five-year time period requirement for personality traits. In conclusion, the specificity of the instrument is increased (fewer false positives) but this may be at the cost of decreased sensitivity (more false negatives).

4.05.3.3 Structured Clinical Interview for DSM-IV Personality Disorders

The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) is a structured interview that attempts to provide an assessment of the 11 DSM-III-R personality disorders, including the diagnosis of self-defeating personality disorder, which is included in Appendix A of DSM-III-R (First, Spitzer, Gibbon, & Williams, 1995). The SCID-II interview can be used to make categorical or dimensional personality assessments (based on the number of items judged present). The SCID-II was developed as a supplementary module to the SCID-I, but was redesigned in 1985 to be a separate and autonomous instrument due to different assessment procedures and length of interview (First et al., 1995).

In conducting the SCID-II, it is extremely important to evaluate the interviewee's behavior out of the context of an Axis I disorder (Spitzer et al., 1990). The test developers recommend an evaluation of Axis I disorders prior to the SCID-II with either a SCID-I or some other Axis I evaluation.

A self-report screening questionnaire is provided to improve time efficiency. Each of the 113 items on the questionnaire corresponds to a diagnostic criterion and is purposefully constructed to have a low threshold for a positive response, and is therefore for screening purposes only. Interviewers should probe all items coded "yes" in the questionnaire. Under most circumstances, the interviewer does not need to interview for the negatively endorsed criteria, due to the low probability of psychopathology. Negative questionnaire responses should be followed up when either the interviewer suspects that the criterion or personality disorder is actually present or when the number of items endorsed positively in the interview is only one item below that required for making a diagnosis (in which case all questions for that diagnosis should be probed).

Utilizing the screening questionnaire, the SCID-II interview can usually be administered in 30–45 minutes (First et al., 1995). First, Spitzer, Gibbon, Williams, Davies, et al. (1995) interviewed 103 psychiatric patients and 181 nonpatients, and calculated a mean interview time of 36 minutes. A unique feature of the SCID-II is that the interview begins with a brief overview which gathers information on behavior and relationships, and provides information about the interviewee's capacity for self-reflection. This allows the interviewer to establish rapport and allows interviewees to describe their behavior and its consequences in their own words (Spitzer et al., 1990). Following the overview, the interview progresses through each relevant disorder.

The format and structure of the SCID-II are very similar to those of the SCID for Axis I disorders. Each page of the interview contains questions, reprinted DSM-IV criteria, guidelines for scoring, and space for scores (Spitzer et al., 1990). Initial questions are open-ended and followed up with prompts for elaboration and examples. If further clarification is needed, the interviewer is encouraged to ask supplementary (their own) questions, give examples, present hypothetical situations, and challenge inconsistencies (Spitzer et al., 1990). There are usually two to three interview questions for each personality disorder criterion. In essence, the interviewer is testing diagnostic hypotheses. The ratings are based not on the question response, but on fulfillment of DSM-IV criteria. The interviewer is encouraged to use alternate sources of information to assist in rating the criteria, such as observed behavior, medical records, and informants. With slight
modifications, the SCID-II can be administered to an informant (First et al., 1995). Each criterion is rated as one of the following: ? = inadequate information, 1 = symptom clearly absent or criteria not met, 2 = subthreshold condition that almost meets criteria, and 3 = threshold for criteria met.

A rating of "3" is scored only when the interviewee provides convincing, elaborative, and/or exemplary information. A rating of "3" is reserved only for criteria that fulfill the following three guidelines: pathological (outside normal variation), pervasive (occurs in a variety of places and/or with a variety of people), and persistent (occurs with sufficient frequency over at least a five-year period). Specific guidelines for a "3" rating are provided for each criterion in the body of the interview.

Due to the similarity in interview procedures, SCID-II training procedures are almost identical to SCID-I training. As with the SCID-I, clinical judgment is required in the administration and scoring of the SCID-II, and interviewers therefore need a full understanding of DSM nosology and experience in diagnostic interviewing. A user's manual and demonstration videotapes are available for the SCID-II. Training workshops can also be arranged with the developers.

4.05.3.3.1 Reliability

The test–retest reliability of the SCID-II was examined within an investigation of the reliability of the Axis I SCID (Williams et al., 1992). In this study, First, Spitzer, Gibbon, Williams, Davies, et al. (1995) administered the SCID-II to 103 psychiatric patients and 181 nonpatients. Two raters independently administered the SCID-II between 1 and 14 days apart. Each SCID-II was preceded by an Axis I SCID evaluation. The SCID-II Personality Questionnaire was given only on the occasion of the first assessment (both SCID-II interviews used the same questionnaire results). Overall weighted kappas were 0.53 and 0.38 for patients and nonpatients, respectively. For the patient sample, kappas were above 0.5 for avoidant, antisocial, paranoid, histrionic, and passive-aggressive personality disorders, and below 0.5 for dependent, self-defeating, narcissistic, borderline, and obsessive-compulsive personality disorders. For the nonpatient sample, kappas were above 0.5 for dependent, histrionic, and borderline personality disorders, and below 0.5 for avoidant, obsessive-compulsive, passive-aggressive, self-defeating, paranoid, and narcissistic personality disorders.

Using the Dutch version of the SCID-II, Arntz et al. (1992) randomly selected 70 mental health center outpatients and conducted SCID-II interviews. Inter-rater reliability was determined by comparing criteria scores between a primary interviewer and an observer. With the exception of a few criteria, inter-rater reliability for each criterion was good to excellent. Eighty-four of 116 DSM-III-R criteria had ICCs higher than 0.75, and 14 had reliability ranging from 0.60 to 0.75. Inter-rater reliability could not be computed for 12 criteria due to lack of variance. Inter-rater agreement for specific personality disorders was good with kappas ranging from 0.65 to 1.

Several other studies report good to excellent inter-rater reliability and agreement using the SCID-II (Brooks, Baltazar, McDowell, Munjack, & Bruns, 1991; Fogelson, Nuechterlein, Asarnow, Subotnik, & Talovic, 1991; Malow, West, Williams, & Sutker, 1989; Renneberg, Chambless, & Gracely, 1992). However, these studies contained two or more of the following limitations: a restricted number of personality disorders, a homogeneous population, and a small number of participants.

4.05.3.3.2 Summary

The SCID-II differs from other personality interviews in several respects. Although other interviews have a disorder-based format available, only the SCID-II has this format as its primary (and only) format of administration. First et al. (1995) maintain that the grouping of questions based on disorder may more closely approximate clinical diagnostic practice and that this grouping forces interviewers to consider criteria in the context of the overarching theme of the disorder. One disadvantage is that the lack of a thematically organized format limits an interviewer's choices, and some have raised concerns that disorder-based organization results in redundancy and repetition with similar items across different diagnoses. Also, the organization of items by disorder may create "halo" effects where a positive criterion rating may bias an interviewer's rating of similar items.

Although the SCID-II screening questionnaire is unusual, the IPDE now has a screening questionnaire as well (neither the SIDP-IV nor the PDI-IV uses a screening questionnaire). The SCID-II has shown reliability comparable to other interviews and has been used in a number of studies. The shared format between the SCID-II and the SCID for Axis I disorders should facilitate training on the two measures and may ease the typical requirement that Axis I disorders are assessed and taken into consideration when conducting personality disorder examinations.
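
The SCID-II reliability results above are expressed as kappa coefficients, which correct the raw agreement between two raters for the agreement expected by chance. The following is a minimal sketch of the unweighted form of Cohen's kappa for present/absent ratings. It is offered only as an illustration: the cited studies report weighted or overall kappas whose exact computation is not described in this chapter, the function name is hypothetical, and the ratings in the example are invented rather than taken from any study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: proportion of cases given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category either rater used.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    # With no variance in the ratings (both raters always use one category),
    # chance agreement is 1 and kappa cannot be computed.
    if expected == 1.0:
        raise ValueError("kappa is undefined when the ratings have no variance")
    return (observed - expected) / (1 - expected)


# Invented example: 1 = disorder present, 0 = disorder absent, ten interviewees.
interviewer = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
observer = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(round(cohens_kappa(interviewer, observer), 2))  # prints 0.58
```

In this example the raters agree on 8 of 10 cases (0.80), but 0.52 agreement would be expected by chance from their marginal rates, so kappa is (0.80 - 0.52) / (1 - 0.52), or about 0.58. The undefined case is analogous to the criteria in the Arntz et al. (1992) study for which reliability could not be computed because the ratings lacked variance.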
4.05.3.4 Personality Disorder Interview-IV

The Personality Disorder Interview-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) is a semistructured interview developed to assess 10 DSM-IV personality disorders as well as the two DSM-IV personality criteria sets provided for further study (depressive personality disorder, passive-aggressive personality disorder). The PDI-IV is the fourth edition of the Personality Interview Questionnaire (PIQ). The name change was based, in part, on the intent to provide a more descriptive title as the PDI focuses on the assessment of disordered personality.

The PDI-IV provides questions for the assessment of the 94 diagnostic criteria that relate to the 12 DSM-IV personality disorders. Criterion ratings are made on a three-point scale (0 = not present, 1 = present according to DSM-IV criteria, 2 = present to more severe or substantial degree). Questions from the PDI-IV were selected as useful in determining criterion ratings and additional questions are provided for further elaboration if time allows. However, given the questionnaire's semistructured format, the interviewer may deviate from questions to obtain further information or to address inconsistencies. It is suggested that all positive responses be followed by a request for examples.

The PDI-IV can be administered in a manner either organized by thematic content (as with the IPDE and SIDP) or by DSM-IV diagnostic category (as with the SCID-II). Separate interview booklets are provided for these two forms of administration. For occasions when all personality disorders will be assessed, it is recommended that the thematic administration be used. Content areas in the thematic format include Attitudes Toward Self, Attitudes Toward Others, Severity or Comfort with Others, Friendships and Relationships, Conflicts and Disagreements, Work and Leisure, Social Norms, Mood, and Appearance and Perception. The diagnostic format may be preferable when only particular disorders must be assessed. Ratings can be used to derive either categorical or dimensional ratings for DSM-IV personality disorders.

The PDI-IV comes with an extensive manual that discusses general issues regarding administration but also provides a thorough discussion of each personality disorder in separate chapters. Each chapter provides an overview of the history and rationale for the personality disorder including discussion of the development of the criterion in DSM as well as ICD and other criterion sets. Each criterion item is discussed with regard to revisions and rationale for each item, ratings and questions, and issues and problems relevant to assessing that criterion.

The PDI-IV does not include the use of a self-report questionnaire. However, Widiger et al. (1995) do recommend that a stand-alone self-report inventory be used to assess personality. Scores from such a questionnaire may then be used to inform selection of the most relevant personality disorders to assess on the PDI-IV. Widiger et al. (1995) suggest that the use of such self-report measures will serve the same purpose as screening questionnaires but also will provide more information than measures simply designed for screening purposes.

The PDI-IV manual indicates that, although lay interviewers can administer the PDI-IV, extensive training and supervision are required. Even then, it is recommended that final item scoring be done by an experienced clinician. Ideally, the PDI-IV should be administered and scored by a professional clinician with training and experience in diagnosis and assessment. The PDI-IV manual outlines suggested training beginning with study of the DSM-IV, articles on the diagnosis or assessment of personality disorders, and the PDI-IV manual and interview booklets. Following discussion of this literature it is recommended that trainees conduct pilot interviews with nonclinical subjects. Tapes of these initial interviews should be reviewed and feedback provided. It is then suggested that 5–10 patient interviews be conducted and taped for review. Continued taping and systematic review of interviews are recommended to avoid interviewer drift.

4.05.3.4.1 Reliability

Inter-rater agreement for presence vs. absence of personality disorders ranges from 0.37 (histrionic) to 0.81 (antisocial), with a median kappa of 0.65. Agreement for the number of personality disorder criteria met ranges from 0.74 (histrionic, narcissistic, and schizotypal) to 0.90 (obsessive-compulsive and passive-aggressive) and 0.91 (sadistic). Median reliability for the number of PD criteria met was 0.84. Although these data are generally encouraging there are some concerns. Unfortunately, the population on which these reliability data were obtained is not specified nor are the methods for determining rater agreement described. More detailed information may be available from the unpublished dissertation from which these data are derived (Corbitt, 1994).

4.05.3.4.2 Summary

The PDI-IV is built upon the extensive history and experience derived from prior editions of
this interview. The PDI-IV manual is one of the more extensive, thorough, and informative manuals available for the assessment of personality disorders. The flexibility afforded by the choice of either thematic content format or diagnostic category format is also attractive. Despite the accumulation of research on prior versions of the PDI-IV, there are limited reliability data for the PDI-IV. However, the PDI-IV is the only personality interview that has reliability data available for the DSM-IV personality disorders.

4.05.4 CHILD AND ADOLESCENT DISORDERS

4.05.4.1 Schedule for Affective Disorders and Schizophrenia for School Age Children

The Schedule for Affective Disorders and Schizophrenia for School Age Children (K-SADS; Puig-Antich & Chambers, 1978) is a semistructured interview designed for research or clinical assessment by a trained clinician. The K-SADS was developed as a child and adolescent version of SADS resulting from research in childhood depression. The K-SADS covers a wide range of childhood disorders but has a strong emphasis on major affective disorders (Roberts, Vargo, & Ferguson, 1989). The interview is intended to assess both past and current episodes of psychopathology in children aged 6–17 years.

The K-SADS-III-R is compatible with DSM-III-R criteria. This version of the SADS provides 31 diagnoses within affective disorders (including depression, bipolar disorder, dysthymia, and cyclothymia), eating disorders, anxiety disorders, behavioral disorders (e.g., conduct disorder, substance abuse/dependence), psychoses, and personality disorders (i.e., schizotypal and paranoid).

The K-SADS is composed of three parts. It begins with an unstructured interview that aims to put the patient at ease and gather information regarding present problems, onset and duration of problems, and treatment history. Following this general interview, the interviewer asks questions relevant to specific symptoms and diagnostic criteria. Sample questions are provided only as a guideline, and modification is encouraged. If initial probe questions are negative, follow-up questions are skipped over. At the conclusion of the interview, observational items are rated (Roberts et al., 1989).

The parent interview should be conducted first, followed by the child interview. The child and parent interviews require approximately 90 minutes each. The K-SADS focuses on the last week and most intense symptoms over the last 12 months. Each time period is rated independently and a summary score is made. Diagnostic criteria are rated as present or absent, and then rated on severity (Ambrosini, Metz, Prabucki, & Lee, 1989). Ultimately, diagnoses are given based on clinical judgment (Hodges, McKnew, Burbach, & Roebuck, 1987).

As with the SADS, the K-SADS requires extensive training and experience in psychiatric interviewing but has an added burden of conducting interviews with adults (parent/guardian) and children. Full familiarity with DSM-III-R is required. Training typically requires viewing videotapes and the conduct of practice interviews with ongoing supervision.

4.05.4.1.1 Reliability

Chambers et al. (1985) examined test–retest reliability of the K-SADS administered to children and parents. Test–retest reliability of major diagnoses was generally adequate with kappas ranging from 0.54 to 0.74, with the exception of anxiety disorder (κ = 0.24). Individual symptoms and symptom scales generally showed adequate test–retest reliability with anxiety-related symptoms showing the lowest reliability. Agreement between the parent and child interviews varied greatly, ranging from poor to excellent. This latter finding suggests the nonredundant aspect of these two interviews.

Inter-rater reliability was examined in videotaped K-SADS-III-R interviews by Ambrosini et al. (1989). Inter-rater agreement among child, parent, combined interview ratings, and across time frames (present episode and last week) ranged from acceptable (κ = 0.53) to excellent (κ = 1) for major depression, minor depression, overanxious disorder, simple phobia, separation anxiety, oppositional, and attention deficit. Of the 36 kappa values, 30 were 0.75 or higher.

Apter, Orvaschel, Laseg, Moses, and Tyano (1989) examined inter-rater and test–retest agreement in a sample of adolescent inpatients (aged 11 to 18 years). Overall inter-rater and test–retest agreement was high with kappas of 0.78. Reliability of symptom scales was also adequate with ICCs of 0.63–0.97 for inter-rater agreement and ICCs of 0.55–0.96 for test–retest agreement. Diagnostic agreement between parent and child interviews (conducted by different clinicians for each informant) was generally low with an overall kappa of 0.42. Parent–child agreement for symptom scales was particularly low for anxiety symptoms.

4.05.4.1.2 Summary

The K-SADS extensively covers the major affective disorders and has adequate coverage of
other childhood disorders. It has been one of the main diagnostic interviews available for use with children and adolescents. Reliability data are very positive for a number of disorders. However, reliability data are largely for DSM-III diagnoses and limited data are available for DSM-III-R diagnoses (no data are available for DSM-IV).

4.05.4.2 Child Assessment Schedule

The Child Assessment Schedule (CAS; Hodges, Kline, Stern, Cytryn, & McKnew, 1982) is a structured interview that is unique in that it is modeled after traditional clinical interviews with children. The interview is organized around thematic topics (e.g., school, friends) with diagnostic items inserted within these topics. Structured questions are provided in a format that is intended to develop rapport. Hodges (1993) has noted that about half of the CAS material related to clinical content does not reflect directly on diagnostic criteria.

The CAS is organized into three parts. In the first part 75 questions are organized into 11 content areas: school, friends, activities and hobbies, family, fears, worries, self-image, mood, somatic concerns, expression of anger, and thought disorder symptoms. Items are rated true (presence of symptom), false (absence of symptom), ambiguous, no response, or not applicable. In the second part the onset and duration of symptoms are assessed. In the third part of the CAS, following completion of the interview, 56 items are rated based on observations during the interview. These items include the following areas: insight, grooming, motor coordination, activity level, other spontaneous physical behaviors, estimate of cognitive ability, quality of verbal communications, quality of emotional expression, and impressions about the quality of interpersonal interactions. A parallel form of the CAS is available for administration to parents (P-CAS). The same inquiries are made, with parents being asked about the child.

Quantitative scales can be obtained for a total score, scores for content areas and for symptom complexes. The internal consistency of the scale scores has been examined and is generally adequate with a few exceptions. Symptom scales have been found to be internally consistent (Hodges, Saunders, Kashani, Hamlett, & Thompson, 1990), especially in a psychiatric sample with some attenuation in medically ill and community samples (particularly for anxiety symptoms). Hodges and Saunders (1989) examined the internal consistency of content scales for both the CAS and P-CAS. For the CAS, content scales generally had alphas greater than 0.70 but low internal consistency was found for Activities and Reality Functioning. For the P-CAS, content scales with alphas below 0.60 were Activities, Reality-testing Symptoms, Self-image, and Fears. Diagnoses for DSM-III-R can be derived in addition to these scale scores. The CAS takes approximately 45 minutes to one hour to complete.

It is recommended that the CAS be administered by trained clinicians (although lay interviewers have been used; Hodges et al., 1987). Guidelines for administering, scoring, and interpreting the CAS are contained in the CAS manual (Hodges, 1985) and in guidelines established for achieving rater reliability (Hodges, 1983).

4.05.4.2.1 Reliability

In the initial rater reliability study, Hodges, McKnew, Cytryn, Stern, and Kline (1982) examined inter-rater agreement using videotaped interviews. For symptom scores, mean kappas were generally close to or exceeded 0.60. Mean correlations across raters for content areas were 0.63 or above with the exception of worries (0.59). For symptom complexes mean correlations were 0.68 or above except for attention deficit without hyperactivity (0.58), separation anxiety (0.56), and socialized conduct (0.44). Hodges et al. (1982) also report inter-rater reliability on a small sample (N = 10). Correlations for items, content scores, and symptom complex scores were all above 0.85. Verhulst, Althaus, and Berden (1987) also have reported inter-rater reliability for content scores using a small number (N = 10) of videotaped interviews. Correlations for content areas ranged from 0.70 to 0.97.

In the only test–retest reliability study of the CAS, Hodges, Cools, and McKnew (1989) examined agreement over a mean of five days with an inpatient sample. Intraclass correlations indicated good reliability for the total CAS score and scale scores. Kappas for DSM-III diagnoses of conduct disorder, depression, and anxiety were above 0.70. However, the kappa for attention deficit disorder was only 0.43.

The concordance between the CAS and the K-SADS was examined by Hodges et al. (1987). Lay interviewers were used and agreement was examined for both child and parent interviews for four major diagnostic categories (attention deficit disorder, conduct disorders, anxiety disorders, and affective disorders). Only present episodes were evaluated. Child-only diagnostic concordance between the CAS and K-SADS was poor for attention deficit disorder and anxiety disorders (kappas less than 0.40). Better
agreement was obtained for parent-only interviews or in combinations of child and parent interviews. Anxiety disorders generally had low concordance across informant conditions.

The concordance between child and parent interviews has also been examined with the CAS. Verhulst et al. (1987) found low to moderate correlations between parent- and child-derived content areas, somatic concerns, and observational judgments. Of 22 correlations, only four exceeded 0.40. The total score correlation was 0.58, indicating approximately 34% shared variance. Hodges, Gordon, and Lennon (1990) also found low to moderate correlations between parent and child interview ratings. For diagnostic areas, the lowest correlations (those below 0.30) were obtained for overanxious disorder, oppositional disorder, and separation anxiety. Low correlations (again below 0.30) were also found in the content areas of fears, worries and anxieties, and physical complaints. These data indicate reasonable parent–child agreement for conduct/behavioral problems, moderate agreement for affective symptoms, and low agreement for anxiety symptoms. As with other child assessment measures, the greatest parent–child agreement appears to be for observable behavior and the lowest for subjective experiences (Hodges et al., 1990; Hodges, 1993).

4.05.4.2.2 Summary

The CAS appears to provide a reliable assessment of a range of symptoms and shows reasonable convergence with noninterview measures. It does not cover a number of disorders including sleep disorders, eating disorders, alcohol or drug use disorders, or mania. Although it provides a broad clinical assessment, some users may find that the presence of many CAS items that do not reflect directly on diagnostic criteria is inefficient. Inter-rater agreement for diagnoses studied appears adequate. However, only one small-scale study has examined test–retest reliability for a subset of diagnoses. No reliability data are available for DSM-IV diagnoses.

4.05.4.3 Child and Adolescent Psychiatric Assessment

The Child and Adolescent Psychiatric Assessment (CAPA; Angold et al., 1995) was developed in order to assess a wide range of diagnostic classifications including DSM-III, DSM-III-R, ICD-9, and ICD-10. Additionally, other symptoms of clinical interest are evaluated. As with other interviews, the CAPA can be administered to children and parents. The CAPA has four sections, three of which pertain to the interview proper. The time period addressed is the three months preceding the interview. In the Introduction, the interview is conducted in a conversational manner in order to establish rapport. Questions within the Introduction address three areas: home and family life, school life, and peer groups and spare time activities.

The second section is the Symptom Review, which has a disorder-based organization. A wide range of disorders is covered including anxiety disorders, obsessive-compulsive disorders, depressive disorders, manic disorders, somatization disorders, food-related disorders, sleep disorders, elimination disorders, tic disorders and trichotillomania, disruptive behavior disorders, tobacco use, alcohol, psychotic disorders, life events and post-traumatic stress disorder (PTSD), and drugs. Due to problems in child report with some disorders, only the parent interview assesses sleep terror disorder, sleepwalking disorder, and attention deficit hyperactivity disorder. Conversely, because parents may be a poor source of information for children's substance use, delusions, hallucinations, and thought disorder, these items are abbreviated in the parent interview with more extensive coverage in the child interview.

The third section of the interview assesses incapacity. At this point the interviewer reviews symptom information and questions about the effects of symptoms in 17 areas of psychosocial impairment. Impairment is evaluated in the three domains of home and family life, school life, and peer groups and spare time activities.

Finally, following the interview, observations of interview behavior are rated for 67 items. These items cover level of activity, child's mood state, quality of child's social interaction during interview, and psychotic behavior.

Detailed questions are provided for each interview item in the CAPA. There are three levels of questions. Screening questions allow a skip-out of a section. If the screen question is positive, two levels of probes are provided. Emphasized probes are required and should be asked for all subjects. Discretionary probes are provided if further information is required. A glossary is provided to be used in conjunction with the standardized questions. The glossary provides operational definitions of symptom items. These definitions were based on a review of several of the existing clinical child interviews. The glossary also provides explicit rating principles including a formal definition of the item, ratings of intensity (from 0, absent, to 3, present at a higher intensity level), duration, length of time symptom is occurring,
and psychosocial impairment related to the symptom.

A wealth of information is obtained with the CAPA. Fortunately, a computer program is available to summarize these data with a series of diagnostic algorithms (the CAPA Originated Diagnostic Algorithms; CODA). The CODA can generate diagnoses according to DSM-III, DSM-III-R, DSM-IV, and ICD-10 systems as well as symptom scores for particular diagnostic areas.

Angold et al. (1995) report that the CAPA has been used with a variety of populations (both clinical and general population) in both the UK and the USA. Training requires four to five weeks, with emphasis on practice administering the CAPA and group ratings of tapes. Based on its use in multiple clinical centers, explicit training criteria have been developed, and details about the CAPA and its training requirements can be obtained from Angold.

4.05.4.3.1 Reliability

Angold and Costello (1995) examined test–retest reliability in a clinical sample. Interviews were conducted with children only and were completed within 11 days of each other. Kappas for specific DSM-III-R diagnoses were all above 0.73, with the exception of conduct disorder (κ = 0.55). Agreement on symptom scales for these disorders was also high, with ICCs above 0.60 except for conduct disorder (ICC = 0.50). No reliability data were available for a number of disorders covered in the CAPA, including obsessive-compulsive disorders, manic disorders, food-related disorders, sleep disorders, elimination disorders, tic disorders, psychotic disorders, or life events and PTSD.

4.05.4.3.2 Summary

The CAPA appears to offer a thorough clinical evaluation that incorporates several diagnostic criteria. It provides a broader assessment with more contemporary diagnostic nosology than some other instruments. However, this breadth of assessment does come at a cost. The CAPA administered to the child alone can take one hour, and coding can take another 45 minutes. The CAPA is not recommended for use with children under the age of eight. Additionally, the CAPA is limited to the three months preceding the interview. Although reliability data are encouraging, these data are limited to child-only interviews, are not available for a number of disorders covered by the CAPA, and are not available for DSM-IV diagnoses. It will be important to determine the reliability of other disorders as well as that of parent interviews, and whether diagnostic agreement is improved with both child and parent administration.

4.05.4.4 Diagnostic Interview Schedule for Children

The Diagnostic Interview Schedule for Children (DISC) is a highly structured interview designed to assess most child and adolescent psychiatric disorders (Jensen et al., 1995). The interview was introduced in 1982 as a child version of the Diagnostic Interview Schedule (DIS). The DISC was intended to be administered by lay interviewers and used for epidemiological research (Shaffer et al., 1993). The version current in the late 1990s, DISC-2.1, covers 35 diagnostic criteria for the DSM-III-R, and contains a child (DISC-C) and parent (DISC-P) interview. The DISC was designed for children and adolescents ranging from 6 to 18 years old.

DISC interviewers are encouraged not to deviate from the order, wording, and scoring procedures. The child and parent interviews of the DISC-2.1 require approximately 60–75 minutes each (Jensen et al., 1995). Questions, organized into six separate diagnostic modules, inquire about current and past symptoms, behaviors, and emotions of most child and adolescent diagnoses. Diagnostic criteria are initially assessed with a broad “stem question” (with a low diagnostic threshold) and, if endorsed, followed with “contingent questions” to determine criteria requirements, duration, frequency, impairment, and other modifiers (Fisher et al., 1993). The DISC-2.1 focuses on the last six months, and a graphic timeline is used to assist in recall (Fisher et al., 1993; Jensen et al., 1995). At the end of each module, supplementary questions are provided to assess onset, current impairment, treatment history, and precipitating stressors. Questions are rated as: “no,” “yes,” or “sometimes” or “somewhat,” and a computer algorithm generates diagnoses.

The DISC was specifically developed for use by lay interviewers in epidemiological research. Interviewer training typically takes one to two weeks. No differences in performance have been found between clinicians and lay interviewers using the DISC-1 (Shaffer et al., 1993). A user's manual for the DISC is available.

4.05.4.4.1 Reliability

Jensen et al. (1995) examined test–retest reliability in both a clinical and community sample across three sites. In the clinic sample,
for major diagnostic categories, test–retest agreement was adequate for parents (κ range = 0.58–0.70) and was generally higher than that obtained for child interviews (κ range = 0.39–0.86). Using a combined diagnostic algorithm, test–retest agreement was adequate (κ range = 0.50–0.71). Inter-rater agreement was lower for the community sample, with test–retest agreement lower for parents (κ range = 0.66) and children (κ range = 0.23–0.60). The combined diagnostic algorithm for the community sample continued to provide low agreement (κ range = 0.26–0.64). Instances of diagnostic disagreement in the clinic sample appeared to be related to an absolute decrease in the number of symptoms at the time of the second interview. Low reliability in the community sample was attributed to decreased symptom severity, the presence of threshold cases, and other unknown factors.

Other studies have generally found adequate test–retest reliability for the DISC. One general pattern that has emerged is greater agreement when examining parent interviews. Schwab-Stone et al. (1993) interviewed 41 adolescents (aged 11–17 years) and 39 parents twice (1–3 weeks apart) with the DISC-R. Inter-rater agreement ranged from poor (κ = 0.16) to good (κ = 0.77) for the child interviews, and ranged from fair (κ = 0.55) to excellent (κ = 0.88) for the parent interviews. Schwab-Stone, Fallon, Briggs, and Crowther (1994) interviewed 109 preadolescents (aged 6–11 years) and their parents twice (7–18 days apart) with the DISC-R. Inter-rater agreement ranged from poor (κ = 0) to fair (κ = 0.56) for the child interviews, and from poor (κ = 0.43) to excellent (κ = 0.81) for the parent interviews. Based on the lower inter-rater agreement for preadolescents, Schwab-Stone et al. (1994) concluded that highly structured interviews were not appropriate for directly assessing young children due to lower endorsement of symptoms and unreliable reporting within the interview.

Most of the reliability studies on the DISC have examined only the most common childhood diagnoses, and little information is available on uncommon diagnoses. From a clinical sample of relatively uncommon diagnoses, Fisher et al. (1993) interviewed 93 children (aged 8–19 years) and 75 parents with the DISC-2.1. Using the clinic diagnosis as a standard, the DISC-2.1 had good (0.73) to excellent (1) sensitivity in identifying eating disorders, major depressive disorders, obsessive-compulsive disorder, psychosis, substance use disorders, and tic disorders. The DISC-2.1 was noted to be less sensitive for major depressive disorder than other interviews (K-SADS, DICA, CAS).

4.05.4.4.2 Summary

The DISC's design for epidemiological research confers several advantages. First, the highly structured interview may be administered by nonclinicians. Second, the DISC contains the full range of disorders. Finally, the DISC has been thought to contain a lower threshold for disorders than other interviews, which makes it ideal for screening and use in the general population (Roberts et al., 1989).

However, the design of the DISC has several disadvantages. It may be too restrictive, at times not allowing the interviewer to probe further and adapt the interview to accommodate special situations. The DISC has been shown to be unreliable among young children and is fairly long, requiring 60–75 minutes each for the two interviews. The concordance between the DISC and clinical structured interviews such as the K-SADS has not been examined.

4.05.4.5 Diagnostic Interview for Children and Adolescents

The Diagnostic Interview for Children and Adolescents (DICA) is a highly structured interview designed to be used by lay interviewers for clinical and epidemiological research. The interview assesses the present episode of a wide range of psychopathology among children aged 6–17 years (Roberts et al., 1989). The interview initially appeared in 1969, and was revised in 1981 to emulate the organization of the DIS and to be based on DSM-III criteria (Welner, Reich, Herjanic, Jung, & Amado, 1987). The DICA was subsequently revised to conform to DSM-III-R diagnoses (DICA-R; Kaplan & Reich, 1991). In addition to coverage of DSM-III-R, the DICA-R was also revised so that questions were presented in a more conversational style.

The DICA-R is organized into 15 sections. Two sections cover general and sociodemographic information, 11 sections relate to disorders and symptomatology, and remaining sections address menstruation, psychosocial stressors, and clinical observations. The DICA consists of a separate parent (DICA-P) and child (DICA-C) interview. The child interview requires 30–40 minutes to administer, while the parent interview takes longer due to additional questions on developmental history, medical history, socioeconomic status, and family history (Roberts et al., 1989). For each diagnostic criterion, one or more questions elicit information. Follow-up questions are skipped if primary questions are answered negatively. Responses on the DICA-R are coded on a four-point scale: “no,” “rarely,”
“sometimes” or “somewhat,” and “yes.” Following each diagnostic section, specific DSM criteria are listed to assist in deriving diagnoses (Welner et al., 1987).

4.05.4.5.1 Reliability

Limited data on the inter-rater agreement of the DICA are available. Only one study has provided data pertaining to individual diagnoses with an adequate description of sample and methods. Welner et al. (1987) administered two independent interviews (1–7 days apart) to 27 psychiatric inpatients (aged 7–17 years). Using lay interviewers, inter-rater agreement was excellent across diagnostic categories (κ range = 0.76–1, median = 0.86).

Similar to other interviews, diagnoses derived from the parent and child interviews vary. Welner et al. (1987) examined concordance between child and parent interviews among 84 outpatients (ages 7–17 years). Fair to excellent concordance was noted (κ range = 0.49–0.80, median = 0.63). However, other studies have found more modest concordance between parent and child interviews, with median kappas below 0.30 (Earls, Reich, Jung, & Cloninger, 1988; Sylvester, Hyde, & Reichler, 1987).

4.05.4.5.2 Summary

The DICA-R appears to be a well-developed instrument that has taken special care in the writing and sequencing of questions. Although the DICA has been used in a number of studies, only a limited amount of reliability information is available. No reliability information is available for the DICA-R. Other child and adolescent interviews may be more attractive because of the relatively greater inter-rater reliability information.

4.05.5 SUMMARY

There has been an enormous amount of research conducted on the development and use of structured clinical interviews since the late 1960s. This research has yielded diagnostic interviews that address an array of clinical diagnoses in both adult and child populations. The use of structured interviews can not only provide reliable diagnostic evaluations but can also serve to ensure a broad and thorough clinical assessment. Although most readily embraced in research settings, it is anticipated (and hoped) that structured diagnostic interviews will become more commonplace in clinical applied settings as part of the standard assessment tools that clinicians use regularly.

4.05.6 REFERENCES

Ambrosini, P. J., Metz, C., Prabucki, K., & Lee, J. (1989). Videotape reliability of the third revised edition of the K-SADS. Journal of the American Academy of Child and Adolescent Psychiatry, 28, 723–728.
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Andreasen, N. C. (1983). The Scale for the Assessment of Negative Symptoms (SANS). Iowa City, IA: The University of Iowa.
Andreasen, N. C. (1984). The scale for the assessment of positive symptoms (SAPS). Iowa City, IA: The University of Iowa.
Andreasen, N. C. (1987). Comprehensive assessment of symptoms and history. Iowa City, IA: The University of Iowa.
Andreasen, N. C., Flaum, M., & Arndt, S. (1992). The comprehensive assessment of symptoms and history (CASH): An instrument for assessing diagnosis and psychopathology. Archives of General Psychiatry, 49, 615–623.
Andreasen, N. C., Grove, W. M., Shapiro, R. W., Keller, M. B., Hirschfeld, R. M. A., & McDonald-Scott, P. (1981). Reliability of lifetime diagnoses: A multicenter collaborative perspective. Archives of General Psychiatry, 38, 400–405.
Angold, A., & Costello, E. J. (1995). A test–retest study of child-reported psychiatric symptoms and diagnoses using the Child and Adolescent Psychiatric Assessment (CAPA-C). Psychological Medicine, 25, 755–762.
Angold, A., Prendergast, M., Cox, A., Harrington, R., Simonoff, E., & Rutter, M. (1995). The Child and Adolescent Psychiatric Assessment (CAPA). Psychological Medicine, 25, 739–753.
Anthony, J. C., Folstein, M., Romanoski, A. J., Von Korff, M. R., Nestadt, G. R., Chahal, R., Merchant, A., Brown, H., Shapiro, S., Kramer, M., & Gruenberg, E. M. (1985). Comparison of the lay Diagnostic Interview Schedule and a standardized psychiatric diagnosis: Experience in eastern Baltimore. Archives of General Psychiatry, 42, 667–675.
Apter, A., Orvaschel, H., Laseg, M., Moses, T., & Tyano, S. (1989). Psychometric properties of the K-SADS-P in an Israeli adolescent inpatient population. Journal of the American Academy of Child and Adolescent Psychiatry, 28, 61–65.
Arntz, A., van Beijsterveldt, B., Hoekstra, R., Hofman, A., Eussen, M., & Sallaerts, S. (1992). The inter-rater reliability of a Dutch version of the Structured Clinical Interview for DSM-III-R personality disorders. Acta Psychiatrica Scandinavica, 85, 394–400.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). Reliability of psychiatric diagnoses: 2. A study of consistency of clinical judgments and ratings. American Journal of Psychiatry, 119, 351–357.
Bromet, E. J., Dunn, L. O., Connell, M. M., Dew, M. A., & Schulberg, H. C. (1986). Long-term reliability of diagnosing lifetime major depression in a community sample. Archives of General Psychiatry, 43, 435–440.
Brooks, R. B., Baltazar, P. L., McDowell, D. E., Munjack, D. J., & Bruns, J. R. (1991). Personality disorders co-occurring with panic disorder with agoraphobia. Journal of Personality Disorders, 5, 328–336.
Chambers, W. J., Puig-Antich, J., Hirsch, M., Paez, P., Ambrosini, P. J., Tabrizi, M. A., & Davies, M. (1985). The assessment of affective disorders in children and adolescents by semistructured interview. Archives of General Psychiatry, 42, 696–702.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cooper, J. E., Copeland, J. R. M., Brown, G. W., Harris, T., & Gourlay, A. J. (1977). Further studies on interviewer training and inter-rater reliability of the Present State Exam (PSE). Psychological Medicine, 7, 517–523.
Cooper, J. E., Kendell, R. E., Gurland, B. J., Sharpe, L., Copeland, J. R. M., & Simon, R. (1972). Psychiatric diagnosis in New York and London. Maudsley monographs. London: Oxford University Press.
Corbitt, E. M. (1994). Sex bias and the personality disorders: A reinterpretation from the five-factor model. Unpublished doctoral dissertation, University of Kentucky, Lexington.
Cottler, L. B. (1991). The CIDI and CIDI-Substance Abuse Module (SAM): Cross-cultural instruments for assessing DSM-III, DSM-III-R and ICD-10 criteria. Research Monographs, 105, 220–226.
Earls, R., Reich, W., Jung, K. G., & Cloninger, C. R. (1988). Psychopathology in children of alcoholic and antisocial parents. Alcoholism: Clinical and Experimental Research, 12, 481–487.
Endicott, J., & Spitzer, R. L. (1978). A diagnostic interview: The Schedule for Affective Disorders and Schizophrenia. Archives of General Psychiatry, 35, 837–844.
Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry, 33, 766–771.
Erdman, H. P., Klein, M. H., Greist, J. H., Bass, S. M., Bires, J. K., & Machtinger, P. E. (1987). A comparison of the Diagnostic Interview Schedule and clinical diagnosis. American Journal of Psychiatry, 144, 1477–1480.
Escobar, J. I., Randolph, E. T., Asamen, J., & Karno, M. (1986). The NIMH-DIS in the assessment of DSM-III schizophrenic disorder. Schizophrenia Bulletin, 12, 187–194.
Faraone, S. V., Blehar, M., Pepple, J., Moldin, S. O., Norton, J., Nurnberger, J. I., Malaspina, D., Kaufman, C. A., Reich, T., Cloninger, C. R., DePaulo, J. R., Berg, K., Gershon, E. S., Kirch, D. G., & Tsuang, M. T. (1996). Diagnostic accuracy and confusability analyses: An application to the Diagnostic Interview for Genetic Studies. Psychological Medicine, 26, 401–410.
Farmer, A. E., Katz, R., McGuffin, P., & Bebbington, P. (1987). A comparison between the Present State Examination and the Composite International Interview. Archives of General Psychiatry, 44, 1064–1068.
Feighner, J. P., Robins, E., Guze, S. B., Woodruff, R. A., Winokur, G., & Munoz, R. (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry, 26, 57–63.
First, M. B., Gibbon, M., Spitzer, R. L., & Williams, J. B. W. (1996). User's guide for the Structured Clinical Interview for DSM-IV Axis I Disorders-Research Version (SCID-I, version 2.0, February 1996 Final version). New York: Biometrics Research Department, New York State Psychiatric Institute.
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (1995). The Structured Clinical Interview for DSM-III-R Personality Disorders (SCID-II). Part I: Description. Journal of Personality Disorders, 9, 83–91.
First, M. B., Spitzer, R. L., Gibbon, M., Williams, J. B. W., Davies, M., Borus, J., Howes, M. J., Kane, J., Pope, H. G., & Rounsaville, B. (1995). The Structured Clinical Interview for DSM-III-R Personality Disorders (SCID-II). Part II: Multi-site test–retest reliability study. Journal of Personality Disorders, 9, 92–104.
Fisher, P. W., Shaffer, D., Piacentini, J. C., Lapkin, J., Kafantaris, V., Leonard, H., & Herzog, D. B. (1993). Sensitivity of the Diagnostic Interview Schedule for Children, 2nd Edition (DISC-2.1) for specific diagnoses of children and adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 666–673.
Fogelson, D. L., Nuechterlein, K. H., Asarnow, R. F., Subotnik, K. L., & Talovic, S. A. (1991). Inter-rater reliability of the Structured Clinical Interview for DSM-III-R, Axis II: schizophrenia spectrum and affective spectrum disorders. Psychiatry Research, 39, 55–63.
Folstein, M. F., Folstein, S. E., & McHugh, P. (1975). “Mini Mental State”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Ford, J., Hillard, J. R., Giesler, L. J., Lassen, K. L., & Thomas, H. (1989). Substance abuse/mental illness: Diagnostic issues. American Journal of Drug and Alcohol Abuse, 15, 297–307.
Fyer, A. J., Mannuzza, S., Martin, L. Y., Gallops, M. S., Endicott, J., Schleyer, B., Gorman, J. M., Liebowitz, M. R., & Klein, D. F. (1989). Reliability of anxiety assessment, II: Symptom assessment. Archives of General Psychiatry, 46, 1102–1110.
Grove, W. M., Andreasen, N. C., McDonald-Scott, P., Keller, M. B., & Shapiro, R. W. (1981). Reliability studies of psychiatric diagnosis: Theory and practice. Archives of General Psychiatry, 38, 408–413.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56–62.
Helzer, J. E., Robins, L. N., McEvoy, L. T., Spitznagel, E. L., Stolzman, R. K., Farmer, A., & Brockington, I. F. (1985). A comparison of clinical and diagnostic interview schedule diagnoses: Physician reexamination of lay-interviewed cases in the general population. Archives of General Psychiatry, 42, 657–666.
Hodges, K. (1983). Guidelines to aid in establishing inter-rater reliability with the Child Assessment Schedule. Unpublished manuscript.
Hodges, K. (1985). Manual for the Child Assessment Schedule. Unpublished manuscript.
Hodges, K. (1993). Structured interviews for assessing children. Journal of Child Psychology and Psychiatry, 34, 49–68.
Hodges, K., Cools, J., & McKnew, D. (1989). Test–retest reliability of a clinical research interview for children: The Child Assessment Schedule (CAS). Psychological Assessment: Journal of Consulting and Clinical Psychology, 1, 317–322.
Hodges, K., Gordon, Y., & Lennon, M. P. (1990). Parent–child agreement on symptoms assessed via a clinical research interview for children: The Child Assessment Schedule (CAS). Journal of Child Psychology and Psychiatry, 31, 427–436.
Hodges, K., Kline, J., Stern, L., Cytryn, L., & McKnew, D. (1982). The development of a child assessment interview for research and clinical use. Journal of Abnormal Child Psychology, 10, 173–189.
Hodges, K., McKnew, D., Burbach, D. J., & Roebuck, L. (1987). Diagnostic concordance between the Child Assessment Schedule (CAS) and the Schedule for Affective Disorders and Schizophrenia for School-age Children (K-SADS) in an outpatient sample using lay interviewers. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 654–661.
Hodges, K., McKnew, D., Cytryn, L., Stern, L., & Kline, J. (1982). The Child Assessment Schedule (CAS) diagnostic interview: A report on reliability and validity. Journal of the American Academy of Child Psychiatry, 21, 468–473.
Hodges, K., & Saunders, W. (1989). Internal consistency of a diagnostic interview for children: The Child Assessment Schedule. Journal of Abnormal Child Psychology, 17, 691–701.
Hodges, K., Saunders, W. B., Kashani, J., Hamlett, K., & Thompson, R. J. (1990). Journal of the American
Academy of Child and Adolescent Psychiatry, 29, 635–641.
Hyler, S. E., Skodol, A. E., Kellman, H. D., Oldham, J. M., & Rosnik, L. (1990). Validity of the Personality Diagnostic Questionnaire-Revised: Comparison with two structured interviews. American Journal of Psychiatry, 147, 1043–1048.
Hyler, S. E., Skodol, A. E., Oldham, J. M., Kellman, D. H., & Doldge, N. (1992). Validity of the Personality Diagnostic Questionnaire-Revised: A replication in an outpatient sample. Comprehensive Psychiatry, 33, 73–77.
Jackson, H. J., Gazis, J., Rudd, R. P., & Edwards, J. (1991). Concordance between two personality disorder instruments with psychiatric inpatients. Comprehensive Psychiatry, 32, 252–260.
Janca, A., Robins, L. N., Cottler, L. B., & Early, T. S. (1992). Clinical observation of assessment using the Composite International Diagnostic Interview (CIDI): An analysis of the CIDI field trials – Wave II at the St Louis Site. British Journal of Psychiatry, 160, 815–818.
Jensen, P., Roper, M., Fisher, P., Piacentini, J., Canino, G., Richters, J., Rubio-Stipec, M., Dulcan, M., Goodman, S., Davies, M., Rae, D., Shaffer, D., Bird, H., Lahey, B., & Schwab-Stone, M. (1995). Test–retest reliability of the Diagnostic Interview Schedule for Children (DISC 2.1). Archives of General Psychiatry, 52, 61–71.
Kaplan, L. M., & Reich, W. (1991). Manual for Diagnostic Interview for Children and Adolescents-Revised (DICA-R). St Louis, MO: Washington University.
Keller, M. B., Lavori, P. W., McDonald-Scott, P., Scheftner, W. A., Andreasen, N. C., Shapiro, R. W., & Croughan, J. (1981). Reliability of lifetime diagnoses and symptoms in patients with a current psychiatric disorder. Journal of Psychiatric Research, 16, 229–240.
Kendell, R. E., Everitt, B., Cooper, J. E., Sartorius, N., & David, M. E. (1968). Reliability of the Present State Examination. Social Psychiatry, 3, 123–129.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Loranger, A. W., Andreoli, A., Berger, P., Buchheim, P., Channabasavanna, S. M., Coid, B., Dahl, A., Diekstra, R. F. W., Ferguson, B., Jacobsberg, L. B., Janca, A., Mombour, W., Pull, C., Ono, Y., Regier, D. A., Sartorius, N., & Sumba, R. O. (1995). The International Personality Disorder Examination (IPDE) manual. New York: World Health Organization.
Loranger, A. W., Lenzenweger, M. F., Gartner, A. F., Susman, V. L., Herzig, J., Zammit, G. K., Gartner, J. D., Abrams, R. C., & Young, R. C. (1991). Trait-state artifacts and the diagnosis of personality disorders. Archives of General Psychiatry, 48, 720–728.
Loranger, A. W., Sartorius, N., Andreoli, A., Berger, P., Buchheim, P., Channabasavanna, S. M., Coid, B., Dahl, A., Diekstra, R. F. W., Ferguson, B., Jacobsberg, L. B., Mombour, W., Pull, C., Ono, Y., & Regier, D. A. (1994). The international personality disorder examination. Archives of General Psychiatry, 51, 215–224.
Luria, R. E., & Berry, R. (1979). Reliability and descriptive validity of PSE syndromes. Archives of General Psychiatry, 36, 1187–1195.
Luria, R. E., & Berry, R. (1980). Teaching the Present State Examination in American. American Journal of Psychiatry, 137, 26–31.
Luria, R. E., & McHugh, P. R. (1974). Reliability and clinical utility of the “Wing” Present State Examination. Archives of General Psychiatry, 30, 866–971.
Malow, R. M., West, J. A., Williams, J. L., & Sutker, P. B. (1989). Personality disorders classification and symptoms in cocaine and opioid addicts. Journal of Consulting and Clinical Psychology, 57, 765–767.
Mannuzza, S., Fyer, A. J., Martin, L. Y., Gallops, M. S., Endicott, J., Gorman, J., Liebowitz, M. R., & Klein, D. F. (1989). Reliability of anxiety assessment, I: Diagnostic agreement. Archives of General Psychiatry, 46, 1093–1101.
Matarazzo, J. D. (1983). The reliability of psychiatric and psychological diagnosis. Clinical Psychology Review, 3, 103–145.
McDonald-Scott, P., & Endicott, J. (1984). Informed versus blind: The reliability of cross-sectional ratings of psychopathology. Psychiatry Research, 12, 207–217.
McGuffin, P., & Farmer, A. E. (1990). Operational Criteria (OPCRIT) Checklist. Version 3.0. Cardiff, UK: University of Wales.
McGuffin, P., Katz, R., & Aldrich, J. (1986). Past and Present State Examination: The assessment of “lifetime ever” psychopathology. Psychological Medicine, 16, 461–465.
Nurnberger, J. I., Blehar, M. C., Kaufman, C. A., York-Cooler, C., Simpson, S. G., Harkavy-Friedman, J., Severe, J. B., Malaspina, D., Reich, T., & collaborators from the NIMH Genetics Initiative (1994). Diagnostic Interview for Genetic Studies: Rationale, unique features, and training. Archives of General Psychiatry, 51, 849–859.
O'Boyle, M., & Self, D. (1990). A comparison of two interviews for DSM-III-R personality disorders. Psychiatry Research, 32, 85–92.
Okasha, A., Sadek, A., Al-Haddad, M. K., & Abdel-Mawgoud, M. (1993). Diagnostic agreement in psychiatry: A comparative study between ICD-9, ICD-10, and DSM-III-R. British Journal of Psychiatry, 162, 621–626.
Overall, J., & Gorham, D. (1962). Brief Psychiatric Rating Scale. Psychological Reports, 10, 799–812.
Page, A. C. (1991). An assessment of structured diagnostic interviews for adult anxiety disorders. International Review of Psychiatry, 3, 265–278.
Pfohl, B., Black, D. W., Noyes, R., Coryell, W. H., & Barrash, J. (1990). Axis I/Axis II comorbidity findings: Implications for validity. In J. Oldham (Ed.), Axis II: New perspectives on validity (pp. 147–161). Washington, DC: American Psychiatric Association.
Pfohl, B., Blum, N., & Zimmerman, M. (1995). The Structured Interview for DSM-IV Personality Disorders (SIDP-IV). Iowa City, IA: University of Iowa College of Medicine.
Pfohl, B., Blum, N., Zimmerman, M., & Stangl, D. (1989). Structured Interview for DSM-III-R Personality Disorders (SIDP-R). Iowa City, IA: University of Iowa College of Medicine.
Pilkonis, P. A., Heape, C. L., Proietti, J. M., Clark, S. W., McDavid, J. D., & Pitts, T. E. (1995). The reliability and validity of two structured diagnostic interviews for personality disorders. Archives of General Psychiatry, 52, 1025–1033.
Puig-Antich, J., & Chambers, W. J. (1978). Schedule for Affective Disorders and Schizophrenia for School-age Children: Kiddie SADS (K-SADS). New York: Department of Child and Adolescent Psychiatry, New York State Psychiatric Institute.
Renneberg, B., Chambless, D. L., & Gracely, E. J. (1992). Prevalence of SCID-diagnosed personality disorders in agoraphobic outpatients. Journal of Anxiety Disorders, 6, 111–118.
Rice, J. P., Rochberg, N., Endicott, J., Lavori, P. W., & Miller, C. (1992). Stability of psychiatric diagnoses: An application to the affective disorders. Archives of General Psychiatry, 49, 824–830.
Roberts, N., Vargo, B., & Ferguson, H. B. (1989). Measurement of anxiety and depression in children and adolescents. Psychiatric Clinics of North America, 12, 837–860.
Robins, L. N., Helzer, J. E., Croughan, J., & Ratcliff, K. S. (1981). National Institute of Mental Health Diagnostic Interview Schedule: Its history, characteristics, and
validity. Archives of General Psychiatry, 38, 381–389.
Robins, L. N., Helzer, J. E., Ratcliff, K. S., & Seyfried, W. (1982). Validity of the diagnostic interview schedule, version II: DSM-III diagnoses. Psychological Medicine, 12, 855–870.
Robins, L. N., Wing, J., Wittchen, H.-U., Helzer, J. E., Babor, T. F., Burke, J., Farmer, A., Jablenski, A., Pickens, R., Regier, D. A., Sartorius, N., & Towle, L. H. (1988). The Composite International Diagnostic Interview: An epidemiological instrument suitable for use in conjunction with different diagnostic systems and in different cultures. Archives of General Psychiatry, 45, 1069–1077.
Rodgers, B., & Mann, S. (1986). The reliability and validity of PSE assessments by lay interviewers: A national population survey. Psychological Medicine, 16, 689–700.
Schwab-Stone, M., Fallon, T., Briggs, M., & Crowther, B. (1994). Reliability of diagnostic reporting for children aged 6–11 years: A test–retest study of the Diagnostic Interview Schedule for Children-Revised. American Journal of Psychiatry, 151, 1048–1054.
Schwab-Stone, M., Fisher, P., Piacentini, J., Shaffer, D., Davies, M., & Briggs, M. (1993). The Diagnostic Interview Schedule for Children-Revised version (DISC-R): II. Test–retest reliability. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 651–657.
Segal, D. L., Hersen, M., & Van Hasselt, V. B. (1994). Reliability of the structured clinical interview for DSM-III-R: An evaluative review. Comprehensive Psychiatry, 35, 316–327.
Shaffer, D., Schwab-Stone, M., Fisher, P., Cohen, P., Piacentini, J., Davies, M., Connors, C. K., & Regier, D. (1993). The Diagnostic Interview Schedule for Children-Revised version (DISC-R): I. Preparation, field testing, inter-rater reliability, and acceptability. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 643–650.
Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry, 44, 172–177.
Spengler, P. A., & Wittchen, H.-U. (1988). Procedural validity of standardized symptom questions for the assessment of psychotic symptoms: A comparison of the DIS with two clinical methods. Comprehensive Psychiatry, 29, 309–322.
Spitzer, R. L. (1983). Psychiatric diagnosis: Are clinicians still necessary? Comprehensive Psychiatry, 24, 399–411.
Spitzer, R. L., Cohen, J., Fleiss, J. L., & Endicott, J. (1967). Quantification of agreement in psychiatric diagnosis: A new approach. Archives of General Psychiatry, 17, 83–87.
Spitzer, R. L., Endicott, J., & Robins, E. (1978). Research Diagnostic Criteria: Rationale and reliability. Archives of General Psychiatry, 35, 773–782.
Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.
Stangl, D., Pfohl, B., Zimmerman, M., Bowers, W., & Corenthal, C. (1985). A structured interview for the DSM-III personality disorders. Archives of General Psychiatry, 42, 591–596.
Strober, M., Green, J., & Carlson, G. (1981). Reliability of psychiatric diagnosis in hospitalized adolescents: inter-rater agreement using DSM-III. Archives of General Psychiatry, 38, 141–145.
Sylvester, C., Hyde, T., & Reichler, R. (1987). The Diagnostic Interview for Children and Personality Interview for Children in studies of children at risk for anxiety disorders or depression. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 668–675.
Tress, K. H., Bellenis, C., Brownlow, J. M., Livingston, G., & Leff, J. P. (1987). The Present State Examination change rating scale. British Journal of Psychiatry, 150, 201–207.
Verhulst, F. C., Althaus, M., & Berden, G. F. M. G. (1987). The Child Assessment Schedule: Parent–child agreement and validity measures. Journal of Child Psychology and Psychiatry, 28, 455–466.
Ward, C. H., Beck, A. T., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1962). The psychiatric nomenclature: Reasons for diagnostic disagreement. Archives of General Psychiatry, 7, 198–205.
Welner, Z., Reich, W., Herjanic, B., Jung, K. G., & Amado, H. (1987). Reliability, validity, and parent–child agreement studies of the Diagnostic Interview for Children and Adolescents (DICA). Journal of the American Academy of Child and Adolescent Psychiatry, 26, 649–653.
Widiger, T. A., Mangine, S., Corbitt, E. M., Ellis, C. G., & Thomas, G. V. (1995). Personality Disorder Interview-IV: A semistructured interview for the assessment of personality disorders. Odessa, FL: Psychological Assessment Resources.
Williams, J. B. W., Gibbon, M., First, M. B., Spitzer, R. L., Davies, M., Borus, J., Howes, M. J., Kane, J., Pope, Jr., H. G., Rounsaville, B., & Wittchen, H.-U. (1992). The Structured Clinical Interview for DSM-III-R (SCID). II: Multisite test–retest reliability. Archives of General Psychiatry, 49, 630–636.
Wing, J. K. (1983). Use and misuse of the PSE. British Journal of Psychiatry, 143, 111–117.
Wing, J. K., Babor, T., Brugha, T., Burke, J., Cooper, J. E., Giel, R., Jablenski, A., Regier, D., & Sartorius, N. (1990). SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Archives of General Psychiatry, 47, 589–593.
Wing, J. K., Birley, J. L. T., Cooper, J. E., Graham, P., & Isaacs, A. (1967). Reliability of a procedure for measuring and classifying present psychiatric state. British Journal of Psychiatry, 113, 499–515.
Wing, J. K., Cooper, J. E., & Sartorius, N. (1974). The measurement and classification of psychiatric symptoms. London: Cambridge University Press.
Spitzer, R. L., Fleiss, J. L., & Endicott, J. (1978). Problems Wing, J. K., Nixon, J. M., Mann, S. A., & Leff, J. P.
of classification: Reliability and validity. In M. A. (1977). Reliability of the PSE (ninth edition) used in a
Lipton, A. DiMascio, & K. F. Killam (Eds.), Psycho- population study. Psychological Medicine, 7, 505±516.
pharmacology: A generation of progress (pp. 857±869). Wittchen, H.-U. (1994). Reliability and validity studies of
New York: Raven Press. the WHO-Composite International Diagnostic Interview
Spitzer, R. L., Williams, J. B. W., Gibbon, M., & First, M. (CIDI): A critical review. Journal of Psychiatry Research,
B. (1990). User's guide for the Structured Clinical 28, 57±84.
Interview for DSM-III-R (SCID). Washington, DC: Wittchen, H.-U., Robins, L. N., Cottler, L. B., Sartorius,
American Psychiatric Press. N., Burke, J. D., Regier, D., & participants in the
Spitzer, R. L., Williams, J. B. W., Gibbon, M., & First, M. multicentre WHO/ADAMHA field trials (1991).
B. (1992). The Structured Clinical Interview for DSM- Cross-cultural feasibility, reliability and sources of
III-R (SCID). I: History, rationale, and description. variance of the Composite International Diagnostic
Archives of General Psychiatry, 49, 624±629. Interview (CIDI). British Journal of Psychiatry, 159,
Standage, K. (1989). Structured interviews and the 645±653.
diagnosis of personality disorders. Canadian Journal of Wittchen, H. -U., Semler, G., & von Zerssen, D. (1985). A
Psychiatry, 34, 906±912. comparison of two diagnostic methods: Clinical ICD
130 Structured Diagnostic Interview Schedules

diagnoses versus DSM-III and Research Diagnostic CIDI-computer programs. Geneva: Author.
Criteria using the Diagnostic Interview Schedule (Ver- Zimmerman, M. (1994). Diagnosing personality disorders:
sion 2). Archives of General Psychiatry, 42, 677±684. A review of issues and research methods. Archives of
World Health Organization (1973). The international pilot General Psychiatry, 51, 225±245.
study of schizophrenia, Vol. 1.: Geneva: Author. Zimmerman, M., Pfohl, B., Stangl, D., & Corenthal, C.
World Health Organization (1990). Composite International (1986). Assessment of DSM-III personality disorders:
Diagnostic Interview (CIDI): a) CIDI-interview (version The importance of interviewing an informant. Journal of
1.0), b) CIDI-user manual, c) CIDI-training manual, d) Clinical Psychiatry, 47, 261±263.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.06
Principles and Practices of
Behavioral Assessment with
Children
THOMAS H. OLLENDICK
Virginia Tech, Blacksburg, VA, USA
and
ROSS W. GREENE
Harvard Medical School, Boston, MA, USA

4.06.1 INTRODUCTION
4.06.2 HISTORY AND DEVELOPMENT
4.06.3 THEORETICAL UNDERPINNINGS
4.06.4 DESCRIPTION OF ASSESSMENT PROCEDURES
4.06.4.1 Behavioral Interviews
4.06.4.2 Ratings and Checklists
4.06.4.3 Self-report Instruments
4.06.4.4 Self-monitoring
4.06.4.5 Behavioral Observation
4.06.5 RESEARCH FINDINGS
4.06.5.1 Behavioral Interviews
4.06.5.2 Ratings and Checklists
4.06.5.3 Self-report Instruments
4.06.5.4 Self-monitoring
4.06.5.5 Behavioral Observation
4.06.6 FUTURE DIRECTIONS
4.06.6.1 Developmental Factors
4.06.6.2 The Utility of the Multimethod Approach at Different Age Levels
4.06.6.3 Cultural Sensitivity
4.06.6.4 Measures of Cognitive and Affective Processes
4.06.6.5 The Role of the Child
4.06.6.6 Ethical Guidelines
4.06.7 SUMMARY
4.06.8 REFERENCES


4.06.1 INTRODUCTION

While treatment strategies derived from behavioral principles have a long and rich tradition in clinical child psychology (e.g., Holmes, 1936; Jones, 1924; Watson & Rayner, 1920), assessment practices based on these same principles have lagged, especially in the area of child behavioral assessment. In fact, many child behavioral assessment practices have been adopted, sometimes indiscriminately, from those used with adults. This practice is of dubious merit and, as we have argued elsewhere (Ollendick & Greene, 1990), it has frequently led to imprecise findings and questionable conclusions. As a result, greater attention has been focused on the development of behavioral assessment practices for children in recent years (e.g., Mash & Terdal, 1981, 1989; Ollendick & Hersen, 1984, 1993; Prinz, 1986).

As first suggested by Mash and Terdal (1981) and elaborated by Ollendick and Hersen (1984, 1993), child behavioral assessment can be defined as an ongoing, exploratory, hypothesis-testing process in which a range of specific procedures is used in order to understand a given child, group, or social ecology, and to formulate and evaluate specific intervention techniques. As such, child behavioral assessment is a dynamic, self-correcting process. It seeks to obtain information from a variety of sources in order that we might understand diverse child behavioral problems in their rich and varied contexts, and plan and evaluate behavioral interventions based upon the information obtained. Thus, assessment from this perspective is fluid (i.e., responsive to feedback and open to change(s) based on information obtained throughout the process), and it is linked intimately with treatment (i.e., assessment serves treatment). Moreover, child behavioral assessment entails more than the identification of discrete target behaviors and their controlling variables. While the importance of direct observation of target behaviors in simulated and natural settings should not be underestimated, recent advances in child behavioral assessment have incorporated a range of assessment procedures, including behavioral interviews, self-reports, ratings by significant others, and self-monitoring in addition to behavioral observations. An approach combining these procedures can best be described as a multimethod one in which an attempt is made to obtain a complete picture of the child and his or her presenting problems. Such a picture is intended to be useful in the understanding and modification of specific child behavior problems (Ollendick & Cerny, 1981; Ollendick & Hersen, 1984, 1993).

Two other primary features characterize child behavioral assessment procedures (Ollendick & Hersen, 1984, 1993). First, they must be sensitive to development, and second, they must be validated empirically. As noted by Lerner (1986, p. 41), the concept of development implies "systematic and successive changes over time in an organism." Descriptors such as "systematic" and "successive" suggest that these changes are, for the most part, orderly and that changes seen at one point in time will be influenced, at least in part, by changes that occurred at an earlier point in time. Thus development is not random nor, for that matter, discontinuous. Changes that occur at an early age (whether due to learning, an unfolding of basically predetermined structures, or some complex, interactive process) have a direct impact on subsequent development. Changes associated with development, however, create problems in selecting appropriate methods of assessment, as well as in identifying specific target behaviors for change (Ollendick & King, 1991). Behavioral interviews, self-reports, other-reports, self-monitoring, and behavioral observation may all be affected by these rapidly changing developmental processes. Further, due to "systematic and successive" change, some of these procedures may be more useful at one age than another. For example, interviews may be more difficult to conduct and self-reports less reliable with younger children, whereas self-monitoring and behavioral observations may be more reactive at older ages (Ollendick & Hersen, 1984). Age-related constraints are numerous and must be taken into consideration when selecting specific methods of assessment.

Just as child behavioral assessment procedures must be developmentally sensitive, they must also be validated empirically. All too frequently, professionals working with children have used assessment methods of convenience without sufficient regard for their psychometric properties, including their reliability, validity, and clinical utility (i.e., the degree to which assessment strategies contribute to beneficial treatment outcomes; see Hayes, Nelson, & Jarrett, 1987, for a discussion of treatment utility). Although child behavior assessors have fared somewhat better in this regard, they too have tended to design and use idiosyncratic, "convenient" tools for assessment. As we have suggested elsewhere (Ollendick & Hersen, 1984), comparison across studies is made difficult, if not impossible, and the advancement of an assessment science and technology, let alone an understanding of child behavior disorders and their effective treatments, is compromised with such an idiosyncratic approach.

While a multimethod approach that is based on developmentally sensitive and empirically validated procedures is recommended, it should be clear that a "test battery" approach is not being espoused. The specific devices to be selected depend on a host of factors, including the child's age, the nature of the referral question, the contexts in which the problematic behavior occurs, and the personnel, time, and resources available (Ollendick & Cerny, 1981). Nonetheless, given inherent limitations in each of the various procedures, as well as the desirability of obtaining as complete a picture of the child as possible, we recommend multimethod assessment whenever possible. Any one procedure, including direct behavioral observation, is not sufficient to provide a composite view of the child. The multimethod approach, if implemented, is not only helpful in assessing specific target behaviors and in determining response to behavior change, but also in understanding child behavior disorders and advancing assessment as a scientific endeavor.

Based on these considerations, we offer the following tentative conclusions regarding child behavioral assessment:

(i) Children are a special and unique population. The automatic extension of adult behavioral assessment methods to children is not warranted and is often inappropriate. Further, not all "children" are alike. Clearly, a 16-year-old adolescent differs from a 12-year-old preadolescent who in turn differs from an 8-year-old middle-age child and a young 4-year-old child. Age-related variables affect the choice of methods as well as the procedures employed.

(ii) Given rapid developmental change observed in children as they grow, normative comparisons are required to ensure that appropriate target behaviors are selected and that change in behavior is related to treatment effects, and not normal developmental processes. Such comparisons require identification of suitable reference groups and information about the "natural course" of diverse child behavior problems (Ollendick & King, 1994).

(iii) Thorough child behavioral assessment involves multiple targets of change, including overt behavior, affective states, and cognitive processes. Further, such assessment entails determining the context (e.g., familial, social, cultural) in which the child's behavior occurs and the function(s) the target behaviors serve.

(iv) Given the wide range of targets for change and the imprecision of extant measures, multimethod assessment is desirable. Multimethod assessment should not be viewed simply as a test battery approach; rather, methods should be selected on the basis of their appropriateness to the referral question. Regardless of the measures used, they should be developmentally sensitive and empirically validated.

4.06.2 HISTORY AND DEVELOPMENT

As indicated above, assessment of children's behavior problems requires a multimethod approach in which data are gathered from clinical interviews and self- and other-report sources as well as from direct behavioral observations. In this manner, important information from the cognitive and affective modalities can be obtained and integrated with behavioral data to provide a more complete picture of the child. In addition, the multimethod approach provides the clinician with necessary detail regarding perceptions and reactions of significant others in the child's life (e.g., parents, teachers, peers). It should be noted, however, that this comprehensive and inclusive assessment approach is of relatively recent origin.

In its earliest stages, behavioral assessment of children relied almost exclusively on identification and specification of discrete and highly observable target behaviors (cf. Ullmann & Krasner, 1965). As such, assessment was limited to gathering information solely from the motoric (i.e., behavioral) response modality. This early assessment approach followed logically from theoretical assumptions of the operant school of thought which was in vogue at the time. Early on, behaviorally oriented clinicians posited that the only appropriate behavioral domain for empirical study was that which was directly observable (Skinner, 1953). Contending that objective demonstration of behavior change following intervention was of utmost importance, behaviorists relied upon data that could be measured objectively. Subjectivity, and the inferential process associated with it, were eschewed. Hence the frequency, intensity, and duration of problematic behaviors (i.e., "hard core" measures) were pursued. Although existence of cognitions and affective states was not denied, they were not deemed appropriate subject matter for experimental analysis.

As behavioral treatment approaches with children were broadened to include cognitive and self-control techniques in the 1970s (e.g., Bandura, 1977; Kanfer & Phillips, 1970; Kendall & Hollon, 1980; Meichenbaum, 1977), it became apparent that assessment strategies would have to expand into the cognitive and affective domains as well. Furthermore, even though operant techniques were shown to be efficacious in producing behavior change under controlled conditions, the clinical significance
of these changes was less evident. The issue of clinical significance of behavior change is especially important in child behavioral assessment because children are invariably referred for treatment by others (e.g., parents, teachers). Once treatment goals have been identified, the ultimate index of treatment efficacy lies in the referral source's perceptions of change. Hence, other-report measures become as important as direct observational ones.

Furthermore, the scope of behavioral assessment has been expanded to include the impact of large-scale social systems (e.g., schools, neighborhoods) on the child's behavior (e.g., Patterson, 1976; Wahler, 1976). Although inclusion of these additional factors serves to complicate the assessment process, they are an indispensable part of modern-day child behavioral assessment. The ideologies and expectations of seemingly distal social systems often have immediate and profound effects on individual behavior (see Winett, Riley, King, & Altman, 1989, for discussion of these issues).

In sum, child behavioral assessment has progressed from sole reliance on measurement of target behaviors to a broader approach that takes into account cognitive and affective processes of the child that serve to mediate behavior change. Further, the social contexts (i.e., families, schools, communities) in which the problematic behaviors occur have been targeted for change. The assessment techniques that accompany this approach include behavioral interviews and self- and other-report instruments. These measures are utilized in addition to direct behavioral observation which remains the cornerstone of behavioral assessment (Mash & Terdal, 1981, 1989; Ollendick & Hersen, 1984, 1993).

4.06.3 THEORETICAL UNDERPINNINGS

Although behaviorism has had an historical development of its own, it is safe to state that the increased popularity of the behavioral approach has flourished, at least in part, due to dissatisfaction with the psychodynamic approach. A reflection of this dissatisfaction is that virtually all discussions of behavioral assessment are carried out through comparison and contrast with traditional assessment approaches (e.g., Bornstein, Bornstein, & Dawson, 1984; Cone & Hawkins, 1977; Goldfried & Kent, 1972; Hayes, Nelson, & Jarrett, 1986; Mash & Terdal, 1981, 1989; Mischel, 1968; Ollendick & Hersen, 1984, 1993). Though such comparisons often result in oversimplification of both approaches, they serve to elucidate theoretical underpinnings of the behavioral approach and its unique contributions. In this section, we will contrast the theoretical assumptions that guide behavioral and traditional assessment and discuss the practical implications of these assumptions for child behavioral assessment.

The most fundamental difference between traditional and behavioral assessment lies in the conception of "personality" and behavior (we place the construct "personality" in quotations because early behaviorists would have objected to use of this term, given its subjectivity and imprecise meaning). In the traditional assessment approach, personality is viewed as a reflection of underlying and enduring traits, and behavior is assumed to be caused by these internal personality characteristics ("personalism"). Aggressive behavior, for example, is assumed to reside "in" the child and to be caused by an underlying dynamic process attributed, perhaps, to hostility or anger and resulting from deep-seated intrapsychic conflict. "Aggression," it is said, is caused by the underlying hostility/anger. In contrast, behavioral approaches have avoided references to underlying personality constructs, focusing instead on what the child does under specific conditions. From the behavioral perspective, "personality" refers to patterns rather than causes of behavior (Staats, 1975, 1986). Furthermore, behavior is viewed as a result of current environmental factors ("situationalism") and of current environmental factors interacting with organismic variables ("interactionism"). Thus the role of the current environment is stressed more in behavioral assessment than in traditional assessment. The focus of assessment is on what the child does in that situation rather than on what the child has or "is" (Mischel, 1968). As a result, a lower level of inference is required in behavioral assessment than in traditional assessment.

It is important not to oversimplify the behavioral view of the causes of behavior, however. It has often been erroneously asserted that the behavioral approach focuses on external determinants of behavior to the exclusion of organismic states or internal cognitions and affects. To be sure, behavioral views of childhood disorders have emphasized the significant role of environmental factors in the manifestation and maintenance of behavior. However, organismic variables that influence behavior are not ignored or discounted. Among the organismic variables, dubbed cognitive social learning person variables (CSLPVs) by Mischel (1973), that have been found to be important are competencies (skills which children possess such as social skills, problem-solving skills), encoding strategies (the manner
in which children perceive or encode information about their environment), expectancies (expectancies about performance, including self-efficacy and outcome expectancies), subjective values (children's likes or dislikes, preferences or aversions), and self-regulatory systems and plans (children's capacity for and manner of self-imposing goals and standards and self-administering consequences for their behavior). A wide array of self-report instruments tapping CSLPVs and related cognitive and affective modalities for use in child behavioral assessment have been reviewed recently by us (Greene & Ollendick, in press).

A thorough behavioral assessment should attempt to identify controlling variables, whether environmental or organismic in nature. As Mash and Terdal (1981) point out, "the relative importance of organismic and environmental variables and their interaction . . . should follow from a careful analysis of the problem" (p. 23).

The traditional conception of personality as made up of stable and enduring traits implies that behavior will be relatively persistent over time and consistent across situations. The behavioral view, in contrast, has been one of situational specificity; that is, because behavior is in large part a function of situational determinants and CSLPVs that are enacted only under specified conditions, a child's behavior will change as these situational factors are altered or the person variables are engaged. Similarly, consistency of behavior across the temporal dimension is not necessarily expected. Hence, as noted above, an aggressive act such as a child hitting another child would be seen from the traditional viewpoint as a reflection of underlying hostility which, in turn, would be hypothesized to be related to early life experiences or intrapsychic conflict. Little or no attention would be given to specific situational factors or the environmental context in which the aggressive act occurred. From the behavioral perspective, an attempt is made to identify those variables that elicit and maintain the aggressive act in that particular situation. That the child may aggress in a variety of situations is explained in terms of his or her learning history in which reinforcing consequences have been obtained for past aggressive acts (which help shape CSLPVs), and not in terms of an underlying personality trait of hostility. From this analysis, it is clear that actual behavior is of utmost importance to behaviorists, because it represents a sample of the child's behavioral repertoire in a specific situation. From the traditional viewpoint, the behavior assumes importance only insofar as it is a sign of some underlying trait.

These differing assumptions have implications for the assessment process. In behavioral assessment, the emphasis on situational specificity necessitates an assessment approach that samples behavior across a number of settings. Hence assessment of the child's behavior at home, in school, and on the playground is important in addition to information obtained in the clinic setting. Furthermore, it is not assumed that information obtained from these various settings will be consistent. The child may behave aggressively in school and on the playground with peers but not at home with siblings or parents. Or conversely, the child might behave aggressively at home but not at school or when with his or her peers. This lack of consistency in behavior would be problematic for the traditional approach, but not for the behavioral approach. Similarly, the notion of temporal instability requires the child's behavior be assessed at several points in time from a behavioral perspective, whereas such measurements across time would be less critical for the traditional approach.

At one point in time, it was relatively easy to differentiate behavioral from traditional assessment on the basis of the methods employed. Direct behavioral observation was the defining characteristic and often the sole assessment technique of the behavioral approach, whereas clinical interviews, self-report measures, and projective techniques characterized traditional assessment. However, as behavioral assessment was expanded to include a wider repertoire of assessment methods, differentiating behavioral and traditional assessments simply on the basis of assessment methods used has become more difficult. It is not uncommon for behaviorists to utilize information from clinical interviews and self-report instruments, and to pursue perceptions and expectancies of significant others in the child's environment. Thus there is considerable overlap in actual assessment practices, with one notable exception. Rarely, if ever, would projective techniques be utilized by the child behavioral assessor. The primary difference between traditional and behavioral assessment lies then not in the methods employed, but rather in the manner in which data from assessment sources are utilized. Traditional approaches interpret assessment data as signs of underlying personality functioning. These data are used to diagnose and classify the child and to make prognostic statements. From the behavioral perspective, assessment data are used to identify target behaviors and their controlling conditions (again, be they overt or covert). Information obtained from assessment serves as a sample of the child's behavior under specific circumstances. This information guides the
selection of appropriate treatment procedures. Because behavioral assessment is ongoing, such information serves as an index by which to evaluate critically the effects of treatment and to make appropriate revisions in treatment. Further, because assessment data are viewed as samples of behavior, the level of inference is low, whereas a high level of inference is required when one attempts to make statements about personality functioning from responses to interview questions or test items.

In addition to these differences, Cone (1986) has highlighted the nomothetic and idiographic distinction between traditional and behavioral assessment. Stated briefly, the nomothetic approach is concerned with the discovery of general laws as they are applied to large numbers of children. Usually, these laws provide heuristic guidelines as to how certain variables are related to one another. Such an approach can be said to be variable-centered because it deals with particular characteristics (traits) such as intelligence, achievement, assertion, aggression, and so on. In contrast, the idiographic approach is concerned more with the uniqueness of a given child and is said to be child-centered rather than variable-centered. Unlike the nomothetic approach, the idiographic perspective emphasizes discovery of relationships among variables uniquely patterned in each child. The idiographic approach is most akin to the behavioral perspective, whereas the nomothetic approach is closely related to the traditional approach. As Mischel (1968) observed, "Behavioral assessment involves an exploration of the unique or idiosyncratic aspects of the single case, perhaps to a greater extent than any other approach" (p. 190). Cone (1986) illustrates how the idiographic/nomothetic distinction relates to the general activities of behavioral assessors by exploring five basic questions: What is the purpose of assessment? What is its specific subject matter? What general scientific approach guides this effort? How are differences accounted for? And, to what extent are currently operative environmental variables considered? Although further discussion of these important issues is beyond the scope of the present chapter, Cone's schema helps us recognize the pluralistic nature of behavioral assessment and calls our attention to meaningful differences in the practices contained therein. As Cone (1986) concludes, "There is not one behavioral assessment, there are many" (p. 126). We agree.

In sum, traditional and behavioral assessment approaches operate under different assumptions regarding the child's behavior. These assumptions, in turn, have implications for the assessment process. Of paramount importance for child behavior assessors is the necessity of tailoring the assessment approach to the specific difficulties of the child in order to identify the problem accurately, specify treatment, and evaluate treatment outcome. Such tailoring requires ongoing assessment from a number of sources under appropriately diverse stimulus conditions.

4.06.4 DESCRIPTION OF ASSESSMENT PROCEDURES

Multimethod behavioral assessment of children entails use of a wide variety of specific procedures. As behavioral approaches with children evolved from sole reliance on operant procedures to those involving cognitive and self-control procedures, the methods of assessment have changed accordingly. Identification of discrete target behaviors has been expanded to include assessment of cognitions and affects, as well as large-scale social systems that affect the child (e.g., families, schools, communities). Information regarding these additional areas can be obtained most efficiently through behavioral interviews, self-reports, and other-reports. Cone (1978) has described these assessment methods as indirect ones; that is, while they may be used to measure behaviors of clinical relevance, they are obtained at a time and place different from when the behaviors actually occurred. In both behavioral interviews and self-report questionnaires, a verbal representation of the behaviors of interest is obtained. Other-reports, or ratings by others such as parents or teachers, are also included in the indirect category because they involve retrospective descriptions of behavior. Generally, a significant person in the child's environment (e.g., at home or school) is asked to rate the child based on previous observations in that setting (recollections).

As noted by Cone (1978), ratings such as these should not be confused with direct observation methods, which assess behaviors of interest at the time and place of their occurrence. Of course, information regarding cognition and affects, as well as the situations or settings in which they occur, can also be obtained through direct behavioral observations, either by self-monitoring or through trained observers. In the sections that follow, both indirect and direct methods are reviewed.

4.06.4.1 Behavioral Interviews

The first method of indirect assessment to be considered is the behavioral interview. Of the
many procedures employed by behavioral clinicians, the interview is the most widely used (Swann & MacDonald, 1978) and is generally considered an indispensable part of assessment (Gross, 1984; Linehan, 1977). Behavioral interviews are frequently structured to obtain information about the target behaviors and their controlling variables and to begin the formulation of specific treatment plans. While the primary purpose of the behavioral interview is to obtain information, we have found that traditional "helping" skills including reflections, clarifications, and summary statements help put children and their families at ease and greatly facilitate collection of this information (Ollendick & Cerny, 1981). As with traditional therapies, it is important to establish rapport with the child and family and to develop a therapeutic alliance (i.e., agreement on the goals and procedures of therapy) in the assessment phase of treatment (Ollendick & Ollendick, 1997).

Undoubtedly, the popularity of the behavioral interview is derived in part from practical considerations associated with its use. While direct observation of target behaviors remains the hallmark of behavioral assessment, such observations are not always practical or feasible. At times, especially in outpatient therapy in clinical settings, the clinician might have to rely on children's self-report as well as that of their parents to obtain critical detail about problem behaviors and their controlling variables. Further, the interview affords the clinician the opportunity to obtain information regarding overall functioning in a number of global areas (e.g., home, school, neighborhood), in addition to specific information about particular problem areas. The flexibility inherent in the interview also allows the clinician to build a relationship with the child and the family and to obtain information that might otherwise not be revealed. As noted early on by Linehan (1977), some family members may be more likely to divulge information verbally in the context of a professional relationship than to write it down on a form to be entered into a permanent file. In our experience, this is not an uncommon occurrence. That is, certain family members report little or no difficulties on intake reports or on self-report measures, yet they divulge a number of problem areas during the structured behavioral interview.

In addition, the interview allows the clinician the opportunity to observe the family as a whole and to obtain information about the familial context in which the problem behaviors occur. Several interrelated issues may arise when child behavioral assessment is expanded to include the family unit (Evans & Nelson, 1977; Ollendick & Cerny, 1981). First, children rarely refer themselves for treatment; invariably, they are referred by adults whose perceptions of problems may not coincide with the child's view. This is especially true when problems are centered around externalizing behaviors such as oppositional or disruptive behaviors, less so with internalizing behaviors (e.g., anxiety or depression). Moreover, it is not uncommon for the perception of one adult to differ from that of another (i.e., the mother and father disagree, or the teacher and the parents disagree; cf. Achenbach, McConaughy, & Howell, 1987).

A second issue, related to the first, is the determination of when child behaviors are problematic and when they are not. Normative developmental comparisons are useful in this regard (Lease & Ollendick, 1993; Ollendick & King, 1991). It is not uncommon for parents to refer 3-year-olds who wet the bed, 5-year-olds who reverse letters, 10-year-olds who express interest in sex, and 13-year-olds who are concerned about their physical appearance. Frequently, these referrals are based on parental uneasiness or unrealistic expectations rather than genuine problems (see Campbell, 1989, for further discussion of these issues). Finally, problematic family interactions (especially parent–child interactions) are frequently observed in families in which a particular child has been identified and referred for treatment (cf. Dadds, Rapee, & Barrett, 1994; Patterson, 1976, 1982). These interactions may not be a part of the parents' original perception of the "problem." Furthermore, assessment of such interactions allows the clinician an opportunity to observe the verbal and nonverbal behaviors of the family unit in response to a variety of topics, and of family members in response to each other. Structured interviews assessing parent–child interactions have been developed for a number of behavior problems (e.g., Barkley, 1987; Dadds et al., 1994).

Ideally, evaluation of parental perceptions and parent–child interactions will enable the clinician to conceptualize the problematic behaviors and formulate treatment plans from a more comprehensive, integrated perspective. However, the above discussion is not meant to imply that the behavioral interview should be limited to the family; in many instances, the practices described above should be extended to adults outside the family unit, such as teachers, principals, and physicians, and to environments beyond the home, including schools and day-care centers. For example, if a problem behavior is reported to occur primarily at school, assessing the perceptions and behavioral goals of a teacher and/or principal will be necessary (Greene, 1995, 1996), and evaluating teacher–
student interactions may prove more productive than observing parent–child interactions during the clinical interview. Finally, the clinician should approach the behavioral interview with caution and avoid blind acceptance of the premise that a "problem" exists "in" the child. Information obtained in a comprehensive assessment may reveal that the behavior of the identified client is only a component of a more complex clinical picture involving parents, siblings, other adults, and/or social systems.

In sum, an attempt is made during the behavioral interview to obtain as much information as possible about the child, his or her family, and other important individuals and environments. While the interview is focused around specific target behaviors, adult–child interactions and adult perceptions of the problem may also be assessed. These perceptions should be considered tentative, however, and used primarily to formulate hypotheses about target behaviors and their controlling variables and to select additional assessment methods to explore target behaviors in more depth (e.g., rating scales, self-reports, self-monitoring, and behavioral observations). The behavioral interview is only the first step in the assessment process.

Brief mention should also be made here of structured diagnostic interviews and their role in child behavioral assessment. In some instances, most notably when a diagnosis is required, it may be desirable for the clinician to conduct a structured diagnostic interview. In general, diagnostic interviews are oriented toward obtaining specific information to determine if a child "meets" diagnostic criteria for one or more specific diagnoses included in the Diagnostic and statistical manual of mental disorders (4th ed., DSM-IV) (American Psychiatric Association, 1994) or the International classification of diseases (10th ed., ICD-10; World Health Organization, 1991). Such interviews facilitate collection of data relative to a broad range of "symptoms" (i.e., behaviors) and psychiatric diagnoses. Several "omnibus" diagnostic interviews are available, including the Diagnostic Interview Schedule for Children-Version 2.3 (Shaffer, 1992), which was recently revised to reflect DSM-IV criteria. Other diagnostic interviews are oriented toward a specific domain such as anxiety (e.g., the Anxiety Disorders Interview Schedule for Children; Silverman & Nelles, 1988). It, too, has recently been revised to incorporate DSM-IV criteria (Silverman & Albano, 1996). Both child and parent forms of these interviews are available. Although these structured diagnostic interviews provide a wealth of information, they are limited by an overemphasis on diagnostic categories (to the exclusion of important details regarding specific target behaviors and their controlling variables), weak or untested reliability for children under age 11, low correspondence between responses of children and their parents, and categorical vs. dimensional scoring criteria (McConaughy, 1996). Further, structured diagnostic interviews often do not yield specific information about contextual factors associated with the child's problematic behavior; thus, when a diagnostic interview is used, it needs to be supplemented with a problem-focused interview. In our opinion, diagnostic interviews should not be considered as replacements for problem-focused interviews; rather they should be viewed as complementary.

4.06.4.2 Ratings and Checklists

Following the initial behavioral interview and, if necessary, the diagnostic interview, significant others in the child's environment may be requested to complete rating forms or checklists. In general, these forms are useful in providing an overall description of the child's behavior, in specifying dimensions or response clusters that characterize the child's behavior, and in serving as outcome measures for the evaluation of treatment efficacy. Many of these forms contain items related to broad areas of functioning such as school achievement, peer relationships, activity level, and self-control. As such, they provide a cost-effective picture of children and their overall level of functioning. Further, the forms are useful in eliciting information that may have been missed in the behavioral interview (Novick, Rosenfeld, Bloch, & Dawson, 1966). Finally, the forms might prove useful in the search for the best match between various treatments (e.g., systematic desensitization, cognitive restructuring, and self-control) and subtypes of children as revealed on these forms (Ciminero & Drabman, 1977).

The popularity of omnibus rating forms and checklists is evident in the number of forms currently available (McMahon, 1984). Three of the more frequently used forms are described here: the Behavior Problem Checklist (Quay & Peterson, 1967, 1975) and its revision (Quay & Peterson, 1983); the Child Behavior Checklist (Achenbach, 1991a, 1991b); and the recently developed Behavior Assessment System for Children (Reynolds & Kamphaus, 1992).

Based on Peterson's (1961) early efforts to sample diverse child behavior problems, the Revised Behavior Problem Checklist consists of 89 items, each rated on a three-point severity scale. While some of the items are general and require considerable inference (e.g., lacks self-
confidence, jealous), others are more specific (e.g., cries, sucks thumb). Six primary dimensions or response clusters of child behavior have been identified on this scale: conduct problems, socialized aggression, attention problems, anxiety-withdrawal, psychotic behavior, and motor excess. Interestingly, the two primary problem clusters found on this checklist are similar to those found in numerous factor-analytic studies of other rating forms and checklists. These two factors or response clusters represent consistent dimensions of child behavior problems, reflecting externalizing (e.g., acting out) and internalizing (e.g., anxiety, withdrawal) dimensions of behavior (Achenbach, 1966).

While the Behavior Problem Checklist has a rather lengthy history and is one of the most researched scales, it does not include the rating of positive behaviors and, hence, does not provide a basis on which to evaluate more appropriate, adaptive behaviors. A scale that does assess appropriate behaviors, as well as inappropriate ones, is the Child Behavior Checklist (CBCL; Achenbach, 1991a, 1991b; Achenbach & Edelbrock, 1989). The scale, designed for both parents and teachers, contains both social competency and behavior problem items. The parent-completed CBCL is available in two formats depending on the age of the child being evaluated (i.e., 2–3 years and 4–18 years). The CBCL 4–18, for example, consists of 112 items rated on a three-point scale. Scored items can be clustered into three factor-analyzed profiles: social competence, adaptive functioning, and syndrome scales. The latter includes eight syndrome scales: withdrawn, somatic complaints, anxious/depressed, social problems, thought problems, attention problems, aggressive behavior, and delinquent behavior. Social competency items examine the child's participation in various activities (e.g., sports, chores, hobbies) and social organizations (e.g., clubs, groups), as well as performance in the school setting (e.g., grades, placements, promotions). The teacher-completed CBCL (TRF; Teacher Report Form) also consists of 112 items which are fairly similar, but not completely identical, to those found on the CBCL completed by parents. The scored items from the teacher form cluster into the same three factor-analyzed profiles; further, the eight syndrome scales are the same for the two measures, allowing for cross-informant comparisons. As with Quay and Peterson's Behavior Problem Checklist, some of the items are general and require considerable inference (e.g., feels worthless, acts too young, fears own impulses), while others are more specific and easily scored (e.g., wets bed, sets fires, destroys own things). Broad-band grouping of the factors reflects the aforementioned internalizing and externalizing behavioral dimensions.

Although the Behavior Problem Checklist and Child Behavior Checklist have enjoyed considerable success, the recently developed Behavior Assessment System for Children (BASC; Reynolds & Kamphaus, 1992) represents a challenge to both of these well-established rating scales. Like these other instruments, the BASC is an omnibus checklist composed of parent, teacher, and child versions. It also contains a developmental history form and a classroom observation form. Most similar to Achenbach's Child Behavior Checklist (Achenbach, 1991a), Teacher's Report Form (Achenbach, 1991b), and Youth Self-Report (Achenbach, 1991c), the parent, teacher, and self-report forms of the BASC contain items that tap multiple emotional and behavioral domains and produce scale scores that represent pathological and adaptive characteristics of the child. Unlike the empirically derived scales of Achenbach's checklists, however, the scales of the BASC were created conceptually to represent content areas relevant to assessment and classification in clinical and educational settings.

For example, the BASC Parent Rating Scale (BASC-PRS) yields T scores in broad externalizing and internalizing domains as well as in specific content areas, including aggression, hyperactivity, conduct problems, attention problems, depression, anxiety, withdrawal, somatization, and social skills. In addition, it provides T scores in areas of social competency such as leadership and adaptability. Preschool (ages 4–5), child (ages 6–11), and adolescent (ages 12–18) forms are available. Recent findings suggest the utility of this instrument with both clinical and educational populations and in identifying youth at risk for maladaptive outcomes (cf. Doyle, Ostrander, Skare, Crosby, & August, 1997). Although initial findings associated with its use appear promising, much more research is needed before its routine acceptance can be endorsed.

In addition to these more general rating forms, rating forms specific to select problem areas are also available for use in child behavioral assessment. Three such forms have been chosen for purposes of illustration: one used in the assessment of an internalizing dimension (fears/anxiety), another in the assessment of an externalizing dimension (defiance/noncompliance), and the final one in measuring a specific area of social competency.

The Louisville Fear Survey Schedule for Children (Miller, Barrett, Hampe, & Noble,
1972) contains 81 items that address an extensive array of fears and anxieties found in children and adolescents. Each item is rated on a three-point scale by the child's parents. Responses to specific fear items can be used to subtype fearful children. For example, Miller et al. (1972) were able to differentiate among various subtypes of school-phobic children on the basis of this instrument.

The Home Situations Questionnaire (HSQ; Barkley, 1981; Barkley & Edelbrock, 1987) contains 16 items representing home situations in which noncompliant behavior may occur (e.g., while playing with other children, when asked to do chores, and when asked to do homework). For each situation, parents indicate whether noncompliant behavior is a problem and then rate each of the 16 problematic situations on a nine-point scale (mild to severe); thus the scale assesses both the number of problem situations and the severity of noncompliant behavior. The HSQ has been shown to be sensitive to stimulant-drug effects (Barkley, Karlsson, Strzelecki, & Murphy, 1984), to discriminate children with behavior problems from normal children (Barkley, 1981), and to be sensitive to the effects of parent training programs (Pollard, Ward, & Barkley, 1983). The HSQ was selected for inclusion in this chapter because it may also be used in conjunction with a companion scale, the School Situations Questionnaire (SSQ; Barkley, 1981; Barkley & Edelbrock, 1987), which is completed by teachers. This scale includes 12 school situations most likely to be problematic for clinic-referred, noncompliant children, including "during lectures to the class," "at lunch," and "on the bus." Teachers rate the occurrence and severity of noncompliant behavior on a scale identical to that of the HSQ. In earlier sections, we emphasized the importance of assessing child behavior in multiple environments; the HSQ and SSQ are representative of recent efforts to develop measures for this purpose, thus providing us with important contextual information about specific problem behaviors.

In some instances, it may be useful to obtain more information about a positive, adaptive domain of behavior, such as social skills or self-regulation, than that provided by omnibus rating scales such as the Revised Behavior Problem Checklist (Quay & Peterson, 1983) or the Child Behavior Checklist (Achenbach, 1991a, 1991b). For example, the Social Skills Rating System (Gresham & Elliott, 1990), a 55-item questionnaire, provides specific information about a child's behavior in three domains (social skills, problem behaviors, and academic competence). Parent, teacher, and self-rating forms are available. In general, this instrument provides important and detailed information about academic and social competence that can be used to supplement information obtained from the more generic rating scales.

In sum, a variety of other-report instruments are available. As noted earlier, these forms must be considered indirect methods of assessment because they rely on retrospective descriptions of the child's behavior. For all of these scales, an informant is asked to rate the child based on past observations of that child's behavior. Global scales such as the Revised Behavior Problem Checklist, Child Behavior Checklist, and Behavior Assessment System for Children comprehensively sample the range of potential behavior problems, while more specific scales such as the Louisville Fear Survey Schedule for Children, the Home Situations Questionnaire, and the Social Skills Rating System provide detailed information about particular maladaptive or adaptive behaviors. Both types of other-report instruments provide useful, albeit different, information in the formulation and evaluation of treatment programs.

4.06.4.3 Self-report Instruments

Concurrent with the collection of other-reports regarding the child's behavior from significant others, self-reports of attitudes, feelings, and behaviors may be obtained directly from the child. As noted earlier, behaviorists initially eschewed such data, maintaining that the only acceptable datum was observable behavior. To a large extent, this negative bias against self-report was an outgrowth of early findings indicating that reports of subjective states did not always coincide with observable behaviors (Finch & Rogers, 1984). While congruence in responding is, in fact, not always observed, contemporary researchers have cogently argued that children's perceptions of their own behavior and its consequences may be as important for behavior change as the behavior itself (Finch, Nelson, & Moss, 1993; Ollendick & Hersen, 1984, 1993). Furthermore, as noted earlier, although different assessment procedures may yield slightly different information, data from these sources should be compared and contrasted in order to produce the best picture of the child and to derive treatment goals and procedures. Although self-report instruments have specific limitations, they can provide valuable information about children and their presenting problem; furthermore, they can be used as an index of change following treatment.
A wide array of self-report instruments have been developed for children. Some self-report instruments focus on a broad range of behavioral, cognitive, and affective functioning, as in the case of the Youth Self-Report (Achenbach, 1991c). Other self-report instruments tap more specific areas of interest, such as anger (Nelson & Finch, 1978), anxiety (Reynolds & Richmond, 1985; Spielberger, 1973), assertion (Deluty, 1979; Ollendick, 1983a), depression (Kovacs, 1985), and fear (Scherer & Nakamura, 1968). Each of these instruments has been carefully developed and empirically validated. Three of the more frequently used instruments that we have found to be useful in our clinical practice will be described briefly.

Spielberger's State–Trait Anxiety Inventory for Children (1973) consists of 20 items that measure state anxiety and 20 items that tap trait anxiety. The state form is used to assess transient aspects of anxiety, while the trait form is used to measure more global, generalized aspects of anxiety. Combined, the two scales can provide both process and outcome indices of change in self-reported anxiety. That is, the state form can be used to determine session-by-session changes in anxiety, while the trait form can be used as a pretreatment, post-treatment, and follow-up measure of generalized anxiety. A clear advantage of this instrument is that the state scale is designed so that responses to relatively specific anxiety-producing situations can be determined. For example, children can be instructed to indicate how they feel "at this moment" about standing up in front of class, leaving home for summer camp, or being ridiculed by peers. Further, cognitive, motoric, and physiologic indicators of anxiety can be endorsed by the child (e.g., feeling upset, scared, mixed up, jittery, or nervous). Responses to items are scored on a three-point scale (e.g., "I feel very scared/scared/not scared"). Finally, the pervasiveness of the anxiety response can be measured by the trait form. The Spielberger scales are most useful for children aged 9–12, but have been used with both younger children and adolescents as well.

A second instrument that has been used frequently in child behavioral assessment is the Fear Survey Schedule for Children (Scherer & Nakamura, 1968) and its revision (Ollendick, 1983b). In the revised scale, designed to be used with younger and middle-age (9–12) children, children are instructed to rate their level of fear to each of 80 items on a three-point scale. They are asked to indicate whether a specific fear item (e.g., having to go to school, snakes, dark places, riding in a car) frightens them "not at all," "some," or "a lot." Factor analysis of the scale has revealed five primary factors: fear of failure or criticism, fear of the unknown, fear of injury and small animals, fear of danger and death, and medical fears. This pattern of fear has been shown to be relatively invariant across several nationalities, including American (Ollendick, Matson, & Helsel, 1985), Australian (Ollendick, King, & Frary, 1989), British (Ollendick, Yule, & Ollier, 1991), Chinese (Dong, Yang, & Ollendick, 1994), and Nigerian youth (Ollendick, Yang, King, Dong, & Akande, 1996). Further, it has been shown that girls report more fear than boys in these various countries, that specific fears change developmentally, and that the most prevalent fears of boys and girls have remained unchanged over the past 30 years (although some differences have been noted across nationalities). Such information is highly useful when determining whether a child of a specific age and gender is excessively fearful. Further, the instrument has been used to differentiate subtypes of phobic youngsters whose fear of school is related to separation anxiety (e.g., death, having parents argue, being alone) from those whose fear is due to specific aspects of the school situation itself (e.g., taking a test, making a mistake, being sent to the principal). When information from this instrument is combined with that from parents on the Louisville Fear Survey Schedule for Children (Miller et al., 1972), a relatively complete picture of the child's characteristic fear pattern can be obtained.

The final self-report instrument to be reviewed is Kovacs' (1985) Children's Depression Inventory (CDI). Since the mid-1980s, no other area in clinical child psychology has received more attention than depression in children. A multitude of issues regarding its existence, nature, assessment, and treatment have been examined (Cantwell, 1983; Rutter, 1986). One of the major obstacles to systematic investigations in this area has been the absence of an acceptable self-report instrument, and the CDI appears to meet this need. The instrument is a 27-item severity measure of depression based on the well-known Beck Depression Inventory. Each of the 27 items consists of three response choices designed to range from mild depression to fairly severe and clinically significant depression. Kovacs reports that the instrument is suitable for middle-age children and adolescents (8–17 years of age). We have found the instrument to be useful with younger children as well, especially when items are read aloud and response choices are depicted on a bar graph. Smucker, Craighead, Craighead, and Green (1986) have provided additional psychometric data on the CDI. Overall, they conclude it is a reliable, valid, and clinically useful instrument for children and adolescents.
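
The three-point item ratings described for the revised Fear Survey Schedule for Children lend themselves to a simple tally. The following is a minimal sketch in Python, not the published scoring procedure: the numeric coding of the response choices (1 to 3), the function names, and the item-to-subscale map are all assumptions made for illustration, and the example answers are fabricated.

```python
from typing import Dict, List

# Assumed numeric coding of the three response choices (not given in the text).
RESPONSE_VALUES: Dict[str, int] = {"not at all": 1, "some": 2, "a lot": 3}

N_ITEMS = 80  # number of items on the revised fear schedule, per the text


def score_total(responses: List[str]) -> int:
    """Sum the coded ratings across all items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(RESPONSE_VALUES[r] for r in responses)


def score_subscales(
    responses: List[str], item_map: Dict[str, List[int]]
) -> Dict[str, int]:
    """Sum coded ratings within each subscale.

    item_map maps a subscale name to zero-based item indices. The real
    item-to-factor assignments are not given in the text, so any map
    passed here is hypothetical.
    """
    return {
        name: sum(RESPONSE_VALUES[responses[i]] for i in indices)
        for name, indices in item_map.items()
    }


if __name__ == "__main__":
    # Fabricated answers: 70 "not at all", then 6 "some", then 4 "a lot".
    answers = ["not at all"] * 70 + ["some"] * 6 + ["a lot"] * 4
    print(score_total(answers))  # 70*1 + 6*2 + 4*3 = 94

    # Hypothetical subscale made of three arbitrary items.
    demo_map = {"medical fears (hypothetical items)": [70, 76, 77]}
    print(score_subscales(answers, demo_map))
```

Under this assumed coding, totals would range from 80 to 240. Judging whether a given child is excessively fearful would still require comparison with normative data for the child's age and gender, which this sketch does not include.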
In sum, a variety of self-report instruments are available. As with other-report forms, self-reports should be used with appropriate caution and due regard for their specific limitations. Because they generally involve the child's retrospective rating of attitudes, feelings, and behaviors, they too must be considered indirect methods of assessment (Cone, 1978). Nevertheless, they can provide valuable information regarding children's own perception of their behavior.

4.06.4.4 Self-monitoring

Self-monitoring differs from self-report in that it constitutes an observation of clinically relevant target behaviors (e.g., thoughts, feelings, actions) at the time of their occurrence (Cone, 1978). As such, it is a direct method of assessment. Self-monitoring requires children to observe their own behavior and then to record its occurrence systematically. Typically, the child is asked to keep a diary, place marks on a card, or push the plunger on a counter as the behavior occurs or shortly thereafter. Although self-monitoring procedures have been used with both children and adults, at least three considerations must be attended to when such procedures are used with younger children (Shapiro, 1984): behaviors should be clearly defined, prompts to use the procedures should be readily available, and rewards for their use should be provided. Some younger children will be less aware of when the target behavior is occurring and will require coaching and assistance prior to establishing a monitoring system. Other young children may have difficulty remembering exactly what behaviors to monitor and how those behaviors are defined. For these reasons, it is generally considered advisable to provide the child with a brief description of the target behavior or, better yet, a picture of it, and to have the child record only one or two behaviors at a time. In an exceptionally sensitive application of these guidelines, Kunzelman (1970) recommended the use of COUNTOONS, simple stick figure drawings that depict specific behaviors to be self-monitored. Children are instructed to place a tally mark next to the picture when the behavior occurs. For example, a girl monitoring hitting her younger brother may be given an index card with a drawing of a girl hitting a younger boy and instructed to mark each time she does what the girl in the picture is doing. Of course, in a well-designed program, the girl might also be provided with a picture of a girl and a younger boy sharing toys and asked as well to mark each time she emits the appropriate behavior. Such pictorial cues serve as visual prompts for self-monitoring. Finally, children should be reinforced profusely following successful use of self-monitoring.

In general, methods of self-monitoring are highly variable and depend on the specific behavior being monitored and its place of occurrence. For example, Shapiro, McGonigle, and Ollendick (1980) had mentally retarded and emotionally disturbed children self-monitor on-task behavior in a school setting by placing gummed stars on assignment sheets; Ollendick (1981) had children with tic disorders place tally marks upon the occurrence of tics on a colored index card carried in the child's pocket; and Ollendick (1995) had adolescents diagnosed with panic disorder and agoraphobia indicate the extent of their agoraphobic avoidance on a 1–5 scale each time they encountered the feared situation. He also had the adolescents indicate their confidence (i.e., self-efficacy) in coping with their fear on a similar 1–5 scale. In our clinical work, we have also used wrist counters with children whose targeted behaviors occur while they are "on the move." Such a device is not only easy to use, but serves as a visual prompt to self-record. The key to successful self-monitoring in children is the use of recording procedures that are uncomplicated. They must be highly portable, simple, time-efficient, and relatively unobtrusive (Greene & Ollendick, in press).

In sum, self-monitoring procedures represent a direct means of obtaining information about the target behaviors as well as their antecedents and consequences. While specific monitoring methods may vary, any procedure that allows the child to monitor and record presence of the targeted behaviors can be used. When appropriate procedures are used, self-monitoring represents a direct and elegant method of assessment (Ollendick & Greene, 1990; Ollendick & Hersen, 1993).

4.06.4.5 Behavioral Observation

Direct observation of the child's behavior in the natural environment is the hallmark of child behavioral assessment. As described by Johnson and Bolstad (1973), the development of naturalistic observation procedures represents one of the major, if not the major, contributions of the behavioral approach to assessment and treatment of children. A direct sample of the child's behavior at the time and place of its occurrence is obtained with this approach. As such, it is the least inferential of the assessment methods described heretofore. However, behavioral observations in the naturalistic environment should not be viewed as better than other
Description of Assessment Procedures 143

methods of assessment. Rather, direct observations should be viewed as complementary to the other methods, with each providing different and valuable information.

In behavioral observation systems, a single behavior or set of behaviors that have been identified as problematic (generally through the aforementioned procedures) are operationally defined, observed, and recorded in a systematic fashion. In addition, events that precede and follow behaviors of interest are recorded and subsequently used in development of specific treatment programs. Although Jones, Reid, and Patterson (1975) have recommended use of "trained impartial observer-coders" for collection of these data, this is rarely possible in the practice of child behavioral assessment in the clinical setting. Frequently, time constraints, lack of trained personnel, and insufficient resources militate against the use of highly trained and impartial observers. In some cases, we have used significant others in the child's environment (e.g., parents, teachers, siblings) or the children themselves as observers of their own behavior. Although not impartial, these observers can be trained adequately to record behaviors in the natural environment. In other cases, behavioral clinicians have resorted to laboratory or analogue settings that are similar to, but not identical to, the natural environment. In these simulated settings, children may be asked to behave as if they are angry with their parents, to role play assertive responding, or to approach a highly feared object. Behaviors can be directly observed or videotaped (or audiotaped) and reviewed retrospectively. The distinguishing characteristic of behavioral observations, whether made in the naturalistic environment or in simulated settings, is that a direct sample of the child's behavior is obtained.

A wide variety of target behaviors have been examined using behavioral observation procedures. These behaviors have varied from relatively discrete behaviors like enuresis and tics, that require relatively simple and straightforward recording procedures, to complex social interactions that necessitate extensive behavioral coding systems (e.g., Dadds et al., 1994; O'Leary, Romanczyk, Kass, Dietz, & Santogrossi, 1971; Patterson, Ray, Shaw, & Cobb, 1969; Wahler, House, & Stambaugh, 1976).

The utility of behavioral observations in naturalistic and simulated settings is well illustrated in Ayllon, Smith, and Rogers' (1970) behavioral assessment of a young school-phobic girl. In this case study, impartial observers in the child's home monitored the stream of events occurring on school days in order to identify the actual school-phobic behaviors and to determine the antecedent and consequent events associated with them. In this single-parent family, it was noted that the mother routinely left for work about one hour after the targeted girl (Valerie) and her siblings were to leave for school. Although the siblings left for school without incident, Valerie was observed clinging to her mother and refusing to leave the house and go to school. As described by Ayllon et al. (1970), "Valerie typically followed her mother around the house, from room to room, spending approximately 80 percent of her time within 10 feet of her mother. During these times there was little or no conversation" (p. 128). Given her refusal to go to school, the mother took Valerie to a neighbor's apartment for the day. However, when the mother attempted to leave for work, Valerie frequently followed her at a 10-foot distance. As a result, the mother had to return to the neighbor's apartment with Valerie in hand. This daily pattern was observed to end with the mother "literally running to get out of sight of Valerie" so she would not follow her to work. During the remainder of the day, it was observed that Valerie was allowed to do whatever she pleased: "Her day was one which would be considered ideal by many grade-school children – she could be outdoors and play as she chose all day long. No demands of any type were placed on her" (p. 129). Based on these observations, it appeared that Valerie's separation anxiety and refusal to attend school were related to her mother's attention and to the reinforcing environment of the neighbor's apartment where she could play all day.

However, because Valerie was also reported to be afraid of school itself, Ayllon et al. (1970) designed a simulated school setting in the home to determine the extent of anxiety or fear toward specific school-related tasks. (Obviously, observation in the school itself would have been desirable but was impossible because she refused to attend school.) Unexpectedly, little or no fear was evinced in the simulated setting; in fact, Valerie performed well and appeared to enjoy the school-related setting and homework tasks. In this case, these detailed behavioral observations were useful in ruling upon differential hypotheses related to school refusal. They led directly to a specific and efficacious treatment program based on shaping and differential reinforcement principles. The utility of behavioral observations for accurate assessment and treatment programming has been noted in numerous other case studies as well (e.g., Ollendick, 1995; Ollendick & Gruen, 1972; Smith & Sharpe, 1970).

A major disadvantage of behavioral observations in the natural environment is that the
144 Principles and Practice of Behavioral Assessment with Children

target behavior may not occur during the about how this happened, Dadds and colleagues
designated observation periods. In such in- observed the moment-to-moment process
stances, simulated settings that occasion the whereby parents of anxious children influenced
target behaviors can be used. Simulated ob- their children to change from a nonthreatened
servations are especially helpful when the target stance to an avoidant, threatened stance. To
behavior is of low frequency, when the target examine the interdependency of the parents and
behavior is not observed in the naturalistic the child, they coded each family member's
setting due to reactivity effects associated with utterances in real time sequence so that
being observed, or when the target behavior is conditional probabilities could be computed
difficult to observe in the natural environment between different family members' behaviors.
due to practical constraints. Ayllon et al.'s Using this system, they were able to show the
(1970) use of a simulated school setting process by which, and through which, the
illustrated this approach under the latter anxiety response was activated and maintained
conditions. A study by Matson and Ollendick in the child. Thus a very complicated process of
(1976) illustrates this approach for low-fre- parent–child interactions was broken down into
quency behaviors. In this study, parents its constituent parts, recorded with a sophisti-
reported that their children bit either the parent cated observation system, and analyzed sequen-
or siblings when they "were unable to get their tially over time. Moreover, the observations
way or were frustrated." Direct behavioral suggested that, in this sample of overanxious
observations in the home confirmed the par- children, anxiety did not exist solely "in" the
ental report, but it was necessary to observe the child; rather, it existed in a context that was
children for several hours prior to observing an highly dependent upon parental influences.
occurrence of the behavior. Further, parents Such a demonstration illustrates the importance
reported that their children were being "good" of contextual influences in understanding,
while the observers were present and that assessing, and treating diverse child behavior
frequency of biting was much lower than its disorders.
usual, "normal" rate. Accordingly, parents were In sum, direct behavioral observation, either
trained in observation procedures and in- in the natural or simulated environment,
structed to engage their children in play for provides valuable information for child beha-
four structured play sessions per day. During vioral assessment. When combined with infor-
these sessions, parents were instructed to mation gathered through behavioral interviews,
prompt biting behavior by deliberately remov- self- and other-reports, and self-monitoring, a
ing a preferred toy. As expected, removal of comprehensive picture of children and their
favored toys in the structured situations resulted behaviors, as well as their controlling variables,
in increases in target behaviors, which were then is obtained. As with other assessment proce-
eliminated through behavioral procedures. The dures, however, direct behavioral observation
structured, simulated play settings maximized alone is not sufficient to meet the various
the probability that biting would occur and that behavioral assessment functions required for a
it could be observed and treated under thorough analysis of a child's problem behavior.
controlled conditions.
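The conditional probabilities computed from sequentially coded family interactions, as described for the Family Anxiety Coding Schedule, can be illustrated with a brief computational sketch. The Python fragment below is not taken from Dadds et al. (1994). The function name, the event labels (`threat_cue`, `avoidant`, and so on), and the sample session are hypothetical, and only the simplest case is shown: the probability of each event given the event that immediately preceded it.

```python
from collections import Counter, defaultdict


def lag1_conditional_probabilities(events):
    """Estimate P(next event | current event) from a time-ordered list.

    Each event is a (speaker, code) tuple. The labels are hypothetical
    and are not the categories of any published coding schedule.
    """
    pair_counts = defaultdict(Counter)
    # Count every adjacent pair of events in the order they were coded.
    for current, following in zip(events, events[1:]):
        pair_counts[current][following] += 1

    probabilities = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            nxt: count / total for nxt, count in followers.items()
        }
    return probabilities


# Hypothetical coded family discussion (illustration only, not real data).
session = [
    ("parent", "threat_cue"), ("child", "avoidant"),
    ("parent", "neutral"), ("child", "approach"),
    ("parent", "threat_cue"), ("child", "avoidant"),
    ("parent", "threat_cue"), ("child", "approach"),
]

probs = lag1_conditional_probabilities(session)
# Probability that a child's avoidant utterance follows a parental threat cue.
p_avoid = probs[("parent", "threat_cue")][("child", "avoidant")]
print(round(p_avoid, 2))
```

In practice such estimates would be compared with the unconditional base rate of the child's behavior, and far longer event sequences would be needed before any probability could be interpreted.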
It is often intimated that behavioral observa-
tion systems may not be suitable for more
complex behavior problems, such as parent– 4.06.5 RESEARCH FINDINGS
child interactions. Sophisticated systems devel-
oped by Dumas (1989) and Dadds et al. (1994) As noted earlier, use of assessment instru-
to capture family interactions and processes ments and procedures that have been empiri-
suggest otherwise. For example, Dadds et al. cally validated is one of the primary
(1994) developed the Family Anxiety Coding characteristics of child behavioral assessment.
Schedule in order to measure anxious behavior However, the role of conventional psychometric
in both child and parent, and the antecedents standards in evaluating child behavioral assess-
and consequences each provided the other to ment procedures is a controversial one (e.g.,
occasion anxiety in the other. This schedule was Barrios & Hartman, 1986; Cone, 1981, 1986;
developed following the observation that chil- Cone & Hawkins, 1977; Mash & Terdal, 1981).
dren learned to process information about Given the theoretical underpinnings of child
threat cues through interactions with their behavioral assessment and the basic assump-
parents. More specifically, they observed that tions regarding situational specificity and
anxious children tended to view "neutral" temporal instability of behavior, traditional
situations as more threatening after discussing psychometric standards would appear to be of
the situations with their parents than they did in little or no value. After all, how can behaviors
the absence of such interactions. To learn more thought to be under the control of specific
Research Findings 145

antecedent and consequent events be expected to be similar in different settings and at different times? Yet, if there is no consistency in behavior across settings and time, prediction of behavior is impossible and the generalizability of findings obtained from any one method of assessment would be meaningless. Such an extreme ideographic stance precludes meaningful assessment, except of highly discrete behaviors in very specific settings and at very specific points in time (Ollendick & Hersen, 1984).

Research findings suggest that it is not necessary totally to dismiss notions of cross-situational and cross-temporal consistency of behavior (e.g., Bem & Allen, 1974). Although a high degree of behavioral consistency cannot be expected, a moderate degree of behavioral consistency can be expected across situations that involve similar stimulus and response characteristics and are temporally related. When multimethod assessment procedures are used under these conditions, a modest relationship among the measures and a fair degree of predictability and generalizability can be expected. Under such circumstances, application of conventional psychometric standards to evaluation of child behavioral assessment procedures is less problematic and potentially useful (Cone, 1977; Ollendick & Hersen, 1984, 1993). The value of psychometric principles has already been demonstrated for certain classes of behavior when obtained through methods such as behavioral observation (e.g., Olweus, 1979), self-report (e.g., Ollendick, 1981), and other-report ratings (e.g., Cowen, Pederson, Barbigian, Izzo, & Trost, 1973). Further, when multiple methods of behavioral assessment have been used in the same studies, a modest degree of concurrent and predictive validity has been reported (e.g., Gresham, 1982).

It is beyond the scope of the present chapter to review specific research findings related to the reliability, validity, and clinical utility of the various procedures espoused in the multimethod approach. Nonetheless, brief mention will be made of specific directions of research and ways of enhancing the psychometric qualities of each procedure.

4.06.5.1 Behavioral Interviews

As noted by Evans and Nelson (1977), data based on retrospective reports obtained during the interview may possess both low reliability (agreement among individuals interviewed may differ and responses may vary over time) and low validity (reported information may not correspond to the "facts"). Such inaccurate or distorted recollections may result not only in delayed clarification of the presenting complaints, but also in faulty hypotheses about causal agents and maintaining factors. For example, Chess, Thomas, and Birch (1966) reported that parents inaccurately reported certain behavior problems developed at times predicted by popular psychological theories. For example, problems with siblings were recalled to have begun with the birth of a younger sibling, and problems with dependency were reported to have begun when the mother became employed outside the home. In actuality, these behaviors were present prior to these events; nonetheless, they were "conveniently" recalled to have begun coincident with commonly accepted "life" points. In a similar vein, Schopler (1974) noted that many parents of autistic children inaccurately blame themselves for their child's problematic behaviors and that many therapists inadvertently "buy into" this notion that parents are to blame. Such scapegoating accomplishes little in the understanding, assessment, and treatment of the child's problematic behavior (Ollendick & Cerny, 1981).

While the reliability and validity of general information about parenting attitudes and practices are suspect, findings suggest parents and children can be reliable and valid reporters of current, specific information about problematic behaviors (e.g., Graham & Rutter, 1968; Gross, 1984; Herjanic, Herjanic, Brown, & Wheatt, 1973). The reliability and validity of the information are directly related to recency of behaviors being discussed and specificity of information requested. Thus, careful specification of precise behaviors and conditions under which they are occurring is more reliable and valid than vague descriptions of current behaviors or general recollections of early childhood events (Ciminero & Drabman, 1977). When the interview is conducted along such guidelines, it is useful in specifying behaviors of clinical interest and in determining appropriate therapeutic interventions. As we have noted, however, it is only the first step in the ongoing, hypothesis-generating process that is characteristic of child behavioral assessment.

4.06.5.2 Ratings and Checklists

As with behavioral interviews, issues related to reliability and validity are also relevant to ratings and checklists. Cronbach (1960) has noted that the psychometric quality of rating scales is directly related to the number and specificity of the items rated. Further, O'Leary and Johnson (1986) have identified four factors associated with item-response characteristics
and raters that enhance reliability and validity ingful data about the child's adaptive and
of such scales: (i) the necessity of using clearly problem behaviors but are also useful in
defined reference points on the scale (i.e., orienting parents, teachers, and significant
estimates of frequency, duration, or intensity), others to specific problem or asset areas and
(ii) the inclusion of more than two reference in alerting them to observe and record specific
points on the scale (i.e., reference points that behaviors accurately and validly.
quantify the behavior being rated), (iii) a rater
who has had extensive opportunities for obser-
ving the child being rated, and (iv) more than 4.06.5.3 Self-report Instruments
one rater who has equal familiarity with the
Of the various methods used in child
child.
behavioral assessment, the self-report method
The rating forms and checklists described
has received the least empirical support,
earlier (e.g., Revised Behavior Problem Check-
although this picture is rapidly changing. As
list, Child Behavior Checklist, Behavior Assess-
noted earlier, child behavioral assessors initially
ment System for Children, the Louisville Fear
eschewed use of self-report instruments, largely
Survey Schedule for Children, and the Home
on the basis of their suspected low reliability and
Situations Questionnaire) incorporate these
validity. As we have noted, however, data from
item and response characteristics and are
self-report instruments can be meaningfully
generally accepted as reliable and valid instru-
used to understand and describe the child, plan
ments. For example, the interrater reliability of
treatment, and evaluate treatment outcome.
the Revised Behavior Problem Checklist is quite
As with interview and checklist or rating data,
high when raters are equally familiar with the
self-report of specific behaviors (including
children being rated and when ratings are
cognitions and affects) and events is more
provided by raters within the same setting
reliable and valid than more general, global
(Quay, 1977; Quay & Peterson, 1983). Further,
reports of life experiences. Such self-reports of
stability of these ratings has been reported over
specific states can be used to identify discrete
two-week and one-year intervals. These findings
components of more general constructs (e.g.,
have been reported for teachers in the school
determining the exact fears of a phobic child and
setting and parents in the home setting.
the exact situations that are associated with
However, when ratings of teachers are com-
withdrawn behavior in an unassertive child).
pared to those of parents, interrater reliabilities
Illustratively, Scherer and Nakamura's (1968)
are considerably lower. While teachers seem to
Fear Survey Schedule for Children and its
agree with other teachers, and one parent tends
revision (Ollendick, 1983b) can be used to
to agree with the other parent, there is less
pinpoint specific fears and classes of fear.
agreement between parents and teachers. Such
Further, this instrument has been shown to be
differences may be due to differential percep-
reliable over time, to possess high internal
tions of behavior by parents and teachers or to
consistency and a meaningful and replicable
the situational specificity of behavior, as
factor structure, to distinguish between phobic
discussed earlier (also see Achenbach et al.,
and nonphobic children, and to discriminate
1987). These findings support the desirability of
among subtypes of phobic youngsters within a
obtaining information about the child from as
particular phobic group (Ollendick & Mayer,
many informants and from as many settings as
1984; Ollendick, King, & Yule, 1994).
possible.
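The contrast between agreement within a setting and agreement across settings can be made concrete with a small numerical sketch. The Python fragment below uses the Pearson correlation as one common index of agreement between two raters; the studies cited above may have used other statistics, and the scores, variable names, and function name are invented solely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same children."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical problem-behavior scores for six children (not real data).
teacher_a = [12, 30, 7, 22, 15, 26]
teacher_b = [14, 28, 9, 20, 17, 27]   # second rater in the same school setting
mother = [20, 18, 6, 25, 9, 15]       # rater observing the child at home

r_teachers = pearson_r(teacher_a, teacher_b)  # agreement within one setting
r_cross = pearson_r(teacher_a, mother)        # agreement across settings

print(f"teacher vs. teacher: {r_teachers:.2f}")
print(f"teacher vs. mother:  {r_cross:.2f}")
```

With these invented scores the two teachers agree closely while the teacher and the mother agree only moderately. The computation alone cannot show whether that gap reflects differing perceptions by the raters or genuine situational specificity of the behavior.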
Clearly, more research is needed in this area
The validity of the Revised Behavior Problem
before routine use of self-report instruments can
Checklist has also been demonstrated in
be endorsed. Nonetheless, instruments that
numerous ways. It has been shown to distin-
measure specific aspects of behavior such as
guish clinic-referred children from nonreferred
anxiety or depression rather than global traits
children, and to be related to psychiatric
hold considerable promise for child behavioral
diagnosis, other measures of behavioral de-
assessment.
viance, prognosis, and differential effectiveness
of specific treatment strategies (see Ollendick &
Cerny, 1981, for a discussion of these findings). 4.06.5.4 Self-monitoring
Findings similar to these have been reported
for the Child Behavior Checklist, Behavior In self-monitoring, children observe their
Assessment System for Children, Louisville own behavior and then systematically record
Fear Survey Schedule, and the Home Situations its occurrence. As with other measures, con-
Questionnaire. These rating forms and check- cerns related to the reliability and validity of
lists, as well as others, have been shown to this method exist. What is the extent of
possess sound psychometric qualities and to be interobserver agreement between children who
clinically useful. They not only provide mean- are instructed to monitor their own behavior
and objective observers? How accurate are children in recording occurrences of behavior? How reactive is the process of self-monitoring?

The literature in this area is voluminous. Even though all necessary studies have not been conducted, the findings are in general agreement. First, children as young as seven years of age can be trained to be reliable and accurate recorders of their own behavior. However, the specific behaviors should be clearly defined, prompts to self-record should be available, and reinforcement for self-monitoring should be provided. Under such conditions, children's recordings closely approximate those obtained from observing adults. For example, in a study examining the effects of self-monitoring and self-administered overcorrection in the treatment of nervous tics in children, Ollendick (1981) showed that 8- to 10-year-old children who were provided clear prompts to self-record highly discrete behaviors were able to do so reliably. Estimates of occurrence closely paralleled those reported by parents and teachers, even though children were unaware that these adults were recording their nervous tics. In another study, Ackerman and Shapiro (1985) demonstrated the accuracy of self-monitoring by comparing self-recorded data with a permanent product measure (the number of units produced in a work setting). Again, accuracy of self-monitoring was confirmed.

Second, self-monitoring may result in behavior change due to the self-observation process and result in altered estimates of target behaviors. This effect is known as reactivity. Numerous factors have been shown to influence the occurrence of reactivity: specific instructions, motivation, goal-setting, nature of the self-recording device, and the valence of the target behavior (e.g., Nelson, 1977, 1981). Among the more important findings are that desirable behaviors (e.g., study habits, social skills) may increase while undesirable behaviors (e.g., nervous tics, hitting) tend to decrease following self-monitoring, and that the more obtrusive the self-recording device, the greater the behavior change. For example, Nelson, Lipinski, and Boykin (1978) found that hand-held counters produced greater reactivity than belt-worn counters. Holding a counter in one's hand was viewed as more obtrusive, contributing to increased reactivity. Reactivity is a concern in the assessment process because it affects the actual occurrences of behavior. However, if one is aware of the variables that contribute to reactive effects, self-monitoring can be used as a simple and efficient method for data collection (Shapiro, 1984).

In short, self-monitoring has been found to be useful in the assessment of a wide range of child behavior problems across a wide variety of settings. When issues related to the reliability, accuracy, and reactivity of measurement are addressed, self-monitoring represents another strategy that is highly efficient and clinically useful.

4.06.5.5 Behavioral Observation

As with other assessment strategies, behavioral observation procedures must possess adequate psychometric qualities and be empirically validated before their routine use can be endorsed. Although early behaviorists accepted the accuracy of behavioral observations based on their face validity, subsequent investigators enumerated a variety of problems associated with their reliability, validity, and clinical utility (e.g., Johnson & Bolstad, 1973; Kazdin, 1977). These problems include the complexity of the observation code, the exact recording procedures to be used (e.g., frequency counts, time sampling, etc.), observer bias, observer drift, and the reactive nature of the observation process itself (see Barton & Ascione, 1984, for further discussion of these issues). Our experience suggests that the greatest threat to the utility of observational data comes from the reactive nature of the observational process itself, especially when the observer is present in the natural setting. It is well known that the presence of an observer affects behavior, usually in socially desirable directions. We have found two strategies to be useful in reducing such reactive effects: recruiting and training observer-coders already present in the natural setting (e.g., a teacher or parent), and if this is not possible, planning extended observations so children can habituate to the observers and so that the effects of reactivity will diminish. However, in the latter instance, it should be noted that several sessions of observations may be required, since reactive effects have been observed to be prolonged (Johnson & Lobitz, 1974). Reactive effects, combined with the aforementioned practical issues of personnel, time, and resources, have led us to place greater emphasis on recruiting observer-coders already in the children's natural environment or training children themselves to record their own behavior.

In brief, behavioral observations are the most direct and least inferential method of assessment. Even though a variety of problems related to their reliability and validity have been commented upon, behavioral observations are highly useful strategies and represent the hall-
mark of child behavioral assessment. Whenever possible, behavioral observations in the natural setting should be obtained.

4.06.6 FUTURE DIRECTIONS

A number of directions for future research and development in child behavioral assessment may be evident to the reader. What follows is our attempt to highlight those areas that appear most promising and in need of greater articulation.

4.06.6.1 Developmental Factors

First, it seems to us that greater attention must be given to developmental factors as they affect the selection of child behavioral assessment procedures. Although we have argued that these procedures should be developmentally sensitive, child behavioral assessors have frequently not attended to, or have ignored, this recommendation. As we noted earlier, the most distinguishing characteristic of children is developmental change. Such change encompasses basic biological growth and maturity as well as affective, behavioral, and cognitive fluctuations that may characterize children at different age levels. While the importance of accounting for developmental level when assessing behavior may be obvious, ways of integrating developmental concepts and principles into child behavioral assessment are less evident. Edelbrock (1984) has noted three areas for the synthesis of developmental and behavioral principles: (i) use of developmental fluctuations in behavior to establish normative baselines of behavior, (ii) determination of age and gender differences in the expression and covariation of behavioral patterns, and (iii) study of stability and change in behavior over time. Clearly, these areas of synthesis and integration are in their infancy and in need of greater articulation (e.g., Harris & Ferrari, 1983; Ollendick & Hersen, 1983; Rutter & Garmezy, 1983; Sroufe & Rutter, 1984).

Recently, Ollendick and King (1991) addressed this developmental–behavioral synthesis in some detail. In reference to normative data, they suggested that such information could be used to determine which behavior problems represent clinically significant areas of concern, examine appropriateness of referral, and evaluate efficacy of interventions. Essentially, this normative-developmental perspective emphasizes the central importance of change over time and the need for relevant norms against which children can be compared, both at the time of assessment and following treatment. Another way in which developmental principles can be integrated into ongoing child behavioral assessment is to identify age differences in the relations or patterns among behaviors (Edelbrock, 1984). Ollendick and King (1991) have shown such patterning of behavior across development for a number of measures, including diagnostic interviews, self- and other-report instruments, and behavioral coding systems.

Finally, developmental principles can be useful in child behavioral assessment in our attempts to examine and understand continuity and discontinuity of certain behavioral patterns. Basically, this issue can be addressed from two vantage points, a descriptive one and an explanatory one. From a descriptive standpoint, we are interested in determining whether a behavior or set of behaviors seen at one point in time can be described in the same way at another point in time. If it can be described in the same way, descriptive continuity is said to exist; if it cannot, descriptive discontinuity is said to obtain (Lerner, 1986). We are simply asking, does the behavior look the same or different? Does it take the same form over time? For example, if 4-year-old, 8-year-old, and 12-year-old children all emitted the same behaviors to gain entry in a social group, we would conclude that descriptive continuity exists for social entry behavior.

For the most part, it has been shown that the expression and patterning of a large number of behaviors change across development and that descriptive discontinuity is more likely the case (Ollendick & King, 1991). Changes in behavior observed with development can, of course, occur for many different reasons. If the same explanations are used to account for behavior over time, then that behavior is viewed as involving unchanging laws or rules and explanatory continuity is said to exist. However, if different explanations are used to account for changes in behavior over time, explanatory discontinuity prevails (Lerner, 1986). For the most part, behaviorally oriented theorists and clinicians maintain changes over time are due to a set of learning principles that are largely the same across the child's life span. No new principles or laws are needed as the child grows. Developmental theorists, on the other hand, maintain a progressive differentiation of the organism which suggests a different set of principles be invoked across different stages of development. Unfortunately, the evidence on explanatory continuity versus discontinuity is scarce; the jury is out on these issues. Much work remains to be done in this area; however, as Ollendick and King (1991) note, the
Future Directions 149

emergence of "developmental–behavioral as- (Mash & Terdal, 1981). The sine qua non of
sessment" is on the horizon. child behavioral assessment is that the proce-
dures be empirically validated. In addition, the
different procedures might vary in terms of their
4.06.6.2 The Utility of the Multimethod treatment utility across different ages. Treat-
Approach at Different Age Levels ment utility refers to the degree to which
assessment strategies are shown to contribute
Second, and somewhat related to the first to beneficial treatment outcomes (Hayes et al.,
area, greater attention must be focused on the 1987). More specifically, treatment utility
incremental validity of the multimethod ap- addresses issues related to the selection of
proach when used for children of varying ages. specific target behaviors and to the choice of
Throughout this chapter, we have espoused a specific assessment strategies. For example, we
multimethod approach consisting of inter- might wish to examine the treatment utility of
views, self- and other-reports, self-monitoring, using self-report questionnaires to guide treat-
and behavioral observations. Some of these ment planning, above and beyond that provided
procedures may be more appropriate at some by direct behavioral observation of children
age levels than others. Further, the psycho- who are phobic of social encounters. All
metric properties of these procedures may vary children could complete a fear schedule and
with age. For example, self-monitoring requires be observed in a social situation, but the self-
the ability to compare one's own behavior report data for only half of the children would
against a standard and accurately to judge be made available for treatment planning. If the
occurrence or nonoccurrence of targeted events children for whom self-reports were made
and behaviors. Most children below six years of available improved more than those whose
age lack the requisite ability to self-monitor and treatment plans were based solely on behavioral
may not profit from such procedures. In fact, observations, then the treatment utility of using
the limited research available suggests self- self-report data would be established (for this
monitoring may be counter-productive when problem with this age child). In a similar
used with young children, resulting in confu- fashion, the treatment utility of interviews, role
sion and impaired performance (e.g., Higa, plays, and other devices could be evaluated
Thorp, & Calkins, 1978). These findings (Hayes et al., 1987). Of course, it would be
suggest that self-monitoring procedures are important to examine treatment utility from a
better suited for children who possess sufficient developmental perspective as well. Certain
cognitive abilities to benefit from their use procedures might be shown to possess incre-
(Shapiro, 1984). In a similar vein, age-related mental validity at one age but not another.
variables place constraints on use of certain Although the concept of treatment utility is
self-report and sociometric measures with relatively new, it shows considerable promise as
young children. It has often been noted that a strategy to evaluate the incremental validity of
sociometric devices must be simplified and our multimethod assessment approach. We
presented in pictorial form to children under six should not necessarily assume that ªmoreº
years of age (Hops & Lewin, 1984). The assessment is ªbetterº assessment.
picture-form sociometric device provides young
children with a set of visual cues regarding
children to be rated and, of course, does not 4.06.6.3 Cultural Sensitivity
require them to read names of children being
rated. The roster-and-rating method, used so Considerable energy must be directed to the
frequently with older children, is simply not development of child behavioral assessment
appropriate with younger children. Ollendick methods that are culturally sensitive. Numerous
and Hersen (1993) review additional age- observers have called attention to the inter-
related findings for other procedures and nationalization of the world and the ªbrowning
suggest caution in using these procedures of Americaº (e.g., Malgady, Rogler, & Con-
without due regard for their developmental stantino, 1987; Vasquez Nuttall, DeLeon, & Del
appropriateness and related psychometric Valle, 1990). In reference to this chapter, these
properties. developments suggest that the assessment
If certain procedures are found to be less process is increasingly being applied to non-
reliable or valid at different age levels, their Caucasian children for whom English is not the
indiscriminate use with children can not be primary language in America, and that many
endorsed. Inasmuch as these strategies are procedures developed in America and other
found to be inadequate, the combination of Western countries are being applied, sometimes
them in a multimethod approach would serve indiscriminately, in other countries as well.
only to compound their inherent limitations Development of assessment procedures that are
culture-fair (and language-fair) is of utmost importance. Of course, many cultural issues need to be considered in the assessment process. Cultural differences may be expressed in child-rearing practices, family values, parental expectations, communication styles, nonverbal communication patterns, and family structure and dynamics (Vasquez et al., 1990). As an example, behaviors characteristic of ethnic minority children may be seen as emotionally or behaviorally maladaptive by persons who have little or no appreciation for cultural norms (e.g., Prewitt-Diaz, 1989). Thus, cultural differences (biases?) are likely to occur early in the assessment process. Fortunately, Vasquez-Nuttall, Sanchez, Borras Osorio, Nuttall, & Varvogil (1996) have suggested several steps that can be taken to minimize cultural biases in the assessment process.
Vasquez et al. (1996) have offered the following suggestions: (i) include extended family members in the information-gathering process; (ii) use interpreters, if necessary, in interviewing the child and family members; (iii) familiarize oneself with the culture of specific groups; and (iv) use instruments that have been translated into the native language of the children and for which norms are available for specific ethnic groups. With regard to this latter recommendation, significantly greater progress has been witnessed for the translation component than the establishment of well-standardized normative information. For example, while the Conners' Parent Rating Scales (Conners, 1985) and Conners' Teacher Rating Scale (Conners, 1985) have been translated into Spanish and other languages, group norms are lacking and the reliability and validity of the translations have not been examined systematically. Similarly, the Fear Survey Schedule for Children-Revised (Ollendick, 1983b) has been translated into over 10 languages, yet normative data are lacking and the psychometric properties of the instrument have not been fully explored or established. In sum, a clear challenge before us in the years ahead is to attend to important cultural factors that impinge on our assessment armamentarium, and to develop and promulgate culturally sensitive methods that are developmentally appropriate and empirically validated.

4.06.6.4 Measures of Cognitive and Affective Processes

More effort must be directed toward the development of culturally relevant, developmentally sensitive, and empirically validated procedures for assessment of cognitive and affective processes in children. In recent years, child behavioral assessors have become increasingly interested in the relation of children's cognitive and affective processes to observed behaviors. The need for assessment in this area is further evidenced by the continued increase of cognitive-behavioral treatment procedures with children, a trend first observed in the late 1970s and early 1980s (e.g., Kendall, Pellegrini, & Urbain, 1981; Meador & Ollendick, 1984). As noted by Kendall et al. (1981), there is a particularly pressing need to develop procedures that can examine the very cognitions and processes that are targeted for change in these intervention efforts. For example, the reliable and valid assessment of self-statements made by children in specific situations would facilitate the empirical evaluation of cognitive-behavioral procedures such as self-instructional training and cognitive restructuring (cf. Zatz & Chassin, 1983; Stefanek, Ollendick, Baldock, Francis, & Yaeger, 1987).

4.06.6.5 The Role of the Child

We must concentrate additional effort on the role of the child in child behavioral assessment. All too frequently, “tests are administered to children, ratings are obtained on children, and behaviors are observed in children” (Ollendick & Hersen, 1984, p. ix). This process views the child as a passive responder, someone who is largely incapable of actively shaping and determining behaviors of clinical relevance. Although examination of these organismic variables is only beginning, it would appear that concerted and systematic effort must be directed to their description and articulation. For example, children's conceptions of their own behavior is a critical area of further study. To what causes do children attribute aggressive or withdrawn behavior in themselves or in their peers? Are there aggregated trends in these attributions? Do they differ by culture? Do causal attributions (as well as self-efficacy and outcome expectancies) mediate treatment outcomes? Again, are there age-related effects for these effects or culturally relevant effects? The answers to these questions are of both theoretical interest and applied clinical significance.
The process described above also implies that child behavior (problematic or otherwise) occurs in a vacuum, and that the perceptions and behaviors of referral sources (parents, teachers) and characteristics of the environments in which behavior occurs are somehow less critical to assess. Recent efforts to develop reliable methods for assessing parent–child interactions are indicative of an increased awareness of the need to broaden the scope of assessment to include specific individuals with
whom, and environments in which, child behavior problems commonly occur (cf. Dadds et al., 1994; Dumas, 1989; Greene, 1995, 1996; Ollendick, 1996). However, much additional work remains to be done in this area.

4.06.6.6 Ethical Guidelines

Finally, we must continue to focus our attention on ethical issues in child behavioral assessment. A number of ethical issues regarding children's rights, proper and legal consent, professional judgment, and social values are raised in the routine practice of child behavioral assessment (Rekers, 1984). Are children capable of granting full and proper consent to a behavioral assessment procedure? At what age and in what cultures are children competent to give such consent? Is informed consent necessary? Or might not informed consent be impossible, impractical, or countertherapeutic in some situations? What ethical guidelines surround the assessment procedures to be used? Current professional guidelines suggest our procedures should be reliable, valid, and clinically useful. Do the procedures suggested in this chapter meet these professional guidelines? What are the rights of parents and of society? It should be evident from these questions that a variety of ethical issues persists. Striking a balance between the rights of parents, society, and children is no easy matter but is one that takes on added importance in the increasingly litigious society of the USA.
In short, future directions of child behavioral assessment are numerous and varied. Even though a technology for child behavioral assessment has evolved and is in force, we need to begin to explore the issues raised before we can conclude the procedures are maximally productive and in the best interests of children throughout the world.

4.06.7 SUMMARY

Child behavioral assessment strategies have been slow to evolve. Only recently has the chasm between child behavior therapy and child behavioral assessment been narrowed. Increased awareness of the importance of developing assessment procedures that provide an adequate representation of child behavior disorders has spurred research into assessment procedures and spawned a plethora of child behavioral assessment techniques. The growing sophistication of child behavior assessment is witnessed by the appearance of self- and other-report strategies that are beginning to take into account developmental, social, and cultural influences as well as cognitive and affective mediators of overt behavior. At the same time, attention to psychometric properties of assessment procedures has continued.
Certain theoretical assumptions guide child behavioral assessment. Foremost among these is the premise that behavior is a function of situational determinants and not a sign of underlying personality traits. To assess adequately the situational determinants and to obtain as complete a picture of the child as is possible, a multimethod assessment approach is recommended, utilizing both direct and indirect methods of assessment. Direct methods include self-monitoring as well as behavioral observation by trained observers in naturalistic or simulated analogue settings. Indirect measures include behavioral interviewing and self- and other-report measures. These sources of information are considered indirect ones because they involve retrospective reports of previous behavior.
Even though direct behavioral observation remains the hallmark of child behavioral assessment, information from these other sources is considered not only valuable but integral in the understanding and subsequent treatment of child behavior disorders. Hence, whereas identification and specification of discrete target behaviors were once considered sufficient, current child behavioral assessment involves serious consideration and systematic assessment of cognitive and affective aspects of the child's behavior and of developmental, social, and cultural factors that influence the child, as well as direct observation of the problematic behavior in situ.
Several areas of future research remain. These include clearer specification of developmental variables, a closer examination of the utility of the multimethod approach at different age levels, the influence of culture and the need for models of assessment that take cultural forces into consideration, development of specific measures to examine cognitive and affective processes in children, articulation of the role of the child in child behavioral assessment, and continued development of ethical guidelines. While the basis for a technology of child behavioral assessment exists, considerable fine-tuning remains to be done. Child behavioral assessment is at a critical crossroad in its own development; which path it takes will determine its long-term future.

4.06.8 REFERENCES

Achenbach, T. M. (1966). The classification of children's psychiatric symptoms: A factor-analytic study. Psychological Monographs, 80, 1–37.
Achenbach, T. M. (1991a). Manual for the Child Behavior Checklist and Revised Child Behavior Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T. M. (1991b). Manual for the Teacher Report Form and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T. M. (1991c). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. S. (1989). Diagnostic, taxonomic, and assessment issues. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child psychopathology (2nd ed., pp. 53–69). New York: Plenum.
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213–232.
Ackerman, A. M., & Shapiro, E. S. (1985). Self-monitoring and work productivity with mentally retarded adults. Journal of Applied Behavior Analysis, 17, 403–407.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Ayllon, T., Smith, D., & Rogers, M. (1970). Behavioral management of school phobia. Journal of Behavior Therapy and Experimental Psychiatry, 1, 125–138.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215.
Barkley, R. A. (1981). Hyperactive children: A handbook for diagnosis and treatment. New York: Guilford Press.
Barkley, R. A. (1987). Defiant children: A clinician's manual for parent training. New York: Guilford Press.
Barkley, R. A., & Edelbrock, C. S. (1987). Assessing situational variation in children's behavior problems: The home and school situations questionnaires. In R. Prinz (Ed.), Advances in behavioral assessment of children and families (Vol. 3, pp. 157–176). Greenwich, CT: JAI Press.
Barkley, R. A., Karlsson, I., Strzelecki, E., & Murphy, J. (1984). Effects of age and Ritalin dosage on the mother–child interactions of hyperactive children. Journal of Consulting and Clinical Psychology, 52, 750–758.
Barrios, B., & Hartmann, D. P. (1986). The contributions of traditional assessment: Concepts, issues, and methodologies. In R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 81–110). New York: Guilford Press.
Barton, E. J., & Ascione, F. R. (1984). Direct observations. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 166–194). New York: Pergamon.
Bem, D. J., & Allen, A. (1974). On predicting some of the people some of the time: The search for cross-situational consistencies in behavior. Psychological Review, 81, 506–520.
Bornstein, P. H., Bornstein, M. T., & Dawson, B. (1984). Integrated assessment and treatment. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 223–243). New York: Pergamon.
Campbell, S. B. (1989). Developmental perspectives in child psychopathology. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child psychopathology (2nd ed., pp. 5–28). New York: Plenum.
Cantwell, D. P. (1983). Childhood depression: A review of current research. In B. B. Lahey & A. E. Kazdin (Eds.), Advances in clinical child psychology (Vol. 5, pp. 39–93). New York: Plenum.
Chess, S., Thomas, A., & Birch, H. G. (1966). Distortions in developmental reporting made by parents of behaviorally disturbed children. Journal of the American Academy of Child Psychiatry, 5, 226–231.
Ciminero, A. R., & Drabman, R. S. (1977). Current developments in the behavioral assessment of children. In B. B. Lahey & A. E. Kazdin (Eds.), Advances in clinical child psychology (Vol. 1, pp. 47–82). New York: Plenum.
Cone, J. D. (1977). The relevance of reliability and validity for behavioral assessment. Behavior Therapy, 8, 411–426.
Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and taxonomy. Behavior Therapy, 9, 882–888.
Cone, J. D. (1981). Psychometric considerations. In M. Hersen & A. S. Bellack (Eds.), Behavioral assessment: A practical handbook (2nd ed., pp. 38–68). Elmsford, NY: Pergamon.
Cone, J. D. (1986). Ideographic, nomothetic, and related perspectives in behavioral assessment. In R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 111–128). New York: Guilford Press.
Cone, J. D., & Hawkins, R. P. (Eds.) (1977). Behavioral assessment: New directions in clinical psychology. New York: Brunner/Mazel.
Conners, C. K. (1985). The Conners rating scales: Instruments for the assessment of childhood psychopathology. Unpublished manuscript, Children's Hospital National Medical Center, Washington, DC.
Cowen, E. L., Pederson, A., Babigian, H., Izzo, L. D., & Trost, M. A. (1973). Long-term follow-up of early-detected vulnerable children. Journal of Consulting and Clinical Psychology, 41, 438–445.
Cronbach, L. J. (1960). Essentials of psychological testing. New York: Harper & Row.
Dadds, M. R., Rapee, R. M., & Barrett, P. M. (1994). Behavioral observation. In T. H. Ollendick, N. J. King, & W. Yule (Eds.), International handbook of phobic and anxiety disorders in children and adolescents (pp. 349–364). New York: Plenum.
Deluty, R. H. (1979). Children's Action Tendency Scale: A self-report measure of aggressiveness, assertiveness, and submissiveness in children. Journal of Consulting and Clinical Psychology, 41, 1061–1071.
Dong, Q., Yang, B., & Ollendick, T. H. (1994). Fears in Chinese children and adolescents and their relations to anxiety and depression. Journal of Child Psychology and Psychiatry, 35, 351–363.
Doyle, A., Ostrander, R., Skare, S., Crosby, R. D., & August, G. J. (1997). Convergent and criterion-related validity of the Behavior Assessment System for Children–Parent Rating Scale. Journal of Clinical Child Psychology, 26, 276–284.
Dumas, J. E. (1989). Interact: A computer-based coding and data management system to assess family interactions. In R. J. Prinz (Ed.), Advances in behavioral assessment of children and families (Vol. 3, pp. 177–202). Greenwich, CT: JAI Press.
Edelbrock, C. S. (1984). Developmental considerations. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 20–37). Elmsford, NY: Pergamon.
Evans, I. M., & Nelson, R. O. (1977). Assessment of child behavior problems. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 603–681). New York: Wiley-Interscience.
Finch, A. J., Nelson, W. M., III, & Moss, J. H. (1983). Stress inoculation for anger control in aggressive children. In A. J. Finch, W. M. Nelson, & E. S. Ott (Eds.), Cognitive-behavioral procedures with children: A practical guide (pp. 148–205). Newton, MA: Allyn & Bacon.
Finch, A. J., & Rogers, T. R. (1984). Self-report instruments. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 106–123). Elmsford, NY: Pergamon.
Goldfried, M. R., & Kent, R. N. (1972). Traditional versus
behavioral personality assessment: A comparison of methodological and theoretical assumptions. Psychological Bulletin, 77, 409–420.
Graham, P., & Rutter, M. (1968). The reliability and validity of the psychiatric assessment of the child: II. Interview with the parents. British Journal of Psychiatry, 114, 581–592.
Greene, R. W. (1995). Students with ADHD in school classrooms: Teacher factors related to compatibility, assessment, and intervention. School Psychology Review, 24, 81–93.
Greene, R. W. (1996). Students with ADHD and their teachers: Implications of a goodness-of-fit perspective. In T. H. Ollendick & R. J. Prinz (Eds.), Advances in clinical child psychology (Vol. 18, pp. 205–230). New York: Plenum.
Greene, R. W., & Ollendick, T. H. (in press). Behavioral assessment of children. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (3rd ed.). Boston: Allyn & Bacon.
Gresham, F. M. (1982). Social interactions as predictors of children's likability and friendship patterns: A multiple regression analysis. Journal of Behavioral Assessment, 4, 39–54.
Gresham, F. M., & Elliott, S. N. (1990). Social skills rating system manual. Circle Pines, MN: American Guidance Service.
Gross, A. M. (1984). Behavioral interviewing. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 61–79). Elmsford, NY: Pergamon.
Harris, S. L., & Ferrari, M. (1983). Developmental factors in child behavior therapy. Behavior Therapy, 14, 54–72.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1986). Evaluating the quality of behavioral assessment. In R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 463–503). New York: Guilford.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963–974.
Herjanic, B., Herjanic, M., Brown, F., & Wheatt, T. (1973). Are children reliable reporters? Journal of Abnormal Child Psychology, 3, 41–48.
Higa, W. R., Tharp, R. G., & Calkins, R. P. (1978). Developmental verbal control of behavior: Implications for self-instructional testing. Journal of Experimental Child Psychology, 26, 489–497.
Holmes, F. B. (1936). An experimental investigation of a method of overcoming children's fears. Child Development, 1, 6–30.
Hops, H., & Lewin, L. (1984). Peer sociometric forms. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 124–147). New York: Pergamon.
Johnson, S. M., & Bolstad, O. D. (1973). Methodological issues in naturalistic observations: Some problems and solutions for field research. In L. A. Hammerlynck, L. C. Handy, & E. J. Mash (Eds.), Behavior change: Methodology, concepts, and practice (pp. 7–67). Champaign, IL: Research Press.
Johnson, S. M., & Lobitz, G. K. (1974). Parental manipulation of child behavior in home observations. Journal of Applied Behavior Analysis, 1, 23–31.
Jones, M. C. (1924). The elimination of children's fears. Journal of Experimental Psychology, 7, 382–390.
Jones, R. R., Reid, J. B., & Patterson, G. R. (1975). Naturalistic observation in clinical assessment. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 3, pp. 42–95). San Francisco: Jossey-Bass.
Kanfer, F. H., & Phillips, J. S. (1970). Learning foundations of behavior therapy. New York: Wiley.
Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 4, 7–14.
Kendall, P. C., & Hollon, S. D. (Eds.) (1980). Cognitive-behavioral intervention: Assessment methods. New York: Academic Press.
Kendall, P. C., Pellegrini, D. S., & Urbain, E. S. (1981). Approaches to assessment for cognitive-behavioral interventions with children. In P. C. Kendall & S. D. Hollon (Eds.), Assessment strategies for cognitive-behavioral interventions (pp. 227–286). New York: Academic Press.
Kovacs, M. (1985). Children's Depression Inventory (CDI). Psychopharmacology Bulletin, 21, 995–998.
Kunzelman, H. D. (Ed.) (1970). Precision teaching. Seattle, WA: Special Child Publications.
Lease, C. A., & Ollendick, T. H. (1993). Development and psychopathology. In A. S. Bellack & M. Hersen (Eds.), Psychopathology in adulthood (pp. 89–102). Boston: Allyn & Bacon.
Lerner, R. M. (1986). Concepts and theories of human development (2nd ed.). New York: Random House.
Linehan, M. (1977). Issues in behavioral interviewing. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 30–51). New York: Brunner/Mazel.
Malgady, R., Rogler, L., & Constantino, G. (1987). Ethnocultural and linguistic bias in mental health evaluation of Hispanics. American Psychologist, 42, 228–234.
Mash, E. J., & Terdal, L. G. (1981). Behavioral assessment of childhood disturbance. In E. J. Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders (pp. 3–76). New York: Guilford Press.
Mash, E. J., & Terdal, L. G. (Eds.) (1989). Behavioral assessment of childhood disorders (2nd ed.). New York: Guilford Press.
Mash, E. J., & Terdal, L. G. (1989). Behavioral assessment of childhood disturbance. In E. J. Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders (2nd ed., pp. 3–65). New York: Guilford Press.
Matson, J. L., & Ollendick, T. H. (1976). Elimination of low-frequency biting. Behavior Therapy, 7, 410–412.
McConaughy, S. H. (1996). The interview process. In M. J. Breen & C. R. Fiedler (Eds.), Behavioral approach to assessment of youth with emotional/behavioral disorders: A handbook for school-based practitioners (pp. 181–224). Austin, TX: ProEd.
McMahon, R. J. (1984). Behavioral checklists and rating forms. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 80–105). Elmsford, NY: Pergamon.
Meador, A. E., & Ollendick, T. H. (1984). Cognitive behavior therapy with children: An evaluation of its efficacy and clinical utility. Child and Family Behavior Therapy, 6, 25–44.
Meichenbaum, D. H. (1977). Cognitive-behavior modification. New York: Plenum.
Miller, L. C., Barrett, C. L., Hampe, E., & Noble, H. (1972). Comparison of reciprocal inhibition, psychotherapy, and waiting list control for phobic children. Journal of Abnormal Psychology, 79, 269–279.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252–283.
Nelson, R. O. (1977). Methodological issues in assessment via self-monitoring. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology (pp. 217–240). New York: Brunner/Mazel.
Nelson, R. O. (1981). Theoretical explanations for self-monitoring. Behavior Modification, 5, 3–14.
Nelson, R. O., Lipinski, D. P., & Boykin, R. A. (1978). The effects of self-recorder training and the obtrusiveness of the self-recording device on the accuracy and reactivity of self-monitoring. Behavior Therapy, 9, 200–208.
Nelson, W. M., III, & Finch, A. J., Jr. (1978). The new children's inventory of anger. Unpublished manuscript, Xavier University, OH.
Novick, J., Rosenfeld, E., Bloch, D. A., & Dawson, D. (1966). Ascertaining deviant behavior in children. Journal of Consulting and Clinical Psychology, 30, 230–238.
O'Leary, K. D., & Johnson, S. B. (1986). Assessment and assessment of change. In H. C. Quay & J. S. Werry (Eds.), Psychopathological disorders of children (3rd ed., pp. 423–454). New York: Wiley.
O'Leary, K. D., Romanczyk, R. G., Kass, R. E., Dietz, A., & Santogrossi, D. (1971). Procedures for classroom observations of teachers and parents. Unpublished manuscript, State University of New York at Stony Brook.
Ollendick, T. H. (1981). Self-monitoring and self-administered overcorrection: The modification of nervous tics in children. Behavior Modification, 5, 75–84.
Ollendick, T. H. (1983a). Development and validation of the Children's Assertiveness Inventory. Child and Family Behavior Therapy, 5, 1–15.
Ollendick, T. H. (1983b). Reliability and validity of the Revised-Fear Survey Schedule for Children (FSSC-R). Behaviour Research and Therapy, 21, 685–692.
Ollendick, T. H. (1995). Cognitive-behavioral treatment of panic disorder with agoraphobia in adolescents: A multiple baseline design analysis. Behavior Therapy, 26, 517–531.
Ollendick, T. H. (1996). Violence in society: Where do we go from here? (Presidential address). Behavior Therapy, 27, 485–514.
Ollendick, T. H., & Cerny, J. A. (1981). Clinical behavior therapy with children. New York: Plenum.
Ollendick, T. H., & Greene, R. W. (1990). Behavioral assessment of children. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (2nd ed., pp. 403–422). Elmsford, NY: Pergamon.
Ollendick, T. H., & Gruen, G. E. (1972). Treatment of a bodily injury phobia with implosive therapy. Journal of Consulting and Clinical Psychology, 38, 389–393.
Ollendick, T. H., & Hersen, M. (Eds.) (1983). Handbook of child psychopathology. New York: Plenum.
Ollendick, T. H., & Hersen, M. (Eds.) (1984). Child behavioral assessment: Principles and procedures. New York: Pergamon.
Ollendick, T. H., & Hersen, M. (1993). Child and adolescent behavioral assessment. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child and adolescent behavioral assessment (pp. 3–14). New York: Pergamon.
Ollendick, T. H., & Ollendick, D. G. (1997). General worry and anxiety in children. In Session: Psychotherapy in Practice, 3, 89–102.
Ollendick, T. H., Yang, B., King, N. J., Dong, Q., & Akande, A. (1996). Fears in American, Australian, Chinese, and Nigerian children and adolescents: A cross-cultural study. Journal of Child Psychology and Psychiatry, 37, 213–220.
Ollendick, T. H., Yule, W., & Ollier, K. (1991). Fears in British children and their relation to manifest anxiety and depression. Journal of Child Psychology and Psychiatry, 32, 321–331.
Olweus, D. (1979). Stability of aggressive reaction patterns in males: A review. Psychological Bulletin, 86, 852–875.
Patterson, G. R. (1976). The aggressive child: Victim and architect of a coercive system. In E. J. Mash, L. A. Hammerlynck, & L. C. Handy (Eds.), Behavior modification and families (pp. 267–316). New York: Brunner/Mazel.
Patterson, G. R. (1982). Coercive family process. Eugene, OR: Castalia.
Patterson, G. R., Ray, R. S., Shaw, D. A., & Cobb, J. A. (1969). Manual for coding family interaction (6th ed.). Unpublished manuscript, University of Oregon.
Peterson, D. R. (1961). Behavior problems of middle childhood. Journal of Clinical and Consulting Psychology, 25, 205–209.
Pollard, S., Ward, E., & Barkley, R. A. (1983). The effects of parent training and Ritalin on the parent–child interactions of hyperactive boys. Child and Family Behavior Therapy, 5, 51–69.
Prewitt-Diaz, J. (1989). The process and procedures for identifying exceptional language minority children. State College, PA: Pennsylvania State University.
Prinz, R. (Ed.) (1986). Advances in behavioral assessment of children and families. Greenwich, CT: JAI Press.
Quay, H. C. (1977). Measuring dimensions of deviant behavior: The Behavior Problem Checklist. Journal of Abnormal Child Psychology, 5, 277–287.
Quay, H. C., & Peterson, D. R. (1967). Manual for the Behavior Problem Checklist. Champaign, IL: University of Illinois.
Quay, H. C., & Peterson, D. R. (1975). Manual for the Behavior Problem Checklist. Unpublished manuscript.
Quay, H. C., & Peterson, D. R. (1983). Interim manual for the Revised Behavior Problem Checklist. Unpublished manuscript, University of Miami.
Rekers, G. A. (1984). Ethical issues in child behavioral assessment. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 244–262). Elmsford, NY: Pergamon.
Reynolds, C. R., & Kamphaus, R. W. (1992). Behavior
Ollendick, T. H., & King, N. J. (1991). Developmental assessment system for children. Circle Pines, MN:
factors in child behavioral assessment. In P. R. Martin American Guidance Service.
(Ed.), Handbook of behavior therapy and psychological Reynolds, C. R. & Richmond, B. O. (1985). Revised
science: An integrative approach (pp. 57±72). New York: children's manifest anxiety scale manual. Los Angeles:
Pergamon. Western Psychological Services.
Ollendick, T. H., & King, N. J. (1994). Assessment and Rutter, M. (1986). The developmental psychopathology of
treatment of internalizing problems: The role of long- depression: Issues and perspectives. In M. Rutter, C. E.
itudinal data. Journal of Consulting and Clinical Psychol- Izard, & P. B. Read (Eds.), Depression in young people:
ogy, 62, 918±927. Clinical and developmental perspectives (pp. 3±30). New
Ollendick, T. H., King, N. J., & Frary, R. B. (1989). Fears York: Guilford Press.
in children and adolescents in Australia and the United Rutter, M., & Garmezy, N. (1983). Developmental
States. Behaviour Research and Therapy, 27, 19±26. psychopathology. In E. M. Hetherington (Ed.), Sociali-
Ollendick, T. H., King, N. J., & Yule, W. (Eds.) (1994). zation, personality, and social development: Vol 14.
International handbook of phobic and anxiety disorders in Mussen's Handbook of child psychology (pp. 775±911).
children. Boston: Allyn & Bacon. New York: Wiley.
Ollendick, T. H., Matson, J. L., & Hetsel, W. J. (1985). Scherer, M. W., & Nakamura, C. Y. (1968). A fear survey
Fears in children and adolescents: Normative data. schedule for children (FSS-FC): A factor-analytic
Behaviour Research and Therapy, 23, 465±467. comparison with manifest anxiety (CMAS). Behaviour
Ollendick, T. H., & Mayer, J. (1984). School phobia. In S. Research and Therapy, 6, 173±182.
M. Turner (Ed.), Behavioral treatment of anxiety Schopler, E. (1974). Changes of direction with psychiatric
disorders (pp. 367±411). New York: Plenum. children. In A. Davids (Ed.), Child personality and
References 155

psychopathology: Current topics (Vol. I, pp. 205±236). G., & Yaeger, N. J. (1987). Self-statements in aggressive,
New York: Wiley. withdrawn, and popular children. Cognitive Therapy and
Shaffer, D. (1992). NIMH diagnostic interview schedule for Research, 11, 229±239.
children, Version 2.3. New York: Columbia University Swann, G. E., & MacDonald, M. L. (1978). Behavior
Division of Child & Adolescent Psychiatry. therapy in practice: A rational survey of behavior
Shapiro, E. S. (1984). Self-monitoring. In T. H. Ollendick therapists. Behavior Therapy, 9, 799±807.
& M. Hersen (Eds.), Child behavioral assessment: Ullmann, L. P., & Krasner, L. (Eds.) (1965). Case studies in
Principles and procedures (pp. 148±165). Elmsford, NY: behavior modification. New York: Holt, Rinehart, &
Pergamon. Winston.
Shapiro, E. S., McGonigle, J. J., & Ollendick, T. H. (1980). Vasquez Nuttall, E., DeLeon, B., & Del Valle, M. (1990).
An analysis of self-assessment and self-reinforcement in Best practice in considering cultural factors. In A.
a self-managed token economy with mentally retarded Thomas & J. Grimes (Eds.), Best practices in school
children. Journal of Applied Research in Mental Retarda- psychology II, (pp. 219±233). Washington, DC: National
tion, 1, 227±240. Association of School Psychologists.
Silverman, W. K., & Albano, A. M. (1996). Anxiety Vasquez Nuttall, E., Sanchez, W., Borras Osorio, L.,
Disorders Interview Schedule for DSM-IV. San Antonio, Nuttall, R. L., & Varvogil, L. (1996). Assessing the
TX: The Psychological Corporation. culturally and linguistically different child with emo-
Silverman, W. K., & Nelles, W. B. (1988). The anxiety tional and behavioral problems. In M. J. Breen & C. R.
disorders interview schedule for children. Journal of the Fiedler (Eds.), Behavioral approach to assessment of
American Academy of Child and Adolescent Psychiatry, youth with emotional/behavioral disorders: A handbook
27, 772±778. for school-based practitioners (pp. 451±502). Austin, TX:
Skinner, B. F. (1953). Science and human behavior. New ProEd.
York: Macmillan. Wahler, R. G. (1976). Deviant child behavior in the family:
Smith, R. E., & Sharpe, T. M. (1970). Treatment of a Developmental speculations and behavior change stra-
school phobia with implosive therapy. Journal of tegies. In H. Leitenberg (Ed.), Handbook of behavior
Consulting and Clinical Psychology, 35, 239±243. modification and behavior therapy (pp. 516±543). Engle-
Smucker, M. R., Craighead, W. E., Craighead, L. W., & wood Cliffs, NJ: Prentice-Hall.
Green, B. J. (1986). Normative and reliability data for Wahler, R. G., House, A. E., & Stambaugh, E. E. (1976).
the Children's Depression Inventory. Journal of Abnor- Ecological assessment of child problem behavior: A
mal Child Psychology, 14, 25±39. clinical package for home, school, and institutional
Spielberger, C. D. (1973). Preliminary manual for the settings. Elmsford, NY: Pergamon.
State±Trait Anxiety Inventory for Children (ªhow I feel Watson, J. B., & Rayner, R. (1920). Conditioned emotional
questionnaireº). Palo Alto, CA: Consulting Psychologist reactions. Journal of Experimental Psychology, 3, 1±14.
Press. Winett, R. A., Riley, A. W., King, A. C., & Altman, D. G.
Sroufe, L. A., & Rutter, M. (1984). The domain of (1989). Preventive strategies with children and families.
developmental psychopathology. Child Development, 55, In T. H. Ollendick & M. Hersen (Eds.), Handbook of
17±29. child psychopathology (2nd ed., pp. 499±521). New York:
Staats, A. W. (1975). Social behaviorism. Homewood, IL: Plenum.
Dorsey Press. World Health Organization (1991). International classifica-
Staats, A. W. (1986). Behaviorism with a personality: The tion of mental and behavioral disorders: Clinical descrip-
paradigmatic behavioral assessment approach. In R. O. tions and diagnostic guidelines (10th ed.). Geneva,
Nelson & S. C. Hayes (Eds.), Conceptual foundations of Switzerland: Author.
behavioral assessment (pp. 242±296). New York: Guil- Zatz, S., & Chassin, L. (1983). Cognitions of test-anxious
ford Press. children. Journal of Consulting and Clinical Psychology,
Stefanek, M. E., Ollendick, T. H., Baldock, W. P., Francis, 51, 526±534.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.07 Principles and Practices of Behavioral Assessment with Adults
STEPHEN N. HAYNES
University of Hawaii at Manoa, Honolulu, HI, USA

4.07.1 INTRODUCTION
4.07.2 CLINICAL JUDGMENTS AND FUNCTIONAL ANALYSIS IN BEHAVIORAL ASSESSMENT
4.07.3 CONCEPTUAL FOUNDATIONS OF BEHAVIORAL ASSESSMENT
4.07.3.1 Assumptions About the Causes of Behavior Disorders
4.07.3.1.1 Multiple causality
4.07.3.1.2 Multiple causal paths
4.07.3.1.3 Individual differences in causal variables and paths
4.07.3.1.4 Environmental causality and reciprocal determinism
4.07.3.1.5 Contemporaneous causal variables
4.07.3.1.6 Interactive and additive causality
4.07.3.1.7 Situations, setting events, and systems factors as causal variables
4.07.3.1.8 Dynamic causal relationships
4.07.3.1.9 Additional assessment implications of causal assumptions
4.07.3.2 Assumptions About the Characteristics of Behavior Problems
4.07.3.2.1 Behavior problems can have multiple response modes
4.07.3.2.2 Behavior problems have multiple parameters
4.07.3.2.3 Client can have multiple behavior problems
4.07.3.2.4 Behavior problems are conditional
4.07.3.2.5 The dynamic nature of behavior problems
4.07.4 METHODOLOGICAL FOUNDATIONS OF BEHAVIORAL ASSESSMENT
4.07.4.1 An Empirically Based Hypothesis Testing Approach to Assessment
4.07.4.2 An Individualized Approach to Assessment
4.07.4.3 Time-series Assessment Strategies
4.07.4.4 Quantitative and Qualitative Approaches to Behavioral Assessment
4.07.5 BEHAVIORAL ASSESSMENT METHODS
4.07.5.1 Behavioral Observation
4.07.5.1.1 Behavioral observation in the natural environment
4.07.5.1.2 Analogue observation
4.07.5.2 Self-monitoring
4.07.5.3 Psychophysiological Assessment
4.07.5.4 Self-report Methods in Behavioral Assessment
4.07.5.5 Psychometric Foundations of Behavioral Assessment
4.07.6 BEHAVIORAL AND PERSONALITY ASSESSMENT
4.07.7 SUMMARY
4.07.8 REFERENCES


4.07.1 INTRODUCTION

Psychological assessment is the systematic evaluation of a person's behavior. The components of psychological assessment include the variables selected for measurement (e.g., beliefs, social behaviors), the measurement methods used (e.g., interviews, observation), the reduction and synthesis of derived data (e.g., whether summary scores are calculated for a questionnaire), and the inferences drawn from the data (e.g., inferences about treatment effectiveness).

Psychological assessment affects the evolution of all social, cognitive, and behavioral science disciplines. The accuracy with which variables can be measured affects the degree to which relationships among behavioral, cognitive, environmental, and physiological events can be identified and explained. For example, our understanding of the impact of traumatic life stressors on immune system functioning, the relationship between depressed mood and self-efficacy beliefs, the effect of social response contingencies on self-injurious behavior, and the degree to which presleep worry moderates the impact of chronic pain on sleep quality depends on the strategies we use to measure these constructs.

Psychological assessment also affects clinical judgments. In adult treatment settings, clinical psychologists make judgments about a client's risk of suicide, whether treatment is warranted for a client, whether a client should be hospitalized, the variables that affect a client's behavior problems, and the best strategy for treating a client. Psychological assessment also helps the clinician select intervention goals and evaluate intervention effects.

There are many paradigms in psychological assessment. A psychological assessment paradigm is composed of a coherent set of assessment principles, values, assumptions, and methods of assessment. It includes assumptions about the relative importance of different types of behavior problems, the variables that most likely cause behavior problems, the most likely mechanisms through which causal variables operate, the importance and role of assessment in treatment design, and the best methods and strategies of assessment. A psychological assessment paradigm also includes guidelines for problem-solving, decision-making strategies, and data interpretation.

One powerful and evolving psychological assessment paradigm is behavioral assessment (Haynes & O'Brien, in press). Influenced by behavior-analytic, social-learning, and cognitive-behavioral therapy construct systems, the paradigm incorporates diverse methods of assessment but emphasizes naturalistic and analogue observation, self-monitoring, and electrophysiological measurement. The behavioral assessment paradigm also has many methodological elements, including an emphasis on the use of minimally inferential constructs, time-series measurement, hypothesis testing, and an idiographic (i.e., focused on an individual client) approach to assessment.

This chapter focuses on clinical applications of behavioral assessment with adults. The chapter will outline the conceptual and methodological elements of behavioral assessment and indicate how they aid clinical judgment and decision-making with adult clients. To illustrate the underlying assumptions, methods, and strategies of behavioral assessment, the first section presents a functional analytic causal model of a client: a vector diagram of a behavioral clinical case conceptualization. Following a discussion of the principles and methods of behavioral assessment, subsequent sections briefly discuss the history of behavioral assessment and psychometric considerations. Developments in behavioral assessment and differences between behavioral and nonbehavioral assessment paradigms are also discussed.

4.07.2 CLINICAL JUDGMENTS AND FUNCTIONAL ANALYSIS IN BEHAVIORAL ASSESSMENT

One of the most important and complex clinical judgments is the clinical case conceptualization. The clinical case conceptualization is a metajudgment: it is composed of many lower-level judgments regarding a client's behavior problems and the factors that affect them. It is a synthesis of assessment- and research-based hypotheses about a client, and its primary application is for designing the most effective treatment.

In behavioral assessment, clinical case conceptualization is often termed "functional analysis" (Haynes & O'Brien, 1990). (Terms with similar meanings include "clinical pathogenesis map" [Nezu & Nezu, 1989] and "behavioral case formulation" [Persons, 1989]. The term "functional analysis" is often used in applied behavior analysis to refer to the systematic manipulation of variables, in a controlled assessment setting, to determine their effect on designated dependent variables.) Functional analysis is a central component in the design of behavior therapy programs because of individual differences between clients: two clients can manifest the same primary behavior problem for different reasons and, consequently, receive different treatments. Behavioral interventions are often designed to
modify variables that are hypothesized to affect (i.e., account for variance in, trigger, maintain, moderate) problem behaviors and goals (Haynes, Spain, & Oliveira, 1993). Many permutations of causal variables can result in identical behavior problems, thereby resulting in different functional analyses and warranting different interventions.

Behavioral interventions are designed on the basis of many judgments about a patient reflected in a functional analysis. Clinical judgments with important implications for treatment decisions include the importance (e.g., severity, degree of risk associated with) of a client's multiple behavior problems; the relationships (e.g., strength, correlational vs. causal) among a client's multiple behavior problems; and the effects of behavior problems. Judgments about causal variables that affect the behavior problem (their importance, functional form, modifiability) are particularly important elements of the functional analysis.

There can be many errors in the clinical judgments that compose a clinical case conceptualization. Books by Eels (1997), Nezu and Nezu (1989), and Turk and Salovey (1988) discuss many of these errors. In brief, a clinician's judgments about a client can be affected by the clinician's preconceived beliefs, recent or particularly salient clinical experiences, selective attention to data that confirm the clinician's expectations, training-related biases, premature diagnoses, decisions based on initial impressions, and insufficient integrative abilities. These errors can reduce the validity of the case conceptualization and reduce the chance that the most effective treatment strategy will be selected for a patient.

The supraordinate goal of behavioral assessment is to reduce error and increase the validity of clinical judgments. The behavioral assessment paradigm suggests that clinical judgment error can be reduced to the degree that the assessor uses multiple sources of information, validated assessment instruments, and time-series measurement strategies; focuses on multiple response modes; minimizes the inferential characteristics of variables; addresses behavior–environment interactions; and is guided by data from previously published studies (e.g., Persons & Fresco, 1998).

Haynes (1994), Haynes et al. (1993), Nezu and Nezu (1989), and Nezu et al. (1996) have outlined two methods to help the clinician systematically integrate the complex information contained in a functional analysis. These methods are the clinical pathogenesis map and functional analytic causal models (FACMs). Both involve systematic construction of vector diagrams of the component clinical judgments in a functional analysis and are designed to promote less intuitive intervention decisions. The clinical pathogenesis map and FACMs graphically model the clinician's hypotheses about a patient's behavior problems and goals, including their relative importance, interrelationships, and sequelae, and about the strength, modifiability, and direction of action of causal variables. The FACM allows the clinician to estimate, quantitatively or qualitatively, the relative magnitude of effect of a particular treatment focus, given the clinician's hypotheses about the patient.

The FACM of a client presented in Figure 1 will be used to illustrate several underlying assumptions and methods of the behavioral assessment paradigm. The graphics in Figure 1 are explained in Figure 2. The client was a 35-year-old pregnant, married, unemployed woman (Mrs. A) who came to an outpatient mental health center complaining of constant headaches and sleeping problems. (This FACM was modified from one developed by Akiko Lau, University of Hawaii, and discussed in Haynes, Leisen, & Blaine, 1997.) She was referred by her neurologist and had undergone extensive neurological and endocrinological examinations with negative results.

Mrs. A was first interviewed in an unstructured, open-ended manner, with the goal of encouraging her to talk about her behavior problems, goals, and the factors affecting them (Haynes, 1978). Structured interviews were then conducted to acquire more detailed information on specific behavior problems mentioned in the unstructured interview, such as her anxiety symptoms (Brown, DiNardo, & Barlow, 1994), marital relationship concerns (O'Leary, 1987), headache (James, Thorn, & Williams, 1993), sleep disturbance (Lichstein & Riedel, 1994), and other factors depicted in Figure 1. Validated questionnaires on marital satisfaction, child behavior problems, anxiety, and life stressors were also administered to provide additional information on issues raised in the interview process.

Because headaches and sleep difficulties were important problems, Mrs. A began daily self-monitoring after the second interview and continued to self-monitor throughout the assessment-treatment process. She recorded headache intensity and symptoms four times per day, and each morning she recorded sleep onset and awakenings for the previous night.

Marital conflict was a major concern for Mrs. A and one possible cause of her psychophysiological problems. Consequently, a one-and-a-half-hour assessment session (the third session) was conducted with her and her husband. During the session, the couple underwent a
[Figure 1 (vector diagram not reproduced). Variables shown in the FACM for Mrs. A: neurological dysfunction; attention and help from husband; constant headache; noncompliant daughter; frequent mother–daughter conflict; poor parenting skills; anxiety: physiological hyperreactivity; intermittent sleep maintenance problems; impaired concentration; escalating marital conflict; husband's alcohol use; excessive weight gain; dysfunctional marital problem solving; pregnancy; poor health behaviors.]

Figures 1 and 2. An FACM of an outpatient woman with headaches and sleep disorders. The figures illustrate the relative importance of behavior problems, interrelationships among behavior problems, behavior problem sequelae, causal relationships, and the modifiability of causal variables.
[Figure 2 (key to the diagram symbols, graphics not reproduced). Illustrating a functional analysis with a functional analytic causal model.
Importance/modifiability of variables: shown by the width of the variable boundary and by coefficients (e.g., .2 = low importance/modifiability; .8 = high importance/modifiability).
Type and direction of relationship between variables: noncausal, correlational; unidirectional causal; bidirectional causal; mediating relationship.
Symbols: original, unmodifiable causal variable (X); causal variable or mediating variable (X); behavior problem or effect of behavior problem (Y, Z).
Strength of relationship between variables: indicated by arrow thickness, and more precisely by coefficients (.2 = weak; .4 = moderate; .8 = strong).]
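The quantitative use of an FACM (modifiability of causal variables, path coefficients, and importance ratings of behavior problems) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's procedure: it assumes that the estimated magnitude of effect of targeting a causal variable is its modifiability, multiplied by the product of the coefficients along each causal path to a behavior problem and by that problem's importance, summed over all paths. Every variable name and number below is a hypothetical placeholder and is not read from Figure 1.

```python
# Toy functional analytic causal model (FACM). Every name and number below is a
# hypothetical placeholder chosen for illustration; none is read from Figure 1.

# Modifiability of each causal variable (0 = unmodifiable, 1 = fully modifiable).
MODIFIABILITY = {
    "marital_conflict": 0.8,
    "poor_parenting_skills": 0.8,
    "pregnancy": 0.0,
}

# Rated importance of each behavior problem (arbitrary units).
IMPORTANCE = {"headache": 80, "sleep_problems": 40}

# Directed causal paths: source -> {target: path coefficient}.
PATHS = {
    "marital_conflict": {"hyperreactivity": 0.8},
    "poor_parenting_skills": {"hyperreactivity": 0.2},
    "hyperreactivity": {"headache": 0.8, "sleep_problems": 0.4},
    "headache": {"sleep_problems": 0.4},
    "pregnancy": {"sleep_problems": 0.8},
}


def path_strengths(node, strength=1.0, visited=frozenset()):
    """Yield (behavior_problem, product of coefficients) for each acyclic path from node."""
    visited = visited | {node}
    for target, coefficient in PATHS.get(node, {}).items():
        if target in visited:
            continue  # do not loop around bidirectional relationships
        reached = strength * coefficient
        if target in IMPORTANCE:
            yield target, reached
        yield from path_strengths(target, reached, visited)


def magnitude_of_effect(cause):
    """Assumed convention: modifiability x path strength x importance, summed over paths."""
    return sum(
        MODIFIABILITY[cause] * strength * IMPORTANCE[problem]
        for problem, strength in path_strengths(cause)
    )


def rank_treatment_foci():
    """Return causal variables ordered from largest to smallest estimated effect."""
    return sorted(MODIFIABILITY, key=magnitude_of_effect, reverse=True)


if __name__ == "__main__":
    for cause in rank_treatment_foci():
        print(f"{cause}: {magnitude_of_effect(cause):.2f}")
```

In this sketch a variable with zero modifiability scores zero however strong its paths are, and a variable that reaches a behavior problem both directly and through a mediating variable is credited for each path. A clinician would treat such scores as rough, hypothesis-dependent rankings rather than as measurements.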
conjoint structured interview about their marital relationship (e.g., perceived conflicts, strengths, spousal excesses and deficits, marital goals), and Mr. A also completed a marital satisfaction questionnaire. The couple participated in an analogue communication assessment, in which they discussed for 10 minutes their conflicts regarding disciplining their 12-year-old daughter. The conversation was recorded and later coded by the assessor.

Based on interview and questionnaire data, conflicts with her daughter were another source of distress for Mrs. A. A joint assessment session (the fourth session) was conducted in which the daughter was interviewed about her perception of family issues. Also, the mother and daughter were observed in two structured assessment settings: while trying to resolve one of their frequent sources of conflict (the daughter's refusal to do her school work) and while Mrs. A attempted to help her daughter with some school work.

The functional analytic causal model of Mrs. A emphasizes many elements of a clinical case conceptualization that are important for behavioral treatment decisions. (Many other factors, e.g., treatment history, client cognitive resources, cost-efficiency of treatments, and responses of persons in the client's environment, affect treatment design in addition to those included in a FACM.) Important and controllable functional relationships are highlighted in the FACM because of their clinical utility. The FACM for Mrs. A recognizes some unmodifiable variables, but these have no clinical utility. Unidirectional and bidirectional causal relationships are shown because they can significantly affect decisions about what variables should be targeted in treatment. Treatment decisions are also affected by the strength of causal relationships and the degree of modifiability of causal variables, depicted in Figure 1.

Before considering the specific assumptions of the behavioral assessment paradigm that influenced the assessment strategy outlined above and the clinical judgments summarized in Figure 1, several additional attributes of the functional analysis should be briefly noted. First, the functional analysis (and the FACM) reflects the clinician's current judgments about a client. It is a subjectively derived (although data-influenced), hypothesized model. It is also unstable, in that it can change with changes in naturally occurring causal variables, with the acquisition of additional data, and as a result of treatment. For example, a change in the variables that affected Mr. A's drinking could lead to a significant change in the FACM for Mrs. A. A FACM for a client may be conditional. For example, some variables affecting Mrs. A's behavior problems may change after the birth of her child.

A final note concerns the limitations of the functional analysis. Despite its central role in behavior therapy, the functional analysis is limited in several ways: (i) the best assessment methods for developing a functional analysis have not been identified, (ii) the best methods for formulating a functional analysis from assessment data have not been determined, and (iii) for many behavior problems, the incremental utility and cost-effectiveness of the functional analysis have yet to be established.

4.07.3 CONCEPTUAL FOUNDATIONS OF BEHAVIORAL ASSESSMENT

Many methodological elements of the behavioral assessment paradigm, such as the preferred methods of assessment and the variables targeted in assessment, are influenced by its underlying assumptions. The following sections review two sets of assumptions: (i) those concerning the causal factors associated with behavior problems and goals, and (ii) those concerning the characteristics of behavior problems. This section also discusses implications of these assumptions for behavioral assessment strategies with adults. More extensive discussions of underlying assumptions in behavioral assessment can be found in Bandura (1969), Barrios (1988), Bellack and Hersen (1988), Bornstein, Bornstein, and Dawson (1984), Ciminero, Calhoun, and Adams (1986), Cone (1988), Eysenck (1986), Haynes (1978), Hersen and Bellack (1998), Johnston and Pennypacker (1993), Kratochwill and Shapiro (1988), Mash and Terdal (1988), Nelson and Hayes (1986), O'Donohue and Krasner (1995), Ollendick and Hersen (1984, 1993), Strosahl and Linehan (1986), and Tryon (1985).

4.07.3.1 Assumptions About the Causes of Behavior Disorders

Psychological assessment paradigms differ in the assumptions they make regarding the causes of behavior disorders. Although causal assumptions and the identification of causal variables in pretreatment assessment are less important for treatment paradigms with limited treatment options (e.g., Gestalt, transactional therapies), the identification of potential causal variables is a primary objective in pretreatment behavioral assessment. This is because hypothesized controlling variables are targeted for modification in behavior therapy and it is
presumed that causal variables may vary across patients with the same behavior problems. The variables presumed to cause Mrs. A's sleep problems may not operate for other patients with identical sleep problems. Consequently, other patients with the same sleep disorder would be treated differently.

Behavioral assessment strategies are guided by several empirically based and interrelated assumptions about the causes of behavior problems. These assumptions include multiple causality; multiple causal paths; individual differences in causal variables and paths; environmental causality and reciprocal determinism; contemporaneous causal variables; the dynamic nature of causal relationships; the operation of moderating and mediating variables; interactive and additive causality; and situations, setting events, and systems factors as causal variables.

4.07.3.1.1 Multiple causality

Behavior problems are often the result of multiple causal variables acting additively or interactively (Haynes, 1992; Kazdin & Kagan, 1994). This is illustrated in Figure 1 by the multiple factors influencing Mrs. A's headaches. Although some behavior problems and the behavior problems of some individuals (e.g., asthma episodes that are mostly triggered by exposure to specific allergens) may primarily be the result of single causal variables, multivariate causal models have been proposed for most adult behavior disorders, including schizophrenia, chronic pain, sleep disorders, paranoia, personality disorders, child abuse, and many other behavior disorders (see reviews in Gatchel & Blanchard, 1993; Sutker & Adams, 1993).

4.07.3.1.2 Multiple causal paths

A causal variable may also affect a behavior problem through multiple paths. Note that for Mrs. A, physiological hyperreactivity can directly influence sleep, but hyperreactivity can also influence sleep because it produces headaches. Similarly, there may be many paths through which social isolation increases the risk of depression (e.g., by restricting the potential sources of social reinforcement, by increasing dependency on reinforcement from a few persons) and many paths through which immune system functioning can be impaired by chronic life stressors (e.g., dietary changes, reduction of lymphocyte levels).

4.07.3.1.3 Individual differences in causal variables and paths

Models of causality for behavior problems are further complicated because the permutations of causal variables and causal mechanisms can differ across clients with the same behavior problem. For example, there can be important differences in the causal variables and causal paths among persons who report chronic pain (Waddell & Turk, 1992), exhibit self-injurious behaviors (Iwata et al., 1994), or complain of difficulties in initiating and maintaining sleep (Lichstein & Riedel, 1994). Some differences in causality may covary with dimensions of individual differences. For example, the causes of depression, marital distress, and anxiety may vary as a function of ethnicity, age, gender, religion, sexual orientation, and economic status (see discussions in Marsella & Kameoka, 1989).

4.07.3.1.4 Environmental causality and reciprocal determinism

The behavioral assessment paradigm also stresses the importance of environmental causality and behavior–environment interactions (McFall & McDonel, 1986). Many studies have shown that it is possible to account for variance in many behavior problems by examining variance in response contingencies (e.g., how others respond to self-injurious behaviors, depressive statements, or asthma episodes can affect the parameters of those behaviors; a "parameter" of a behavior is a quantitative dimension, such as rate, duration, magnitude, or cyclicity), situational and antecedent stimulus factors (e.g., alcohol use may vary reliably across social settings, and anxiety episodes may be more likely in more interpersonally stressful environments), and other learning principles (e.g., modeling, stimulus pairings; see discussions in Eysenck & Martin, 1987; O'Donohue & Krasner, 1995).

An important element of the principle of environmental causality is reciprocal determinism (i.e., bidirectional causality, reciprocal causation; Bandura, 1981): the idea that two variables can affect each other. In a clinical context, reciprocal determinism refers to the assumption that clients can behave in ways that affect their environment, which, in turn, affects their behavior. For example, a client's depressive behaviors (e.g., reduced social initiations and positive talk) may result in the withdrawal of the client's friends, increasing the client's loss of social reinforcement and increasing the client's depressive mood and behaviors. A hospitalized paranoid patient may behave
suspiciously with staff and other patients. These behaviors may cause others to avoid the
patient, talk about him/her, and behave in many other ways that confirm and strengthen the
patient's paranoid thoughts. With Mrs. A, we presume that there are some ways that Mrs. A is
behaving that might contribute to her marital distress and difficulties. An emphasis on
bidirectional causation does not negate the possibility of unidirectional environmental
causal factors. In some distressed marriages, for example, a spouse may independently be
contributing to marital distress by being verbally abusive or unsupportive. However, pure
unidirectional causation may be rare.

Viewing clients within a reciprocal determinism framework affects the focus of assessment
and treatment. Clients are considered active participants in their own lives, as active
contributors to their goal attainment and to their behavior problems. Consequently, clients
are encouraged to participate actively in the assessment-treatment process. Assessment goals
include identifying the ways that clients may be contributing to their behavior problems and
ways they can contribute to the attainment of treatment goals.

One consequence of reciprocal determinism is that labels such as "behavior problem
(dependent variable)" or "causal variable (independent variable)" become less distinct.
Often, either variable in a bidirectional relationship can be described as a behavior problem
and a causal variable; each variable can be either or both. Which variables are described as
problem vs. cause depends more on convention or the intent of the assessor and client than on
the characteristic of the functional relationships. As indicated in the functional analytic
causal models (e.g., Haynes, 1994), treatment decisions are dictated more by estimates of the
strength of causal relationships than by the label of the variable.

The concept of reciprocal determinism also promotes a behavioral skills focus in assessment
and treatment. A client's behavior problems are presumed to be a partial function of their
behavioral repertoire. Their behavioral excesses, deficits, and strengths are presumed to
affect whether they will experience problems in some situations, the type and magnitude of
behavior problem experienced, and how long the problem persists. For example, a behavior
skills assessment with a socially anxious client might focus on specific deficits that
prevent the client from forming more frequent and satisfying friendships. Similar to a task
analysis, the necessary skills for attaining a treatment goal (e.g., establishing positive
friendships) are broken down into molecular components and the client's abilities on these
components are evaluated. With Mrs. A, it would be important to determine what additional
parenting and marital communication skills might help Mrs. A develop a more positive
relationship with her daughter and husband. An example would include the ability to clearly
and positively talk about her ideas and concerns.

Cognitive skills are often targeted by behavioral assessors. The clients' beliefs,
expectancies, deductions, and other thoughts regarding their capabilities in specific
situations (e.g., Linscott & DiGiuseppe, 1998) are often considered essential elements for
effective functioning. A molar-level skill is adaptive flexibility: an overarching goal of
behavior therapy is to help the client to develop behavior repertoires that facilitate
adaptability to various, novel, and changing environments.

4.07.3.1.5 Contemporaneous causal variables

The behavioral assessment paradigm emphasizes the relative importance and utility of
contemporaneous, rather than historical, causal factors. It is presumed that a more
clinically useful, and sometimes more important, source of variance in a client's behavior
problems can be identified by examining the client's current, rather than historical,
learning experiences, social contingencies, and thoughts. For example, suspicious thoughts
and behaviors can undoubtedly be learned as a child from parental models (e.g., parents who
teach a child to be mistrustful of others' intentions; Haynes, 1986). However, early
parent-child learning experiences are difficult to identify and "treat" in therapy.
Assessment efforts might more profitably be focused on contemporaneous causal variables for
paranoid thoughts, such as a restricted social network that precludes corrective feedback
about misperceptions, social skills deficits, hypersensitivity to negative stimuli or
negative scanning, or failure to consider alternative explanations for ambiguous events.
These can also be important causal variables for a client's paranoid behaviors and are more
amenable than historical events to intervention.

The emphasis on contemporaneous, reciprocal, behavior-environment interactions dictates an
emphasis on particular methods of assessment. For example, naturalistic observation, analogue
observation, and self-monitoring are better suited than retrospective questionnaires to
measuring contemporaneous, reciprocal dyadic interactions. Additionally, in behaviorally
oriented interviews and questionnaires clients are more often asked about current than about
past behavior-environment interactions (Jensen & Haynes, 1986; Sarwer & Sayers, 1998).
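
Observational data on reciprocal dyadic interactions of the kind described above are often
summarized as conditional (transitional) probabilities between adjacent behaviors. The
following is a minimal sketch in Python, not a method taken from the sources cited in this
chapter. The behavior codes, the observation record, and the function name
lag1_probabilities are all hypothetical and were invented for illustration.

```python
from collections import Counter

# Hypothetical observation record (an assumption for illustration only):
# each tuple is (actor, behavior code), listed in the order observed.
sequence = [
    ("client", "criticize"), ("partner", "withdraw"),
    ("client", "criticize"), ("partner", "withdraw"),
    ("client", "positive"), ("partner", "positive"),
    ("client", "criticize"), ("partner", "criticize"),
    ("client", "positive"), ("partner", "positive"),
]


def lag1_probabilities(events):
    """Estimate P(next event | current event) from adjacent event pairs."""
    pair_counts = Counter(zip(events, events[1:]))
    # The final event has no successor, so it is not counted as an antecedent.
    antecedent_counts = Counter(events[:-1])
    return {
        (current, following): count / antecedent_counts[current]
        for (current, following), count in pair_counts.items()
    }


probs = lag1_probabilities(sequence)
for (current, following), p in sorted(probs.items()):
    print(f"P({following} | {current}) = {p:.2f}")
```

Because the table contains both client-to-partner and partner-to-client transitions, it can
suggest influence in either direction. With a record this short the estimates are unstable,
so they would serve only as hypotheses to be tested with further assessment.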

An emphasis on contemporaneous reciprocal determinism is compatible with the causal role of
genetic and physiological factors, and early learning experiences. Evidence from many sources
suggests that genetic factors, neurophysiological mechanisms, medical disorders, and early
learning (e.g., early traumatic experiences) can serve as important causal variables for
behavior problems (see reviews in Asteria, 1985; Haynes, 1992; Sutker & Adams, 1993).
Sometimes, physiological, behavioral, and cognitive variables are different modal expressions
of the same phenomena.

The emphasis on contemporaneous behavior and environmental causality is evident in the
contemporaneous focus of many behavioral assessment interviews. However, behavioral assessors
differ in their emphasis on historical data. Joseph Wolpe, for example, emphasized the
importance of gathering a complete clinical history for patients before therapy (Wolpe &
Turkat, 1985). Often historical information can aid in the development of hypotheses
regarding the time-course and causes of behavior problems. For example, a careful interview
about past behaviors, events, and treatment experiences can help determine if Mrs. A may be
experiencing neurological deficits (e.g., she had a minor head injury two years prior to this
assessment) and may help estimate the degree to which her health-related behaviors (e.g.,
poor diet and exercise habits) are modifiable.

4.07.3.1.6 Interactive and additive causality

In the section on multiple causality, I noted that a behavior problem often results from
multiple causal factors acting concurrently; this is an additive model of causality. Causal
variables can also interact; this is a multiplicative model of causality. Interactive
causality occurs when the causal effects of one variable vary as a function of the values of
another causal variable (see discussion in Haynes, 1992). Furthermore, the effects of the
variables in combination often cannot be predicted by simply summing their independent
effects. A longitudinal study by Schlundt, Johnson, and Jarrell (1986) demonstrated
interactive causal effects with bulimic clients. The probability of postmeal purging was
significantly related to a history of recent purges (i.e., purging tended to occur in
cycles). However, the social context within which eating occurred affected the strength of
the relationship between those two variables. The chance of purging was higher when the
person had recently purged, but especially higher if the person ate alone. The effect of each
causal variable (purging history, social context) depended on the value of the other causal
variable.

Diathesis-stress models of psychopathology are common exemplars of interactive causality
(e.g., Barnett & Gotlib, 1988). Diathesis-stress models suggest that environmental stressors
and physiological or genetic vulnerability (or genetic and later physiological challenges)
interact to affect the probability that a particular behavior disorder will occur.

4.07.3.1.7 Situations, setting events, and systems factors as causal variables

One assumption of the behavioral assessment paradigm is that the probability (or another
parameter such as magnitude) of behavior problems varies across situations, settings, and
antecedent stimuli (e.g., discrete and compound antecedent stimuli, contexts, discriminative
stimuli for differential reinforcement contingencies); behavior problems are conditional. The
conditional nature of behavior problems has important causal implications because it marks
the differential operation of causal factors. Mrs. A was more likely to experience anxiety
symptoms in the presence than in the absence of her daughter. The presence of the daughter
marked the operation of a causal relationship and suggested to the assessor that the
mother-daughter interactions should be explored in greater detail.

A situational model of behavior problems contrasts with traditional personality trait models,
which emphasize a higher degree of cross-situational consistency of behavior and some
enduring trait of the person as the primary causal variable for behavior problems (see
subsequent discussion of personality assessment). However, situational and trait models of
behavior problems can be compatible. Knowledge of the robust behaviors of a client (e.g.,
those behaviors that do not vary to an important degree across conditions) and knowledge of
the situational factors that influence the client's behavior can both contribute to a
functional analysis of the client. This "interactional" perspective is a welcome refinement
of exclusively trait models (see discussions by Mischel, 1968; McFall & McDonel, 1986).

Cross-situational consistency can vary across different behaviors, individuals, and
situations; relative cross-situational consistency in behavior can occur, but may not.
Because the assessor does not have prior knowledge of the degree of cross-situational
consistency of a client's behavior problems, the assessor must evaluate their conditional
nature. Unfortunately, strategies and classification schema for situations have not been
developed.
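
The distinction drawn in Section 4.07.3.1.6 between additive and interactive causality can be
made concrete with a small numerical sketch. The Python example below assumes four
hypothetical probabilities of postmeal purging. They are invented for illustration and are
not the findings of Schlundt, Johnson, and Jarrell (1986). The condition labels and function
names are likewise assumptions.

```python
# Hypothetical probabilities of postmeal purging in four conditions.
# These values are invented for illustration; they are NOT data from
# Schlundt, Johnson, and Jarrell (1986).
p_purge = {
    ("no_recent_purge", "ate_with_others"): 0.10,
    ("recent_purge", "ate_with_others"): 0.30,
    ("no_recent_purge", "ate_alone"): 0.15,
    ("recent_purge", "ate_alone"): 0.60,
}


def additive_prediction(p):
    """Predict the joint condition by summing the two separate effects."""
    baseline = p[("no_recent_purge", "ate_with_others")]
    effect_recent = p[("recent_purge", "ate_with_others")] - baseline
    effect_alone = p[("no_recent_purge", "ate_alone")] - baseline
    return baseline + effect_recent + effect_alone


def interaction_contrast(p):
    """Observed joint probability minus the additive prediction."""
    return p[("recent_purge", "ate_alone")] - additive_prediction(p)


predicted = additive_prediction(p_purge)
contrast = interaction_contrast(p_purge)
print(f"additive prediction for recent purge + ate alone: {predicted:.2f}")
print(f"interaction contrast: {contrast:.2f}")
```

A contrast near zero would be consistent with an additive model. A contrast well above or
below zero suggests that the effect of one variable depends on the value of the other, as in
the bulimia example. In practice such a contrast would need to be evaluated against
measurement error before any causal inference is drawn.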

Although the behavioral assessment paradigm emphasizes contemporaneous causal factors (e.g.,
a SORC [stimulus, organism, response, contingency] model; Goldfried, 1982), extended social
systems can play an important causal role. Mrs. A's marital satisfaction may be affected by
her relationships with her friends and family. Her daughter's behavior problems in the home
may be affected by the social and academic environment of her school. Assessment efforts
cannot be confined to individual elements extracted from a complex array of interacting
variables. Chaos theory and dynamic modeling also suggest that it may be difficult to develop
powerful predictive or explanatory models unless we measure behavior within the complex
dynamical systems in which the behavior is imbedded (Vallacher & Nowack, 1994).

4.07.3.1.8 Dynamic causal relationships

All elements of a functional analytic causal model (for example, the causal variables that
affect a client's behavior problems, the strengths of causal relationships, and moderating
variables) are nonstationary (Haynes, Blaine, & Meyer, 1995). Causal relationships for a
client can be expected to change across time in several ways. First, new causal variables may
appear: Mr. or Mrs. A may develop new health problems; Mr. A may lose his job. Second, a
causal variable may disappear: Mr. A may stop drinking; Mrs. A may give birth to her baby.
Third, the strength and form of a causal relationship are likely to change over time. There
may be a decrease in sleep disruption originally triggered by a traumatic event; marital
distress that originally caused depressive symptoms may be exacerbated by continued
depressive reactions. Fourth, moderating variables may change: Clients may change their
expectancies about the beneficial effects of alcohol (Smith, 1994). In causal models of
behavior disorders, a moderating variable is one that changes the relationship between two
other variables. For example, "social support" would be a moderating variable if it affected
the probability that an environmental disaster would be associated with symptoms of
posttraumatic stress disorder (PTSD).

4.07.3.1.9 Additional assessment implications of causal assumptions

Emphases on multivariate, idiosyncratic, interactive, reciprocal deterministic, and dynamic
causal models have several implications for behavioral assessment strategies that were
briefly noted in the previous sections and will be discussed in greater detail later in this
chapter. The assessment implications include: (i) pretreatment assessment should be broadly
focused on multiple variables; (ii) classification will usually be insufficient to identify
the causal variables operating for a particular client and insufficiently precise to identify
the client's behavior problems; (iii) assessors should avoid "premature" or "biased"
presumptions of causal relationships for a client and draw data-based inferences whenever
possible; (iv) a valid functional analysis is most likely to result from assessment that
focuses on multiple response modes, uses multiple methods, and gathers data from multiple
sources; (v) it is important to identify the mechanisms underlying causal relationships (see
also discussion in Shadish, 1996); and (vi) a time-series assessment strategy can be an
effective method of identifying and tracking behavior problems and potential causal factors.

4.07.3.2 Assumptions About the Characteristics of Behavior Problems

Behavioral assessment strategies and the resulting clinical case conceptualizations are
strongly affected by assumptions of the behavioral assessment paradigm about the
characteristics of behavior problems. Several of these assumptions were mentioned earlier in
this chapter. They include the multimodal and multiparameter characteristics of behavior
problems, differences among clients in the importance of behavior problem modes and
parameters, the complex interrelationships among a client's multiple behavior problems, and
the conditional and dynamic natures of behavior problems.

4.07.3.2.1 Behavior problems can have multiple response modes

Adult behavior problems can involve more than one response mode. For example, PTSD can
involve physiological hyperreactivity, subjective distress, avoidance of trauma-related
situations and thoughts, and distressing recollections and dreams of the traumatic event
(American Psychiatric Association, 1994; Kubany, 1994). The degree of covariation among modes
of a behavior problem can vary across persons (note that the Diagnostic and statistical
manual of mental disorders [4th ed.] requires that a client manifest only one of five major
[category B] symptoms for a diagnosis of PTSD). For
example, some PTSD clients show only slight evidence of distressing recollections of the
event but show significant physiological reactivity to event-related cues, while others show
the opposite pattern.

Low levels of covariation among the multiple modes of behavior problems have been found in
both group nomothetic and single-subject time-series research (see discussion in Gannon &
Haynes, 1987). Acknowledging that the apparent magnitude of covariation reflects the ways in
which the modes are measured, these findings suggest that behavior problem modes can be
discordant across persons and for one person across time. (See discussions in Cone [1979] and
Eifert and Wilson [1991]. Different response modes are often measured with different methods.
Different response modes can also have different response latencies, which can reduce
apparent magnitudes of covariation if all modes are sampled simultaneously.)

Discordance among response modes for some clients has many clinical implications. Different
response modes of a behavior problem may have different latencies in response to treatment.
Furthermore, some treatments may have a stronger effect on some modes than on others.
Consequently, the effects of treatment may be judged differently depending on which response
mode is measured by which method. More important for a functional analysis of a patient,
different response modes can be affected by different causal factors and respond differently
to the same causal factor. Therefore, the selection of the most important mode for a client's
behavior problem can affect the functional analysis and the intervention program designed for
that client.

Assessment strategies should be congruent with the multimodal nature of behavior problems.
First, because causal inferences are an important component of the functional analysis and
guide treatment decisions, the primary mode of a client's behavior problem should be
identified. Second, as the prior example of PTSD illustrated, diagnosis may be helpful but is
usually an insufficient basis for the identification of the most important response modes for
a client. Third, inferences regarding treatment effects for one mode may not be generalizable
to other modes. In sum, behavioral assessment should have a multimodal focus.

4.07.3.2.2 Behavior problems have multiple parameters

As previously mentioned, each behavior problem mode can have multiple parameters. Parameters
are quantitative dimensions, such as duration, magnitude, and rate, that can be used to
describe a behavior. As with response modes, there are important between-client differences
in the relative importance of different behavior problem parameters. For example, Mrs. A
reported mildly intense but constant headaches and intermittent but severe sleep disruption.
Other clients could report the same problems with different parameters (e.g., infrequent but
debilitatingly intense headaches). Similarly, some clients report frequent but severe
episodes of depression that last for only a few days; others report episodes of mild to
moderate depression that can last months.

Multiple parameters of behavior disorders have important implications for the functional
analysis because different parameters may be affected by different causal variables. For
example, Barnett and Gotlib (1988) reviewed the literature on depression and suggested that
learned helplessness beliefs seem to affect the duration and magnitude of depressive
behaviors. However, learned helplessness beliefs could not account for the onset of
depressive behaviors. Consequently, the functional analysis and treatment of a client with
frequent depressive episodes might be different from the functional analysis and treatment of
a client with infrequent but persistent depressive episodes.

One assessment implication that permeates many assumptions underlying behavioral assessment
is that aggregated measures are insufficiently precise for a functional analysis.
Between-person differences in the importance of behavior problem modes and parameters mandate
careful specification and measurement of multiple modes and parameters. Measures of behavior
problems that aggregate across modes, parameters, and situations (for example, a single
measure of "depression" or "anxiety") will often be insufficiently precise for the
development of a functional analysis and for the design of intervention programs. Aggregated
indices may be helpful as a general index of the magnitude or generality (or unconditional
probability) of a behavior problem, but are insufficient for the precise inferences that are
necessary for the evolution of assessment paradigms, functional analyses, and treatment
strategies.

One helpful strategy for gathering more precise data on behavior problem parameters is the
construction of a time-course for a behavior problem, that is, a time line reflecting
occurrence, magnitude, and duration of behavior problems. An example of this method is the
"timeline followback" by Sobell, Toneatto, and Sobell (1994) to establish a time-course of
substance use.
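
The point that an aggregated index can mask differences in behavior problem parameters can be
illustrated with a short Python sketch. It assumes two hypothetical clients whose depressive
episodes were logged over the same 90-day window. The episode records, the 0-10 magnitude
scale, the "aggregate burden" index, and the function name describe are all assumptions
introduced for this illustration.

```python
from statistics import mean

OBSERVATION_DAYS = 90  # hypothetical monitoring window

# Each episode is (duration in days, magnitude on a hypothetical 0-10 scale).
client_a = [(3, 8)] * 6   # frequent, brief, severe episodes
client_b = [(48, 3)]      # one long, mild episode


def describe(episodes, window_days=OBSERVATION_DAYS):
    """Summarize rate, duration, and magnitude, plus one aggregated index."""
    durations = [duration for duration, _ in episodes]
    magnitudes = [magnitude for _, magnitude in episodes]
    return {
        "rate_per_30_days": 30 * len(episodes) / window_days,
        "mean_duration_days": mean(durations),
        "mean_magnitude": mean(magnitudes),
        # Aggregated index: sum of duration x magnitude across episodes.
        "aggregate_burden": sum(d * m for d, m in episodes),
    }


summary_a = describe(client_a)
summary_b = describe(client_b)
print("Client A:", summary_a)
print("Client B:", summary_b)
```

Both hypothetical clients receive the same aggregated index, although their rates, durations,
and magnitudes differ sharply. This is the situation in which, as argued above, a single
score would be too imprecise to guide a functional analysis.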

4.07.3.2.3 Clients can have multiple behavior problems

Many clients have multiple behavior problems. For example, Beck and Zebb (1994) reported that
65-88% of panic disorder patients have a coexisting behavior disorder; Figley (1979) reported
a high incidence of comorbidity for PTSD; and Regier et al. (1990) noted that drug users often
have other concurrent behavior problems. Similar observations of comorbidity have been
reported for panic disorders (Craske & Waikar, 1994) and depression (Persons & Fresco, 1996).

A client with multiple behavior problems challenges the clinical judgment capabilities of the
assessor and complicates the functional analysis because the mode and parameter of each
behavior problem can be affected by multiple causal variables: functional analytic causal
models were developed partly as a method of organizing these clinical judgments.
Additionally, multiple behavior problems can have complex causal and noncausal relationships.
Note the relationships between sleep and headache problems for Mrs. A illustrated in
Figure 1. Beach, Sandeen, and O'Leary (1990) observed a reciprocal causal relationship
between marital distress and depression for some patients (with many variables moderating
that relationship). Hatch (1993) observed that depression can affect pain perception of
headache patients and that headaches may contribute to the occurrence of depressive episodes
for some patients.

The assumption that clients may have more than one behavior problem has several implications
for behavioral assessment strategies. First, initial assessment, such as the intake interview
(Kerns, 1994; Sarwer & Sayers, 1998), must be broadly focused to identify a client's multiple
behavior problems. Premature narrowing of the assessment focus may preclude the
identification of important behavior problems. Following a broad survey, subsequent
assessment efforts can be focused on problem specification and the functional relationships
relevant to each behavior problem. The functional analysis and intervention decisions will
also be affected by estimates of the form and strength of relationship among, and relative
importance of, a client's behavior problems. Multiple behavior problems also mandate a
multivariate focus in treatment outcome evaluation.

Sometimes, the identification of functional response groups will aid in treatment decisions
(Sprague & Horner, 1992). A functional response group is a set of behaviors, which may differ
in form, that are under the control of the same contingencies (a set of behaviors that has
the same function; Haynes, 1996). Adaptive elements of the response class can sometimes be
strengthened to weaken maladaptive elements of that class. Relaxation skills may be taught as
a substitute for dysfunctional ways of reducing physiological hyperarousal. Effective
communication skills may reduce the frequency of self-injurious behavior for some
developmentally disabled individuals (Durand & Carr, 1991).

4.07.3.2.4 Behavior problems are conditional

As noted earlier in this chapter, behavior problems seldom occur randomly or unconditionally.
Although it is difficult to predict the occurrence of many behavior problems, the probability
of occurrence often varies as a function of settings, antecedent stimuli, environmental
contexts, physiological states, and other discriminative stimuli (Gatchel, 1993; Glass, 1993;
Ollendick & Hersen, 1993). It was previously noted that identifying sources of variance in
behavior problems can help the assessor to identify causal variables and mechanisms. For
example, behavioral assessors attempt to identify the conditions that trigger the startle
responses of a client with PTSD (Foa et al., 1989), the conditions that trigger a client's
asthma episodes (Creer & Bender, 1993), and the conditions associated with marital violence
(O'Leary, Vivian, & Malone, 1992) to develop a functional analysis and design the most
appropriate intervention program.

The conditional nature of behavior problems further diminishes the clinical utility of
aggregated measures of a behavior problem, that is, assessment instruments that provide a
"score" without providing more precise information on the conditional nature of the behavior
problem. Assessment instruments should help the assessor examine the conditional
probabilities of the behavior problem or the magnitude of shared variance between the
behavior problem and multiple situational factors. For Mrs. A, the assessor would try to
determine the situations that provoked conflict with her daughter, and to determine the
events that increased or decreased the intensity of her headaches. Behavioral questionnaires
and interviews, self-monitoring, and naturalistic observation can aid in identifying the
conditional aspects of behavior problems.

4.07.3.2.5 The dynamic nature of behavior problems

The parameters and qualitative aspects (e.g., topography, form, characteristics) of behavior
problems can change over time. The frequency,
intensity, and content of arguments between Mr. and Mrs. A will probably change within a few
weeks or months. The magnitude, frequency, duration, and form of a client's PTSD symptoms,
paranoid delusions, nightmares, excessive dieting, and pain can change in important ways
across time. Also, new behavior problems may occur and some behavior problems may become less
important.

Dynamic behavior problems and other variables can only be sensitively tracked by measuring
them frequently, using time-series assessment strategies. The frequency with which dynamic
variables should be sampled depends on their rate of change. Collins and Horn (1991),
Heatherton and Weinberger (1994), Johnston and Pennypacker (1993), Kazdin (1997), and
Kratochwill and Levin (1992) discuss instruments, strategies, and issues in the measurement
of dynamic variables. Frequent measurement of behavior problems can also help the assessor
identify important causal relationships. For example, recent changes in the magnitude of a
client's depressive symptoms may provide cues about environmental or cognitive causal
factors. Changes in Mrs. A's sleep patterns could trigger inquiries about events preceding
the change. Self-monitoring, brief structured interviews, and short questionnaires are
particularly amenable to time-series assessment.

4.07.4 METHODOLOGICAL FOUNDATIONS OF BEHAVIORAL ASSESSMENT

The methodological elements of the behavioral assessment paradigm, the preferred strategies
of behavioral assessment, are dictated by the assumptions about behavior and its causes
outlined in the previous sections. Many of these methodological elements were introduced
earlier in this chapter and are outlined in Table 1.

This section will discuss three of the elements of the behavioral assessment paradigm
delineated in Table 1: (i) the emphasis on empirical hypothesis-testing, (ii) the idiographic
emphasis, and (iii) the use of time-series assessment strategies. These and other
methodological elements from Table 1 have been presented in Cone and Hawkins (1977), Haynes
(1978), Hersen and Bellack (1996), Johnston and Pennypacker (1993), Mash and Terdal (1988),
Nelson and Hayes (1986), Ollendick and Hersen (1993), and Shapiro and Kratochwill (1988).

Most of the methodological elements of the behavioral assessment paradigm, particularly the
assessment strategies, serve the priority placed on a scholarly, empirically based approach
to clinical assessment. It is assumed that clinically useful knowledge about behavior
problems is best acquired through the frequent administration of carefully constructed
assessment instruments that are precisely focused on multiple, minimally inferential
variables. Behavioral assessors are likely to eschew assessment instruments that are poorly
validated, provide indices of vaguely defined and highly inferential constructs, and have
unsubstantiated clinical utility.

4.07.4.1 An Empirically Based Hypothesis Testing Approach to Assessment

A hypothesis testing and refinement climate guides the behavioral assessment of adults. The
assessor makes many tentative judgments about the client beginning early in the
preintervention assessment phase. The clinician estimates the relative importance of the
client's behavior problems and goals, the variables that influence problems and goals, and
other elements of the functional analysis. Based on these early clinical judgments, the
assessor also begins to estimate the most effective methods of intervention and selects
additional assessment instruments (see discussions in Eels, 1996; Haynes et al., 1997; Nezu
et al., 1996; O'Brien & Haynes, 1995; Persons & Bertagnolli, 1994; Turk & Salovey, 1988).
These hypotheses are tested and refined as assessment continues. With Mrs. A, for example,
initial judgments that deficits in marital communication skills were functionally related to
their marital conflicts would be evaluated subsequently through analogue communication
assessment and by teaching the couple more positive discussion strategies. If their
communication skills increased but their marital conflicts did not decrease, invalid
hypotheses may have been initially drawn about the causes of this couple's marital conflicts.
There are other possible explanations for a lack of observed covariation in a causal
relationship. For example, if another causal factor or moderating variable became operational
while communication skills were strengthened, there could appear to be no causal relationship
between communication skills and conflict (Shadish, 1996).

A scholarly hypothesis-testing approach to psychological assessment requires that the results
of assessment and contingent clinical inferences be viewed skeptically. Clinical judgments
are always based on imperfect measurements of nonstationary data and are intrinsically
subjective. The validity and utility

Table 1 Methodological emphases of the behavioral assessment paradigm.

Assessment strategies
1. Idiographic assessment; a focus on the client's specific goals, individual behavior problems; individually
tailored assessment; a de-emphasis on normatively based assessment (e.g., trait-focused questionnaires).
2. A hypothesis-testing approach to assessment and treatment (including the use of interrupted time-series
designs).
3. Time-series assessment strategies (as opposed to single-point or pre-post measurement strategies);
frequent measures of important variables.
4. Multimethod and multi-informant measurement of variables across multiple situations.
5. Quantification of variables (measurement of rates, magnitudes, durations).
6. The use of validated assessment instruments in conditions in which sources of measurement error are
minimized.
The focus of assessment
7. Precisely specified, lower-level, and less inferential constructs and variables.
8. Observable behavior (as opposed to hypothesized intrapsychic events).
9. Client-environment interactions, sequences, chains, and reciprocal interactions.
10. Behavior in the natural environment.
11. Events that are temporally contiguous to behavior problems.
12. Multiple client and environmental variables in pretreatment assessment (multiple behavior problems,
causal variables, moderating and mediating variables).
13. Multiple targets in treatment outcome evaluation (e.g., main treatment effects, side effects, setting and
response generalization).
14. Multiple modes and parameters of behavior and other events.
15. Extended systems (e.g., family, work, economic variables).

Source: Haynes (1996b).


These are relative emphases whose importance varies across behavioral assessment subparadigms (e.g., behavior analysis, cognitive-behavioral).
Many are compatible with other psychological assessment paradigms (e.g., neuropsychological, educational achievement).

of clinical judgments will covary with the degree to which they are guided by the assessment
principles outlined in Table 1. The assessor can also reduce some of the biases in clinical
judgments by basing them on assessment data and being receptive to clinical assessment data
that are inconsistent with those judgments.

4.07.4.2 An Individualized Approach to Assessment

Partially due to between-person differences in behavior problems, goals, and functional
relationships, the behavioral assessment paradigm emphasizes individualized assessment, that
is, an idiographic approach to assessment (Cone, 1986). An individualized approach to
assessment is manifested in several ways: (i) self-monitoring targets and sampling procedures
are often tailored to the individual client; (ii) role play scenarios and other assessment
instruments are often individually tailored (e.g., Chadwick, Lowe, Horne, & Higson, 1994);
(iii) client-referenced and criterion-referenced assessment instruments, in contrast to
norm-referenced assessment instruments, are often used; (iv) treatment goals and strategies
are often individually tailored (e.g., de Beurs, Van Dyck, van Balkom, Lange, & Koele, 1994);
and (v) within-subject, interrupted time-series and multivariate time-series designs are often
used in research.

4.07.4.3 Time-series Assessment Strategies

As indicated in previous sections, time-series measurement of independent and dependent
variables across time (e.g., the frequent [e.g., >40] daily samples of a client's behavior
problems and causal variables) is an important strategy in behavioral assessment. It is a
powerful strategy in clinical research and has many advantages in clinical assessment. First,
it can help estimate causal and noncausal relationships for a participant's behavior problems.
By subjecting the data to sequential analyses or cross-lagged correlations, time-series
assessment can provide information on conditional probabilities and suggest possible causal
mechanisms for behavior problems (Bakeman & Gottman, 1986; Moran, Dumas, & Symons, 1992;
Tryon, 1998).

A major advantage is that time-series assessment allows the researcher and clinician to
control for and examine the temporal precedence of causal variables. As O'Leary, Malone, and
Tyree (1994) noted in their discussion of marital violence, it is difficult to draw inferences
about causal factors unless they are measured well ahead of the targeted behavior problem. A
Behavioral Assessment Methods 171

concurrent measurement strategy (in which hypothesized independent and dependent variables are
measured concurrently) cannot be sufficient for causal inferences because the form (e.g.,
correlational, bidirectional causal) of the relationships cannot be established.

Statistical analysis of time-series data can be cumbersome in many clinical assessment
contexts (however, see subsequent discussion of computer aids). However, time-series
assessment is also the best strategy for tracking the time-course of behavior problems and
goal attainment during therapy. Frequent monitoring increases the chance that a failing
treatment will be detected early (Mash & Hunsley, 1993) and that the clinician can make needed
changes in the functional analysis and intervention programs.

Time-series assessment is congruent with an emphasis on professional accountability and
empirically based judgments in clinical treatment. Time-course plots can be a useful means of
documenting treatment effects and of providing feedback to clients about behavior change and
possible causal relationships for their behavior problems. Time-series assessment can help the
clinician identify naturally occurring causal mechanisms for a client's mood fluctuations,
panic episodes, and other behavior problems. Time-series data were acquired with Mrs. A when
she self-monitored her headaches and sleep problems daily.

Finally, time-series measurement is an essential element in interrupted time-series designs,
such as the A-B-A-B, multiple baseline, or changing-criterion designs (Kazdin, 1997; Shapiro &
Kratochwill, 1988). These designs can strengthen the internal validity of inferences about
treatment effects and mechanisms.

4.07.4.4 Quantitative and Qualitative Approaches to Behavioral Assessment

The empirical elements of the behavioral assessment paradigm are partially responsible for the
current emphasis on treatment outcome evaluation through frequently applied, minimally
inferential, validated assessment instruments. This quantitative approach to clinical
inference reflects and accentuates the growing importance of systematic evaluation by
professionals delivering clinical services. The behavioral assessment paradigm provides a
useful structure for the evaluation of clinical service delivery and intervention outcome.
However, it is possible for clinicians and clinical researchers to overemphasize molecular
measures and quantification. Quantification is an indispensable component of clinical
inference, but an exclusive reliance on quantification can promote a focus on variables and
effects with minimal practical importance. The emphasis on clinical significance (e.g.,
Jacobson & Truax, 1991) of treatment effects and functional relationships reflects the
practical and clinical importance of effects, as well as their statistical significance.

Qualitative analyses (Haberman, 1978) can complement quantitative analyses. Behavioral
assessors can generate clinically useful hypotheses by supplementing quantitative with
qualitative analyses of clinical phenomena. Using time-sampling measurement strategies to code
specific dyadic interaction sequences in a communication analogue between Mrs. and Mr. A
provided data that helped identify dysfunctional verbal exchanges and provided the database
for judging the effects of communication training. However, it was also beneficial for the
clinician to "passively" listen to the couple discuss their marital problems. Qualitative
observation can be a rich source of ideas about the beliefs, attitudes, and behaviors that may
contribute to marital distress.

Qualitative analyses can also promote the development of the behavioral assessment paradigm.
We have only an elementary understanding of the causes of behavior disorders and of the best
methods of treating them. An exclusive reliance on quantification can impair the evolution of
any nascent assessment paradigm. Consequently, we must adopt a Steinbeckian attitude: generate
and consider new ideas about functional relationships, presume that a new idea may be true,
and then rigorously examine it.

Although I am advocating that qualitative methods can contribute to behavioral assessment,
scientific methods remain the core element in the behavioral assessment paradigm. The stagnant
nature of many psychological construct systems can be attributed to their focus on a rigidly
invoked core set of assumptions about the nature of behavior and treatment, rather than on a
core set of beliefs about the best way to learn about behavior and treatment. The behavioral
assessment paradigm will continue to evolve to the degree that it emphasizes scientific
methods for studying behavior, rather than a set of a priori beliefs about the nature of
behavior disorders and their treatment.

4.07.5 BEHAVIORAL ASSESSMENT METHODS

There are many methods of behavioral assessment. Some, such as self-monitoring

and behavioral observation, are congruent with and influenced by the conceptual and
methodological elements of the behavioral assessment paradigm outlined earlier in this
chapter. Others, such as trait-focused self-report questionnaires, are less congruent with the
behavioral assessment paradigm. Surveys of journal publications and of the assessment methods
used by practicing behavior therapists show that it is difficult to reliably categorize an
assessment instrument as "behavioral" or "nonbehavioral"; categories of behavioral and
nonbehavioral assessment methods are becoming increasingly indistinct. For example, many
cognitive assessment instruments used by behavior therapists are aggregated and trait-based.
They provide an unconditional "score" of some cognitive construct such as "self-efficacy
beliefs" or "locus of control" (see discussion of cognitive assessment in Linscott &
DiGiuseppe, 1998). Other assessment instruments and methods used by behavior therapists
include neuropsychological assessment, sociometric status evaluation, the Minnesota
Multiphasic Personality Inventory and other trait-based personality tests, aggregated mood
scales, historically focused interviews, and tests of academic achievement (see Hersen &
Bellack, 1998).

There are several bases for the decreasing distinction between behavioral and nonbehavioral
assessment methods. First, many variables currently targeted in behavioral assessment (e.g.,
subjective distress, beliefs, mood) were not the focus of behavior therapy several decades
ago. As the response modes in causal models of behavior disorders and the targets of
behavioral treatments expanded beyond motor and verbal behavior, the array and focus of
assessment instruments used in behavioral assessment expanded correspondingly.

Second, behavioral assessors are less predisposed to immediately reject all traditional
self-report assessment instruments. Some well-validated "personality" assessment instruments
can provide useful information. However, care should be taken to avoid unwarranted inferences
from personality assessment instruments and to ensure that, if used, they are part of an
assessment program that includes more precisely focused assessment methods (see discussions of
behavioral and personality assessment in Behavior Modification, 1993, No. 1, and general
problems in personality assessment in Heatherton & Weinberger, 1994).

Third, in the 1960s and 1970s many behavior analysts denounced trait concepts and emphasized
situational sources of behavior variance. The current person × situation interactional model
of behavior variance, discussed earlier, suggests that clinical judgments can sometimes be
aided by some trait measures, when used with situationally sensitive assessment instruments.

Fourth, the power of behavioral assessment methods often surpasses their clinical utility.
Some behavioral assessment methods, such as behavioral observation, are costly and
time-consuming, which decreases their usefulness in clinical assessment (however, as discussed
in subsequent sections, many technological advances have enhanced the clinical utility of some
behavioral assessment methods).

Finally, behavioral assessors are sometimes insufficiently educated in psychometric principles
and in the degree to which assessment strategies match the conceptual and methodological
aspects of the behavioral assessment paradigm. For example, behavioral assessors sometimes use
an aggregated score from an assessment instrument that has multiple uncorrelated factors. At
other times an aggregated score is used when there is important between-situation variance in
the measured construct. Also, norm-referenced assessment instruments are sometimes applied
when they are not psychometrically appropriate or useful with a particular client (norms may
not be available for the client's ethnic group, age, or economic status; Cone, 1996; Haynes &
Wai'alae, 1995; Silva, 1993).

The following sections briefly present four categories of behavioral assessment methods: (i)
behavioral observation, in the natural environment and in analogue situations; (ii)
self-monitoring; (iii) self-report interviews and questionnaires; and (iv) psychophysiological
assessment. The specific strategies, conceptual foundations, utility, psychometric properties,
disadvantages, technical advancements, and contribution to clinical judgment of each category
will be discussed. Coverage of these methods is necessarily limited, and more extensive
presentations of behavioral assessment methods and instruments can be found in books by Hersen
and Bellack (1998), Mash and Terdal (1988), Ollendick and Hersen (1993), and Shapiro and
Kratochwill (1988).

4.07.5.1 Behavioral Observation

Behavioral observation involves the time-series observation of molecular, precisely defined
behaviors and environmental events. Usually, an observation session is divided into smaller
time samples and the occurrence of discrete events within each time sample is recorded by
external observers (Foster & Cone, 1986; Foster, Bell-Dolan, & Burge, 1988; Mash & Hunsley,
1990; Tryon, 1998).
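
The division of an observation session into time samples can be illustrated with a short
sketch. It is offered only as an illustration and is not taken from the sources cited above:
the function names, the 15-second interval length, and the event times are hypothetical. Each
time sample is flagged for occurrence or nonoccurrence of the target behavior, and the flags
are then summarized as the percent of time samples in which the behavior occurred.

```python
import math
from typing import List


def partial_interval_record(event_times: List[float],
                            session_length: float,
                            interval_length: float) -> List[bool]:
    """Return one occurrence/nonoccurrence flag per time sample.

    event_times are seconds from the start of the session (hypothetical data).
    A time sample is flagged True if at least one event falls inside it.
    """
    if session_length <= 0 or interval_length <= 0:
        raise ValueError("session_length and interval_length must be positive")
    n_intervals = math.ceil(session_length / interval_length)
    flags = [False] * n_intervals
    for t in event_times:
        if 0 <= t < session_length:  # ignore events outside the session
            flags[int(t // interval_length)] = True
    return flags


def percent_of_intervals(flags: List[bool]) -> float:
    """Percent of time samples in which the behavior was recorded."""
    return 100.0 * sum(flags) / len(flags)


# Hypothetical 60-second session coded in 15-second time samples.
flags = partial_interval_record([3.0, 7.5, 31.0], session_length=60.0,
                                interval_length=15.0)
print(flags)                        # [True, False, True, False]
print(percent_of_intervals(flags))  # 50.0
```

Two events that fall within the same time sample are counted once, so this form of recording
yields the proportion of time samples containing the behavior rather than a frequency count.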

Two observation strategies, and variants of each, are discussed below: behavioral observation
in the natural environment and behavioral observation in analogue environments.

4.07.5.1.1 Behavioral observation in the natural environment

Systematic behavioral observation of clients is congruent with most of the underlying
assumptions of the behavioral assessment paradigm. Quantitative, minimally inferential data
are derived on clients in their natural environment using external observers (nonparticipant
observers). Behavior observation systems can be constructed for individual patients, some
sources of measurement error can be examined through interrater reliability coefficients, and
the acquired data can provide valuable information for the functional analysis and for
treatment outcome evaluation. Observation in the natural environment has been used in the
assessment of self-injurious, delusional, and hallucinatory behaviors in institutions; eating
and drinking in restaurants, bars, and in the home; marital and family interactions in the
home; student, teacher, and peer interactions in schools; pain and other health-related
behaviors at home and in medical centers; community-related behaviors (e.g., driving,
littering); and many other behaviors.

Typically, the client is observed for brief periods (e.g., one hour) in his or her natural
environment (e.g., at home) several times in "sensitive" or "high-risk" situations, that is,
situations or times with an elevated probability that important behaviors or interactions will
occur (e.g., at dinnertime when observing problematic family interactions; at mealtime when
observing the social interactions of a psychiatric inpatient). Trained observers record the
occurrence/nonoccurrence of specified and carefully defined behaviors within time-sampling
periods (e.g., 15-second periods). Sequences of events (e.g., sequential interactions between
a depressed client and family members) and the duration of events can also be recorded.
Observers can also use momentary time sampling and record the behaviors that are occurring at
predetermined time sampling points. An example of this latter sampling strategy would be a
nurse recording the social interactions of a psychiatric inpatient at the beginning of every
hour.

Observation in unrestricted environments (e.g., in a client's home) can be problematic because
clients are sometimes out of sight of the observers and sometimes engage in behaviors that are
incompatible with the purposes of the observation (e.g., talking on the phone; watching TV).
Consequently, the behavior of the individual to be observed is often constrained. For example,
a marital couple being observed at home might be requested to remain within sight of the
observer, to avoid long phone conversations, and to keep the TV off. Such constraints
compromise the generalizability of the obtained data but increase the efficiency of the
observation process.

The temporal parameters (the length, frequency, spacing, and sampling intervals) of an
observation session are influenced by the characteristics of the targeted behaviors. For
example, higher rate behaviors require shorter time sampling intervals. Highly variable
behaviors require more observation sessions. Suen and Ary (1989) discuss temporal parameters
of behavioral observation in more detail.

Behavioral observation in the natural environment has many applications. It is a powerful
method of treatment outcome evaluation because it minimizes many sources of measurement error
associated with self-report and is sensitive to behavior change. It has also been used as a
basic research tool in the behavioral and social sciences and to gather data for functional
analyses.

The clinical utility of behavioral observation in the natural environment is limited in
several ways. It is not cost-efficient for the assessment of very low frequency behaviors,
such as stealing, seizures, and some aggressive behaviors. Also, "internalized," covert
behavior problems such as mood disorders and anxiety are less amenable to external observation
(although some verbal and motoric components of these disorders can be observed). Socially
sensitive or high-valence behaviors, such as sexual dysfunctions, paraphiliac behaviors,
substance use, and marital violence, may not be emitted in the presence of observers. In most
outpatient clinical settings, observation in the natural environment with external observers
is prohibitively expensive and time-consuming.

Behavioral observation is an expensive assessment method but technological advances have
enhanced its clinical utility. Audio and video tape recorders and other instrumentation can
facilitate the acquisition of observation data on clients in the natural environment without
having to send observers (Tryon, 1991). Observers can also use hand-held computers to record
and analyze observation data in real time (Tryon, 1998).

Because many behaviors can be observed, behavior sampling is an indispensable component of
behavioral observation. Typically, observers use a standardized behavior coding system (e.g.,
Marital Interaction Coding System; Weiss & Summers, 1983) that contains

preselected behavior codes. Behaviors selected for inclusion in behavioral observation
include: (i) client problem behaviors (e.g., social initiation behaviors by a depressed
psychiatric inpatient), (ii) causal variables for the client's behavior problems and goals
(e.g., compliments and insults emitted during distressed marital interaction), (iii) behaviors
correlated with client problem behaviors (e.g., members of a functional response class such as
verbal and physical aggressive behaviors), (iv) salient, important, high-risk behaviors (e.g.,
suicide talk), (v) client goals and positive alternatives to undesirable behaviors (e.g.,
positive social interactions by a delusional or socially isolated client), (vi) anticipated
immediate, intermediate, and final outcomes of treatment, and (vii) possible positive and
negative side- and generalized effects of treatment.

Although observers sometimes focus on only one individual, it is more common in behavioral
observation to monitor interpersonal interactions. To this end, observers can record sequences
of interactions between two or more individuals (see discussions in Bakeman & Gottman, 1986;
Moran et al., 1992). When the goal of observation is to draw inferences about a group of
persons, subject sampling can be used. A few persons may be selected for observation from a
larger group. For example, several patients may be selected for observation if the goal is to
document the effects of a new token program on patients in a psychiatric unit.

Behavioral observation is often considered the "gold standard" for assessment. However, there
are several sources of error which can detract from the accuracy and validity of obtained data
and of the inferences drawn from them. Sources of error in behavioral observation include: (i)
the degree to which observers are trained, (ii) the composition and rotation of observer
teams, (iii) observer bias and drift, (iv) the behaviors selected for observation, (v) the
specificity of code definitions, (vi) the methods of evaluating interobserver agreement, (vii)
the match between time samples and the dynamic characteristics of the observed behaviors, and
(viii) variance in the situations or time of day in which observation occurs (Alessi, 1988;
Hartmann, 1982; Suen & Ary, 1989; Tryon, 1996).

Several types of data can be derived from behavioral observation. Often, assessors are
interested in the rate of targeted events. This is usually calculated as the percent of
sampling intervals in which a behavior occurs. Sequential analyses and conditional
probabilities are often more important for developing functional analyses. For example,
observation of family interaction in the home can provide data on negative reciprocity, the
relative probability that one family member will respond negatively following a negative (in
comparison to a nonnegative) response by another family member. Particularly with
computer-aided observation, data on the duration of behaviors can also be acquired. Some
observation systems use ratings by observers, rather than event recordings, although these are
more frequently conducted outside formal observation sessions (Segal & Falk, 1998; Spector,
1992). According to McReynolds (1986), the first rating scale was published by Thomasius in
1692. Four basic characterological dimensions were rated: sensuousness, acquisitiveness,
social ambition, and rational love.

"Reactivity" refers to the degree to which assessment modifies the targets of assessment.
Reactivity is a potential source of inferential error in all assessment procedures, but
particularly in behavioral observation (Foster et al., 1988; Haynes & Horn, 1982). The
behavior of clients, psychiatric staff members, spouses, and parents may change as a function
of whether observers are present or absent. Therefore, reactivity threatens the external
validity or situational and temporal generalizability of the acquired data. In the cases of
some highly socially sensitive behaviors (e.g., sexual or antisocial behaviors), the magnitude
of reactivity may be sufficient to preclude observation in the natural environment.

Participant observation is an alternative to the use of nonparticipant observers. Participant
observation is behavioral observation, as described above, using observers who are normally
part of the client's natural environment. (In ethnography and cultural anthropology, the term
"participant observation" usually refers to qualitative observation by external observers who
immerse themselves in the social system they are observing.) Examples of participant
observation include: (i) parents observing the play behavior of their children, (ii) nurses
observing the delusional speech of psychiatric inpatients, and (iii) a spouse observing a
client's depressive or seizure behaviors. Participant observers often use time and behavior
sampling procedures similar to those used by nonparticipant observers. However, participant
observers are usually less well trained and apply less complex observation systems (e.g.,
fewer codes, use of momentary time sampling). For example, a staff member on a psychiatric
unit might monitor the frequency of social initiations by a client only during short mealtime
periods or only briefly throughout the day.

The primary advantages of participant observation are its cost-efficiency and applicability.
Participant observation can be an

inexpensive method of gathering data on clients in their natural environment. It may be
particularly advantageous for gathering data on low frequency behaviors and on behaviors that
are highly reactive to the presence of external observers.

There are several potential sources of error in participant observation. First, it is
susceptible to most of the sources of error mentioned for nonparticipant observation (e.g.,
behavior definitions, time sampling errors). Additionally, participant observation may be
particularly sensitive to observer biases, selective attention by observers, and recent
interactions with the target of observation. The observer is likely to be less well trained
and often is not a neutral figure in the social context of the client.

Participant observation may be associated with reactive effects. Sometimes, the reactive
effects would be expected to be less for participant than for nonparticipant observation. One
determining factor in reactivity is the degree to which the assessment process modifies the
natural environment of the client. Because participant observation involves less change in the
natural environment of the client, it may be less reactive. However, the reactive effects of
participant observation may be strengthened, depending on the method of recording, the
behaviors recorded, and the relationship between the observer and client. In some situations
participant observation might be expected to alter the monitored behavior to an important
degree or to adversely affect the relationship between the observer and target (e.g., an
individual monitoring the sexual or eating behavior of a spouse).

Critical event sampling is another infrequently used but clinically useful and cost-efficient
method of acquiring natural environment observation data. Critical event sampling involves
video or audio tape recording of important interactions in the client's natural environment
(Jacob, Tennenbaum, Bargiel, & Seilhamer, 1995; Tryon, 1998). The critical interactions are
later qualitatively or quantitatively analyzed. For example, a tape recorder could be
self-actuated by a distressed family during mealtime; a marital couple could record their
verbal altercations at home; a socially anxious individual could record conversations while on
a date.

4.07.5.1.2 Analogue observation

Analogue observation involves the systematic behavioral observation of clients in carefully
structured environments. The assessment environment is arranged to increase the probability
that clinically important behaviors and functional relationships can be observed. It is a
powerful, clinically useful, and idiographic behavioral assessment method. Many elements of
the analogue assessment setting are similar to those of the client's natural environment.
However, the participants, social and physical stimuli, and instructions to the client may
differ from those of the client's natural environment.

The behavior of clients in analogue assessment is presumed to correlate with their behavior in
the natural environment. For example, a distressed marital couple might discuss a problem in
their relationship while in a clinic and being observed from a one-way mirror. It is presumed
that the problem-solving strategies they use will be similar to those they use outside the
clinic.

Analogue observation has many applications. The role play is often used in the behavioral
assessment of social skills. A psychiatric patient or socially anxious adult might be observed
in a clinic waiting room while attempting to initiate and maintain a conversation with a
confederate. A client might be placed in a simulated social situation (e.g., simulated
restaurant) and asked to respond to social stimuli provided by a confederate. The Behaviour
Avoidance Test (BAT; e.g., Beck & Zebb, 1994) is another form of analogue observation. In a
BAT, a client is prompted to approach an object that is feared or avoided.

Analogue methods have been used in the assessment of many clinical phenomena, such as pain
(Edens & Gil, 1995), articulated thoughts (Davison, Navarre, & Vogel, 1995), and social
anxiety and phobia (Newman, Hofmann, Trabert, Roth, & Taylor, 1994). Other applications
include the study of self-injurious behaviors, dental anxiety, stuttering, heterosexual
anxiety, alcohol ingestion, panic episodes, cigarette refusal skills, parent-child
interaction, speech anxiety, animal phobias, test anxiety, and eating patterns.

Data can be acquired on multiple response modes in analogue observation. For example, during
exposure to an anxiety-provoking social exchange, clients can report their level of anxiety
and discomfort, electrophysiological measures can be taken, observers can record the behavior
of the client, and clients can report their thoughts.

Analogue observation is a cost-efficient and multimodal method of assessment and can be a
useful supplement to retrospective interview and questionnaire methods. It provides a means of
directly observing the client in sensitive situations and of obtaining in vivo client reports
to supplement the client's retrospective report of thoughts, emotions, and behavior. In
comparison to observation in the natural

environment, it is particularly useful for obser- analogue settings may not accurately reflect
ving some important but low-rate events (e.g., or match their behavior in the natural environ-
marital arguments). ment. Nevertheless, analogue assessment
When used in conjunction with systematic should be expected to be valid in other ways.
manipulation of hypothesized controlling For example, socially anxious and nonanxious
variables, analogue observation can be exceptionally clients should behave differently during analo-
useful for identifying causal relationships and gue observation even if their behaviors do not
for constructing a functional analysis of match their behaviors in the natural environ-
behavior problems. For example, social atten- ment. The validity and clinical utility of
tion, tangible rewards, and task demands can be analogue observation should be considered
systematically presented and withdrawn before dependent variables. They are likely to vary
and after the self-injurious behavior of devel- across the purposes of the assessment, subjects,
opmentally disabled individuals (e.g., Iwata target behaviors, settings, and observation
et al., 1994). Systematic manipulation of methods.
hypothesized controlling factors during analo-
gue observation has also been used to identify
the cognitive sequelae of hyperventilation 4.07.5.2 Self-monitoring
during anxiety episodes (Schmidt & Telch,
1994), the most effective reinforcers to use in In self-monitoring, clients systematically
therapy (Timberlake & Farmer-Dougan, 1991), record their behavior and sometimes relevant
and the factors associated with food refusal environmental events (Bornstein, Hamilton, &
(Munk & Repp, 1994). Bornstein, 1986; Gardner & Cole, 1988; Sha-
Analogue observation is associated with piro, 1984). The events to be recorded by the
several unique sources of variance, measure- client are first identified and specified by the
ment error, and inferential error (e.g., Hughes & client and assessor during an interview. A
Haynes, 1978; Kern, 1991; Torgrud & Holborn, recording form is developed or selected and
1992). First, because the physical environment the client monitors the designated events,
and social stimuli are more carefully controlled sometimes at designated times or in specified
in analogue observation but may differ from situations. To reduce errors associated with
those in the client's naturalistic environment, retrospective reports, recording usually occurs
external validity may be reduced concomitantly immediately before or after the monitored
with an increase in behavioral stability. The event.
behavior of clients and the data acquired during One particularly innovative development
analogue observation can covary with: (i) the is self-monitoring via hand-held computer
physical characteristics of the assessment en- and computer-assisted data acquisition and
vironment; (ii) the instructions to participants; analysis (Agras, Taylor, Feldman, Losch, &
(iii) observer and sampling errors, such as those Burnett, 1990; Shiffman, 1993). Hand-held
outlined in naturalistic observation, time and computers allow the collection of real-time
behavior sampling; and (iv) the content validity data and simplify the analysis and presentation
of the assessment environment (i.e., the degree of self-monitoring data. Computerization
to which the stimuli are relevant to the construct should significantly increase the clinical utility
being measured by the analogue assessment of self-monitoring.
situation). Time-sampling is sometimes used with self-
The primary disadvantage to analogue ob- monitoring, depending on the characteristics of
servation is its representational nature: the data the behavior. Clients can easily record every
acquired in analogue assessment are only occurrence of very low-rate behaviors such as
presumed to correlate with data that would seizures or migraine headaches. However, with
be acquired in the natural situations the high-rate or continuous behaviors, such nega-
analogue assessment is designed to represent. tive thoughts, blood pressure, and mood, clients
It is an indirect measure of the individual's recordings may be restricted to specified time
behavior in the natural environment. The results periods or situations.
of many studies have supported the discrimi- Many clinically important behaviors have
nant and criterion-related validity of analogue been the targets of self-monitoring. These
observation; the results of other validation include ingestive behaviors (e.g., eating, caffeine
studies have suggested more cautious conclu- intake, alcohol and drug intake, smoking),
sions. Given the presumption that many specific thoughts (e.g., self-criticisms); physio-
behaviors are sensitive to situational sources logical phenomena and medical problems (e.g.,
of variance, clients can be expected to behave bruxism, blood pressure, nausea associated with
differently in analogue and natural environ- chemotherapy, Raynaud's symptoms, arthritic
ments. That is, the behavior of clients in and other chronic pain, heart rate, seizures); and
a variety of other phenomena such as self-injurious behaviors, electricity use, startle responses, sexual behavior, self-care behaviors, exercise, panic and asthma episodes, social anxiety, mood, marital interactions, study time, sleeping patterns, and nightmares.

Many response modes are amenable to measurement with self-monitoring. Clients can monitor overt motor behavior, verbal behavior, subjective distress and mood, emotional responses, occurrence of environmental events associated with their behavior, physiological responses, thoughts, and the qualitative characteristics of their behavior problems (e.g., location of headaches, specific negative thoughts, multimodal aspects of panic episodes). Self-monitoring can also be used to track multiple response parameters: response durations, magnitudes, and frequencies. Of particular relevance for the functional analysis, the client can concurrently monitor behaviors, antecedent events, and consequent events to help identify functional relationships. Johnson, Schlundt, Barclay, Carr-Nangle, and Engler (1995), for example, had eating-disordered clients monitor the occurrence of binge eating episodes and the social situations in which they occurred in order to calculate conditional probabilities for binge eating.

Self-monitoring is a clinically useful assessment method. Self-monitoring can be tailored for individual clients and used with a range of behavior problems. It is an efficient and inexpensive assessment method for gathering data on functional relationships in the natural environment and is another important supplement to retrospective self-report. It is suitable for time-series assessment and for the derivation of quantitative indices of multiple response modes. Self-monitoring is applicable with many populations: adult outpatients, children, inpatients, parents and teachers, and developmentally disabled individuals. Events that, because of their frequency or reactive effects, are not amenable to observation by participant and nonparticipant observers may be more amenable to assessment with self-monitoring.

Although many validation studies on self-monitoring have been supportive (see reviews in Bornstein et al., 1986; Gardner & Cole, 1988; Shapiro, 1984), there are several threats to the validity of this assessment method. Two important sources of error in self-monitoring are clients' recording errors and biases. The resultant data can reflect the client's abilities to track and record behaviors, client expectancies, selective attention, missed recording periods, the social valence and importance of the target behaviors, fabrication, and the contingencies associated with the acquired data. Data can also be affected by how well the client was trained in self-monitoring procedures, the demands of the self-monitoring task, the degree to which target events have been clearly specified, reactions from family and friends to the client's monitoring, and the frequency and duration of the targeted behaviors. Clinical researchers have also reported difficulty securing cooperation from clients to self-monitor for extended periods of time. One particularly powerful source of inferential error is reactivity (Bornstein et al., 1986). The reactive effects of self-monitoring are frequently so great that self-monitoring is sometimes used as a method of treatment.

4.07.5.3 Psychophysiological Assessment

Psychophysiological measurement is an increasingly important method in behavioral assessment (Haynes, Falkin, & Sexton-Radek, 1989). The increased emphasis on psychophysiological assessment is due, in part, to a growing recognition of the importance of physiological components of behavior problems, such as depression, anxiety, and many psychotic behavior problems. Also, behavior therapists are increasingly involved in the assessment and treatment of disorders that have traditionally been the focus of medical interventions: cancer, chronic pain, diabetes, and cardiovascular disorders. A third reason for the importance of psychophysiological assessment is that many behavioral intervention procedures, such as relaxation training and desensitization, focus partly on the modification of physiological processes. Advances in ambulatory monitoring, computerization, and other technologies have increased the clinical utility of psychophysiological measurement. Finally, psychophysiological measurement can easily be combined with other behavioral assessment methods, such as self-monitoring and analogue observation.

The recognition of the importance of the physiological response mode in behavior problems mandates the inclusion of electrophysiological and other psychophysiological measurement methods. Electromyographic, electrocardiovascular, electroencephalographic, and electrodermal measures are particularly applicable to behavioral assessment with adults. A range of behavior problems (e.g., panic disorders, PTSD, schizophrenic behaviors, obsessive-compulsive behaviors, worry, depression, substance abuse, disorders of initiating and maintaining sleep) have important physiological components. The low magnitude of covariance between physiological and other response modes,
noted earlier in this chapter, suggests that they may be a function of different causal variables and respond differently to the same treatment.

Psychophysiological measurement is a complex, powerful, and clinically useful assessment method in many assessment contexts and for many clients. It is amenable to idiographic assessment, can be applied in a time-series format, and generates quantitative indices. The validity of the obtained measures can be affected by electrode placement, site resistance, movement, instructional variables, time-sampling parameters, data reduction and analysis, equipment intrusiveness, and equipment failures. Books by Andreassi (1995) and Cacioppo and Tassinary (1990) cover instrumentation, measurement methods, technological innovations, clinical applications, and sources of measurement error.

4.07.5.4 Self-report Methods in Behavioral Assessment

Many interview formats and hundreds of self-report questionnaires have been adopted by behavioral assessors from other assessment paradigms. A comprehensive presentation of these methods is not possible within the confines of this chapter. Here, I will emphasize how behavioral and traditional self-report methods differ in format and content. The differences reflect the contrasting assumptions of behavioral and nonbehavioral assessment paradigms. More extensive discussions of self-report questionnaire and interview methods, and of applicable psychometric principles, are provided by Anastasi (1988), Jensen and Haynes (1986), Nunnally and Bernstein (1994), Sarwer and Sayers (1998), and Turkat (1986).

Behavioral assessors, particularly those affiliated with an applied behavior analysis paradigm, have traditionally viewed self-report questionnaires and interviews with skepticism. Objections have focused on the content and misuses of these methods. Many questionnaires solicit retrospective reports, stress situationally insensitive aggregated indices of traits, focus on molar-level constructs that lack consensual validity, and are unsuited for idiographic assessment. Biased recall, demand factors, item interpretation errors, and memory lapses further challenged the utility of self-report questionnaires. Data from interviews have been subject to the same sources of error, with additional error variance associated with the behavior and characteristics of the interviewer. Despite these constraints, interviews and questionnaires are the methods most frequently used by behavior therapists (e.g., Piotrowski & Zalewski, 1993).

The interview is an indispensable part of behavioral assessment and treatment and undoubtedly is the most frequently used assessment method. All behavioral interventions require prior verbal interaction with the client or significant individuals (e.g., staff), and the structure and content of that interview can have an important impact on subsequent assessment and treatment activities.

As illustrated with Mrs. A, an assessment interview can be used for multiple purposes. First, it can help identify and rank-order the client's behavior problems and goals. It can also be a source of information on the client's reciprocal interactions with other people and, consequently, provides important data for the functional analysis. Interviews are the main vehicles for informed consent for assessment and therapy and can help establish a positive relationship between the behavior assessor and client. Additionally, interviews are used to select clients for therapy, to determine overall assessment strategies, to gather historical information, and to develop preliminary hypotheses about functional relationships relevant to the client's behavior problems and goals.

The behavioral assessment interview differs from nonbehavioral interviews in content and format. First, the behavioral interview is often more quantitatively oriented and structured (although most behavioral interviews involve unstructured, nondirective, and client-centered phases). The focus of the behavioral interview reflects assumptions of the behavioral assessment paradigm about behavior problems and causal variables and emphasizes current rather than historical behaviors and determinants. Behavioral interviewers are more likely to query about situational sources of behavioral variance and to seek specification of molecular behaviors and events.

A systems perspective also guides the behavioral assessment interview. The behavioral interviewer queries about the client's extended social network and the social and work environment of care-givers (e.g., the incentives at a psychiatric institution that encourage cooperation by staff members). The interviewer also evaluates the effects that treatment may have on the client's social system: will treatment affect family or work interactions?

Some of the concerns with the interview as a source of assessment information reside with its traditionally unstructured applications. Under unstructured conditions, data derived from the interview may covary significantly with the behavior and biases of the interviewer. However, structured interviews and technological advances in interview methods promise to reduce such sources of error (Hersen & Turner,
1994; Sarwer & Sayers, 1998). Computerization, to guide the interviewer and as an interactive system with clients, promises to reduce some sources of error in the interview process. Computerization can also increase the efficiency of the interview and assist in the summarization and integration of interview-derived data.

Other structured interview aids, such as the Timeline Followback (Sobell, Toneatto, & Sobell, 1994), may also increase the accuracy of the data derived in interviews. In Timeline Followback, memory aids are used to enhance accuracy of retrospective recall of substance use. A calendar is used as a visual aid, with the client noting key dates, long periods in which they abstained or were continuously drunk, and other discrete events associated with substance use.

Some interviews are oriented to the information required for a functional analysis. For example, the Motivation Assessment Scale is used with care-givers to ascertain the factors that may be maintaining or triggering self-injurious behavior in developmentally disabled persons (Durand & Crimmins, 1988).

Questionnaires, including rating scales, self-report questionnaires, and problem inventories, are also frequently used in behavioral assessment; they have probably been frequently used with all adult behavior disorders. Many questionnaires used by behavioral assessors are identical to those used in traditional nonbehavioral psychological assessment. As noted earlier, questionnaires are often adopted by behavioral assessors without sufficient thought to their underlying assumptions about behavior and the causes of behavior problems, content validity, psychometric properties, and incremental clinical utility. They are often trait-focused and insensitive to the conditional nature of the targeted behavior, and they provide aggregated indices of a multifaceted behavioral construct (Haynes & Uchigakiuchi, 1993). Questionnaires are sometimes helpful for initial screening or as a nonspecific index of program outcome but are not useful for a functional analysis or for precise evaluation of treatment effects. The integration of personality and behavioral assessment is addressed further in a subsequent section of this chapter.

Some questionnaires are more congruent with the behavioral assessment paradigm. These usually target a narrower range of adult behavior problems or events, such as panic and anxiety symptoms, outcome expectancies for alcohol, recent life stressors, and tactics for resolving interpersonal conflicts. Most behaviorally oriented questionnaires focus on specific and lower-level behaviors and events and query about situational factors. However, the developers of behaviorally oriented questionnaires sometimes rely on the face validity of questionnaires and do not follow standard psychometric principles of questionnaire development (see special issue on "Methodological issues in psychological assessment research" in Psychological Assessment, 1995, Vol. 7). Deficiencies in the development and validation of any assessment instrument reduce confidence in the inferences that can be drawn from resulting scores.

Questionnaires, given appropriate construction and validation, can be an efficient and useful source of behavioral assessment data. Most are inexpensive, quick to administer and score, and well received by clients. Computer administration and scoring can increase their efficiency and remove several sources of error (Honaker & Fowler, 1990). They can be designed to yield data on functional relationships of variables at a clinically useful level of specificity.

4.07.5.5 Psychometric Foundations of Behavioral Assessment

The application of psychometric principles to behavioral assessment instruments has been discussed in many books and articles (e.g., Cone, 1988, 1996; Foster & Cone, 1995; Haynes & Wai'alae, 1995; Silva, 1993; see also "Methodological issues in psychological assessment research," Psychological Assessment, September 1995). Psychometric principles were originally applied to tests of academic achievement, intelligence, and abilities. Because many of the principles were based on estimating measurement error with presumably stable and molar-level phenomena, the relevance of psychometric principles to behavioral assessment has been questioned. However, "psychometry" is best viewed as a general validation process that is applicable to any method or instrument of psychological assessment.

The ultimate interest of psychometry is the construct validity of an assessment instrument or, more precisely, the construct validity of the data and inferences derived from an assessment instrument. Construct validity comprises the multiple lines of evidence and rationales supporting the trustworthiness of assessment instrument data interpretation (Messick, 1993). Indices of construct validity are also conditional: an index of validity does not reside unconditionally with the instrument (Silverman & Kurtines, 1998, discuss contextual issues in assessment). Elements of construct validation are differentially applicable, depending on the method, target, and purpose of assessment.
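
Two indices often reported for behavioral assessment data, interobserver agreement and a temporal stability (test-retest) coefficient, can be made concrete with a minimal Python sketch. The function names, the simple interval-by-interval agreement formula, and all data values below are assumptions introduced for illustration only; they are not taken from the chapter or from any cited instrument.

```python
from math import sqrt


def percent_agreement(obs_a, obs_b):
    """Proportion of observation intervals coded identically by two observers."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("records must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return matches / len(obs_a)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest (temporal stability) coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 1 = target behavior coded in the interval, 0 = not coded.
observer_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
observer_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Hypothetical scores for five clients at two assessment occasions.
occasion_1 = [1, 2, 3, 4, 5]
occasion_2 = [2, 1, 4, 3, 5]

print("interobserver agreement:", percent_agreement(observer_1, observer_2))
print("temporal stability (r):", round(pearson_r(occasion_1, occasion_2), 3))
```

A high agreement value speaks only to the consistency of the observers; it says nothing about whether the observed rates match those in the natural environment. Likewise, a low stability coefficient may reflect true change in the behavior rather than measurement error, so neither number should be read in isolation.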
The validity of data derived from an assessment instrument establishes the upper limit of confidence in the clinical judgments to which the instrument contributes. Consequently, the validity of every element of the functional analysis is contingent on the validity of the assessment instruments used to collect contributing data. The validity of other clinical judgments (e.g., risk factors for relapse and the degree of treatment effectiveness) similarly depends on the validity of assessment data.

The applicability of psychometric principles (e.g., internal consistency, temporal stability, content validity, criterion-related validity) to behavioral assessment instruments varies with their methods, targets, and applications. The data obtained in behavioral assessment differ in the degree to which they are presumed to measure lower-level, less inferential variables (e.g., number of interruptions in a conversation, hitting) or higher-level, more inferential variables (e.g., positive communication strategies, aggression). With lower-level variables, psychometric indices such as internal consistency and factor structure are not useful indications of validity of the obtained data. Interobserver agreement and content validity may be more useful indices.

The validity of data from an assessment instrument depends on how it will be used, that is, on the clinical judgments that it affects. For example, accurate data may be obtained from analogue observation of clients' social interactions. That is, there may be perfect agreement among multiple observers about the client's rate of eye contact, questions, and reflections. However, those rates may demonstrate low levels of covariance (i.e., low criterion-related validity) with the same behaviors measured in natural settings. The relative importance of accuracy and other forms of validity varies with the purpose of the assessment (see Cone, 1998). If the analogue data are used to evaluate the effectiveness of a social skills training program, accuracy is an important consideration. If the data are to be used to evaluate generalization of treatment effects, accuracy is necessary but not sufficient.

The interpretation of temporal and situational stability coefficients is complicated in behavioral assessment by the conditional and unstable nature of some of the targeted phenomena (e.g., appetitive disorders, social behaviors, mood, expectancies). Indices of instability (e.g., low test-retest correlations) can reflect variability due to true change across time in the variable (e.g., change in the social behavior of an observed client) as well as measurement error (e.g., poorly defined behavior codes, observer error). Consequently, temporal stability coefficients, by themselves, are weak indices of validity. A multimethod/multi-instrument assessment strategy, by providing indices of covariance among measures of the same targeted phenomena, however, can help separate true from error variance. Additionally, a low magnitude of temporal stability in a time-series measurement strategy has implications for the number of samples necessary to estimate or capture the time course of the measured phenomena: unstable phenomena require more sampling periods than do stable phenomena.

Behavioral assessment often involves multiple methods of assessment, focused on multiple modes and parameters. As noted earlier in this chapter, sources of measurement error and determinants can vary across methods, modes, and parameters. A multimethod approach to assessment can strengthen confidence in subsequent clinical judgments. However, estimates of covariance are often used as indices of validity and can be attenuated in comparison to monomethod or monomode assessment strategies (see discussion of psychometric indices of multiple methods in Rychtarik & McGillicuddy, 1996).

The individualized nature of behavioral assessment enhances the importance of some construct validity elements. For example, accuracy, content validity, and interobserver agreement are important considerations in behavioral observation coding systems. Idiographic assessment reduces the importance of construct validity elements such as nomothetically based discriminant and convergent validity.

4.07.6 BEHAVIORAL AND PERSONALITY ASSESSMENT

As noted earlier in this chapter, behavioral assessors often use traditional personality questionnaires, and several possible reasons for this integration were given. The positive cost-efficiency of personality trait measures is one factor. One of the more empirically based rationales for integration is a person × situation interactional model for assessment: if we want to predict a person's behavior, it helps to know something about the relatively stable aspects of the person and something about the situations that promote instability, at least sometimes (McFall & McDonel, 1986). Personality questionnaires are often used in initial screening, followed by more specifically focused, molecular, and less inferential assessment instruments. Noted in this section are several additional issues concerning the integration
of personality and behavioral assessment. These issues were discussed in Haynes and Uchigakiuchi (1993) and in other articles in a special section of Behavior Modification, 1993, 17(1).

There are several complications associated with adopting the situation × person interaction model and with the use of personality assessment instruments. Given that there are hundreds of traits measurable by extant instruments, it is difficult to determine which traits to measure, and how best to measure them. Also, despite a growing literature on situational factors in behavior disorders, we still do not know which aspects of situations can be most important in controlling behavioral variance for a particular client (e.g., Kazdin, 1979). Nor do we know under which conditions a person-situation interaction model, as opposed to a situational or trait model, will assume the greatest predictive efficacy.

Several additional issues regarding the trait × situation interactional model of behavior and the utility of personality assessment strategies for behavioral assessment have already been discussed in this chapter and in many previously published articles. First, personality traits vary in theoretical connotations, and the theoretical connotations of a trait measure influence its utility for behavioral assessment. Many constructs measured by personality assessment instruments have psychodynamic and intrinsically causal connotations. Some, such as "emotional instability," "hardiness," and "passive-aggressive," refer to an internal state that is presumed to control observed behavior; these traits invoke causal models that are inconsistent with aspects of the behavioral assessment paradigm. In these cases "psychological processes" are inferred from cross-situational consistencies in behavior. In a circular fashion, the processes become explanations for the behaviors that are their indicators. The processes cannot be independently validated and are difficult to measure, and the inferential process can inhibit a scientific investigation of these behaviors.

Personality questionnaires invariably invoke molar-level traits whose interpretation requires normative comparison. Consequently, trait measures are less amenable to idiographic assessment of lower-level variables. Clinical inferences about a person on a trait dimension are derived by comparing the person's aggregated trait score to the trait scores of a large sample of persons. Such comparative inferences can be helpful but can also be in error if there are important differences between the person and the comparison group, such as on dimensions of gender, ethnicity, and age.

Most behavioral assessors would acknowledge that molar self-report measures can contribute to clinical inferences when they are used within a multimethod assessment program and when care is taken to address the many sources of measurement and inferential error noted above. However, there are several other complications associated with personality assessment: (i) many traits measured by personality assessment instruments are poorly defined and faddish; (ii) molar variables are less likely than molecular variables to reflect the dynamic nature of behavior; they are momentary snapshots of unstable phenomena; (iii) personality trait measures may be more useful for initial screening than for the construction of a detailed functional analysis and treatment planning; (iv) personality traits can also be conditional: their probability and magnitude can vary across situations; (v) inferences about a client's status on a trait dimension vary across assessment instruments; and (vi) because of their aggregated nature, many response permutations can contribute to a particular score on a trait dimension.

In sum, the integration of person-situation interactional models and personality assessment in the behavioral assessment paradigm can benefit clinical judgments. However, this integration has sometimes occurred too readily, without the thoughtful and scholarly reflection characteristic of the behavioral assessment paradigm.

4.07.7 SUMMARY

Behavioral assessment is a dynamic and powerful assessment paradigm designed to enhance the validity of clinical judgments. One of the most important and complex clinical judgments in behavioral assessment is the functional analysis, a synthesis of the clinician's hypotheses about the functional relationships relevant to a client's behavior problems. The functional analysis is a central component in the design of behavior therapy programs. The behavioral assessment paradigm suggests that errors in clinical judgments can be reduced to the degree that the judgments are based on multiple assessment methods and sources of information, validated assessment instruments, time-series measurement strategies, data on multiple response modes and parameters, minimally inferential variables, and the assessment of behavior-environment interactions. The Clinical Pathogenesis Map and Functional Analytic Causal Model were introduced as ways of graphically depicting and systematizing the functional analysis.

The methods of behavioral assessment and clinical case conceptualizations are influenced
by several interrelated assumptions about the causes of behavior problems. The behavioral assessment paradigm emphasizes multiple causality; multiple causal paths; individual differences in causal variables and paths; environmental causality and reciprocal determinism; contemporaneous causal variables; the dynamic nature of causal relationships; the operation of moderating and mediating variables; interactive and additive causality; situations, setting events, and systems factors as causal variables; and dynamical causal relationships.

The methods of behavioral assessment and clinical case conceptualizations are also affected by assumptions about the characteristics of behavior problems. These include an emphasis on the multimodal and multiparameter characteristics of behavior problems, differences among clients in the importance of behavior problem modes and parameters, the complex interrelationships among a client's multiple behavior problems, and the conditional and dynamic natures of behavior problems.

Three of many methodological foundations of behavioral assessment were discussed: the emphasis on empirical hypothesis-testing, the idiographic emphasis, and the use of time-series assessment strategies.

The decreasing distinctiveness of behavioral and nonbehavioral assessment, and reasons for this change, were discussed. Four categories of behavioral assessment methods were presented: (i) behavioral observation, (ii) self-monitoring, (iii) self-report methods, and (iv) psychophysiological assessment. The specific strategies, conceptual foundations, clinical utility, psychometric properties, disadvantages, technical advancements, and contribution to clinical judgment of each category were presented.

The application of psychometric principles to behavioral assessment was discussed. The applicability of specific principles varies across methods, targets, and applications.

Several issues relating to the integration of behavioral and personality assessment were presented. These included poor definitions for some traits, the molar nature of personality assessment variables, insensitivity to dynamic aspects of behavior, reduced utility for functional analysis and treatment planning, the

therapy for the treatment of obesity. Behavior Therapy, 21, 99-109.
Alessi, G. (1988). Direct observation methods for emotional/behavior problems. In E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral assessment in schools: Conceptual foundations and practical applications (pp. 14-75). New York: Guilford Press.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Andreassi, J. L. (1995). Psychophysiology: Human behavior and physiological response (3rd ed.). Hillsdale, NJ: Erlbaum.
Asterita, M. F. (1985). The physiology of stress. New York: Human Sciences Press.
Bakeman, R., & Gottman, J. M. (1986). Observing interaction: An introduction to sequential analysis. New York: Cambridge University Press.
Bandura, A. (1969). Principles of behavior modification. New York: Holt, Rinehart and Winston.
Bandura, A. (1981). In search of pure unidirectional determinants. Behavior Therapy, 12, 315-328.
Barlow, D. H., & Cerny, J. A. (1988). Psychological treatment of panic. New York: Guilford Press.
Barnett, P. A., & Gotlib, I. H. (1988). Psychosocial functioning and depression: Distinguishing among antecedents, concomitants, and consequences. Psychological Bulletin, 104, 97-126.
Barrios, B. A. (1988). On the changing nature of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (pp. 3-41). New York: Pergamon.
Beach, S., Sandeen, E., & O'Leary, K. D. (1990). Depression in marriage. New York: Guilford Press.
Beck, J. G., & Zebb, B. J. (1994). Behavioral assessment and treatment of panic disorder: Current status, future directions. Behavior Therapy, 25, 581-612.
Bellack, A. S., & Hersen, M. (1988). Behavioral assessment: A practical handbook. New York: Pergamon.
Bornstein, P. H., Bornstein, M. T., & Dawson, D. (1984). Integrated assessment and treatment. In T. H. Ollendick & M. Hersen (Eds.), Child behavioral assessment: Principles and procedures (pp. 223-243). New York: Pergamon.
Bornstein, P. H., Hamilton, S. B., & Bornstein, M. T. (1986). Self-monitoring procedures. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 176-222). New York: Wiley.
Brown, T. A., DiNardo, P. A., & Barlow, D. H. (1994). Anxiety disorders interview schedule for DSM-IV (ADIS-IV). Albany, NY: Graywind Publications.
Cacioppo, J. T., & Tassinary, L. G. (1990). Principles of psychophysiology: Physical, social, and inferential elements. New York: Cambridge University Press.
Chadwick, P. D. J., Lowe, C. F., Horne, P. J., & Higson, P. J. (1994). Modifying delusions: The role of empirical testing. Behavior Therapy, 25, 35-49.
Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (1986). Handbook of behavioral assessment. New York: Wiley.
Collins, L. M., & Horn, J. L. (Eds.) (1991). Best methods
conditional nature of personality traits, differ- for the analysis of change. Washington, DC: American
Psychological Association.
ences among personality assessment instru- Cone, J. D. (1979). Confounded comparisons in triple
ments, and the aggregated nature of trait response mode assessment research. Behavioral Assess-
measures. ment, 11, 85±95.
Cone, J. D. (1986). Idiographic, nomothetic and related
perspectives in behavioral assessment. In R. O. Nelson
4.07.8 REFERENCES & S. C. Hayes (Eds.), Conceptual foundations of
behavioral assessment. (pp. 111±128). New York: Guil-
Agras, W. S., Taylor, C. B., Feldman, D. E., Losch, M., & ford Press.
Burnett, K. F. (1990). Developing computer-assisted Cone, J. D. (1988). Psychometric considerations and
References 183

the multiple models of behavioral assessment. In A. S. Gatchel, R. J. (1993). Psychophysiological disorders: Past
Bellack & M. Hersen (Eds.), Behavioral assessment: and present perspectives. In R. J. Gatchel & E. B.
A practical handbook (pp. 42±66). New York: Blanchard (Eds.), Psychophysiological disorders, research
Pergamon. and clinical applications (pp. 1±22). Washington, DC:
Cone, J. D. (1998). Psychometric considerations: Concepts, American Psychological Association.
contents and methods. In M. Hersen & A. S. Bellack Gatchel, R. J., & Blanchard, E. B. (1993). Psychophysio-
(Eds.), Behavioral assessment: A practical handbook (4th logical disorders, research and clinical applications.
ed.). Boston: Allyn & Bacon. Washington, DC: American Psychological Association.
Cone, J. D., & Hawkins, R. P. (Eds.) (1977). Behavioral Glass, C. (1993). A little more about cognitive assessment.
assessment: New directions in clinical psychology. New Journal of Counseling and Development, 71, 546±548.
York: Brunner/Mazel. Goldfried, M. R. (1982). Behavioral Assessment: An
Craske, M. G., & Waikar, S. V. (1994). Panic disorder. In overview. In A. S. Bellack, M. Hersen, & A. E. Kazdin
M. Hersen & R. T. Ammerman (Eds), Handbook of (Eds.), International handbook of behavior modification
prescriptive treatments for adults. (pp. 135±155). New and therapy (pp. 81±107). New York: Plenum.
York: Plenum. Haberman, S. J. (1978). Analysis of qualitative data (Vol.
Creer, T. L., & Bender, B. G. (1993). Asthma. In R. J. 1). New York: Academic Press.
Gatchel & E. B. Blanchard (Eds.), Psychophysiological Hartmann, D. P. (Ed.) (1982). Using observers to study
disorders, research and clinical applications (pp. 151±204) behavior. San Francisco: Jossey-Bass.
Washington, DC: American Psychological Association. Hatch, J. P. (1993). Headache. In: R. J. Gatchel & E. B.
Davison, G. C., Navarre, S., & Vogel, R. (1995). The Blanchard (Eds.), Psychophysiological disorders, research
articulated thoughts in simulated situations paradigm: A and clinical applications (pp. 111±150) Washington, DC:
think-aloud approach to cognitive assessment. Current American Psychological Association.
Directions in Psychological Science, 4, 29±33. Haynes, S. N. (1978). Principles of behavioral assessment.
de Beurs, E., Van Dyck, R., van Balkom, A. J. L. M., New York: Gardner Press.
Lange, A., & Koele, P. (1994). Assessing the clinical Haynes, S. N. (1986). The design of intervention programs.
significance of outcome in agoraphobia research: A In R. O. Nelson & S. Hayes (Eds.), Conceptual
comparison of two approaches. Behavior Therapy, 25, foundations of behavioral assessment (pp. 386±429).
147±158. New York: Guilford Press.
Durand, V. M., & Carr, E. G. (1991). Functional Haynes, S. N. (1992). Models of causality in psychopathol-
communication training to reduce challenging behavior: ogy: Toward synthetic, dynamic and nonlinear models of
Maintenance and application in new settings. Journal of causality in psychopathology. Des Moines, IA: Allyn &
Applied Behavior Analyses, 24, 251±264. Bacon.
Durand, V. M., & Crimmins, D. M. (1988). Identifying the Haynes, S. N. (1994). Clinical judgment and the design of
variables maintaining self-injurious behaviors. Journal of behavioral intervention programs: Estimating the mag-
Autism and Developmental Disorders, 18, 99±117. nitudes of intervention effects. Psichologia Conductual, 2,
Eels, T. (1997). Handbook of psychotherapy case formula- 165±184.
tion. New York: Guilford Press. Haynes, S. N. (1996a). Behavioral assessment of adults. In
Eifert, G. H., & Wilson, P. H. (1991). The triple response M. Goldstein and M. Hersen (Eds.), Handbook of
approach to assessment: A conceptual and methodolo- psychological assessment.
gical reappraisal. Behaviour Research and Therapy, 29, Haynes, S. N. (1996b). The changing nature of behavioral
283±292. assessment. In: M. Hersen & A. Bellack (Eds.),
Edens, J. L., & Gil, K. M. (1995). Experimental induction Behavioral assessment: A practical guide (4th ed.).
of pain: Utility in the study of clinical pain. Behavior Haynes, S. N. (1996c). The assessment±treatment relation-
Therapy, 26, 197±216. ship in behavior therapy: The role of the functional
Evans, I. (1993). Constructional perspectives in clinical analysis. The European Journal of Psychological Assess-
assessment. Psychological Assessment, 5, 264±272. ment. (in press).
Eysenck, H. J. (1986). A critique of contemporary Haynes, S. N., Blaine, D., & Meyer, K. (1995). Dynamical
classification and diagnosis. In T. Millon & G. L. models for psychological assessment: Phase±space func-
Klerman (Eds.), Contemporary directions in psycho- tions. Psychological Assessment, 7, 17±24.
pathology: Toward the DSM-IV (pp. 73±98). New York: Haynes, S. N., Falkin, S., & Sexton-Radek, K. (1989).
Guilford Press. Psychophysiological measurement in behavior therapy.
Eysenck, H. J., & Martin, I. (1987). Theoretical foundations In G. Turpin (Ed.), Handbook of clinical psychophysiol-
of behavior therapy. New York: Plenum. ogy (pp. 263±291). London: Wiley.
Figley, C. R. (Ed.) (1979), Trauma and its wake: Volume 1: Haynes, S. N., & Horn, W. F. (1982). Reactive effects of
The study of post-traumatic stress disorder. New York: behavioral observation. Behavioral Assessment, 4,
Brunner/Mazel. 369±385.
Foster, S. L., Bell-Dolan, D. J. & Burge, D. A. (1988). Haynes, S. N., Leisen, M. B., & Blaine, D. D. (1997).
Behavioral observation. In A. S. Bellack & M. Hersen Design of individualized behavioral treatment programs
(Eds.), Behavioral assessment: A practical handbook using functional analytic clinical case models. Psycholo-
(pp. 119±160). New York: Pergamon. gical Assessment, 9, 334±348.
Foster, S. L., & Cone, J. D. (1986). Design and use of direct Haynes, S. N., & O'Brien, W. O. (1990). The functional
observation systems. In A. R. Ciminero, C. S. Calhoun, analysis in behavior therapy. Clinical Psychology Review,
& H. E. Adams (Eds.), Handbook of behavioral assess- 10, 649±668.
ment (pp. 253±324). New York: Wiley. Haynes, S. N., & O'Brien, W. O. (1998). Behavioral
Gannon, L. R., & Haynes, S. N. (1987). Cognitive- assessment. A functional approach to psychological
physiological discordance as an etiological factor in assessment. New York: Plenum.
psychophysiologic disorders. Advances in Behavior Re- Haynes, S. N., & O'Brien, W. O. (in press). Behavioral
search and Therapy, 8, 223±236. assessment. New York: Plenum.
Gardner, W. I., & Cole, C. L. (1988). Self-monitoring Haynes, S. N., Spain, H., & Oliviera, J. (1993). Identifying
procedures. In E. S. Shapiro & T. R. Kratochwill (Eds.), causal relationships in clinical assessment. Psychological
Behavioral assessment in schools: Conceptual foundations Assessment, 5, 281±291.
and practical applications (pp. 206±246). New York: Haynes, S. N., & Uchigakiuchi, P. (1993). Incorporating
Guilford Press. personality trait measures in behavioral assessment:
184 Principles and Practices of Behavioral Assessment with Adults

Nuts in a fruitcake or raisins in a mai tai? Behavior R. T. Ammerman (Eds), Handbook of prescriptive
Modification, 17, 72±92. treatments for adults (pp 443±461). New York: Plenum.
Haynes, S. N., Uchigakiuchi, P., Meyer, K., Orimoto, Kratochwill, T. R., & Levin, J. R. (1992). Single-case
Blaine, D., & O'Brien, W. O. (1993). Functional analytic research design and analysis: New directions for psychol-
causal models and the design of treatment programs: ogy and education. Hillsdale, NJ: Erlbaum.
Concepts and clinical applications with childhood Kratochwill, T. R., & Shapiro, E. S. (1988). Introduction:
behavior problems. European Journal of Psychological Conceptual foundations of behavioral assessment. In
Assessment, 9, 189±205. E. S. Shapiro & T. R. Kratochwill (Eds.), Behavioral
Haynes, S., N., & Wai'alae, K. (1995). Psychometric assessment in schools: Conceptual foundations and prac-
foundations of behavioral assessment. In: R. FernaÂndez- tical applications (pp. 1±13). New York: Guilford
Ballestros (Ed.), Evaluacion conductual hoy: (Behavioral Press.
assessment today)(pp. 326±356). Madrid, Spain: Edi- Kubany, E. S. (1994). A cognitive model of guilt typology
ciones Piramide. in combat-related PTSD. Journal of Traumatic Stress, 7,
Haynes, S. N., & Wu-Holt, P. (1995). Methods of 3±19.
assessment in health psychology. In M. E. Simon (Ed.), Lang, P. J. (1995). The emotion probe: Studies of
Handbook of health psychology (pp. 420±444). Madrid, motivation and attention. American Psychologist, 50,
Spain: Sigma 519±525.
Heatherton, T. F., & Weinberger, J. L. (Eds.) (1994). Can Lichstein, K. L., & Riedel, B. W. (1994). Behavioral
personality change. Washington, DC: American Psycho- assessment and treatment of insomnia: A review with an
logical Association. emphasis on clinical application. Behavior Therapy, 25,
Hersen, M., & Bellack, A. S. (Eds.) (1998). Behavioral 659±688.
assessment: A practical handbook (4th ed.). Boston: Allyn Linscott, J., & DiGiuseppe, R. (1998). Cognitive assess-
& Bacon. ment. In M. Hersen & A. S. Bellack (Eds.), Behavioral
Hersen, M., & Turner, S. M. (Eds.) (1994). Diagnostic assessment: A practical handbook (4th ed.). Boston: Allyn
interviewing (2nd ed.). New York: Plenum. & Bacon.
Hughes, H. M., & Haynes, S. N. (1978). Structured Malec, J. F., & Lemsky, C. (1996). Behavioral assessment
laboratory observation in the behavioral assessment of in medical rehabilitation: Traditional and consensual
parent±child interactions: A methodological critique. approaches. In L. Cushman & M. Scherer (Eds.)
Behavior Therapy, 9, 428±447. Psychological assessment in medical rehabilitation
Honaker, L. M., & Fowler, R. D. (1990). Computer- (pp. 199±236). Washington, DC: American Psychologi-
assisted psychological assessment. In: G. Goldstein & M. cal Association.
Hersen (Eds.), Handbook of psychological assessment. Marsella, A. J., & Kameoka, V. (1989). Ethnocultural
(pp. 521±546.) New York: Pergamon. issues in the assessment of psychopathology. In S.
Iwata, B. A. (and 14 other authors). (1994). The functions Wetzler (Ed.), Measuring mental illness: Psychometric
of self-injurious behavior: An experimental± assessment for clinicians (pp. 157±181). Washington, DC:
epidemiological analysis. Journal of Applied Behavior American Psychiatric Association.
Analysis, 27, 215±240. Mash, E. J., & Hunsley, J. (1990). Behavioral assessment: A
Jacob, T. Tennenbaurm, D., Bargiel, K., & Seilhamer, R. contemporary approach. In A. S. Bellack, M. Hersen, &
A. (1995). Family interaction in the home: Development A. E. Kazdin (Eds.), International handbook of behavior
of a new coding system. Behavior Modification, 12, modification and therapy (2nd ed., pp. 87±106). New
249±251. York: Plenum.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: Mash, E. J., & Hunsley, J. (1993). Assessment considera-
A statistical approach to defining meaningful change in tions in the identification of failing psychotherapy:
psychotherapy research. Journal of Consulting and Bringing the negatives out of the darkroom. Psycholo-
Clinical Psychology, 59, 12±19. gical Assessment, 5, 292±301.
James, L. D., Thorn, B. E., & Williams, D. A. (1993). Goal Mash, E. J. & Terdal, L. G. (1988). Behavioral assessment
specification in cognitive-behavioral therapy for chronic of childhood disorders. New York: Guilford Press.
headache pain. Behavior Therapy, 24, 305±320. McConaghy, N. (1998). Assessment of sexual dysfunction
Jensen, B. J., & Haynes, S. N. (1986). Self-report and deviation. In M. Hersen & A. S. Bellack (Eds.),
questionnaires. In A. R. Ciminero, C. S. Calhoun, & Behavioral assessment: A practical handbook (4th ed).
H. E. Adams (Eds.), Handbook of behavioral assessment Boston: Allyn & Bacon.
(pp 150±175). New York: Wiley. McFall, R. M. & McDonel, E. (1986). The continuing
Johnson, W. G., Schlundt, D. G., Barclay, D. R., Carr- search for units of analysis in psychology: Beyond
Nangle, R. E., & Engler, L. B. (1995). A naturalistic persons, situations and their interactions. In R. O.
functional analysis of binge eating. Behavior Therapy, 26, Nelson & S. C. Hayes (Eds.), Conceptual foundations of
101±118. behavioral assessment (pp. 201±241). New York: Guil-
Johnston, J. M., & Pennypacker, H. S. (1993). Strategies ford Press.
and tactics of behavioral research (2nd ed.). Hillsdale, NJ: McReynolds, P. (1986). History of assessment in clinical
Erlbaum. and educational settings. In R. O. Nelson & S. C. Hayes
Kanfer, F. H. (1985). Target selection for clinical change (Eds.), Conceptual foundations of behavioral assessment
programs. Behavioral Assessment, 7, 7±20. (pp. 42±80). New York: Guilford Press.
Kazdin, A. E. (1979). Situational specificity: The two edged Mischel, W. (1968). Personality and assessment. New York:
sword of behavioral assessment. Behavioral Assessment, Wiley.
1, 57±75. Moran, G., Dumas, J., & Symons, D. K. (1992).
Kazdin, A. (1997). Research design in clinical psychology Approaches to sequential analysis and the description
(2nd ed.). New York: Allyn & Bacon. of contingency tables in behavioral interaction. Beha-
Kazdin, A. E., & Kagan, J. (1994). Models of dysfunction vioral Assessment, 14, 65±92.
in developmental psychopathology. Clinical Psychology: Munk, D. D., & Repp, A. C. (1994). Behavioral assessment
Science and Practice, 1, 35±52. of feeding problems of individuals with severe disabil-
Kern, J. M. (1991). An evaluation of a novel role-play ities. Journal of Applied Behavior Analysis, 27, 241±250.
methodology: The standardized idiographic approach. Nelson, R. O., & Hayes, S. C. (1986). Conceptual
Behavior Therapy, 22, 13±29. foundations of behavioral assessment. New York: Guil-
Kerns, R. D. (1994). Pain management. In M. Hersen & ford Press.
References 185

Newman, M. G., Hofmann, S. G., Trabert, W., Roth, W. interviews and rating scales. In M. Hersen & A. S.
T., & Taylor, C. B. (1994). Does behavioral treatment of Bellack (Eds.), Behavioral assessment: A practical hand-
social phobia lead to cognitive changes? Behavior book (4th ed.). Boston: Allyn & Bacon.
Therapy, 25, 503±517. Shadish, W. R. (1996). Meta-analysis and the exploration
Nezu, A. M., & Nezu, C. M. (1989). Clinical decision of causal mediating processes: A primer of examples,
making in behavior therapy: A problem-solving perspec- methods, and issues. Psychological Methods. 1, 47±65.
tive. Champaign, IL: Research Press. Shapiro, E. S. (1984). Self-monitoring. In T. H. Ollendick
Nezu, A., Nezu, C., Friedman, & Haynes, S. N. (1996). & M. Hersen (Eds.), Child behavioral assessment:
Case formulation in behavior therapy. In T. D. Eells Principles and procedures (pp. 350±373). Elmsford, NY:
(Ed.), Handbook of psychotherapy case formulation. New Pergamon.
York: Guilford Press. Shapiro, E. W., & Kratochwill, T. R. (Eds.) (1988).
Nunnally, J. C., & Burnstein, I. H. (1994). Psychometric Behavioral assessment in schools, Conceptual foundations
theory (3rd ed.) New York: McGraw-Hill. and practical applications. New York: Guilford Press.
O'Brien, W. H., & Haynes, S. N. (1995). A functional Shiffman, S. (1993). Assessing smoking patterns and
analytic approach to the conceptualization, assessment motives. Journal of Consulting and Clinical Psychology,
and treatment of a child with frequent migraine head- 61, 732±742.
aches. In Session, 1, 65±80. Silva, F. (1993). Psychometric foundations and behavioral
O'Donohue, W., & Krasner, L. (1995). Theories of behavior assessment. Newbury Park, CA: Sage
therapy. Washington, DC: American Psychological Silverman, W. K., & Kurtines, W. M. (1998). Anxiety and
Association. phobic disorders: A pragmatic perspective. In
O'Leary, D. K. (Ed.) (1987). Assessment of marital discord: M. Hersen, & A. S. Bellack (Eds.), Behavioral assess-
An integration for research and clinical practice. Hillsdale, ment: A practical handbook (4th ed.). Boston: Allyn &
NJ: Erlbaum. Bacon.
O'Leary, K. D., Malone, J., & Tyree, A. (1994). Physical Smith, G. T. (1994). Psychological expectancy as mediator
aggression in early marriage: Prerelationship and rela- of vulnerability to alcoholism. Annals of the New York
tionship effects. Journal of Consulting and Clinical Academy of Sciences, 708, 165±171.
Psychology, 62, 594±602. Smith, G. T., & McCarthy, D. M. (1995). Methodological
O'Leary, K. D., Vivian, D., & Malone, J. (1992). considerations in the refinement of clinical assessment
Assessment of physical aggression against women in instruments. Psychological Assessment, 7, 300±308.
marriage: The need for multimodal assessment. Beha- Sobell, L. C., Toneatto, T., & Sobell, M. B. (1994).
vioral Assessment, 14, 5±14. Behavioral assessment and treatment planning for
Ollendick, T. H., & Hersen, M. (1984). Child behavioral alcohol, tobacco, and other drug problems: Current
assessment, principles and procedures. New York: status with an emphasis on clinical applications.
Pergamon. Behavior Therapy, 25, 523±532.
Ollendick, T. H., & Hersen, M. (1993). Handbook of child Spector, P. E. (1992). Summated rating scale construction:
and adolescent assessment. Boston: Allyn & Bacon. An introduction. Beverly Hills, CA: Sage.
Persons, J. B. (1989). Cognitive therapy in practice: A case Sprague, J. R., & Horner, R. H., (1992). Covariation
formulation approach. New York: Norton. within functional response classes: Implications for
Persons, J. B., & Bertagnolli, A. (1994). Cognitive- treatment of severe problem behavior. Journal of Applied
behavioural treatment of multiple-problem patients: Behavior Analysis, 25, 735±745.
Application to personality disorders. Clinical Psychology Strosahl, K. D., & Linehan, M. M. (1986). Basic issues in
and Psychotherapy, 1, 279±285. behavioral assessment. In A. Ciminero, K. S. Calhoun,
Persons, J. B., & Fresco, D. M. (1998). Assessment of & H. E. Adams (Eds.), Handbook of behavioral assess-
depression. In M. Hersen & A. S. Bellack (Eds.), ment (pp. 12±46). New York: Wiley.
Behavioral assessment: A practical handbook (4th ed.). Suen, H. K., & Ary, D. (1989). Analyzing quantitative
Boston: Allyn & Bacon. observation data. Hillsdale, NJ: Erlbaum.
Piotrowski, C., & Zalewski, C. (1993). Training in Sutker, P. B., & Adams, H. E. (Eds.) (1993). Comprehensive
psychodiagnostic testing in APA-approved PsyD and handbook of psychopathology. New York: Plenum.
PhD clinical psychology programs. Journal of Person- Taylor, C. B., & Agras, S. (1981). Assessment of phobia. In
ality Assessment, 61, 394±405. D. H. Barlow (Ed.), Behavioral assessment of adult
Regier, D. A., Farmer, M. E., Rae, D. S., Locke, B. Z., disorders (pp. 280±309). New York: Guilford Press.
Keith, S. J., Judd, L. L., & Goodwin, F. K. (1990). Timberlake, W., & Farmer-Dougan, V. A. (1991). Re-
Comorbidity of mental disorders with alcohol and other inforcement in applied settings: Figuring out ahead of
drug abuse. Journal of the American Medical Association, time what will work. Psychological Bulletin, 110,
264, 2511±2518. 379±391.
Rychtarik, R. G., & McGillicuddy, N. B. (1998). Assess- Torgrud, L. J., & Holborn, S. W. (1992). Developing
ment of appetitive disorders: Status and empirical externally valid role-play for assessment of social skills:
methods in alcohol, tobacco, and other drug use. In A behavior analytic perspective. Behavioral Assessment,
M. Hersen & A. S. Bellack (Eds.), Behavioral Assess- 14, 245±277.
ment: A practical handbook (4th ed.). Boston: Allyn & Turk, D. C., & Salovey, P. (Eds.) (1988). Reasoning,
Bacon. inference, and judgment in clinical psychology. New York:
Sarwer, D., & Sayers, S. L. (1998). Behavioral interviewing. Free Press.
In M. Hersen, & A. S. Bellack (Eds.), Behavioral Turkat, I. (1986). The behavioral interview. In A.
assessment: A practical handbook (4th ed.). Boston: Ciminero, K. S. Calhoun, & H. E. Adams (Eds.),
Allyn & Bacon. Handbook of behavioral assessment (pp. 109±149). New
Schmidt, N. B., & Telch, M. J. (1994). Role of fear and York: Wiley.
safety information in moderating the effects of voluntary Tryon, W. W. (1985). Behavioral assessment in behavioral
hyperventilation. Behavior Therapy, 25, 197±208. medicine. New York: Springer.
Schlundt, D. G., Johnson, W. G., & Jarrell, M. P. (1986). A Tryon, W. W. (1991). Activity measurement in psychology
sequential analysis of environmental, behavioral, and and medicine. New York: Plenum.
affective variables predictive of vomiting in bulimia Tryon, W. (1996). Observing contingencies: Taxonomy and
nervosa. Behavioral Assessment, 8, 253±269. methods. Clinical Psychology Review (in press).
Segal, D. L., & Fal, S. B. (1998). Structured diagnostic Tryon, W. W. (1998). Behavioral observation. In
186 Principles and Practices of Behavioral Assessment with Adults

M. Hersen & A. S. Bellack (Eds.). Behavioral assessment: Weiss, R. L., & Summers, K. J., (1983). Marital interaction
A practical handbook (4th ed.). Boston: Allyn & Bacon. coding systemÐIII. In E. E. Filsinger (Ed.), Marriage
(in press) and family assessment: A sourcebook for family therapy
Waddell, G., & Turk, D. C. (1992). Clinical assessment of (pp. 85±115). Beverly Hills, CA: Sage.
low back pain. In D. C. Turk & R. Melzack (Eds.), Wolpe, J., & Turkat, I. D. (1985). Behavioral formulation
Handbook of pain assessment (pp. 15±36). New York: of clinical cases. In I. Turkat (Ed.), Behavioral cases
Guilford Press. formulation (pp. 213±244). New York: Plenum.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.08
Intellectual Assessment
ALAN S. KAUFMAN
Yale University School of Medicine, New Haven, CT, USA
and
ELIZABETH O. LICHTENBERGER
The Salk Institute, La Jolla, CA, USA

4.08.1 INTRODUCTION
4.08.1.1 Brief History of Intelligence Testing
4.08.1.2 Controversy Over Intelligence Testing
4.08.1.3 Principles of the Intelligent Testing Philosophy
4.08.2 MEASURES OF INTELLIGENCE
4.08.2.1 Wechsler's Scales
4.08.2.1.1 Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R)
4.08.2.1.2 Wechsler Intelligence Scale for Children-Third Edition (WISC-III)
4.08.2.1.3 WISC-III Short Form
4.08.2.1.4 Wechsler Adult Intelligence Scale-Revised (WAIS-R)
4.08.2.1.5 Wechsler Adult Intelligence Scale-Third Edition (WAIS-III)
4.08.2.1.6 Kaufman Assessment Battery for Children (K-ABC)
4.08.2.1.7 Kaufman Adolescent and Adult Intelligence Test (KAIT)
4.08.2.1.8 Overview
4.08.2.1.9 The Stanford–Binet: Fourth Edition
4.08.2.1.10 Woodcock–Johnson Psycho-Educational Battery-Revised: Tests of Cognitive Ability (WJ-R)
4.08.2.1.11 Detroit Tests of Learning Aptitude (DTLA-3)
4.08.2.1.12 Differential Ability Scales (DAS)
4.08.2.1.13 Cognitive Assessment System (CAS)
4.08.3 INSTRUMENT INTEGRATION
4.08.3.1 K-ABC Integration with Wechsler Scales
4.08.3.2 Integration of KAIT with Wechsler Scales
4.08.3.3 Integration of Binet IV with Wechsler Scales
4.08.3.4 Integration of WJ-R with Wechsler Scales
4.08.3.5 DTLA-3 Integration with Wechsler Scales
4.08.3.6 Integration of DAS with Wechsler Scales
4.08.3.7 Integration of CAS with Wechsler Scales
4.08.4 FUTURE DIRECTIONS
4.08.5 SUMMARY
4.08.6 ILLUSTRATIVE CASE REPORT
4.08.6.1 Referral and Background Information
4.08.6.2 Appearance and Behavioral Characteristics
4.08.6.3 Tests Administered
4.08.6.4 Test Results and Interpretation
4.08.6.5 Summary and Diagnostic Impressions
4.08.6.6 Recommendations
4.08.7 REFERENCES


4.08.1 INTRODUCTION

The assessment of intellectual ability has continued to grow and develop since the nineteenth century. This chapter gives a foundation for understanding the progression of intellectual assessment through a brief historical review of IQ testing. Some of the controversy surrounding intelligence testing will be introduced, and an “intelligent” approach to testing (Kaufman, 1979, 1994b) will be discussed in response to the critics of testing. Many tests are available for assessing child, adolescent, and adult intelligence, and this chapter addresses a select group of measures. A description and brief overview are provided for the following intelligence tests: Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R; Wechsler, 1989), Wechsler Intelligence Scale for Children-Third Edition (WISC-III; Wechsler, 1991), Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981), Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983), Kaufman Adolescent and Adult Intelligence Test (KAIT; Kaufman & Kaufman, 1993), Stanford–Binet: Fourth Edition (Binet IV; Thorndike, Hagen, & Sattler, 1986b), Woodcock–Johnson Psycho-Educational Battery-Revised: Tests of Cognitive Ability (WJ-R; Woodcock & Johnson, 1989), Detroit Tests of Learning Aptitude (DTLA-3; Hammill, 1991), Differential Ability Scales (DAS; Elliott, 1990), and Das–Naglieri Cognitive Assessment System (CAS; Naglieri & Das, 1996).

A thorough cognitive assessment contains supplemental measures in addition to the main instrument used to measure IQ, as will be discussed later in this chapter. Accordingly, the description and overview of these instruments are followed by a section that integrates each of them with the Wechsler scales, focusing on how they may be used with Wechsler's tests to further define cognitive functioning. The final part of this chapter provides an illustrative case report that combines a number of different measures in the assessment of a 13-year-old female with academic difficulties.

4.08.1.1 Brief History of Intelligence Testing

Exactly when intelligence testing began is difficult to pinpoint. IQ tests as they are known in the 1990s stem from nineteenth-century Europe. Study of the two extremes of intelligence, giftedness and retardation, led to breakthroughs in intellectual assessment. The early pioneers of assessment were Frenchmen who worked with the retarded. In the 1830s, Jean Esquirol distinguished between mental retardation and mental illness, unlumping idiocy from madness (Kaufman, 1983). He focused on language and speech patterns, and even on physical measurements such as the shape of the skull, in testing “feebleminded” and “demented” people.

Another contribution of Esquirol was a system for labeling the retarded. He formed a hierarchy of mental retardation, with “moron” at the top. Those less mentally adept were classified as “imbecile” and those at the bottom rung of intelligence were “idiots.” The classification systems of the 1990s, which use terms like profoundly, severely, or moderately retarded, strike most people as less offensive than Esquirol's labels.

In the mid-1800s, another innovator joined Esquirol in testing retarded individuals. In contrast to Esquirol's use of verbal tests, Edouard Seguin tested these individuals using nonverbal methods oriented toward sensation and motor activity (Kaufman, 1983). Seguin's work is linked to the twentieth century, as many of the procedures he developed were adopted or modified by later developers of performance and nonverbal tasks. Intelligence testing and education became intertwined during this time, when Seguin convinced authorities of the desirability of educating the “idiots” and “imbeciles.” Seguin was also the inspiration for Maria Montessori. Many of his methods and materials are present in the Montessori approach to education.

In an approach similar to Seguin's, stressing discrimination and motor control, Galton studied individual differences in the ordinary man, not just the tail ends of the normal curve. He was committed to the notion that intelligence is displayed through the use of the senses (sensory discrimination and sensory-motor coordination), and believed that those with the highest IQ should also have the best discriminating abilities. Therefore, he developed mental tests that were a series of objective measurements of sensory abilities like keenness of sight, color discrimination, and pitch discrimination; sensory-motor abilities like reaction time and steadiness of hand; and motor abilities like strength of squeeze and strength of pull (Cohen, Montague, Nathanson, & Swerdlik, 1988).

Galton's theory of intelligence was simplistic: people take in information through their senses, so those with better developed senses ought to be more intelligent. Although his theory of intelligence was quite different from what is considered intelligence today, he is credited with establishing the first comprehensive individual intelligence test. He also influenced two basic notions of intelligence: the idea that
intelligence is a unitary construct, and that individual differences in intelligence are largely genetically determined (possibly influenced by the theory of his cousin, Charles Darwin) (Das, Kirby, & Jarman, 1979).

Galton's concepts were brought to the USA by James McKeen Cattell, an assistant in Galton's laboratory (Roback, 1961). In 1890, Cattell established a Galton-like mental test laboratory at the University of Pennsylvania, and he moved his laboratory to Columbia University in New York the next year. He shared Galton's philosophy that intelligence is best measured by sensory tasks, but expanded his use of "mental tasks" to include standardized administration procedures. He urged the establishment of norms, and thereby took the assessment of mental ability out of the arena of abstract philosophy and demonstrated that mental ability could be studied experimentally and practically. Studies conducted around the turn of the century at Cattell's Columbia laboratory showed that American versions of Galton's sensory-motor test correlated close to zero with meaningful criteria of intelligence, such as grade-point average in college.

Following Esquirol's lead by focusing on language abilities, Alfred Binet began to develop mental tasks with his colleagues Victor Henri and Theodore Simon (Binet & Henri, 1895; Binet & Simon, 1905, 1908). His tests were complex, measuring memory, judgment, reasoning, and social comprehension, and these tasks survive to the 1990s in most tests of intelligence for children and adults.

The Minister of Public Instruction in Paris appointed Binet to study the education of retarded children in 1904. The Minister wanted Binet to separate retarded from normal children in the public schools. Thus, with 15 years' worth of task development behind him, Binet quickly assembled the Binet–Simon scale (Sattler, 1988).

Binet used a new approach in his tests; he ordered tasks from easy to hard within the scale. In 1908 and 1911 he revised his test to group tasks by age level, to add levels geared for adults, to introduce the concept of mental age, and to give more objective scoring rules (Sattler, 1988). If someone passed the nine-year-old level tasks, but failed the ones at the 10-year level, then that person had the intelligence of the typical nine-year-old, whether the person was 6, 9, or 30. The measurement of adults' intelligence, except for that of mentally retarded individuals, was almost an afterthought. Binet's untimely death in 1911 prevented him from actualizing the many applications of his tests in child development, education, medicine, and research (Kaufman, 1990).

Every IQ test in existence has been greatly influenced by Binet's work and incorporates many of the same kinds of concepts and test questions that he developed. Lewis Terman was one of several Americans who translated the Binet–Simon for use in the USA. Terman published a "tentative" revision in 1912. Terman then took four years to carefully adapt, expand, and standardize the Binet–Simon. After much painstaking work, in 1916 the Stanford–Binet was born. This test used the concept of mental quotient and introduced the intelligence quotient. The Stanford–Binet left its competitors in the dust, and became the leading IQ test in America.

Like Binet, Terman viewed intelligence tests as useful for identifying "feeblemindedness," or weeding out the unfit. Terman also saw the potential for using intelligence tests with adults for determining ability to perform well in certain occupations. He believed that minimum intelligence quotients were necessary for success in specific occupations.

With the onset of World War I, the field of adult assessment grew quickly due to practical recruitment issues. The USA needed a way to evaluate the mental abilities of thousands of recruits and potential officers in an expedient manner. Due to the large volume of individuals tested, a group version of Binet's test was created by Arthur Otis, a student of Terman. This group-administered Stanford–Binet was labeled the Army Alpha. The Army Beta was also created during World War I to assess anyone who could not speak English or who was suspected of malingering. It was a nonverbal problem-solving test and a forerunner of today's nonverbal ("Performance") subtests. The Army Alpha and Army Beta tests, published by Yerkes in 1917, were validated on huge samples (nearly two million). The tests were scored "A" to "D-", with the percentage scoring "A" supporting their validity: 7% of recruits, 16% of corporals, 24% of sergeants, and 64% of majors. The best evidence of validity, though, was the Peter Principle in action. Second lieutenants (59% "A") outperformed their direct superiors, first lieutenants (53%) and captains (53%), while those with ranks higher than major did not do as well as majors (Kaufman, 1990).

The subtests developed by Binet and World War I psychologists were borrowed by David Wechsler in the mid-1930s to develop the Wechsler-Bellevue Intelligence Scale. His innovation was not in the selection of tasks, but in his idea that IQ was only in part a verbal intelligence. He also assembled a Performance Scale from the nonverbal, visual-motor subtests that were developed during the war to evaluate
people who could not speak English very well or whose motivation to succeed was in doubt. Wechsler paired the verbally laden Army Alpha and the Stanford–Binet to create the Verbal scale, and the Army Group Examination Beta and the Army Individual Performance scale to create the Performance scale. These two scales together were thought to contribute equally to the overall intelligence scale. The Full Scale IQ, for Wechsler, is an index of general mental ability (g).

To Wechsler, these tests were dynamic clinical instruments, more than tools to subdivide retarded individuals (Kaufman, 1990). However, the professional public was leery. They wondered how tests developed for the low end of the ability spectrum could be used to test normal people's intelligence. The professionals and publishers had a difficult time accepting that nonverbal tests could be used as measures for all individuals, not just foreigners. The postwar psychological community believed that IQ tests were primarily useful for predicting children's success in school, and was critical of Wechsler for developing a test primarily for adolescents and adults.

He persisted with his idea that people with poor verbal intelligence may be exceptional in their nonverbal ability, and vice versa. He met with resistance and frustration, and could not find a publisher willing to subsidize his new test. Thus, with a group of psychologist friends, Wechsler tested nearly 2000 children, adolescents, and adults in Brooklyn, New York. Although the sample was very urban, he managed to make it well stratified. Once the test had been standardized, Wechsler had no problem finding a publisher in The Psychological Corporation. The original Wechsler-Bellevue (Wechsler, 1939) has grandchildren, including the Wechsler Intelligence Scale for Children-Revised (WISC-R) and the Wechsler Adult Intelligence Scale-Revised (WAIS-R); more recently, in 1991, a great-grandchild was born, the WISC-III.

Loyalty to the Stanford–Binet prevented Wechsler's test from obtaining instant success. However, Wechsler gradually overtook the Binet during the 1960s as the learning disabilities movement gained popularity. The Verbal IQ and Performance IQ provided by Wechsler's tests helped to identify bright children who had language difficulties or visual-perceptual problems. The Stanford–Binet offered just one IQ, and the test was so verbally oriented that people with exceptional nonverbal intelligence were penalized.

Terman's Stanford–Binet lost favor when revisions of the battery after his death in 1956 proved to be expedient and shortsighted. In the 1990s, Wechsler's scales have proven themselves by withstanding challenges by other test developers, including the Kaufman Assessment Battery for Children (K-ABC) (Kaufman & Kaufman, 1983), the Kaufman Adolescent and Adult Intelligence Test (KAIT) (Kaufman & Kaufman, 1993), the Differential Ability Scales (DAS) (Elliott, 1990), and the Woodcock–Johnson (Woodcock & Johnson, 1989). These many other tests are used widely, but generally remain alternatives or supplements to the Wechsler scales.

4.08.1.2 Controversy Over Intelligence Testing

The measurement of intelligence has long been the center of debate. In the past, critics have spoken of IQ tests as "biased," "unfair," and "discriminatory." The critics' arguments in the 1990s center more around what the IQ tests truly measure, as well as how or if they should be interpreted, their relevance to intervention, and their scope. Despite the controversy, there is great interest in and need for measurement of intelligence, especially in the educational context, in order to help children and adolescents. Amidst the criticisms and limitations of IQ testing, these instruments remain among the most technologically advanced and sophisticated tools of the profession for providing essential and unique information to psychologists so they may best serve the needs of children and adults. When used in consideration of the American Psychological Association's Ethical Principles of Psychologists (American Psychological Association, 1990) Principle 2 (Competence), which encourages clinicians to recognize differences among people (age, gender, socioeconomic, and ethnic backgrounds) and to understand test research regarding the validity and the limitations of their assessment tools, these tests can be beneficial despite the controversy.

Three controversial themes associated with IQ testing were noted by Kaufman (1994). The first involves opposition to the common practice of subtest interpretation advocated by Wechsler (1958) and Kaufman (1979, 1994b). The second includes those who would abandon the practice altogether. Finally, the third group suggests that the concept of intelligence testing is sound, but more contemporary instrumentation could improve the effectiveness of the approach.

The first group of psychologists has encouraged practitioners to "just say 'no' to subtest analysis" (McDermott, Fantuzzo, & Glutting, 1990, p. 299; also see Glutting, McDermott, Prifitera, & McGrath, 1994, and Watkins & Kush, 1994). McDermott and his colleagues argue that interpreting a subtest profile is in
violation of the principles of valid test interpretation because the ipsative method fails to improve prediction (McDermott, Fantuzzo, Glutting, Watkins, & Baggaley, 1992) and therefore does not augment the utility of the test. It is agreed that the results of studies conducted by McDermott et al. (1992) do suggest that using the WISC-III in isolation has limitations, but using the ipsative approach in conjunction with other relevant information, such as achievement test results and pertinent background information, may be beneficial. Kaufman (1994) further suggests that by shifting to the child's midpoint score, a more equally balanced set of hypotheses can be developed, which can be integrated with other findings to either strengthen or disconfirm hypotheses. When the ipsative assessment approach is used to create a base from which to search for additional information to evaluate hypothesized strengths and weaknesses in the child's subtest profile, its validity is extended beyond that which can be obtained using the Wechsler subtests alone. If support is found for the hypotheses, then such a strength or weakness can be viewed as reliable because of its cross-validation (Kaufman, 1994). When considering this position and that represented by McDermott et al., as well as Glutting et al. (1994) and Watkins and Kush (1994), it is important to recognize that these authors are against subtest profile analysis, not the use of IQ tests in general. This is in contrast to others who hold a more extreme negative view of IQ testing.

One extremist group that opposes IQ testing includes those who advocate throwing away Verbal and Performance IQs, along with the subtest profile interpretation, and finally the Full Scale IQ, because they insist that all that Wechsler scales measure is g (MacMann & Barnett, 1994). They argue that differences between the Verbal and Performance Scales on Wechsler tests hold no meaning, that conventional intelligence tests only measure g (and a measure of g is not enough to warrant the administration of such an instrument), and that Wechsler scale data do not have instructional value. These authors fail to recognize a wealth of data illustrating that differences between the Verbal and Performance Scales can be very important. Any clinician using intelligence tests cannot ignore the numerous available studies that point to significant Verbal–Performance differences in patients with right-hemisphere damage (Kaufman, 1990, Chapter 9), in Hispanic and Navajo children (McShane & Cook, 1985; McShane & Plas, 1984; Naglieri, 1984), and in normal adults (Kaufman, 1990, Chapter 7). If only the Full Scale IQ is interpreted, following MacMann and Barnett's (1994) advice that the Verbal and Performance scales are meaningless, then it prevents the fair use of these tests with those groups who have inconsistent V–P discrepancies. Moreover, contrary to what MacMann and Barnett (1994) suggest, it is clear that when a child earns very poor Verbal and average Performance scores, there are obvious implications for instruction and a high probability that such results will be reflected in poor verbal achievement (Naglieri, 1984).

Another extremist group opposed to IQ testing is represented by Witt and Gresham (1985), who state, "The WISC-R lacks treatment validity in that its use does not enhance remedial interventions for children who show specific academic skill deficiencies" (p. 1717). It is their belief that the Wechsler test should be replaced with assessment procedures that have more treatment validity. However, as Kaufman (1994) points out, Witt and Gresham (1985) do not provide evidence for their statements. Another pair of researchers (Reschly & Tilly, 1993) agree with the Witt and Gresham (1985) statements about the lack of treatment validity of the WISC-R, but only provide references that are not specific to the Wechsler scales. Thus, the Wechsler scales appear to have been rejected by these researchers without ample relevant data.

Witt and Gresham (1985) also complain that the WISC-R (as well as the WISC-III) only yields a score, and does not provide school psychologists with direct strategies of what to do with and for children, which are what teachers are requesting. As Kaufman (1994) points out, however, it is not the instrument's responsibility to provide direct treatment information; rather, he states, "It is the examiner's responsibility . . . to provide recommendations for intervention" (p. 35). The examiner should not just take the bottom-line IQ scores or standard scores, but should provide statements about a child's strengths and weaknesses that have been cross-validated through the observations of behavior, background information, and the standardized intelligence and achievement tests.

Finally, there is a group of professionals who have suggested that the Wechsler has limits that should be recognized, but that these limits could be addressed by alternative methods rather than by abandoning the practice of intelligence testing altogether. Some have argued for a move toward alternative conceptualizations of intelligence and methods to measure new constructs that are based on factor analytic research (e.g., Woodcock, 1990), while others have used neuropsychology and cognitive psychology as a starting point (e.g., Naglieri & Das, 1996). The results of these efforts have been tests such as the Das–Naglieri Cognitive Assessment System
(Naglieri & Das, 1996), Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983), Kaufman Adolescent and Adult Intelligence Test (Kaufman & Kaufman, 1993), and the Woodcock–Johnson Tests of Cognitive Ability (Woodcock & Johnson, 1989). This chapter will show how these tests, and others, can be utilized in conjunction with the Wechsler to gain a more complete view of the child.

The main goal of this chapter is to use Kaufman's (1994) philosophy of "intelligent" testing to address some of the concerns about Wechsler interpretation through a careful analysis of the results and integration with other measures. Much of this discussion is based on the principles of IQ testing as outlined by Kaufman, which focus on the view that "WISC-III assessment is of the individual, by the individual, and for the individual" (Kaufman, 1994, p. 14). Through research knowledge, theoretical sophistication, and clinical ability, examiners must generate hypotheses about an individual's assets and deficits and then confirm or deny these hypotheses by exploring multiple sources of evidence. Well-validated hypotheses must then be translated into meaningful, practical recommendations. A brief description of the five principles of intelligent testing follows.

Clinician-scientists must come well equipped with state-of-the-art instrumentation, good judgment, knowledge of psychology, and clinical training to move beyond the obtained IQs (Kaufman, 1994). Integration of information from many sources and different tests is very important if the child referred for evaluation is to remain the focus of assessment, because it is impossible to describe a person fully by just presenting a few numbers from the Wechsler protocol or those obtained from a computer program. Each adult and child who comes for an assessment has unique characteristics and a particular way of approaching test items, and may be affected differently by the testing situation than the next individual. Through the use of an integrated interpretation approach, the various dimensions that influence a child can become apparent.

4.08.1.3 Principles of the Intelligent Testing Philosophy

The first principle of intelligent testing is that "the WISC-III subtests measure what the individual has learned" (Kaufman, 1994, p. 6). The WISC-III is like an achievement test, in that it is a measure of past accomplishments and is predictive of success in traditional school subjects. Research indicates that intelligence tests consistently prove to be good predictors of conventional school achievement. The WISC-III Manual (Wechsler, 1991, pp. 206–209) gives many such correlations between the WISC-III IQs or Factor Indexes and achievement measures. Although this connection between the WISC-III and achievement in school is well documented empirically, it should not be accepted as a statement of fate: that a child who scores poorly on the WISC-III will do poorly in school (Kaufman, 1994). Instead, constructive interpretation of a test battery can lead to recommendations which may helpfully alter a child's development.

The second principle is that WISC-III subtests are samples of behavior and are not exhaustive. Because the subtests only offer a brief glimpse into a child's overall level of functioning, examiners must be cautious in generalizing the results to performance and behaviors in other circumstances. The Full Scale "should not be interpreted as an estimate of a child's global or total intellectual functioning; and the WISC-III should be administered along with other measures, and the IQs interpreted in the context of other test scores" (Kaufman, 1994, p. 7). It is important that the actual scores are not emphasized as the bottom line; rather, it is more beneficial to elaborate on what children can do well, relative to their own level of ability. Such information can be used to create an individualized education program which will tap a child's areas of strength and help improve areas of deficit.

Principle three states, "The WISC-III assesses mental functioning under fixed experimental conditions" (Kaufman, 1994, p. 8). Rigid adherence to the standardized procedures for administration and scoring, outlined in the WISC-III manual (Wechsler, 1991), helps to ensure that all children are measured in an objective manner. However, parts of the standardized procedure make the testing situation very different from a natural setting. For example, it is not very often in a child's everyday life that someone is transcribing virtually every word they say or timing them with a stopwatch. The standardization procedures are important to follow, but must be taken into account as limitations when interpreting the scores obtained in the artificial testing situation. The value of the intelligence test is enhanced when the examiner can meaningfully relate observations of the child's behaviors in the testing situation to the profile of scores.

The fourth principle is that "The WISC-III is optimally useful when it is interpreted from an information-processing model" (Kaufman, 1994, p. 10). This is especially beneficial for helping to hypothesize functional areas of strength and dysfunction. This model suggests
examining how information enters the brain from the sense organs (input), how information is interpreted and processed (integration), how information is stored for later retrieval (storage), and how information is expressed linguistically or motorically (output). Through this model, examiners can organize the test data, including fluctuations in subtest scores, into meaningful underlying areas of asset and deficit.

The fifth and very important principle of intelligent testing is that "Hypotheses generated from WISC-III profiles should be supported with data from multiple sources" (Kaufman, 1994, p. 13). Although good hypotheses can be raised from the initial WISC-III test scores, such hypotheses do not hold water unless verified by diverse pieces of data. Such supporting evidence may come from careful observation of a child's behavior during test administration; from the pattern of responses across various subtests; from background information obtained from parents, teachers, or other referral sources; from previous test data; and from the administration of supplemental subtests. The integration of data from all these different sources is critical in obtaining the best and most meaningful clinical interpretation of a test battery.

4.08.2 MEASURES OF INTELLIGENCE

Intelligence tests are administered for a variety of reasons including identification (of mental retardation, learning disabilities, other cognitive disorders, giftedness), placement (gifted and other specialized programs), and as a cognitive adjunct to a clinical evaluation. The following comprehensive intelligence tests are discussed in the next sections: WPPSI-R, WISC-III, WAIS-R, K-ABC, KAIT, Binet-IV, WJ-R Tests of Cognitive Ability, DTLA-3, DAS, and CAS.

4.08.2.1 Wechsler's Scales

As discussed in the brief history of IQ tests, Wechsler's scales reign as leaders of measures of child, adolescent, and adult intelligence. The WISC-III is a standard part of a battery administered to children by school psychologists and private psychologists to assess level of cognitive functioning, learning styles, learning disabilities, or giftedness. The WAIS-R is invariably administered as a part of a battery to assess intellectual ability for a clinical, neurological, or vocational evaluation of adolescents and adults. The WPPSI-R may be used to measure intellectual ability from ages three to seven years, three months; intellectual assessment may be done from age six up to age 16 with the WISC-III, while the WAIS-R may be used from ages 16–74.

The different Wechsler scales overlap at ages 6–7 and 16. Kaufman (1994) recommends that the WISC-III be used at both these overlapping age periods rather than the WPPSI-R or the WAIS-R. One of the reasons cited for these recommendations is that the WISC-III has a much better "top" than the WPPSI-R for children who are ages six or seven. On the WPPSI-R, a seven-year-old child can earn a maximum score of 16 or 17 (rather than 19) on six of the 10 subtests. The recommendation to use the WISC-III rather than the WAIS-R at age 16 is made because the WAIS-R norms are outdated relative to the WISC-III norms. Kaufman (1990) recommends that the WAIS-R norms for ages 16–19 be used cautiously, and states that "even apart [from] WISC-III norms, the Performance scale is more reliable for the WISC-III (0.92) than the WAIS-R (0.88) at age 16" (p. 40).

Wechsler (1974) puts forth the definition that "intelligence is the overall capacity of an individual to understand and cope with the world around him [or her]" (p. 5). His tests, however, were not predicated on this definition. The tasks developed were not designed from well-researched concepts exemplifying his definition. In fact, as previously noted, virtually all of his tasks were adapted from other existing tests.

Like the Binet, Wechsler's definition of intelligence also subscribes to the conception of intelligence as an overall global entity. He believed that intelligence cannot be tested directly, but can only be inferred from how an individual thinks, talks, moves, and reacts to different stimuli. Therefore, Wechsler did not give credence to one task above another, but believed that this global entity called intelligence could be ferreted out by probing a person with as many different kinds of mental tasks as one can conjure up. Wechsler did not believe in a cognitive hierarchy for his tasks, and he did not believe that each task was equally effective. He felt that each task was necessary for the fuller appraisal of intelligence.

4.08.2.1.1 Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R)

(i) Standardization and properties of the scale

The WPPSI-R is an intelligence test for children aged three years through seven years, three months. The original version of the WPPSI was developed in 1967 for ages four to six and a half years, and its revision, the WPPSI-R, was published in 1989. Several changes were made in the revised version. The norms were updated, the appeal of the content to
young children was improved, and the age range was expanded.

The WPPSI-R is based on the same Wechsler-Bellevue theory of intelligence, emphasizing intelligence as a global capacity but having Verbal and Performance scales as two methods of assessing this global capacity (Kamphaus, 1993). The Verbal scale subtests include: Information, Comprehension, Arithmetic, Vocabulary, Similarities, and Sentences (optional subtest). The Performance scale subtests include: Object Assembly, Block Design, Mazes, Picture Completion, and Animal Pegs (optional subtest).

Like the K-ABC and the Differential Ability Scales (DAS), the WPPSI-R allows the examiner to "help" or "teach" the client on early items on the subtests to ensure that the child understands what is expected. Providing this extra help is essential when working with reticent preschoolers (Kamphaus, 1993).

Subtest scores have a mean of 10 and a standard deviation of three. The overall Verbal, Performance, and Full Scale IQs have a mean of 100 and a standard deviation of 15. The examiner manual provides interpretive tables that allow the examiner to determine individual strengths and weaknesses as well as the statistical significance and clinical rarity of Verbal and Performance score differences.

The WPPSI-R was standardized on 1700 children from age three through seven years, three months. The standardization procedures followed the 1986 US Census Bureau estimates. Stratification variables included gender, race, geographic region, parental occupation, and parental education.

The WPPSI-R appears to be a highly reliable measure. The internal consistency coefficients across age groups for the Verbal, Performance, and Full Scale IQs are 0.95, 0.92, and 0.96, respectively. For the seven-year-old age group, the reliability coefficients are somewhat lower. The internal consistency coefficients for the individual Performance subtests vary from 0.63 for Object Assembly to 0.85 for Block Design, with a median coefficient of 0.79. The internal consistency coefficients for the individual Verbal subtests vary from 0.80 for Arithmetic to 0.86 for Similarities, with a median coefficient of 0.84. The test-retest coefficient for the Full Scale IQ is 0.91.

The WPPSI-R manual provides some information on validity; however, it provides no information on the predictive validity of the test. Various studies have shown that concurrent validity between the WPPSI-R and other tests is adequate. The correlation between the WPPSI and the WPPSI-R Full Scale IQs was reported at 0.87, and the correlations between WPPSI-R and WISC-III Performance, Verbal, and Full Scale IQs for a sample of 188 children were 0.73, 0.85, and 0.85, respectively. The correlations between the WPPSI-R and other well-known cognitive measures are, on average, much lower. The WPPSI-R Full Scale IQ correlated 0.55 with the K-ABC Mental Processing Composite (Kamphaus, 1993) and 0.77 with the Binet IV Test Composite (McCrowell & Nagle, 1994). In general, the validity coefficients provide strong evidence for the construct validity of the WPPSI-R (Kamphaus, 1993).

(ii) Overview

The WPPSI-R is a thorough revision of the 1967 WPPSI, with an expanded age range, new colorful materials, new item types for very young children, a new icebreaker subtest (Object Assembly), and a comprehensive manual (Kaufman, 1990). The revision of the test has resulted in an instrument that is more attractive and more engaging, with materials that are easier to use (Buckhalt, 1991; Delugach, 1991).

The normative sample is large, provides recent norms, and is representative of the 1986 US Census data (Delugach, 1991; Kaufman, 1990). The split-half reliability of the IQs and most subtests is exceptional, the factor analytic results for all age groups are excellent, and the concurrent validity of the battery is well supported by several excellent correlational studies (Delugach, 1991; Kaufman, 1990). The manual provides a number of validity studies, factor analytic results, research overviews, and state-of-the-art interpretive tables, which provide the examiner with a wealth of information. Kaufman (1990) noted that the WPPSI-R has a solid psychometric foundation.

In spite of its reported strengths, the WPPSI-R has flaws. In publishing the WPPSI-R, great effort was made to ensure that all subtests had an adequate "top" and "bottom" (Kaufman, 1992). However, the WPPSI-R has an insufficient floor at the lowest age levels, which limits the test's ability to diagnose intellectual deficiency in young preschoolers (Delugach, 1991). For example, a child at the lowest age level (2-11-16 to 3-2-15) who earns only one point of credit on all subtests will obtain a Verbal IQ of 75, a Performance IQ of 68, and a Full Scale IQ of 68, making it impossible to adequately assess the child's degree of intellectual deficiency. The WPPSI-R subtests are able to distinguish between gifted and nongifted children at all ages, but the top of some subtests is not adequate to discriminate among gifted children. Kaufman (1992) indicates that at the youngest ages (3–4.5 years), all subtests are excellent. However, at age five, Geometric Design begins
to falter, and at ages 6.5 and above, it only allows a maximum scaled score of 16. Other
subtests, such as Object Assembly and Arithmetic, also have problems with the ceiling.
Although the ceilings on the subtests described are not ideal, the IQ scales do allow maximum
IQs of 150 for all ages and IQs of 160 for children ages 3-6.25.

Another major problem with the WPPSI-R is the role played by speed of responding. From
both early developmental perspectives and common-sense perspectives, giving bonus
points for speed is silly (Kaufman, 1992). Young children may respond slowly for a
variety of reasons that have little to do with intellect. A three- or four-year-old child might
respond slowly or deliberately because of immaturity, lack of experience in test taking,
underdeveloped motor coordination, or a reflective cognitive style. The WPPSI-R Object
Assembly and Block Design subtests place an overemphasis on speed. For example, if a
six-and-a-half- or seven-year-old child solves every Object Assembly item perfectly, but does
not work quickly enough to earn bonus points, the child would only receive a scaled score of 6
(ninth percentile). Because of the age-inappropriate stress on solving problems with great
speed, a child's IQ may suffer on two of the 10 subtests (Kaufman, 1992). In addition, the
directions on some of the Performance subtests are not suitable for young children because
they are not developmentally appropriate (Kaufman, 1990). However, Delugach (1991) reports
that if the directions are too difficult, the test provides procedures to ensure that the child
understands the demands of the task.

The WPPSI-R is a useful assessment tool, but, like all others, it possesses certain
weaknesses that limit its usefulness (Delugach, 1991). Examiners should be aware of the
WPPSI-R's inherent strengths and weaknesses and keep them in mind during administration,
scoring, and interpretation. The WPPSI-R may provide the examiner with useful information;
however, “it does little to advance our basic understanding of the development and
differentiation of intelligence or our understanding of the nature of individual differences in
intelligence” (Buckhalt, 1991).

4.08.2.1.2 Wechsler Intelligence Scale for Children-3rd Edition (WISC-III)

(i) Standardization and properties of the scale

The WISC-III was standardized on 2200 children ranging in age from six through 16
years. The children were divided into 11 age groups, one group for each year from six
through 16 years of age. The median age for each age group was the sixth month (e.g., 7
years, 6 months). The standardization procedures followed the 1980 US Census data, and the
manual provides information by age, gender, race/ethnicity, geographic region, and parent
education. “Overall, the standardization of the WISC-III is immaculate . . . a
better-standardized intelligence test does not exist” (Kaufman, 1993, p. 351).

The WISC-III yields three IQs: a Verbal Scale IQ, a Performance Scale IQ, and a Full Scale IQ.
All three are standard scores (mean of 100 and standard deviation of 15) obtained by
comparing an individual's score with those earned by the representative sample of age peers.
Within the WISC-III, there are 10 mandatory and three supplementary subtests, all of which
span the age range of six through 16 years. The Verbal scale's five mandatory subtests are
Information, Similarities, Arithmetic, Vocabulary, and Comprehension. The supplementary
subtest on the Verbal Scale is Digit Span. Digit Span is not calculated into the Verbal IQ
unless it has been substituted for another Verbal subtest because one of those subtests has
been spoiled (Kamphaus, 1993; Wechsler, 1991).

The Performance scale's five mandatory subtests are Picture Completion, Picture
Arrangement, Block Design, Object Assembly, and Coding. The two supplementary subtests on
the Performance scale are Mazes and Symbol Search. The Mazes subtest may be substituted
for any Performance scale subtest; however, Symbol Search may only be substituted for the
Coding subtest (Kamphaus, 1993; Wechsler, 1991).

“Symbol Search is an excellent task that should have been included among the five
regular Performance subtests instead of Coding. Mazes is an awful task that should have been
dropped completely from the WISC-III” (Kaufman, 1994, p. 58). He goes further to say that
“there's no rational reason for the publisher to have rigidly clung to Coding as a regular part
of the WISC-III when the new Symbol Search task is clearly a better choice for psychometric
reasons” (p. 59). Therefore, for all general purposes, Kaufman (1994) strongly recommends
that Symbol Search be substituted routinely for Coding as part of the regular battery, and
that Symbol Search be used to compute the Performance IQ and Full Scale IQ. The manual does
not say to do this, but neither does it prohibit it.

Reliability of each subtest except Coding and Symbol Search was estimated by the split-half
method. Stability coefficients were used as reliability estimates for the Coding and Symbol
Search subtests because of their speeded nature.
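
As a rough illustration of the score metrics described above, the following sketch converts a score to an approximate percentile rank under a normal curve. It assumes the IQ metric stated in the text (mean 100, standard deviation 15) and the conventional subtest scaled-score metric (mean 10, standard deviation 3), which the text does not state explicitly. The function name `percentile_rank` is hypothetical, and in practice examiners read percentiles from the published norm tables rather than from a normal approximation.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100, 15          # IQ metric stated in the text
SCALED_MEAN, SCALED_SD = 10, 3    # assumed conventional subtest metric


def percentile_rank(score, mean, sd):
    """Approximate percentile rank of a score, assuming normally distributed scores."""
    z = (score - mean) / sd
    return 100 * NormalDist().cdf(z)


# A scaled score of 6 lands near the ninth percentile, the value cited
# for the Object Assembly example.
print(round(percentile_rank(6, SCALED_MEAN, SCALED_SD)))
# An IQ of 100 is the mean and therefore the 50th percentile.
print(round(percentile_rank(100, IQ_MEAN, IQ_SD)))
```

Under these assumptions the normal approximation reproduces the ninth-percentile figure quoted for a scaled score of 6.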

Across the age groups, the reliability coefficients range from 0.69 to 0.87 for the individual
subtests. The average reliabilities, across the age groups, for the IQs and Indexes are: 0.95
for the Verbal IQ, 0.91 for the Performance IQ, 0.96 for the Full Scale IQ, 0.94 for the Verbal
Comprehension Index, 0.90 for the Perceptual Organization Index, 0.87 for the Freedom from
Distractibility Index, and 0.85 for the Processing Speed Index (Wechsler, 1991).

Factor analytic studies of the WISC-III standardization data were performed for four
age group subsamples: ages 6-7 (n = 400), ages 8-10 (n = 600), ages 11-13 (n = 600), and ages
14-16 (n = 600) (Wechsler, 1991). Compiling the results of the analysis, a four-factor solution
was found for the WISC-III. As with the WISC-R, Verbal Comprehension and Perceptual
Organization remain the first two factors. Verbal Comprehension involves verbal knowledge and
the expression of this knowledge. Perceptual Organization, a nonverbal dimension, involves
the ability to interpret and organize visually presented material. The third factor consists of
the Arithmetic and Digit Span subtests. Factor III has been described as Freedom from
Distractibility, since common among the tasks is the ability to focus, to concentrate, and to
remain attentive. Other interpretations of this factor have included facility with numbers,
short-term memory, and sequencing because the three tasks which comprise the factor all involve
a linear process whereby numbers are manipulated. Success is either facilitated by or wholly
dependent on memory (Kaufman, 1979). The fourth factor consists of Coding and Symbol
Search, and is referred to as the Processing Speed factor. Taken together, the Verbal
Comprehension and Perceptual Organization factors offer strong support for the construct
validity of the Verbal and Performance IQs; substantial loadings on the large, unrotated first
factor (g) support the construct underlying Wechsler's Full Scale IQ.

(ii) Analyzing the WISC-III data

To obtain the most information from the WISC-III, the psychologist should be more than
familiar with each of the subtests individually as well as with the potential information that
those subtests can provide when integrated or combined. The WISC-III is maximally useful when
tasks are grouped and regrouped to uncover a child's strong and weak areas of functioning, so
long as these hypothesized assets and deficits are verified by multiple sources of information.

As indicated previously, the WISC-III provides examiners with a set of four Factor
Indexes in addition to the set of three IQs. The front page of the WISC-III record form lists the
seven standard scores in a box on the top right. The record form is quite uniform and laid out
nicely; however, it is difficult to know just what to do with all of those scores. Kaufman (1994)
has developed seven steps to interpretation which offer a systematic method of WISC-III
interpretation that allows the clinician to organize and integrate the test results in a
step-wise fashion. The seven steps (see Table 1) provide an empirical framework for profile
attack while organizing the profile information into hierarchies.

(iii) Overview

Professionals in the field of intelligence testing have described the third edition of the
Wechsler Intelligence Scale for Children in a number of different ways. Some critics feel that
the WISC-III represents continuity, the status quo, but makes little progress in the evolution
of the assessment of intelligence. Such critics note that despite more than 50 years of
advancement in theories of intelligence, the Wechsler philosophy of intelligence (not actually a
formal theory), written in 1939, remains the guiding principle of the WISC-III (Schaw,
Swerdilik, & Laurent, 1993). One of the principal goals for developing the WISC-III stated in
the manual was merely to update the norms, which is “hardly a revision at all” (Sternberg,
1993). Sternberg (1993) suggests that if the WISC-III is being used to look for a test of new
constructs in intelligence, or merely a new test, the examiner should look elsewhere.

In contrast to these fairly negative evaluations, Kaufman (1993) reports that the WISC-III
is a substantial revision of the WISC-R and that the changes that have been made are
considerable and well done. “The normative sample is exemplary, and the entire psychometric
approach to test development, validation, and interpretation reflects sophisticated,
state-of-the-art knowledge and competence” (Kaufman, 1993). For Kaufman, the WISC-III is not
without its flaws, but his overall review of the test is quite positive. One of Kaufman's
(1993) main criticisms is that the Verbal tasks are highly culturally saturated and
school-related, which tends to penalize bilingual, minority, and learning-disabled children. He
suggests that perhaps a special scale could have been developed to provide a fairer evaluation
of the intelligence of children who are from the non-dominant culture or who have academic
difficulties. Another criticism raised by Kaufman is that too much emphasis is placed on a
child's speed of responding on the WISC-III. It is difficult to do well on the WISC-III if you do
not solve problems very quickly. This need for

Table 1 Summary of seven steps for interpreting WISC-III profiles.

Step

1  Interpret the Full Scale IQ
   Convert it to an ability level and percentile rank and band it with error, preferably a 90%
   confidence interval (about ±5 points)

2  Determine if the Verbal-Performance IQ discrepancy is statistically significant
   Overall values for V-P discrepancies are 11 points at the 0.05 level and 15 points at the 0.01
   level. For most testing purposes, the 0.05 level is adequate

3  Determine if the V-P IQ discrepancy is interpretable, or if the VC and PO factor indexes
   should be interpreted instead
   Ask four questions about the Verbal and Performance Scales
   Verbal Scale
   (i) Is there a significant difference (p < 0.05) between the child's standard scores on VC
       vs. FD? Size needed for significance (VC-FD) = 13+ points
   (ii) Is there abnormal scatter (highest minus lowest scaled score) among the five Verbal
       subtests used to compute V-IQ? Size needed for abnormal verbal scatter = 7+ points
   Performance Scale
   (iii) Is there a significant difference (p < 0.05) between the child's standard scores on PO
       vs. PS? Size needed for significance (PO-PS) = 15+ points
   (iv) Is there abnormal scatter (highest minus lowest scaled score) among the five
       Performance subtests used to compute P-IQ? Size needed for abnormal performance
       scatter = 9+ points
   If all answers are no, the V-P IQ discrepancy is interpretable. If the answer to one or more
   questions is yes, the V-P IQ discrepancy may not be interpretable. Examine the VC-PO
   discrepancy. Overall values for VC-PO discrepancies are 12 points at the 0.05 level and 16
   points at the 0.01 level
   Determine if the VC and PO indexes are unitary dimensions:
   1. Is there abnormal scatter among the four VC subtests?
      Size needed for abnormal VC scatter = 7+ points
   2. Is there abnormal scatter among the four PO subtests?
      Size needed for abnormal PO scatter = 8+ points
   If the answer to either question is yes, then you probably shouldn't interpret the VC-PO
   Index discrepancy, unless the discrepancy is too big to ignore (see Step 4). If both answers
   are no, interpret the VC-PO differences as meaningful

4  Determine if the V-P IQ discrepancy (or VC-PO discrepancy) is abnormally large
   Differences of at least 19 points are unusually large for both the V-P and VC-PO
   discrepancies. Enter the table with the IQs or Indexes, whichever was identified by the
   questions and answers in Step 3
   If neither set of scores was found to be interpretable in Step 3, they may be interpreted
   anyway if the magnitude of the discrepancy is unusually large (19+ points)

5  Interpret the meaning of the global verbal and nonverbal dimensions and the meaning of the
   small factors
   Study the information and procedures presented in Chapter 4 (verbal/nonverbal) and Chapter 5
   (FD and PS factors). Chapter 5 provides the following rules regarding when the FD and PS
   factors have too much scatter to permit meaningful interpretation of their respective Indexes
   (both Chapters 4 and 5 are in Intelligent Testing with the WISC-III):
   (i) Do not interpret the FD Index if the Arithmetic and Digit Span scaled scores differ by
       four or more points
   (ii) Do not interpret the PS Index if the Symbol Search and Coding scaled scores differ by
       four or more points

6  Interpret significant strengths and weaknesses in the WISC-III subtest profile
   If the V-P IQ discrepancy is less than 19 points, use the child's mean of all WISC-III
   subtests administered as the child's midpoint
   If the V-P IQ discrepancy is 19 or more points, use the child's mean of all Verbal subtests
   as the midpoint for determining strengths and weaknesses on Verbal subtests, and use the
   Performance mean for determining significant deviations on Performance subtests
   Use either the specific values in Table 3.3 of Intelligent Testing with the WISC-III, rounded
   to the nearest whole number, or the following summary information for determining
   significant deviations:
   ±3 points: Information, Similarities, Arithmetic, Vocabulary
   ±4 points: Comprehension, Digit Span, Picture Completion, Picture Arrangement, Block
              Design, Object Assembly, Symbol Search
   ±5 points: Coding

7  Generate hypotheses about the fluctuations in the WISC-III subtest profile
   Consult Chapter 6 in Intelligent Testing with the WISC-III, as it deals with the systematic
   reorganization of subtest profiles to generate hypotheses about strengths and weaknesses

Source: Kaufman (1994b). Reprinted with permission.
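
Steps 1 through 4 of Table 1 amount to a small decision procedure, which can be sketched in code as follows. This is only an illustration under stated assumptions, not Kaufman's procedure as published: the `WiscProfile` container and all function names are hypothetical, the critical values are the overall values quoted in Table 1, and the order in which the 19-point fallback of Step 4 is applied (VC-PO before V-P) is an assumption, since the table does not specify one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WiscProfile:
    """Hypothetical container for the scores that Steps 1-4 refer to."""
    fsiq: int
    viq: int
    piq: int
    vc: int  # Verbal Comprehension Index
    po: int  # Perceptual Organization Index
    fd: int  # Freedom from Distractibility Index
    ps: int  # Processing Speed Index
    verbal_scaled: List[int]  # five Verbal subtests used to compute V-IQ
    perf_scaled: List[int]    # five Performance subtests used to compute P-IQ
    vc_scaled: List[int]      # four VC subtests
    po_scaled: List[int]      # four PO subtests


def scatter(scores: List[int]) -> int:
    """Highest minus lowest scaled score."""
    return max(scores) - min(scores)


def fsiq_band(fsiq: int, half_width: int = 5) -> Tuple[int, int]:
    """Step 1: band the Full Scale IQ with about +/-5 points (90% interval)."""
    return (fsiq - half_width, fsiq + half_width)


def significance(diff: int, crit_05: int, crit_01: int) -> str:
    """Step 2: compare a discrepancy with the overall critical values."""
    size = abs(diff)
    if size >= crit_01:
        return "0.01 level"
    if size >= crit_05:
        return "0.05 level"
    return "not significant"


def vp_is_interpretable(p: WiscProfile) -> bool:
    """Step 3: the V-P discrepancy is interpretable only if all four answers are no."""
    return not (
        abs(p.vc - p.fd) >= 13
        or scatter(p.verbal_scaled) >= 7
        or abs(p.po - p.ps) >= 15
        or scatter(p.perf_scaled) >= 9
    )


def vcpo_is_unitary(p: WiscProfile) -> bool:
    """Step 3: neither index shows abnormal scatter among its four subtests."""
    return scatter(p.vc_scaled) < 7 and scatter(p.po_scaled) < 8


def discrepancy_to_interpret(p: WiscProfile) -> Optional[Tuple[str, int, str]]:
    """Return (label, difference, significance), or None if nothing is interpretable."""
    vp, vcpo = p.viq - p.piq, p.vc - p.po
    if vp_is_interpretable(p):
        return ("V-P", vp, significance(vp, 11, 15))
    # Step 4 fallback: a 19+ point gap may be interpreted anyway (ordering assumed).
    if vcpo_is_unitary(p) or abs(vcpo) >= 19:
        return ("VC-PO", vcpo, significance(vcpo, 12, 16))
    if abs(vp) >= 19:
        return ("V-P", vp, significance(vp, 11, 15))
    return None


example = WiscProfile(
    fsiq=104, viq=112, piq=95, vc=114, po=96, fd=109, ps=93,
    verbal_scaled=[12, 13, 11, 13, 12], perf_scaled=[9, 10, 8, 10, 9],
    vc_scaled=[12, 13, 13, 12], po_scaled=[9, 10, 8, 10],
)
print(fsiq_band(example.fsiq))
print(discrepancy_to_interpret(example))
```

For this invented profile, all four Step 3 questions are answered no, so the 17-point V-P discrepancy is the one to interpret. Steps 5 through 7 rely on clinical judgment and on material in Kaufman's book, so they are not sketched here.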



speed penalizes children who are more reflective in their cognitive style or who have
coordination difficulties. The speed factor may prevent a gifted child from earning a high
enough score to enter an enrichment class or may lower a learning-disabled child's overall IQ
score to a below-average level, just because the child does not work quickly enough. Although
the WISC-III clearly has had mixed reviews, it is one of the most frequently used tests in the
field of children's intelligence testing.

4.08.2.1.3 WISC-III Short Form

Short forms of the Wechsler scales were developed shortly after the original tests were
developed (Kaufman, Kaufman, Balgopal, & McLean, 1996). Short forms are designed to
have sound psychometric qualities and clinical justification, but should also be practical to
use. Clinicians and researchers use short forms when they want to screen intellectual
ability or when conducting research that does not allow the time needed to complete an entire
Wechsler scale. In a study using three different WISC-III short forms, the clinical,
psychometric, and practical qualities of each form were examined (Kaufman, Kaufman, Balgopal,
et al., 1996). A psychometrically and clinically strong short form was examined, and included
the following subtests: Similarities, Vocabulary, Picture Arrangement, and Block Design
(S-V-PA-BD). A practical short form, based on its brevity and ease of scoring, included the
following subtests: Information, Arithmetic, Picture Completion, and Symbol Search
(I-A-PC-SS). A short form which combines psychometric, clinical, and practical qualities was
also examined: Similarities, Arithmetic, Picture Completion, and Block Design (S-A-PC-BD).

The results of this study, using the WISC-III standardization sample of 2200 children, 6-16
years old, revealed important information about the utility of these three different short
forms (Kaufman, Kaufman, Balgopal, et al., 1996). The split-half reliability coefficients,
standard errors of measurement (SEM), validity coefficients, and standard errors of estimate
for the three selected tetrads are presented in Table 2. The form which had both psychometric
and clinical properties (S-V-PA-BD) was compared to the form which had the quality of
practicality in addition to psychometric and clinical properties (S-A-PC-BD). The results
indicated that they were equally valid and about equally reliable (see Table 2). Each of the
three short form tetrads had a reliability coefficient above 0.90 for the total sample. The
brief tetrad (I-A-PC-SS) had a lower correlation with the Full Scale, 0.89, compared to the
other two forms, which each correlated 0.93 with the Full Scale. Although they were each about
equally reliable, the S-A-PC-BD subtests are quicker to administer and only Similarities
requires some subjectivity to score. It is quicker to score than the S-V-PA-BD form because it
uses Arithmetic instead of Vocabulary.

The authors conclude that the extra 25-30% savings in time from using the S-A-PC-BD form, in
addition to its added validity in comparison to the practical tetrad (I-A-PC-SS), make the
S-A-PC-BD short form an excellent choice. The very brief practical form was not
recommended for clinical use or screening purposes because of its lower validity. Kaufman,
Kaufman, Balgopal, et al. (1996) present an equation for converting a person's sum of
scaled scores on the four subtests to an estimated FSIQ. The magnitude of the intercorrelations
among the component subtests provides the data from which the exact equation is derived.
The intercorrelations vary to some extent as a function of age, which leads to slightly
different equations at different ages. However, the authors state that the equations for the
total sample for each tetrad represent a good overview for all children ages 6-16. The following
conversion equation is for the total sample for the recommended S-A-PC-BD short form:

    Estimated FSIQ = 1.6X_c + 36

(for other specific equations for varying ages, see Kaufman, Kaufman, Balgopal, et al., 1996,
p. 103).

To use this conversion equation, the child's scaled scores on the four subtests (S-A-PC-BD)
must first be summed. The sum (X_c) must then be entered into the equation. For example,
suppose that examiners give the recommended psychometric/clinical/practical form to an
eight-year-old and that the sum of the child's four scaled scores is 50. The above equation
would show:

    Estimated FSIQ = 1.6(50) + 36
                   = 80 + 36 = 116

It is important to note that examiners should not take the good psychometric qualities of the
short form to mean that the short form can be regularly substituted for the complete battery.
There is a wealth of information, both clinical and psychometric, that the examiner gains
from administering the complete battery. It is important not to favor the short forms just
because they are shorter, given all the important information that is derived from a
complete administration. Kaufman (1990) suggests that the following are a few instances in
which the savings in administration time may justify use of the short form: (i) when only a

Table 2 Reliability, standard error of measurement, and validity of the three selected short forms by age.

           Split-half          Standard error      Validity: correlation   Standard error
           reliability (a)     of measurement      with Full Scale (a)     of estimate
Age
(years)    SF1   SF2   SF3     SF1   SF2   SF3     SF1   SF2   SF3         SF1   SF2   SF3

6           92    92    90     4.2   4.2   4.7      92    94    89         6.0   5.2   7.0
7           91    90    89     4.5   4.7   5.0      91    91    89         6.4   6.4   7.0
8           93    93    92     4.0   4.0   5.5      92    94    90         6.0   5.2   6.7
9           91    91    89     4.5   4.5   5.0      93    93    89         5.6   5.6   7.0
10          93    92    90     4.0   4.2   4.7      91    89    87         6.4   4.2   7.6
11          91    92    91     4.5   4.2   4.5      94    91    89         5.2   6.4   7.0
12          94    92    90     3.7   4.2   4.7      94    93    89         5.2   5.6   7.0
13          93    92    90     4.0   4.2   4.7      94    92    86         5.2   6.0   7.9
14          94    92    90     3.7   4.2   4.7      93    93    90         5.6   5.6   6.7
15          94    94    93     3.7   3.7   4.0      96    94    91         4.2   5.2   6.4
16          94    93    92     3.7   4.0   4.2      94    95    90         5.2   4.7   6.7
Total       93    92    91     4.0   4.2   4.5      93    93    89         5.6   5.6   7.0

Source: Kaufman et al. (1996).

Notes: SF1 = Short Form 1 (Psychometric/Clinical; Similarities-Vocabulary-Picture Arrangement-Block Design), SF2 = Short Form 2
(Psychometric/Clinical/Practical; Similarities-Arithmetic-Picture Completion-Block Design), SF3 = Short Form 3 (Practical; Information-
Arithmetic-Picture Completion-Symbol Search).
(a) Decimal points have been omitted.
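
The conversion equation for the recommended S-A-PC-BD short form, together with the standard errors of measurement in Table 2, can be illustrated with a brief sketch. The function names are hypothetical. Rounding the estimate to a whole number is an assumption, and the SEM formula used here (SD times the square root of one minus the reliability, with SD = 15) is the standard classical test theory expression rather than something stated in the text; it is included only because it appears to reproduce the Total row of Table 2.

```python
import math


def estimate_fsiq(similarities, arithmetic, picture_completion, block_design):
    """Estimated FSIQ = 1.6 * X_c + 36, where X_c is the sum of the four scaled scores.

    Total-sample equation for the S-A-PC-BD short form; rounding is an assumption.
    """
    x_c = similarities + arithmetic + picture_completion + block_design
    return round(1.6 * x_c + 36)


def standard_error_of_measurement(reliability, sd=15.0):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Worked example from the text: a sum of scaled scores of 50 gives
# 1.6(50) + 36 = 116. The four individual scores below are invented.
print(estimate_fsiq(13, 12, 12, 13))

# Total-sample reliabilities from Table 2 (decimal points restored).
for label, r in [("SF1", 0.93), ("SF2", 0.92), ("SF3", 0.91)]:
    print(label, round(standard_error_of_measurement(r), 1))
```

The age-specific equations mentioned in the text are not reproduced here; only the total-sample equation quoted above is used.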

global assessment of IQ is needed in the context of a complete personality evaluation; (ii)
when a thorough evaluation has been completed recently and just a brief check of present
intellectual functioning is needed; and (iii) when an individual does not require
categorization of their intellectual ability for placement or for diagnosis of a cognitive
disorder.

4.08.2.1.4 Wechsler Adult Intelligence Scale-Revised (WAIS-R)

The Wechsler Adult Intelligence Scale-Third Edition (WAIS-III; The Psychological
Corporation, 1997) came out in August 1997 and soon will be replacing the WAIS-R. Based on
our experience with previous versions of the WAIS, it is likely that there will be a transition
period of 3-4 years during which time clinicians will be gradually moving to use primarily the
newer instrument. Because of this predicted transition time, we are including information
about both the WAIS-R and the WAIS-III. Additionally, much of the research on the
WAIS-R will be directly relevant and applicable to the WAIS-III and is therefore included here.

(i) Standardization and properties of the scale

Similar to the other Wechsler scales discussed, three IQ scores are derived from the WAIS-R
subtests. Each of these scores is a standard score with a mean of 100 and a standard deviation
of 15, created by comparing an individual's score to scores earned by the normative
group of the same age. The Verbal IQ is composed of six verbal subtest scores (Information,
Digit Span, Vocabulary, Arithmetic, Comprehension, and Similarities). The Performance IQ is
composed of five nonverbal subtests (Picture Completion, Picture Arrangement, Block Design,
Object Assembly, and Digit Symbol). The Full Scale represents the average of the Verbal and
Performance IQs.

The WAIS-R was standardized by administering the full scale to 1880 adult subjects,
selected according to current US Census data and tested between 1976 and 1980. Subjects were
stratified according to age, sex, race (white-nonwhite), geographic region, occupation,
education, and urban-rural residence. Subjects were divided into nine age groups, corresponding
to categories often used by the US Census Bureau. The number in each age group ranged
from 160 to 300, and the age groups spanned ages 16-74. A problem with the standardization
sample noted by Kaufman (1985) was that there was apparent systematic bias in the selection of
16- to 19-year-olds, leading to very questionable teenage norms. Also, Hispanics were not
included systematically in the total sample. There was no mention of how Hispanics were
categorized, if any were tested (Kaufman, 1985).

Reliability coefficients for the 11 tests and the Verbal, Performance, and Full Scale IQs were
computed using the split-half method (except for Digit Span and Digit Symbol). Average
reliabilities, across the nine age groups, are as follows: 0.97 for the Verbal IQ; 0.93 for the
Performance IQ; and 0.97 for the Full Scale IQ.

Stability coefficients for the three IQs are: 0.97, 0.90, and 0.96 for Verbal, Performance, and
Full Scale, respectively.

Many factor analytic studies are available which examine the underlying structure of the
WAIS (e.g., WAIS-R manual, 1981). Three basic factors have been reported: a “verbal
comprehension” factor, a “perceptual organization” factor, and a “memory” factor, which has
also been assigned labels like Freedom from Distractibility, Sequential Ability, and Number
Ability. These findings are noted to confirm the appropriateness of separating the tests of the
WAIS into the Verbal and Performance Scales. Researchers have disagreed about how many
factors underlie the WAIS-R. Some researchers regard the WAIS-R as a one-factor
test, stating that the common ability factors account for only a small measure of intellectual
ability (O'Grady, 1983). Some have interpreted as many as four or five meaningful WAIS
factors for various normal and clinical samples (Cohen, 1957). However, Kaufman (1990)
states that there is no justification for interpreting five WAIS-R factors.

When only two factors are rotated for the WAIS-R, the results do not quite correspond to
Wechsler's division of subtests into the Verbal and Performance Scales, although the fit is
adequate. In a comparison of six cross-validation samples and the total normative sample
using two-factor solutions, all of the Verbal subtests loaded more highly on the Verbal
Comprehension than the Perceptual Organization factor (Kaufman, 1990). The loadings from
the standardization sample ranged from 0.47 to 0.84 for the Verbal Comprehension factor,
and ranged from 0.45 to 0.72 for the Perceptual Organization factor. Two Verbal tests (Digit
Span and Arithmetic) did, however, show strong secondary loadings on the Performance
factor. Digit Span and Arithmetic's loadings on the Verbal dimension are also not as
consistently strong as those of the other four Verbal subtests. Each of the five Performance
subtests also consistently loaded more highly on the Perceptual Organization than the Verbal
Comprehension factor for the total standardization sample and for the various supplementary
groups. Picture Arrangement was the exception, with equal loadings on both factors for the
normative group (Kaufman, 1990).

The three-factor solutions for the normal WAIS-R standardization sample demonstrated
factors that were fairly well anticipated. Kaufman (1990) discusses the three-factor solutions,
and presents a table summarizing the data from six samples plus the normative sample (p. 244).
The Verbal Comprehension factor was defined by loadings ranging from 0.67 to 0.81, including
Information, Vocabulary, Comprehension, and Similarities. The triad of Picture Completion,
Block Design, and Object Assembly comprised the Perceptual Organization factor,
with loadings in the 0.56-0.73 range. The third factor comprises Digit Span and Arithmetic,
with loadings of 0.64 and 0.55, respectively. Picture Arrangement and Digit Symbol are
more or less unaccounted for in the three-factor solution. Picture Arrangement loads equally on
the verbal and nonverbal dimensions. Digit Symbol achieves loadings of only 0.32, 0.38, and
0.36 on the three factors, loading marginally on each but not definitively on
any. Depending on the profile obtained by any given individual, examiners may choose to
interpret either two or three factors (Kaufman, 1990). The decision to interpret two or three
factors should be based on whether the small third dimension is interpretable for a given
person.

Studies on gender differences on the WAIS-R have shown that males' earned IQs were higher
(although not significantly so) than females' earned IQs (Kaufman, 1990). In a sample of 940
males and 940 females, males scored about two points higher on the VIQ, 1.5 points higher on
the PIQ, and two points higher on the FSIQ (Reynolds, Chastain, Kaufman, & McLean,
1987). When the gender differences are examined within different age groups, there are larger
differences for ages 20-54 than at the extreme age groups of 16-19 and 55-74. For the 20-54
year age range, males scored higher by about 2.5 points on VIQ and PIQ, and by about three
points on the FSIQ (Kaufman, 1990).

In examining gender differences on the individual subtests (Kaufman, McLean, &
Reynolds, 1988), males and females were found to perform differently on some of the 11
subtests. On Information, Arithmetic, and Block Design, males significantly and
consistently outperformed females. However, females were far superior on Digit Symbol. On a
less consistent basis, males showed superiority on Comprehension, Picture Completion, Picture
Arrangement, and Object Assembly. No gender differences for any age group were found for
Digit Span, Vocabulary, and Similarities.

Research on WAIS-R profiles has also focused on the area of neuropsychology. In
this area it has been hypothesized that lesions in the left cerebral hemisphere are associated
with diminished language and verbal abilities, whereas lesions in the right cerebral hemisphere
are accompanied by visual-spatial deficits (Reitan, 1955). The hypothesis that has grown
from these expected differences is that individuals with left brain lesions will have WAIS-R
profiles with P > V, and individuals with right

hemisphere lesions will have a profile with V > P (Kaufman, 1990). On the basis of numerous
WAIS and WAIS-R studies of patients with brain lesions, summarized by Kaufman (1990), two
general conclusions may be drawn (see Table 3). First, patients with right
hemisphere damage (unilateral or some bilateral damage as well) will most likely demonstrate a
V > P profile. Second, patients with unilateral left hemisphere damage may show a
slight P > V profile, but not one that is large enough or consistent enough to be beneficial
diagnostically.

A further area of study concerning cognitive ability in subjects with unilateral brain lesions
is gender differences. Males and females are believed to differ in various aspects of brain
functioning. Kaufman (1990) presents data from eight studies that included males and
females with brain lesions. The accumulated data are reported to support the alleged
gender-related interaction between side of brain lesion and direction of Verbal IQ-Performance
IQ difference. Damage to the right hemisphere for both males and females leads to more striking
V-P differences than damage to the left hemisphere. However, the V > P of 12 points for
males with right lesions is nearly twice the value of 6.5 points for females. For males, the
six-point P > V difference for patients with left damage supports the hypothesis of depressed
Verbal IQ for left hemisphere lesions. However, the P > V discrepancy of only 1.6 points for
females with left hemisphere lesions does not support the reversed hypothesis (Kaufman,
1990). This difference across genders for adults with brain lesions may indicate that women
have different cerebral organization than men. However, data supporting the reason for the
interaction with gender are not definitive (Kaufman, 1990).

Turkheimer and Farace (1992) performed a meta-analysis of 12 different studies which used
Wechsler IQ data to examine both male and female patients with right or left hemisphere
damage, including a variety of etiologies. The researchers noted that a problem in the previous
literature was the use of the difference between the PIQ and VIQ in measuring the effects of
lesions. The V-P differences are determined by potentially separate effects of each hemisphere
on the IQs. Thus, in this meta-analysis, separate VIQ and PIQ means were reported for men and
women with either right or left hemisphere lesions (Turkheimer & Farace, 1992). The
results of the repeated-measures analysis revealed that left hemisphere lesions produce
substantial and roughly equal VIQ deficits in male and female patients, but lower mean PIQ
scores in female than in male patients. Right hemisphere lesions produce PIQ deficits in both
genders, but lower mean VIQ scores in female patients. Mean scores from Turkheimer and
Farace's (1992) data are presented in Table 4.

The main effect indicated by the data presented in Table 4 is that “female patients
are more sensitive to lesions in the hemisphere

Table 3 Effects of unilateral brain damage on WAIS/WAIS-R VIQ-PIQ discrepancies.

                                                   Mean VIQ minus mean PIQ
Group                               Sample size    Left damage    Right damage

Stroke
  Men                                   124           -10.1          +16.8
  Women                                  81            +0.1           +9.5
  Total                                 248            -6.4          +13.5
Tumor (generalized or posterior)        200            +0.2           +8.4
Frontal lobe                            104            -2.2           +2.6
Temporal lobe epilepsy
  Preoperative                          101            -3.1           +2.4
  Postoperative                         101            -6.4           +6.0
Acute lesion                            109            -2.4          +14.2
Chronic lesion                          131            -2.5           +5.5
Age 20-34                               664            -5.0           +6.7
Age 35-54                              1245            -3.9           +9.5
Age 55+                                 168            -2.9          +14.9
Whites                                   50            -5.2          +15.1
Blacks                                   50            +5.7          +10.4

Source: Kaufman (1990).



Table 4 Gender differences and brain damage on WAIS/WAIS-R.

                  Men    Women
Left damage
  VIQ              91      91
  PIQ              95      91
  V–P              −4       0
Right damage
  VIQ             104      99
  PIQ              90      91
  V–P             +14      +8

Source: Turkheimer and Farace (1992).
Note: V–P = Verbal IQ minus Performance IQ. Total sample size = 983.

opposite to that thought to be 'dominant' for a function" (Turkheimer & Farace, 1992, p. 499). Although these results are consistent with previously reported greater V–P differences in males, the analyses show that there is also no difference between male and female patients in the effects of left hemisphere lesions on VIQ, or right hemisphere lesions on PIQ. The females demonstrated a pattern of lower mean scores following lesions to the hemisphere opposite to the "dominant" hemisphere for each function. This pattern is supportive of a model which asserts that there is a greater degree of bilateral processing in women (Turkheimer & Farace, 1992). This gender difference could be the result of many things, including degree of hemispheric lateralization, differences in problem-solving strategy, or callosal function.

According to Turkheimer, Farace, Yeo, and Bigler (1993), two major findings have been suggested by earlier studies. Individuals with lesions in the left hemisphere have smaller Verbal IQ–Performance IQ differences than subjects with lesions in the right hemisphere, and this difference is greater for males than females. Theories of why gender differences exist can be evaluated through the lesion data. The degree of lateralization in males and females cannot account for gender differences in PIQ and VIQ, because a "statistical model in which the genders have the same degree of lateralization fits the data as well as a model in which the genders are allowed to differ" (Turkheimer et al., 1993, p. 471). There was also no support for the hypothesis that the gender difference results from differences in the within-hemisphere organization of verbal skills. In a study examining 64 patients through archival data, Turkheimer et al. (1993) did find Inglis and Lawson's (1982) hypothesis that females use verbal strategies in solving PIQ items to be supported by their data. In females, a single model of lesion effects could account for deficits in VIQ and PIQ, but this was not found for males. The most striking observation made was that females with left-hemisphere lesions had substantial deficits in PIQ related to lesion parameters, but males with left-hemisphere lesions did not (Turkheimer, 1993). Notably, this difference could be present because in the left hemisphere females may have more nonverbal abilities relevant to PIQ, or females may use more verbal strategies in solving PIQ items. Further research examining problem-solving strategy is necessary to clarify the reason for these gender differences.

(ii) Overview

The WAIS-R has proven itself as a leader in the field of adult assessment. Kaufman (1985) stated, "The WAIS-R is the criterion of adult intelligence, and no other instrument is even close." Matarazzo (1985) had an equally favorable review of the WAIS-R, applauding its strong psychometric qualities and clinical usefulness. It has strong reliability and validity for Verbal, Performance, and Full Scale IQs, as did its predecessor, the WAIS. The separate subtests, however, have split-half reliability coefficients that are below 0.75 for six of the 11 tasks at ages 16–17, and for two tasks across the age range (Picture Arrangement and Object Assembly) (Kaufman, 1985). The sample selection included apparent systematic bias in the selection of individuals ages 16–19, leading to very questionable teenage norms (Kaufman, 1985). However, the rest of the sample selection was done with precision, leading to an overall well-stratified sample.

Administration is not difficult with the clear and easy to read WAIS-R manual, which provides good scoring directions (Spruill, 1984). The administration and scoring rules of the WAIS-R were made more uniform with the WISC-R rules, which facilitates transfer (Kaufman, 1985). In addition, for the Verbal items with subjective scoring systems, the scoring criteria have been expanded to reduce ambiguity; and to facilitate administration, all words spoken by the examiner are placed on separate lines of the administration manual (Kaufman, 1985).

The WAIS-R does have its limitations, among them the nonuniformity of the scaled scores and the limited floor and ceiling (Spruill, 1984). Individuals who are extremely gifted or severely retarded cannot be assessed adequately with the WAIS-R because the range of possible Full Scale IQ scores is only 45–150. Several

subtests have insufficient floors for adequate assessment of retarded individuals: Picture Arrangement, Arithmetic, Similarities, Picture Completion, and Block Design (Kaufman, 1985). If evaluating an individual who falls at the extreme low end, this is a distinct disadvantage. In addition, even if a subject receives a raw score of zero on a subtest, they can receive one scaled-score point on that subtest.

The WAIS's method of using a reference group (ages 20–34) to determine everyone's scaled scores was retained in the development of the WAIS-R. Kaufman (1985) stated that this method is "indefensible," because use of this single reference group impairs profile interpretation below age 20 and above 34. Profile interpretation is further impaired for individuals aged 16–17 because of low subtest reliability. The WAIS and WAIS-R studies at ages 35–44 cannot be generalized to individuals aged 16–19 because of poor teenage norms, again negatively impacting a clinician's ability to interpret the profile. In the WAIS-R appendix, clinicians must utilize separate scaled-score tables which are arranged by age group. These separate tables invite clerical errors and confusion in case reports (Kaufman, 1985). The WAIS-R manual itself fails to provide appropriate empirical guidelines for profile interpretation, showing a limited awareness of clinicians' practical needs (Kaufman, 1985). However, despite these limitations, the WAIS-R is still one of the most readily chosen instruments in the assessment of intelligence.

4.08.2.1.5 Wechsler Adult Intelligence Scale-Third Edition (WAIS-III)

(i) Description

The newest member of the Wechsler family of tests is the WAIS-III (The Psychological Corporation, 1997). The WAIS-III is an instrument for assessing the intellectual ability of individuals aged 16–89. Like the other Wechsler scales discussed, three IQ scores (Verbal, Performance, and Full Scale) and four factor indices (Verbal Comprehension, Perceptual Organization, Working Memory, and Processing Speed) are derived from the WAIS-III subtests. Each of these is a standard score with a mean of 100 and a standard deviation of 15, created by comparing an individual's score to scores earned by the normative group of the same age. The Verbal IQ is comprised of six verbal subtest scores (Vocabulary, Similarities, Arithmetic, Digit Span, Information, and Comprehension), plus a new supplementary test to substitute for Digit Span if necessary (Letter–Number Sequencing). The Performance IQ is comprised of five nonverbal subtests (Picture Completion, Picture Arrangement, Block Design, Matrix Reasoning, and the renamed Digit Symbol-Coding). In addition, two supplemental subtests are provided on the Performance scale: Symbol Search (which may be used to replace Digit Symbol-Coding) and Object Assembly (which is an optional subtest that may be used to replace any performance subtest for individuals younger than 75). In addition to its new name, Digit Symbol-Coding also has two new optional procedures not used in IQ computation, which may be used to help the examiner rule out potential problems. These new procedures include Digit Symbol-Incidental Learning and Digit Symbol-Copy. The Full Scale represents the average of the Verbal and Performance IQs.

New to the WAIS-III are additional factor indices, which can be helpful in further breaking down and understanding an individual's performance. Like the WISC-III, there are four factor indices: Verbal Comprehension, Perceptual Organization, Working Memory, and Processing Speed. The two new subtests on the WAIS-III, Letter–Number Sequencing and Symbol Search, are used in calculation of the Working Memory and Processing Speed Indices, respectively. Table 5 shows which tests comprise each of the IQs and factor indices.

(ii) Standardization properties of the scale

The WAIS-III was standardized by administering the full scale to 2450 adult subjects, selected according to 1995 US Census data. Subjects were stratified according to age, sex, race/ethnicity, geographic region, and education level. Subjects were divided into 13 age groups, which is an improvement over the nine age groups tested in the WAIS-R standardization sample. The number in each WAIS-III standardization age group ranged from 100 to 200, and the age groups spanned from age 16 to 89. Because US citizens are living longer, the WAIS-III developers extended norms beyond the highest age group (74) provided in the WAIS-R. In the collection of normative data, an additional 200 African American and Hispanic individuals were also administered the WAIS-III without discontinue rules. "This over sampling provided a sufficient number of item scores across all items for item bias analyses" (The Psychological Corporation, 1997, p. 20).

Reliability coefficients for the 14 subtests and the Verbal, Performance, and Full Scale IQs were computed using the split-half method (except for Digit Symbol-Coding and Symbol Search). Average reliability, across the 13 age

Table 5 Subtests comprising WAIS-III IQs and Index Scores.

Subtest                        IQ scale    Factor index

Vocabulary                     VIQ         VCI
Similarities                   VIQ         VCI
Information                    VIQ         VCI
Comprehension                  VIQ
Arithmetic                     VIQ         WMI
Digit Span                     VIQ         WMI
Letter–Number Sequencing (a)               WMI
Picture Arrangement            PIQ
Picture Completion             PIQ         POI
Block Design                   PIQ         POI
Matrix Reasoning               PIQ         POI
Digit Symbol-Coding            PIQ         PSI
Symbol Search (a)                          PSI
Object Assembly (a)

Note. Verbal IQ (VIQ); Performance IQ (PIQ); Verbal Comprehension Index (VCI); Perceptual Organization Index (POI); Working Memory Index (WMI); Processing Speed Index (PSI).
(a) The Letter–Number Sequencing, Symbol Search, and Object Assembly subtests can substitute for other subtests under certain circumstances (see The Psychological Corporation, 1997).
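
As an illustration only, the assignments in Table 5 can be written as a small lookup structure. The sketch below is not part of the WAIS-III materials: the names WAIS3_SUBTESTS and subtests_for are hypothetical, and None simply marks a blank cell in the table.

```python
# Subtest -> (IQ scale, factor index), transcribed from Table 5.
# None marks a blank cell (the subtest does not enter that score by default).
WAIS3_SUBTESTS = {
    "Vocabulary": ("VIQ", "VCI"),
    "Similarities": ("VIQ", "VCI"),
    "Information": ("VIQ", "VCI"),
    "Comprehension": ("VIQ", None),
    "Arithmetic": ("VIQ", "WMI"),
    "Digit Span": ("VIQ", "WMI"),
    "Letter-Number Sequencing": (None, "WMI"),  # supplementary
    "Picture Arrangement": ("PIQ", None),
    "Picture Completion": ("PIQ", "POI"),
    "Block Design": ("PIQ", "POI"),
    "Matrix Reasoning": ("PIQ", "POI"),
    "Digit Symbol-Coding": ("PIQ", "PSI"),
    "Symbol Search": (None, "PSI"),  # supplementary
    "Object Assembly": (None, None),  # optional
}


def subtests_for(label):
    """Return the subtests that contribute to an IQ scale or factor index."""
    return [
        name
        for name, (iq_scale, factor_index) in WAIS3_SUBTESTS.items()
        if label in (iq_scale, factor_index)
    ]


print(subtests_for("WMI"))
print(subtests_for("PIQ"))
```

Listing the subtests this way makes it easy to see, for example, that Comprehension and Picture Arrangement enter an IQ scale but no factor index.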

groups, is as follows: 0.97 for the Verbal IQ; 0.94 for the Performance IQ; and 0.98 for the Full Scale IQ. The average individual subtests' reliabilities ranged from 0.93 (Vocabulary) to 0.70 (Object Assembly), with a median coefficient of 0.85. Stability coefficients for the three IQs are: 0.96, 0.91, and 0.96 for Verbal, Performance, and Full Scale, respectively. The stability coefficients for individual subtests ranged from an average of 0.94 (Information) to 0.69 (Picture Arrangement) with a median coefficient of 0.815.

The WAIS-III manual (The Psychological Corporation, 1997) reports that numerous factor analytic studies (exploratory and confirmatory) examined the underlying structure of the WAIS-III. There were four basic factors predicted to be underlying the WAIS-III: Verbal Comprehension, Perceptual Organization, Working Memory, and Processing Speed. Overall, results of exploratory and confirmatory factor analysis support the appropriateness of separating the tests of the WAIS-III into the four factors. The manual reports that the four factor model is a "clearly superior solution to a one-, two-, or three-factor solution and more parsimonious than a five-factor one" (p. 110). Except for the oldest age group, the findings across all ages are similar. However, in the 75–89 year age range, many more subtests loaded on the Processing Speed factor than the Perceptual Organization Factor (i.e., Block Design, Picture Completion, and Picture Arrangement all load on Processing Speed). Only Matrix Reasoning had a factor loading above 0.40 on the Perceptual Organization factor for the oldest age group. From the data presented with the standardization sample, it appears that the WAIS-III is best represented by the four factors that were originally predicted to underlie it.

Across all ages, the Verbal Comprehension factor was defined by loadings ranging from 0.76 to 0.89, including: Information, Vocabulary, Comprehension, and Similarities. Picture Completion, Block Design, Matrix Reasoning, and Picture Arrangement comprised the Perceptual Organization factor with loadings in the 0.47–0.71 range. The third factor is comprised of Digit Span, Arithmetic, and Letter–Number Sequencing with factor loadings of 0.71, 0.51, and 0.62, respectively. Symbol Search and Digit Symbol-Coding are subsumed in the Processing Speed factor, with loadings of 0.63 and 0.68, respectively. The Symbol Search subtest requires the examinee to determine whether a pair of target symbols is present in a larger group of shapes within a specified time limit. The addition of the new subtests seems to have strengthened the factor structure, as in the previous version of the WAIS, some of the subtests did not load strongly on any of the factors or loaded similarly across the factors (i.e., Picture Arrangement and Digit Symbol).

(iii) Preliminary research with the WAIS-III

The WAIS-R and the WAIS-III were compared to see how well they were related (The Psychological Corporation, 1997). A sample of 192 individuals with a mean age of 43.5 years (ranging from 16 to 74) were administered the

two tests. The median time between administrations was 4.7 weeks. As would be predicted by work done by Flynn (1984), subjects scored 2.9 points lower on the WAIS-III FSIQ than on the WAIS-R FSIQ. The WAIS-III VIQ and PIQ were 1.2 points and 4.8 points lower than the respective WAIS-R scales. The overall correlations between the WAIS-III and WAIS-R global scales were high. The correlation coefficients for the VIQ, PIQ, and FSIQ were 0.94, 0.86, and 0.93, respectively.

The WAIS-III and WISC-III were also administered to a sample of adolescents to determine how well the two tests correlated (The Psychological Corporation, 1997). The sample consisted of 184 16-year-olds who were administered the two tests from 2 to 12 weeks apart (median time = 4.6 weeks). The correlations between the global scales of the two tests were very high, indicating that the two instruments appear to be measuring very similar constructs. The VIQ, PIQ, and FSIQ correlation coefficients were 0.88, 0.78, and 0.88, respectively. The Index scores from the WAIS-III and WISC-III were also compared. The Indices' correlations were 0.87, 0.74, 0.80, and 0.79 for the VCI, POI, WMI, and PSI, respectively. The differences between the mean WISC-III and WAIS-III IQs were all less than one point. The differences between the two tests' mean VCI and POI were also each less than one point. The difference between the WAIS-III and WISC-III mean WMI was 1.7 standard score points. On the PSI, the difference between the means on the two tests was 2.7 points. Thus, overall, the IQs and Indices of the two tests correspond quite closely.

The WAIS-III Technical Manual (The Psychological Corporation, 1997) also presents some studies of clinical groups with neurological, psychiatric, and developmental disorders. Reviewed here are a select group of these studies, including those from a sample of patients with mild Alzheimer's disease, a sample of individuals who are mentally retarded, and one from individuals with attention-deficit hyperactivity disorder (ADHD).

Individuals with probable Alzheimer's disease (N = 35) were administered both the WAIS-III and the Wechsler Memory Scale-Third Edition (WMS-III). Decrements in cognitive ability and memory were predicted. This sample was reported to have a significantly higher level of education than the normal population, with 48.6% of the sample having completed at least four years of college. The results of this study (The Psychological Corporation, 1997) show that the individuals with probable Alzheimer's disease had lower scores on all IQ scales than the general population. The mean VIQ, PIQ, and FSIQ scores were 92.2, 81.7, and 86.6, respectively. The PIQ scores tend to be more sensitive to the effects of this neurological condition; thus, the mean VIQ score is predictably higher than the PIQ score. The mean factor index scores demonstrate more differentiation in their cognitive profile. This sample had mean factor indices of 79.6 (PSI), 84.8 (POI), 87.2 (WMI), and 93.0 (VCI).

A total of 108 individuals diagnosed with mental retardation were administered the WAIS-III. Of these, 46 were categorized as having mild mental retardation, while the other 62 had moderate mental retardation (The Psychological Corporation, 1997). The results demonstrated deficits across all areas of cognitive functioning, as expected. In the mildly mentally retarded group, mean IQ scores were as follows: 60.1 (VIQ), 64.0 (PIQ), and 58.3 (FSIQ). The subjects with moderate mental retardation exhibited lower scores, earning mean VIQ, PIQ, and FSIQ scores of 54.7, 55.3, and 50.9, respectively. The variability in scores of each of these clinical groups is much smaller than found in the general population. The standard deviations are more than 50% smaller than those found in the general population.

Individuals with ADHD were another group studied and reported in the WAIS-III technical manual (The Psychological Corporation, 1997). Traditionally, IQ scores have not been useful in discriminating ADHD from non-ADHD individuals. However, examining subtest patterns on tests of cognitive ability has been more fruitful in discriminating those with ADHD from those without.

The WAIS-III was administered to 30 individuals diagnosed with ADHD (mean age 19.8 years). The mean level of intellectual functioning was found to be in the Average range for this group (mean FSIQ = 103.00). In addition, there was no significant difference found between Verbal and Performance IQs for the group (104.2 and 100.9, respectively). The WAIS-III factor indices were also examined, and the pattern of performance on the indices was found to differ in comparison to the general population. The ADHD sample scored on average 8.3 points lower on the WMI than the VCI. About 30% of the ADHD sample had WMI scores at least 1 SD lower than their VCI scores, whereas 13% of the WAIS-III standardization sample had obtained discrepancies of this magnitude. On average, the ADHD sample scored 7.5 points lower on the PSI than the POI. In the ADHD sample, 26% of the group had PSI scores at least 1 SD lower than their POI scores, but only 14% of the WAIS-III standardization sample had such discrepancies.
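
Because the index scores have a standard deviation of 15, a discrepancy of "at least 1 SD" corresponds to a difference of 15 or more standard-score points. The following is a minimal sketch of how such a base rate could be tallied; the function names and the four example score pairs are hypothetical and are not data from the manual.

```python
SD = 15  # standard deviation of WAIS-III IQ and index scores, as stated in the text


def at_least_one_sd_lower(lower_index, higher_index, sd=SD):
    """True if lower_index falls at least one SD below higher_index."""
    return (higher_index - lower_index) >= sd


def proportion_with_discrepancy(score_pairs, sd=SD):
    """Share of (lower_index, higher_index) pairs showing a 1 SD discrepancy."""
    flagged = sum(at_least_one_sd_lower(low, high, sd) for low, high in score_pairs)
    return flagged / len(score_pairs)


# Hypothetical (WMI, VCI) pairs for four examinees, for illustration only.
example_pairs = [(85, 100), (95, 100), (80, 96), (100, 100)]
print(proportion_with_discrepancy(example_pairs))
```

The same tally applies to the PSI and POI comparison by passing (PSI, POI) pairs instead.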

(iv) Analyzing the WAIS-III data

The WAIS-III manual provides a general description of how a clinician may begin to interpret the plethora of data obtained in its 14 subtests. However, to obtain the maximum amount of information from the profile, one should utilize an approach to profile interpretation that will group and regroup subtests (Kaufman & Lichtenberger, in press). Similar to the WISC-III interpretation, the WAIS-III may be examined from the global level (IQs) to the individual profile (subtest) level. An organized, systematic approach will be advantageous in obtaining the most accurate picture of the individual.

The WAIS-III record form provides a nice beginning to profile interpretation. However, a structure is needed to work through the large amount of data in a systematic fashion. Kaufman and Lichtenberger (in press) have developed a series of 10 steps to aid the clinician in interpreting and integrating all of the data obtained from the WAIS-III's three IQs and four factor indices, while not becoming overwhelmed with the multiple scores and difference scores. The 10 steps, a step-by-step approach to WAIS-III profile interpretation, are presented in summary form in Table 6. Using these 10 steps can help to organize the information to generate meaningful hypotheses about personal cognitive strengths and weaknesses in preparation for clear presentation in the form of a written report.

(v) Overview

The WAIS-III is likely to follow in the footsteps of the WAIS-R, which has proven itself as a leader in the field of adult assessment. The new norms and psychometric improvements of the WAIS-III are much welcomed by the assessment community. The WAIS-III has strong reliability and validity for Verbal, Performance, and Full Scale IQs, as did its predecessor, the WAIS-R. The subtest with the lowest split-half reliability has been removed from the computation of the IQs (Object Assembly). However, Picture Arrangement still exhibits split-half reliability coefficients below 0.75 at several ages, and it remains part of the Performance IQ. The WAIS-III sample selection was done with precision, leading to an overall well-stratified sample.

Many visual and practical improvements were made in the development of the WAIS-III. Administration is not difficult with the clear and easy to read WAIS-III manual, in addition to the record form with ample space and visual icons (Kaufman & Lichtenberger, in press). The administration and scoring rules of the WAIS-III were made more uniform throughout the entire test, which reduces chances of examiner error. In addition, the scoring rules are listed right in the administration manual for the Verbal items with subjective scoring systems. This change from the WAIS-R has eased the work of clinicians (no more flipping back and forth during administration).

The WAIS-III attempted to improve its floor and ceiling in comparison to the earlier version. Several step-down items have been added on each subtest for lower functioning individuals. However, like the WAIS-R, individuals who are extremely gifted or severely retarded cannot be adequately assessed with the WAIS-III because the range of possible Full Scale IQ scores is only 45–150. Studies on individuals in the lower extreme range of the WAIS-III have yet to determine whether evaluating an individual who falls at the extreme low end is a distinct disadvantage. As on the WAIS-R, even if a subject receives a raw score of zero on a subtest, they can receive one to five scaled-score points on that subtest. Uniformity is not found across the range of scaled scores for each subtest. On certain subtests subjects may reach a ceiling more quickly than on others. Ordinarily the highest scaled score that can be obtained on a subtest is 19; at certain ages, however, 17 is the maximum score on the Arithmetic or Picture Arrangement subtest (Kaufman & Lichtenberger, in press). Profile analysis is made difficult because of this nonuniformity across subtests, especially for the extremely gifted subjects.

The method of the WAIS-R of using a reference group (ages 20–34) to determine everyone's scaled scores was not retained in the development of the WAIS-III. Kaufman (1985) stated that this WAIS-R method is "indefensible," because use of this single reference group impairs profile interpretation below age 20 and above 34; thus, the WAIS-III change in determining scaled scores is a significant improvement. The process of profile interpretation has been made much less confusing by the removal of the reference group scores. (However, if one wants to calculate the scores by using the 20–34 reference group, this is still possible.) Fewer clerical errors and less confusion in case reports will be present because of these changes (Kaufman & Lichtenberger, in press). The WAIS-III manual and record form themselves provide the beginning to interpretation, with clearly laid out tables to calculate score discrepancies, and so forth. However, to meet clinicians' practical needs for more specific empirical guidelines for profile interpretation in a systematic and step-by-step fashion, other

Table 6 Ten tips for WAIS-III interpretation.

Step 1. Interpret the Full Scale IQ.
  (i) Report the FSIQ confidence interval (The Psychological Corporation, 1997; Table A.5, p. 197).
  (ii) Report the FSIQ percentile rank (The Psychological Corporation, 1997; Table A.5, p. 197).
  (iii) Report the FSIQ ability level (The Psychological Corporation, 1997; Table 2.3, p. 25).
  (iv) If in Steps 2–6 it is determined that there is a significant difference between the component parts of the FSIQ (i.e., VIQ & PIQ or VCI & POI), the FSIQ should not be interpreted as a meaningful representation of the individual's overall performance.
Step 2. Determine if the Verbal–Performance IQ discrepancy is statistically significant.
  (i) For all ages, a VIQ–PIQ difference of 6 points is significant at the 0.15 level.
  (ii) For all ages, a VIQ–PIQ difference of 9 points is significant at the 0.05 level.
  (iii) Values for specific ages are presented in Table B.1 (The Psychological Corporation, 1997; p. 205).
Step 3. Determine if the VIQ and PIQ are interpretable. Four questions to consider about the Verbal and Performance Scales.
  Verbal Scale
  (i) Is the difference between VCI and WMI statistically significant (p < 0.05)?
      Size needed for difference = 10+ points.
  (ii) Is there abnormal scatter among the VIQ subtests?
      Highest of 6 VIQ subtest scaled scores minus lowest = 8+ points.
  Performance Scale
  (iii) Is the difference between POI and PSI statistically significant (p < 0.05)?
      Size needed for difference = 13+ points.
  (iv) Is there abnormal scatter among the PIQ subtests?
      Highest of 5 PIQ subtest scaled scores minus lowest = 8+ points.
Step 4. Determine if the VIQ–PIQ discrepancy is interpretable or if the VCI and POI should be interpreted instead.
  (i) If all answers in Step 3 are no, then the VIQ–PIQ discrepancy is interpretable. Skip Step 5 and go directly to Step 6.
  (ii) If the answer to one or more of the questions in Step 3 is yes, then the VIQ–PIQ discrepancy may not be interpretable.
  (iii) If VIQ–PIQ is not interpretable, then examine the VCI–POI discrepancy in Step 5.
Step 5. Determine whether VCI and POI are interpretable and significantly different from one another.
  (i) Is there abnormal scatter among the VCI subtests?
      Highest of 3 VCI subtest scaled scores minus lowest = 5+ points.
  (ii) Is there abnormal scatter among the POI subtests?
      Highest of 3 POI subtest scaled scores minus lowest = 6+ points.
  (iii) If the answer to either (i) or (ii) is yes, then the VCI–POI difference may not be interpretable. Otherwise, if both answers are no, examine the interpretable VCI–POI difference:
      (a) Is the difference between VCI and POI statistically significant (p < 0.05)?
      (b) Size needed for difference = 10+ points.
Step 6. Determine if the VIQ–PIQ discrepancy (or the VCI–POI discrepancy) is abnormally large.
  (i) A 17-point difference is abnormally large for the VIQ–PIQ.
  (ii) A 19-point difference is abnormally large for the VCI–POI.
  (iii) Exact point values according to ability level are available in Appendix D (The Psychological Corporation, 1997, pp. 300–309).
  (iv) If the discrepancies are abnormally large, this indicates that they are too big to ignore (see Steps 4 & 5), and they may be interpreted anyway.
Step 7. Determine whether the Working Memory and Processing Speed indices are interpretable.
  (i) Do not interpret WMI if scatter among the 3 subtests is 6+ points.
  (ii) Do not interpret PSI if the difference between the 2 subtests is 4+ points.
Step 8. Interpret the Global Verbal and Nonverbal Dimensions, as well as the small factors, if they were found to be interpretable.
  Study the information and procedures presented in Kaufman and Lichtenberger (in press).
Step 9. Interpret significant strengths and weaknesses in the WAIS-III subtest profile.
  (i) If the VIQ–PIQ discrepancy is less than 17 points, use the individual's mean of all WAIS-III subtests as the person's midpoint.
  (ii) If the VIQ–PIQ discrepancy is 17 or more points, use 2 separate means:
      (a) Use the individual's mean of all the Verbal subtests as the midpoint for determining strengths and weaknesses on Verbal subtests;
      (b) Use the individual's mean of all the Performance subtests as the midpoint for determining strengths and weaknesses on Performance subtests.
  (iii) Subtract the individual's mean from each of the subtest scaled scores to determine strengths and weaknesses. Round to the nearest whole number.
  (iv) Values are presented in Table B.3 (The Psychological Corporation, 1997, p. 208) for determining if a subtest significantly deviates from the individual's own mean. The following summary information may also be used to determine significance:
      ±2 points: Vocabulary
      ±3 points: Similarities, Arithmetic, Digit Span, Information, Comprehension, Digit Symbol-Coding, Block Design, Matrix Reasoning
      ±4 points: Letter–Number Sequencing, Picture Completion, Picture Arrangement, Symbol Search
      ±5 points: Object Assembly
Step 10. Generate hypotheses about the fluctuations in the WAIS-III subtest profile.
  Review the information presented in Kaufman and Lichtenberger (in press), which details how to reorganize subtest profiles to systematically generate hypotheses about strengths and weaknesses.
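
The arithmetic in Steps 2, 6, and 9 of Table 6 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a scoring tool: the function and variable names are hypothetical, each threshold is read as "at least" the tabled value and as applying in both directions, only the all-ages summary values from Table 6 are used (the age-specific and ability-specific tables in the manual are not reproduced), and only the single-midpoint case of Step 9(i) is covered.

```python
from statistics import mean


def classify_vp_discrepancy(viq, piq):
    """Steps 2 and 6: apply the all-ages VIQ-PIQ thresholds from Table 6."""
    size = abs(viq - piq)
    return {
        "difference": viq - piq,
        "significant_at_0.15": size >= 6,
        "significant_at_0.05": size >= 9,
        "abnormally_large": size >= 17,
    }


# Step 9(iv): deviation from the person's own mean needed for significance.
DEVIATION_NEEDED = {
    "Vocabulary": 2,
    "Similarities": 3, "Arithmetic": 3, "Digit Span": 3, "Information": 3,
    "Comprehension": 3, "Digit Symbol-Coding": 3, "Block Design": 3,
    "Matrix Reasoning": 3,
    "Letter-Number Sequencing": 4, "Picture Completion": 4,
    "Picture Arrangement": 4, "Symbol Search": 4,
    "Object Assembly": 5,
}


def subtest_deviations(scaled_scores):
    """Step 9(i) and (iii): deviations from the mean of all subtests given.

    Returns {subtest: (rounded deviation, "strength" / "weakness" / None)}.
    Note that Python's round() sends exact halves to the nearest even integer.
    """
    midpoint = mean(scaled_scores.values())
    results = {}
    for name, score in scaled_scores.items():
        deviation = round(score - midpoint)
        needed = DEVIATION_NEEDED[name]
        if deviation >= needed:
            label = "strength"
        elif deviation <= -needed:
            label = "weakness"
        else:
            label = None
        results[name] = (deviation, label)
    return results


# Hypothetical scores, for illustration only.
print(classify_vp_discrepancy(viq=104, piq=90))
profile = {"Vocabulary": 13, "Similarities": 10, "Arithmetic": 7, "Block Design": 10}
print(subtest_deviations(profile))
```

When the VIQ–PIQ discrepancy is 17 points or more, Step 9(ii) calls for separate Verbal and Performance midpoints; under the same assumptions, that case could be handled by calling subtest_deviations once per scale.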

sources are available (Kaufman & Lichtenberger, in press). Like its predecessors, the WAIS-III is likely to become one of the most readily chosen instruments in the assessment of intelligence.

4.08.2.1.6 Kaufman Assessment Battery for Children (K-ABC)

The K-ABC is a battery of tests measuring intelligence and achievement of normal and exceptional children aged 2.5–12.5. It yields four scales: Sequential Processing, Simultaneous Processing, Mental Processing Composite (Sequential and Simultaneous), and Achievement. The K-ABC is becoming a frequently used test in intelligence and achievement assessment that is used by both clinical and school psychologists (Kamphaus, Beres, Kaufman, & Kaufman, 1995). In a nationwide survey of school psychologists conducted in 1987 by Obringer (1988), respondents were asked to rank the following instruments in order of their usage: Wechsler's scales, the K-ABC, and both the old and new Stanford–Binets. The Wechsler scales earned a mean rank of 2.69, followed closely by the K-ABC with a mean of 2.55, the L-M version of the Binet (1.98), and the Stanford–Binet Fourth Edition (1.26). Bracken (1985) found similar evidence of the K-ABC's popularity. Bracken surveyed school psychologists and found that for ages 5–11 years the WISC-R was endorsed by 82%, the K-ABC by 57%, and the Binet IV by 39% of the practitioners. These results suggest that clinicians working with children should have some familiarity with the K-ABC (Kamphaus et al., 1995).

The K-ABC has been the subject of great controversy from the outset, as evident in the strongly pro and con articles written for a special issue of the Journal of Special Education devoted to the K-ABC (Miller & Reynolds, 1984). Many of the controversies, especially those regarding the validity of the K-ABC theory, will likely endure unresolved for some time (Kamphaus et al., 1995). Fortunately, the apparent controversy linked to the K-ABC has resulted in numerous research studies and papers that provide more insight into the K-ABC and its strengths and weaknesses.

(i) Theory

The K-ABC intelligence scales are based on a theoretical framework of Sequential and Simultaneous information processing, which relates to how children solve problems rather than what type of problems they must solve (e.g., verbal or nonverbal). In stark contrast is Wechsler's theoretical framework of the assessment of "g," a conception of intelligence as an overall global entity. As a result, Wechsler used the Verbal and Performance scales as a means to an end. That end is the assessment of general intelligence. In comparison, the Kaufmans emphasize the individual importance of the Sequential and Simultaneous scales in interpretation, rather than the overall Mental Processing Composite (MPC) score (Kamphaus et al., 1995).

The Sequential and Simultaneous framework for the K-ABC stems from an updated version of a variety of theories (Kamphaus et al., 1995). The foundation lies in a wealth of research in clinical and experimental neuropsychology and cognitive psychology. The Sequential and Simultaneous theory was primarily developed from two lines of theory: the information processing approach of Luria (e.g., Luria, 1966), and the cerebral specialization work of Sperry (1968, 1974), Bogen (1969), Kinsbourne (1975), and Wada, Clarke, and Hamm (1975). The neuropsychological processing model, which originated with the neurophysiological
observations of Luria (1966, 1973, 1980) and Sperry (1968), the psychoeducational research of Das (1973; Das et al., 1975; Das, Kirby, & Jarman, 1979; Naglieri & Das, 1988, 1990), and the psychometric research of Kaufman and Kaufman (1983), possesses several strengths relative to previous models in that it (i) provides a unified framework for interpreting a wide range of important individual difference variables; (ii) rests on a well-researched theoretical base in clinical neuropsychology and psychobiology; (iii) presents a processing, rather than a product-oriented, explanation for behavior; and (iv) lends itself readily to remedial strategies based on relatively uncomplicated assessment procedures (Kaufman & Kaufman, 1983; McCallum & Merritt, 1983; Perlman, 1986).

This neuropsychological processing model describes two very distinct types of processes which individuals use to organize and process information received in order to solve problems successfully: successive or sequential, analytic-linear processing vs. holistic/simultaneous processing (Levy & Trevarthen, 1976; Luria, 1966). These processes have been identified by numerous researchers in diverse areas of neuropsychology and cognitive psychology (Perlman, 1986). From Sperry's cerebral specialization perspective, these processes represent the problem-solving strategies of the left hemisphere (analytic/sequential) and the right hemisphere (Gestalt/holistic). From Luria's theoretical approach, successive and simultaneous processes reflect the "coding" processes that characterize "Block 2" functions.

Regardless of the theoretical model, successive processing refers to the processing of information in a sequential, serial order. The essential nature of this mode of processing is that the system is not totally surveyable at any point in time. Simultaneous processing refers to the synthesis of separate elements into groups. The essential nature of this mode of processing is that any portion of the result is, at once, surveyable without dependence on its position in the whole. The model assumes that the two modes of processing information are available to the individual. The selection of either or both modes of processing depends on two conditions: (i) the individual's habitual mode of processing information as determined by social–cultural and genetic factors, and (ii) the demands of the task (Das et al., 1975).

In reference to the K-ABC, Simultaneous processing refers to the mental ability to integrate information all at once to solve a problem correctly. Simultaneous processing frequently involves spatial, analogic, or organizational abilities (Kaufman & Kaufman, 1983; Kamphaus & Reynolds, 1987). There is often a visual aspect to the problem and visual imagery used to solve it. A prototypical example of a Simultaneous subtest is the Triangles subtest on the K-ABC, which is similar to Wechsler's Block Design. To solve both of these subtests, children must be able to see the whole picture in their mind and then integrate the individual pieces to create the whole.

In comparison, Sequential processing emphasizes the ability to place or arrange stimuli in sequential or serial order. The stimuli are all linearly or temporally related to one another, creating a form of serial interdependence within the stimulus (Kaufman & Kaufman, 1983). The K-ABC subtests assess the child's Sequential processing abilities in a variety of modes. For example, Hand Movements involves visual input and a motor response, Number Recall involves auditory input with a vocal response, and Word Order involves auditory input and a visual response. These different modes of input and output allow the examiner to assess the child's sequential abilities in a variety of ways. The Sequential subtests also provide information on the child's short-term memory and attentional abilities.

According to Kamphaus et al. (1995), one of the controversial aspects of the K-ABC was the fact that it took the equivalent of Wechsler's Verbal Scale and redefined it as "achievement." The Kaufmans' analogs of tests such as Information (Faces & Places), Vocabulary (Riddles and Expressive Vocabulary), and Arithmetic (Arithmetic) are included on the K-ABC as achievement tests and viewed as tasks that are united by the demands they place on children to extract and assimilate information from their cultural and school environment. The K-ABC is predicated on the distinction between problem solving and knowledge of facts. The former set of skills is interpreted as intelligence; the latter is defined as achievement. This definition presents a break from other intelligence tests, where a person's acquired factual information and applied skills greatly influence the obtained IQ (Kaufman & Kaufman, 1983).

(ii) Standardization and properties of the scale

Stratification of the K-ABC standardization sample closely matched the 1980 US Census data on the variables of age, gender, geographic region, community size, socioeconomic status (SES), race or ethnic group, and parental occupation and education. Additionally, unlike most other intelligence measures for children, stratification variables also included educational placement of the child (see Table 7).
Table 7 Representation of the K-ABC standardization sample by educational placement (a).

Educational placement        N     Sample (%)   US population (%)

Regular classroom          1862      93.1            91.1
Speech impaired              28       1.4             2.0
Learning disabled            23       1.2             2.3
Mentally retarded            37       1.8             1.7
Emotionally disturbed         5       0.2             0.3
Other (b)                    15       0.8             0.7
Gifted and talented          30       1.5             1.9 (c)

Total K-ABC sample         2000     100.0           100.0

(a) Data from US Department of Education, National Center for Education Statistics. 1980. Table 2.7, The Condition of Education, Washington, DC, US Government Printing Office.
(b) Includes other health impaired, orthopedically handicapped, and hard of hearing.
(c) Data from US Office for Civil Rights, 1980, State, Regional, and National Summaries of Data from the 1978 Child Rights Survey of Elementary and Secondary Schools (p. 5). Alexandria, VA: Killalea Associates.
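
As a rough check on how closely the standardization sample tracks the reference figures in Table 7, the following sketch computes the gap between the two percentage columns for each placement category. The variable and function names are illustrative, and reading the two percentage columns as sample and US population percentages is an assumption based on the table footnotes.

```python
# Table 7 rows: (placement, N in K-ABC sample, % of sample, % of US population).
# The meaning of the two percentage columns is assumed from the table footnotes.
TABLE_7 = [
    ("Regular classroom", 1862, 93.1, 91.1),
    ("Speech impaired", 28, 1.4, 2.0),
    ("Learning disabled", 23, 1.2, 2.3),
    ("Mentally retarded", 37, 1.8, 1.7),
    ("Emotionally disturbed", 5, 0.2, 0.3),
    ("Other", 15, 0.8, 0.7),
    ("Gifted and talented", 30, 1.5, 1.9),
]


def representation_gaps(rows):
    """Return {placement: sample % minus population %}, rounded to one decimal."""
    return {
        name: round(sample_pct - pop_pct, 1)
        for name, _, sample_pct, pop_pct in rows
    }


total_n = sum(n for _, n, _, _ in TABLE_7)
gaps = representation_gaps(TABLE_7)
# Category whose sample share differs most from the population share.
largest = max(gaps, key=lambda name: abs(gaps[name]))

print(f"Total N = {total_n}")
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} percentage points")
print(f"Largest gap: {largest}")
```

On these figures no category differs from its population share by more than about two percentage points, which is consistent with the claim that the sample closely matched the reference data.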

Reliability and validity data provide considerable support for the psychometric aspects of the test. A test–retest reliability study was conducted with 246 children after a two- to four-week interval (mean interval = 17 days). The coefficients for the Mental Processing Composite were 0.83 for age two years, six months through four years, eleven months; 0.88 for ages five years through eight years, eleven months; and 0.93 for ages nine years to 12 years, five months. Test–retest reliabilities for the Achievement scale composite for the same age groups were 0.95, 0.95, and 0.97, respectively (Kamphaus et al., 1995). The test–retest reliability research reveals that there is a clear developmental trend, with coefficients for the preschool ages being smaller than those for the school-age range. This trend is consistent with the known variability over time that characterizes preschool children's standardized test performance in general (Kamphaus & Reynolds, 1987). Split-half reliability coefficients for the K-ABC global scales range from 0.86 to 0.93 (mean = 0.90) for preschool children, and from 0.89 to 0.97 (mean = 0.93) for children aged 5–12.5 (Kamphaus et al., 1995).

There has been a considerable amount of research done on the validity of the K-ABC. The K-ABC Interpretive Manual (Kaufman & Kaufman, 1983) includes the results of 43 such studies. Construct validity was established by looking at five separate topics: developmental changes, internal consistency, factor analysis (principal factor, principal components, and confirmatory), convergent and discriminant analysis, and correlations with other tests. Factor analysis of the Mental Processing Scales offered clear empirical support for the existence of two, and only two, factors at each age level, and for the placement of each preschool and school-age subtest on its respective scale. Analyses of the combined processing and achievement subtests also offered good construct validation of the K-ABC's three-scale structure (Kaufman & Kamphaus, 1984).

Although the K-ABC and the WISC-III differ from one another in a number of ways, there is strong evidence that the two measures correlate substantially (Kamphaus & Reynolds, 1987). In a study of 182 children enrolled in regular classrooms, the Mental Processing Composite (MPC) correlated 0.70 with WISC-R Full Scale IQ (FSIQ), thus sharing a 49% overlap in variance (Kamphaus et al., 1995; Kaufman & Kaufman, 1983). There have also been numerous correlational studies conducted with handicapped and exceptional populations that may be found in the Interpretive Manual. Overall correlations between the K-ABC and the WISC-R range from 0.57 to 0.74, indicating that the two tests overlap a good deal, yet also show some independence (Kamphaus et al., 1995).

(iii) Overview

Although the K-ABC has been the subject of past controversy, it appears that it has held its own and is used often by professionals. The K-ABC is well designed, with easy-to-use easels and manuals. The information in the manuals is presented in a straightforward, clear fashion, making use and interpretation of the K-ABC relatively easy (Merz, 1985). There has been a considerable amount of research done on the validity of the K-ABC and the authors have done a thorough job of presenting much of that information in the manual. The reporting of the reliability and validity data in the manual is complete and understandable. However, there is not enough information presented on the
content validity of the test. The various tasks on the subtests on the K-ABC are based on clinical, neuropsychological, and/or other research-based validity; however, a much clearer explication of the rationale behind some of the novel subtests would have been quite helpful (Merz, 1985).

The K-ABC measures intelligence from a strong theoretical and research basis, evident in the quality of investigation and the amount of research data presented in the manual (Merz, 1985). The K-ABC was designed to measure the intelligence and achievement of children aged 2.5–12.5 and the research done to date suggests that in fact the test does just that. The Nonverbal Scale significantly contributes to the effort to address the diverse needs of minority groups and language-handicapped children. Overall, it appears that the authors of the K-ABC have met the goals listed in the Interpretive Manual and that this battery is a valuable assessment tool (Merz, 1985).

Keith and Dunbar (1984) present an alternate means of interpreting the K-ABC, based on exploratory and confirmatory factor analytic data. The two K-ABC Reading subtests are eliminated in their alternate analysis, and factors labeled Verbal Memory, Nonverbal Reasoning, and Verbal Reasoning are presented. For school-aged children whose Achievement Scale splits in half, this model may help interpret their profile. A problem with the Keith and Dunbar labels is that they do not offer evidence to support their Verbal Memory and Nonverbal Reasoning labels. Keith and Dunbar conclude that considerable caution should be used when interpreting K-ABC results. In the K-ABC Interpretive Manual (Kaufman & Kaufman, 1983), it is also stressed that a child's profile may need to be approached from an alternative model, if the authors' model does not create a good interpretation of the profile.

4.08.2.1.7 Kaufman Adolescent and Adult Intelligence Test

The Kaufman Adolescent and Adult Intelligence Test (KAIT) (Kaufman & Kaufman, 1993) is an individually administered intelligence test for individuals between the ages of 11 and more than 85 years. It provides Fluid, Crystallized, and Composite IQs, each a standard score with a mean of 100 and a standard deviation of 15.

(i) Theory

The Horn–Cattell theory forms the foundation of the KAIT and defines the constructs believed to be measured by the separate IQs; however, other theories guided the test development process, specifically the construction of the subtests. Tasks were developed from the models of Piaget's formal operations (Inhelder & Piaget, 1958; Piaget, 1972) and Luria's (1973, 1980) planning ability in an attempt to include high-level, decision-making, more developmentally advanced tasks. Luria's notion of planning ability involves decision-making, evaluation of hypotheses, and flexibility, and "represents the highest levels of development of the mammalian brain" (Golden, 1981, p. 285).

Cattell and Horn (Cattell, 1963; Horn & Cattell, 1966, 1967) postulated a structural model that separates fluid from crystallized intelligence. Fluid intelligence traditionally involves relatively culture-fair novel tasks and taps problem solving skills and the ability to learn. Crystallized intelligence refers to acquired skill, knowledge, and judgments which have been taught systematically or learned via acculturation. The latter type of intelligence is influenced highly by formal and informal education and often reflects cultural assimilation. Tasks measuring fluid ability often involve more concentration and problem solving than crystallized tasks, which tend to measure retrieval and application of general knowledge.

Piaget's formal operations depicts a hypothetical-deductive abstract reasoning system that has as its featured capabilities the generation and evaluation of hypotheses and the testing of propositions. The prefrontal areas of the brain associated with planning ability mature at about ages 11–12 years (Golden, 1981), the same ages that characterize the onset of formal operational thought (Piaget, 1972). The convergence of the Luria and Piaget theories regarding the ability to deal with abstractions is striking; this convergence provided the rationale for having age 11 as the lower bound of the KAIT, and for attempting to measure decision making and abstract thinking with virtually every task on the KAIT (Kaufman & Kaufman, 1993).

Within the KAIT framework (Kaufman & Kaufman, 1993), crystallized intelligence "measures the acquisition of facts and problem solving ability using stimuli that are dependent on formal schooling, cultural experiences, and verbal conceptual development" (p. 7). Fluid intelligence "measures a person's adaptability and flexibility when faced with new problems, using both verbal and nonverbal stimuli" (Kaufman & Kaufman, 1993, p. 7). It is important to note that this crystallized-fluid construct split is not the same as Wechsler's (1974, 1981, 1991) verbal–nonverbal split. This was documented in the results of a factor analysis done with the WISC-R and the KAIT
that showed the KAIT crystallized subtests loaded highly on the Crystallized/Verbal factor (0.47–0.78), Fluid subtests loaded 0.51–0.88 on the Fluid factor, and Memory for Block Designs also loaded 0.41 on the Perceptual Organization factor (Kaufman & Kaufman, 1993; Kaufman, Ishikuma, & Kaufman, 1994). The KAIT Fluid subtests stress reasoning rather than visual–spatial ability, include verbal comprehension or expression as key aspects of some tasks, and minimize the role played by visual-motor speed for correct responding. In addition, the KAIT scales measure what Horn (1989) refers to as broad fluid and broad crystallized abilities, rather than the purer and more specific skill areas that have emerged in Horn's expansion and elaboration of the original Horn–Cattell Gf-Gc theory.

The Core Battery of the KAIT is composed of three Crystallized and three Fluid subtests, and these six subtests are used to compute the IQs. The Expanded Battery also includes two supplementary subtests and two measures of delayed recall that evaluate the individual's ability to retain information that was learned previously in the evaluation during two of the Core subtests. The Core Battery of the KAIT consists of subtests one through six, and subtests one through 10 comprise the Expanded Battery. Each subtest except the supplementary Mental Status task yields age-based scaled scores with a mean of 10 and a standard deviation of three. Sample and teaching items are included for most subtests to ensure that examinees understand what is expected of them for each subtest.

The delayed recall subtests are administered, without prior warning, about 25 and 45 minutes after the administration of the original, related subtests. The two delayed recall subtests provide a good measure of an ability that Horn (1985, 1989) calls TSR (Long-Term Storage and Retrieval). TSR "involves the storage of information and the fluency of retrieving it later through association" (Woodcock, 1990, p. 234).

The Mental Status subtest is comprised of 10 simple questions that assess attention and orientation to the world. Most normal adolescents and adults pass at least nine of the 10 items, but the task has special use with retarded and neurologically impaired populations. The Mental Status subtest may be used as a screener to determine if the KAIT can be validly administered to an individual.

(ii) Standardization and properties of the scale

The KAIT normative sample, composed of 2000 adolescents and adults between the ages of 11–94 years was stratified on the variables of gender, racial/ethnic group, geographic region, and SES (Kaufman & Kaufman, 1993).

Mean split-half reliability coefficients for the total normative sample were 0.95 for Crystallized IQ, 0.95 for Fluid IQ, and 0.97 for Composite IQ (Kaufman & Kaufman, 1993). Mean test–retest reliability coefficients, based on 153 identified normal individuals in three age groups (11–19, 20–54, 55–85+), retested after a one-month interval, were 0.94 for Crystallized IQ, 0.87 for Fluid IQ, and 0.94 for Composite IQ (Kaufman & Kaufman, 1993). Mean split-half reliabilities of the four Crystallized subtests ranged from 0.89 to 0.92 (median = 0.90). Mean values for the four Fluid subtests ranged from 0.79 to 0.93 (median = 0.88) (Kaufman & Kaufman, 1993). Median test–retest reliabilities for the eight subtests, based on the 153 people indicated previously, ranged from 0.72 to 0.95 (median = 0.78). Rebus Delayed Recall had an average split-half reliability of 0.91 and Auditory Delayed Recall had an average value of 0.71; their respective stability coefficients were 0.80 and 0.63 (Kaufman & Kaufman, 1993).

Factor analysis, both exploratory and confirmatory, gave strong construct validity support for the Fluid and Crystallized Scales, and for the placement of each subtest on its designated scale. Crystallized IQs correlated 0.72 with Fluid IQs for the total standardization sample of 2000 (Kaufman & Kaufman, 1993).

Table 8 summarizes the results of correlational studies involving the KAIT and other well-known intelligence tests. The values shown in Table 8 support the construct and criterion-related validity of the three KAIT IQs.

Two separate exploratory joint factor analyses were conducted to analyze the KAIT with both the WISC-R and the WAIS-R. Data were obtained from 118 adolescents in the WISC-R sample and 338 adults in the WAIS-R sample. Two-, three-, four-, five-, and six-factor solutions were examined for each analysis using both varimax and oblimin rotations. Three robust factors came out in the three-factor solutions for both the WISC-R and the WAIS-R. Loadings from the first unrotated principal factor (g loadings) along with the three-factor solution for the joint analysis of the KAIT and the WISC-R are presented in Table 9. The analogous data for the KAIT and WAIS-R are presented in Table 10.

The results of these joint analyses indicate that the Wechsler subtests and KAIT subtests are about equal as measures of general intelligence. The KAIT subtests have a mean g loading of 0.71 and the Wechsler subtests also have a mean g loading of 0.71. The most important finding from these analyses is that the
Table 8 Correlations of the three KAIT IQs with standard scores and IQs yielded by other major intelligence tests.

Test               Scale                         Age     Sample size   KAIT crystallized   KAIT fluid   KAIT composite

WAIS-R IQ          Verbal                        16–19       71              0.85             0.74           0.86
                                                 20–34       90              0.78             0.66           0.78
                                                 35–49      108              0.79             0.74           0.85
                                                 50–83       74              0.85             0.70           0.86
                   Performance                   16–19       71              0.64             0.70           0.72
                                                 20–34       90              0.60             0.74           0.73
                                                 35–49      108              0.57             0.73           0.73
                                                 50–83       74              0.74             0.66           0.77
                   Full scale                    16–19       71              0.84             0.79           0.88
                                                 20–34       90              0.77             0.76           0.83
                                                 35–49      108              0.74             0.78           0.85
                                                 50–83       74              0.84             0.70           0.85
WISC-R IQ          Verbal                        11–16      118              0.79             0.74           0.83
                   Performance                   11–16      118              0.67             0.67           0.72
                   Full scale                    11–16      118              0.78             0.75           0.82
K-ABC              Sequential                    11–12      124              0.46             0.44           0.50
                   Simultaneous                  11–12      124              0.53             0.62           0.63
                   Mental processing composite   11–12      124              0.57             0.62           0.66
                   Achievement                   11–12      124              0.81             0.64           0.82
Stanford–Binet-4   Composite intelligence        11–42       79              0.81             0.84           0.87

Source: Kaufman and Kaufman (1993).
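
A correlation can be read as shared variance by squaring it, which is the arithmetic behind a correlation of 0.70 corresponding to a 49% overlap. The sketch below applies the same calculation to a few KAIT Composite correlations from Table 8; the function name and dictionary labels are illustrative only.

```python
def shared_variance_pct(r):
    """Percentage of variance two measures share, given their correlation r."""
    return round(100 * r ** 2, 1)


# KAIT Composite IQ correlations taken from Table 8.
kait_composite = {
    "WAIS-R Full Scale (ages 16-19)": 0.88,
    "WISC-R Full Scale (ages 11-16)": 0.82,
    "K-ABC Mental Processing Composite": 0.66,
    "Stanford-Binet-4 Composite": 0.87,
}

for label, r in kait_composite.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance_pct(r)}%")
```

Read this way, even the highest correlations in Table 8 leave a meaningful share of variance unshared, which fits the description of the tests as overlapping yet partly independent.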

KAIT Fluid subtests and Wechsler Performance subtests seem to measure markedly different constructs. The Fluid and Perceptual Organization factors correlate about as highly with each other as they do with the Crystallized/Verbal factor. The differences between Fluid IQ and Perceptual Organization abilities have been studied and discussed by Woodcock (1990). Woodcock presented evidence that Wechsler's Performance IQ primarily measures Horn's Gv or broad visualization, and not fluid intelligence.

The KAIT benefits from an integration of theories that unite developmental (Piaget), neuropsychological (Luria), and experimental-cognitive (Horn–Cattell) models of intellectual functioning. The theories work well together and do not compete with one another. Together, the theories give the KAIT a solid theoretical foundation that facilitates test interpretation across the broad 11–94-year age range on which the battery was normed.

The changes in crystallized and fluid abilities on the KAIT were examined in a study of 1500 adults aged 17–94 (Kaufman & Horn, 1996). The results of this study indicated that fluid reasoning (Gf) declined steadily across adulthood, and this decline accelerated during the period beginning at about age 55. This finding that fluid ability reaches a peak in development in young adulthood and declines thereafter, at first quite gradually, but more rapidly as age progresses, is in agreement with results from previous research, although a steeper decline in ability was found in the research directed by Horn (1985). Through the 20s, the measure of crystallized knowledge was found to increase, but then showed no increase or decrease until about age 60. Similar findings have been reported in other cross-sectional investigations of fluid and crystallized measures (Kaufman, Kaufman, Chen, & Kaufman, 1996). After age 60, crystallized knowledge was found to decrease as well. Individual differences in education, gender, and ethnicity were found not to alter the basic findings (Kaufman & Horn, 1996).

The KAIT has also been examined with respect to its relationship to adult interests, as demonstrated on the Strong Interest Inventory (SII; Hansen & Campbell, 1985). Kaufman and McLean (1992, November) examined 936 individuals who were administered both the KAIT and the SII. The SII includes six General Occupational Themes (GOTs): Realistic, Investigative, Artistic, Social,
Table 9 Exploratory joint principal-factor analysis of the KAIT and the WISC-R (N = 118).

Columns: first unrotated factor (g), followed by the oblimin factor pattern loadings on factors I (Crystallized/verbal), II (Perceptual organization), and III (Fluid).

Subtest                          g     Oblimin loadings (factors I–III)

KAIT
  Crystallized
    Definitions                  82    47  45
    Auditory comprehension       69    51  37
    Double meanings              79    47  48
    Famous faces                 65    78
  Fluid
    Rebus learning               69    64
    Logical steps                66    60
    Mystery codes                78    88
    Memory for block designs     74    41  51
WISC-R
  Verbal
    Information                  86    62
    Vocabulary                   84    75
    Arithmetic                   73    59
    Comprehension                67    46
    Similarities                 79    50  32
  Performance
    Picture completion           67    43
    Picture arrangement          53    36
    Block design                 78    79
    Object assembly              73    80
    Coding                       55    32

Source: Kaufman and Kaufman (1993).

Note: Decimal points are omitted. Rotated loadings <0.25 are omitted; those ≥0.4 are in bold print. Correlations among factors are as follows: Factors I and II (0.59); Factors I and III (0.69); Factors II and III (0.65).

Enterprising, and Conventional. The findings indicated that there were two General Occupational Themes that produced substantial mean differences between IQ levels. Individuals with higher IQ were more Investigative and more Artistic (Artistic mean score = 49) than those with average IQ (Artistic mean score = 45) or low IQ (Artistic mean score = 42). The authors explain that in light of the Investigative person's interest in science and in solving abstract problems, the relationship of the Investigative theme to IQ level makes sense. The relationship between the Artistic theme and IQ was not hypothesized by the researchers, but was nonetheless significant, even with the effect of Investigative partialed out.

An examination of the KAIT with the Myers–Briggs Type Indicator has been conducted to understand more clearly the commonly accepted relationship between personality style and cognition (Kaufman, McLean, & Lincoln, 1996). The researchers had hypothesized that subjects who favored Intuition and Thinking on the Myers–Briggs would be more intelligent and would also favor fluid over crystallized intelligence, compared to those subjects who favored Sensing and Feeling on the Myers–Briggs. Just as hypothesized, individuals classified as Intuitive earned higher KAIT Composite IQs than those classified as Sensing. However, the Fluid IQ was not found to be favored over the Crystallized IQ, as had been predicted (Kaufman et al., 1996). Thus, a modest association is present between personality dimensions and intellectual ability (as evidenced on the Myers–Briggs and KAIT).

4.08.2.1.8 Overview

The KAIT represents a reconceptualization of the measurement of intelligence that is more consistent with current theories of intellectual development (Brown, 1994). The fluid-crystallized dichotomy, the theory underlying the KAIT, is based on the original Horn–Cattell theory of intelligence, thus offering a firm and well-researched theoretical framework (Flanagan, Alfonso, & Flanagan, 1994). The fluid-crystallized dichotomy enhances the richness of the clinical interpretations that can be drawn from this instrument (Brown, 1994). The test materials are well constructed and attractive,
Table 10 Exploratory joint principal-factor analysis of the KAIT and the WAIS-R (N = 338).

Columns: first unrotated factor (g), followed by the oblimin factor pattern loadings on factors I (Crystallized/verbal), II (Perceptual organization), and III (Fluid).

Subtest                          g     Oblimin loadings (factors I–III)

KAIT
  Crystallized
    Definitions                  78    65  34
    Auditory comprehension       73    69
    Double meanings              79    64  47
    Famous faces                 68    69
  Fluid
    Rebus learning               65    60
    Logical steps                66    57
    Mystery codes                64    32  56
    Memory for block designs     54    62
WAIS-R
  Verbal
    Information                  79    89
    Vocabulary                   81    91
    Arithmetic                   77    47
    Comprehension                79    78
    Similarities                 77    59  32
    Digit Span                   60    25  29
  Performance
    Picture completion           66    49
    Picture arrangement          64    28  57
    Block design                 70    76
    Object assembly              64    80
    Coding                       50    60

Source: Kaufman and Kaufman (1993).

Note: Decimal points are omitted. Rotated loadings <0.25 are omitted; those ≥0.4 are in bold print. Correlations among factors are as follows: Factors I and II (0.53); Factors II and III (0.53).

and the manual is well organized and helpful (Dumont & Hagberg, 1994; Flanagan et al., 1994). Furthermore, the test materials are easy to use and stimulating to examinees (Flanagan et al., 1994).

"The KAIT has been standardized by state-of-the-art measurement techniques" (Brown, 1994). The psychometric properties of the KAIT regarding standardization and reliability are excellent and the construct validity evidence that is reported in the manual provides a good foundation for its theoretical underpinnings (Flanagan et al., 1994).

The theoretical assumption that formal operations is reached by early adolescence limits the application of the KAIT with certain adolescent and adult populations (Brown, 1994). If an individual has not achieved formal operations, many of the subtests will be too difficult for them and perhaps frustrating and overwhelming. Examiners should be aware of this limitation when working with such individuals in order to maintain rapport. The KAIT can be a useful assessment tool when working with high-functioning, intelligent individuals; however, it can be difficult to use with borderline individuals and some elderly clients. Elderly clients' scores on some of the subtests may be negatively impacted by poor reading, poor hearing, and poor memory (Dumont & Hagberg, 1994).

Flanagan et al. (1994) report that the inclusion of only three subtests per scale may limit or interfere with the calculation of IQs if a subtest is spoiled. The usefulness of the Expanded Battery and Mental Status subtest with clinical populations is questionable given the reliability and validity data presented in the manual, suggesting that interpretations be made with caution (Flanagan et al., 1994).

Although there clearly are some limitations in the use of the KAIT with some populations, overall, the test appears to be well thought out and validated (Dumont & Hagberg, 1994). The KAIT represents an advancement in the field of intellectual assessment with its ability to measure fluid and crystallized intelligence from a theoretical perspective and, at the same time, maintain solid psychometric quality (Flanagan et al., 1994).
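
The mean g loadings of 0.71 reported for the joint factor analyses can be checked against the first-unrotated-factor columns of Tables 9 and 10. The sketch below assumes that each 0.71 figure pools the subtests across both joint analyses, which the text does not state explicitly; the variable and function names are illustrative.

```python
from statistics import mean

# g loadings from the first unrotated factor, with decimal points restored.
kait_g = {
    "WISC-R analysis (Table 9)": [0.82, 0.69, 0.79, 0.65, 0.69, 0.66, 0.78, 0.74],
    "WAIS-R analysis (Table 10)": [0.78, 0.73, 0.79, 0.68, 0.65, 0.66, 0.64, 0.54],
}
wechsler_g = {
    "WISC-R (Table 9)": [0.86, 0.84, 0.73, 0.67, 0.79, 0.67, 0.53, 0.78, 0.73, 0.55],
    "WAIS-R (Table 10)": [0.79, 0.81, 0.77, 0.79, 0.77, 0.60, 0.66, 0.64, 0.70, 0.64, 0.50],
}


def pooled_mean(groups):
    """Mean loading across every subtest in all groups, rounded to two decimals."""
    return round(mean(x for values in groups.values() for x in values), 2)


print("KAIT mean g loading:", pooled_mean(kait_g))
print("Wechsler mean g loading:", pooled_mean(wechsler_g))
```

Under that pooling assumption both means round to 0.71, matching the values given in the text.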
4.08.2.1.9 The Stanford–Binet: Fourth edition

(i) Theory

Like its predecessors, the fourth edition is based on the principle of a general ability
factor, g, rather than on a connection of separate functions. The fourth edition has
maintained, albeit to a much lesser degree, its adaptive testing format. No examinee takes
all the items on the scale, nor do all examinees of the same chronological age respond to
the same tasks. Like its predecessors, the scale provides a continuous appraisal of
cognitive development from age two through young adulthood.

One of the criticisms of the previous versions was that they tended to underestimate the
intelligence of examinees whose strongest abilities did not lie in verbal skills (or
overestimate the intelligence of those whose verbal skills excelled). Therefore, a
consideration when developing the Binet IV was to give equal credence to several areas of
cognitive functioning. The authors set out to appraise verbal reasoning, quantitative
reasoning, abstract/visual reasoning, and short-term memory (in addition to a composite
score representing g).

The Binet IV is based on a three-level hierarchical model of the structure of cognitive
abilities. A general reasoning factor (g) is at the top level. The next level consists of
three broad factors: crystallized abilities, fluid analytic abilities, and short-term
memory. The Horn–Cattell theory forms a foundation for the test, with Verbal and
Quantitative being measures of Gc, and Abstract/Visual being a Gf scale. The third level
consists of more specific factors, similar to some of Thurstone's eight primary mental
factors: verbal reasoning, quantitative reasoning, and abstract/visual reasoning.

The selection of these four areas of cognitive abilities came from the authors' research
and clinical experience of the kinds of cognitive abilities that correlate with school
progress.

The Binet IV contains previous tasks, combining old with new items, and some completely
new tasks. In general, test items were accepted if (i) they proved to be acceptable
measurements of the construct, (ii) they could be reliably administered and scored,
(iii) they were relatively free of ethnic and/or gender bias, and (iv) they functioned
adequately over a wide range of age groups.

(ii) Standardization and properties of the scale

Standardization procedures followed 1980 US Census data. There appears to be an accurate
sample representation by geographic region, size of community, race/ethnic group, and
gender. The standardization falls short, however, in terms of age, parental occupation,
and parental education. The total sample size was large (5013), with age representation
extending from two years to 23 years, 11 months. The sample is concentrated on children
aged 4–9 years (41%). Not only were adults 24 years and older not represented, but
representation beyond age 17 years, 11 months was also negligible (4%).

In order to assess characteristics of SES, information regarding parental occupation and
parental education was obtained. A review of Table 11 demonstrates that children whose
parents came from managerial/professional occupations and/or were college graduates or
beyond were grossly over-represented in the sample. In other words, the norms are based on
a large percentage of individuals from upper socioeconomic classes. In order to adjust for
this discrepancy, a weighting procedure was applied, which makes the norming sample
suspect. Unquestionably, SES has been shown time and again to be the single most important
stratification variable regarding its relationship to IQ (Kaufman, 1990, Chapter 6;
Kaufman & Doppelt, 1976).

Internal consistency estimates for the Stanford–Binet IV Composite Scale are excellent,
ranging from 0.95 to 0.99 across the age groups (median = 0.97) (Sattler, 1988). The
internal reliabilities are also high for the Verbal Reasoning, Abstract/Visual Reasoning,
Quantitative Reasoning, and Short-term Memory Area scores (typically in the upper 0.80s to
0.90s). Subtest reliabilities are also good, with the exception of Memory for Objects,
which had a median of 0.73 (Thorndike et al., 1986b). Test–retest reliability estimates
are also good for preschool (Composite coefficient = 0.91) and elementary school aged
(Composite coefficient = 0.90) samples (Thorndike, Hagen, & Sattler, 1986a). From an
internal reliability perspective, this measure is generally good.

Construct validity for g and for the four factors was studied using a variant of
confirmatory factor analysis. The subtests had impressive, substantial to high loadings on
g (0.51–0.79). Unfortunately, the four factors were given weak support by the confirmatory
procedure. Additionally, exploratory factor analysis gave even less justification for the
four Binet Scales; only one or two factors were identified by Reynolds, Kamphaus, and
Rosenthal (1988) for 16 of the 17 age groups studied. Clearly, the factor analytic
structure does not conform to the theoretical framework used to construct the test.
Therefore, once again the composite score is left as the only clearly valid representation
of a child's cognitive abilities.

Table 11 Representation of the Stanford–Binet fourth edition.

                                   Sample percent    US population percent

By parental occupation
  Managerial/professional               45.9                 21.8
  Technical sales                       26.2                 29.7
  Service occupations                    9.7                 13.1
  Farming/forestry                       3.2                  2.9
  Precision production                   6.7                 13.0
  Operators, fabricators, other          8.3                  9.5
  Total                                100.0                100.0

By parental education
  College graduate or beyond            43.7                 19.0
  1–3 years of college                  18.2                 15.3
  High school graduate                  27.5                 36.5
  Less than high school                 10.6                 29.2
  Total                                100.0                100.0
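
The parental-education rows of Table 11 give a sense of how large a corrective weighting
would have to be. The sketch below is a minimal illustration, assuming simple
post-stratification in which each stratum's weight is its US population percentage divided
by its sample percentage. The weighting procedure actually used for the Binet IV norms is
not described here, and the helper name is hypothetical.

```python
# Illustrative only: simple post-stratification weights from Table 11
# (parental education). The weighting procedure actually applied to the
# Binet IV norms is not described in this chapter; this formula is an assumption.

# stratum: (sample percent, US population percent), copied from Table 11
PARENTAL_EDUCATION = {
    "College graduate or beyond": (43.7, 19.0),
    "1-3 years of college": (18.2, 15.3),
    "High school graduate": (27.5, 36.5),
    "Less than high school": (10.6, 29.2),
}


def poststratification_weights(strata):
    """Return population% / sample% for each stratum (hypothetical helper)."""
    return {name: pop / sample for name, (sample, pop) in strata.items()}


weights = poststratification_weights(PARENTAL_EDUCATION)
for name, weight in weights.items():
    print(f"{name:<28} weight = {weight:.2f}")
```

Under this assumption, each child of a college-graduate parent would count for about 0.43
of a case, and each child whose parents did not finish high school for about 2.75 cases.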

Correlational studies, using nonexceptional children, between the Binet IV and the
Stanford–Binet (Form L-M), WISC-R, WAIS-R, WPPSI, and K-ABC have ranged from 0.80 to 0.91
(comparing full-scale composites). Correlational studies using exceptional children
(gifted, learning impaired, mentally retarded) produced generally lower correlations,
probably because of restricted variability in the test scores. For example, for gifted
students the composite score on the Binet IV correlated 0.69 with the WISC-R Full Scale
IQ. These data and data from similar validity investigations are presented more
extensively in the Technical manual for the Binet IV (Thorndike et al., 1986a). Despite
the presentation of ample evidence of concurrent validity, the substantial problems with
construct validity, the data collection method, and other difficulties with the Binet IV
have led at least one reviewer to recommend that the battery be laid to rest (Reynolds,
1987): "To the S-B IV, Requiescat in pace" (p. 141).

(iii) Overview

The Binet IV was developed in an attempt to increase the popularity of the test as well as
address some of the negative reviews that had plagued the previous edition. The test
authors attempted to make the fourth edition significantly different from the previous L-M
edition; however, it appears that this goal has achieved only limited success. Canter
(1990) describes the "rebirth" of the Binet as giving way to "confusion and even dismay as
the primary consumers of intelligence tests learned that the new edition offered a more
complicated route to the same destination." Another reviewer describes the Binet IV as "in
most respects, a completely new version of a very old test" (Spruill, 1987). This author
also questioned whether the weighting procedure used to correct for sample bias, a bias
that was not outweighed by the large size of the standardization sample, was adequate
(Spruill, 1987). Finally, it is not clear why a test described as a test for individuals
aged two to adult does not include persons over the age of 23 in the standardization
sample.

Although there appear to be a number of difficulties with the Binet IV, the test is still
used and it is not without its strengths. The administration of some of the subtests
allows the examiner flexibility, and young children seem to find the items challenging and
fun. The scale has excellent internal reliability and provides a flexible administration
format. Despite its shortcomings, the Binet IV continues to be a very good assessment of
cognitive skills related to academic progress (Spruill, 1987). It also includes several
excellent, well-constructed tasks that offer valuable information when they are
administered in addition to the Wechsler scales (Kaufman, 1990, 1994b).

4.08.2.1.10 Woodcock–Johnson Psycho-Educational Battery-Revised: tests of cognitive ability (WJ-R)

The WJ-R is one of the most comprehensive test batteries available for the clinical
assessment of children and adolescents (Kamphaus, 1993). The WJ-R is a battery of tests
for individuals from age 2 to 90+, and is composed of two sections, Cognitive and
Achievement. The focus of this discussion is the Cognitive portion of the WJ-R battery.

(i) Theory

The WJ-R Cognitive battery is based on Horn's (1985, 1989) expansion of the
Fluid/Crystallized model of intelligence (Kamphaus, 1993; Kaufman, 1990). The standard and
supplemental subtests of the WJ-R are aligned with eight of the cognitive abilities
isolated by Horn (1985, 1989) (Kamphaus, 1993; Kaufman, 1990). The cognitive battery
measures seven Horn abilities: Long-term Retrieval, Short-term Memory, Processing Speed,
Auditory Processing, Visual Processing, Comprehension-knowledge, and Fluid Reasoning. An
eighth ability, Quantitative Ability, is measured by several Achievement subtests on the
WJ-R.

The four subtests that measure Long-term Retrieval (Memory for Names, Visual–Auditory
Learning, Delayed Recall/Memory for Names, Delayed Recall/Visual–Auditory Learning)
require the subject to retrieve information stored minutes or a couple of days earlier. In
contrast, the subtests that measure Short-term Memory (Memory for Sentences, Memory for
Words, Numbers Reversed) require the subject to store information and retrieve it
immediately or within a few seconds. The two Processing Speed subtests (Visual Matching,
Cross Out) assess the subject's ability to work quickly, particularly under pressure to
maintain focused attention.

Within the Auditory Processing domain, three subtests (Incomplete Words, Sound Blending,
Sound Patterns) assess the subject's ability to fluently perceive patterns among auditory
stimuli. The three Visual Processing subtests (Visual Closure, Picture Recognition,
Spatial Relations) assess the subject's ability to fluently manipulate stimuli that are
within the visual domain.

Picture Vocabulary, Oral Vocabulary, Listening Comprehension, and Verbal Analogies are the
four subtests that are linked to the Comprehension-knowledge factor, also known as
crystallized intelligence within Horn's theoretical model. These subtests require the
subject to demonstrate the breadth and depth of their knowledge of a culture.
Analysis-synthesis, Concept Formation, Spatial Relations, and Verbal Analogies (which also
loads on the Comprehension-knowledge factor) assess the subject's Fluid Reasoning, or
"new" problem-solving ability. Finally, from the Achievement portion of the WJ-R, both the
Calculation and Applied Problems subtests assess the individual's Quantitative Ability.

The cognitive battery consists of 21 subtests, seven of which comprise the standard
battery (one per ability as described by Horn); the remaining 14 are part of the
supplemental battery. There are two composite scores, Broad Cognitive Ability and Early
Development (for preschoolers), which are both comparable to an overall IQ. The individual
subtest scores as well as the composite scores have a mean of 100 and a standard deviation
of 15.

Computer software is available for scoring the WJ-R and is essential if all of the
information that the WJ-R is capable of providing is to be obtained. The WJ-R provides the
examiner with percentile ranks, grade-based scores, age-based scores, and the Relative
Mastery Index (RMI). The RMI is a unique kind of ratio, with the second part of the ratio
set at a value of 90. The denominator of the ratio means that children in the norm sample
can perform the intellectual task with 90% accuracy. The numerator of the ratio refers to
that child or adolescent's proficiency on that subtest (Kamphaus, 1993). For example, if a
child obtains an RMI of 60/90, it would mean that the child's proficiency on the subtest
is at a 60% level whereas the typical child of that age (or grade) mastered the material
at a 90% level of accuracy.

The entire battery is quite lengthy and therefore can be time-consuming to administer. The
seven-subtest Standard Battery takes approximately 40 minutes to administer; however, all
the clinician will obtain from it is, essentially, a measure of g. In order to obtain all
of the information that the WJ-R is capable of providing, a clinician should administer
most of the subtests in both the Cognitive and Achievement batteries. Administration of a
thorough cognitive and achievement assessment using the WJ-R would take approximately
3.5–5 hours depending on the subject's age, abilities, and speed. However, individual
subtests may be administered to test specific hypotheses without administering the entire
battery. The WJ-R tests also provide measures of differential scholastic aptitudes
including reading, mathematics, written language, and knowledge. An aptitude-achievement
comparison may be made if the WJ-R Tests of Achievement are given in addition. Such a
discrepancy reflects the amount of disparity between certain intellectual capabilities of
an individual and their actual academic performance.

Evidence has been presented that supports the use of the WJ-R standard cognitive and
achievement tests in the identification and classification of school-aged children as
gifted, learning-disabled, and mentally retarded (Evans, Carlsen, & McGrew, 1993).
Significant group differences were found on mean scores of all WJ-R standard cognitive and
achievement clusters. Together, the WJ-R cognitive and achievement clusters demonstrated
the ability to predict
group membership (gifted, L.D., or M.R.). This was shown in an overall classification
agreement of 93.5% for ages 8–10 and 84.3% for ages 16–18. These levels of classification
agreement support the use of the WJ-R batteries in the identification of exceptional
students (Evans et al., 1993). It should be noted, however, that clinicians must
supplement the process of assessment and diagnosis by including other factors beyond
statistical classification, such as social-emotional considerations, medical conditions,
vision and hearing measures, and other environmental considerations, to make the
determination of classification of the aforementioned groups.

(ii) Standardization and properties of the scale

The WJ-R was normed on a reasonably representative sample of 6359 individuals selected to
provide a cross-section of the US population aged 2–90+ (Woodcock & Mather, 1989). The
sample included 705 preschool children, 3245 students in grades K–12, 916
college/university students, and 1493 individuals aged 14–90+ who were not enrolled in
school. Stratification variables included gender, geographic region, community size, and
race. However, Kaufman (1990) reports that, although representation on important
background variables was adequate, it was necessary to use a weighting procedure to adjust
the data that were collected so that they would match US population statistics.

The internal consistency estimates for the standard and supplemental battery subtests are
good, with median coefficients across ages 2 to 79 ranging from 0.69 to 0.93. The Broad
Cognitive Ability composite score for the seven standard battery subtests yields a median
internal consistency coefficient of 0.94, and the Broad Cognitive Ability Early
Development Scale yields a coefficient of 0.96 at ages two and four (Kamphaus, 1993).

The validity of the WJ-R has been called into question when used with a learning
disabilities population (Hoy et al., 1993). This is because the mean broad cognitive
ability scores from the WJ-R Tests of Cognitive Ability have been found to be one standard
deviation lower than mean full scale scores on the WISC-R. However, in an adult sample,
significant and consistently high correlations were found between the WAIS-R and the WJ
scores (Siehen, 1985). Hoy et al. studied 27 male and 20 female university students with a
previously diagnosed learning disability, as well as 47 learning disabled individuals from
a rehabilitation clinic. The results indicated a low positive correlation between the WJ-R
broad cognitive score and the WAIS-R Verbal Scale scores (r = 0.44) for the rehabilitation
subjects, but a moderate positive correlation for the university subjects (r = 0.73). The
WAIS-R Full Scale IQ also correlated moderately with the Broad Cognitive score in
university subjects (r = 0.72), and had a lower correlation in rehabilitation subjects
(r = 0.58). The correlation between WJ-R Broad Cognitive scores and the Performance Scale
IQ was low for both the university and rehabilitation subjects (r = 0.40). The low
correlation between the PIQ and Broad Cognitive scores suggests that the two instruments
are providing different information, and are therefore both potentially useful.

(iii) Overview

The WJ-R Cognitive battery was developed based on Horn's expansion of the Cattell–Horn
Fluid–Crystallized model of intelligence. This theoretical rationale allows for further
empirical analysis of both the WJ-R and the theory (Webster, 1994). The standardization of
the battery appears to be sound and the various age groups are represented adequately.

The Cognitive battery is quite thorough and, when administered in its entirety, can
provide the examiner with a wealth of information about an individual's intellectual
functioning and abilities. The test materials and manuals are easy to use and well
designed. The administration is fairly simple; however, scoring the test, especially when
the Achievement battery is administered as well, can be quite a lengthy and, initially, a
difficult process. The scoring can be done by hand but is done more efficiently with the
computer scoring program. The computer scoring program is easy to use and provides the
examiner with the individual's raw scores, standard scores, percentile ranks, and age and
grade equivalents for each subtest (Webster, 1994).

The WJ-R Cognitive battery is a well-standardized test developed on a theory of
intelligence. However, the test is not without shortcomings. Webster (1994) raises issues
with the specific psychometric procedures used in developing test items. Data are lacking
that show the efficacy of the WJ-R to predict, from a time-based perspective, actual
functional levels of academic achievement and to identify children at risk for failure
early in the educational process (Webster, 1994). Kaufman (1990) points to another
shortcoming: the small number of tests that comprise each scale. The Standard Battery
measures each of the seven scales with one subtest apiece.

The Woodcock–Johnson Psycho-Educational Battery: Revised examiner's manual reports that
"Items included in the various tests were selected using item validity studies as well as
expert opinion" (Woodcock & Mather, 1989, p. 7). Kamphaus (1993) states that the manual
should have included more information on the results of the experts' judgments or some
information on the methods and results of the studies that were used to assess validity.

It is clear that the WJ-R Cognitive battery is quite comprehensive, providing the
clinician with a wealth of information. The standardization sample is large, the factor
loadings reveal generally strong factor analytic support for the construct validity of the
battery for adolescents and adults, and the reliability coefficients are excellent
(Kaufman, 1990).

4.08.2.1.11 Detroit Tests of Learning Aptitude (DTLA-3)

The DTLA-3 was developed by Hammill (1991) and was designed to measure different, but
interrelated, mental abilities for individuals aged six through 17 years, 11 months. It is
a battery of 11 subtests and yields 16 composites that measure both general intelligence
and discrete ability areas. Hammill and Bryant (1991) report that the DTLA-3 was
influenced greatly by Spearman's two-factor theory (1927). This theory of "aptitude"
consisted of a general factor g that is present in all intellectual pursuits, and specific
factors that vary from task to task (McGhee, 1993).

The 11 subtests are used to form the 16 composite scores. The subtests are grouped into
different combinations according to various hypothetical constructs that exist in theories
of intelligence and information processing. In general, the composite scores estimate
general mental ability; however, they all do so in a somewhat different manner. The
General Mental Ability Composite is formed by combining the standard scores of all 11
subtests, and, thus, has been referred to as the best estimate of g. The Optimal Level
Composite is composed of the four largest standard scores that the individual earns. This
individualized score is often referred to as the best estimate of a person's overall
"potential." The Domain Composites may be divided into three areas: Linguistic,
Attentional, and Motoric. Furthermore, there is a Verbal and a Nonverbal Composite in the
Linguistic Domain, an Attention-enhanced and an Attention-reduced Composite in the
Attentional Domain, and a Motor-enhanced and a Motor-reduced Composite in the Motoric
Domain. Finally, there are the Theoretical Composites of the DTLA-3 on which the battery's
subtests are constructed. The major theories that the subtests were developed from include
Horn and Cattell's (1966) fluid and crystallized intelligences, Das' (1973) simultaneous
and successive processes, Jensen's (1980) associative and cognitive levels, and Wechsler's
(1974, 1981, 1989) verbal and performance scales.

The DTLA-3 yields five types of scores: raw scores, subtest standard scores, composite
quotients, percentiles, and age equivalents. Standard scores for the individual subtests
have a mean of 10 and a standard deviation of three, and the Composite Quotients have a
mean of 100 and a standard deviation of 15.

The individual subtest reliabilities range from 0.77 to 0.94, with a median of 0.87, and
the averaged alphas for the composites range from 0.89 to 0.96, with a median of 0.94. To
assess the DTLA-3's stability over time, the test–retest method was used with a sample of
34 children residing in Austin, Texas. The children, aged six through 16, were tested
twice, with a two-week period between testings (Hammill, 1991). The results of this
test–retest analysis indicate that individual subtest reliabilities range from 0.75 to
0.96, with a median of 0.84, and Composite reliabilities range from 0.81 to 0.96, with a
median of 0.90.

(i) Overview

The DTLA-3 was designed to measure both general intelligence and discrete abilities for
children aged six through 17 years, 11 months. The DTLA-3 is not grounded in one specific
theory but rather can be linked to a number of different theorists and their views on
intelligence and achievement. This "eclectic" theorizing has resulted in the DTLA-3's
numerous subtests, composites, and various combinations of the two that yield potentially
important information about an individual's abilities.

Reliability and validity studies are encouraging but are based on specific and limited
samples (VanLeirsburg, 1994). Additional research in this area would be beneficial.
Furthermore, test–retest reliability data were collapsed across age levels, which makes it
impossible to determine the stability of scores at the various age levels (Schmidt, 1994).
The standardization sample was representative of the US population but more information on
socioeconomic level is needed (Schmidt, 1994). Also, there are no normative data reported
for subjects with handicapping conditions, and sample stratification for age was not
equalized (VanLeirsburg, 1994).

The testing manual suggests that individual testing time may vary but that on average it
takes 50 minutes to two hours to administer. Scoring and interpretation of the results is
easy,
yet it can be quite time consuming without the aid of the computer program (VanLeirsburg,
1994). Despite apparent shortcomings, the DTLA-3 may be useful for eligibility or
placement purposes and as a research tool (Schmidt, 1994).

4.08.2.1.12 Differential Ability Scales (DAS)

The DAS was developed by Elliott (1990) and is an individually administered battery of 17
cognitive and achievement tests for use with individuals aged 2.5 through 17 years. The
DAS Cognitive Battery has a preschool level and a school-age level. The school-age level
includes reading, mathematics, and spelling achievement tests that are referred to as
"screeners." The same sample of subjects was used to develop the norms for the Cognitive
and Achievement Batteries; therefore, intra- and intercomparisons of the two domains are
possible.

The DAS is not based on a specific theory of intelligence. Instead, the test's structure
is based on tradition and statistical analysis. Nonetheless, the test is not theory free,
and, in fact, is based in part on g and the view of intelligence as hierarchical in nature
(McGhee, 1993). Elliott (1990) described his approach to the development of the DAS as
"eclectic" and cited researchers such as Cattell, Horn, Das, Jensen, Thurstone, Vernon,
and Spearman. Indeed, there are some clear-cut relationships between several DAS scales
and theoretical constructs. For example, Horn's (1985, 1989) concepts of fluid and
crystallized intelligence are measured quite well by the Nonverbal Reasoning and Verbal
Ability scales, respectively. Elliott endorses Thurstone's idea that the emphasis in
intellectual assessment should be on the assessment and interpretation of distinct
abilities (Kamphaus, 1993). He also stresses that in the assessment of children with
learning and developmental disabilities, clinicians need more fine detail than is provided
by a global IQ score (Elliott, 1997). Therefore, subtests were constructed to emphasize
their unique variance, which should translate into unique abilities. Although it was
expected that meaningful composites would be derived from the subtests, the primary focus
in test development was at the subtest level. One of the main distinctions between the DAS
and other batteries is its emphasis on the subtest level.

Elliott (1997) noted how in psychology there has long been a link between cognitive
abilities and neurological structure. The DAS uses the link between the factor structure
of abilities and the neurological evidence of the nature of the structures. For example,
the DAS has two major ability clusters which reflect two major systems for receiving,
perceiving, remembering, and processing information in the visual and auditory modalities.
The systems are represented by Verbal and Visualization/Spatial factors. There is strong
neuropsychological evidence for the existence of these systems (Elliott, 1997), which tend
to be localized in the left and right cerebral hemispheres, respectively. The DAS
specifically measures each of these factors by the Verbal cluster and the Spatial cluster.

Normally, the auditory and visual systems do not operate completely independently; the
systems interact. The integrative system is represented factorially by the fluid reasoning
factor in the Cattell–Horn theory. Analysis of both verbal and visual information is
usually required in measures of fluid reasoning. The neuropsychological function is an
integrative system of the frontal lobe, which is critical to complex mental functioning
(Luria, 1973). The DAS measures fluid ability by the Nonverbal Reasoning cluster, which
requires integrated analysis and transformation of both visual and verbal information.

Elliott (1997) notes that there is much evidence from cognitive psychology indicating that
verbal and visual short-term memory systems are quite distinct. However, some cognitive
tests (e.g., the Stanford–Binet IV) represent memory with a single factor. The DAS, on the
other hand, represents visual and auditory short-term memory with distinct measures,
rather than with one unitary task.

The DAS also provides a measure of intermediate-term memory (in the Horn–Cattell model),
which is usually measured by tasks that have both verbal and visual components. The DAS
has a measure in which pictures are presented, but they have to be recalled verbally
(Recall of Objects). This visual–verbal short-term memory task measures another distinct
information processing system (Elliott, 1997).

The cognitive portion of the DAS consists of "core" and "diagnostic" subtests designed to
assess intelligence at the preschool level and the school-age level. The core subtests
measure complex processing and conceptual ability, which is strongly g-related. The
diagnostic subtests measure less cognitively complex functions, such as short-term memory
and processing speed, and therefore have less g-saturation (Elliott, 1997). The
achievement portion measures skills in the areas of word reading, spelling, and basic
number skills. The core subtests are averaged to obtain the General Conceptual Ability
(GCA) score and, depending on the age of the individual, additional composite scores are
calculated which are referred to as Cluster scores.
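
The batteries reviewed in this section report scores on several standard-score metrics:
DAS cognitive subtests use a mean of 50 and a standard deviation of 10, DTLA-3 subtests a
mean of 10 and a standard deviation of three, and the composites of the WJ-R, DTLA-3, and
DAS a mean of 100 and a standard deviation of 15. The following sketch shows how a score
on one metric can be re-expressed on another through z-scores. It is a minimal
illustration, assuming a simple linear transformation and, for the percentile rank, a
normal distribution of scores. Published norms tables, not this arithmetic, determine
actual test scores, and all names in the code are hypothetical.

```python
from statistics import NormalDist

# (mean, standard deviation) of the score metrics mentioned in this section.
METRICS = {
    "t_score": (50.0, 10.0),          # e.g., DAS cognitive subtests
    "scaled_score": (10.0, 3.0),      # e.g., DTLA-3 subtests
    "standard_score": (100.0, 15.0),  # e.g., DAS GCA, DTLA-3 Composite Quotients
}


def convert(score, source, target):
    """Re-express a score from one metric on another via its z-score."""
    src_mean, src_sd = METRICS[source]
    tgt_mean, tgt_sd = METRICS[target]
    z = (score - src_mean) / src_sd
    return tgt_mean + z * tgt_sd


def percentile_rank(score, metric):
    """Percentile rank of a score, assuming normally distributed scores."""
    mean, sd = METRICS[metric]
    return 100.0 * NormalDist(mean, sd).cdf(score)


# A subtest score one standard deviation above the mean on the 50/10 metric:
print(convert(60, "t_score", "standard_score"))        # 115.0
print(round(percentile_rank(115, "standard_score")))   # 84
```

Expressing scores as z-scores first keeps the conversion symmetric, so a single table of
means and standard deviations covers every pair of metrics.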

The individual Cognitive subtests have a mean of 50 and a standard deviation of 10. The
GCA scores, Cluster scores, and Achievement scores have a mean of 100 and a standard
deviation of 15. Percentile ranks, age equivalents, and score comparisons are also
available in the examiner's manual. Score comparisons provide a profile analysis and allow
the examiner to ascertain information regarding aptitude-achievement discrepancies.

Interpretation of the DAS subtests and composites is facilitated by the framework provided
in the Handbook (Elliott, 1990). Elliott (1990) notes that one positive aspect of
interpreting the DAS is that the "design of scoring procedures on the Record Form enables
statistically significant high and low scores to be identified immediately" (p. 37).
Significant discrepancies between subtests, cluster scores, and ability and achievement
can be obtained immediately. Like Kaufman's (1994b) approach to interpreting the WISC-III,
an ipsative approach is used to examine differences between subtests, requiring the
examiner to compare the child's mean score on core subtests to his or her individual
subtest scores. Discrepancy between ability and achievement is analyzed by comparing the
GCA (or Special Nonverbal Composite) with each of the achievement tests.

(i) Standardization and properties of the scale

Elliott (1997) noted that exceptionally careful and effective standardization and
data-analytic procedures were used in the development of the DAS. The DAS was standardized
on 3475 children tested between 1987 and 1989. The normative sample includes 200 cases for
each age level between the ages of five and 17. The younger part of the sample consisted
of 350 children between the ages of 2.5 years and four years, 11 months. Exceptional
children were also included in the standardization sample. Gender, race, geographic
region, community size, and enrollment (for ages 2–5 through 5–11) in an educational
program were controlled. SES was estimated using the average education level of the parent
or parents living with the child (Kamphaus, 1993).

Over and above the requirements of the norm sample, an additional 600 cases of Black and
Hispanic children were collected. This oversample was collected to permit statistical
analysis of item bias and prediction bias. The test developers wanted to ensure that the
rules for scoring would be sensitive to minority children's responses (Elliott, 1997).
Only a small number of items were deleted due to item bias, and there was "no evidence"
that the DAS is biased against either Blacks or Hispanics, compared with Whites (Elliott,
1990).

The DAS has a median reliability estimate of 0.95 for the GCA. Internal consistency
reliability estimates for the cluster scores range from 0.83 for Nonverbal Reasoning at
age five to 0.94 for Spatial at several ages (Kamphaus, 1993). The test–retest reliability
coefficients for the preschool composite scores are 0.84 for Verbal Ability and 0.79 for
Nonverbal Ability. The individual subtests' reliabilities vary, with an average
coefficient of 0.78.

Correlational research has shown good evidence of concurrent validity for the DAS
(Kamphaus, 1993). With a sample of 27 children aged 7–14, the WISC-III Full Scale IQ
correlated very highly with the DAS GCA score (0.92), and the WISC-III Verbal IQ score
correlated highly with the DAS Verbal Ability score (0.87). The WISC-III Performance IQ
correlated 0.78 with Nonverbal Reasoning and 0.82 with Spatial Ability. Additionally, the
DAS Speed of Information Processing subtest score correlated 0.67 with the WISC-III
Processing Speed Index score. The Binet IV Composite IQ correlated 0.88 with the DAS GCA
for nine- and 10-year-olds and 0.85 with the DAS GCA for a sample of gifted children. The
K-ABC Mental Processing Composite correlated 0.75 with the DAS GCA for 5–7-year-olds
(Kamphaus, 1993).

Elliott (1997) presents three validity studies not published in the DAS manual. One of
these studies included a confirmatory factor analysis of the DAS by Keith (1990), which
concluded that "the constructs measured by the DAS are remarkably consistent across
overlapping age levels of the test" (Elliott, 1997, p. 20). Elliott also discusses a joint
factor analysis of the DAS and WISC-R. In a reanalysis of data, Elliott (in press)
reported five factors emerging: crystallized intelligence (including DAS verbal and WISC-R
verbal subtests), spatial or broad visualization (including DAS spatial and four of the
five major WISC-R Performance subtests), nonverbal reasoning or fluid intelligence
(defined only by DAS Nonverbal Reasoning subtests), auditory short-term memory, and speed
of processing.

(ii) Overview

In general, the professional reviews of the DAS seem to be quite positive. Sandoval (1992)
reports that the DAS is one of the least biased tests available. The test appears to be a
relatively culture-fair measure; however, its use with linguistically different children
needs to be explored further (Sandoval, 1992). In examining different cultural groups'
performance on the
DAS, the group differences typically found on traditional IQ tests are also present. For example, African-American and Hispanic children score between half and two-thirds of a standard deviation below White children, and Asian children score above White children on all but verbal areas of the test. Caution is necessary when assessing Hispanic children because the DAS overpredicts achievement for this group based upon group achievement results (Bain, 1991). The author of the DAS suggests that children who are not proficient in English be given the nonverbal tests on the Special Nonverbal scale in their primary language. However, this can be problematic as the test developers did not provide directions in other common languages, such as Spanish. In addition, the utility of the English norms for assessing a child who is administered the test in Spanish or another non-English language has not been explored.

The DAS manual has recommendations for administering the test to deaf or limited-English-proficient children; however, these recommendations are lacking in a couple of areas. Braden (1992) notes, "The recommendations that age equivalents be used to represent the performance of retarded person is common, but it is potentially misleading" (p. 93). Another problem with the recommendations is that no mention is made of the use of interpreters for hearing-impaired children and nonverbal children, which may have a detrimental effect on deaf children's test scores.

According to Braden (1992), the Technical Manual includes extensive research data which suggest that the DAS is a psychometric improvement over existing techniques for measuring intellectual abilities and for determining intracognitive and aptitude-achievement discrepancies. The GCA of the DAS is largely independent of tasks known to be difficult for learning disabled children and is able to assist in the identification of learning disabilities or processing deficits.

The DAS can be a useful tool in assessing intelligence and achievement in both children and adolescents. However, there are a few characteristics of the DAS which do not promote ease of administration, especially for novices (Braden, 1992). These difficulties include having to apply two rules for subtest discontinuation, and having to convert raw scores to ability scores prior to obtaining subtest scaled scores. The DAS examiner's manual provides interpretive information and a framework for interpretation for the composite scores and subtests. The level and/or depth of information that the interpretative portion of the manual provides is quite thorough and is easy to use, making interpretation of the profiles, and individual and composite scores much easier.

4.08.2.1.13 Cognitive Assessment System (CAS)

(i) Theory

The Das–Naglieri Cognitive Assessment System (Naglieri & Das, 1996) was developed according to the Planning, Attention, Simultaneous, and Successive (PASS) theory of intelligence. The subtests are organized into four scales designed to provide an effective measure of the PASS cognitive processes. Planning subtests require the child to devise, select, and use efficient plans of action to solve the test problems, regulate the effectiveness of the plans, and self-correct when necessary. Attention tests require the child to selectively attend to a particular stimulus and inhibit attending to distracting stimuli. Simultaneous processing tests require the child to integrate stimuli into groups to form an interrelated whole, and Successive processing tests require the child to integrate stimuli in their specific serial order or appreciate the linearity of stimuli with little opportunity for interrelating the parts.

The CAS yields Planning, Attention, Simultaneous, Successive, and Full Scale normalized standard scores (mean of 100 and standard deviation of 15). The Planning scale's subtests include Matching Numbers, Planned Codes, Planned Connections, and Planned Search; the Attention scale subtests include Number Detection, Receptive Attention, and Expressive Attention; the Simultaneous Scale subtests are Nonverbal Matrices, Verbal–Spatial Relations, and Figure Memory; and the Successive Scale subtests are Word Series, Sentence Repetition, Sentence Questions, and Successive Speech Rate. All subtests are set at a normative mean of 10 and SD of three.

(ii) Standardization and properties of the scale

The CAS was standardized on 2200 children ranging in age from five through 17 years. The sample was stratified by age, gender, race, ethnicity, geographic region, and parent education according to US Census reports and closely matches the US population characteristics on the variables used. In addition to administration of the CAS, most of the standardization sample was also administered several achievement tests from the Woodcock–Johnson Tests of Achievement (Woodcock & Johnson, 1989). This provided for both validity evidence and analysis of the relationships
between PASS and achievement. No further data were available on the CAS when this chapter went to press.

4.08.3 INSTRUMENT INTEGRATION

It is to the advantage of clinical and school psychologists that there are so many instruments available to assess a child's, adolescent's, or adult's intellectual functioning. Often when one instrument is administered, such as the WISC-III or WAIS-R, and then analyzed, the examiner will find that questions and hypotheses are raised regarding the individual's functioning in specific areas. Creativity and a bit of detective work are required to uncover exactly where a person's true deficits and strengths lie in their cognitive abilities. Part of the detective work in this process involves the integration of information from various instruments to support or clarify hypotheses raised as initial results are examined. Thus, examiners ultimately have to be able to integrate data from multiple instruments. As suggested by Kaufman (1994), "crucial educational decisions are sometimes made on the basis of a psychological evaluation, and these decisions should be supported by ample evidence" (p. 326) so that initial hypotheses are verified. This section describes and discusses several cognitive tests in terms of their value when integrated with results from the Wechsler Scales.

4.08.3.1 K-ABC Integration with Wechsler Scales

The K-ABC measures some of the same abilities as the WISC-III and WPPSI-R, but also measures ability in ways that are different from the Wechsler scales, thereby contributing unique information about a child's cognitive functioning. The K-ABC Simultaneous Processing Scale is believed by some researchers to involve the same cognitive requirements as Wechsler's Performance Scale (Das, Naglieri, & Kirby, 1994) and by others to be a measure of Visual Processing (Gv) (Horn, 1991). However, two Simultaneous subtests (Matrix Analogies and Photo Series) involve more reasoning (and load on two of Woodcock's (1990) factors: Fluid Reasoning (Gf) and Visual Processing (Gv)) than Wechsler's Performance subtests. The Sequential Processing Scale of the K-ABC is an excellent addition to the Wechsler because it measures sequential processing more efficiently than any Wechsler subtest. That is, because the only Wechsler test that can be viewed as measuring sequential processing is Digit Span Forward (Das et al., 1994) but the Digit Span subtest score includes Backwards span which involves more than sequential processing (Schofield & Ashman, 1986), there is no efficient measure of sequential processing on the Wechsler. In addition, the K-ABC Achievement Scales are highly related to the Wechsler Verbal IQ and crystallized abilities (Kaufman & Kaufman, 1983; Naglieri & Jensen, 1987).

Given the above characteristics of the K-ABC, examiners may note that the entire Simultaneous Processing Scale serves as a good measure for children with motor and/or speed problems who earn low Wechsler Performance IQs, because it minimizes both of these variables. From the Horn view, the K-ABC offers good supplemental subtests to measure Gc, including Faces and Places and Riddles, in the Achievement Scale. These tasks measure range of general knowledge by identifying visual stimuli (Faces and Places), and require the child to use verbal reasoning to demonstrate word knowledge (Riddles). This is unlike many tests, such as WISC-III Vocabulary, and similar Binet IV and DAS tasks, which measure word knowledge by requiring a child to retrieve word definitions from long-term storage. Like the WJ-R crystallized subtests, Riddles requires a one-word response; it is, therefore, a good Wechsler supplement to help discern whether a low V-IQ is due more to conceptual problems or expressive difficulties. The K-ABC offers some alternative modalities of receiving input and expression of response, to supplement Wechsler subtests that mainly use the auditory-vocal and visual-motor channels of communication. The K-ABC offers three subtests which call for use of the visual and vocal modalities (Magic Window, Faces and Places, and Gestalt Closure), and has one subtest that uses the auditory-motor channel (Word Order). In addition, the K-ABC taps the visual and semantic-motor channels with Reading Understanding, which requires a child to read a stimulus and do what it says (e.g., "Stand up").

4.08.3.2 Integration of KAIT with Wechsler Scales

The KAIT was developed from the Horn–Cattell theory and yields both a Crystallized IQ and Fluid IQ. The three subtests comprising the KAIT Fluid Scale are very good supplements to the Wechsler scales. As noted previously, there is controversy regarding how well the Wechsler scales measure fluid abilities; thus, it is wise to administer supplemental tests to tap an individual's fluid reasoning ability and learning ability. Assessment of planning ability, formal operational thought, and learning ability may
be obtained through KAIT Fluid Subtests: Mystery Codes and Logical Steps. Problem-solving through verbal reasoning and verbal comprehension is required in Logical Steps, and Rebus Learning demands vocal responding; therefore, the KAIT Fluid Scale measures an ability that is quite different from Wechsler's P-IQ. If questions about an individual's planning speed arise from the primary battery administered, examiners may administer Mystery Codes to further assess planning speed.

To supplement Wechsler's Verbal Scale, the KAIT Crystallized subtests may be used. For assessing an individual's base of general factual knowledge, Famous Faces may be administered to supplement WISC-III or WAIS-R Information. Famous Faces uses pictorial stimuli integrated with verbal clues about famous people, whereas Information is a purely auditory-vocal task. Formal operational thought within the Crystallized domain can be assessed through Double Meanings. Double Meanings challenges examinees to unify apparently disparate semantic stimuli. KAIT subtests, Double Meanings and Definitions, can also provide follow-up to questionable performance on tasks of word knowledge and verbal concept formation, such as Wechsler's Vocabulary or Similarities. Auditory Comprehension can be used for questions regarding an individual's memory and comprehension ability. This subtest mimics a real-life situation, in requiring an individual to listen to a mock news broadcast and answer questions about it. The two delayed recall (TSR) KAIT subtests are also very good WISC-III supplements when hypotheses are raised regarding an individual's memory.

4.08.3.3 Integration of Binet IV with Wechsler Scales

As noted in the earlier discussion of the Binet IV, there is controversial and weak factor-analytic support for the four Binet IV area scores (Verbal, Abstract-Visual, Quantitative, and Short-term Memory). The relationship between Wechsler's Verbal and Performance IQs and the Binet IV Area scores is not clear-cut. In a correlational analysis with the WISC-R and Binet IV (Thorndike et al., 1986), the WISC-R and Binet IV Verbal scales were found to relate substantially to each other. However, Kaufman (1994b) notes that the Absurdities subtest probably lowered the relationship with the Verbal IQ and increased the correlation with the Performance IQ because it uses visual stimuli. In Woodcock's (1990) factor analysis, the Binet IV Quantitative subtests loaded on a separate Quantitative Factor (Gq), which also included Wechsler Arithmetic and WJ-R math achievement subtests. The Binet IV Abstract-Visual subtests, except Matrices, were on the Gv factor, along with most Wechsler Perceptual Organization subtests. Matrices, however, had a substantial loading on the Gf factor; it is, therefore, an excellent addition to Wechsler's Performance Scale.

The Binet IV can be integrated with Wechsler results, and can be especially helpful in assessing young children and mentally retarded individuals because of the extension of its norms down to age two. Response time is relatively unimportant on the Binet IV; therefore, it provides several subtests to further evaluate hypotheses regarding a low score on the WISC-III P-IQ or PO Index. If poor fluid intelligence is suspected, Pattern Analysis, Paper Folding and Cutting, Matrices, and Number Series can be administered. To further assess the comprehension knowledge ability measured by the WISC-III, without requiring verbal comprehension, Absurdities is especially good to administer because the stimulus is visual and minimally verbal. For assessing whether a child's fluid reasoning ability generalizes to number manipulation activities, the two Binet IV Quantitative subtests are useful. The tasks not included on the Wechsler scales, such as Matrices, Equation Building, Number Series, and Verbal Relations, can be used to further explore the reasoning abilities of an individual.

4.08.3.4 Integration of WJ-R with Wechsler Scales

Wechsler's Verbal IQ primarily can be viewed as a measure of crystallized intelligence and short-term memory; Performance IQ as a blend of fluid reasoning, visual processing, and processing speed (Kaufman, 1994b). However, some researchers view Wechsler's Perceptual Organization to be a measure of visual processing (McGrew & Flanagan, 1996; Woodcock, 1990). Long-term retrieval is not specifically measured by the Wechsler scales nor is auditory processing. And, if Woodcock and others are correct, then fluid reasoning also is not measured very well by the Wechsler scales. Therefore, WJ-R Cognitive subtests provide excellent tasks for extending assessment from Wechsler subtests using the WJ-R tests which were developed to reflect Horn's pure factors.

The WJ-R provides subtests which are controlled learning tasks, allowing the assessment of a person's learning ability. Conventional intelligence tests, including Wechsler's, do not typically measure this ability. The controlled
learning subtests include the following: Memory for Names and Visual–Auditory Learning (both Long-term Retrieval tasks), and Analysis–Synthesis and Concept Formation (both Fluid Reasoning tasks). Whereas the Wechsler Performance subtests emphasize visual-motor coordination and speed of response, Analysis–Synthesis and Concept Formation involve no motor coordination at all, and speed of response is not a major variable in determining a person's performance level.

The WJ-R provides several subtests from which to choose, so when questions arise regarding Wechsler's Perceptual Organization construct (including visual processing and fluid abilities, as noted above), the WJ-R is an excellent tool to investigate these hypotheses. The different aspects of the information-processing model are measured by four WJ-R factors including Gv (input), Gf (integration), Glr (storage), and Gs (output). A high or low score on Wechsler's Performance scale should be explored further to determine what aspects of an individual's information-processing may have affected this asset or deficit. If an individual is suspected of having a deficit or strength in their nonverbal visual-spatial ability, requiring further testing for clarification, then WJ-R Spatial Relations is a useful tool to make that determination. The other Fluid Reasoning subtests, previously mentioned, have a heavy verbal component and do not assess visual-spatial skills, although they do use figural material. (One precaution to note is that cognitive tests in the WJ-R battery are heavily entrenched in the tradition of measuring intelligence through predominantly verbal means (Kaufman, 1990).)

For assessing strengths or weaknesses within the auditory-vocal channel, the following WJ-R factors may be used: Ga (input), Gc (integration), and Gsm (storage). These factors can be helpful in clarifying questions raised in the Verbal scale of the Wechsler test. On the WJ-R, two Gc tasks require one-word responses, which makes them useful when it is unclear whether a low Verbal score reflects poor concepts or poor expression. Auditory-perceptual tasks on the Ga scale assess whether a child can perceive words in isolation (through filling in the gaps or by blending sounds). However, for assessing a processing deficit of longer auditory input, additional subtests may be needed (such as the Cognitive Assessment System subtests Verbal-Spatial Relations, Sentence Repetition, and Sentence Questions).

Wechsler scales do not assess long-term memory over the period of a few minutes, although they do measure short-term memory with Digit Span and remote memory with Picture Completion and Information. The Long-term Retrieval subtests of the WJ-R provide a good assessment of the long-term memory function; therefore, these tasks complement the Wechsler subtests for supplementary analysis. In addition, the WJ-R tests provide measures of Auditory Processing and Visual Processing, which have strong perceptual components. These perceptual processes are not typically evaluated in most tests of intelligence, but need to be assessed in cases with possible neuropsychological difficulties.

4.08.3.5 DTLA-3 Integration with Wechsler Scales

The DTLA-3 has several theoretical underpinnings, including models such as fluid and crystallized intelligence, simultaneous and successive processes, and verbal and performance abilities. DTLA-3 subtests may be used to augment the Wechsler scales in several instances. To further assess perceptual organization ability, fluid ability, and simultaneous processing, Design Reproduction or Symbolic Relations may be administered. For hypotheses regarding similar fluid abilities, but tapping sequential processing, examiners may administer Design Sequences. DTLA-3 Design Reproduction is also a good supplement to Wechsler Performance subtests if there is a question regarding a person's ability being hampered by response speed tests. This test does require visual-motor coordination but places minimal demands on speeded performance. Like the K-ABC, the DTLA-3 offers some alternative modalities of receiving input and expression of response to supplement Wechsler subtests that mainly use the auditory-vocal and visual-motor channels of communication. The DTLA-3 offers two subtests which call for use of the visual and vocal modalities (Story Construction and Picture Fragments), and has one subtest that uses the auditory-motor channel (Reversed Letters).

4.08.3.6 Integration of DAS with Wechsler Scales

The six Core subtests of the DAS create three separate scales for children, namely: Verbal, Spatial, and Nonverbal Reasoning. The WISC-III Verbal Comprehension subtests (specifically Vocabulary and Similarities) have been noted to be quite similar to the DAS Verbal Scale (Kaufman, 1994). The DAS Verbal, Spatial, and Nonverbal Reasoning scales have been shown to correspond to the Woodcock–Johnson Revised factors of Gc, Gv and Gf, respectively (McGhee, 1993).
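
The factor correspondences described in this section can be collected into a small lookup structure. The Python sketch below is only an organizational aid under stated assumptions: the dictionary layout, the scale labels used as keys, and the helper function names are hypothetical, while the factor assignments themselves are those given in the text (Gv, Gf, Glr, and Gs for following up the Performance scale; Ga, Gc, and Gsm for the auditory-vocal channel and the Verbal scale; and the DAS scale correspondences attributed to McGhee, 1993).

```python
# WJ-R factors named in the text for following up each Wechsler scale,
# keyed by information-processing stage. Layout and keys are illustrative.
WJR_FOLLOW_UP = {
    "Performance": {"input": "Gv", "integration": "Gf",
                    "storage": "Glr", "output": "Gs"},
    "Verbal": {"input": "Ga", "integration": "Gc", "storage": "Gsm"},
}

# DAS scales and the WJ-R factors they are reported to correspond to
# (McGhee, 1993, as cited in the text).
DAS_TO_WJR = {"Verbal": "Gc", "Spatial": "Gv", "Nonverbal Reasoning": "Gf"}


def follow_up_factor(wechsler_scale, stage):
    """Return the WJ-R factor the text names for this scale and stage.

    Returns None when the text names no factor for the combination
    (for example, no output factor is listed for the Verbal scale).
    """
    return WJR_FOLLOW_UP.get(wechsler_scale, {}).get(stage)


def das_scales_for(factor):
    """Return the DAS scales that correspond to a given WJ-R factor."""
    return [scale for scale, f in DAS_TO_WJR.items() if f == factor]


print(follow_up_factor("Performance", "storage"))
print(follow_up_factor("Verbal", "output"))
print(das_scales_for("Gf"))
```

A table like this does not replace clinical judgment; it simply makes explicit which supplemental factor the text associates with each stage when a Wechsler scale score raises a question.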

The two subtests comprising the DAS Nonverbal Reasoning Scale provide an excellent addition to the WISC-III because they are quite different from WISC-III subtests. The Nonverbal Reasoning subtests (Matrices, and Sequential and Quantitative Reasoning) measure nonverbal reasoning without time limits but they do require visual-motor coordination, and minimize visualization. Thus, they can provide good measures of an individual's pure fluid ability. DAS subtests requiring visual-motor coordination, but placing minimal demands on speeded performance, include Recall of Designs and Pattern Construction (when the latter test is administered via special procedures). Thus, these subtests can be useful in following up hypotheses generated from Wechsler's Performance subtests, which reward quick performance.

4.08.3.7 Integration of CAS with Wechsler Scales

Like the other tests included in this chapter, the CAS has some overlap with the WISC-III, but because its conceptualization is based on the PASS theory, unique information about a child's cognitive functioning can be obtained. The CAS Planning and Attention Scales require processes that cannot be effectively assessed by the Wechsler Scales (Das et al., 1994). In order to measure planning adequately, tests that evaluate the child's ability to decide how to solve problems, and determine their effectiveness, are required. This means that the child must be given the opportunity to complete tasks in planful ways, unencumbered by rules imposed by the test. Additionally, items that are influenced by the child's plan rather than other factors (e.g., spatial or verbal skills) are needed. Tests of this type are not found on the WISC-III. Attention tests should demand the focus of cognitive activity and selective attending to particular information while avoiding distraction. Carefully constructed measures of attentional processes are not included on the Wechsler, yet these, as well as planning processes, are especially important when evaluating children, particularly those with attention deficits and learning disabilities. The measures of Planning and Attention tap important cognitive functions that extend beyond the WISC-III and therefore offer additional information for diagnosis as well as intervention (Das et al., 1994).

The CAS, like the K-ABC, provides a measure of simultaneous processing that is similar to the demands of Wechsler's Performance Scale (Das et al., 1994) but there are important distinctions. The CAS offers a verbal test of simultaneous processing (Verbal-Spatial Relations), one that involves memory (Figure Memory), and one with complex demands (Nonverbal Matrices). The addition of the Verbal-Spatial Relations subtest is important because it integrates both nonverbal and verbal stimuli for the comprehension of logical grammatical sentences. Similarly, the Successive processing Scale of the CAS provides tests that demand immediate recall of information (Word Series), measures that demand comprehension of syntax (Sentence Repetition and Sentence Questions), and a measure where the involvement of immediate memory is markedly reduced (Successive Speech Rate).

The CAS offers a view of ability that reduces the influence of language and achievement, and therefore provides information beyond that from the WISC-III or the WPPSI-R to evaluate the performance of children who are bilingual or whose educational history is problematic. Because the CAS does not have achievement or language based tests like the Wechsler (e.g., Arithmetic or Vocabulary), the reduction in the involvement of acquired knowledge provides an opportunity to evaluate children whose poor school history or language difference may have lowered their Wechsler scores. In such a situation the CAS scores can assist the psychologist in determining the extent to which low Wechsler scores may reflect language/achievement issues rather than low intellectual ability.

4.08.4 FUTURE DIRECTIONS

Intellectual assessment has changed a great deal in the twentieth century. It has moved from assessments based on language and speech patterns to sensory discrimination, with most early assessments being for the mentally deficient. Gradually, the assessment instruments developed into the precursors of the standardized instruments used in the 1990s, which measure more complex cognitive tasks for all levels of cognitive ability. The most commonly used tests, the Wechsler scales, were not developed out of theory, but were guided rather by clinical experience. In the progression of test development, the relative alternates to the Wechsler have been more theory-based. The direction in test development in the 1990s seems to be continuing to lead to theory being at the base of intelligence tests, instead of being just purely clinically driven. Psychometric theories and neurological theories are growing as the basis for new instruments. However, despite this proliferation of new theory-driven tests, there is a definite conservatism that holds on to the past,
reluctant to let Wechsler tests be truly rivaled. Part of this hold on the past is research-based and part is clinically-based. Because the Wechsler tests have been reigning supreme for so long, there has been a mountain of research studies using the Wechsler scales. Thus, clinicians have a good empirical basis to form their understanding of what a specific Wechsler profile may be indicating. Clinically, psychologists are also quite comfortable and familiar with the Wechsler scales. A good clinician who has done many assessments may be familiar enough with every nook and cranny of the WAIS-R to barely need the manual to administer it. Thus, the field so far has changed relatively slowly. Computer based technology is likely going to ultimately shape the field of assessment by 2020. Computer scoring programs and computer assisted reports are already in use, and the future is likely to include a much greater progression of technologically advanced instruments for assessing intelligence.

4.08.5 SUMMARY

This chapter first introduces the assessment of intellectual ability through a brief overview of its development throughout history. Some important faces in the history of assessment include Jean Esquirol, Edouard Seguin, Sir Francis Galton, James McKeen Cattell, Alfred Binet, Lewis Terman, and David Wechsler. The early pioneers in assessment focused mainly on tests for the mentally deficient, but more recently in history tests were developed for assessing all levels of intellectual functioning. The progression of test development to the standardized instruments we know today is reviewed in the beginning of this chapter.

The debate over intelligence testing is also discussed in this chapter. Three groups of critics are presented. One controversy is raised by those opposed to subtest interpretation advocated by Wechsler; another by those who insist that all the Wechsler scales measure is g, rendering the different scales meaningless; and a final controversy is raised by a group who complain that the instruments do not enhance remedial interventions. Kaufman's (1979, 1994b) intelligent testing approach is presented in response to the criticism. The aim of the basic principles of intelligent testing presented in this chapter is to encourage clinicians to approach the task of profile interpretation using their knowledge of research, theory, and clinical skills, rather than being overly dependent on specific scores.

With this important philosophy of intelligent testing in mind, 10 measures of child, adolescent, and adult intelligence are discussed. The Wechsler scales are generally used by examiners as the primary instrument in an assessment battery. However, there are multiple other excellent instruments available for assessing cognitive ability. The following instruments are discussed: WPPSI-R, WISC-III, WAIS-R, K-ABC, KAIT, Binet-IV, WJ-R Tests of Cognitive Ability, DTLA-3, DAS, and CAS. For each, the theory or theories underlying the instrument are presented, followed by the standardization and properties of the scale, including research using the scale, and each is completed with an overview of the instrument.

One of the main principles of the intelligent testing philosophy discussed is that hypotheses generated from the profile of the main assessment instrument should be supported with data from multiple sources. Accordingly, this chapter discusses how to supplement the Wechsler scales with additional cognitive tasks to support or clarify hypotheses raised from the initial cognitive battery. It is important for examiners to be knowledgeable and insightful in choosing supplemental measures to uncover exactly what an individual's cognitive strengths and weaknesses are, in order to form the basis for good recommendations. Specific examples are given for how examiners may supplement the Wechsler scales with each cognitive instrument presented. For example, hypotheses raised in the Wechsler profile regarding fluid reasoning may be further assessed by WJ-R subtests or KAIT subtests. If questions arise regarding the effect of response speed, the Binet IV or K-ABC Simultaneous subtests may be useful supplements. The DAS provides good supplemental information about nonverbal reasoning ability. Alternative modalities of receiving input and expression of response may be assessed by the DTLA-3 or K-ABC subtests. The K-ABC Achievement subtests may also provide additional information about an individual's verbal ability or crystallized abilities. Unique information about a child's planning ability may be found by supplementing a battery with the CAS. The multiple pieces of evidence provided by the supplemental tests can be carefully integrated to confirm or deny hypotheses raised in an individual's cognitive profile. An illustrative case report is presented at the conclusion of this chapter, which provides an example of how many different sources of evidence are combined to provide a clear description of a 13-year-old female's cognitive and academic functioning.

The future direction of intellectual assessment appears to be in theory-driven instruments. The psychological community has held tightly onto the clinically-based Wechsler scales because of
their large research base, clinical familiarity, and just tradition. Thus, psychometric and neurological theories are becoming an important base for the newer instruments of assessment, although their adoption by the majority of clinicians has been slow. Technological advances are also imminently going to affect the field of assessment further. The use of computer science and neurological measurement will likely begin to change the face of cognitive assessment within the next few decades.

4.08.6 ILLUSTRATIVE CASE REPORT

The following case report is of Laura S., a 13-year-old seventh grader, experiencing difficulty on standardized tests in the areas of vocabulary, reading comprehension, and writing mechanics. This report illustrates the methods and procedures for test integration and interpretation described earlier in this chapter. Table 12 provides the specific scores earned by Laura on each instrument administered.

4.08.6.1 Referral and Background Information

Laura was brought in for an evaluation by her parents, Linda and Rob (Mr. and Mrs. S.), who were referred for an assessment by her current school, due to concern about the inconsistency between Laura's high grades at school and her lower standardized test scores in some areas. From Laura's scores on the Educational Records Bureau's (ERB) Comprehensive Testing Program (CTPII), the main area of concern to Laura's parents was in her Verbal Ability, specifically, Vocabulary, Reading Comprehension, and Writing Mechanics. Mr. and Mrs. S. would like to gain a better understanding of Laura's difficulty with her mathematics courses, reading comprehension, and memory retention. Laura's parents reported that she does not always seem to understand written instructions and asks a lot of questions. They would specifically like recommendations to help Laura improve her learning ability, enhance her memory, and develop a greater aptitude for understanding directions.

Laura is a 13.5-year-old adolescent girl who has lived at home with her mother and step-father since age three. When Laura was 2.5 years old, her mother and biological father separated, and, other than for one month of her life, she has lived with her mother. Laura also has a 20-year-old step-sister, Kelly, and a 23-year-old step-brother, Rick, who do not live with her at home. Mr. and Mrs. S. both work full-time as professionals outside the home. Laura's biological father lives out of state and Laura visits him once or twice a year. Laura has also had other adults serve as caretakers for her. Her mother indicated that from ages 1–3 two other people helped care for her, and that from ages 4–13 five housekeepers also provided care for her.

Mrs. S. reported that she had a normal, full-term pregnancy with Laura. There were no problems during the birth, and Laura was born weighing six pounds, nine ounces after a short labor of only 15 minutes. Laura's medical history is relatively unremarkable, with her parents noting only that she experienced a bilateral hernia at nine months and had several earaches and colds as a younger child. Mr. and Mrs. S. stated that their daughter has never been hospitalized and has had no major injuries or diseases. All of Laura's developmental milestones were "on time" or "early" according to her mother.

A medical question of Laura's parents had been her hearing ability, due to her noted difficulty sometimes distinguishing particular sounds in school and occasional single-word substitutions. After discussing this at the current evaluation's intake interview, Mr. and Mrs. S. took Laura to a physician at a local university's Medicine Ear Institute to rule out any potential hearing loss. According to a letter sent by the physician, he performed a physical exam and audiogram for Laura. He stated that her audiogram "revealed normal thresholds bilaterally with excellent speech discrimination," and "she appears to have normal hearing."

Laura's educational history began when she entered preschool at age three. According to her parents, she had no difficulties with beginning preschool or transitioning to her next school. All of Laura's report cards from first through seventh grades indicate consistent, excellent achievement. Most of her teachers commented on her conscientious and serious approach to learning, and expressed their great pleasure in having her as a student.

A classroom observation of Laura was performed for this evaluation. Laura was observed in her Honors math class. Laura was very friendly, chatting with her friends and brightly greeting the teacher right before class. However, as soon as the class began, she got right to work. When the instructor asked questions of the class, Laura raised her hand to every question. She was focused on the work all throughout the class, and seemed motivated to do well in math. She asked the teacher for assistance several times during the observation. In an interview, the teacher stated that Laura frequently asks for extra help, but it is her teacher's belief that this is mainly because "she does not trust her gut." The teacher said that Laura always begins the class with a happy, cheerful mood, but she tends to "stress out" when problems become difficult.

Table 12 Laura: Psychometric summary.

WISC-III

Scale                IQ       Percentile rank
Verbal scale         114±5    82
Performance scale    95±5     37
Full scale           106±4    66

Factor                          Index    Percentile rank
Verbal comprehension            120±5    91
Perceptual organization         94±6     34
Freedom from distractibility    101±8    53
Processing speed                111±7    77

Subtest          Scaled score    Percentile rank
Information      11              63
Similarities     14              91
Arithmetic       8-W             25
Vocabulary       12              75
Comprehension    17-S            99
Digit span       12              75

Subtest                Scaled score    Percentile rank
Picture Completion     8               25
Coding                 14              91
Picture Arrangement    8               25
Block Design           11              63
Object Assembly        9               37
Symbol Search          10              50

KAIT

               IQ       Percentile rank (age)
Fluid scale    120±5    91

Subtest           Standard score    Percentile rank (age)
Rebus Learning    16                98
Logical Steps     12                75
Mystery Codes     13                84

WJ-R                              Standard score    Percentile rank
Broad reading                     118               89
Basic reading skills              125               95
Reading comprehension             113               81
Letter word identification        125               95
Passage comprehension             106               65
Word attack                       117               87
Reading vocabulary                118               88
Broad mathematics                 111               77
Basic math skills                 115               85
Calculation                       117               86
Applied problems                  102               56
Quantitative concepts             108               70
Broad written language            105               64
Basic writing skills              97                41
Written expression                130               98
Dictation                         95                37
Writing samples                   123               94
Proofing                          99                48
Writing fluency                   134               99
Punctuation and capitalization    97                41
Spelling                          100               50
Usage                             98                44
Broad knowledge                   99                48
Science                           97                42
Social studies                    96                40
Humanities                        110               74

WJ-R (intra-achievement    Actual            Predicted         Standard deviation    Percentile
discrepancies)             standard score    standard score    difference            rank
Broad reading              118               105               +1.48                 93
Broad mathematics          111               106               +0.52                 70
Broad written language     105               109               -0.45                 32
Broad knowledge            99                113               -1.20                 12

In discussing family history related to Laura's difficulties, Mrs. S. indicated that she feels she also has "a poor memory and retrieving skills." Mr. and Mrs. S. noted that Laura has several strengths, including her strong intuitive abilities, her confidence, and her persistence. Laura is very popular with her peers and is viewed as a leader by many of her teachers.

In an interview, Laura stated that she does not enjoy reading but likes using her creativity in writing. She said that she takes school more seriously than most of her peers, and has set very high standards for herself. She explained that she is harder on herself than her parents are, especially when it comes to academics. In her free time Laura works diligently on her homework and enjoys playing tennis and soccer, and socializing with her friends.

4.08.6.2 Appearance and Behavioral Characteristics

Laura is a mature, pretty, 13-year-old seventh grader. She wore her thick brown hair parted stylishly down the middle. For each of the evaluation sessions she dressed casually and neatly in jeans and a sweatshirt. She talked comfortably when asked questions by the examiner and spoke in a soft voice with an air of politeness and good manners. She tended to respond with short phrases rather than in complete sentences when answering questions posed by the examiner. However, this did not detract from her ability to articulate her thoughts clearly.

She seemed eager to please the examiner, often asking for clarification about what was the right thing to do during a certain subtest. For example, during a task which required her to look at a picture and tell what important part was missing, she asked, "Can I tell you what's wrong with it?" Her frequent questions to the examiner indicated not only her anxiousness to please, but also her discomfort with ambiguity in a situation. She appeared much more relaxed when the situation was structured and she knew clearly everything that was expected of her.

Laura demonstrated a strong ability to concentrate and focus. She had great stamina throughout the mentally challenging evaluation. She was persistent in always refusing breaks offered to her and worked straight through during each session, displaying extreme self-control. She was cooperative and friendly, often helping the examiner put away stimulus materials and always following directions. Laura welcomed encouragement from the examiner warmly, with a smile. She gradually became less cautious in her casual conversation with the examiner as the testing progressed. She began to share bits and pieces of her life and showed a well-rounded self.

Laura always tried her best. Even after having attempted problems that were difficult for her, she did not lose her motivation to keep trying. In solving problems she worked quickly, but was reflective. She would continue to check her work, even after she was done, always being careful in her responses. Laura expressed anxiety about having to solve mathematical problems mentally, saying, "I can't do things in my head, without pencils and things." She was more tentative in answering such questions. On more difficult nonverbal tasks, Laura tended to analyze the situation first and then proceed with the task. This was further evidence of her trying carefully to do her best and not make mistakes. On the basis of her behavior during the evaluation these results should be considered a valid representation of her ability.

4.08.6.3 Tests Administered

(i) Wechsler Intelligence Scale for Children-3rd edition (WISC-III)
(ii) Woodcock–Johnson-Revised (WJ-R): Tests of Achievement
(iii) Kaufman Adolescent and Adult Intelligence Test (KAIT): Fluid Subtests
(iv) Rotter Incomplete Sentences.
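
The percentile ranks reported beside Laura's WISC-III scores in Table 12 follow closely from the normal curve. The sketch below is an illustration only, not the publisher's norm tables: it assumes that IQs and Indexes are scaled to a mean of 100 and a standard deviation of 15, and subtest scaled scores to a mean of 10 and a standard deviation of 3. The function name `percentile_rank` is hypothetical.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Percentage of a normal reference population scoring at or below `score`."""
    return round(100 * NormalDist(mu=mean, sigma=sd).cdf(score))


# WISC-III IQs and Indexes from Table 12 (assumed mean 100, SD 15)
iq_scores = {
    "Verbal IQ": 114,
    "Performance IQ": 95,
    "Full Scale IQ": 106,
    "Verbal Comprehension": 120,
    "Perceptual Organization": 94,
}
iq_percentiles = {name: percentile_rank(s, 100, 15) for name, s in iq_scores.items()}

# WISC-III subtest scaled scores from Table 12 (assumed mean 10, SD 3)
subtest_scores = {"Arithmetic": 8, "Comprehension": 17, "Coding": 14}
subtest_percentiles = {name: percentile_rank(s, 10, 3) for name, s in subtest_scores.items()}

print(iq_percentiles)
print(subtest_percentiles)
```

Under these assumptions the computed values agree with the WISC-III percentile ranks in Table 12. Percentile ranks for the WJ-R are read from its own norms and need not match a normal-curve approximation exactly.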

4.08.6.4 Test Results and Interpretation

Laura was administered a series of cognitive tests to assess her information processing abilities. According to the WISC-III, Laura is functioning currently within the Average to High Average range of intelligence. She obtained a Verbal IQ score of 114±5 (82nd percentile), Performance IQ score of 95±5 (37th percentile), and Full Scale IQ score of 106±4 (66th percentile). In addition, she obtained a Verbal Comprehension Index of 120±5, which was significantly higher than her Freedom from Distractibility Index of 101±8. However, this difference within her verbal scale was due mainly to her difficulty computing arithmetic problems mentally. Laura's Processing Speed Index of 111±7 was significantly higher than her Perceptual Organization Index of 94±6, indicating that she performed better on tests of visual processing speed than on tests of nonverbal reasoning.

It is important to note that Laura's Verbal subtest scores exhibit a significant amount of scatter, suggesting that her Verbal Comprehension and Perceptual Organization Indices provide a more meaningful picture of her abilities than the overall Full Scale IQ, Verbal IQ, or Performance IQ. The variance in her Verbal Scale indicates that some of her abilities are better developed than others. The difference between her verbal and nonverbal abilities, as reflected by her Factor Indices, is unusually large and statistically significant. Because there is a 26-point difference in favor of her Verbal Comprehension Index over her Perceptual Organization Index (occurring in less than 5% of normal children), her Full Scale IQ should not be used as an indication of her overall ability. A fuller and clearer picture of Laura's abilities can be found by looking at her performance in individual areas rather than considering the statistical average of these various abilities.

In the verbal area, Laura demonstrated a significant weakness in Arithmetic, earning a score in the 25th percentile. The WISC-III Arithmetic subtest does not allow use of paper and pencil to do computation, so Laura was required to manipulate the numbers mentally to solve auditorily and visually presented problems. She expressed several times during this test that she needed to see the numbers and have them down concretely in front of her to figure the problems out. Her weakness on WISC-III Arithmetic was in contrast to her higher scores in the mathematics area on Woodcock–Johnson-Revised Tests of Achievement (WJ-R). All of the mathematics on the WJ-R allowed the use of paper and pencil in solving the problems, which Laura was more comfortable with. Her performance on problems of calculation, such as addition, subtraction, multiplication, and division with multiple digit numbers, decimals, and fractions, was significantly better than her performance on applied problems requiring her to use mathematics to solve problems involving scenarios with money, distance, and weight. On the Woodcock–Johnson-Revised Mathematics Subtests, she scored at the 86th percentile on calculation and at the 56th percentile on applied problems. Thus, her weakness in Arithmetic on the WISC-III is not due to poor computation ability, but rather due to the fact that she needs the concrete visual stimulus of written numbers in order to utilize the mathematical knowledge that she does have.

Laura's variability in her performance on the Verbal Scale of the WISC-III was accentuated by her extremely high score (99th percentile) on a test of social judgment, verbal reasoning, and practical knowledge. Her strong verbal ability was apparent during this subtest, as well as on a subtest requiring her to use abstract reasoning to describe how two things are similar. This strength was also paralleled by her excellent performance on WJ-R tests of written expression, on which she earned an overall score at the 98th percentile. She can come up easily with vocabulary to express her ideas and is able to formulate alternative ways to get her point across if it is unclear. However, the mechanical details of written expression are more difficult for Laura. For example, she earned a lower score on a test of dictation (37th percentile), which examined her spelling, punctuation, and word usage. Laura's performance in punctuation, spelling, and word usage was at a lower level than her overall Written Expression ability (41st, 50th, and 44th percentiles, respectively, in contrast to the 98th percentile). Thus, she is able to get her verbal ideas across, but is lacking skill in the grammatical rules and details of written expression.

Laura's verbal reasoning ability also appeared stronger than her general factual knowledge. She scored at the 63rd percentile on a WISC-III task requiring her to answer questions about information acquired from formal schooling. However, her performance on WJ-R tests of Broad Knowledge was lower than expected, given her high level of academic achievement at school, as well as her performance on other WJ-R tests of achievement. On subtests covering areas such as science, social studies, and humanities, Laura scored in the Average level at the 48th percentile. A person with Laura's total achievement performance would have been expected to

earn a slightly higher score on these tests of Broad Knowledge. Her actual standard score was 1.2 standard deviations lower than what was predicted in this area, indicating that she is not achieving at a level consistent with what would be expected given her level of achievement. This is also reflected in her lower ERB group standardized test scores that are discrepant from her higher school grades. The ERB scores reflect strictly facts derived from multiple choice exams. However, her interaction with teachers, participation and performance at school allow the teachers to understand Laura as a more complete person, which positively influences her grades.

Laura's performance on WJ-R Broad Reading (89th percentile) is also commensurate with the strong verbal reasoning skills she evidenced. However, her score on a task measuring comprehension of a written passage was somewhat lower than expected (65th percentile) given her strong verbal abilities. Nonetheless, it is in the average range and not low enough to be of concern. Overall, Laura's strong performance on Broad Reading ability was 1.48 standard deviations above what would be predicted for an individual with her total achievement performance, so she is demonstrating strong ability with her reading skills. Only 7% of students Laura's age, who had the same expected standard score as she, scored as high or higher than Laura on Broad Reading Subtests of the WJ-R.

Laura's significantly higher score on the Verbal Comprehension Index than the Perceptual Organization Index of the WISC-III indicates that her verbal abilities are better developed than her nonverbal and perceptual abilities. However, even on the nonverbal performance tests, all of her scores were at or above the Average level (25th to 91st percentiles). For example, Laura performed well on a task requiring her to use short-term memory in copying a series of different symbols from a visually presented key. She scored at the 91st percentile on this task and scored at the 63rd percentile on another nonverbal task requiring her to reproduce a model using blocks.

Her cognitive skills were further assessed by the Fluid Scale of the Kaufman Adolescent and Adult Intelligence Test (KAIT). The KAIT Fluid subtests measure one's ability to solve novel problems using reasoning, memory, paired-associate learning, verbal comprehension, and perceptual organization. Laura scored well above average (standard score 120±5) on these tasks involving her ability to solve novel problems. She scored at the 98th percentile on a task that essentially required her to learn a new language through paired-associate learning. On this task she used her good verbal concentration, expression ability, and memory to succeed. She also performed quite well on tasks that required her to use abstract reasoning with novel stimuli and planning ability.

4.08.6.5 Summary and Diagnostic Impressions

Laura is a mature, pretty 13-year-old girl who was brought in for a psychoeducational evaluation by her parents due to their concern with inconsistency between her standardized test scores and high grades at La Jolla Country Day School. Her parents' main areas of concern involve her abilities in vocabulary, reading comprehension, writing mechanics, and mathematics. Mr. and Mrs. S. wanted a better understanding of these discrepancies in her performance, as well as her difficulty with memory retention and understanding written directions.

Laura demonstrated strong motivation to please the examiner, persistence, and stamina during her evaluation. She was reflective in her problem solving and careful in her responses, trying hard to perform to the best of her ability. She earned scores in the Average to High Average range of intelligence on the WISC-III. Her factor indices and specific strengths and weaknesses gave the most meaningful picture of Laura's abilities, due to the scatter in her verbal subscales. She performed significantly better on tests of verbal reasoning and word knowledge than on tests of nonverbal ability and visual-perceptual skills. She demonstrated a weakness on a task requiring her to mentally solve auditorily and visually presented arithmetic problems. However, this was due to her difficulty performing mathematical calculations in her head without the concrete visual help of pencil and paper, as evidenced by her higher scores on WJ-R tests of Mathematics that allow use of pencil and paper for problem-solving.

Her strengths were in the area of verbal reasoning. This was congruent with her strong ability to express herself in a written format on the WJ-R and also with her overall reading ability demonstrated on the WJ-R. Although her ability to express herself verbally and in writing was good, her skills in the details of writing, such as spelling, punctuation, and usage, were not as strong. In addition, her performance on a test of passage comprehension was not as high as expected given her overall verbal abilities. In the area of general knowledge, including science, social studies, and humanities, Laura did not perform as well as one would predict given her other achievement scores.

On an additional cognitive test measuring Laura's ability to solve novel problems, she demonstrated well above average abilities. On the KAIT Fluid subtests, Laura's performance indicated that she has strong ability to learn new material through paired-associate learning. She also evidenced above average ability to use abstract reasoning and planning ability with novel stimuli.

Laura has learned very well to compensate for her areas of weakness, which is evident in her high grades at school. Her ability to ask questions when uncertain, and to create a structured environment for herself so she is most comfortable, are examples of such compensatory strategies. Additionally, this strength is reflected in her higher grades at school compared to her ERB scores. Her ability to express herself well in writing and vocally, such as on essay tests or class discussions, helps her grades at school but is not able to aid her in the cut-and-dried responses needed for the ERB tests. Her strong ability to solve novel problems will be quite useful to Laura in continuing to devise other strategies when new difficulties appear in her life.

4.08.6.6 Recommendations

The following recommendations have been made to assist Laura and her parents in enhancing her learning ability.

(i) Laura is a highly self-motivated student who has created very high expectations for herself, and has been working very hard to meet those high standards. At times this causes anxiety for her, and when this anxiety reaches an unmanageable level it may cause difficulty and decreased performance. It will therefore be useful for Laura and her parents to discuss that it is okay not to be perfect in every area of her life, every minute of the day, and to allow Laura to experience those instances of nonperfection. Tolerance and appreciation for her own continuum of strengths to weaknesses need to be developed to keep Laura feeling good about herself.

(ii) As Laura works so hard at not making any mistakes, she is sometimes hesitant to proceed without asking many questions to prevent making any sort of error. She demonstrated strong abilities in many areas and therefore should be encouraged to go with her gut feeling, and to attempt problems that normally she may immediately ask for help on. This will further foster her sense of independence and confidence in her own abilities.

(iii) Laura expressed being uninterested in some subjects that have not stressed problem-solving or reasoning as learning methods, such as Social Studies. If her school classes present material in a factual, "memorize this" style, then Laura needs to study in a more creative way to enhance her learning and reduce boredom. For example, she can make up a story about a character that may have lived in a period of history that she is learning about. As another example, Laura can draw and illustrate a timeline of important historical events to remember.

(iv) To incorporate a different method of studying subjects that are especially tedious to Laura, she could study with another diligent student. Similarly, she may want to get together with a group of peers and create a quiz show to reinforce and study facts that may seem boring to her when she is studying alone.

(v) To encourage her to become comfortable with and reinforce information that may be presented in a test format such as the ERB tests, Laura may create a competition for herself. For example, on a weekly basis she could give herself a test on a content area (such as those in books created for SAT preparation), and then she could chart her own progress from week to week.

(vi) It is important to reassure Laura that it is fine to use the compensatory strategies that work for her, such as writing down arithmetic problems. Her ability to figure out such strategies is a strength that was evident in her ability to solve novel problems and can be used to help her in areas that are more difficult for her.

(vii) To help Laura increase her overall academic abilities, Laura will benefit from broadening her base of what she considers "studying." Studying includes not only completing assignments given at school and preparing for exams, but also an awareness of one's environment outside of the school context. In broadening her conceptualization of studying, for example, Laura may incorporate more pleasure reading of nonschool books or magazines into her weekly routine, which will benefit her grammar and vocabulary. She may watch a movie or television program and relate it to some topic she is studying in school. This is important as people who improve their general learning tend to do better overall on standardized types of testing.

(viii) Laura demonstrated strong written expression ability, but weaker ability to incorporate correctly details of grammar such as spelling and punctuation. These details may be less interesting to Laura; thus, again she may want to use her problem-solving ability to create more interesting ways to learn such details. There are many excellent computer programs that help teach grammatical details in an interesting manner. As Laura tends to strive

for excellence, she may set goals for herself according to a computer program she is working with to help her.

(ix) Laura is very conscientious and focused in her school work. She may benefit and find it rewarding to become a peer tutor for a lower grade child (such as a fourth- or fifth-grade student). Tutoring a child who is having trouble with grammar will reinforce rules in Laura's mind as she gains self respect by being appointed to teach someone else.

ACKNOWLEDGMENTS

The authors would like to thank Drs. Kristee A. Beres, Randy Kamphaus, Nadeen L. Kaufman, Jack A. Naglieri, and Mitch Perlman for their contributions to this chapter.

4.08.7 REFERENCES

American Psychological Association (1990). Standards for educational and psychological tests and manuals. Washington, DC: Author.
Bain, S. K. (1991). Test reviews: Differential ability scales. Journal of Psychoeducational Assessment, 9, 372–378.
Binet, A., & Henri, V. (1895). La psychologie individuelle. L'Année Psychologique, 2, 411–465.
Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191–244.
Binet, A., & Simon, T. (1908). Le développement de l'intelligence chez les enfants. L'Année Psychologique, 14, 1–94.
Bogen, J. E. (1975). Some educational aspects of hemispheric specialization. UCLA Educator, 17, 24–32.
Bracken, B. A. (1985). A critical review of the Kaufman Assessment Battery for Children (K-ABC). School Psychology Review, 14, 21–36.
Braden, J. P. (1992). Test reviews: The differential ability scales and special education. Journal of Psychoeducational Assessment, 10, 92–98.
Brown, D. T. (1994). Review of the Kaufman Adolescent and Adult Intelligence Test (KAIT). Journal of School Psychology, 32, 85–99.
Buckhalt, J. A. (1991). A critical review of the Wechsler Preschool and Primary Scale of Intelligence Revised (WPPSI-R). Journal of Psychoeducational Assessment, 9, 271–279.
Canter, A. (1990). A new Binet, an old premise: A mismatch between technology and evolving practice. Journal of Psychoeducational Assessment, 8, 443–450.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Cohen, J. (1957). A factor-analytically based rationale for the Wechsler-Adult Intelligence Scale. Journal of Consulting Psychology, 6, 451–457.
Cohen, R. J., Montague, P., Nathanson, L. S., & Swerdlik, M. E. (1988). Psychological testing. Mountain View, CA: Mayfield.
Das, J. P. (1973). Structure of cognitive abilities: Evidence for simultaneous and successive processing. Journal of Educational Psychology, 65, 103–108.
Das, J. P., Kirby, J. R., & Jarman, R. F. (1975). Simultaneous and successive synthesis: An alternative model for cognitive abilities. Psychological Bulletin, 82, 87–103.
Das, J. P., Kirby, J., & Jarman, R. F. (1979). Simultaneous and successive cognitive processes. New York: Academic Press.
Das, J. P., Naglieri, J. A., & Kirby, J. (1994). Assessment of cognitive processes. Boston: Allyn & Bacon.
Delugach, R. (1991). Test review: Wechsler Preschool and Primary Scale of Intelligence-Revised. Journal of Psychoeducational Assessment, 9, 280–290.
Dumont, R., & Hagberg, C. (1994). Test reviews: Kaufman Adolescent and Adult Intelligence Test (KAIT). Journal of Psychoeducational Assessment, 12, 190–196.
Elliott, C. D. (1990). Differential Ability Scales (DAS) administration and scoring manual. San Antonio, TX: Psychological Corporation.
Elliott, C. D. (1997). The Differential Ability Scales (DAS). In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Beyond traditional intellectual assessments: Contemporary and emerging theories, tests, and issues (pp. 183–208). New York: Guilford Press.
Evans, J. H., Carlsen, R. N., & McGrew, K. S. (1993). Classification of exceptional students with the Woodcock–Johnson Psycho-Educational Battery-Revised. In R. S. MacCallum (Ed.), Journal of Psychoeducational Assessment monograph series. Advances in psychoeducational assessment: Woodcock–Johnson Psycho-educational Battery-Revised (pp. 6–19). Germantown, TN: Psychoeducational Corporation.
Flanagan, D. P., Alfonso, V. C., & Flanagan, R. (1994). A review of the Kaufman Adolescent and Adult Intelligence Test: An advancement in cognitive assessment? School Psychology Review, 23, 512–525.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95, 29–51.
Glutting, J. J., McDermott, P. A., Prifitera, A., & McGrath, E. A. (1994). Core profile types for the WISC-III and WIAT: Their development and application in identifying multivariate IQ-achievement discrepancies. School Psychology Review, 23, 619–639.
Golden, C. J. (1981). The Luria-Nebraska Children's Battery: Theory and formulation. In G. W. Hynd & J. E. Obrzut (Eds.), Neuropsychological assessment of the school-age child: Issues and procedures (pp. 277–302). New York: Grune and Stratton.
Hammill, D. D. (1991). Interpretive Manual for Detroit Tests of Learning Aptitude (3rd ed.). Austin, TX: PRO-ED.
Hammill, D. D., & Bryant, B. R. (1991). Interpretive Manual for Detroit Tests of Learning Aptitude-Primary: Second Edition. Austin, TX: PRO-ED.
Hansen, J. C., & Campbell, D. P. (1985). Manual for the SVIB-SCII (4th ed.). Stanford, CA: Stanford University Press (Distributed by Consulting Psychologists Press).
Horn, J. L. (1985). Remodeling old model in intelligence. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurements, and applications (pp. 267–300). New York: Wiley.
Horn, J. L. (1989). Cognitive diversity: A framework of learning. In P. L. Ackerman, R. J. Sternberg, & R. Glaser (Eds.), Learning and individual differences (pp. 61–116). New York: Freeman.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), Woodcock–Johnson Technical manual: A reference on theory and current research (pp. 197–246). Allen, TX: DLM Teaching Resources.
Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 57, 253–270.
Horn, J. L., & Cattell, R. B. (1967). Age difference in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Hoy, C., Gregg, N., Jagota, M., King, M., Moreland, C., & Manglitz, E. (1993). Relationship between the Wechsler
236 Intellectual Assessment

Adult Intelligence Scale-Revised and the Woodcock± Weighing psychometric, clinical, and practical factors.
Johnson Test of Cognitive Ability-Revised among adults Journal of Clinical Child Psychology, 25, 97±105.
with learning disabilities in university and rehabilitation Kaufman, A. S., Kaufman, J. C., Chen, T., Kaufman, N.
settings. In R. S. MacCallum (Ed.), Journal of Psycho- L. (1996). Differences on six Horn abilities for 14 age
educational Assessment monograph series. Advances in groups between 15±16 and 75±94 years. Psychological
psychoeducational assessment: Woodcock±Johnson Assessment, 8, 1±11.
Psycho-educational Battery-Rivised (pp. 54±63). German- Kaufman, A. S., & Lichtenberger, E. O. (in press). WAIS-
town, TN: Psychoeducational Corporation. III assessment made simple. New York: Wiley.
Inglis, J., & Lawson, J. (1982). A meta-analysis of sex Kaufman, A. S., & McLean, J. E. (1992, November). An
differences in the effects of unilateral brain damage on investigation into the relationship between interests and
intelligence test results. Canadian Journal of Psychology, intelligence. Paper presented at Annual meeting of the
36, 670±683. Mid-South Educational Research Association, Knox-
Inhelder, B., & Piaget, J. (1958). The growth of logical ville, TN.
thinking from childhood to adolescence. New York: Basic Kaufman, A. S., McLean, J. E., & Lincoln, A. (1996). The
Books. relationship of the Myers±Briggs Type Indicator to IQ
Jensen, A. R. (1980). Bias in mental testing. New York: level and fluid-crystallized discrepancy on the Kaufman
Free Press. Adolescent and Adult Intelligence Test (KAIT). Assess-
Kamphaus, R. W. (1993). Clinical assessment of children's ment, 3, 225±239.
intelligence. Boston: Allyn & Bacon. Kaufman, A. S., McLean, J. E., & Reynolds, C. R. (1988).
Kamphaus, R. W., Beres, K. A., Kaufman, A. S., & Sex, race, residence, region, and education differences on
Kaufman, N. L. (1995). The Kaufman Assessment the 11 WAIS-R subtests. Journal of Clinical Psychology,
Battery for Children (K-ABC). In C. S. Newmark 44, 213±248.
(Ed.), Major psychological assessment instruments (2nd Keith, T. Z. (1990). Confirmatory and hierarchical
ed., pp. 348±399). Boston: Allyn & Bacon. confirmatory analysis of the Differential Ability Scales.
Kamphaus, R. W., & Reynolds, C. R. (1987). Clinical and Journal of Psychoeducational Assessment, 8, 391±405.
research applications of the K-ABC. Circle Pines, MN: Keith, T. Z., & Dunbar, S. B. (1984). Hierarchical factor
American Guidance Service. analysis of the K-ABC: Testing alternate models. Journal
Kaufman, A. S. (1979). Intelligent testing with the WISC-R. of Special Education, 18, 367±375.
New York: Wiley. Kinsbourne, M. (Ed.) (1978). Asymmetrical function of the
Kaufman, A. S. (1983). Intelligence: Old conceptsÐnew brain. Cambridge, MA: Cambridge University Press.
perspectives. In G. Hynd (Ed.), The school psychologist Levy, J., & Trevarthen, C. (1976). Metacontrol of hemi-
(pp. 95±117). Syracuse, NY: Syracuse University Press. spheric function in human split-brain patients. Journal of
Kaufman, A. S. (1985). Review of Wechsler Adult Experimental Psychology: Human Perception and Per-
Intelligence Scale-Revised. In J. V. Mitchell (Ed.), The formance, 2, 299±312.
ninth mental measurement yearbook (pp. 1699±1765). Luria, A. R. (1966). Higher cortical functions in man. New
Lincoln, NE: University of Nebraska Press. York: Basic Books.
Kaufman, A. S. (1990). Assessing adolescent and adult Luria, A. R. (1973). The working brain: An introduction to
intelligence. Boston: Allyn & Bacon. neuro-psychology. London: Penguin.
Kaufman, A. S. (1992). Evaluation of the WISC-III and Luria, A. R. (1980). Higher cortical functions in man (2nd
WPPSI-R for gifted children. Roeper Review, 14, ed.). New York: Basic Books.
154±158. MacMann, G. M., & Barnett, D. W. (1994). Structural
Kaufman, A. S. (1993). King WISC the third assumes the analysis of correlated factors: Lessons form the Verbal-
throne. Journal of School Psychology, 31, 345±354. Performance dichotomy of the Wechsler scales. School
Kaufman, A. S. (1994a). A reply to MacMann and Barnett: Psychology Quarterly, 9, 161±197.
Lessons form the blind men and the elephant. School Matarazzo, J. D. (1985). Review of Wechsler Adult
Psychology Quarterly, 9, 199±207. Intelligence Scale-Revised. In J. V. Mitchell (Ed.), The
Kaufman, A. S. (1994b). Intelligent testing with the WISC- ninth mental measurement yearbook (pp. 1703±1705).
III. New York: Wiley. Lincoln, NE: University of Nebraska Press.
Kaufman, A. S., & Doppelt, J. E. (1976). Analysis of McCallum, R. S., & Merritt, F. M. (1983). Simultaneous-
WISC-R standardization data in terms of the stratifica- successive processing among college students. Journal of
tion variables. Child Development, 47, 165±171. Psychoeducational Assessment, 1, 85±93.
Kaufman, A. S., & Horn, J. L. (1996). Age changes on test McDermott, P. A., Fantuzzo, J. W., & Glutting, J. J.
of fluid and crystallized ability for women and men on (1990). Just say no to subtest analysis: A critique on
the Kaufman Adolescent and Adult Intelligence Test Wechsler theory and practice. Journal of Psychoeduca-
(KAIT) at ages 17±94 years. Archives of Clinical tional Assessment, 8, 290±302.
Neuropsychology, 11, 97±121. McDermott, P. A., Fantuzzo, J. W., & Glutting, J. J.,
Kaufman, A. S., & Kamphaus, R. W. (1984). Factor Watkins, M. W., & Baggaley, A. R., (1992). Illusions of
analysis of the Kaufman Assessment Battery for meaning in the ipsative assessment of children's ability.
Children (K-ABC) for ages 212 through 1212 years. Journal of Special Education, 25, 504±526.
Journal of Educational Psychology, 76, 623±637. McGhee, R. (1993). Fluid and crystallized intelligence:
Kaufman, A. S., & Kaufman, N. L. (1983). Interpretive Confirmatory factor analysis of the Differential Ability
manual for the Kaufman Assessment Battery for Children. Scales, Detroit Tests of Learning Aptitude-3, and
Circle Pines, MN: American Guidance Service. Woodcock±Johnson Psycho-Educational Battery-Re-
Kaufman, A. S., & Kaufman, N. L. (1993). Interpretive vised. In B. A. Bracken & R. S. McCallum (Eds.),
Manual for Kaufman Adolescent & Adult Intelligence Journal of Psychoeducational/Assessment monograph
Test. Circle Pines, MN: American Guidance Service. series, advances in psychoeducational assessment:
Kaufman, A. S., Ishikuma, T., & Kaufman, N. L. (1994). A Woodcock±Johnson Psycho-Educational Battery-Revised
Horn analysis of the factors measured by the WAIS-R, (pp. 39±53). Germantown, TN: Psychoeducational Cor-
Kaufman Adolescent and Adult Intelligence Test poration.
(KAIT), and two new brief cognitive measures for McGrew, K. S., & Flanagan, D. P. (1996). The Wechsler
normal adolescents and adults. Assessment, 1, 353±366. Performance scale debate: Fluid intelligence (Gf) or
Kaufman, A. S., Kaufman, J. C., Balgopal, R., & McLean, visual processing. NASP Communique, 24, 15±17.
J. E. (1996). Comparison of three WISC-III short forms: McShane, D., & Cook, V. (1985). Transcultural intellectual
References 237

assessment: Performance by Hispanics on the Wechsler McCallum (Eds.), Journal of Psychoeducational Assess-
scales. In B. B. Wolman (Ed.), Handbook of intelligence ment monograph series, advances in psychoeducational
(pp. 385±426). New York: Wiley. assessment: Wechsler Intelligence Scale for Children-
McShane D. A., & Plas, J. M. (1984). The cognitive Third edition (pp. 151±160). Germantown, TN: Psycho-
functioning of American Indian children: Moving from educational Corporation.
the WISC to the WISC-R. School Psychology Review, 13, Schmidt, K. L. (1994). Review of Detroit Tests of Learning
61±73. Aptitude-Third Edition. Journal of Psychoeducational
Merz, W. R. (1985). Test review of Kaufman Assessment Assessment, 12, 87±91.
Battery for Children. In D. J. Keyser & R. C. Sweetland Schofield, N. J., & Ashman, A. F. (1986). The relationship
(Eds.), Test overviews (pp. 393±405). Kansas City, MO: between Digit Span and cognitive processing across
Test Corporation of America. ability groups. Intelligence, 10, 59±73.
Miller, T. L., & Reynolds, C. R. (1984). Special issue . . . Siehen, F. A. (1985). Correlational study of Woodcock±
The K-ABC. Journal of Special Education, 8, 207±448. Johnson deviation IQ scores and WAIS-R with adult
Naglieri, J. A. (1984). Concurrent and predictive validity of population. Unpublished manuscript, Arizona State
the Kaufman Assessment Battery for Children with a University.
Navajo sample. Journal of School Psychology, 22, Sperry, R. W. (1968). Hemisphere deconnection and unity in
373±380. conscious awareness. American Psychologist, 23, 723±733.
Naglieri, J. A., & Das, J. P. (1988). Planning±Arousal± Sperry, R. W. (1974). Lateral specialization in the
Simultaneous±Successive (PASS): A model for assess- surgically separated hemispheres. In F. O. Schmitt &
ment. Journal of School Psychology, 26, 35±48. F. G. Worden (Eds.), The neurosciences: Third study
Naglieri, J. A., & Das, J. P. (1990). Planning, Attention, program. Cambridge, MA: MIT Press.
Simultaneous, and Successive (PASS) cognitive processes Spruill, J. (1984). Wechsler Intelligence Scale-Revised. In
as a model for intelligence. Journal of Psychoeducational D. J. Keyser & R. C. Sweetland (Eds.), Test overviews
Assessment, 8, 303±337. (pp. 728±739). Kansas City, MO: Test Corporation of
Naglieri, J. A., & Das, J. P. (1996). Das±Naglieri Cognitive America.
Assessment System. Chicago: Riverside. Spruill, J. (1987). Review of Stanford±Binet Intelligence
Naglieri, J. A., & Jensen, A. R. (1987). Comparison of Scale, Fourth edition. In D. J. Keyser & R. C. Sweetland
black-white differences on the WISC-R and the K-ABC: (Eds.), Test overviews (pp. 544±559). Kansas City, MO:
Spearmen's hypothesis. Intelligence, 11, 21±43. Test Corporation of America.
Obringer, S. J. (1988, November). A survey of perceptions Sternberg, R. J. (1993). Rocky's back again: A review of
by school psychologists of the Stanford±Binet IV. Paper the WISC-III. In B. A. Bracken & R. S. McCallum
presented at the meeting of the Mid-South Educational (Eds.), Journal of Psychoeducational Assessment mono-
Research Association, Louisville, KY. graph series, advances in psychoeducational assessment:
O'Grady, K. E. (1983). A confirmatory maximum like- Wechsler Intelligence Scale for Children-Third Edition
lihood factor analysis of the WAIS-R. Journal of (pp. 161±164). Germantown, TN: Psychoeducational
Consulting and Clinical Psychology, 51, 826±831. Corporation.
Perlman, M. D. (1986). Toward an integration of a Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986a).
cognitive-dynamic view of personality: The relationship Technical manual for the Stanford±Binet Intelligence
between defense mechanisms, cognitive style, attentional Scale-Fourth Edition. Chicago: Riverside.
focus, and neuropsychological processing. Unpublished Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986b).
doctoral dissertation, California School of Professional Stanford±Binet Intelligence Scale: Fourth Edition. Chica-
Psychology, San Diego. go: Riverside.
Piaget, J. (1972). Intellectual evolution from adolescence to Turkheimer, E., & Farace, E. (1992). A reanalysis of
adulthood. Human Development, 15, 1±12. gender differences in IQ scores following unilateral brain
The Psychological Corporation (1997). Wechsler Adult lesions. Psychological Assessment, 4, 498±501.
Intelligence Scale-Third edition (WAIS-III). San Anto- Turkheimer, E., Farace, E., Yfo, R. A., & Bigler, E. D.
nio, TX: Author. (1993). Quantitative analysis of gender differences in the
Reitan, R. M. (1955). Certain differential effects of left and effects of lateralized lesions on verbal and performance
right cerebral lesions in human adults. Journal of IQ. Intelligence, 17, 461±474.
Comparative and Physiological Psychology, 48, 474±477. VanLeirsburg, P. (1994). Review of Detroit Tests of
Reschly, D. J., & Tilly, W. D. (1993, September). The Learning Aptitude-3. In D. J. Keyser & R. C. Sweetland
WHY of system reform. Communique, pp. 1, 4±6. (Eds.), Test overviews (pp. 219±225). Kansas City, MO:
Reynolds, C. R. (1987). Playing IQ roulette with the Test Corporation of America.
Stanford±Binet, 4th edition. Measurement and Evalua- Wada, J., Clarke, R., & Hamm, A. (1975). Cerebral
tion in Counseling and Development, 20, 139±141. hemisphere asymmetry in humans. Archives of Neurol-
Reynolds, C. R., Chastain, R. L., Kaufman, A. S., & ogy, 37, 234±246.
McLean, J. E. (1987). Demographic characteristics and Watkins, M. W., & Kush, J. C. (1994). Wechsler subtest
IQ among adults: Analysis of the WAIS-R standardiza- analysis: The right way, the wrong way, or no way?
tion sample as a function of the stratification variables. School Psychology Review, 23, 640±651.
Journal of School Psychology, 25, 323±342. Webster, R. E. (1994). Review of Woodcock±Johnson
Reynolds, C. R., Kamphaus, R. W., & Rosenthal, B. L. Psycho-educational Battery-Revised. In D. J. Keyser &
(1988). Factor analysis of the Stanford±Binet Fourth R. C. Sweetland (Eds.), Test overviews (pp. 804±815).
Edition for ages 2 years through 23 years. Measurement Kansas City, MO: Test Corporation of America.
and Evaluation in Counseling and Development, 2, 52±63. Wechsler, D. (1939). Measurement of adult intelligence.
Roback, A. A. (1961) History of psychology and psychiatry. Baltimore: Williams & Wilkins.
New York: Philosophical Library. Wechsler D. (1958). Measurement and appraisal of adult
Sandoval, J. (1992). Test Reviews: Using the DAS with intelligence (4th ed.). Baltimore: Willilams & Wilkens.
multicultural populations: Issues of test bias. Journal of Wechsler D. (1974). Manual for the Wechsler Intelligence
Psychoeducational Assessment, 10, 88±91. Scale for Children-Revised. San Antonio, TX: Psycholo-
Sattler, J. M. (1988). Assessment of children (3rd ed.). San gical Corporation.
Diego, CA: Sattler. Wechsler, D. (1981). Manual for the Wechsler Adult
Schaw, S. R., Swerdlik, M. E., & Laurent, J. (1993). Intelligence Scale-Revised (WAIS-R). San Antonio,
Review of the WISC-III. In B. A. Bracken & R. S. TX: Psychological Corporation.
238 Intellectual Assessment

Wechsler, D. (1989). Manual for the Wechsler Preschool Woodcock, R. W. (1990). Theoretical foundations of the
and Primary Scale of Intelligence-Revised (WPPSI-R). WJ-R measures of cognitive ability. Journal of Psycho-
San Antonio, TX: Psychological Corporation. educational Assessment, 8, 231±258.
Wechsler, D. (1991). Manual for the Wechsler Intelligence Woodcock, R. W., & Johnson, M. B. (1989). Woodcock±
Scale for Children-Third Edition, (WISC-III). San Johnson Tests of Cognitive Ability: Standard and supple-
Antonio, TX: Psychological Corporation. mental batteries. Chicago: Riverside.
Witt, J. C., & Gresham, F. M. (1985). Review of the Woodcock, R. W., & Mather, N. (1989). WJ-R Tests of
Wechsler Intelligence Scale for Children-Revised. In J. V. Cognitive Ability-Standard and Supplemental Batteries:
Mitchell (Ed.), Ninth mental measurements yearbook Examiner's Manual. In R. W. Woodcock & M. B.
(pp. 1716±1719). Lincoln, NE: University of Nebraska Johnson (Eds.) Woodcock±Johnson psycho-educational
Press. battery-revised. Allen, TX: DLM Teaching Resources.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.09
Assessment of Memory, Learning,
and Special Aptitudes
ROBYN S. HESS
University of Nebraska at Kearney, NE, USA
and
RIK CARL D'AMATO
University of Northern Colorado, Greeley, CO, USA

4.09.1 INTRODUCTION
4.09.1.1 Assessment Approaches
4.09.1.2 Evaluation of Domain Areas
4.09.1.3 Intervention Approaches
4.09.2 ASSESSMENT OF MEMORY
4.09.2.1 Attention
4.09.2.2 Short-term and Long-term Memory
4.09.2.3 Memory: Implications for Intervention
4.09.3 ASSESSMENT OF LEARNING
4.09.3.1 Models of Learning
4.09.3.2 Learning Processes: Input and Integration
4.09.3.3 Academic Achievement: Output
4.09.3.4 Learning: Implications for Intervention
4.09.4 ASSESSMENT OF SPECIAL APTITUDES
4.09.4.1 Sensory Perception
4.09.4.1.1 Sensory perception: implications for intervention
4.09.4.2 Motor: Fine and Gross
4.09.4.3 Sensory-motor Integration
4.09.4.3.1 Motor: implications for intervention
4.09.4.4 Communication/Language
4.09.4.4.1 Communication/language: implications for intervention
4.09.5 FUTURE DIRECTIONS
4.09.6 SUMMARY
4.09.7 REFERENCES

4.09.1 INTRODUCTION

The inner workings of the human mind and the way in which people process information have intrigued researchers for centuries. The traditional categories of cognition, such as attention, memory, language, and learning, are terms frequently used and considered all important to our daily adaptive functioning and ability to learn new information. Yet all
have defied simple explanation and manipulation (e.g., Gaddes & Edgell, 1994; Lezak, 1995). That is, as professionals we are able to identify when a child or adult is having difficulty processing information, but the exact interrelations between an individual's different capacities in areas such as attention, learning style, and sensory integration still elude educational and clinical specialists. More puzzling still is finding effective rehabilitation strategies to address deficits in memory, learning, and other psychological processes. These areas are frequently addressed in the growing volumes of neuropsychological research, but are relevant to the practices of many traditionally trained psychologists as well (Hamsher, 1984). Not surprisingly, all of the information needed for an adequate understanding and interpretation of cognitive processes cannot be provided in a single chapter or obtained in an individual university course. Thus, the purpose of this chapter is to provide a brief description of our higher cognitive processes and introduce a variety of strategies and measures for evaluating these functions.

Problems in attention, memory, and learning are not isolated to the very young or the very old. Adult learning problems often become apparent in employment settings and after injuries resulting from strokes, accidents, or diseases. Epidemiological studies suggest that in the decade following the late 1990s there will be a dramatic increase in the number of individuals suffering from organically related disorders resulting from the abuse of alcohol and other types of toxic substances (Touyz, Byrne, & Gilandas, 1994). The recent documented increase in the number of head injuries caused by motor vehicle accidents has become an intrinsic fact in today's society. High speed transportation and the growing prevalence of violent street crimes have further increased the incidence of head injuries (Touyz et al., 1994). So too, the growing popularity of certain contact sports (e.g., hockey, boxing), noncontact sports (e.g., rock climbing, mountaineering, bicycling), and recreational activities (e.g., skateboarding, roller blading) has contributed significantly to the number of individuals suffering from traumatic brain injury (Drew & Templer, 1992; Templer & Drew, 1992). Internal processes such as eating disorders, depression, diseases, epilepsy, and tumors can result in impaired executive functioning as well (Black & Strub, 1994). These realities of today's society make it necessary for clinicians to be able to accurately evaluate a client's strengths and weaknesses in everyday functioning and find the key elements to fostering effective behavioral change through rehabilitation or educational improvement.

Psychology has made many new inroads into understanding the learning process and the subsequent development of corrective or adaptive programs for children and adults with learning disorders and traumatic brain injuries. As assessment specialists, psychologists must quickly and accurately wade through the cumulative data available about the individual in order to select the most viable of alternative hypotheses to explain the findings and offer appropriate interventions (D'Amato & Dean, 1989a; D'Amato, Rothlisberg, & Leu, in press; Gutkin & Reynolds, 1990). Although administering a test may be a routine activity, conducting a thorough, valid assessment is an extremely complex process. The clinician is required to make decisions regarding which skills to evaluate and the best instrument to use with a particular client, and to generate accurate interpretations of the results in order to create the most effective intervention plan. Adding to the immensity of this task is the wide range of client variables that can impact the assessment process, including motivation, environment, culture, age, developmental level, language, training, educational quality, personal experience, and attitude, to name just a few (Golden, Sawicki, & Franzen, 1984; Hynd & Semrud-Clikeman, 1990).

The clinician must recognize that the context of the client may influence or even define the outcome of the assessment (Dana, 1993; Figueroa & Garcia, 1994). For example, many of the psychological, educational, and personality instruments available to practitioners have been criticized as culturally biased, as traditionally not including individuals from diverse ethnic backgrounds in the norming sample, and as measuring acquired knowledge rather than an individual's responsiveness to instruction or the learning process (Cole & Siegel, 1990; Dana, 1993; Figueroa & Garcia, 1994; Sattler, 1992). Because of these problems, several researchers believe that standardized assessment may have questionable validity for those clients who represent diverse cultural groups (Cole & Siegel, 1990; Dana, 1993; Figueroa & Garcia, 1994; Martinez, 1985). From the beginning, one robust assumption of standardized testing was that all individuals who take the tests would have had equal or comparable exposure to the contents of the assessment materials prior to the assessment (e.g., Colvin, 1921; Dearborn, 1921; Woodrow, 1921). In direct contrast to this supposition, current statistics indicate that the US immigrant population is not only growing rapidly but is also quickly expanding in diversity (Figueroa & Garcia, 1994). These authors conclude that tests, although given high status in US society, are actually quite fragile because
of the founding assumption regarding homogeneity and general shortcomings in technical properties. Nevertheless, standardized tests can be useful in evaluating current functioning, especially when multiple sources of information and multidimensional functions are evaluated to measure individual processes (Sattler, 1992). The responsible clinician must recognize both the assets and the limitations when using standardized measures with ethnic minority clients.

The strategy the practitioner uses to accomplish an effective assessment must, of necessity, be based upon well-grounded, empirically validated theories of cognition and behavior. Only through the use of a theoretical framework are specific predictions regarding performance under a given set of ecological circumstances made possible (Dean, 1985a, 1986; Rothlisberg, 1992). Unfortunately, no single diagnostic paradigm or theory has proven sufficient to explain fully the vagaries of behavior (D'Amato & Rothlisberg, 1992). Psychoanalytically, behaviorally, and biologically based approaches, as well as other theoretical positions, have been continually challenged not only to describe behavior, but also to provide effective interventions for the populations whom they serve (D'Amato & Dean, 1989b; Gutkin & Reynolds, 1990). Prepackaged programs dealing with psycholinguistic or visual-motor training and sensory integration training have attempted this, but have typically failed to meet the demands of this challenge. Gradually, the field has acknowledged that the effective use of assessment procedures, including educational and psychological tests, is reliant upon a theoretical foundation, which allows the incorporation of information from multiple data sources and environments in such a manner as to increase the number of effective and appropriate interventions generated.

A framework that is particularly useful is one reliant on an ecological approach. From this perspective, it is critical to evaluate several different aspects of clients' lives in order to develop a better understanding of their functioning within a variety of contexts. The purpose of this chapter is to examine the particular areas of attention and memory, learning processes (input and output), and the special aptitudes of sensory perception, sensory-motor integration, and language/communication to facilitate making informed assessment decisions. One must possess knowledge of the cognitive processes that these tests purport to measure to make judgments about the usefulness of any given instrument with the client's presenting issue. Furthermore, the clinician is provided with an introduction to the types of instruments used to measure memory, learning, and special aptitudes as well as a brief description of those instruments that have strong empirical support for use with child and adult populations. Any one of these areas, on its own, represents a very narrow picture of the overall functioning of an individual. However, when used in conjunction with a more thorough assessment, these areas can provide the missing pieces to the puzzle.

Client difficulties may be attributed not only to intra-individual characteristics, but also to the domain of functioning (e.g., social, vocational, educational); the context or environment in which the client is expected to function (e.g., job site, classroom, independent living); the requirements of particular assignments, jobs, or responsibilities (the task); and the strategy used to teach or remediate a difficulty (the intervention) (Geil & D'Amato, 1996). Although the focus of this chapter is memory, learning, and special aptitudes, it may be helpful for the clinician to view these processes as key components within the conceptual framework presented in Figure 1. These areas represent extremely important aspects of psychological functioning and can help to complete the diagnostic picture of an individual by providing critical information to assist the clinician in accurate diagnosis and intervention planning.

4.09.1.1 Assessment Approaches

Before addressing the particular areas of concern, a brief discussion of the assessment process is warranted. Both quantitative and qualitative assessment procedures help to provide a breadth of information concerning individual functioning. A quantitative or product-oriented approach uses standard performance data to assess individuals within and across all the functional domains to be measured by comparing the findings to a normative group (D'Amato, Rothlisberg, & Rhodes, 1997; Dean, 1985a, 1985b; Lezak, 1995). This process detects whether the client's skills show a discrepancy when they are compared to other individuals performing within a normal range. Patterns of performance can also be carefully analyzed to determine the individual client's strengths and weaknesses. Data are usually considered in several ways: level of performance or current functioning (compared to normative standards); pattern of performance (uniqueness of strengths and weaknesses); right-left differences (comparing tests that evaluate both hemispheres including both sides of the body); pathognomic signs (indications of abnormal signs or brain damage); qualitative analysis (behavioral observations of problem solving); intervention
planning (recommendations for appropriate rehabilitation) (Hynd & Semrud-Clikeman, 1990; Jarvis & Barth, 1994; Reitan & Wolfson, 1985, 1993; Sattler, 1992; Selz, 1981).

• Domain (social, vocational, educational)
• Context (job site, classroom, independent living)
• Task (assignment, job, responsibility)
• Intervention (remediation, counseling, training)
Figure 1 Conceptual framework of client functioning.

Most proponents of a quantitative approach recommend a standard or fixed battery of tests. A fixed battery, such as a typical psychoeducational battery (e.g., Wechsler Adult Intelligence Scale-Revised [WAIS-R] or Wechsler Intelligence Scale for Children-3rd Edition [WISC-III], Minnesota Multiphasic Personality Inventory-2nd Edition, Bender Visual-Motor Gestalt Test, Woodcock-Johnson Psychoeducational Battery-Revised) or neuropsychological battery (e.g., Halstead-Reitan Neuropsychological Test Battery; Reitan & Wolfson, 1993), involves the same set of instruments for each individual tested (Hynd & Semrud-Clikeman, 1990; Hynd & Willis, 1988). A standard battery format ensures that a broad array of appropriate tools is used to cover all significant domains and therefore provides documented results that may be interpreted with ease. In fact, a standard battery approach may be the best choice when and if potential litigation is an issue because this method offers a normative data base to which client profiles can be compared and contrasted (Guilmette & Giuliano, 1991; Reitan & Wolfson,
Introduction 243

1995). Despite the apparent strengths of a standardized approach, it has been argued that when developing treatment options, the use of qualitative methods that explore the process of learning or behavior may be better suited than a purely quantitative or product-oriented approach (D'Amato, Rothlisberg, & Leu, in press). For example, if verbal instruction with verbal response is a strength for the client, a preference for left hemisphere processing might be entertained and interventions utilizing a verbal component could be tailored with that hypothesis in mind. Likewise, if a client demonstrated a strength in simultaneous processing of information, a global concept or visual chart could be introduced before presenting the individual skills necessary to accomplish the particular task.

A second strategy, the qualitative approach or process-oriented approach, uses informal procedures such as direct observation of particular skills to analyze the specific patterns and processes in order to understand better the intricacies of the client's psychological processes (Lezak, 1995). Practitioners utilize a client's individual pattern of responses or results to guide the assessment process. That is, if a client was observed to have difficulty with memory tasks, that particular area would be investigated in more detail through the use of additional measures of memory. A decision-making process (i.e., whether to explore an area further or move on to another area of functioning) occurs after each item and is based on clinical judgment. By employing this strategy, it is argued that a clinician is better able to understand the complexities of an individual's performance and focus on the impaired functional system (D'Amato, Rothlisberg, & Leu, in press; Golden, 1981; Luria, 1980). Unique and individualized sets of procedures, questions, or tasks shape the evaluation process and might include an individual case study approach consisting of a mental status exam, observation, and symptom checklists. From an educational perspective, a psychologist using a qualitative approach might gather information using work samples, classroom observations, or dynamic assessment strategies (e.g., Campione & Brown, 1987; Feuerstein, Rand, & Hoffman, 1979). While the major areas traditionally covered in a qualitative evaluation seem comprehensive (e.g., investigations of motor functions, expressive speech, writing, reading; see Hynd & Semrud-Clikeman, 1990), this view does not rely on standardized batteries or clear comparisons to normative populations. Instead, the selection of strategies utilized follows significant clinical patient–practitioner interactions. Although the flexibility and individualization apparent in this method is appealing, it requires a great deal of clinical experience to make accurate interpretations of behaviors, and problems with reliability and validity are ever-present (Lezak, 1995).

A third approach, and one which is likely used by the majority of clinicians, is the use of integrated data. Indeed, any time examiners note an examinee's reaction to a task, the response time involved, or any problem-solving strategies employed (e.g., rehearsal, verbal cuing), they are inferring the underlying processes being used (D'Amato, Rothlisberg, & Leu, in press; Taylor, 1988; Taylor & Fletcher, 1990). Because all individuals show a distinctive pattern of learning and behavioral characteristics, it is improbable that any given test, or even battery of tests, in isolation, can capture the range of skills exhibited by that individual. Furthermore, test scores that are interpreted without consideration to the context of the examination may be objective but are meaningless in their individual application (Lezak, 1995). Likewise, clinical observations unsupported by standardized and quantifiable testing may provide a rich picture of the client's current functioning but lack the comparability necessary for many diagnostic and planning decisions. Thus it is expected that most practitioners are integrated in their assessment practices, relying on both norm-referenced comparisons and qualitative procedures and observations to enrich their views of their clients. In fact, Lezak (1995) suggests that either method is incomplete without the other.

4.09.1.2 Evaluation of Domain Areas

Regardless of the position of the examiner along the quantitative–qualitative continuum, it is helpful to conceptualize an evaluation of domain areas, rather than simply focusing on tests or specific problem behaviors. The following domains are offered because of their importance to daily functioning and usefulness to intervention development in educational and vocational settings (Begali, 1994; D'Amato & Rothlisberg, 1992; D'Amato, Rothlisberg, & Leu, in press; Gaddes & Edgell, 1994). These domains include:
(i) intelligence/cognitive abilities,
(ii) personality/behavior/family information,
(iii) memory and attention,
(iv) learning processes,
(v) academic achievement,
(vi) sensory/perceptual systems,
(vii) motor functions, and
(viii) communication/language skills.
244 Assessment of Memory, Learning, and Special Aptitudes

The areas of intelligence and personality assessment are covered in more depth in Chapters 8 and 12 of this volume. The remaining areas are divided into the subareas presented in Table 1 to provide a better understanding of the types of skills encompassed in each of these domains. Given the complexity of these areas, all should be considered both formally and informally. Direct observations and interviews with the client and family members are vital components in evaluating any individual's performance. The selection of tests utilized to evaluate these abilities will vary greatly depending on the unique needs of the individual, considered in tandem with the reason for referral.

Table 1 Subdomains of attention and memory, learning process, and special aptitudes.

Attention and memory
  Attention
  Concentration or vigilance
  Visual memory
  Verbal memory
  Recall
  Recognition
  Short-term memory
  Long-term memory
Learning processes (input and output)
  Visual processing
  Motoric processing
  Auditory processing
  Linguistic/verbal processing
  Simultaneous processing
  Sequential processing
  Academic achievement
Sensory/perceptual
  Visual
  Auditory
  Tactile-kinesthetic
  Integrated
Motor functions
  Strength
  Speed
  Coordination
  Lateral preference
  Sensory-motor integration
Communication/language skills
  Receptive vocabulary
  Expressive vocabulary
  Speech/language
  Written language

Source: Adapted from D'Amato & Rothlisberg (in press) and D'Amato, Rothlisberg & Rhodes (1997).

Data on the functioning of these domains provides useful information for the clinical psychologist. As demonstrated in Table 2, a variety of testing instruments are appropriate in each of these areas. It should be noted that different authors have suggested various subsets of domains for analysis as well as recommending literally hundreds of other measures as appropriate for children and adults (Batchelor, 1996a; Begali, 1992; Dean & Gray, 1990; Gaddes & Edgell, 1994; Hynd & Willis, 1988; Lezak, 1995). Thus the instruments categorized in Table 2 represent only a sampling of available measures. The practitioner must take responsibility for carefully matching the individual with potential assessment options, after considering the distinct features of the instruments and the unique needs of the client.

4.09.1.3 Intervention Approaches

The referral question for any client is rarely "how is this individual functioning today?"; instead the referral source is most often interested in the extent of decline following an injury or illness, the expected future performance in school or work settings, or how to maximize a client's potential given certain difficulties (e.g., head injury, learning disability; Long, 1996). Several decisions must be made in relation to the intervention strategy and will be reliant on the quality of the information provided by the assessment. Intervention may be conceptualized using one of three approaches: remediation (retraining a previously learned skill), compensation (learning to use other strengths to offset a lost skill), or a combination of both (D'Amato & Rothlisberg, 1996). In particular, it is critical to determine the level of intervention on which to focus one's efforts and the ideal combination of strategies that will work best with an individual. Rehabilitative efforts emphasize enabling clients to reach their goals in educational, vocational, social, and recreational settings despite difficulties related to their deficits. To reach this end, intervention strategies may focus on:
(i) remediating or retraining impaired cognitive processes (if there is a reason to believe that the process can be improved with practice),
(ii) helping the client to develop new skills to compensate for residual deficits,
(iii) creating classroom or workplace adaptations and other environmental compensations that permit effective performance despite residual deficits,
(iv) choosing instructional or therapeutic procedures that best fit the client's profile of strengths and weaknesses, and
(v) promoting improved metacognitive awareness of strengths and needs so that the
Assessment of Memory 245

client can become an active participant in selecting goals and intervention strategies (Ylvisaker, Szekeres, & Hartwick, 1994).

Table 2 Common instruments and procedures used to evaluate attention and memory, learning processes, and special aptitudes.

Attention and memory
  Test of Variables of Attention (TOVA™)
  Visual Search and Attention Test
  Test of Memory and Learning
  Wechsler Memory Scale-Revised
  Wide Range Assessment of Memory and Learning
Learning processes: input and integration
  Detroit Tests of Learning Aptitude-3
  Swanson's Cognitive Processing Test
  Children's Auditory Verbal Learning Test
  Tactile Performance Test (Halstead–Reitan Battery)
  Speech–Sounds Perception Test (Halstead–Reitan Battery)
  Wisconsin Card Sort
Academic achievement: output
  Woodcock–Johnson Psycho-educational Battery-Revised: Achievement
  Peabody Individual Achievement Test-Revised
  Wechsler Individual Achievement Test
  Kaufman Test of Educational Achievement
  KeyMath-Revised
  Woodcock Reading Mastery Test-Revised
  Test of Reading Comprehension-3
  Test of Written Language-3
Sensory perception
  Observations
  Developmental history
  Mental status examination
  Motor-free Visual Perception Test
  Vision and hearing screening
Motor (fine and gross)
  Bender Visual-Motor Gestalt Test
  Detroit Tests of Learning Aptitude-3 (Motoric Composite)
  Developmental Test of Visual-Motor Integration
  Finger Oscillation Test
  Grip Strength Test
  K-ABC Nonverbal Scale (e.g., Hand Movements subtest)
  McCarthy Scales of Children's Abilities (Motor Scale)
  WISC-III and WAIS-R (Block Design, Object Assembly, Coding subtests)
  Bruininks–Oseretsky Test of Motor Proficiency
Communication/language skills
  Revised Token Test
  Peabody Picture Vocabulary Test-Revised
  Test of Adolescent Language
  Test of Language Development-2 (Primary and Intermediate)
  Test of Language Competence

Source: Adapted from D'Amato, Rothlisberg, & Rhodes (1997).

4.09.2 ASSESSMENT OF MEMORY

Memory is one of the most important cognitive functions to be assessed. It is a highly complex cognitive function that encompasses several relatively discrete stages: reception and registration of sensory stimuli, temporary short-term storage of information, storage of the information in a more permanent form (long-term memory), and recall and retrieval of previously stored information (Shiffrin & Atkinson, 1969; Taylor, Fletcher, & Satz, 1984). Functioning at each stage depends upon the integrity of the previous steps, with any interruption in the hierarchy having the potential to interfere with memory storage or retrieval. For example, difficulties with attention, which most closely relates to the first stage of memory, would obviously lead to problems in short- and long-term storage as well as later retrieval of the information. A further source of complexity in understanding and measuring this skill is the variety of theoretical approaches from which memory can be conceptualized, including information processing, neuropsychological, and behavioral perspectives.

Memory testing is very useful for assessing the possibility of organic disease, in helping to differentiate between organic and psychiatric disorders, and in determining the functional significance of a memory problem (Black & Strub, 1994). Most of the major neurobehavioral disorders such as dementia, confusional states, amnesia, material-specific memory/learning defects, and attentional dysfunction are those in which disturbances of memory and attention are the prominent clinical features (Hamsher, 1984). However, individuals with depression, post-traumatic stress disorder, or dissociative disorders (e.g., dissociative amnesia, dissociative identity disorders) might also demonstrate attention and memory deficits (American Psychiatric Association, 1994). Memory skills represent a difficult area to address because of the variety of levels (e.g., working, short-term, long-term) and the potential implications of a deficit. So too, memory to some degree is modality specific; that is, some individuals may have impaired verbal memory but intact visual memory. Thus, it is important to look at various components of memory rather than obtaining a simple global memory score. Tests that provide a single memory score offer a myopic and problematic view of the multifaceted quality of memory.
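
The point about global scores can be illustrated with a small worked sketch. The following is only an illustration under stated assumptions: the two clients, their index scores, and the use of a simple average as the "global" score are all hypothetical (published memory batteries derive composites from normative tables, not from a plain mean). Scores are assumed to be on a standard-score metric with a mean of 100 and a standard deviation of 15.

```python
from statistics import mean

# Hypothetical index scores (assumed metric: mean 100, SD 15).
# Names and values are illustrative only, not taken from any test manual.
clients = {
    "Client A": {"verbal_memory": 100, "visual_memory": 100},
    "Client B": {"verbal_memory": 78, "visual_memory": 122},
}


def summarize(profile):
    """Return a simple global score and the visual-minus-verbal discrepancy."""
    # Assumption: the global score is a plain average of the two indexes.
    global_score = mean(profile.values())
    discrepancy = profile["visual_memory"] - profile["verbal_memory"]
    return global_score, discrepancy


for name, profile in clients.items():
    global_score, discrepancy = summarize(profile)
    print(f"{name}: global = {global_score:.0f}, "
          f"visual minus verbal = {discrepancy:+d}")
```

Under these assumptions both clients obtain the same global score, yet only the second shows a large verbal–visual discrepancy, which is the kind of modality-specific pattern a single score would conceal.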

4.09.2.1 Attention

One of the key components to memory and learning is the ability to attend selectively to the relevant information that we are presented with during the course of daily functioning. Attention is an extremely important basic function which refers to the client's ability to maintain awareness and to focus on a specified environmental stimulus, while screening out other stimuli that are potentially distracting (Black & Strub, 1994). Being able to attend has three major benefits for an individual: accuracy, speed, and maintenance of mental processing (LaBerge, 1995). Attention deficits appear as distractibility or impaired ability for focused behavior, regardless of the individual's intention (Lezak, 1995). Intact attention is a necessary condition of concentration, which requires an individual to sustain attention over an extended period of time. Concentration problems may be due to a simple attentional disturbance, or to inability to maintain a purposeful attentional focus or, as is often the case, to both problems. This skill is important for adequate performance on any cognitive task, and can be impaired as a result of either an organic or emotional disorder (D'Amato, 1990; Dean, 1985a). Several psychological difficulties have been associated with attentional problems such as impulsivity, distractibility, and poor social judgment. Tests that require mental effort and persistence can measure an individual's ability to select, sustain, and shift attention (Slomka & Tarter, 1993). By comparing performance on various types of tasks, the practitioner is able to distinguish a global attention deficit from the more discrete, task-specific problems of concentration and tracking.

It is important to clarify the nature of an attention problem by observing people's general behavior as well as their performance on tests involving concentration. An interview with family members can provide important information about attentiveness and susceptibility to distraction. So too, formal or quantitative measures of attention and short-term memory can be derived from the Digit Span and Coding or Digit Symbol subtests of the WISC-III or WAIS-R tests. Some of the more specific, informal measures of attention and concentration might include observation, a digit span task, and a vigilance task as outlined by Strub and Black (1993). The individual is given orally a series of random letters with the letter "A" occurring with greater frequency than the other letters. The individual is instructed to signal whenever the targeted letter (i.e., A) is heard. The individual's performance is scored for errors, which are rarely made by those without attention or vigilance difficulties (Black & Strub, 1994). More formal measures such as the Visual Search and Attention Test (Trenerry, Crosson, DeBoe, & Leber, 1990) can be used for adults. This test purports to measure sustained attention and visual scanning. The test consists of four 60-second trials and is made up of four tasks which become increasingly complex. The respondent is required to cross out letters or symbols that match a target. Normative tables are provided and arranged in four 10-year age bands, an 18–19 year age band, and a 60+ age band, and the statistical properties appear to be adequate (Hooper, 1995). Based on how the stimuli are presented, these tasks can provide a measure of either visual or auditory vigilance.

Vigilance tests are often referred to as continuous performance tests (CPTs), which are automated tasks, now computer-administered, that purport to measure sustained attention (Greenberg & Waldman, 1993; Lassiter, D'Amato, Raggio, Whitten, & Bardos, 1994; Rosvold, Mirsky, Sarason, Bransome, & Beck, 1956). CPTs have become a popular tool for clinicians to measure attentional performance, response inhibition, and medication monitoring in both children and adults (Eliason & Richman, 1987; Lassiter et al., 1994). Many versions of the CPT have been developed since the original, but the basic methodology of these tasks remains fairly constant. Clients are presented with a variety of stimuli that are displayed for a short period of time, and are instructed to respond to a predefined "target" stimulus. A number of different indices can be recorded with these tasks, including omission errors (i.e., failing to detect a target stimulus), commission errors (i.e., responding to a nontarget stimulus), and response times for correct detections (Greenberg & Waldman, 1993). Commission errors are considered to be indicative of impulsivity and omission errors are thought to denote inattention (Eliason & Richman, 1987; Lassiter et al., 1994). Examples of these types of tests include the CPT-2 (Lindgren & Lyon, 1983), the Raggio Evaluation of Attention Deficit Disorder (Raggio, 1991), and the Test of Variables of Attention (TOVA™; Greenberg, 1993).

The TOVA™ is a nonlanguage-based, visual continuous performance test. This test runs for 23 minutes on a fixed-interval schedule and presents two easily discriminated visual stimuli for 100 milliseconds every two seconds. It was designed for use in the diagnosis and monitoring of pharmacotherapy of children and adults with attention deficit disorders and can be used with individuals aged five to adulthood. The test does not require right-left discrimination and has
negligible practice effects. Recently, the authors of the TOVA™ have created developmental norms for children aged 6 to 16 which are available for few other CPT versions (Greenberg & Waldman, 1993). This type of tool may also be useful in assisting the clinician in the differential diagnosis of children and adolescents experiencing externalizing problems (e.g., attention deficit disorder, oppositional defiant disorder, conduct disorder, and aggression) and/or learning disabilities (Eliason & Richman, 1987; Greenberg & Waldman, 1993).

Despite the technological advances and newly defined norms, CPTs present a quandary to practitioners because of the variety of attributes that the tests reportedly measure. Some see these tests as measuring attention and impulsivity (Klee & Garfinkel, 1983), educational achievement (Campbell, D'Amato, Raggio, & Stephens, 1991), behavior (Lassiter et al., 1994), general neuropsychological functioning (Halperin, Sharma, Greenblatt, & Schwartz, 1991), and information processing (Swanson, 1981). Given this variance, a practitioner is left with the question of how to interpret the test results of a particular client. While research supports many of these claims, different versions of the CPTs have been used in these studies, with different samples of children and adults. So too, the validity of CPTs has been related to material collected from teachers, parents, and peers, and from standardized intelligence, achievement, and personality tests. While it is obvious that CPTs measure issues critical to learning and memory, the specificity of these instruments remains unclear. In conclusion, Morris (1996) noted that many of the measures that purported specifically to measure sustained attention often measured other variables, and thus many of these tests have poor construct validation and may be more appropriately viewed as multidimensional in nature. As an alternative, Barkley (1996) advocates the use of more natural tasks to study attention in an individual. He concludes that CPT-type tasks are unrelated to our daily functioning and thus, an individual's performance on such tasks is irrelevant. In response to this concern, several investigators have reportedly used television viewing, performance on classroom tasks, video games, and driving performance as a means of studying attention and its deficits in various groups of children and young adults (Barkley, 1996).

4.09.2.2 Short-term and Long-term Memory

As already noted, each component of the memory process is reliant upon the previous steps. If information in sensory storage undergoes additional processing, it becomes a more lasting memory (short-term or long-term). Furthermore, these different memory functions must be systematically reviewed through visual and aural modalities using both recall and recognition tasks. Lezak (1995) suggests that at a minimum, the memory examination should include: immediate retention tasks, including short-term memory with interference; learning in terms of extent of recent memory, learning capacity, and how well newly learned material is retained; and efficiency of retrieval of both recently learned and long-stored information (i.e., remote memory).

Informal methods of assessment include tests of immediate recall such as digit repetition and/or sentence repetition, interviewing for information from remote memory (e.g., "where were you born?"), and new learning ability (e.g., immediate recall for a verbal story, asking the individual to remember four unrelated words for a span of 5, 10, and 30 minutes). During this last task, the examiner can provide recognition cues if the individual is having difficulty remembering the words. It is expected that those without difficulties will remember all words, while those with brain damage might be expected to remember one (Black & Strub, 1994). For aphasic clients or those with other speech or language problems, an informal measure of visual memory can be completed by hiding five objects around the interview room as the client names each item as it is hidden. After 10 minutes, the client is asked for the name and location of each item. Reportedly, both normal and lower IQ clients should be able to find all five objects, with slightly lower performance for older patients (approximately four objects) (Black & Strub, 1994; Simpson, Black, & Strub, 1986). These memory tasks should be supplemented with observations and interviews with family members. So too, if an ability measure such as the WISC-III or WAIS-R is administered, performance on Digit Span can provide information on immediate verbal retention and the Information subtest can be an indicator of the extent of remote memory in an individual.

To complete an assessment of the major dimensions of memory, Lezak (1995) has suggested including:
(i) a test of configural recall and attention such as the visual reproduction subtest on the Wechsler Memory Scale (Wechsler, 1987) or the Benton Visual Retention Test (Benton-Sivan, 1992);
(ii) a paragraph for recall to examine learning and retention of meaningful verbal material; and
(iii) a test of learning ability that gives a learning curve and includes a recognition trial,
such as Rey's Auditory-Verbal Learning Test (for review see Lezak, 1995).

These techniques should be integrated into the general clinical interview to create a varied testing format, to enable the practitioner to use nonmemory tasks as interference activities, and to reduce stress in those clients who have memory impairments and are concerned about their abilities (Black & Strub, 1994).

There are numerous formal instruments available which measure different dimensions of memory. For children and adolescents, the Wide Range Assessment of Memory and Learning (Sheslow & Adams, 1990) and the Test of Memory and Learning (TOMAL; Reynolds & Bigler, 1994) can be used to evaluate individual strengths and weaknesses in the areas of memory and attention. In particular, the TOMAL represents a reliable, empirically sound measure for children and adolescents. The TOMAL consists of four core indexes comprising Verbal Memory, Nonverbal Memory, Composite Memory, and Delayed Recall. Supplementary indexes for Learning, Attention and Concentration, Sequential Memory, Free Recall, and Associative Recall are also provided. Subtests include Memory for Stories, Facial Memory, Word Selective Reminding, Visual Selective Reminding, Object Recall, Abstract Visual Memory, Digits Forward, Visual Sequential Memory, Paired Recall, Memory-for-Location, Manual Imitation, Letters Forward, Digits Backward, and Letters Backward. The TOMAL was standardized for children aged 5 to 19.

The TOMAL boasts many unique features, including a great variety of memory indexes (Reynolds & Bigler, 1994). While some of the subtests appear similar to other memory measures, some unique features of this test include a learning index where teaching is permissible, a sequential memory index, and an attention and concentration index. Delayed recall subtests are also available and are offered as an evaluation of forgetting or memory decay. It is possible to compare the examinee's own personal learning curve with a standardized learning curve. The test is easy to administer and generally user-friendly. Its psychometric properties appear to be well-developed. In the TOMAL subtests, 63% of the reliability coefficients are at or exceed 0.9, 31% are between 0.8 and 0.89, and only 6% fall below 0.8. Test-retest coefficients range from 0.71 to 0.91. Support for the validity of this instrument was determined through indices of content validity, construct validity (e.g., factor analytic studies), and criterion-related validity.

Assessment of memory dysfunction in adults is easier than in children because their period of rapid intellectual, academic, and physical (including neurological) development has ended (Reynolds & Bigler, 1994). In adults, memory dysfunction is associated with a variety of well-defined disorders, and in many individuals is one of the earliest and key symptoms, such as in Korsakoff's disease and various other dementias including Alzheimer's disease. Because of the key role of evaluating memory in the clinical setting, there are a number of instruments designed for memory assessment in older populations including the Doors and People: A Test of Visual and Verbal Recall and Recognition (Baddeley, Emslie, & Nimmo-Smith, 1994), the Memory Assessment Scales (MAS; Williams, 1991), and the Wechsler Memory Scale-Revised (WMS-R; Wechsler, 1987).

The WMS-R (Wechsler, 1987) provides an extensive measure of several dimensions of memory. It consists of eight short-term memory tests, four delayed-recall subtests, and a brief screening measure of mental status (i.e., information and orientation questions). The eight short-term memory tests yield four composite scores: Verbal Memory, Visual Memory, Total General Memory, and Attention/Concentration. The delayed-recall measures can be combined to derive a fifth composite score, Delayed Recall. The test is intended for use with individuals ranging in age from 16 to 74 and requires approximately 50 minutes to administer. The psychometric properties of the WMS-R are questionable: reliability coefficients for the composite scores are low (average r = 0.74), although the General Memory and Attention/Concentration scores fare better (average r = 0.81). Although the WMS-R demonstrated satisfactory discrimination power between various clinical groups, factor analyses supported a two-factor rather than the hypothesized five-factor model. Huebner (1992) concluded that this instrument must be used cautiously in making clinical decisions about individuals and that interpretation should be restricted to General Memory and Attention/Concentration ability.

For adolescents and adults, the MAS (Williams, 1991) also provides a valid, reliable, and comprehensive measure of memory functioning. The MAS was standardized for use with adults aged 18 to 90. The major functions measured by the MAS include: verbal and nonverbal learning and immediate memory; verbal and nonverbal attention, concentration, and short-term memory; and memory for verbal and nonverbal material following delay. In addition, measures of recognition, intrusions during verbal learning recall, and retrieval strategies are also available. The test consists of 12 subtests based on seven memory tasks.
Assessment of Learning 249

Five of the subtests assess the retention of information learned in a subtest administered earlier in the sequence. Total testing time is approximately one hour. Test-retest reliability for the MAS was estimated using generalizability coefficients and these correlations averaged 0.85 for the subtests, 0.9 for the summary scales (i.e., Short-Term Memory, Verbal Memory, and Visual Memory), and 0.95 for the global memory scale. The validity of the MAS was established using three types of studies: convergent and discriminant validity, factorial validity, and group differentiation. Despite these strengths, Berk (1995) concluded that clinicians should use caution in interpreting the scores until some technical problems (e.g., inadequate samples, lack of evidence for content validity) can be corrected.

4.09.2.3 Memory: Implications for Intervention

Because memory is multifaceted, interventions in the memory domain must also be multidimensional. Interventions may be divided in several ways: those involving language, those that are nonverbal, those requiring long-term, short-term, or intermediate memory, and those that use a combined approach to aid in retention (Gaddes & Edgell, 1994; Lezak, 1995). Strategy selection depends on accessing the strengths of the clients or, in the case of injury or disease, accessing those parts of the brain which have been least impacted. For example, learners who have difficulty with nonverbal memory tasks but have retained verbal skills may benefit from memory interventions that use language. Mnemonic devices may be used to assist in recall of information if a series of problem-solving steps is required (Mastropieri & Scruggs, 1989). For those with more difficulty remembering, some simple techniques such as writing all meetings in an appointment book or using grocery shopping lists and daily "to do" lists are practical.

4.09.3 ASSESSMENT OF LEARNING

Memory is a ubiquitous component of daily life and is fundamental to the process of learning. One must be able to remember in order to demonstrate learning. The classic definition of learning describes it as changes in behavior as a result of experience. Some have even considered this definition of learning as also defining memory (e.g., see Kolb & Whishaw, 1990). Despite this relatively simple definition, the learning process itself defies easy explanation. Learning can be approached from a neuropsychological (e.g., planning, attention, simultaneous, successive [PASS] model; Das, Naglieri, & Kirby, 1994), behavioral (e.g., classical and operant conditioning; Baldwin & Baldwin, 1986), cognitive (e.g., information processing; Pressley & Levin, 1983), or social (e.g., social learning and modeling; Bandura, 1977) perspective, to name but a few. Furthermore, the distinction drawn between measures of memory and measures of learning is tenuous at best since all instruments that evaluate an individual's learning process will automatically include aspects of memory functioning. Because learning takes center stage as one of our functions of daily living, it represents a critical area to evaluate. By fully evaluating several components of learning, clinicians can determine where the process is breaking down and provide recommendations for rehabilitation. An understanding of how a client best learns also has important implications for the types of therapeutic strategies that will most likely be successful. For example, if an adolescent has difficulty processing and remembering auditory information, talk therapy may not be the best approach. Supplementing discussion with role play, videos, and other visual cues may be necessary to facilitate the client's acquisition of new knowledge.

Difficulties in learning can be attributed to a number of disorders, including the general category of learning disorders (e.g., dyslexia, dyscalculia), traumatic brain injuries, drug and alcohol abuse, and medical disorders (e.g., strokes, Alzheimer's disease). Indeed, all of the variables that can affect attention and memory will also impact learning. Furthermore, certain chronic medical conditions can play a role in learning difficulties. For example, childhood diabetes is associated with subtle problems with respect to visuospatial and visuomotor processing (e.g., Rovet, Ehrlich, & Hoppe, 1988), verbal abilities (Kovacs, Goldston, & Iyengar, 1992), and memory and attention problems, which translate into increased risk for difficulties in academic achievement (Kovacs et al., 1992; Rovet, Ehrlich, Czuchta, & Akler, 1993). So too, sickle cell anemia, an inherited disorder most common in people of African descent, often produces some subtle cognitive impairments that can affect school achievement negatively (Brown, Armstrong, & Eckman, 1993). Among individuals with traumatic brain injuries, learning process problems may be reflected as an uncertainty as to whether a concept has been learned or not (Cohen, 1991).

4.09.3.1 Models of Learning

Information-processing theories have proved extremely useful in conceptualizing learning because this model can be applied to any given cognitive task and allows the practitioner to specify where the learning process is breaking
250 Assessment of Memory, Learning, and Special Aptitudes

down. Silver (1993) proposed an information-processing model based on four steps: input (how information from the sense organs enters the brain), integration (interpreting and processing the information), storage (storing the information for later retrieval), and output (expressing information via language or muscle activity). Learning is reliant upon each of the first three steps and is observed or inferred from the fourth step. Other models of information processing highlight the importance of the working memory in skill acquisition and learning (Baddeley, 1986; Just & Carpenter, 1992; Swanson, 1995). Working memory has traditionally been defined as a system of limited capacity for the temporary maintenance and manipulation of information (e.g., Baddeley, 1986; Just & Carpenter, 1992) and most closely corresponds to the integration step in Silver's model. Tasks that measure working memory are those that require the client to remember a small amount of material for a short time while simultaneously carrying out further operations. In daily life, these tasks might include remembering a person's address while listening to instructions about how to reach a specific destination (Swanson, 1995). When viewed from this perspective, working memory differs from the related concept of short-term memory which is typically described as remembering small amounts of material and reproducing it without integrating or transforming the information in any way (e.g., repeating back a series of numbers) (Cantor, Engle, & Hamilton, 1991; Just & Carpenter, 1992). Working memory appears to be extremely important to an individual's ability to learn, and in adult samples has correlations of 0.55–0.92 with reading and intelligence measures (e.g., Daneman & Carpenter, 1980; Kyllonen & Christal, 1990).

In an effort to promote the notion that input and integration of stimuli can impact subsequent learning, Cronbach and Snow (1977) have advanced a theory suggesting that some types of individuals might benefit from one form of treatment, whereas others might benefit from another type of treatment: an aptitude by treatment interaction (ATI). Many researchers and educators alike believe that matching learner characteristics with treatment approaches can enhance learning (e.g., Cronbach & Snow, 1977; Resnick, 1976; Reynolds, 1981b). However, subsequent studies have demonstrated little support for this theory (e.g., Arter & Jenkins, 1977; Tarver & Dawson, 1978). Initially, theories of input examined learner modalities (e.g., visual, auditory, kinesthetic), which were later deemed to be too simplistic (Arter & Jenkins, 1977; Kaufman, 1994; Tarver & Dawson, 1978). More recently, neuropsychological models have been applied to ATIs and offer promise for identifying aptitudes and prescribing treatments (D'Amato, 1990; Hartlage & Telzrow, 1983). One of the major techniques that Cronbach and Snow (1977) suggested for matching treatment approaches with learner aptitudes was "capitalization of strengths." Our increasing knowledge of how the brain functions allows clinicians to obtain a more detailed understanding of how a client learns new information. For example, although the cerebral hemispheres act in concert, the right hemisphere seems to be specialized for holistic, spatial, and/or nonverbal reasoning whereas the left shows a preference for verbal, serial, and/or analytic type tasks (Gaddes & Edgell, 1994; Lezak, 1995; Reynolds, 1981a; Walsh, 1978). Similarly, models of cognitive processing have been proposed that agree with the specialization of how scientists think the brain processes information; some have called this preferential processing styles (D'Amato, 1990). For example, simultaneous processing ability has been affiliated with the right hemisphere because of its holistic nature; it deals with the synthesis of parts into wholes and is often implicitly spatial (Das, Kirby, & Jarman, 1979). In contrast, the left hemisphere processes information using a more successive/sequential method, considering serial or temporal order of input (Dean, 1984, 1986). Models of brain organization have also been proposed that attempt to explain the diversity and complexity of behavior.

An expansion of the hemispheric specialization approach is offered in the planning, attention, simultaneous, successive (PASS) cognitive processing model (Das et al., 1994) which proposes four processing components. This model is based on the neuropsychological model of Luria (1970, 1973, 1980; Reynolds, 1981a) and presents a comprehensive theoretical model by which cognitive processes can be examined. On the basis of his clinical investigations with brain-injured patients, Luria (1973) suggested that there are three functional units that provide three classes of cognitive processes (i.e., memory, conceptual, and perceptual) responsible for all mental activity. Figure 2 provides a graphic presentation of the PASS model of cognitive processing. The functional units work in concert to produce behavior and provide arousal and attentional (first unit), simultaneous-successive (second unit), and planning (third unit) cognitive processes. The PASS model separates the second unit into two individual processes (i.e., simultaneous and sequential). Instruments can be used to measure individual strengths in these different styles of processing.

[Figure 2 diagram. Labels: Input (serial, concurrent); Output (serial, concurrent); First Functional Unit: Arousal/Attention (brain stem); Second Functional Unit: Simultaneous & Successive (occipital, parietal, & temporal); Third Functional Unit: Planning (frontal); Knowledge Base; Memory, Conceptual, Perceptual.]

Figure 2 PASS model of cognitive processing. (Assessment of Cognitive Processes: The PASS Theory of
Intelligence (p. 21), by J. P. Das, J. A. Naglieri, and J. R. Kirby, 1994, New York: Allyn & Bacon. Copyright
1994, by Allyn & Bacon. Reprinted with permission.)

Knowledge of the brain and theories governing information processing can determine the types of data collected during the assessment phase. For example, instead of simply observing whether the individual was successful at a task or set of measures, the practitioner looks beyond the product to determine the influence of related factors. These factors can include the nature of the stimuli used (visual, verbal, tactile), the method of presentation (visual, verbal, concrete, social), the type of response desired (verbal, motor, constructional), and the
response time allowed (timed, untimed; Cooley & Morris, 1990). Other researchers have advocated a move to an even more intense examination of processing through the use of dynamic assessment strategies (Campione & Brown, 1987; Feuerstein et al., 1979; Palincsar, Brown, & Campione, 1991). Theoretically, this strategy allows the examiner to obtain information about the client's responsiveness to hints or probes, and thus elicits processing potential (Swanson, 1995). When an examinee is having difficulty, the examiner attempts to move the individual from failure to success by modifying the format, providing more trials, providing information on successful strategies, or offering increasingly more direct cues, hints, or prompts (Swanson, 1995). This approach allows the examiner an opportunity to evaluate performance change in the examinee with and without assistance. However, there is little if any standardized information available on this technique and it has been criticized for its clinical nature and poor reliability (e.g., Palincsar et al., 1991).

4.09.3.2 Learning Processes: Input and Integration

Generally, three approaches have been utilized when evaluating how individuals preferentially process information. The first approach, seen as the traditional approach, employs established measures (such as the WISC-III) with the practitioner seeking to understand information processing through an analysis of common test results such as reviewing global scores, subtests, and clusters of subtests (Kaufman, 1990, 1994). For example, a pattern of strengths on the Picture Completion, Block Design, and Object Assembly subtests and corresponding weaknesses in Picture Arrangement and Coding might suggest meaningful differences in a client's mental processing style (i.e., right hemispheric functioning vs. left hemispheric functioning from cerebral specialization theory, or simultaneous vs. successive coding from Luria's theory). The second view of information processing, considered the informal approach, uses observations, checklists, and learning style inventories to understand how individuals learn. From this view, individuals who seem to profit most from visual cues may be seen as visual learners, and might be taught utilizing overheads, visual diagrams, and worksheets. The final approach to understanding processing stems from the administration and analysis of the many unique measures that have been offered as learning style or processing tests. This approach is seen as a nontraditional test approach (D'Amato, Rothlisberg, & Rhodes, 1997). These specialized measures of performance in learning do not fall neatly within the traditional domains of intelligence, achievement, or neuropsychological processing. These tests, including the Detroit Tests of Learning Aptitude-3 (DTLA-3; Hammill, 1991), the Swanson Cognitive Processing Test (S-CPT; Swanson, 1996), the Children's Auditory Verbal Learning Test-2 (Talley, 1993), and others can offer valuable information concerning how individuals attend to and deal with new information. While practitioners have used these instruments to document clients' strengths and weaknesses, diagnose problems, and chart the course of disorders, these instruments offer more practical information concerning rehabilitation or program planning than for diagnostic activities.

Neuropsychological tests have also been seen by some to evaluate variables related to learning processes. In fact, the Halstead-Reitan Neuropsychological Test Battery (Reitan & Wolfson, 1985, 1993) reportedly measures problem solving, tactual discrimination, sensory recognition, spatial memory, verbal-auditory discrimination, attention, nonverbal auditory discrimination, psychomotor speed, and manual dexterity as well as several other skills (D'Amato, 1990; Dean, 1985a, 1985b, 1986; Lezak, 1995). Individuals interested in a neuropsychological approach to processing should consult some of the recommended references and Chapters 10 and 11, this volume.

The area of learning processing is more difficult to evaluate and is often subsumed within the intelligence or achievement domains. One measure that has a long history in the evaluation of processing styles is the DTLA-3. This instrument was designed for use with individuals aged 6 to 17. More recently, two other versions of this test, the Detroit Tests of Learning Aptitude-Adult (Hammill & Bryant, 1991a) and Detroit Tests of Learning Aptitude-Primary (2nd ed.; Hammill & Bryant, 1991b) have expanded the usefulness of this instrument to include individuals from age 2 to 79. The DTLA-3 consists of 11 subtests comprising: Word Opposites, Design Sequences, Sentence Imitations, Reversed Letters, Story Construction, Design Reproduction, Basic Information, Symbolic Relations, Word Sequences, Story Sequences, Picture Fragments; and 16 composite scores: General Mental Ability Composite, Optimal Level Composite, Domain Composites (Verbal, Nonverbal, Attention-Enhanced, Attention-Reduced, Motor-Enhanced, Motor-Reduced), Theoretical Composites (Fluid Intelligence, Crystallized Intelligence, Associative Level, Cognitive Level, Simultaneous Processing, Successive Processing, Verbal Scale, and
Performance Scale). The testing time is estimated to vary from 50 minutes to two hours. The internal consistency reliabilities for the subtests are sufficiently high; however, the data on stability are limited. It was also noted that the factor analysis does not support the construct validity of the different composites. In fact, only four factors (one being a residual or difficult to interpret category) were identified in the manual. Despite these concerns, Poteat (1995) notes that the DTLA-3 can be recommended as an adjunct to some of the better developed measures of intelligence and it provides some potentially valuable information about diverse abilities. The DTLA has been especially helpful when evaluating children who suffer from learning disabilities or traumatic brain injuries. The Detroit Tests of Learning Aptitude-Adult (Hammill & Bryant, 1991a) comprises 12 subtests and 16 composites and measures areas similar to the DTLA-3. Internal consistency reliability of all scores approximates 0.9 for all ages. This instrument represents a useful tool in practice because of the type of information it can provide regarding client cognitive functioning in relation to learning new information.

A very recent contribution to the field of cognitive processing is the S-CPT (Swanson, 1996) which purports to measure different aspects of intellectual functioning and information processing potential. The battery, designed for use with persons age five to adulthood, draws from the work on information processing theory and dynamic assessment. The subtests in this measure are as follows: Rhyming Words, Visual Matrix, Auditory Digit Sequencing, Mapping and Directions, Story Retelling, Picture Sequencing, Phrase Recall, Spatial Organization, Semantic Association, Semantic Categorization, and Nonverbal Sequencing. This standardized test battery can be administered in an abbreviated form (five subtests) or in a complete form under traditional or interactive testing conditions. Normative data for the S-CPT were gathered on 1611 children and adults. The author reports high levels of internal reliability and high construct and criterion-related validity (Swanson, 1995). This instrument may offer a promising alternative to product-oriented evaluation strategies while still allowing for normative comparison.

4.09.3.3 Academic Achievement: Output

It is likely that those individuals who experience difficulties in processing or learning will display academic difficulties as well. Although some might hold that there is little difference in the measure of ability and the measure of achievement (e.g., Anastasi, 1988; Dean, 1977, 1983), it would seem that the operationalization of the two areas allows for a comparison of more generic problem-solving and verbal tasks to those directly involved in scholastic performance. Thus, a measure of ability may be conceived of as attempting to address the concept of underlying skills or capacities, whereas the measure of achievement is tied to the notion of the individual's proficiency in applying that ability in a functional way to real world skills (e.g., academics). A measure of academic achievement can help provide information as to the degree of impairment experienced by individuals, especially among children and adolescents. New learning, however, is not isolated to the school years. Adults required to learn new skills as part of job training, vocational rehabilitation, or after brain injuries are all placed in very real learning situations. It is critical to have an understanding of the client's basic skills in order to facilitate vocational, academic, and intervention decision making.

Assessment of academic achievement can occur through a blend of formal and informal measures as well. For example, in a school setting, reviewing student clients' work samples, interviewing the students and teacher about their learning and the classroom, and classroom observations can provide essential information. So too, curriculum-based measurement, where informal reading, writing, and math probes (Shinn, 1989) are obtained to determine the clients' current level of functioning and progress during intervention phases, is particularly useful for monitoring the effectiveness of treatment approaches to learning difficulties (Fuchs, 1994). There are also several types of norm-referenced instruments that are available, which, because of the availability of a standard normative base, permit comparison across a wide variety of curricular contexts (D'Amato, Rothlisberg, & Rhodes, 1997). Some of these instruments measure a particular area of achievement such as math or reading (e.g., Keymath Revised, Connolly, 1988; Test of Reading Comprehension-3, Brown, Hammill, & Wiederholt, 1995; Test of Written Language-3, Hammill & Larsen, 1996) while others provide a broad-based screening of a number of academic areas (e.g., Peabody Individual Achievement Test-Revised [PIAT-R], Markwardt, 1989; Woodcock–Johnson Psychoeducational Battery-Revised [WJPB-R], Woodcock & Johnson, 1989). These broad-based tests all have a similar organizational structure. For example, measures in a particular area, such as reading, are typically divided into basic skill areas (e.g., reading decoding) and some form of
applied skill area (e.g., reading comprehension) so that variations in the aspects of the academic tasks can be noted. The difference between measures often lies in the method by which they obtain their information (e.g., whether visual-motor or oral responses are required); that is, whether they require the examinee to indicate the response through nonverbal (e.g., pointing) or verbal output.

The achievement test that is designed for the broadest range of individuals is the WJPB-R with norms ranging from 2 to 95 years of age. The WJPB-R consists of both a cognitive and an achievement component. The tests of achievement are divided into a standard battery consisting of four broad areas: Reading (Letter-Word Identification, Passage Comprehension), Mathematics (Calculations, Applied Problems), Written Language (Dictation, Writing Samples), and Broad Knowledge (Science, Social Studies, Humanities). A supplemental battery is also available to expand the standard battery coverage. It includes Word Attack, Reading Vocabulary, Quantitative Concepts, Proofing, and Writing Fluency. Employing one or more of the supplemental subtests gives the examiner the option of computing additional areas of achievement such as Basic Reading Skills and Reading Comprehension, which is consistent with the language of the Individuals with Disabilities Education Act of 1990 and some state legislative guidelines for identifying specific areas of learning disability. This instrument is statistically sound and ample amounts of research have been conducted and support the use of this test.

Another measure of achievement, the PIAT-R (Markwardt, 1989) has also been supported as a well-developed and psychometrically sound instrument (Williams & Vincent, 1991). It consists of five subtest scores (General Information, Reading Recognition, Reading Comprehension, Mathematics, Spelling) that are provided in addition to the Total Reading and Total Test scores. A Written Expression and optional Written Language score are also available. The PIAT-R was normed for individuals aged 5–18 years. It is different from other tests in that it includes a larger pictorial component in its item types, letting children avoid the need for verbal reply, and instead expecting them to point at the correct answer (out of four) for reading, spelling, and mathematics items. Since the task demands for recognition of information do not appear to be the same as for recall, this response format may aid children with retrieval difficulties or those that have developed some background knowledge of the area in question. It should be noted, though, that this response-type advantage may not give a good indication of the expectations for student performance in the classroom where recall and more integrated answers are the norm (D'Amato, Rothlisberg, & Rhodes, 1997).

4.09.3.4 Learning: Implications for Intervention

A number of authors have related how knowledge of the way individuals process information can contribute to the development of treatment based on neuropsychological processes (D'Amato, 1990; Reynolds, 1981b, 1986; Telzrow, 1985). For example, when learning how to read, individuals who display a simultaneous/visual spatial strength in processing might benefit from being taught using a whole word approach whereas individuals who display a strength in sequential/auditory processing can be taught using a phonetic approach (Whitten, D'Amato, & Chittooran, 1992). For both children and adults, cognitive rehabilitation is an emerging discipline which includes the retraining or use of compensatory strategies in thinking and problem-solving skills (Wedding, Horton, & Webster, 1986). Cognitive retraining can include assistance in strategy development for attention and concentration, memory, language, perceptual and cognitive deficits, and social behavior. Thus, the term cognitive retraining encompasses all areas of functioning that may have been negatively impacted by neuropsychological disorders or traumatic brain injury (D'Amato, Rothlisberg, & Leu, in press; Gray & Dean, 1989). Assisting learners with cognitive remediation or compensation often includes the use of metacognitive strategies. Metacognition includes analyzing the processes an individual uses to generate an idea or thought. By receiving assistance in breaking down problems and understanding the processes needed to solve problems, clients may learn how to generalize the process to many problem types and improve overall learning and functioning. Although cognitive retraining is time consuming, the generalizability of the strategies has been seen as appropriate to many settings (Gray & Dean, 1989; Kavale, Forness, & Bender, 1988).

4.09.4 ASSESSMENT OF SPECIAL APTITUDES

In determining the basis for a client's difficulty, it is critical to explore the building blocks of memory and learning to obtain an understanding of how the individual processes information (sensory input). For instance, sensory and perceptual skills are essential to
receiving stimuli from the environment and making sense of what is received. So too, a clinician must examine the output or production that the client demonstrates in response to stimuli via action (e.g., motor skills) or communication (e.g., spoken language, writing). That is, clients may understand a task, but, because of integration difficulties or language impairments, be unable to demonstrate their knowledge. For example, the reproduction of a visual stimulus in response to a request involves both perceptual discrimination and fine motor development, as well as the ability to integrate visual, tactile, and auditory skills. Therefore, inadequate performance in copying geometric designs developed to assess these skills may stem from: a misperception, or faulty interpretation of the input information; problems in executing the fine motor response, or output; and/or difficulties integrating the input and output, otherwise known as integrative or central processing difficulties. By evaluating the domains of sensory perception, sensory-motor integration, and communication/language, the practitioner is in a better position to understand the client's ability to receive information adequately, integrate these basic skills, and demonstrate the products of memory and learning processes.

4.09.4.1 Sensory Perception

Perception of stimuli is a complex process involving many different aspects of brain functioning (Lezak, 1983). Typically, perception includes recognizing features and relationships among features. It is affected by context (figure–ground) and intensity, duration, significance, and familiarity of the stimuli (Ylvisaker, Szekeres, & Hartwick, 1994). Sensory perception skills are vital to an individual's understanding and response to the environment because they form the basis of each individual's interaction with the world (D'Amato, Rothlisberg, & Rhodes, 1997; Lezak, 1995). Difficulties may manifest themselves in the individual's ability to use information gained through the senses. For example, a client may be able to hear sounds well, but have trouble understanding what is heard (auditory processing). Likewise, a client may be able to see words clearly but have problems reproducing them when writing (visual-motor difficulties). Sensory perception tasks often form the foundation for the later performance of higher order cognitive skills. Without the ability to accurately sense and perceive cues from the environment, the learner is placed in the position of trying to decode a message when the code is scrambled and often changing.

In the assessment of sensory perception, it is important to evaluate visual, auditory, and haptic (tactile) functions. For children and older adults it is especially important that actual sensory deficits have been ruled out through the administration of a thorough vision and hearing test. If these senses appear to be intact, an in-depth evaluation of functioning is warranted. Sensory perception can be evaluated informally through clinical observations, formally through standardized tests, or via other methods of data collection. However, at times these strategies may prove inconclusive regarding the etiology of performance difficulties and more formal assessment is necessary to evaluate a client's functioning.

Several instruments are available to measure a client's functioning within this domain. Most are inexpensive and relatively quick to administer. Within the visual modality, the Motor-Free Visual Perception Test (MFVPT; Colarusso & Hammill, 1996) allows the clinician to evaluate visual perception without motor involvement in children. This 36-item measure assesses five facets of visual perception: spatial relations, visual discrimination, figure–ground, visual closure, and visual memory. The MFVPT is intended for children four to eight years of age. The MFVPT can offer information essential for the differential diagnosis of motor vs. visual processing problems. However, when used in isolation from other measures or techniques, the MFVPT offers information regarding visual processing difficulties but is unable to rule out motor concerns. For adults, the Benton Revised Visual Retention Test (Benton-Sivan, 1992) is a widely used measure of visuoperceptual ability, constructional skills, and immediate visual memory (Youngjohn, Larrabee, & Crook, 1993). Clients are required to reproduce abstract geometric designs from memory.

Some clients have difficulty discriminating sounds even when thresholds for sound perceptions are intact (Lezak, 1995). Auditory discrimination can be tested by having the client repeat words and phrases spoken by the clinician, or by asking the client whether two spoken words are the same or different. On this task, the clinician will want to use word pairs that sound alike such as "cat" and "cap" along with identical word pairs (Lezak, 1995). This technique has been formalized through the development of Wepman's Auditory Discrimination Test (Wepman & Reynolds, 1987) which allows the clinician to determine whether the client is able to discriminate similar sounding words adequately. Although the test was originally devised to identify auditory discrimination problems in young school children,
and the present norms were developed on samples of four to eight year olds, norms for the oldest age group (8–0 to 8–11) are adequate for adults since auditory discrimination is generally fully developed by this age (Lezak, 1995).

The perception of tactile stimuli is regularly measured as a component of a thorough neuropsychological examination, but less often in nonspecialized clinical settings. Informal strategies for evaluating this area include asking clients to indicate whether they feel the sharp or the dull end of a pin, pressure from one or two points (applied simultaneously and close together), or pressure from a graded set of plastic hairs, the "Von Frey hairs" (Lezak, 1995). The eyes should be closed or the hand kept out of sight when tactile sensory functions are tested. More formal measures include the Tactile Form Perception Test (Benton, Hamsher, Varney, & Spreen, 1983) and the Tactual Performance Test (Reitan & Davison, 1974). Deficits in tactile senses are often associated with damage to the right hemisphere of the brain and may have important implications for a client's vocational functioning (Lezak, 1995).

4.09.4.1.1 Sensory perception: implications for intervention

If the client is having difficulty in one or more areas of sensory perception, this information is critical for intervention planning. That is, the client's unique pattern of receiving information from the environment can be used to create effective education, rehabilitation, or therapeutic intervention. If a client is weak in auditory processing but strong in visual processing, for example, visual cues such as drawings, videos, or demonstrations may be the most effective means for training them in new skills.

4.09.4.2 Motor: Fine and Gross

The motor domain involves a range of both fine and gross motor movement. Fine motor skill is commonly thought of as movement which does not involve the entire body. Writing, opening a letter, or tying a shoe are all examples of fine motor movements. Gross motor movement involves large extremities and often the entire body. Activities such as walking or sitting down involve gross motor capacities. Intentional movement, using fine and gross motor skills, involves a series of brain-based systems and is learned with repetition. With repeated action, the movement becomes rote or, as Luria (1973) described it, a "kinesthetic melody." Movement also can consist of both discrete and continuous patterns. Movements that are discrete might involve something as simple as lifting a finger, while continuous movements include an integrated set of skills like skipping. Movements may be disrupted if damage exists in the premotor cortex where the "kinesthetic melody" is believed to be formed. If this occurs, the individual may not be able to perform serial-continuous movements but may be able to demonstrate the specific discrete movements. Because of the complexity of motor patterns, the individual's posture, movement in isolation, and movement in serial order should be assessed for possible intervention. This can be accomplished by observing individuals completing tasks such as writing their name (uses one hand), tying their shoes (uses both hands), and also performing novel tasks such as repeated tapping or clapping patterns. It should be noted if there is difficulty integrating the use of both hands. Both fine and gross motor skills should always be evaluated.

Although informal methods will yield a great deal of information regarding a client's fine and gross motor functioning, several standardized instruments are also available which measure various specific or broad components of motoric functioning. For example, the Bruininks–Oseretsky Test of Motor Proficiency (Bruininks, 1978) provides a comprehensive picture of an individual's motor development. The instrument was designed for children aged 4–5 to 14–15 and can be administered in 15–60 minutes, depending on whether the complete or short form of the battery is used. Three composite scores are provided in the areas of: Gross Motor Development (Running speed and agility, Balance, Bilateral coordination, Strength), Gross and Fine Motor Development (Upper-limb coordination), and Fine Motor Development (Response speed, Visual-motor control, Upper-limb speed and dexterity). Specific fine motor abilities can also be measured by using the Finger Tapping Test and the Grip Strength Test which are both a part of the Halstead-Reitan Battery (Reitan & Wolfson, 1993).

To measure a client's lateral preference, the Lateral Preference Schedule (Dean, 1988) can be administered to obtain a better understanding of clients' lateral preference in the use of their eyes, ears, arms, hands, and feet (Rothlisberg, 1991). Atypical patterns of lateral preference have been hypothesized to indicate potential predictors of reading difficulty (Bemporad & Kinsbourne, 1983; Dean, Schwartz, & Smith, 1981). Determining lateral preference can be useful in interpreting assessment findings and in creating a plan for rehabilitation (Lezak, 1995).

4.09.4.3 Sensory-motor Integration

An additional component of our motoric functioning is the ability to integrate what is received by the senses with what is produced through action. For example, an individual may be able to perceive letters correctly and have adequate fine motor control, but still have difficulty correctly copying material presented in visual form. Numerous paper-and-pencil tests have been developed to assess motor function as it relates to visual-motor integration. Two of the most popular measures for this purpose are the Bender Visual-Motor Gestalt (Bender, 1938) and the Developmental Test of Visual-Motor Integration (VMI; Beery, 1989).

The Visual-Motor Gestalt Test (Bender, 1938) is an individually administered test containing nine geometric figures which the client copies on to a blank sheet of paper. While historically this test was seen as a general measure of organicity, it is more appropriately used as a measure of visual-motor skills. Standard scores are provided in the developmental scoring system for children ages 5–0 to 11–11, although it is frequently used with adults as well. Most commonly known as the "Bender," this measure is perhaps the best known and most widely used visual-motor assessment procedure available today (Bender, 1938; Reynolds & Kamphaus, 1990). As a component of a comprehensive assessment battery, performance on the Bender has long been thought to reveal visual-motor difficulties that may be associated with cerebral impairment (Sattler, 1992). Traditionally used to assess an individual's constructional praxic skills, the Bender provides an evaluation of motor integration employed in the execution of complex learned movements (Hartlage & Golden, 1990). The information generated through this process may then be compared with levels of performance across other measures of functioning.

Alternate uses of the Bender include its administration as a memory test as well as a copying test. This dual administration process can be employed to assess different mental functions (short-term visual memory and visual perception) which utilize the same modalities in perception and task execution (Sattler, 1992). An additional technique available when interpreting the Bender performance is to have individuals compare the figure which they produced with the corresponding stimulus design. If the client is unable to recognize obvious differences between the two designs, a perceptual deficit may be involved. Likewise, if the client is able to detect a difference between the two figures, but is unable to make them alike, motor involvement may be influencing performance (Hartlage & Golden, 1990). In the personality area, performance on the Bender may also be used to develop hypotheses regarding impaired performance due to poor planning, impulsivity, or compulsivity. Extremely large or small figures, heavily reinforced lines, and second attempts are examples of the item reproduction difficulties which are thought to indicate emotional concerns on the part of the individual.

The VMI is an individual or group administered test that involves copying a sequence of 24, increasingly complex, geometric figures. The test requires a relatively short administration time and is designed primarily for ages 4 to 13. The VMI offers several advantages as a tool for assessment and is widely used in psychological evaluation and research. The most common use of the VMI, now in its third edition, seems to be in assisting with the diagnosis of children who are suspected of having learning problems due to visual-motor difficulties. The VMI is also frequently employed when investigating the reliability and validity of other tests of visual-motor integration, such as the Bender, self drawing tasks, progressive matrices, and neuropsychological tests (Goldstein, Smith, & Waldrep, 1986; Palisano & Dichter, 1989). Because the VMI does not require a verbal response, it has also been used to assess visual-motor processes among non-English-speaking children (Brand, 1991; Frey & Pinelli, 1991).

4.09.4.3.1 Motor: implications for intervention

Motor problems and sensory-motor integration difficulties can impair a client's ability to write or to learn new skills requiring motor coordination, and generally can have a negative impact on daily functioning. In the classroom setting, possible suggestions for accommodating these difficulties might include modifying instructions to compensate for motor difficulties (e.g., allowing pointing to the correct choice rather than writing, allowing students to tape record notes or copy them from others). In a rehabilitation setting, the client may need to learn alternative methods for writing such as using word processing programs on a computer or, if serious difficulties exist, using voice-activated programs. Consultation with an occupational therapist or a physical therapist will be extremely helpful in treatment planning when motor difficulties are evident.

4.09.4.4 Communication/Language

Language is the basic tool of human communication and hence essential to evaluate

when working with any client (Black & Strub, 1994). It should be viewed as a key skill because it serves as a primary means of conveying information from the individual to others and from others to the individual. Thus, communication difficulties have the power to influence all areas of life (D'Amato, Rothlisberg, & Rhodes, 1997). Difficulties or dysfunction found on tests of higher level functioning (e.g., learning processes) may well be secondary to a language disorder. Accordingly, language should be evaluated early in the course of an assessment to rule out problems in this area of functioning (Black & Strub, 1994; Lezak, 1995). Another obvious reason to evaluate language and communication skills is that language disorders occur as the result of a wide range of neurologic diseases and can manifest in a variety of forms of aphasia (e.g., Wernicke's, anomia, global, alexia, agraphia; Kolb & Whishaw, 1990). To aid in the interpretation of test findings, it is important for the clinician to be familiar with the various classic clinical aphasia presentations (see Gaddes & Edgell, 1994; Kolb & Whishaw, 1990; Lezak, 1995).

The language evaluation should be systematic and include an assessment of a range of relatively specific language functions. Assessment must evaluate both receptive and expressive verbal and nonverbal abilities to determine if adaptations are needed to enhance the individual in academic, vocational, and social situations. As part of an informal evaluation of expressive language, the clinician will want to evaluate spontaneous speech and verbal fluency by asking the client open-ended questions (Black & Strub, 1994). While the client is responding, the clinician can listen carefully for abnormal articulation, dysarthria (incoordination of the speech apparatus), verbal apraxia (difficulty carrying out purposeful speech), dysfluency, loss of prosody (melodic intonation), and disturbances of syntax or paraphasic errors (production of unintended syllables, words, or phrases) (Black & Strub, 1994; Kolb & Whishaw, 1990). Another important component of language, pragmatics, can also be evaluated. Pragmatics refers to the knowledge and activities of socially appropriate communication, which takes in much of the nonverbal aspects of communication, such as gestures, loudness of speech, as well as verbal appropriateness (Sohlberg & Mateer, 1990).

Evaluation of verbal fluency can be accomplished by counting the number of words the client is able to produce without repetition within a restricted category (e.g., animals or words beginning with a particular letter) and time (e.g., 60 seconds). The average adult should produce approximately 20 animal names and a total of 40–60 words with performance depending to some degree on the client's intelligence, education, and social/linguistic background (Black & Strub, 1994). Additional methods of examining expressive language include having the client repeat back meaningful verbal phrases or sentences of increasing length and semantic complexity. Word finding and naming difficulties can be detected by the client's responses to the open-ended questions or by having the client describe a picture containing a series of objects or actions (Black & Strub, 1994).

Another major area of language functioning is receptive language or an individual's ability to understand what has been said. A comprehensive assessment should include an evaluation of the individual's ability to analyze and integrate information presented in a verbal format, since a common difficulty among those experiencing traumatic head injury is a decreased capacity to coordinate the social aspects of language (Ylvisaker, Szekeres, Haarbauer-Krupa, Urbanczyk, & Feeney, 1994). It is not sufficient to evaluate language comprehension based on open-ended questions because this method relies on expressive skills and does not examine comprehension in isolation (Black & Strub, 1994). Language comprehension can be evaluated informally by asking the client to point to common objects in the room or by asking a series of increasingly complex questions that require only a "yes" or "no" response (e.g., "Do dogs have four legs?") (Black & Strub, 1994). An evaluation of a client's reading and writing skills could also be included in an evaluation of language and communication (Black & Strub, 1994). The client can be asked to read sentences of increasing difficulty, spell words to dictation, and compose a paragraph in response to a prompt (e.g., "Tell me how to change a tire.") (Black & Strub, 1994).

In addition to the informal methods, several instruments are available that can prove useful for the clinical evaluation of language. There are a number of aphasia tests and batteries (e.g., Boston Diagnostic Aphasia Examination, Goodglass & Kaplan, 1983; Multilingual Aphasia Examination, Benton & Hamsher, 1989) which involve lengthy, well-controlled procedures and are best left to speech pathologists who have more extensive training in the specialized techniques of aphasia examinations (Lezak, 1995). As an alternative, aphasia screening tests can be used to indicate the presence of an aphasic disorder and may even highlight its specific characteristics, but do not provide the fine discriminations of the complete aphasia test batteries. Furthermore, these screening tests do not require technical knowledge of speech

pathology for adequate administration or determination of whether a significant aphasic disorder is present (Lezak, 1995). One of the most comprehensive aphasic screening tests available is the Revised Token Test (McNeil & Prescott, 1978). This expanded version of the original Token Test (De Renzi & Vignolo, 1962) contains ten 10-item subtests. McNeil and Prescott (1978) sought to ameliorate psychometric weaknesses of the original with this revision as well as seeking to develop an evaluative system for describing the nature and quantifying the degree of language deficit in order to facilitate treatment planning. Using tokens of various shapes and sizes, the clinician gives the client a series of increasingly complex instructions to follow. Though simple to administer, this instrument is reportedly very sensitive to disrupted linguistic processes that are central to the aphasic disability (Lezak, 1995).

Clinicians wanting a basic measure of different aspects of communication and language may wish to consider using the Peabody Picture Vocabulary Test-Revised (PPVT-R; Dunn & Dunn, 1981), the Test of Language Development (TOLD-2; Hammill & Newcomer, 1988), and the Clinical Evaluation of Language Fundamentals-Revised (Semel, Wiig, & Secord, 1987) or the Test of Language Competence (TLC; Wiig & Secord, 1989). Unfortunately, most of these tests are normed exclusively on children and adolescents and, therefore, have limited application to adults. One of the most frequently used tests, and one which has adult norms, is the PPVT-R. This test measures receptive vocabulary only and was normed for individuals aged two and a half to adulthood. The PPVT-R is untimed and requires the examinee to select from each plate of four pictures the one that best represents the target word. The test requires no reading ability, nor is the ability to point or provide an oral response essential (Shea, 1989). The PPVT-R can help to establish the level of verbal understanding a client has when expressive language is not required. Comparing such receptive skills with those expressive skills needed for other tests may help in developing hypotheses about the qualitative nature of verbal performance and in framing potential treatment (D'Amato, Rothlisberg, & Rhodes, 1997).

The TOLD-2 is available in a form designed for primary ages (4–0 to 8–11) and intermediate ages (8–6 to 12–11). This test purports to measure receptive and expressive language proficiency. The results for the TOLD-2 provide quotients for an Overall Spoken Language score, and for the composites of Listening, Speaking, Semantics, Syntax, and Phonology (on the primary form only). Overall, the TOLD-2 instruments are reliable and valid as language screening tools for younger clients (Wochnick Fodness, McNeilly, & Bradley-Johnson, 1991; Westby, 1988). Some tests have been developed to measure more complex language usage, such as the TLC (Wiig & Secord, 1989) which purports to measure metalinguistic abilities. The four subtests involve producing multiple meanings for ambiguous sentences, recognizing inferences on the basis of incomplete information, creating sentences given three words and a context, and recognizing the meaning of figurative language. This type of test may be useful for identifying subtle problems in language usage (Crosson, 1996).

4.09.4.4.1 Communication/language: implications for intervention

For clients who are experiencing difficulty with either or both receptive and expressive language, the clinician can modify verbal interaction by shortening the length of information presented or presenting information in steps. Additional ideas for the school-age client might include recommending that the teacher repeat directions and have the student also repeat and explain the directions back to the teacher, pairing verbal instructions with nonverbal cues, and using nonverbal cues. Many of these strategies could be adapted to adults in rehabilitation and other types of therapeutic settings as well. The clinician must be careful to check frequently with clients to ensure understanding and to assist the clients and their families to adjust to these communication or language deficits.

4.09.5 FUTURE DIRECTIONS

Although knowledge about how individuals process information has grown exponentially since the late 1970s, researchers and practitioners alike are left with many questions regarding how individuals remember and learn. How do age, gender, and ethnicity impact a client's functioning on these specific instruments? Do individuals with brain damage process information differently than individuals with "normal" brain functioning? How do the results of an assessment translate into effective treatment strategies that will help individuals function better on the job? Despite these questions and more, as a field we do know that current measures of processing can allow practitioners to make predictions with a reasonable level of confidence. However, we must also recognize that future research investigating the

prediction accuracy of various tests is needed to expand the range and precision of clinical prediction (Long, 1996). Indeed, limited research exists that evaluates the effect of various treatment approaches on success in clinical pediatric or adult populations (Batchelor, 1996b; Ris & Noll, 1994). As our knowledge of the nervous system is expanded, and we begin to understand the intricacies of the brain's organization, we can begin to see how perceptions are formed, information stored and integrated, and action taken. Until that time, the explanation for behavior and certain learning difficulties can only be inferred (D'Amato, Rothlisberg, & Leu, in press).

Another component complicating our enhanced understanding of information processing is the need for common language and goals between neurologists, psychologists, educational researchers, and vocational rehabilitation specialists. Bigler (1996) notes that it is

essential that physicians and psychologists work toward some common understanding of normal and abnormal behavioral manifestations of brain functioning, particularly aspects of complex attention and integration of sensory experiences, memory, motivation, organization of verbal and nonverbal cognition, abstract thinking, problem solving, executive functions, and self-monitoring of behavior. Without agreement on a detailed and relatively comprehensive model of neurobehavioral development, psychologists will be limited in their ability to develop appropriate assessment and intervention strategies. (p. 50)

Unfortunately, outcomes in rehabilitation research have also suffered from a lack of consensus with importance placed on different variables depending on the orientation of the author and audience (Batchelor, 1996b). For example, clinical researchers have examined pre- and postperformance measures of cognitive, motivational, and behavioral functions (Ris & Noll, 1994), while service providers have focused on outcome constructs such as employment and independent living (e.g., Adunsky, Hershkowitz, Rabbi, Asher-Sivron, & Ohry, 1992). Concurrently, third-party payers are interested in length of stay, cost, and effectiveness in allocation of resources (Frattali, 1993), while families and consumers are interested in quality of care. Batchelor (1996b) concluded that the majority of research has emphasized short-term outcomes and the meaningful questions generated by service providers, consumers, and third-party payers have been overlooked and present an important direction for the field to pursue.

In terms of the practical aspects of assessment, the practitioner's task is not an easy one. In order to generate an accurate diagnosis or provide the most sound recommendations, a complete understanding of the client is necessary. A practitioner could spend hours evaluating each of the areas outlined with careful consideration of all subdomains using a multidimensional approach. Although this approach would yield a bounty of information, it may not be practical given time limitations and insurance policy guidelines. The key for the clinician is to find a balance between finding out the most important information about client functioning through the use of instruments with the best predictive ability and spending a limited amount of time on assessment. By creating an efficient and effective assessment approach, more time is available to implement treatment. The task of generating recommendations for interventions that are likely to enhance client functioning is of central importance to the issue of assessment. To this end, continued information is needed on the relationship between assessed cognitive processing and predicted future performance in real-world settings (Sbordone & Long, 1996). Furthermore, we must gain knowledge about the most effective intervention strategies for all types of individuals. Future research will enable us to understand how the brain processes information and what treatments are effective with what types of clients. Indeed, this additional knowledge may allow us to match client subtypes with specific treatments which will increase the effectiveness and efficacy of psychological services.

4.09.6 SUMMARY

The assessment of children, adolescents, and adults encompasses a wide range of domains from which a clinician may view a client's functioning. Consideration must be given to the many layers of the client context (e.g., family support, socioeconomic status, domain of functioning), client characteristics (e.g., motivation, education level, ethnicity, age), as well as the specific cognitive processes under question (e.g., memory, sensory perception). Although often overlooked or subsumed within the broader arenas of intelligence or achievement, the specific areas of memory, learning, and special aptitudes are critical to our daily functioning. That is, one will have difficulty demonstrating intelligence or learning new tasks, if there is a severe memory deficit or difficulty in accurately perceiving stimuli. The descriptions of the domains presented in this chapter have offered insight into the field's current understanding of these systems as well as the breadth of evaluation strategies available

to probe the diverse nature of cognitive processes. Once the practitioner identifies the assessment needs of the individual client, and a decision is made as to the components most relevant for exploring the referral question, the process can begin. By generating quality data, our ability to predict outcomes and provide effective potential intervention strategies is increased. Indeed, the goal of any assessment is to respond correctly to the question presented by the referral source, provide accurate predictions of future outcomes, and generate effective strategies for improving the client's adaptation or functioning.

4.09.7 REFERENCES

Adunsky, A., Hershkowitz, M., Rabbi, R., Asher-Sivron, L., & Ohry, A. (1992). Functional recovery in young stroke patients. Archives of Physical Medicine and Rehabilitation, 73, 859–862.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Arter, J. A., & Jenkins, J. R. (1977). Examining the benefits and prevalence of modality considerations in special education. The Journal of Special Education, 11, 291–298.
Baddeley, A. (1986). Working memory. Oxford, UK: Oxford University Press.
Baddeley, A., Emslie, H., & Nimmo-Smith, I. (1994). Doors and People: A Test of Visual and Verbal Recall and Recognition. Suffolk, UK: Thames Valley Test Co.
Baldwin, J. D., & Baldwin, J. I. (1986). Behavior principles in everyday life (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
Barkley, R. A. (1996). Critical issues in research on attention. In G. R. Lyon & N. A. Krasnegor (Eds.), Attention, memory, and executive function (pp. 45–56). Baltimore: Brookes.
Batchelor, E. S. (1996a). Neuropsychological assessment of children. In E. S. Batchelor & R. S. Dean (Eds.), Pediatric neuropsychology: Interfacing assessment and treatment for rehabilitation (pp. 9–26). Boston: Allyn & Bacon.
Batchelor, E. S. (1996b). Future considerations for rehabilitation research and outcome studies. In E. S. Batchelor & R. S. Dean (Eds.), Pediatric neuropsychology: Interfacing assessment and treatment for rehabilitation (pp. 347–352). Boston: Allyn & Bacon.
Beery, K. E. (1989). Developmental Test of Visual-Motor Integration. Odessa, FL: Psychological Assessment Resources.
Begali, V. (1992). Head injury in children and adolescents: A resource and review for school and allied professionals (2nd ed.). Brandon, VT: Clinical Psychology Publishing Company.
Begali, V. (1994). The role of the school psychologist. In R. C. Savage & G. F. Wolcott (Eds.), Educational dimensions of acquired brain injury (pp. 453–473). Austin, TX: PRO-ED.
Bemporad, B., & Kinsbourne, M. (1983). Sinistrality and dyslexia: A possible relationship between subtypes. Topics in Learning and Learning Disabilities, 3(1), 48–65.
Bender, L. (1938). A visual motor gestalt test and its clinical use. American Orthopsychiatric Association Research Monograph, No. 3. New York: American Orthopsychiatric Association.
Benton, A. L., & Hamsher, K. deS. (1989). Multilingual Aphasia Examination. Iowa City, IA: AJA Associates.
Benton, A. L., Hamsher, K. deS., Varney, N. R., & Spreen, O. (1983). Contributions to neuropsychological assessment. New York: Oxford University Press.
Benton-Sivan, A. (1992). The Revised Visual Retention Test (5th ed.). New York: The Psychological Corporation.
Berk, R. A. (1995). Review of the Memory Assessment Scale. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurement yearbook (pp. 593–594). Lincoln, NE: Buros.
Bigler, E. D. (1996). Bridging the gap between psychology and neurology: Future trends in pediatric neuropsychology. In E. S. Batchelor & R. S. Dean (Eds.), Pediatric neuropsychology: Interfacing assessment and treatment for rehabilitation (pp. 27–54). Boston: Allyn & Bacon.
Black, F. W., & Strub, R. L. (1994). The bedside and office mental status examination. In S. Touyz, D. Byrne, & A. Gilandas (Eds.), Neuropsychology in clinical practice (pp. 38–60). Boston: Academic Press.
Brand, H. J. (1991). Correlation for scores on revised tests of visual-motor integration and copying test in a South African sample. Perceptual and Motor Skills, 73, 225–226.
Brown, R. T., Armstrong, F. D., & Eckman, J. R. (1993). Neurocognitive aspects of pediatric sickle cell disease. Journal of Learning Disabilities, 26, 33–45.
Brown, V. L., Hammill, D. D., & Wiederholt, J. L. (1995). Test of Reading Comprehension-3 (TORC-3). Austin, TX: PRO-ED.
Bruininks, R. H. (1978). Bruininks–Oseretsky Test of Motor Proficiency. Circle Pines, MN: American Guidance Service.
Campbell, J. W., D'Amato, R. C., Raggio, D. J., & Stephens, K. D. (1991). Construct validity of the computerized Continuous Performance Test with measures of intelligence, achievement, and behavior. Journal of School Psychology, 29, 143–150.
Campione, J. C., & Brown, A. L. (1987). Linking dynamic assessment with school achievement. In C. S. Lidz (Ed.), Dynamic assessment: Foundations and fundamentals (pp. 82–115). New York: Guilford.
Cantor, J., Engle, R. W., & Hamilton, G. (1991). Short-term memory, working memory, and verbal abilities: How do they relate? Intelligence, 15, 229–246.
Cohen, S. B. (1991). Adapting educational programs for students with head injuries. Journal of Head Trauma Rehabilitation, 1, 56–63.
Colarusso, R. P., & Hammill, D. D. (1996). Motor-Free Visual Perception Test-Revised (MFPT-R). Novato, CA: Academic Therapy.
Cole, E., & Siegel, J. A. (1990). School psychology in a multicultural community: Responding to children's needs. In E. Cole & J. A. Siegel (Eds.), Effective consultation in school psychology (pp. 141–169). Toronto, ON: Hogrefe & Huber.
Colvin, S. S. (1921). Intelligence and its measurement: A symposium (IV). Journal of Educational Psychology, 12, 136–139.
Connolly, A. J. (1988). Keymath-revised: A diagnostic inventory of essential mathematics. Circle Pines, MN: American Guidance Service.
Cooley, E. L., & Morris, R. D. (1990). Attention in children: A neuropsychology based model of assessment. Developmental Neuropsychology, 6, 239–274.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods. A handbook for research on interactions. New York: Irvington.
Crosson, B. (1996). Assessment of subtle language deficits in neuropsychological batteries: Strategies and implications. In R. J. Sbordone & C. J. Long (Eds.), Ecological

validity of neuropsychological testing (pp. 243–259). difficulties. Journal of Consulting and Clinical Psychol-
Delray Beach, FL: GR Press/St Lucie Press. ogy, 49, 227–235.
D'Amato, R. C. (1990). A neuropsychological approach to Dearborn, W. F. (1921). Intelligence and its measurement:
school psychology. School Psychology Quarterly, 5, A symposium (XII). Journal of Educational Psychology,
141–160. 12, 210–212.
D'Amato, R. C., & Dean, R. S. (Eds.) (1989a). The school De Renzi, E., & Vignolo, L. A. (1962). The Token Test: A
psychologist in nontraditional settings: Integrating clients, sensitive test to detect disturbances in aphasics. Brain, 85,
services, and settings. Hillsdale, NJ: Erlbaum. 665–678.
D'Amato, R. C., & Dean, R. S. (1989b). The past, present, Drew, R. H., & Templer, D. I. (1992). Contact sports. In D.
and future of school psychology in nontraditional I. Templer, L. C. Hartlage, & W. G. Cannon (Eds.),
settings. In R. C. D'Amato & R. S. Dean (Eds.), The Preventable brain damage: Brain vulnerability and health
school psychologist in nontraditional settings: Integrating (pp. 15–29). New York: Springer.
clients, services, and settings (pp. 185–209). Hillsdale, NJ: Dunn, L. M., & Dunn, L. M. (1981). Peabody Picture
Erlbaum. Vocabulary Test-Revised. Circle Pines, MN: American
D'Amato, R. C., & Rothlisberg, B. A. (1992). Psychological Guidance Service.
perspectives on intervention: A case study approach to Eliason, M. J., & Richman, L. C. (1987). The Continuous
prescriptions for change. New York: Longman. Performance Test in learning disabled and nondisabled
D'Amato, R. C., & Rothlisberg, B. A. (1996). How children. Journal of Learning Disabilities, 20, 614–619.
education should respond to students with traumatic Feuerstein, R., Rand, Y., & Hoffman, M. (1979). The
brain injuries. Journal of Learning Disabilities, 29, dynamic assessment of retarded performers: The Learning
670–683. Potential Assessment Device: Theory, instruments, and
D'Amato, R. C., Rothlisberg, B. A., & Leu, P. W. (in techniques. Baltimore: University Park.
press). Neuropsychological assessment for intervention. Figueroa, R. A., & Garcia, E. (1994). Issues in testing
In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook students from culturally and linguistically diverse back-
of school psychology (3rd ed.). New York: Wiley. grounds. Multicultural Education, 2, 10–19.
D'Amato, R. C., Rothlisberg, B. A., & Rhodes, R. L. Frattali, C. M. (1993). Perspectives on functional assess-
(1997). Utilizing a neuropsychological paradigm for ment: Its use for policy making. Disability and Rehabi-
understanding common educational and psychological litation, 15, 1–9.
tests. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Frey, P. D., & Pinelli, B. (1991). Visual discrimination and
Handbook of clinical child neuropsychology (2nd ed.). visuomotor integration among two classes of Brazilian
New York: Plenum. children. Perceptual and Motor Skills, 72, 847±850.
Dana, R. H. (1993). Multicultural assessment perspectives Fuchs, L. S. (1994). Integrating curriculum-based measure-
for professional psychology. Boston: Allyn & Bacon. ment with instructional planning for students with
Daneman, M., & Carpenter, P. A. (1980). Individual learning disabilities. In N. C. Jordan & J. Goldsmith-
differences in working memory and reading. Journal of Phillips (Eds.), Learning disabilities: New directions for
Verbal Learning and Verbal Behavior, 19, 450±466. assessment and intervention (pp. 177±195). Boston: Allyn
Das, J. P., Kirby, J., & Jarman, R. F. (1979). Simultaneous & Bacon.
and successive cognitive processes. New York: Academic Gaddes, W. H., & Edgell, D. (1994). Learning disabilities
Press. and brain function: A neuropsychological approach (3rd
Das, J. P., Naglieri, J. A., & Kirby, J. R. (1994). Assessment ed.). New York: Springer-Verlag.
of cognitive processes. The PASS theory of intelligence Geil, M., & D'Amato, R. C. (1996). Contemporary
New York: Allyn & Bacon. ecological neuropsychology: An alternative to the medical
Dean, R. S. (1977). Canonical analysis of a jangle fallacy. model for conceptualizing learning disabilities. Manu-
Multivariate Experimental Clinical Research, 3, 17±20. script submitted for publication.
Dean, R. S. (1983). Intelligence-achievement discrepancies Golden, C. J. (1981). The Luria±Nebraska Children's
in diagnosing pediatric learning disabilities. Clinical Battery: Theory and formulation. In G. W. Hynd & J.
Neuropsychology, 3, 58±62. E. Obrzut (Eds.), Neuropsychological assessment and the
Dean, R. S. (1984). Functional lateralization of the brain. school-aged child: Issues and procedures (pp. 277±302).
Journal of Special Education, 18, 239±256. New York: Grune & Stratton.
Dean, R. S. (1985a). Neuropsychological assessment. In Golden, C. J., Sawicki, R. F., & Franzen, M. D. (1984).
R. Michels, J. O. Cavenar, H. K. H. Brodie, A. M. Test construction. In G. Goldstein & M. Hersen (Eds.),
Cooper, S. B. Guze, L. L. Judd, G. L. Klerman, & A. J. Handbook of psychological assessment (pp. 19±37). New
Solnit (Eds.), Psychiatry (pp. 1±16). Philadelphia: York: Pergamon.
Lippincott. Goldstein, D. J., Smith, K. B., & Waldrep, E. E. (1986).
Dean, R. S. (1985b). Foundation and rationale for Factor analytic study of the Kaufman Assessment
neuropsychological bases of individual differences. In Battery for Children. Journal of Clinical Psychology,
L. C. Hartlage & C. F. Telzrow (Eds.), The neuropsy- 42, 890±894.
chology of individual differences: A developmental per- Goodglass, H., & Kaplan, E. (1983). Boston Diagnostic
spective (pp. 7±39). New York: Plenum. Aphasia Examination (BDAE). Philadelphia: Lea and
Dean, R. S. (1986). Perspectives on the future of Febiger. Distributed by Psychological Assessment Re-
neuropsychological assessment. In B. S. Plake & J. C. sources, Odessa, FL.
Witt (Eds.), Buros-Nebraska series on measurement and Gray, J. W., & Dean, R. S. (1989). Approaches to the
testing: Future of testing and measurement (pp. 203±241). cognitive rehabilitation of children with neuropsycholo-
Hillsdale, NJ: Erlbaum. gical impairment. In C. R. Reynolds & F. Fletcher-
Dean, R. S. (1988). Lateral Preference Schedule. Odessa, Janzen (Eds.), Handbook of clinical child neuropsychology
FL: Psychological Assessment Resources. (pp. 397±408). New York: Plenum.
Dean, R. S., & Gray, J. W. (1990). Traditional approaches Greenberg, L. (1993). Test of variables of attention
to neuropsychological assessment. In C. R. Reynolds & (T.O.V.A.TM). Wood Dale, IL: Stoetling.
R. W. Kamphaus (Eds.), Handbook of psychological and Greenberg, L. M., & Waldman, I. D. (1993). Develop-
educational assessment of children: Intelligence and mental normative data on the test of variables of
achievement (pp. 371±388). New York: Guilford Press. attention (T.O.V.A.TM). Journal of Child Psychology
Dean, R. S., Schwartz, N. H., & Smith, L. S. (1981). and Psychiatry and Allied Disciplines, 34, 1019±1030.
Lateral preference patterns as a discriminator of learning Guilmette, T. J., & Giuliano, A. J. (1991). Taking the
References 263

stand: Issues and strategies in forensic neuropsychology. The Clinical Neuropsychologist, 5, 197–219.
Gutkin, T. B., & Reynolds, C. R. (Eds.) (1990). The handbook of school psychology (2nd ed.). New York: Wiley.
Halperin, J. M., Sharma, V., Greenblatt, E., & Schwartz, S. (1991). Assessment of the Continuous Performance Test: Reliability and validity in a nonreferred sample. Psychological Assessment, 3, 603–608.
Hammill, D. D. (1991). Detroit Tests of Learning Aptitude (DTLA-3) (3rd ed.). Austin, TX: PRO-ED.
Hammill, D. D., & Bryant, B. R. (1991a). Detroit Tests of Learning Aptitude-Adult (DTLA-A). Austin, TX: PRO-ED.
Hammill, D. D., & Bryant, B. R. (1991b). Detroit Tests of Learning Aptitude-Primary (DTLA-P:2) (2nd ed.). Austin, TX: PRO-ED.
Hammill, D. D., & Larsen, S. C. (1996). Test of Written Language-3 (TOWL-3). Austin, TX: PRO-ED.
Hammill, D. D., & Newcomer, P. L. (1988). Test of Language Development Intermediate (TOLD-2) (2nd ed.). Austin, TX: PRO-ED.
Hamsher, K. de S. (1984). Specialized neuropsychological assessment methods. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (pp. 235–256). New York: Pergamon.
Hartlage, L. C., & Golden, C. J. (1990). Neuropsychological assessment techniques. In T. B. Gutkin & C. R. Reynolds (Eds.), The handbook of school psychology (2nd ed., pp. 431–457). New York: Wiley.
Hartlage, L. C., & Telzrow, C. F. (1983). The neuropsychological basis of educational intervention. Journal of Learning Disabilities, 16, 521–528.
Hooper, S. R. (1995). Review of the Visual Search and Attention Test. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 1081–1082). Lincoln, NE: Buros.
Huebner, E. S. (1992). Review of the Wechsler Memory Scale-Revised. In J. J. Kramer & J. C. Conoley (Eds.), The eleventh mental measurements yearbook (pp. 1023–1024). Lincoln, NE: Buros.
Hynd, G. W., & Semrud-Clikeman, M. (1990). Neuropsychological assessment. In A. S. Kaufman (Ed.), Assessing adolescent and adult intelligence (pp. 638–695). Boston: Allyn & Bacon.
Hynd, G. W., & Willis, W. G. (1988). Pediatric neuropsychology. Boston: Allyn & Bacon.
Jarvis, P. E., & Barth, J. T. (1994). The Halstead-Reitan Neuropsychological Battery: A guide to interpretation and clinical applications. Odessa, FL: Psychological Assessment Resources.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122–149.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn & Bacon.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Kavale, K. A., Forness, R. F., & Bender, M. (1988). Handbook of learning disabilities: Volume II: Methods and interventions. Boston: College-Hill.
Klee, S. H., & Garfinkel, B. D. (1983). The computerized Continuous Performance Task: A new measure of inattention. Journal of Abnormal Child Psychology, 11, 489–495.
Kolb, B., & Whishaw, I. Q. (1990). Fundamentals of human neuropsychology (3rd ed.). New York: Freeman.
Kovacs, M., Goldston, D., & Iyengar, S. (1992). Intellectual development and academic performance of children with insulin-dependent diabetes mellitus: A longitudinal study. Developmental Psychology, 28, 676–684.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
LaBerge, D. (1995). Attentional processing: The brain's art of mindfulness. Cambridge, MA: Harvard University Press.
Lassiter, K. S., D'Amato, R. C., Raggio, D. J., Whitten, J. C. M., & Bardos, A. N. (1994). The construct specificity of the Continuous Performance Test: Does inattention relate to behavior and achievement? Developmental Neuropsychology, 10, 179–188.
Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Lindgren, S. D., & Lyon, D. (1983). PACE: Pediatric assessment of cognitive efficiency. Iowa City, IA: University of Iowa, Department of Pediatrics.
Long, C. J. (1996). Neuropsychological tests: A look at our past and the impact that ecological issues may have on our future. In R. J. Sbordone & C. J. Long (Eds.), Ecological validity of neuropsychological testing (pp. 1–14). Delray Beach, FL: GR Press/St Lucie Press.
Luria, A. R. (1970). The functional organization of the brain. Scientific American, 222(3), 66–78.
Luria, A. R. (1973). The working brain: An introduction to neuropsychology. New York: Basic Books.
Luria, A. R. (1980). Higher cortical functions in man (2nd ed.). New York: Basic Books.
Markwardt, F. C. (1989). Peabody Individual Achievement Test-Revised (PIAT-R). Circle Pines, MN: American Guidance Service.
Martinez, M. A. (1985). Toward a bilingual school psychology model. Educational Psychology, 20, 143–152.
Mastropieri, M. A., & Scruggs, T. E. (1989). Constructing more meaningful relationships: Mnemonic instruction for special populations. Educational Psychology Review, 1, 83–111.
McNeil, M. M., & Prescott, T. E. (1978). Revised Token Test. Austin, TX: PRO-ED.
Morris, R. D. (1996). Relationships and distinctions among the concepts of attention, memory, and executive function: A developmental perspective. In G. R. Lyon & N. A. Krasnegor (Eds.), Attention, memory, and executive function (pp. 11–16). Baltimore: Brookes.
Palincsar, A., Brown, A. L., & Campione, J. C. (1991). Dynamic assessment. In H. L. Swanson (Ed.), Handbook on the assessment of learning disabilities: Theory, research, and practice (pp. 75–95). Austin, TX: PRO-ED.
Palisano, R. J., & Dichter, C. G. (1989). Comparison of two tests of visual-motor development used to assess children with learning disabilities. Perceptual and Motor Skills, 68, 1099–1103.
Poteat, G. M. (1995). Review of the Detroit Tests of Learning Aptitude, Third Edition. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 277–278). Lincoln, NE: Buros.
Pressley, M., & Levin, J. R. (Eds.) (1983). Cognitive strategy research: Psychological foundations. New York: Springer-Verlag.
Raggio, D. (1991). Raggio Evaluation of Attention Deficit Disorder (Computerized test). Jackson, MS: University of Mississippi Medical Center, Infant and Child Development Clinic.
Reitan, R. M., & Davison, L. A. (1974). Clinical neuropsychology: Current status and applications. New York: Winston/Wiley.
Reitan, R. M., & Wolfson, D. (1985). The Halstead–Reitan Neuropsychological Test Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.
Reitan, R. M., & Wolfson, D. (1993). The Halstead–Reitan Neuropsychological Test Battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press.
Reitan, R. M., & Wolfson, D. (1995, October). Cognitive and emotional consequences of mild head injury. Paper presented at the fall conference of the Colorado Neuropsychological Society, Colorado Springs, CO.
Resnick, L. B. (Ed.) (1976). The nature of intelligence. Hillsdale, NJ: Erlbaum.
Reynolds, C. R. (1981a). The neuropsychological basis of intelligence. In G. W. Hynd & J. E. Obrzut (Eds.), Neuropsychological assessment and the school-aged child: Issues and procedures (pp. 87–124). New York: Grune & Stratton.
Reynolds, C. R. (1981b). Neuropsychological assessment and the habilitation of learning: Considerations in the search for the aptitude × treatment interaction. School Psychology Review, 10, 343–349.
Reynolds, C. R. (1986). Transactional models of intellectual development, yes. Deficit models of process remediation, no. School Psychology Review, 15, 256–260.
Reynolds, C. R., & Bigler, E. D. (1994). Test of memory and learning. Austin, TX: PRO-ED.
Reynolds, C. R., & Kamphaus, R. W. (1990). Handbook of psychological and educational assessment of children: Intelligence and achievement. New York: Guilford Press.
Ris, D., & Noll, R. B. (1994). Long-term neurobehavioral outcome in pediatric brain-tumor patients: Review and methodological critique. Journal of Clinical and Experimental Neuropsychology, 16(1), 21–42.
Rosvold, H., Mirsky, A., Sarason, I., Bransome, L., & Beck, L. (1956). A continuous performance test of brain damage. Journal of Consulting Psychology, 20, 343–350.
Rothlisberg, B. A. (1991). Factor stability of the Lateral Preference Schedule. International Journal of Neuroscience, 61, 83–85.
Rothlisberg, B. A. (1992). Integrating psychological approaches to intervention. In R. C. D'Amato & B. A. Rothlisberg (Eds.), Psychological perspectives on intervention: A case study approach to prescriptions for change (pp. 190–198). New York: Longman.
Rovet, J. F., Ehrlich, R. M., Czuchta, D., & Akler, M. (1993). Psychoeducational characteristics of children and adolescents with insulin-dependent diabetes mellitus. Journal of Learning Disabilities, 26, 7–22.
Rovet, J. F., Ehrlich, R. M., & Hoppe, M. (1988). Specific intellectual deficits in children with early onset diabetes mellitus. Child Development, 59, 226–234.
Sattler, J. M. (1992). Assessment of children (3rd ed., rev.). San Diego, CA: Sattler.
Sbordone, R. J., & Long, C. J. (Eds.) (1996). Ecological validity of neuropsychological testing. Delray Beach, FL: GR Press/St Lucie Press.
Selz, M. (1981). Halstead-Reitan neuropsychological test batteries for children. In G. W. Hynd & J. E. Obrzut (Eds.), Neuropsychological assessment and the school-aged child: Issues and procedures (pp. 195–235). New York: Grune & Stratton.
Semel, E., Wiig, E. H., & Secord, W. (1987). Clinical Evaluation of Language Fundamentals-Revised (CELF-R). San Antonio, TX: Psychological Corp.
Shea, V. (1989). Peabody Picture Vocabulary Test-Revised. In C. S. Newmark (Ed.), Major psychological assessment instruments (Vol. II, pp. 271–283). Boston: Allyn & Bacon.
Sheslow, D., & Adams, W. (1990). Wide Range Assessment of Memory and Learning (WRAML). Wilmington, DE: Jastak.
Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76, 179–193.
Shinn, M. R. (Ed.) (1989). Curriculum-based measurement: Assessing special children. New York: Guilford.
Silver, L. B. (1993). Introduction and overview to the clinical concepts of learning disabilities. Child and Adolescent Psychiatric Clinics of North America: Learning Disabilities, 2, 181–192.
Simpson, N., Black, F. W., & Strub, R. L. (1986). Memory assessment using the Strub-Black mental status examination and the Wechsler Memory Scale. Journal of Clinical Psychology, 42, 147–155.
Slomka, G. T., & Tarter, R. E. (1993). Neuropsychological assessment. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child and adolescent assessment (pp. 208–223). Boston: Allyn and Bacon.
Sohlberg, M. M., & Mateer, C. A. (1990). Evaluation and treatment of communicative skills. In J. S. Kreutzer & P. Wehman (Eds.), Community integration following traumatic brain injury. Baltimore: Paul H. Brookes.
Strub, R. L., & Black, F. W. (1993). The mental status examination in neurology (3rd ed.). Philadelphia: F. A. Davis.
Swanson, H. L. (1981). Vigilance deficits in learning disabled children: A signal detection analysis. Journal of Psychology and Psychiatry, 2, 339–398.
Swanson, H. L. (1995). Using the Cognitive Processing Test to assess ability: Development of a dynamic assessment measure. School Psychology Review, 24, 672–693.
Swanson, H. L. (1996). Swanson Cognitive Processing Test (S-CPT). Austin, TX: PRO-ED.
Talley, J. L. (1993). Children's Auditory Verbal Learning Test-2 (CAVLT-2). Odessa, FL: Psychological Assessment Resources.
Tarver, S. G., & Dawson, M. M. (1978). Modality preference and the teaching of reading: A review. Journal of Learning Disabilities, 11, 5–17.
Taylor, H. G. (1988). Learning disabilities. In E. J. Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders (2nd ed., pp. 402–450). New York: Guilford Press.
Taylor, H. G., & Fletcher, J. M. (1990). Neuropsychological assessment of children. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (2nd ed., pp. 228–255). New York: Pergamon.
Taylor, H. G., Fletcher, J. M., & Satz, P. (1984). Neuropsychological assessment in children. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (pp. 211–234). New York: Pergamon.
Telzrow, C. F. (1985). The science and speculation of rehabilitation in developmental neuropsychological disorders. In L. C. Hartlage & C. F. Telzrow (Eds.), The neuropsychology of individual differences: A developmental perspective (pp. 271–307). New York: Plenum.
Templer, D. I., & Drew, R. H. (1992). Noncontact sports. In D. I. Templer, L. C. Hartlage, & W. G. Cannon (Eds.), Preventable brain damage: Brain vulnerability and health (pp. 30–40). New York: Springer.
Touyz, S., Byrne, D., & Gilandas, A. (1994). Neuropsychology in clinical practice. Boston: Academic Press.
Trenerry, M. R., Crosson, B., DeBoe, J., & Leber, W. R. (1990). Visual search and attention test. Odessa, FL: Psychological Assessment Resources.
Walsh, K. W. (1978). Neuropsychology: A clinical approach. New York: Churchill Livingstone.
Wechsler, D. (1987). Wechsler Memory Scale-Revised manual. San Antonio, TX: The Psychological Corporation.
Wedding, D., Horton, A. M., & Webster, J. S. (1986). The neuropsychology handbook: Behavioral and clinical perspectives. New York: Springer.
Wepman, J. M., & Reynolds, W. M. (1987). Wepman's Auditory Discrimination Test (2nd ed.). Los Angeles: Western Psychological Services.
Westby, C. (1988). Test review: Test of Language Development-2 Primary, Test of Language Development-2 Intermediate. The Reading Teacher, 42, 236–237.
Whitten, J. C., D'Amato, R. C., & Chittooran, M. M.
(1992). A neuropsychological approach to intervention. In R. C. D'Amato & B. A. Rothlisberg (Eds.), Psychological perspectives on intervention: A case study approach to prescriptions for change (pp. 112–136). White Plains, NY: Longman.
Wiig, E. H., & Secord, W. (1989). Test of Language Competence-Expanded Edition (TLC). San Antonio, TX: The Psychological Corporation.
Williams, J. M. (1991). Memory Assessment Scales (MAS). Odessa, FL: Psychological Assessment Resources.
Williams, R. E., & Vincent, K. R. (1991). Review of the Peabody Individual Achievement Test-Revised. In D. J. Keyser & R. C. Sweetland (Eds.), Test critiques (Vol. 8, pp. 557–562). Kansas City, MO: Test Corporation of America.
Wochnick Fodness, R., McNeilly, J., & Bradley-Johnson, S. (1991). Test–retest reliability of the Test of Language Development-2: Primary and Test of Language Development-2: Intermediate. Journal of School Psychology, 29, 161–165.
Woodcock, R., & Johnson, M. B. (1989). Woodcock–Johnson Psychoeducational Battery-Revised (WJPB-R). Chicago: Riverside.
Woodrow, H. (1921). Intelligence and its measurement: A symposium (XI). Journal of Educational Psychology, 12, 207–210.
Ylvisaker, M., Szekeres, S. F., Haarbauer-Krupa, J., Urbanczyk, B., & Feeney, T. J. (1994). Speech and language intervention. In R. C. Savage & G. F. Wolcott (Eds.), Educational dimensions of acquired brain injury (pp. 185–235). Austin, TX: PRO-ED.
Ylvisaker, M., Szekeres, S. F., & Hartwick, P. (1994). A framework for cognitive intervention. In R. C. Savage & G. F. Wolcott (Eds.), Educational dimensions of acquired brain injury (pp. 35–67). Austin, TX: PRO-ED.
Youngjohn, J. R., Larrabee, G. J., & Crook, T. H. (1993). New adult age- and education-correction norms for the Benton Visual Retention Test. The Clinical Neuropsychologist, 7, 155–160.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.10
Neuropsychological Assessment of Children
CYNTHIA A. RICCIO and CECIL R. REYNOLDS
Texas A&M University, College Station, TX, USA

4.10.1 INTRODUCTION
4.10.1.1 Assessment Process
4.10.2 MEASURES USED IN THE ASSESSMENT OF CHILDREN
4.10.2.1 Neuropsychological Interpretation of Children's Measures
4.10.2.2 Development of New Measures for Children
4.10.2.3 Current and Future Trends
4.10.2.3.1 Memory
4.10.2.3.2 Attention
4.10.2.3.3 Computer-administered assessment
4.10.2.3.4 Integration of neuroimaging and electrophysiology
4.10.2.3.5 Integration of cognitive and developmental psychology
4.10.2.4 Measurement Issues
4.10.2.5 Approaches to Test Selection with Children
4.10.2.5.1 Nomothetic approaches
4.10.2.5.2 Idiographic approaches
4.10.2.5.3 Combined approaches
4.10.2.6 General Organization of the Neuropsychological Assessment of the Child
4.10.2.7 Interpretation Issues
4.10.2.7.1 Performance level
4.10.2.7.2 Profile patterns
4.10.2.7.3 Functional asymmetry
4.10.2.7.4 Pathognomonic signs
4.10.2.7.5 Combination approaches
4.10.3 CONCLUSIONS
4.10.4 REFERENCES

4.10.1 INTRODUCTION

The area of clinical neuropsychology has only recently been established as a viable specialty area (Woody, 1997). By definition, neuropsychology is the study of brain–behavior relationships that uses the theory and methodologies of both neurology and psychology. Historically, neuropsychology has been used for the diagnostic assessment of adults with known brain damage or injury; the clinical research has been derived from the study of adults with identified insult to the brain. The major premise of neuropsychological assessment is that different behaviors, including higher order cognitive skills, involve differing neurological structures or functional systems (Luria, 1980). As such, the neuropsychological approach to assessment involves assessment of various behavioral domains believed to be related to functional systems and making inferences about brain integrity based on the individual's performance
across these domains. Neuropsychological assessment samples behaviors known to depend on the integrity of the central nervous system (CNS) using measures that correlate with cognitive, sensorimotor, and emotional functioning based on clinical research (Dean & Gray, 1990).
As more became known about brain–behavior relationships, clinical findings and theories were applied to the understanding of learning and behavior problems of adults where brain damage/injury was not identified. Thus, neuropsychological assessment as it is practiced today grew out of the need to clarify pathophysiological conditions where brain damage was not indicated by neurological, neuroradiological, or electrophysiological methods, in order to make differential diagnoses and provide information that would be useful in treatment planning and follow-up (Dean & Gray, 1990). Based on the neuropsychological study of adults, this progressed to the application of neuropsychological methods and perspectives to the understanding of learning and other problems in children (L. C. Hartlage & Long, 1997).
Luria's (1970, 1980) theory, while based on adults, can be applied to children and adolescents. Neuropsychological techniques have been incorporated into the assessment of children for special education for some time (e.g., Haak, 1989; Hynd, 1981) with increasing interest by neuropsychologists in educational problems such as learning disability and attention deficit hyperactivity disorder (ADHD). The influence of theories specific to child psychology, school psychology, and education is evident in the composition of neuropsychological assessment batteries, procedures, and measures used with children (Batchelor, 1996a). Increased interest and emphasis in the application of neuropsychology to educational issues and children may be due to a variety of factors: the emergence of clinical neuropsychology as a specialty area; advances in neuroscience and clinical evidence, specific to brain–behavior relationships, based on localized brain damage in childhood and youth; advances in technology (e.g., functional imaging) that are adding to the knowledge base regarding brain development and function; and continued research efforts specific to problems encountered by children and their neuropsychological functioning.
Several positive outcomes of the application of neuropsychology to children and adolescents have been identified. These include extending the range of diagnostic techniques available, providing for better integration of behavioral data (Dean, 1986; Gray & Dean, 1990; Obrzut & Hynd, 1983), increasing the ability to deal adequately with the multidimensionality of observed behavior, creating a unified or holistic picture of a student's functioning (Rothlisberg & D'Amato, 1988), and providing documentation of changes in behavior and development (Hynd & Willis, 1988). Clinical child neuropsychology provides a theoretical framework for understanding identified patterns of strengths and weaknesses, the relationships between strengths and weaknesses, and the extent to which these patterns remain stable or are subject to change over the course of development (Fletcher & Taylor, 1984; Temple, 1997). Increased understanding of the child's strengths and weaknesses can potentially be used to identify areas that may provide difficulty for the child in the future, as well as compensatory strategies or methods to circumvent these difficulties. It has further been argued that neuropsychological assessment of children can provide a better understanding of the ways in which neurological conditions impact on behavior and the translation of this knowledge into educationally relevant information (Allen, 1989). Although not all psychologists agree with the application of neuropsychological principles to children (see Riccio, Hynd, & Cohen, 1993), the growing significance of clinical child neuropsychology is evident in the increasing number of child clinical and school psychology graduate programs that offer coursework in neuropsychology (e.g., D'Amato, Hammons, Terminie, & Dean, 1992).
As a result of this growing interest in clinical child neuropsychology, the extent of knowledge available regarding the developing brain has increased dramatically since the 1980s. This includes advances in the understanding of typical development of neuropsychological functions (e.g., Ardila & Roselli, 1994; Halperin, McKay, Matier, & Sharma, 1994; Miller & Vernon, 1996; Molfese, 1995). Research has also begun to explore physiological processes and subcortical motivational systems that, together with environmental influences, are believed to impact on how the relevancy of information is determined and, ultimately, on the formation of cognitive representations in typically developing children (Derryberry & Reed, 1996). Additional advances in educational arenas have been made in the understanding of learning disabilities (e.g., Feagans, Short, & Meltzer, 1991; Geary, 1993; Obrzut & Hynd, 1983; Riccio, Gonzalez, & Hynd, 1994; Riccio & Hynd, 1995, 1996) as well as in the understanding of the short- and long-term problems associated with traumatic brain injury (e.g., Bigler, 1990; Snow & Hooper, 1994); the sequelae of neurological impairment of known causes such as lead poisoning, meningitis, and so on (e.g., Bellinger, 1995; Taylor, Barry, & Schatschneider, 1993); the
impact of cancer treatment on CNS function (e.g., Copeland et al., 1988); and the short- and long-term sequelae in children identified as at-risk for learning problems due to perinatal or prenatal difficulties (e.g., Breslau, Chilcoat, DelDotto, & Andreski, 1996; Cohen, Beckwith, Parmalee, & Sigman, 1996; Gatten, Arceneaux, Dean, & Anderson, 1994; Saigal, 1995; Waber & McCormick, 1995).
While more research has focused on educational problems, increased risk for psychiatric disorder and long-term adjustment problems have been found to be associated with brain injury, both in adults and children (e.g., Breslau & Marshall, 1985; Rutter, Graham, & Yule, 1970; Seidel, Chadwick, & Rutter, 1975). Research consistently demonstrates that adjustment and behavioral problems are associated with children who have neurodevelopmental deficits (e.g., Hooper & Tramontana, 1997; Tramontana & Hooper, 1997). Children with neurological impairment have been found to be six times more likely to develop emotional, behavioral, or motivational problems secondary to, if not as a direct result of, the neurological impairment (Dean, 1986). At one time it was believed that specific relationships between brain dysfunction and child psychopathology would be found; it is now posited that the relationships between brain integrity and psychopathology are nonspecific and impacted by secondary influences including failure, frustration, social stigma, family reaction, and so on (Tramontana & Hooper, 1997). Many advances have been made in the area of psychopathology, including the development of models specific to the underlying neurological basis of ADHD (see Riccio, Hynd, & Cohen, 1996), autism (e.g., Damasio & Maurer, 1978; Hooper, Boyd, Hynd, & Rubin, 1993; Hurd, 1996; Maurer & Damasio, 1982; Shields, Varley, Broks, & Simpson, 1996), schizophrenia in childhood and adolescence (e.g., Asarnow, Asamen, Granholm, & Sherman, 1994; Asarnow, Brown, & Strandburg, 1995; Hendren, Hodde-Vargas, Yeo, & Vargas, 1995), conduct disorder (e.g., Moffitt, 1993), and anxiety (e.g., Gray, 1982). Various models (e.g., Gray, 1982; Kinsbourne, 1989; Nussbaum et al., 1988; Rourke, 1989; Tucker, 1989) have been proposed to explain the interface between brain function and behaviors associated with childhood psychopathology.
This chapter will provide an overview of the neuropsychological assessment process for children, both historically and in the context of current practices and future trends. Continuing concerns with regard to the translation to children and adolescents of what is known about adult functioning and neuropsychological assessment, as well as concerns with measurement and methodology, will be discussed. Finally, future directions and issues that need to be addressed in the neuropsychological assessment of children, if clinical child neuropsychology is to continue to add to the understanding of underlying processes in children's learning and behavior, as well as the application of that understanding to intervention programs, will be addressed.

4.10.1.1 Assessment Process

Neuropsychological assessment generally includes assessment of a number of functional domains that are, based on clinical evidence, associated with functional systems of the brain. This is considered important for the development of hypotheses and potential interventions (L. C. Hartlage & Telzrow, 1986; Whitten, D'Amato, & Chittooran, 1992). Areas evaluated generally include cognition, achievement, and behavior/personality/emotionality, as would be assessed as part of a general psychological evaluation. A neuropsychological evaluation provides for consideration of a wider array of functions, however, than is addressed in a typical psychological or psychoeducational evaluation (Dean, 1985, 1986; Obrzut, 1981). In general, the neuropsychological evaluation is more thorough and also includes the assessment of perceptual, motor, and sensory areas, and of attention, executive function (planning, organization), and learning/memory (e.g., Dean & Gray, 1990; Obrzut, 1981; Shurtleff, Fay, Abbot, & Berninger, 1988).
Given that the neurodevelopment of Luria's functional systems and the experiences of the child interact in a reciprocal manner (Spreen, Risser, & Edgell, 1995), as well as the potential for adjustment/behavioral difficulties, the use of a transactional model has been advocated. Such a model takes into consideration the reciprocal interactions of the child, home and family members, classroom (teacher and peers), and other social environments in which the child functions (Batchelor, 1996b; D'Amato & Rothlisberg, 1996; D'Amato, Rothlisberg, & Leu, in press; Teeter, 1997; Teeter & Semrud-Clikeman, 1997). This should incorporate information from a variety of sources (e.g., parents, teachers, physicians, medical records, school records, and so on) in order to enable cross-comparison (Batchelor, 1996b). In addition, it has been suggested that motivational factors (Batchelor, 1996b) and the child's ability to cope with the injury/impairment need to be determined (Dean, 1986). Thus, the neuropsychological assessment process not only incorporates a more complete review of information regarding the child but attempts to integrate this
information with an understanding of brain–behavior relations and environmental factors (Taylor & Fletcher, 1990).

This means that a neuropsychological assessment involves a wide range of tasks focused on the child, as well as measures/observations of the various contexts in which the child functions and the associated expectations. Some critics of neuropsychological assessment have argued that so extensive an evaluation is not time- or cost-effective (e.g., Little & Stavrou, 1993). For example, neuropsychological assessment of a learning disability goes beyond identifying the academic deficit(s) to the identification of the child's processing strengths and deficits as well as the child's ability to function in a variety of contexts (Morris, 1994). The assessment of a wider range of higher cortical functions is supported by research findings that neurological disorders are seldom expressed as a single dysfunction (Dean & Gray, 1990), and it has been shown to improve differential diagnosis of learning problems (D'Amato, Rothlisberg, & Rhodes, 1997; Morris, 1994; Rourke, 1994). It is further argued that the process of deriving hypotheses for intervention planning is a complex process that requires a comprehensive assessment battery coupled with neuropsychological foundations and familiarity with the contexts and task demands of the child (Gaddes, 1983; Rourke, 1994). The cumulative performances of the child on neuropsychological measures are seen as behavioral indicators of brain function (Fennell & Bauer, 1997). Based on all of the data generated in the evaluation process, hypotheses are generated which are specific to how and why a child processes information (D'Amato, 1990; Dean, 1986; Leu & D'Amato, 1994; Whitten et al., 1992). Inferences are then made based on the child's performance on a variety of measures and the theoretical perspective of the clinician.

Much of the skepticism regarding the application of neuropsychology to problems of learning and behavior in children has centered on the assessment–intervention interface (Little & Stavrou, 1993; Samuels, 1979; Sandoval & Halperin, 1981). It is argued by some that neuropsychological perspectives do not add to the ability to develop remedial and treatment programs and may even lead to a sense of hopelessness (Little & Stavrou, 1993). Others have argued that the information obtained from neuropsychological assessment can be used as the basis for developing appropriate intervention programs (Gaddes & Edgell, 1994; Reynolds, 1981b; Rourke, 1991, 1994; Teeter & Semrud-Clikeman, 1997). Data about the additional areas of functioning included in the neuropsychological assessment are needed to support inferences about the integrity of various functional systems of the brain (Shurtleff et al., 1988). The neuropsychological perspective leads to better understanding of underlying causes of learning and behavior problems; this in turn results in an increased ability to develop appropriate interventions or circumvent future problems (D'Amato et al., 1997).

Ultimately, data generated from the neuropsychological assessment process are used to develop recommendations regarding whether the individual would profit from compensatory strategies, remedial instruction, or a combination of approaches (Gaddes & Edgell, 1994). Through the use of information about how various skills correlate in the developmental process, neuropsychological assessment allows one to make inferences not only about those skills measured, but also about skills that have not been evaluated. Further, by understanding the neurological correlates of these skills and of instructional methods, neuropsychological assessment can assist in the formulation of hypotheses regarding potential instructional methods/materials for a particular child (Reynolds, Kamphaus, Rosenthal, & Hiemenz, 1997). For example, based on the neuropsychological evaluation of two children with autism, differing nonverbal teaching strategies were identified for each child in order to improve their individual outcomes (Hurd, 1996).

While “treatment” within the framework of education is generally considered to consist of eligibility, placement decisions, and the development of an educational plan, “treatment” resulting from a neuropsychological assessment frequently includes assistance with specific medical management, vocationally related goals, speech/language areas, and physical issues (Cohen, Branch, Willis, Weyandt, & Hynd, 1992; Dean & Gray, 1990). Effective interventions need to take into consideration the myriad psychosocial contexts in which the child functions and adjustment and motivational issues, and to identify those environmental modifications that can ameliorate or reduce the behavioral effects of brain dysfunction (Batchelor, 1996b; Teeter & Semrud-Clikeman, 1997). It has been suggested that the interventions developed must also be multidimensional and incorporate not only academic, behavioral, and psychosocial techniques, but also include motivational, metacognitive, medical, and classroom management techniques (Batchelor, 1996b; Teeter & Semrud-Clikeman, 1997). However, correct diagnosis and early implementation of treatment strategies that work have been shown to be cost-effective in dollars and in the quality of a child's life (Reynolds, Wilen, & Stone, 1997).
4.10.2 MEASURES USED IN THE ASSESSMENT OF CHILDREN

As previously noted, the application of neuropsychological theory and assessment with children was derived from applications with adults. In the development of clinical child neuropsychology, historically, one of the basic avenues used in determining the assessment measures and processes to be used consisted of modifying, for use with children, existing neuropsychological batteries and other measures already used for adults (L. C. Hartlage & Long, 1997). In some cases, this involved modifying some tasks in the battery or adding tasks. An alternative strategy involved collecting some normative data on children for existing tasks. Both of these strategies were based on the clinical efficacy of the measures with adults, not with children, and on the assumption that tasks for adults measure the same thing when used with children. Similarly, in the assessment and hypothesis generation process, it is tempting to assume that neuropsychological findings from adults will be useful with children; however, this has not been shown to be a valid assumption.

When applied to children and adolescents, the premise that behavior can be used to make inferences about brain function and integrity has to be expanded to include consideration of neurodevelopmental differences that exist as a function of the age of the child. To directly apply adult inferences/hypotheses to children ignores what is known about changes in the functional organization of the brain as children grow (Cohen et al., 1992; Fletcher & Taylor, 1984). Research has provided evidence of age-based differences in children for verbal memory (Kail, 1984; Miller & Vernon, 1996), language (Segalowitz, 1983) and right hemisphere functions (Bakker, 1984; Wittelson, 1977). Recent research, for example, found that the relationship of memory, general intelligence, and speed of processing in children was not consistent with adult models (Miller & Vernon, 1996). Research has also suggested that typologies generated from the Halstead–Reitan Neuropsychological Battery (HRNB) used with children differed from typologies generated with adults, in that the child groups were more homogeneous, but provided less coverage (28–42%) as compared to adult typologies (Livingston et al., 1997). Because of neurodevelopmental changes, it is also not possible to view brain dysfunction on a continuum based on behavioral deficits as these may change over time (Fletcher & Taylor, 1984).

Further, there is often an over-reliance on signs of dysfunction in adults as reflecting pathology in children when these may be developmental. For example, on clock-face drawing in adults, identification of unilateral spatial neglect has been of particular interest (e.g., Heilman, Watson, & Valenstein, 1985; Mesulam, 1985). Developmental study of clock-face drawing with children found that this hemispatial neglect was developmental and not infrequent through the age of seven years (Edmonds, Cohen, Riccio, Bacon, & Hynd, 1993). It was concluded that this developmental pattern was consistent with the development of the frontal lobes and planning ability in children.

Neurodevelopment follows an ontogenetic course with primary cortical zones generally mature by birth (Luria, 1980). Secondary and tertiary areas continue to develop postnatally. These include the integrative systems involved in the higher order functions of learning, memory, attention, emotion, cognition, and language as well as the association areas. The association areas are the last of these areas to develop and myelinate (Goldman & Lewis, 1978; Goldman-Rakic, 1987). Vygotsky (1980) suggested that not only is there continued development of secondary and tertiary areas, but that the interaction of primary, secondary, and tertiary areas is likely to change with chronological age (Merola & Leiderman, 1985; Rutter, 1981).

Although the developmental sequence for the formation of neural pathways and the myelination of specific locations corresponding to specific behaviors have been identified, these do not correspond directly to models of cognitive development (Spreen et al., 1995).

Knowledge of typical neurodevelopmental progress has increased since the 1980s; however, most of what is practiced today, as well as the theoretical bases in neuropsychology, is grounded on observations and informal assessment of individuals with identified brain damage (Reynolds, 1997b). Extensive research regarding typical neurodevelopment, particularly in relation to higher order cognitive skills, is limited, and the changing organization over time of brain function in children is only beginning to be understood (Hynd & Willis, 1988). Thus, there are still many unanswered questions regarding the developmental progression of many functional systems, particularly at the associative and integrative levels, and concerning how the neurodevelopmental progression maps onto the cognitive functioning observed.

It is often assumed, for example, based on earlier theory, that children reach adult levels of performance at 8–10 years of age. For example, Luria (1966) suggested that the frontal lobes become functional between the ages of four and seven years. This in turn led to the assumption that executive functioning would approach adult levels by age 8–10 years. It has been suggested that the greatest period of frontal lobe
development occurs at the six- and eight-year-old levels, which is consistent with Luria's initial hypothesis (Passler, Isaac, & Hynd, 1985). Subsequent research, however, has demonstrated that the development of frontal lobe functioning continues at least through age 12 and possibly through age 16 (e.g., Becker, Isaac, & Hynd, 1987; Chelune & Baer, 1986; Levin et al., 1991; Welsh, Pennington, & Grossier, 1991). Further, while cognitive ability does not appear to be a factor for particular measures of frontal lobe functioning after age 12, it has been suggested that cognition can impact performance on frontal lobe measures in younger children (Chelune & Thompson, 1987; Riccio, Hall, et al., 1994). Thus, it is important to first have a strong foundation of understanding of the normal neurodevelopmental course before it is possible to interpret accurately and differentiate behaviors that represent an alteration or deviance from expected neurodevelopment.

Not only do neurodevelopmental courses need to be considered, there are complex differences between children and adults in the mechanisms of brain pathology that lead to neuropsychological and behavioral/affective problems and these do not necessarily follow a similar progression in children as for adults (Fennell & Bauer, 1997; Fletcher & Taylor, 1984). The developing brain of the child needs to be considered in that the impact of neurological insult is influenced by age as well as location and nature of injury, gender, socioeconomic status, level of emotional adjustment and coping, and the individual's own adaptive skills (Bolter & Long, 1985). With the development of the child occurring on a continuous basis and at a rapid rate, it is often difficult to obtain sufficient consistency from the premorbid history (Batchelor, 1996a). Accurate estimation of premorbid ability levels is best obtained from previous individualized standardized cognitive or achievement assessment, or if this is unavailable, from results of group-administered standardized data from school records with some consideration for potential regression effects (Reynolds, 1997c). For young children, this information is not generally available. Prenatal and perinatal, as well as postnatal developmental histories may be inaccurate, incomplete, or unknown, particularly in very young children (Batchelor, Gray, Dean, & Lowery, 1988; Gray, Dean, & Rattan, 1987). Even in school-aged children, teacher reports, grades, and so on may result in inaccurate estimations of premorbid ability (Reynolds, 1997c).

Given the different mechanisms and progression involved in the pathology, it is clear that the inferences drawn from and the interpretations of neuropsychological performance need to be different for adults and children. For children, the nature and persistence of learning problems is dependent on the status of development of various brain structures, the effects of the injury/insult, and the interactions between functional and dysfunctional neurological systems, as well as genetic and environmental influences (Teeter & Semrud-Clikeman, 1997).

Neuropsychological assessment of children and adolescents requires not only tests/measures that are age-appropriate and have sufficient empirical support for the inferences being made between neurological substrates and the behavioral performance of the child, but the generation of inferences also needs to take into consideration these developmental issues (Cohen et al., 1992). Further, it is important to document the sensitivity of the measures to neurobehavioral and neurodevelopmental functioning in children (Fletcher & Taylor, 1984). Although the measures are derived predominantly from neuropsychological study and clinical evidence regarding adults with known brain injury, a developmental perspective needs to be maintained in the application of neuropsychology to children (Hooper & Tramontana, 1997). Unfortunately, many of the measures used with adults do not have the sensitivity necessary to reflect developmental issues and, as a result, the utility of procedures used with adults in the neuropsychological assessment of children has multiple pitfalls and has been questioned (e.g., Cohen et al., 1992; Fletcher & Taylor, 1984).

4.10.2.1 Neuropsychological Interpretation of Children's Measures

Another approach to applying neuropsychological principles in the assessment of children took measures already in use for children (e.g., standardized intelligence tests) and interpreted these measures from a neuropsychological perspective; where existing child measures did not exist, these measures were then developed. L. C. Hartlage and Long (1997) indicated that most practitioners preferred this method (interpreting child-based measures from a neuropsychological perspective) as opposed to using adult measures with child norms. As with adults, this has occurred most frequently with the Wechsler scales. General summary scores of Wechsler scales have been found to be reliable indicators of brain integrity (Black, 1976; Hynd & Willis, 1988). Various subtests of the WISC-R also have been found to correlate with neuropsychological measures (see Batchelor, Sowles, Dean, & Fischer, 1991) and have been used to
formulate hypotheses (Kaplan, 1988). Multiple efforts have been made with regard to recategorizing or clustering various subtests to provide for neuropsychological interpretation of the WISC-R. L. C. Hartlage (1982), for example, suggested that the functional integrity of the right and left hemispheres could be estimated by comparing the Similarities and Picture Arrangement subtests (temporal lobe) and the Arithmetic and Block Design subtests (parietal lobe). Bannatyne (1974) proposed four categories of neuropsychological function that could be assessed and interpreted based on combinations of subtests on the WISC-R: verbal comprehension, sequencing, spatial, and acquired knowledge. Kaufman (1979) recategorized the subtests into successive and simultaneous tests, based on Luria's theory.

Concerns with this practice have been evidenced in the literature. Interpretations based on isolated measures of a child's behavior (e.g., a single subtest) have limited reliability and validity (Kamphaus, 1993; Lezak, 1995) and this is often what occurs in this process. Recategorizations of multiple subtests (e.g., Bannatyne, 1974; Kaufman, 1979) appear to have greater reliability, but the validity of these recategorizations continues to be questionable (see Kamphaus, 1993). Further, in many cases there is no attempt to translate the inferences made, using these methods, into effective interventions.

4.10.2.2 Development of New Measures for Children

As opposed to trying to “make do” with existing children's measures, additional measures have been developed with an underlying neuropsychological basis. For example, the Luria–Das model of successive/simultaneous processing (Das, Kirby, & Jarman, 1979) in conjunction with the cerebral lateralization research by Sperry (1968, 1974), Kinsbourne (1975), and others, served as the basis for the development of the Kaufman Assessment Battery for Children (KABC; Kaufman & Kaufman, 1983a). As such, the design of the KABC is compatible with current neuropsychological models of higher order cognitive function (Reynolds & Kamphaus, 1997). Unlike the Wechsler scales, where mode of presentation determines the scale with which a task is associated, on the KABC the cognitive processing demands of the task (e.g., simultaneous or sequential) determine the scale with which it is associated (Kaufman & Kaufman, 1983b). Further, rather than conceptualizing lateralization based on content or method of presentation, the lateralization component of the KABC is based on the way in which the information is processed or manipulated. Within each scale, there is a variation of mode of presentation and response that allows for further evaluation of complex functional systems (Reynolds & Kamphaus, 1997). KABC interpretation is intended to identify cognitive neuropsychological strengths of the child, and the related instructional methods and learning activities that will exploit these strengths and circumvent deficit areas. Research on the effectiveness of this model for intervention is, however, limited.

Evaluation of the KABC with regard to its relevance to Luria's approach and to child neuropsychology has been positive (e.g., Donders, 1992; Majovski, 1984; Snyder, Leark, Golden, Grove, & Allison, 1983). It has been suggested that the KABC is a good complement to other neuropsychological tests. Specifically with regard to the use of the KABC as a component of a neuropsychological battery, it has been shown to provide useful information in the differential diagnosis of learning disability subtypes (e.g., Hooper & Hynd, 1985; Telzrow, Century, Harris, & Redmond, 1985) and right hemisphere dysfunction, which is consistent with physical evidence (Morris & Bigler, 1985; Shapiro & Dotan, 1985). Similar positive results were found in the comparison of dichotic listening performance and KABC results (Dietzen, 1986). Thus, the KABC has been shown to be sensitive to traumatic brain injury to specific cortical regions (Donders, 1992). Research also indicated that the pathognomonic and intellect scales of the Luria Nebraska Neuropsychological Battery-Children's Revision were closely related to performance on the global scales of the KABC (Leark, Snyder, Grove, & Golden, 1983). Research results overall tend to support the use of the KABC in neuropsychological assessment, and subtests of the KABC are frequently used in eclectic batteries (e.g., Nussbaum et al., 1988; Branch, Cohen, & Hynd, 1995).

The KABC may well be the test of choice for children under age five (Reynolds et al., 1997); the use of sample and teaching items adds to the likelihood that a neurological substrate or functional system is being assessed as opposed to language, experience, or culture (Reynolds & Kamphaus, 1997). The KABC has strong validity and reliability (Kamphaus, 1993), is sensitive to developmental changes in information processing/functional organization (Reynolds & Kamphaus, 1997), and is considered an appropriate instrument for use with US ethnic minorities (e.g., Fan, Willson, & Reynolds, 1995; Kamphaus & Reynolds, 1987). While further research with the KABC in conjunction with neuropsychological assessment is needed,
available research supports the potential for the KABC to be a useful tool for child neuropsychologists with results providing implications for the habilitation of learning problems (Reynolds & Kamphaus, 1997).

4.10.2.3 Current and Future Trends

The development of new measures, specifically designed and normed for children may not only reflect current interest areas in children's learning and behavior, but may in many ways dictate the future directions of neuropsychological assessment of children. In particular, since the late 1980s a number of measures have been developed which are specific to memory and attention. At the same time, there is also an increase in the use of technology, with or without the inclusion of electrophysiological or imaging methods, which is evident in the research literature and clinical practice.

4.10.2.3.1 Memory

Nearly every disorder that involves the CNS and higher cognitive functions includes some form of memory complaint; memory is incorporated in almost all daily activities (Reynolds & Bigler, 1997a). Research across neurological disorders points to the importance of memory in evaluating brain integrity (Reynolds & Bigler, 1997a); 80% of a sample of clinicians who performed testing noted memory as important (Snyderman & Rothman, 1987). Standard psychoeducational batteries used with children tend to focus solely on cognitive ability as defined by IQ, achievement, and behavioral status. In the area of learning disabilities, there has been recent interest in examining the underlying psychological processes, and particularly learning and memory (Zurcher, 1995). Research in the area of memory and the development of new measures to assess memory functions may lead to further interest in the learning process itself (Reynolds, 1992). It has been argued, additionally, that the assessment of learning and memory would provide useful information for instructional planning (Wasserman, 1995).

Historically, assessment of memory in children relied on the use of subtests from various tests including the KABC, the WISC-III and its earlier versions, and so on (e.g., Nussbaum et al., 1988; Branch et al., 1995). All too frequently, inferences regarding verbal memory in particular relied on the Digit Span subtest of the Wechsler scales. Multiple concerns about relying on Digit Span can be found in the research literature (e.g., Reynolds, 1997a; Talley, 1986). Recent research in the area of memory has suggested that the traditional combining of forward and backward digits may be inappropriate and that these tasks represent quite different cognitive demands (Ramsey & Reynolds, 1995; Reynolds, 1997a) with distinct neuropsychological substrates. Initial findings suggest, for example, that forward memory span may be more directly impacted by attention while backward memory span may be more a reflection of general intelligence. Additional investigation into the distinction between forward and backward memory span, as well as into other areas of memory continues to be needed. Due to the increased interest in this area, children's norms for measures used in the assessment of memory in adults have been developed (e.g., Delis, Kramer, Kaplan, & Ober, 1994). In addition, three comprehensive measures for the assessment of memory/learning have been developed specifically for children and adolescents since the mid-1980s. The development of these measures has in many ways been due to the perceived inappropriateness of adult measures of memory for use with children and the inability to relate results from adult measures to the contexts (e.g., school) in which children function.

The first of the measures developed for the assessment of memory in children, the Wide Range Assessment of Memory and Learning (WRAML; Sheslow & Adams, 1990), consists of 12 subtests which yield verbal memory and visual memory scores, with normative data for children ages 5–17 years. Delayed recall trials can be given for four of the subtests. Initial factor analysis of the WRAML corroborated the two-factor structure (Haut, Haut, Callahan, & Franzen, 1992); however, with at-risk children and a clinical population, three factors were extracted (Aylward, Gioia, Verhulst, & Bell, 1995; Phelps, 1995). Some concern has been voiced with regard to the multiple items/tasks that may tap attention as opposed to memory and the absence of consideration of attention/concentration (Haut et al., 1992). Further, evaluation of the WRAML for children with, compared with those without, ADHD or learning disabilities indicated that the WRAML provided little additional information for discriminating between clinical groups (Phelps, 1996).

The Test of Memory and Learning (TOMAL; Reynolds & Bigler, 1994) consists of 10 core subtests (five verbal and five nonverbal) yielding separate verbal memory and nonverbal memory scale scores as well as a composite memory score. A delayed recall procedure can be implemented to provide a delayed recall index. Additional supplemental indices (e.g., sequential recall, free recall, attention/concentration,
and learning) can also be computed. Using a variety of factor analytic methods, Reynolds and Bigler (1996) examined the latent structure of the TOMAL. Factor analytic study of the TOMAL indicated that the factor solutions obtained were highly stable across all age groups. Notably, none of the solutions obtained matched the verbal–nonverbal dichotomy usually considered and represented by the two scales of the TOMAL. Instead, what emerged were components representing various levels of complexity in memory tasks and processing demands that cut across modalities. Alternative methods of interpretation based on the factor analytic results are available (see Reynolds & Bigler, 1996). The TOMAL does provide separate scores for forward and backward recall, in contrast to many scales that combine these inappropriately. Unlike most neuropsychological measures (Reynolds, 1997b), the TOMAL included studies of ethnic and gender bias during standardization; items showing cultural biases were eliminated.

Most recently, the Children's Memory Scale (CMS; Cohen, 1997) was developed with linkages to the WISC-III built in to the standardization process. The composition of the CMS was based on extensive clinical practice with initial tasks and items, field trials of the measures, and feedback from clinicians involved in the field trials. The CMS consists of six core subtests representing verbal memory, attention/concentration, and visual/nonverbal memory as well as three supplemental subtests. The CMS provides for evaluation of immediate recall as well as delayed recall of the verbal and nonverbal memory areas. For scoring purposes, seven index scores can be calculated to examine differences between immediate/delayed verbal/visual memory, learning, recognition, and attention/concentration. Factor analytic studies of the standardization sample were conducted and four models evaluated to determine the “best fit.” Results indicated that the three-factor solution (attention/concentration, verbal memory, visual memory) was the most consistent (Cohen, 1997).

4.10.2.3.2 Attention

It has been argued that the most frequent symptoms associated with childhood neuropsychological disorder include attention/concentration, self-regulation and emotional/behavioral problems (Nussbaum & Bigler, 1990). Further, it is the neural traces left by attention that are likely the root of memory. It is not surprising that there is increased interest in the measurement of attentional processes or that these are seen as an important component of the assessment process. The assessment of attention, more so than of other domains, has moved to computerized approaches. The most comprehensive battery of computerized measures is the Gordon Diagnostic System (GDS; Gordon, 1983). This is a microcomputer-based assessment that includes 11 tasks specific to attention and self-regulation. Since the development of the GDS, a number of other computer-based measures of attention and impulsivity have been developed and marketed. These programs tend to vary with regard to the actual paradigm used; there are variations in the modality employed, the type of stimuli, and the nature of the task (Halperin, 1991).

Continuous performance tests (CPTs), for example, may require a response only when a specified target stimulus is presented (if X) or only when the target stimulus follows another specified stimulus (if AX) and so on. A further variation of this is a similar task where the required “response” to the presentation of the target stimuli is, however, to inhibit responding (Conners, 1995). The stimuli may be presented in a visual or auditory format, or in a combination format requiring a modality shift. Also, depending on the program used, the scores may be limited to correct responses, commission errors, and omission errors, or may include reaction time information.

Through the use of computerized measures of attention, knowledge specific to the developmental nature of attentional processes has been gleaned (Mitchell, Chavez, Baker, Guzman, & Azen, 1990). Research has demonstrated the usefulness of computerized measures of attention and self-regulation for monitoring the effects of medical management (e.g., Barkley, DuPaul, & McMurray, 1991; Barkley, Fischer, Newby, & Breen, 1988; Hall & Kataria, 1992). It was anticipated that computerized assessment of attention would provide more objective data in the assessment process for ADHD as well as providing information specific to attentional deficits associated with traumatic brain injury or other neurological disorders (Timmermans & Christensen, 1991). The results of studies with various paradigms for CPTs are equivocal with regard to discriminant validity specific to ADHD (e.g., Barkley et al., 1991; Wherry et al., 1993) as well as concerning the extent to which results are consistent with teacher perceptions (Barkley, 1991, 1994). Interpretation of these measures is limited by the availability of comprehensive research with any one software program. The extent to which cultural differences, gender differences, cognitive ability, order effects, and so on impact on CPT performance is unknown. Further, the extent to which the particular paradigms used
provide predictive information that may be helpful in intervention planning has not been studied.

4.10.2.3.3 Computer-administered assessment

Measures of attention are not the only computer-based assessment tools. A computerized neuropsychological test battery for adults has been developed (Powell et al., 1993), computer-administered interviews and self-reports are available, and specific neuropsychological tests or their analogs can be administered via computer (e.g., Burin, Prieto, & Delgado, 1995; Heaton, 1981). Computerized assessment of children's reading skills has been investigated with indications of high coefficients of equivalence with traditional assessment (Evans, Tannehill, & Martin, 1995). With advances in microcomputers, the use of computerized assessment will likely increase in the near future.

The use of computers and technology in assessment has a number of advantages and clearly allows for the development of an increasing variety of tasks without excessive and cumbersome testing materials; computerized assessment may be less time-consuming and, as such, cost- and time-effective. Further, the speed or measure of time to task completion is considered one of the most sensitive indices in neuropsychological assessment and computer programs can provide increased accuracy in the measurement of speed of processing (Kane & Kay, 1997). Kane and Kay (1997) point out a number of additional advantages to the use of computers in the assessment process, including presentation of items at a fixed rate (computer-paced) as well as providing for accurate measure of time to completion (child-paced). Computers can also be used to generate multiple forms of a test, thus providing baseline data as well as a means of monitoring change over time. With computerized assessment, standard/uniform administration is ensured and results are free of potential bias. Computers further facilitate the production of relevant test statistics (Kane & Kay, 1997).

There are however, multiple concerns and disadvantages with “diagnosis by computer.” Predominant among these is the loss of information from not being able to observe the process and strategy used by the individual in reaching the solution (Powell, 1997). First (1994) concluded that computerized assessment processes were advantageous, but cautioned that clinicians must continue to be a strong component in the diagnostic process in order to provide for diagnostic validity. Computers cannot replace the information gained from interaction and clinical observation of process nor can a computer draw conclusions regarding level of attention, motivation, fatigue, and so on that may be cues to discontinue testing for a brief period. Computers also cannot provide the child with prompts and encouragement as needed to maintain performance over time (Kane & Kay, 1997). At the extreme, there is the potential for computers to be used as a substitute for a complete evaluation and this is of concern (Kane & Kay, 1997). First (1994) asserted that clinicians needed to be well advised of the limitations as well as the strengths of computerized assessment procedures. As the number of computer-driven assessments increases, there will need to be an analogous increase in the research field comparing the various programs and their psychometric properties with each other and with more traditional methods of assessments. At the time of writing, in the late 1990s, many computerized assessment methods fail to meet established testing standards (Kane & Kay, 1997).

4.10.2.3.4 Integration of neuroimaging and electrophysiology

With advances in neuroscience, clinical and research protocols may more frequently include neuroradiological methods in conjunction with neuropsychological techniques in order to enhance understanding of childhood disorders. This type of “partnership” is already occurring in a number of research areas (e.g., Bigler, 1991; Denckla, LeMay, & Chapman, 1985; Duffy, Denckla, McAnulty, & Holmes, 1988; Hynd, Marshall, & Semrud-Clikeman, 1991). The integration of information from neuroradiology with neuropsychological assessment has already established relationships for specific lesions and associated behaviors and is beginning to establish a better understanding of the relationship between myelination differences and white/gray matter ratios (e.g., Harbord et al., 1990; Jernigan & Tallal, 1990; Turkheimer, Yeo, Jones, & Bigler, 1990). The availability of imaging using ultrasound has added to the knowledge of relationships between gross abnormalities evident in vitro and later negative outcomes (e.g., Beverley, Smith, Beesley, Jones, & Rhodes, 1990; Iivaneihan, Launes, Pihko, Nikkinen, & Lindroth, 1990). Measurement issues in imaging, such as differences in resolution from one magnetic resonance image to another, continue to be problems in this area, but will hopefully be resolved in the future. While in the past routine EEG of children did not offer much utility in the evaluation of learning or behavior problems, the development of computer-assisted analysis has improved the interpretability of electrophysiological measures (Duffy & McAnulty, 1990).
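
As a concrete illustration of the scores that computer-administered attention and reaction time measures are described as reporting in this chapter (correct responses, commission errors, omission errors, and reaction time), the following minimal Python sketch tallies them for a short run of trials. It is not the scoring procedure of any published CPT; the `Trial` record, the `score_trials` function, and the example data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Trial:
    """One stimulus presentation in a hypothetical computer-paced task."""
    is_target: bool                 # True if the stimulus called for a response
    responded: bool                 # True if the child responded
    rt_ms: Optional[float] = None   # reaction time in ms, when a response occurred


def score_trials(trials: Sequence[Trial]) -> dict:
    """Tally correct responses, omission errors, commission errors, and mean RT."""
    hits = [t for t in trials if t.is_target and t.responded]
    # Omission error: a target was shown but no response was made.
    omissions = sum(1 for t in trials if t.is_target and not t.responded)
    # Commission error: a response was made to a non-target.
    commissions = sum(1 for t in trials if not t.is_target and t.responded)
    hit_rts = [t.rt_ms for t in hits if t.rt_ms is not None]
    return {
        "correct_responses": len(hits),
        "omission_errors": omissions,
        "commission_errors": commissions,
        "mean_hit_rt_ms": mean(hit_rts) if hit_rts else None,
    }


# Invented example data: three targets and two non-targets.
example = [
    Trial(is_target=True, responded=True, rt_ms=420.0),
    Trial(is_target=True, responded=False),
    Trial(is_target=False, responded=True, rt_ms=310.0),
    Trial(is_target=False, responded=False),
    Trial(is_target=True, responded=True, rt_ms=480.0),
]
summary = score_trials(example)
print(summary)
```

Raw tallies of this kind are only a starting point. As the discussion of measurement issues in this chapter stresses, such scores become interpretable only when compared against adequate normative data.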
Computerized measures have been developed to examine more closely the speed of information processing, through reaction time paradigms that have included linguistic (e.g., Lovrich, Cheng, & Velting, 1996) as well as visual stimuli (Novak, Solanto, & Abikoff, 1995), in conjunction with electrophysiological measures.

This integration of methods across neuroscience and neuropsychology is providing further evidence concerning brain–behavior relationships and adding to the knowledge base related to neurodevelopmental processes in children and adolescents. Functional imaging and other imaging quantification methods hold promise for furthering the future understanding of neuropsychological performance (Bigler, 1996). Similarly, it has also been argued that a comprehensive and integrative assessment process, that involves both the neurologist and neuropsychologist with the tools and expertise of both disciplines, may enhance the value and role of neuropsychological assessment (Batchelor, 1996a, 1996b).

4.10.2.3.5 Integration of cognitive and developmental psychology

Neuropsychological assessment of children is being influenced more and more by developmental and cognitive psychology. This is most apparent in the areas of language, attentional processes, and executive functions (Williams & Boll, 1997). Integration across fields has been suggested specifically with regard to metacognition (from cognitive psychology) and executive function (Torgesen, 1994). The domain of “executive function” may incorporate a variety of constructs (e.g., attention, self-regulation, working memory) but the “executive” processes generally focus more on effortful and flexible organization, strategic planning, and proactive reasoning (Denckla, 1994). Denckla further asserted that executive function cannot be dealt with as a “composite” of scores on various measures, but must be fractionated. The measurement of executive function in children is exceptionally difficult due to the ongoing development and maturation of the frontal lobes through adolescence. Factor analytic study of executive function tasks (Welsh et al., 1991) yielded factors that appeared to be divided according to developmentally related constructs as opposed to theoretical ones (Denckla, 1994). The majority of measures for executive function which are used with children are downward extensions of adult measures and many lack sufficient normative data and psychometric study. In addition, the emphasis on the use of novel tasks in the assessment of executive function places significant limitations on the evaluation of interventions in the area of executive processes. Torgesen (1994) argued that current measures of executive function, with the presumed assumption for a need for novelty, evidence a lack of cross-theoretical integration between neuropsychology and the information-processing paradigms. He further stated that there is a need to include assessment of tasks that are ecologically based and require executive function, in order to enhance the evaluation of treatment programs designed to remediate executive processes. Certainly, the production of child-centered, developmentally sensitive measures of executive processing, that are more directly linked to real-life activities thus facilitating the development of interventions, and that have sufficient flexibility to allow for pre- and postevaluation, is needed. Overall, integration of neuropsychological assessment and models of cognitive development may lead not only to a better understanding of deficit processes but also to better remediation/habilitation programs (Williams & Boll, 1997).

4.10.2.4 Measurement Issues

Although research methods and statistical tools have greatly improved since the early 1970s, clinical child neuropsychology has been criticized for its failure to attend to principles of research and to incorporate psychometric advances (Cicchetti, 1994; Parsons, & Prigatano, 1978; Reschly & Gresham, 1989; Ris & Noll, 1994; Sandoval, 1981; Willson & Reynolds, 1982). Problems with statistical methods and design in clinical neuropsychology have been frequently noted (e.g., Adams, 1985; Dean, 1985; Reynolds, 1986a, 1986b, 1997b).

One major concern relates to the extent and nature of normative data for many measures used in the neuropsychological assessment of children. Although clinical insight may be gained by observation of test performance, sound normative data provides a backdrop against which to evaluate that insight (Reynolds, 1997b). The systematic development and presentation of normative data across the lifespan for many tools used in neuropsychological assessment have received far too little attention to date (Reynolds, 1986b) and greater attention in this area is needed. Good normative data require extensive systematic and stratified sampling of a population in order to obtain a reliable standard against which to judge the performance of others. The provision of adequate normative data has multiple benefits for the field of neuropsychology, including improved communication among clinicians and researchers, increased accuracy in diagnosis,
and facilitation of training for new members to the discipline. In addition, good normative data provide the opportunity to deflate and expose a variety of clinical myths (Reynolds, 1986b).

Most of what is known about the measures used is specific to the performance of those with identified brain injury/insult as opposed to typically developing individuals. Normative data that are available in the literature are often based on small samples, may have been collected in a single geographical region, and do not reflect the ethnic diversity, socioeconomic levels, or gender composition of the general population. In many cases, the sample is predominantly male and Caucasian, yet research suggests that gender and cultural differences may also contribute to variations in brain organization (e.g., McGlone & Davidson, 1973).

The lack of sufficiently large, stratified samples in the development and standardization of neuropsychological assessment inhibits the understanding of demographic influences (Reynolds, 1997b) and thus complicates test interpretation (Dean, 1985). That cultural differences exist on standardized measures is well documented. Mostly, the use of neuropsychological measures with Hispanic populations has been studied (e.g., Ardila & Roselli, 1994; Ardila, Roselli, & Putente, 1994; Arnold, Montgomery, Castenada, & Langoria, 1994), but overall research on the effects of cultural differences (e.g., differences in the value of speed of responding) is sparse. Differences between ethnic groups have also been examined with respect to specific measures of memory (e.g., Mayfield & Reynolds, 1997). However, for most neuropsychological measures there has been no study of ethnic and gender differences; cultural/ethnic differences are infrequently accounted for in the collection of normative data and therefore cannot be used in the interpretation process.

All too frequently, neuropsychologists rely on the “clinical” nature of the test and overlook the psychometric concepts of reliability and validity. The need for the establishment of reliability of neuropsychological measures has been cited in the literature (e.g., Parsons & Prigatano, 1978, Reynolds, 1982); reliability information on neuropsychological measures is not routinely reported in research studies and is frequently not included in the test manuals (Reynolds, 1997b). Reliability of test scores is important as it relates to the amount of variance that is “real,” systematic, and related to true differences between individuals. Therefore, it is important to determine the reliability of neuropsychological measures for purposes of individual diagnosis as well as for research, in that reliability influences the likelihood that any experimental or treatment effects will be detected (Reynolds, 1986b). Reliability is also the foundation on which validity is built.

Related to issues of reliability and validity, the method of scaling/measurement used with any test or measure is “crucial” (Reynolds, 1997b, p. 189). Scaling across neuropsychological measures, however, is inconsistent. Frequently what are obtained are raw scores for number correct, time for completion, or number of errors. This results in the need to use score transformations, based on insufficient normative data, in order to make any kind of meaningful comparison. Alternatively, clinicians may use inappropriate scales such as age or grade equivalents in an attempt to give meaning to raw data (Reynolds, 1997b). Grade equivalents, in particular, are inappropriate due to the extent of extrapolation that is used in their derivation as well as faulty assumptions that are made with regard to learning and growth over time (e.g., from lower to upper grades, across subject areas, and throughout the calendar year). Further, grade equivalents exaggerate small differences in performance between individuals and for a single individual across tests (Reynolds, 1986b). It is, therefore, imperative that standard score conversions, based on adequate normative data, be provided for measures used in neuropsychological assessment.

It has been asserted that by age 10 or 12 children perform at adult levels in some areas, and for many older neuropsychological measures most of the normative sampling, in addition to using small numbers, often stopped at age 12. This is despite the fact that many researchers have suggested that neurodevelopment continues through at least age 14 (Boll, 1974) and possibly through age 16 (Golden, 1981). This further limitation in the provision of normative data impedes the interpretation process for adolescents, bolsters the assumption that adolescents should function as adults, and promotes the use of downward extensions of adult measures that often are not appropriate. Neuropsychological function is developmental, and distinct age-related norms are required. It has also been recommended that item response theory (IRT) be used to ensure that neuropsychological measures include an adequate range of difficulty levels, thus ensuring coverage of developmental levels (Morris, 1994). Most existing neuropsychological measures, however, have not been subjected to this type of analysis.

The standardization of administration procedures is also an area of concern. Reynolds (1986b) commented on the availability of at least four versions of Halstead's category test, three of which were somewhat similar and the fourth with significant differences in terms of administration. Despite these differences, however, the
same normative data are used. Similarly, administration of the Wisconsin Card Sorting Test (Heaton, 1981) can be done traditionally or via computer, yet there is a single normative data set to be used for scoring and interpretation. Differences in administration impact on the validity and reliability of the measure and normative data, including validity studies, for each variation of administration (unless controlled in the standardization process) are necessary.

Sensitivity, specificity, and diagnostic accuracy need to be further researched as well. Sensitivity is the extent to which a given test accurately predicts brain impairment and is often gauged by statistical power (Pedhazur, 1973). Sensitivity is dependent on validity. No single neuropsychological measure demonstrates high sensitivity (Boll, 1978); combined scores from a given battery may be more successful (e.g., Selz & Reitan, 1979a). In contrast, specificity is dependent on the nature of the behavioral, cognitive, and emotional functions of the task (Batchelor, 1996b). The extent of specificity can only be determined by comparing clinical groups to each other as opposed to focusing on differences between a specific clinical group and the normal population. Cross-clinical group comparisons are frequently not done however. When research is based on comparisons across clinical groups, the results are generally inconclusive (Koriath, Gualtieri, van Bourgondien, Quade, & Werry, 1985).

In comparing clinical groups, it is important to control for comorbidity and family history (Seidman et al., 1995) as well as to differentiate between subtypes of a given disorder when these have been identified (e.g., Halperin, 1991). For many disorders, subtypes have been validated, yet frequently research studies with clinical groups rely on the more global rubric. With regard to learning disabilities, Rourke (1994) asserted that this “lumping” together may lead to gross misunderstanding, if not to conflicting results across studies. For example, the need to develop and incorporate typologies/subtypes for homogeneous grouping of children with dyslexia, for the purposes of developing appropriate intervention as well as for research purposes, has been recognized for some time, yet much of the educational and psychological literature and practices relating to dyslexia continue to address heterogeneous groups of children without regard for subtype (Reynolds, 1986b). Differing typologies and comorbidity have rarely been considered in the extant literature on many neuropsychological measures and likely contribute to the conflicting results of differing studies. Further, the constructs being measured by given tasks may be influenced by gender, premorbid status, the task itself, neuropsychological functions, and so on, making a high level of specificity difficult to attain (Batchelor, 1996b). Batchelor (1996b) suggested that many neuropsychologists compromise between sensitivity and specificity through the selection, administration, and interpretations of neuropsychological measures that are needed to effect such a balance.

Often, in an attempt to provide accurate differential diagnosis, a large set of behaviors is typically assessed. Researchers then use multivariate classification procedures for determination of group information or to determine the effectiveness of specific measures in the diagnostic process. The sample sizes in many of the studies, however, are too small for multivariate analysis, given the large number of variables involved. As a result, in the absence of cross-validation, many diagnoses or classifications may be due to random relationships (Willson & Reynolds, 1982). Problems with the lack of consistency in the diagnosis/classification of disorders also impede the research process and ultimately, clinical practice (Hooper & Tramontana, 1997).

4.10.2.5 Approaches to Test Selection with Children

In addition to selecting tests based on psychometric properties, it has been suggested that child neuropsychologists should select measures that vary along a continuum of difficulty, include both rote and novel tasks, and vary the tasks with regard to processing and response requirements within modalities (Rourke, 1994). Many neuropsychologists continue to include observation and informal assessment; others have adopted more actuarial approaches; many use a combination of observation, informal assessment, and actuarial approaches (Reynolds, 1997b), with a focus on direct appraisal of functions and abilities in order to obtain detailed information on the behavioral effects of brain impairment (Tramontana, 1983). It has been argued that the use of actuarial methods that rely on standardized measures to obtain information may not, however, be useful in intervention planning (D'Amato, 1990).

In addition to quantitative measures, neuropsychological assessment may incorporate not only Luria's theory but also his qualitative assessment model (Luria, 1966, 1970). Luria described assessment that was flexible and varied from individual to individual depending on the functional system that was of concern (Teeter, 1986). Although more dependent on clinician interpretation, qualitative methods can
add to information related to the process of learning and may be better suited to the development of intervention/treatment plans (D'Amato et al., in press). In the incorporation of qualitative tasks, child neuropsychologists make use of work samples, informal tasks, criterion-referenced measures and clinical observations of interactions throughout the assessment process (D'Amato et al., 1997). Qualitative procedures can also be used to complete a task analysis and determine specifically which components of a more complex task are problematic for the child (Taylor, 1988). Others may use standardized measures but administer them in other than standardized fashion (Kaplan, 1988). Modifications of tasks presented (e.g., provision of cues, adjustment of rate, changing modality of presentation or response, adjustment of task complexity) can provide insight into processing differences (Clark & Hostettler, 1995; Harrington, 1990; Ylvisaker et al., 1990) and have been recommended for use in the evaluation of children who are culturally or linguistically diverse (Gonzalez, Brusca-Vega, & Yawkey, 1997). Unfortunately, tests administered with these types of modifications are no longer consistent with standardization procedures, and clinicians need to exercise caution in the interpretation of brain–behavior relations based on qualitative data (D'Amato et al., in press). A strictly qualitative approach using experimental/ad hoc measures and nonquantitative/nonstandardized interpretation of standardized measures may provide additional information, but does not allow for verification of diagnostic accuracy, is not easily replicated, and does not allow for formal evaluation of treatment methods (Rourke, 1994). In practice, most clinicians prefer a combination of quantitative and qualitative measures (Rourke, 1994).

Test selection in the neuropsychological assessment of children and young people varies considerably from clinician to clinician due to differences in philosophy and theoretical foundations. Evaluation may take the form of various published battery approaches (e.g., Golden, 1997; Reitan, 1974; Selz, 1981) or may use a more eclectic approach (e.g., Benton, Hamsher, Varney, & Spreen, 1983; Gaddes, 1980; Hynd & Cohen, 1983; Knights & Norwood, 1979; Obrzut, 1981; Obrzut & Hynd, 1986; Rourke, Bakker, Fisk, & Strang, 1983; Rutter, 1983; Spreen & Gaddes, 1979; Teeter, 1986; Tramontana & Hooper, 1987). Generally, however, the approaches can be categorized as nomothetic, idiographic, or a combination of these two approaches. Additional variation within categories, however, is evident in the extent to which clinicians rely on quantitative, qualitative, or both types of information in the process.

4.10.2.5.1 Nomothetic approaches

The fixed/standardized battery or nomothetic approach uses the same assessment protocol for all children being assessed. An example of the nomothetic approach would be the administration of a published neuropsychological battery, usually in conjunction with IQ and achievement tests. It may also be a predetermined set grouping of selected tests that remains constant across children evaluated, regardless of the referral problem (Sweet, Moberg, & Westergaard, 1996). These tend to be more actuarial in nature (Lezak, 1995) and often rely on cut-off scores, pathognomonic signs, or a combination, for determination of the presence of brain damage. The choice of a fixed battery approach is generally related to an orientation and preference consistent with standardized procedures, objective methods, and psychometric development. It may also reflect a preference for “blind” assessment such that the referral problem does not dictate the measures used (Goldstein, 1997). This approach has the advantage of covering a breadth/depth of functions, provides for extensive databases, and facilitates the collection of data for clinical interpretation of large numbers of clinical groups. Standardized/nomothetic batteries, however, often do not take into consideration education, age, and experiential variables, and may or may not specifically address the referral question. Further, diagnosis with a nomothetic approach may be driven by the base rates of the clinical problems in a particular setting, due to sampling bias, and therefore may not be useful for detecting disorders in other population samples (Tramontana & Hooper, 1987). Use of a standardized battery/nomothetic approach appears to be declining in the general area of clinical neuropsychology (Sweet et al., 1996); however it may be the preferred method if litigation is a potential issue (Reitan & Wolfson, 1985).

The published neuropsychological batteries most frequently used with school-aged children are the Luria Nebraska Neuropsychological Battery-Children's Revision (LNNB-CR; Golden, 1984), the Halstead–Reitan Neuropsychology Battery (HRNB; Reitan & Davison, 1974) and the Reitan Indiana Neuropsychological Test Battery for Children (RINB; Reitan, 1969). All of these batteries require extensive training for appropriate administration and interpretation of results. The neuropsychological battery is often supplemented with a traditional test of cognitive ability as well as achievement testing.

The HRNB and RINB are considered to be the most widely used in clinical practice (Howieson & Lezak, 1992; Nussbaum & Bigler,
1997). Both of these use a multiple inferential approach to interpretation, including level of performance, pathognomonic signs, patterns of performance, and right–left differences (Reitan, 1986, 1987). The batteries contain numerous measures that are considered necessary for understanding brain–behavior relationships in children and adolescents. Descriptions of these measures are provided in Tables 1 and 2. Both the HRNB and the RINB can be used in clinical practice for the assessment of a child with identified brain damage as well as with those children where specific brain damage has not been documented through neuroradiological methods (Nussbaum & Bigler, 1997).

Strong correlations have been found between the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) and the RINB and HRNB (Klesges, 1983) suggesting the ability of the latter tests to predict neuropsychological dysfunction. A number of factor analytic studies have been completed comparing results from the WISC-R and the Reitan batteries (e.g., Batchelor et al., 1991; D'Amato, Gray, & Dean, 1988; Snow & Hynd, 1985a); these consistently suggest that most of the variance is due to factors of language, academic achievement, and visual spatial skills. With the addition of other measures, up to eight factors were found (D'Amato et al., 1988; Batchelor et al., 1991). Although factor analytic research has demonstrated a good deal of common information when the WISC-R and HRNB were both given, it has also been determined that the HRNB offers unique information (Klonoff & Low, 1974) and the addition of the HRNB to the typical psychoeducational battery has been found to increase the extent to which variability in school achievement can be accounted for (Strom, Gray, Dean, & Fischer, 1987). Research regarding the efficacy of the HRNB in the differential diagnosis of children with learning problems is equivocal (Arffa, Fitzhugh-Bell, & Black, 1989; Batchelor, Kixmiller, & Dean, 1990; Selz & Reitan, 1979a, 1979b). Factor analytic research with the RINB has been less conclusive (Crockett, Klonoff, & Bjerring, 1969; Foxcroft, 1989; Teeter, 1986). The RINB has been found to be sensitive to mild levels of traumatic brain injury within four months of injury in the absence of obvious lags in academic achievement (Gulbrandson, 1984). The HRNB and RINB are both downward extensions of the adult version with some modifications for children (Teeter, 1986) and do not fully reflect the developmental continuum of childhood and youth (Cohen et al., 1992).

The LNNB-CR was developed on the basis of the neurodevelopmental stages of the child (Golden, 1981, 1997) and was revised four times in the process (Plaisted, Gustavson, Wilkening, & Golden, 1983). It is designed for children ages 8–12 years and in addition to IQ and achievement provides information specific to motor, rhythm, tactile, visual, receptive speech, expressive language, and memory functions. A description of the LNNB-CR is provided in Table 3. The development of the LNNB-3 represents an extensive revision and major expansion of the LNNB-CR and the adult version. It includes tasks from the previous two measures, but also includes additional tasks, with a total of 27 domains being evaluated. With this major revision, both lower level and more complex tasks and items have been added. The LNNB-3 is intended for use with individuals from age five through adulthood. Interpretation of the LNNB-CR and LNNB-3 focuses predominantly on scale patterns and intrascale (intraindividual) differences, as opposed to levels of performance or pathognomonic signs. Due to its recent development, there is little research available on the LNNB-3 and most is specific to adults (e.g., Crum, Bradley, Teichner, & Golden, 1997; Crum, Golden, Bradley, & Teichner, 1997). Extensive research has, however been completed with the LNNB-CR. Factor analytic studies (e.g., Karras, Newton, Franzen, & Golden, 1987; Pfeiffer, Naglieri, & Tingstrom, 1987; Sweet, Carr, Rossini, & Kasper, 1986) have resulted in varying factor structures. It has been determined consistently that the LNNB-CR offers unique information not otherwise obtained in psychoeducational assessment, with particular sensitivity to deficits in language, writing, reading, and rhythm (Geary & Gilger, 1984). The pathognomonic scale of the LNNB-CR has been found to account for increased variance, independently of the WISC-R, and to be a better predictor of academic achievement in spelling and reading (McBurnett, Hynd, Lahey, & Town, 1988). It was also found that the LNNB-CR had greater shared variance than the WISC-R with measures of achievement (Hale & Foltz, 1982). The LNNB-CR has been found to be more sensitive to improvement in functioning following medical intervention (e.g., shunt placement) than either cognitive or achievement measures (Torkelson, Liebrook, Gustavson, & Sundell, 1985), as well as supporting differential diagnosis (Carr, Sweet & Rossini, 1986) and the understanding of academic deficits in children with emotional/behavioral problems (Tramontana, Hooper, Curley, & Nardolillo, 1990). The utility of the LNNB-CR in the differentiation of learning disability as opposed to other forms of brain damage, however, has been questioned (e.g., Morgan & Brown, 1988; Oehler-Stinnett,

Table 1 Halstead-Reitan Neuropsychological Battery for Children (ages 9-14 years).

Subtest | Description | Function(s) assessed
Category test | Requires individual to select colors or numbers corresponding to some abstract problem-solving criteria. Immediate feedback is provided for both correct and incorrect responses. | This task assesses general abstraction and concept formation as well as general neuropsychological functioning (Reitan & Wolfson, 1985, 1988).
Tactual performance test | The individual is blindfolded and required to place blocks in slots on a form board using the dominant hand, the nondominant hand, and both hands together. The individual is then asked to draw a diagram of the board with the blocks in their proper spaces. | This task measures tactual discrimination, sensory recognition, and spatial memory. The drawing component is a measure of incidental learning/memory (Reitan & Wolfson, 1988; Selz, 1981).
Speech sounds perception | A tape-recorded voice presents a sequence of 60 spoken nonsense words from which the individual must select the correct word each time from three written choices. | This task measures alertness, attention/concentration, and verbal ability (Reitan & Wolfson, 1988).
Seashore rhythm test | The individual is required to differentiate between 30 pairs of rhythmic patterns which are sometimes the same and sometimes different. | This test is thought to be an indicator of generalized cerebral function as well as a measure of alertness and attention/concentration (Reitan & Wolfson, 1988).
Trail making test | This test uses two tracking tasks, one with numbers (A) and one with letters and numbers (B). First, the individual must connect numbered circles in order; then, the individual must connect circles in sequence, alternating numbers and letters. | The test is believed to measure conceptual flexibility, symbolic recognition, and visual tracking under time constraints (Selz, 1981). It is also used as a measure of overall functioning (Reitan & Wolfson, 1985, 1988).
Finger oscillation test | This test requires the individual to depress a lever as quickly as possible with the index finger of each hand. | This measures motor speed and manual dexterity (Selz, 1981) and lateral dominance (Reitan & Wolfson, 1988).
Aphasia screening test | This test includes enunciation of spoken language (repeating), naming, reading, writing, spelling, and arithmetic. It also includes copying of a square, circle, and Greek cross. | It is a measure of verbal ability. The drawings are indicative of the verbal-to-motor process (Reitan & Wolfson, 1988).
Sensory perceptual examination | | All measure receptive sensory function (Reitan & Wolfson, 1985, 1988).
  Tactile perception | The individual is asked to report whether right hand, left hand, right side of face, or left side of face is touched; touches are done unilaterally and bilaterally. |
  Auditory perception | Examiner lightly rubs fingers together at the individual's right, left or both ears and the individual is asked to localize the sound produced. |
  Visual perception | The individual is asked to report peripheral, unilateral and bilateral single movements produced by the examiner, to assess all four quadrants of the visual field. |
Tactile form recognition | The individual must identify a cross, triangle, square or circle when put in the dominant hand behind a board (unseen) and point to that same object with the nondominant hand; the same process is then carried out with the object in the nondominant hand. | This test is believed to measure tactile perception as well as attention (Nussbaum & Bigler, 1997).
Fingertip number writing | This requires the individual to identify numerals written on their fingertips (both hands). | This is a measure of sensory perceptual functioning (Reitan & Wolfson, 1988).
Grip strength test | Using a hand dynamometer, the strength of grip for the dominant and nondominant hand is determined. | This measures motor functioning and lateral dominance (Reitan & Wolfson, 1988).

Stinnett, Wesley, & Anderson, 1988; Snow & Hynd, 1985b; Snow, Hynd, & Hartlage, 1984). Hynd (1992) also questioned the appropriateness of the standardization sample.

More recently, the Neuropsychological Investigation for Children (NEPSY; Korkman, Kirk, & Kemp, 1997) has been developed for young children. Based on Luria's model (1970), the NEPSY consists of 27 subtests that are summarized in test profiles of strengths and weaknesses. Initially developed in Finnish, in its English version the test includes subtests specific to attention and executive functions, and language, sensorimotor, visuospatial and memory/learning functions (Korkman et al., 1997). It is intended for use with children age 3-12 years. Early research on the NEPSY appears positive (e.g., Korkman, 1988; Korkman & Hakkinen-Rihu, 1994; Korkman, Liikanen, & Fellman, 1996); however, additional research with this measure, particularly in comparison to the KABC for younger children, will be needed following publication.

In addition, a new battery, the Dean-Woodcock Neuropsychological Assessment System (DWNAS; Dean & Woodcock, in press) is in the process of development. Based on the work of Cattell and Horn (Horn, 1988, 1991), this battery combines the cognitive battery of the Woodcock-Johnson Psychoeducational Battery-Revised with a newly developed battery of sensorimotor tests, a structured interview, and a mental status exam. The sensory and motor portion is projected to include eight tests of sensory function and nine tests of motor function. Although some of these tests are similar to those on other neuropsychological batteries, standardized administration, objective scoring criteria, and normative data will be provided for all tasks. The structured interview and mental status exam are intended to provide information specific to emotional state, motivation, temperament, and prior medical conditions as well as to premorbid history, age at onset and emotional reaction (coping) that may influence neuropsychological performance (Dean & Woodcock, in press). It is projected that this battery will be available in both English and Spanish, with general as well as focused norms that account for age and education using regression methods. As with the NEPSY, clinical research with the DWNAS will be needed once all components of the system are available.

Nomothetic approaches may also be eclectic and use selected measures to sample behaviors from the differing functional systems. Several examples of eclectic batteries can be found in the research literature (e.g., Nussbaum et al., 1988; Rourke, 1994). Since the combinations of measures vary in an eclectic battery from clinical setting to clinical setting, research regarding the efficacy of any given combination of tasks as compared to other combinations or to the published batteries is not feasible, and factor analytic studies of eclectic batteries are not routinely found in the research literature.

4.10.2.5.2 Idiographic approaches

At the other end of the continuum, an idiographic approach tailors the assessment battery to the referral question and the child's individual performance on initial measures administered (Christensen, 1975; Luria, 1973). This type of approach is intended to isolate neurobehavioral mechanisms that underlie the problem of a particular individual rather than

Table 2 Reitan Indiana Neuropsychological Battery (ages 5-8).

Subtest | Description | Function(s) assessed
Category test | Requires the individual to select colors or numbers corresponding to some abstract problem-solving criteria. Immediate feedback is provided. This version has fewer items and only five categories. | This task assesses general abstraction and concept formation as well as general neuropsychological functioning (Reitan & Wolfson, 1985).
Matching pictures test | The child matches a single picture to the same picture or to a picture from a more general category. | This task assesses abstraction and concept formation (Reitan & Wolfson, 1988).
Color form test | The child must alternately touch shapes and colors. | This task assesses abstraction and concept formation (Reitan & Wolfson, 1988).
Progressive figures test | The child must use the small shape within a large shape to select the outer shape of the next figure in sequence in a timed condition. | This task assesses concept formation (Reitan & Wolfson, 1988). It also involves cognitive flexibility and attention (Nussbaum & Bigler, 1997).
Tactual performance test | The individual is blindfolded and required to place blocks in slots on a form board using the dominant hand, the nondominant hand, and both hands together. The individual is then asked to draw a diagram of the board with the blocks in their proper spaces. | This task measures tactual discrimination, sensory recognition, and spatial memory. The drawing component is a measure of incidental learning (Reitan & Wolfson, 1988; Selz, 1981).
Finger oscillation test | This test requires the individual to depress a lever as quickly as possible with the index finger of each hand. | This measures motor speed and manual dexterity (Selz, 1981) and lateral dominance (Reitan & Wolfson, 1988).
Fingertip symbol writing | This requires the individual to identify Xs and Os written on their fingertips (both hands). | This is a measure of sensory perceptual functioning and attention (Reitan & Wolfson, 1988).
Marching test | The child is required to touch a sequence of circles as quickly as possible. | This measures motor functioning (Reitan & Wolfson, 1988).
Sensory perceptual examination | | All measures are sensitive to receptive sensory function (Reitan & Wolfson, 1985, 1988).
  Tactile perception | The individual is asked to report whether right hand, left hand, right side of face, or left side of face is touched; touches are done unilaterally and bilaterally. |
  Auditory perception | Examiner lightly rubs fingers together at the individual's right, left or both ears and the individual is asked to localize the sound produced. |
  Visual perception | The individual is asked to report peripheral, unilateral and bilateral single movements produced by the examiner, to assess all four quadrants of the visual field. |
Grip strength test | Using a hand dynamometer, the strength of grip for the dominant and nondominant hand is determined. | This measures motor functioning and lateral dominance of the upper body (Reitan & Wolfson, 1988).
Tactile form recognition | The individual must identify a cross, triangle, square, or circle when put in the dominant hand behind a board (unseen) and point to that same object with the nondominant hand; the same process is then carried out with the object in the nondominant hand. | This test is believed to measure tactile perception as well as attention (Nussbaum & Bigler, 1997).
Aphasia screening test | This test includes enunciation of spoken language, naming, reading, writing, and arithmetic; naming, identifying body parts, left/right, numerals and letters. It also includes drawing a square, circle, and Greek cross. It is an abbreviated version of the screening test for older children and adults. | It is a measure of verbal ability. The drawings are indicative of the verbal-to-motor process (Reitan & Wolfson, 1988).
Individual performance | | These measure visual perceptual and spatial abilities (Reitan & Wolfson, 1988).
  Matching Vs, figures | Child must match figures or Vs. |
  Star, concentric squares | Child must copy figures of varying difficulty. |

providing a detailed evaluation of all areas of functioning. With no predetermined uniformity across evaluations and the dependence on the individual's presenting problems, this approach requires substantial clinical knowledge to determine the components of the battery in order to meet this goal. Due to the limited data on neuropsychological functions and organization of behavior in children, this approach is less frequently used (Fennell, 1994). However, it may be more cost-effective because of the small number of domains which are assessed (Goldstein, 1997). A major drawback to the idiographic approach is the limited research base which is generated and the inability to verify or study the efficacy of this approach as compared to other approaches.

4.10.2.5.3 Combined approaches

The most frequently used approach represents a combination of the nomothetic and idiographic approaches and has been referred to as the flexible battery approach (Sweet et al., 1996). A core set of the same tests is administered to all children, as in the nomothetic approach, and serves as the basis for initial hypothesis generation; this may constitute an initial screening battery. To this core set, further tests are added that are specific to the referral question or that are believed, based on initial observations and performance, to enhance the information provided (Bauer, 1994; L. C. Hartlage & Telzrow, 1986; Rourke, Fisk, & Strang, 1986). The components of the flexible battery itself generally reflect the theoretical position taken by the neuropsychologist with regard to the manner in which behavioral performance reflects brain pathology and the reasons for referral for neuropsychological evaluation for a given individual (Bauer, 1994). According to surveys completed in the 10 years since 1987, the flexible battery approach is generally that preferred by neuropsychologists working with populations of varying ages (Sweet & Moberg, 1990; Sweet et al., 1996). This approach is believed to more accurately identify specific deficits (Batchelor, 1996b).

The Boston Process Approach (Kaplan, 1988; Lezak, 1995) is one example that incorporates a flexible battery. Specific measures with low specificity are used to assess multiple constructs in a variety of neuropsychological domains (Batchelor, 1996b). Hypotheses are then made based on the initial measures, and additional measures with higher levels of specificity are then selected and used to differentiate within and between various functions. Hypotheses initially generated from the screening battery are thus either confirmed or nullified. Inferences are then made regarding brain function based on the specific deficits identified. The flexible battery used in the Boston Process Approach is not limited to quantitative data but also includes qualitative information that is believed to be important in

Table 3 Luria Nebraska Neuropsychological Battery-Children's Revision.

Scale | Description | Function(s) assessed
C1 (motor) | Items cover a variety of motor skills (bilateral and unilateral) including simple hand movements, drawing, and constructional skills. | These tests measure motor domains but are sensitive to many types of motor problems (Golden, 1997).
C2 (rhythm) | Items include a variety of tasks in which the child is required to report whether one of two groups of tones is higher or lower, reproduce tones and rhythmic patterns, and identify the number of beeps in groups of sounds. | These items are considered to be most sensitive to attention and concentration (Golden, 1997).
C3 (tactile) | Items include tasks in which the child is asked to report where they are touched, how hard they are touched, as well as to name and identify objects through touch. | These items measure the extent of cutaneous sensation and stereognostic perception (Golden, 1997).
C4 (visual) | Items include tasks in which the child is required to identify an object or picture, overlapping pictures, pictures that are difficult to perceive, and mirror image versions; items also include progressive matrices and spatial rotation. | These items measure visual-spatial organization and perception as well as right hemisphere function (Golden, 1997).
C5 (receptive speech) | The child is required to repeat phonemes, repeat phonemes at various levels of pitch, name objects, point to objects, identify and define words, and respond to sentences. | These items measure receptive language and auditory skills as well as left hemisphere function (Golden, 1997).
C6 (expressive speech) | The child is required to repeat phonemes, words, and sentences as well as to generate speech forms including naming objects, counting forward and backward, spontaneous discourse in response to a picture, story, or discussion topic. | These items measure expressive language as well as left hemisphere function. Results may be impacted by reading ability (Golden, 1997).
C7 (writing) | Tasks include copying of letters and words, writing first and last name, writing sounds, words, and phrases from dictation. | These items measure visual motor and auditory motor skills and are believed to measure functioning of the temporal-parietal-occipital area (Golden, 1997).
C8 (reading) | The child is asked to generate sounds from letters, name letters, read simple words, sentences, and paragraphs. | These items measure reading as well as left hemisphere function (Golden, 1997).
C9 (arithmetic) | Child is asked to write Arabic and Roman numerals from dictation, compare numbers, complete simple computation problems, and generate serial threes. | These tasks measure arithmetic skills, but are considered the most sensitive to educational deficits as well as to any dysfunction (Golden, 1997).
C10 (memory) | Tasks required include having the child memorize words as well as predicting their own performance on various memory tasks. | These items measure short-term memory functions and are most sensitive to verbal dysfunction (Golden, 1997).
C11 (intellectual) | The child is asked to complete a variety of tasks including interpretation of pictures, arranging pictures in order, identification of what is comical/absurd, interpretation of story, determination of similarities, simple arithmetic problems, identification of logical relations and so on. | These are considered to be reflective of general neuropsychological function, concept formation, and reasoning (Golden, 1997).

understanding the child's problems and in developing effective intervention programs (Batchelor, 1996b; Milberg, Hebben, & Kaplan, 1996). There is less of a focus on the results of standardized test performance with greater attention paid to developmental history, presentation of symptoms, strategy use in task completion, and error analysis. As such, the "process" approach uses both standardized measures and experimental measures as well as "testing of limits" that may involve procedural modifications in order to gain insight into brain-behavior relationships (Kaplan, 1988; Milberg et al., 1996). Concern has, however, been expressed regarding the reliability of scores obtained on standardized measures when the standardization procedures have been compromised (e.g., Rourke et al., 1986). Further, most of the research and clinical study, with the Boston Process Approach in particular, has been with adult populations as opposed to children, and it is not recommended for other than research applications.

4.10.2.6 General Organization of the Neuropsychological Assessment of the Child

When the neurologist examines a child, the physical examination looks principally for structural defects in the CNS, trauma to the CNS, or specific disease entities or toxins. An assessment of history is an integral component of both the neurological and the neuropsychological assessment of children and includes, for the neurologist, assessment of the gestational period, delivery, postnatal history, and the family medical history through at least two generations. The physical examination that lay people view as the neurological examination proper is based largely on observations of the neurologist and is conducted in the context of a brief interview and physical manipulation to assess tone, muscle strength, deep tendon reflexes, sensation, and brain stem and spinal reflexes. Electrophysiologic, serologic and/or imaging studies may then be ordered as may be suggested by such results. Neuropsychological testing may also be ordered when intellectual delay or functional sequelae related to trauma, disease, or toxins are suspected. As recently as the 1970s and into the 1980s, neuropsychological testing was used to evaluate lesion site and size and to assist in the differential diagnosis of a variety of neurologic diseases, but this function has been largely supplanted by advances in neuroimaging, clinical serology, and the linking of a variety of cancers to mental symptoms.

The neuropsychological examination of children is focused more directly on an analysis of the functional concomitants and sequelae than on the identification of strictly neurologic disorders, but it is also useful in the diagnosis and identification of more subtle conditions (e.g., learning disabilities, ADHD) or other neurologic disorders, especially in early stages (such as childhood onset of Huntington's disease) that are more resistant to diagnosis via neurologic examination. (In adulthood, differential diagnosis, such as depression versus dementia or differentiation of malingering or among various dementias, takes on greater importance.) At all ages, the neuropsychological examination is also focused on rehabilitation.

A thorough history is important to a proper neuropsychological assessment. The length of time since trauma or disease onset, premorbid levels of functions, family history of related problems, and problems related to gestation, delivery, and the postnatal period are all relevant to accurate interpretation of the results of neuropsychological testing. If a school-aged or college-aged individual is involved, it is important to review educational history with specific performance data including standardized test scores and grades along with any special education history. With children who are, developmentally, a moving target, it is important to be always cognizant of the educational implications of the reason for referral. Following a review of history and obtainable records, there are nine key points to consider in the organization of the neuropsychological assessment.

(i) All or at least a significant majority of the child's educationally relevant cognitive skills or higher order information processing skills should be assessed. This will often involve an assessment of general intellectual level via a comprehensive IQ test such as a Wechsler scale or KABC. Evaluation of the efficiency of mental processing, as assessed by strong measures of g, is essential to provide a baseline for interpreting all other aspects of the assessment process. Assessment of basic academic skills including reading, writing, spelling, and math will be necessary, along with tests of memory and learning such as the TOMAL, which also have the advantage of including performance-based measures of attention and concentration. Problems with memory, attention and concentration, and new learning are the most common of all complaints following CNS compromise and are frequently associated with more chronic neurodevelopmental disorders (e.g., learning disability, ADHD).

(ii) Testing should sample the relative efficiency of the right and of the left hemispheres of the brain. Asymmetries of performance are of
interest in their own right, but different brain systems are involved in each hemisphere that have different implications for treatment as well. Even in a diffuse injury such as anoxia, it is possible to find greater impairment in one portion of an individual's brain than in another. Specific neuropsychological tests like those of Halstead and Reitan or the LNNB-CR are useful here.

(iii) Sample anterior and posterior regions of cortical function. The anterior portion of the brain is generative and regulatory while the posterior region is principally receptive. Deficits and their nature in these systems will have great impact on treatment choices. Many common tests such as receptive (posterior) and expressive (anterior) vocabulary tests may be applied here along with a systematic and thorough sensory perceptual examination. In conjunction with key point (ii), this allows for evaluation of the integrity of the four major quadrants of the neocortex.

(iv) Determine the presence of specific deficits. Any specific functional problems a child is experiencing must be determined and assessed. In addition to their importance in the assessment of children with neurodevelopmental disorders, specific deficits warrant attention because traumatic brain injury (TBI), stroke, and even some toxins can produce very specific changes in neocortical function that are addressed best by the neuropsychological assessment. Similarly, research with children with leukemia suggests the presence of subtle neuropsychological deficits following chemotherapy that may not be detected by more traditional psychological measures. Neuropsychological tests tend to be less g-loaded as a group and to have greater specificity of measurement than many common psychological tests. Noting areas of specific deficit is important in both diagnosis and treatment planning.

(v) Determine the acuteness versus the chronicity of any problems or weaknesses found. The "age" of a problem is important to diagnosis and to treatment planning. Combining a thorough history with the pattern of test results obtained, it is possible, with reasonable accuracy, to distinguish chronic neurodevelopmental disorders such as dyslexia or ADHD from new acute problems resulting from trauma, stroke, or disease. Care must be taken especially in developing a thorough, documented history when such a determination is made. When designing intervention/treatment strategies, rehabilitation and habilitation approaches take differing routes depending upon the age of the child involved and the acuteness or chronicity of the problems evidenced.

(vi) Locate intact complex functional systems. It is imperative in the assessment process to locate strengths of the child and intact systems that can be used to overcome the problems the child is experiencing. Treatment following CNS compromise involves habilitation and rehabilitation with the understanding that some organic deficits will represent permanently impaired systems. As the brain is a complex interdependent systemic network of complex organizations that produce behavior, the ability to identify intact systems is crucial to enhancing the probability of designing successful treatment. Identification of intact systems also suggests the potential for a positive outcome to parents and teachers, as opposed to fostering low expectations and fatalistic tendencies on identification of brain damage or dysfunction.

(vii) Assess affect, personality, and behavior. Neuropsychologists sometimes ignore their roots in psychology and focus on assessing the neural substrates of a problem. However, CNS compromise will result in changes in affect, personality, and behavior. Some of these changes will be transient, some will be permanent, and due to the developmental nature of children, some will be dynamic. Some of these changes will be direct (i.e., a result of the CNS compromise at the cellular and systemic levels) and others will be indirect (i.e., reactive to loss or changes in function, or to how others respond to and interact with the individual). A thorough history, including onset of problem behaviors, can assist in determination of direct versus indirect effects. Comprehensive approaches such as the Behavior Assessment System for Children (BASC; Reynolds & Kamphaus, 1992), which contain behavior rating scales, omnibus personality inventories, and direct observation scales, seem particularly useful. Such behavioral changes will also require intervention and the latter may vary depending on whether the changes noted are direct or indirect effects or whether there were behavior problems evident on a premorbid basis.

(viii) Test results should be presented in ways that are useful in school settings, not just in acute care or intensive rehabilitation facilities. Schools are a major context in which children with chronic neurodevelopmental disorders must function. Children who have sustained insult to the CNS (e.g., TBI, stroke) will eventually return to a school or similar educational setting. Schools are where the greatest long-term impact on a child's outcome after CNS compromise is seen and felt. Results should speak to academic and behavioral concerns, reflecting what a child needs to be taught next in school, how to teach to the child's strengths through the engagement of intact complex functional systems, and how to motivate and manage positive behavioral outcomes. For children with TBI, additional
Measures Used in the Assessment of Children 289

information regarding potential for recovery and the tenuousness of evaluation results immediately post-injury needs to be communicated, as does the need for reassessment of both the child and the intervention program at regular intervals.

(ix) If consulting directly to a school, be certain the testing and examination procedures are efficient. School systems, which is where one finds children, do not often have the resources for funding the type of diagnostic workups neuropsychologists prefer. Therefore, when consulting to the school, it is necessary to be succinct and efficient in planning the neuropsychological evaluation. If the school can provide the results of a very recent intellectual and academic assessment as well as the behavioral assessment information, this can then be integrated into the neuropsychological assessment by the neuropsychologist. If a recent intellectual and academic assessment has not been completed, it may be cost-efficient for qualified school district personnel to complete this portion of the assessment for later integration with other data obtained and interpreted by the neuropsychologist. For children in intensive rehabilitation facilities or medical settings, it may be appropriate for school personnel to participate in the evaluation prior to discharge (i.e., for children with TBI being released and returned to the schools). This collaborative involvement can facilitate program planning with the receiving school district and is preferable to eliminating needed components of the neuropsychological evaluation.

When considering rehabilitation of the child with a focal injury or TBI, several additional considerations are evident. It is important to determine what type of functional system is impaired. Impaired systems may, for example, be modality-specific or process-specific. The nature or characteristics of the impairments must be elucidated before an intelligent remedial plan can be devised.

The number of systems impaired should be determined and prioritized. Children may not be able to work out everything at once and a system of priorities should be devised so that the most important of the impairments to impact overall recovery is the first and most intensely addressed area of impairment. The degree of impairment, a normative question, is also an important consideration in this regard. At times, this will require the neuropsychologist to reflect also on the indirect effects of a TBI, as an impaired or dysfunctional system may adversely affect other systems that are without true direct organic compromise.

The quality of neuropsychological strengths that exist will also be important and tends to be more of an ipsative as opposed to a normative determination. Furthermore, certain strengths are more useful than others. Preserved language and speech are of great importance, for example, while an intact sense of smell (an ability often impaired in TBI) is of less importance in designing treatment plans and outcome research. Even more important to long-term recovery are intact planning and concept formation skills. The executive functioning skills of the frontal lobes take on greater and greater importance with age, and strengths in those areas are crucial to long-term planning (as is the detection of weaknesses). These will change, however, with age as the frontal lobes become increasingly prominent in behavioral control after age nine years, again through puberty, and continuing into the 20s.

There are of course times when the scope of the neuropsychological assessment of a child is less broad. On occasion, referrals may be very specific (e.g., “Does Susan have memory or attention problems?”). Even when such seemingly succinct questions are asked, it is commonly a good practice to inquire of the referral source as to whether other questions may be anticipated (e.g., Is memory an issue because of poor school achievement? Possible learning disability?).

This section draws in part upon the writings, teachings, and workshops of Lawrence C. Hartlage and Byron Rourke.

4.10.2.7 Interpretation Issues

Neuropsychological assessment of children yields not only an accumulation of test data and impressions, but also a variety of paradigms for understanding and interpreting that data. There are a number of competing paradigms and theories (e.g., Ayers, 1974; Das et al., 1979; Luria, 1966; Reynolds, 1981b), and as a result, not only is there considerable variability in the quality and choice of measures used in the neuropsychological assessment of children, but there are also considerable differences in the ways in which the data obtained are used for making inferences and eventually interpreted (Batchelor, 1996a; Nussbaum & Bigler, 1997). Interpretation of the accumulated data is dependent to a great extent on the neuropsychologist's clinical skills and acumen (D'Amato et al., 1997).

Interpretation may be based on overall performance level (e.g., Reitan, 1986, 1987), performance patterns (e.g., Mattarazzo, 1972; Reitan, 1986, 1987), asymmetry of function (e.g., L. C. Hartlage, 1982), the presence of “organic” signs (Kaplan, 1988; Lezak, 1995), or on some combination of features. It is not necessarily
the case that only one paradigm is appropriate; which paradigm is most suitable may depend on the child being evaluated. Most importantly, the model used for interpretation should allow the neuropsychologist to make predictions about the child's ability to perform in a variety of contexts and about the efficacy of treatment/intervention plans (Reynolds et al., 1997).

4.10.2.7.1 Performance level

With the use of this indicator, the child's overall level of performance is compared to normative data and conclusions are reached based on deviations from the norm. The extent of variability among typically developing children on some measures at given ages (e.g., when the standard deviation approximates the mean score) may preclude interpretation of results using this approach. In addition, this approach can be misleading, particularly in those individuals with higher cognitive ability (Jarvis & Barth, 1984; Reitan & Wolfson, 1985). Further, there is a tendency for this method to yield a large number of false positives due to the potential for other factors (e.g., motivation, fatigue) to impact on a child's performance (Nussbaum & Bigler, 1997).

4.10.2.7.2 Profile patterns

Application of the neuropsychological model to learning problems has been criticized as being too aligned with a medical model and an emphasis on pathology (Gaddes & Edgell, 1994). As asserted by Little and Stavrou (1993), merely identifying that brain integrity has in some way been compromised is not in and of itself particularly helpful to the child or to those who need to develop interventions to help the child. Neuropsychologists look beyond diagnosis or categorization to an understanding of brain–behavior relations. In order to accomplish this, neuropsychological assessment involves consideration of associations and dissociations of performance across measures (Fletcher, 1988; Rutter, 1981). Performance patterns or intraindividual differences provide a means of conceptualizing functional vs. dysfunctional organizational systems. Strengths and weaknesses are then identified based on the discrepancies between the domains studied. This method has, however, been used frequently for the identification or classification of subtypes of learning disabilities (Branch et al., 1995; Nussbaum & Bigler, 1986; Rourke, 1984).

In interpreting data obtained using this type of evaluation, clinicians differ with regard to emphasis on child strengths, child weaknesses or a combination of strengths and weaknesses (L. C. Hartlage & Telzrow, 1983; Reynolds, 1981b, 1986a; Teeter, 1997). It has been argued that a strength model is more efficacious, with habilitation based on those complex functional systems that are sufficiently intact, and therefore potentially capable of taking over and moderating the acquisition of the skills needed (Reynolds et al., 1997). Emphasis on weaknesses, generally referred to as the deficit model, is not supported by research (e.g., Adams & Victor, 1977; L. C. Hartlage, 1975; P. L. Hartlage & Givens, 1982; P. L. Hartlage & Hartlage, 1978), and deficit approaches to intervention (e.g., remediation of the deficit process) have not been found to be effective and may even be harmful (L. C. Hartlage & Reynolds, 1981).

There are, however, some problems with this method of interpretation regardless of whether the focus is on strengths, weaknesses, or a combination of these. This approach may be misleading as other variables may account for these intra-individual differences (Jarvis & Barth, 1984). Additionally, some such intra-individual differences (e.g., verbal IQ–performance IQ differences) have been found to occur with frequency in the general population (e.g., Kaufman, 1976b) and seemingly abnormal levels of subtest scatter (WISC-R) have been found to be relatively common (Gutkin & Reynolds, 1980; Kaufman, 1976a, 1976b; Reynolds, 1979). Base rates in the general population of specific intra-individual differences for various other combinations of measures have not been studied, and what appears to be a “difference” may not be unusual or unique at a given age level. Further, the stability of these profile patterns over at least very short periods of time needs to be investigated (Reynolds, 1997b).

4.10.2.7.3 Functional asymmetry

Examination of asymmetries in performance across measures is another method of intra-individual consideration. Replicable asymmetries in performance are generally considered signs of CNS dysfunction (Batchelor, 1996b). Most frequently, the comparison is made between those functions that are believed to be right hemisphere-dominated as opposed to left hemisphere-dominated. These differences, however, may be difficult to interpret, particularly for younger children (Reynolds et al., 1997). Further, understanding of the lateralization of cortical functions is frequently based on evidence from adults as opposed to children and assumes that the lateralization is stable over time, despite differing rates of brain maturity (Spreen et al., 1995). Reliance on left–right
differences and measures based on lateralization of function have also been criticized as ignoring the role of hemispheric interaction on behavior (e.g., Efron, 1990; Hiscock & Kinsbourne, 1987).

As with the patterns of performance method, right–left differences have been used in the characterization of children with right hemisphere dysfunction as suggestive of learning disability or ADHD (Rourke, 1989). The results of studies are, however, equivocal (e.g., Branch et al., 1995; Gross-Tsur, Salev, Manor, & Amir, 1995; Voeller, 1995). Research, in general, regarding lateralization of function and hemispheric specialization is fraught with conflicting results (e.g., Bever, 1975; Das et al., 1979; Dean, 1984; Reynolds, 1981b), and it has been suggested that the traditional verbal–nonverbal distinction between hemispheres is an oversimplification of a complex system (Dean, 1984; Reynolds, 1981a). Based on Luria's theories, asymmetries of function are not content- or modality-specific but rather are “process”-specific (Reynolds, 1981a, 1981b). Bever (1975) posited two fundamental lateralized processing types, the analytic and holistic; these were translated into sequential and simultaneous in the KABC based on Das et al. (1979). In the research literature, however, there is often a preponderance of emphasis placed on content and modality, as opposed to process, in the interpretation of functional asymmetries; it is believed that this may account for the conflicting results across studies (Reynolds et al., 1997).

Nussbaum et al. (1988) proposed an alternate method of examining asymmetry. In the model of Nussbaum and colleagues, the neuropsychological protocol and interpretation reconceptualized neurobehavioral functioning along the anterior–posterior gradient as opposed to left–right differences. The recommended protocol includes tasks from the HRNB as well as from other batteries. Nussbaum and colleagues asserted that this model may provide additional information in the investigation of asymmetries in children with learning and behavioral problems. Initial research in this area suggested that weaknesses on anterior measures were associated with psychological/behavioral problems (Teeter & Semrud-Clikeman, 1997). Two later studies, however, failed to support the anterior–posterior gradient theory (Matazow & Hynd, 1992).

4.10.2.7.4 Pathognomonic signs

The pathognomonic signs approach involves the identification of specific deficits or performance errors that are not frequently found in typically developing individuals (Nussbaum & Bigler, 1997). Identification of “organic” signs is generally completed through qualitative analysis of errors (Kaplan, 1988; Lezak, 1995). The presence of specific types of errors is then seen as an indication of a compromise to brain integrity. This method has been used reliably with adult populations; however, the utility of this approach in the neuropsychological assessment of children has not been demonstrated (Batchelor, 1996b). The range of variability associated with the developmental process in children would seem to make it more difficult to interpret specific errors as signs of organic impairment (Nussbaum & Bigler, 1997). Unlike the performance levels method, the use of pathognomonic signs has been found to result in a large number of false negatives (Boll, 1974). This may be related to the potential for reorganization/recovery of function in children (Nussbaum & Bigler, 1997).

4.10.2.7.5 Combination approaches

Boll (1981) proposed utilizing performance levels, patterns of performance, pathognomonic signs, and asymmetry of function in concert, in order to account for the potential limitations to the use of any single approach in the interpretation of neuropsychological assessment data. This multiple inferential levels approach is used in the HRNB and is supported by others as well (e.g., Rourke, 1994). The “rules approach” (Selz & Reitan, 1979b) also combines approaches, but in a different manner. Using the “rules approach,” each of 37 aspects of neuropsychological performance is rated on a four-point scale in order to provide an objective system for measuring the extent of impairment. More recently, Taylor and Fletcher (1990) proposed that the child's performance on neuropsychological measures be used to identify and clarify the functional aspects of the child's problems, with the understanding that the biological or neurological substrates of the learning or behavior problem serve to set limits on the child's performance. Levine (1993) has posited still another model for interpretation of neuropsychological data. The “observable phenomenon” model places the emphasis on observable behaviors that may impact on classroom performance and the changing demands placed on the child over time, as opposed to test results.

4.10.3 CONCLUSIONS

Neuropsychological assessment and the field of clinical child neuropsychology in general have much to offer in the way of understanding
the functional systems of the brain and the mechanisms involved in the learning and self-regulation process. Not only is this important in understanding and designing treatment programs for children with problems, but increased understanding of brain functions and their relation to behavior can also improve the overall outcomes for all children (Gaddes, 1983). Historically, neuropsychological assessment of children has taken its lead from research and practice with adults. Issues relating to neurodevelopment, task appropriateness, varying contexts for children, progression following brain injury with children, and so on, render the continuing use of this approach inappropriate. A variety of theoretical models exist; however, many of these are adult-based and used without consideration of developmental issues. Only by developing its own theories and clinical assessment procedures, that are sensitive to developmental features and responsive to educational issues, can the field of clinical child neuropsychology continue to advance and make meaningful contributions to the understanding of learning and behavior problems in children. Development and incorporation of typologies within the childhood disorders, based on clinical experience with children and neuropsychological theory that addresses habilitation, programming, and research needs and that has intuitive appeal to psychologists, educators, and neurologists, would be viewed as a major conceptual contribution to the field of child neuropsychology (Reynolds, 1986b).

Continued methodological and measurement problems in the research that serves as a foundation for the interpretation of neuropsychological data impede progress in the field of clinical child neuropsychology and impact on the accuracy of diagnosis and on the appropriateness of treatment planning. Lack of attention to standard psychometric methods within the field of clinical neuropsychology is all too rampant and poses serious limitations in research in clinical arenas; intuitive appeal, clinical acumen, and perceived utility are not sufficient, but must be combined with sound empirical research (Reynolds, 1986b). While it is anticipated that new measures will have sufficient normative samples for evaluation of associations with demographic variables, and assessment of validity, reliability, sensitivity, and specificity issues, many existing measures continue to have insufficient normative data. All too often, sensitivity is a focus; specificity is also necessary if results are to be useful in treatment/intervention planning. This requires further investigation of contrasting clinical groups. In research with clinical groups, there is a need to consider comorbidity and be as specific as possible in descriptions of clinical subgroups (e.g., Fletcher, Shaywitz, & Shaywitz, 1994). Regardless of the perspective used in interpretation, the value of that interpretation is only as good as the measures used in the assessment process and their sensitivity and specificity (Batchelor, 1996b) in combination with the skills and knowledge of the user (Golden, 1997). Failure to resolve these measurement and methodology issues has impeded and will continue to impede progress in the field of neuropsychology (Reynolds, 1997b).

The field of clinical child neuropsychology is in part driven by the development and application of standardized diagnostic procedures that are sensitive to higher cognitive process as related to brain function (Reynolds, 1997b). While the development of new measures of memory, attention, information processing, and so on provides alternatives for clinical child neuropsychologists (e.g., Reynolds & Bigler, 1994), further research is needed with these as well as with other new and existing measures in order to determine their utility as part of a comprehensive neuropsychological battery. The incorporation of computer-based assessment is likely to increase in the next decades with the potential for incorporation of computer simulation, interactive types of tasks, virtual reality and so on, as means of measuring neuropsychological function. Computerized testing may facilitate the interface with electrophysiological and neuroradiological methods and, ultimately, bring about significant advances in the understanding of learning and behavior problems. Technological advances in assessment, however, will require the same types of research regarding psychometric properties, confounding factors, cultural/gender differences, and so on. Even with those existing tests that currently include an option for computerized assessment, there are some indications of differences in results following computer administration as opposed to the more traditional administration. If this is the case, then it may be appropriate for separate normative data to be obtained for each mode of administration. Furthermore, children with substantial CNS compromise will have difficulty manipulating computerized test materials, and careful validity research will be required at a time when publishers and others are looking for ways to reduce costs associated with health care products.

A major concern with regard to the increased emphasis on reducing health care costs is that neuropsychologists will shorten tests and attempt to streamline batteries, and in the process lessen both the quantity of time required and the quality of the assessment provided (Woody, 1997). This not only impacts on clinical practice,
but also on the knowledge generated through future research. Conflict or dissonance among clinicians will not be well tolerated and the future of clinical child neuropsychology will need the support of public policy (Woody, 1997). This means that reasonable agreement on theoretical foundations, training, and procedures will need to be reached (Woody, 1997). Provision of neuropsychological services needs to be predicated on academics, research, and training consistent with both a clinical psychology orientation and specialized training in brain–behavior relationships, in order to ensure sound foundations (Woody, 1997). With regard to neuropsychological assessment of children, there is a definite need on the part of child neuropsychologists to be well grounded in the developmental process, from a cognitive as well as neurological perspective. Children are a moving target and need even more sophisticated assessment devices than do adults. Considerable work remains in this domain alone.

Neuropsychological assessment of children with learning or behavioral/emotional problems is not necessarily essential. It does, however, provide for a comprehensive evaluation of cognitive skills and emotional factors as well as of environmental influences (L. C. Hartlage & Long, 1997). Neuropsychological assessment may be most appropriate for those children who exhibit characteristics of a disorder that includes cognitive deficits (e.g., learning disability) or significant behavioral problems (e.g., behavior disorder) in the absence of neurophysiological evidence of brain damage, for those with a known neurological syndrome or disease as evidenced by neurophysiological methods, and for those children who, because of genetic predisposition or prenatal/perinatal complications, are believed to be at high risk for neurological disorders (Allen, 1989).

4.10.4 REFERENCES

Adams, R. L. (1985). Review of the Luria Nebraska Neuropsychological Battery. In J. V. Mitchell (Ed.), The ninth mental measurements yearbook. Lincoln, NE: University of Nebraska.
Adams, R. D., & Victor, M. (1977). Principles of neurology. New York: McGraw-Hill.
Allen, C. (1989). Why use neuropsychology in the schools? The Neuro-Transmitter, 1, 1–2.
Ardila, A., & Roselli, M. (1994). Development of language, memory, and visuospatial abilities in 5- to 12-year old children using a neuropsychological battery. Developmental Neuropsychology, 10, 97–120.
Ardila, A., Roselli, M., & Puente, T. (1994). Neuropsychological evaluation of the Spanish speaker. New York: Plenum.
Arffa, S., Fitzhugh-Bell, K., & Black, F. W. (1989). Neuropsychological profiles of children with learning disabilities and children with documented brain damage. Journal of Learning Disabilities, 22, 635–640.
Arnold, B. R., Montgomery, G. T., Castenada, I., & Langoria, R. (1994). Acculturation of performance of Hispanics on selected Halstead–Reitan neuropsychological tests. Assessment, 1, 239–248.
Asarnow, R. F., Asamen, J., Granholm, E., & Sherman, T. (1994). Cognitive/neuropsychological studies of children with a schizophrenic disorder. Schizophrenia Bulletin, 20, 647–669.
Asarnow, R. F., Brown, W., & Strandburg, R. (1995). Children with a schizophrenic disorder: Neurobehavioral studies. European Archives of Psychiatry and Clinical Neuroscience, 245(2), 70–79.
Ayers, A. J. (1974). Sensory integration and learning disorders. Los Angeles: Western Psychological Services.
Aylward, G. P., Gioia, G., Verhulst, S. J., & Bell, S. (1995). Factor structure of the Wide Range Assessment of Memory and Learning in a clinical population. Journal of Psychoeducational Assessment, 13, 132–142.
Bakker, D. J. (1984). The brain as a dependent variable. Journal of Clinical Neuropsychology, 6, 1–16.
Bannatyne, A. (1974). Diagnosis: A note on recategorizations of the WISC scale scores. Journal of Learning Disabilities, 7, 272–274.
Barkley, R. A. (1991). The ecological validity of laboratory and analogue assessments of ADHD symptoms. Journal of Abnormal Child Psychology, 19, 149–178.
Barkley, R. A. (1994). The assessment of attention in children. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 69–102). Baltimore: Brookes.
Barkley, R. A., DuPaul, G. J., & McMurray, M. B. (1991). Attention deficit disorder with and without hyperactivity: Clinical response to three dose levels of methylphenidate. Pediatrics, 87, 519–531.
Barkley, R. A., Fischer, M., Newby, R., & Breen, M. (1988). Development of multimethod clinical protocol for assessing stimulant drug responses in ADHD children. Journal of Clinical Child Psychology, 20, 163–188.
Batchelor, E. S. (1996a). Introduction. In E. S. Batchelor, Jr. & R. S. Dean (Eds.), Pediatric neuropsychology: Interfacing assessment and treatment for rehabilitation (pp. 1–8). Boston: Allyn & Bacon.
Batchelor, E. S. (1996b). Neuropsychological assessment of children. In E. S. Batchelor, Jr. & R. S. Dean (Eds.), Pediatric neuropsychology: Interfacing assessment and treatment for rehabilitation (pp. 9–26). Boston: Allyn & Bacon.
Batchelor, E. S., Gray, J. W., Dean, R. S., & Lowery, R. (1988). Interactive effects of socioeconomic factors and perinatal complications. NASA Program Abstracts, F-169, 115–116.
Batchelor, E. S., Kixmiller, J. S., & Dean, R. S. (1990). Neuropsychological aspects of reading and spelling performance in children with learning disabilities. Developmental Neuropsychology, 6, 183–192.
Batchelor, E. S., Sowles, G., Dean, R. S., & Fischer, W. (1991). Construct validity of the Halstead Reitan Neuropsychological Battery for children with learning disabilities. Journal of Psychoeducational Assessment, 9, 16–31.
Bauer, R. M. (1994). The flexible battery approach to neuropsychological assessment. In R. D. Vanderploeg (Ed.), Clinician's guide to neuropsychological assessment (pp. 259–290). Hillsdale, NJ: Erlbaum.
Becker, M. G., Isaac, W., & Hynd, G. W. (1987). Neuropsychological development of nonverbal behaviors attributed to “frontal lobe” functioning. Developmental Neuropsychology, 3, 275–298.
Bellinger, D. (1995). Lead and neuropsychological function in children: Progress and problems in establishing brain–behavior relationships. In M. G. Tramontana & S. R. Hooper (Eds.), Advances in Child Neuropsychology
(Vol. 3, pp. 12–47). New York: Springer-Verlag.
Benton, A. L., Hamsher, K., Varney, N. R., & Spreen, O. (1983). Contributions to neuropsychological assessment. New York: Oxford University Press.
Bever, T. G. (1975). Cerebral asymmetries in humans are due to the differentiation of two incompatible processes: Holistic and analytic. In D. Aaronson & R. Reiber (Eds.), Developmental neurolinguistics and communication disorders. New York: New York Academy of Sciences.
Beverley, D. W., Smith, I. S., Beesley, P., Jones, J., & Rhodes, N. (1990). Relationship of cranial ultrasonography, visual and auditory evoked responses with neurodevelopmental outcome. Developmental Medicine and Child Neurology, 32, 210–222.
Bigler, E. R. (1990). Traumatic brain injury: Mechanisms of damage, assessment, intervention and outcome. Austin, TX: Pro-Ed.
Bigler, E. R. (1991). Neuropsychological assessment, neuroimaging, and clinical neuropsychology. Archives of Clinical Neuropsychology, 6, 113–132.
Bigler, E. R. (1996). Bridging the gap between psychology and neurology: Future trends in pediatric neuropsychology. In E. S. Batchelor, Jr. & R. S. Dean (Eds.), Pediatric Neuropsychology (pp. 27–54). Boston: Allyn & Bacon.
Black, F. W. (1976). Cognitive, academic, and behavioral findings in children with suspected and documented neurological dysfunction. Journal of Learning Disabilities, 9, 182–187.
Boll, T. J. (1974). Behavioral correlates of cerebral damage in children age 9–14. In R. M. Reitan & L. A. Davison (Eds.), Clinical neuropsychology: Current status and applications (pp. 91–120). Washington, DC: Winston.
Boll, T. J. (1978). Diagnosing brain impairment. In B. B. Wolman (Ed.), Clinical diagnosis of mental disorders. New York: Plenum.
Boll, T. J. (1981). The Halstead Reitan Neuropsychological Battery. In S. Filskov & T. J. Boll (Eds.), Handbook of clinical neuropsychology (pp. 577–607). New York: Wiley-Interscience.
Bolter, J. F., & Long, C. J. (1985). Methodological issues in research in developmental neuropsychology. In L. C. Hartlage & C. F. Telzrow (Eds.), Neuropsychology of individual differences: A developmental perspective (pp. 41–59). New York: Plenum.
Branch, W. B., Cohen, M. J., & Hynd, G. W. (1995). Academic achievement and attention-deficit/hyperactivity disorder in children with left- or right-hemisphere dysfunction. Journal of Learning Disabilities, 28, 35–43.
Breslau, N., Chilcoat, H., DelDotto, J., & Andreski, P. (1996). Low birth weight and neurocognitive states at six years of age. Biological Psychiatry, 40, 389–397.
Breslau, N., & Marshall, I. A. (1985). Psychological disturbance in children with physical disabilities: Continuity and change in a 5-year follow-up. Journal of Abnormal Child Psychology, 13, 199–216.
Burin, D. I., Prieto, G., & Delgado, A. (1995). Solution strategies and spatial visualization strategies: Design of a computerized test for their assessment. Interdisciplinaria, 12(2), 123–137.
Carr, M. A., Sweet, J. J., & Rossini, E. (1986). Diagnostic validity of the Luria Nebraska Neuropsychological Battery-Children's Revision. Journal of Consulting and Clinical Psychology, 54, 354–358.
Chelune, G. J., & Baer, R. A. (1986). Developmental norms for the Wisconsin Card Sorting Test. Journal of Clinical and Experimental Neuropsychology, 8, 219–228.
Chelune, G. J., & Thompson, L. L. (1987). Evaluation of the general sensitivity of the Wisconsin Card Sorting Test among younger and older children. Developmental Neuropsychology, 3, 81–89.
Christensen, A. L. (1975). Luria's neuropsychological investigation. New York: Spectrum.
Cicchetti, D. V. (1994). Multiple comparison methods: Establishing guidelines for their valid application in neuropsychological research. Journal of Clinical and Experimental Neuropsychology, 16, 155–161.
Clark, E., & Hostettler, C. (1995). Traumatic brain injury: Training manual for school personnel. Longmont, CO: Sopris West.
Cohen, M. J. (1997). The Children's Memory Scale. San Antonio, TX: Psychological Corporation.
Cohen, M. J., Branch, W. B., Willis, W. G., Weyandt, L. L., & Hynd, G. W. (1992). Childhood. In A. E. Puente & R. J. McCaffrey (Eds.), Handbook of neuropsychological assessment (pp. 49–79). New York: Plenum.
Cohen, S. E., Beckwith, L., Parmalee, A. H., & Sigman, M. (1996). Prediction of low and normal school achievement in early adolescents born preterm. Journal of Early Adolescence, 16, 46–70.
Conners, C. (1995). Continuous performance test computer program 3.0: User's manual. Toronto, London: Multi-Health Systems.
Copeland, D. R., Dowell, R. E., Jr., Fletcher, J. M., Bordeaux, J. D., Sullivan, M. P., Jaffe, N., Frankel, L. S., Ried, H. L., & Cangir, A. (1988). Neuropsychological effects of childhood cancer treatment. Journal of Child Neurology, 3, 53–62.
Crockett, D., Klonoff, H., & Bjerring, J. (1969). Factor analysis of neuropsychological tests. Perceptual and Motor Skills, 29, 791–802.
Crum, T. A., Bradley, J. D., Teichner, G., & Golden, C. J. (1997, November). Analysis of the general intelligence subtest of the Luria Nebraska neuropsychological battery III. Paper presented at the 17th Annual Conference of the National Academy of Neuropsychologists, Las Vegas, NV.
Crum, T. A., Golden, C. J., Bradley, J. D., & Teichner, G. (1997, November). Analyzing the concurrent validity of the memory scales of the Luria Nebraska neuropsychological battery-Third edition. Paper presented at the 17th Annual Conference of the National Academy of Neuropsychologists, Las Vegas, NV.
Damasio, A. R., & Maurer, R. G. (1978). A neurological model for childhood autism. Archives of Neurology, 37, 504–510.
D'Amato, R. C. (1990). A neuropsychological approach to school psychology. School Psychology Quarterly, 5, 141–160.
D'Amato, R. C., Gray, J. W., & Dean, R. S. (1988). A comparison between intelligence and neuropsychological functioning. Journal of School Psychology, 26, 283–292.
D'Amato, R. C., Hammons, P. F., Terminie, T. J., & Dean, R. S. (1992). Neuropsychological training in APA-accredited and nonaccredited school psychology programs. Journal of School Psychology, 30, 175–183.
D'Amato, R. C., & Rothlisberg, B. A. (1996). How education should respond to students with traumatic brain injuries. Journal of Learning Disabilities, 29, 670–683.
D'Amato, R. C., Rothlisberg, B. A., & Leu, P. W. (in press). Neuropsychological assessment for intervention. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (3rd ed.). New York: Wiley.
D'Amato, R. C., Rothlisberg, B. A., & Rhodes, R. L. (1997). Utilizing neuropsychological paradigms for understanding common educational and psychological tests. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (2nd ed., pp. 270–295). New York: Plenum.
Das, J. P., Kirby, J. R., & Jarman, R. F. (1979). Simultaneous and successive cognitive processes. New York: Academic Press.
Dean, R. S. (1984). Functional lateralization of the brain. Journal of Special Education, 8, 239–256.
Dean, R. S. (1985). Foundation and rationale for neuropsychological bases of individual differences. In
References 295

L. D. Hartlage & C. F. Telzrow (Eds.), The neuropsy- of clinical child neuropsychology (2nd ed., pp. 204±215).
chology of individual differences: A developmental per- New York: Plenum.
spective (pp. 203±244). New York: Plenum. First, M. B. (1994). Computer-assisted assessment of DSM
Dean, R. S. (1986). Lateralization of cerebral functions. In III-R diagnoses. Psychiatric Annals, 24, 25±29.
D. Wedding, A. M. Horton, & J. S. Webster (Eds.), The Fletcher, J. M. (1988). Brain-injured children. In E. J.
neuropsychology handbook: Behavioral and clinical per- Mash & L. G. Terdal (Eds.), Behavioral assessment of
spectives (pp. 80±102). Berlin, Germany: Springer- childhood disorders (Vol. 2, pp. 451±589). New York:
Verlag. Guilford.
Dean, R. S., & Gray, J. W. (1990). Traditional approaches Fletcher, J. M., Shaywitz, B. A., & Shaywitz, S. E (1994).
to neuropsychological assessment. In C. R. Reynolds, & Attention as a process and as a disorder. In G. R. Lyon
R. W. Kamphaus (Eds.). Handbook of psychological and (Ed.). Frames of reference for the assessment of learning
educational assessment of children (pp. 317±388). New disabilities: New views on measurement issues
York: Guilford. (pp. 103±116). Baltimore: Brookes.
Dean, R. S., & Woodcock, R. W. (in press). Dean Fletcher, J. M., & Taylor, H. G. (1984). Neuropsycholo-
Woodcock neuropsychological assessment system profes- gical approaches to children: Toward a developmental
sional manual. Manuscript in preparation, Ball State neuropsychology. Journal of Clinical Neuropsychology, 6,
University. 139±156.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. Foxcroft, C. D. (1989). Factor analysis of the Reitan-
(1994). CVLT-C Children's California Verbal Learning Indiana Neuropsychological Test Battery. Perceptual
Test: Manual. San Antonio, TX: Psychological Corpora- and Motor Skills, 69, 1303±1313.
tion. Gaddes, W. H. (1980). Learning disabilities and brain
Denckla, M. B. (1994). Measurement of executive function. function. Berlin, Germany: Springer-Verlag.
In G. R. Lyon, Frames of reference of the assessment of Gaddes, W. H. (1983). Applied educational neuropsychol-
learning disabilities: New views on measurement issues ogy: Theories and problems. Journal of Learning
(pp. 117±142). Baltimore: Brookes. Disabilities, 16, 511±514.
Denckla, M. B., LeMay, M., & Chapman, C. A. (1985). Gaddes, W. H., & Edgell, D. (1994). Learning disabilities
Few CT scan abnormalities found even in neurologically and brain function: A neurodevelopmental approach. New
impaired learning disabled children. Journal of Learning York: Springer-Verlag.
Disabilities, 18, 132±135. Gatten, S. L., Arceneaux, J. M., Dean, R. S., & Anderson,
Derryberry, D., & Reed, M. A. (1996). Regulatory J. L. (1994). Perinatal risk factors as predictors of
processes and the development of cognitive representa- developmental functioning. International Journal of
tions. Development and Psychopathology, 8, 215±234. Neuroscience, 75, 167±174.
Dietzen, S. R. (1986). Hemispheric specialization for verbal Geary, D. C. (1993). Mathematical disabilities: Cognitive,
sequential and nonverbal simultaneous information proces- neuropsychological, and genetic components. Psycholo-
sing styles of low-income 3 to 5 year olds. Unpublished gical Bulletin, 114, 345±362.
doctoral dissertation, Washington State University. Geary, D. C., & Gilger, J. W. (1984). The Luria-Nebraska
Donders, J. (1992). Validity of the Kaufman Assessment Neuropsychological Battery-Children's Revision: Com-
Battery for Children when employed with children with parison of learning disabled and normal children
traumatic brain injury. Journal of Clinical Psychology, matched on full scale IQ. Perceptual and Motor Skills,
48, 225±229. 58, 115±118.
Duffy, F. H., Denckla, M. B., McAnulty, G. B., & Holmes, Golden, C. J. (1981). The Luria-Nebraska children's
J. A. (1988). Neurophysiological studies in dyslexia. In battery: Theory and formulation. In G. W. Hynd & J.
F. Plum (Ed.), Language, communication, and the brain E. Obrzut (Eds.), Neuropsychological assessment and the
(pp. 105±122). New York: Raven. school-age child: Issues and procedures (pp. 277±302).
Duffy, F. H., & McAnulty, G. (1990). Neurophysiological New York: Grune & Stratton.
heterogeneity and the definition of dyslexia: Preliminary Golden, C. J. (1984). Luria-Nebraska neuropsychological
evidence for plasticity. Neuropsychologia, 28, 555±571. battery: Children's revision. Los Angeles: Western
Edmonds, J. E., Cohen, M. J., Riccio, C. A., Bacon, K. L., Psychological Services.
& Hynd, G. W. (1993, October). The development of Golden, C. J. (1997). The Nebraska neuropsychological
clock face drawing in normal children. Paper presented at children's battery. In C. R. Reynolds & E. Fletcher-
the annual meeting of the National Academy of Janzen (Eds.), Handbook of clinical child neuropsychology
Neuropsychology, Phoenix, AZ. (2nd ed., pp. 237±251). New York: Plenum.
Efron, R. (1990). The decline and fall of hemipsheric Goldman, P. S., & Lewis, M. E. (1978). Developmental
specialization. Hillsdale, NJ: Erlbaum. biology of brain damage and experience. In C. W.
Evans, L. P., Tannehill, R., & Martin, S. (1995). Children's Cotman (Ed.), Neuronal plasticity. New York: Raven.
reading skills: A comparison of traditional and compu- Goldman-Rakic, P. S. (1987). Development of cortical
terized assessment. Behavior Research Methods, Instru- circuitry and cognitive function. Child Development, 58,
ments, and Computers, 27(2), 162±165. 601.
Fan, X., Willson, V. L., & Reynolds, C. R. (1995). Goldstein, G. (1997). The clinical utility of standardized or
Assessing the similarity of the construct structure of flexible battery approaches to neuropsychological assess-
the KABC for black and white children from 7 to 12‰ ment. In G. Goldstein & T. M. Incagnoli (Eds.),
years in age. Journal of Psychoeducational Assessment, Contemporary approaches to neuropsychological assess-
13, 120±131 ment (pp. 67±92). New York: Plenum.
Feagans, L. V., Short, E. J., & Meltzer, L. J. (1991). Gonzalez, V., Brusca-Vega, R., & Yawkey, T. (1997).
Subtypes of learning disabilities: Theoretical perspectives Assessment and instruction of culturally and linguistically
and research. Hillsdale, NJ: Erlbaum. diverse students with or at-risk of learning problems.
Fennell, E. B. (1994). Issues in child neuropsychological Boston: Allyn & Bacon.
assessment. In R. Venderploeg (Ed.), Clinician's guide to Gordon, M. (1983). The Gordon Diagnostic System.
neuropsychological assessment (pp. 165±184). Hillsdale, DeWitt, NY: Gordon System.
NJ: Erlbaum. Gray, J. A. (1982). The neuropsychology of anxiety: An
Fennell, E. B., & Bauer, R. M. (1997). Models of inference enquiry into the functions of the septo-hippocampal
in evaluating brain±behavior relationships in children. In system. Oxford, UK: Oxford University Press.
C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook Gray, J. W., & Dean, R. S. (1990). Implications of
296 Neuropsychological Assessment of Children

neuropsychological research for school psychology. In T. Hartlage, P. L., & Givens, T. S. (1982). Common
B. Gutkin & C. R. Reynolds (Eds.), The handbook of neurological problems of school age children. In C. R.
school psychology (pp. 269±288). New York: Wiley. Reynolds & T. B. Gutkin (Eds.), The handbook of school
Gray, J. W., Dean, R. S., & Rattan, G. (1987). Assessment psychology (pp. 1009±1222). New York: Wiley.
of perinatal risk factors. Psychology in the Schools, 24, Hartlage, P. L., & Hartlage, L. C. (1978). Clinical
15±21. consultation to pediatric neurology and developmental
Gross-Tsur, V., Salev, R. S., Manor, O., & Amir, N. pediatrics. Journal of Clinical Child Psychology, 12,
(1995). Developmental right hemisphere syndrome: 52±53.
Clinical spectrum of the nonverbal learning disability. Haut, J. S., Haut, M. W., Callahan, T. S., & Franzen, M.
Journal of Learning Disabilities, 28, 80±86. D. (1992, November). Factor analysis of the Wide Range
Gulbrandson, G. B. (1984). Neuropsychological sequelae Assessment of Memory and Learning (WRAML) scores
of light head injuries in older children 6 months after in a clinical sample. Paper presented at the 12th Annual
trauma. Journal of Clinical Neuropsychology, 6, 257±268. Meeting of the National Academy of Neuropsychology,
Gutkin, T. J., & Reynolds, C. R. (1980, September). Pittsburgh, PA.
Normative data for interpreting Reitan's index of Wechs- Heaton, R. K. (1981). A manual for the Wisconsin Card
ler subtest scatter. Paper presented at the annual meeting Sorting Test. Odessa, FL: Psychological Assessment
of the American Psychological Association, Montreal, Resources.
Canada. Heilman, K. M., Watson, R. T., & Valenstein, E. (1985).
Haak, R. (1989). Establishing neuropsychology in a school Neglect and related disorders. In K. M. Heilman & E.
setting: Organization, problems, and benefits. In C. R. Valenstein (Eds.), Clinical neuropsychology (2nd ed.,
Reynolds & E. Fletcher-Janzen (Eds.), Handbook of pp. 243±293). New York: Oxford University Press.
clinical child neuropsychology (pp. 489±502). New York: Hendren, R. L., Hodde-Vargas, J., Yeo, R. A., & Vargas,
Plenum. L. A. (1995). Neuropsychophysiological study of chil-
Hale, R. L., & Foltz, S. G. (1982). Prediction of academic dren at risk for schizophrenia: A preliminary report.
achievement in handicapped adolescents using a mod- Journal of the American Academy of Child and Adolescent
ified form of the Luria Nebraska Pathognomonic Scale Psychiatry, 34, 1284±1291.
and WISC-R Full Scale IQ. Clinical Neuropsychology, 4, Hiscock, M., & Kinsbourne, M. (1987). Specialization of
99±102. the cerebral hemispheres: Implications for learning.
Hall, C. W., & Kataria, S. (1992). Effects of two treatment Journal of Learning Disabilities, 20, 130.
techniques on delay and vigilance tasks with attention Hooper, S. R., Boyd, T. A., Hynd, G. W., & Rubin, J.
deficit hyperactive disorder (ADHD) children. Journal of (1993). Definitional issues and neurobiological founda-
Psychology, 126, 17±25. tions of selected severe neurodevelopmental disorders.
Halperin, J. M. (1991). The clinical assessment of attention. Archives of Clinical Neuropsychology, 8, 297±307.
International Journal of Neuroscience, 58, 171±182. Hooper, S. R., & Hynd, G. W. (1985). Differential
Halperin, J. M., McKay, K. E., Matier, K., & Sharma, V. diagnosis of subtypes of developmental dyslexia with
(1994). Attention, response inhibition, and activity level the Kaufman Assessment Battery for Children (K-ABC).
in children: Developmental neuropsychological perspec- Journal of Clinical Child Psychology, 14, 145±152.
tives. In M. G. Tramontana & S. R. Hooper (Eds.), Hooper, S. R., & Tramontana, M. G. (1997). Advances in
Advances in child neuropsychology (Vol. 2., pp. 1±54). neuropsychological bases of child and adolescent psy-
New York: Springer-Verlag. chopathology: Proposed models, findings, and on-going
Harbord, M. G., Finn, J. P., Hall-Craggs, M. A., Robb, S. issues. Advances in Clinical Child Psychology, 19,
A., Kendall, B. E., & Boyd, S. G. (1990). Myelination 133±175.
patterns on magnetic resonance of children with devel- Horn, J. L. (1988). Thinking about human abilities. In J. R.
opmental delay. Developmental Medicine and Child Nesselroade & R. B. Cattell (Eds.), Handbook of
Neurology, 32, 295±303. multivariate psychology (2nd ed., pp. 645±685). New
Harrington, D. E. (1990). Educational strategies. In M. York: Academic.
Rosenthal, E. R. Griffith, M. R. Bond, & J. D. Miller Horn, J. L. (1991). Measurement of intellectual capabil-
(Eds.), Rehabilitation of the adult and child with traumatic ities: A review of theory. In K. S. McGrew, J. K. Werder,
brain injury (2nd ed., pp. 476±492). Philadelphia: Davis. & R. W. Woodcock (Eds.), WJ-R technical manual.
Hartlage, L. C. (1975). Neuropsychological approaches to Chicago: Riverside.
predicting outcome of remedial education strategies for Howieson, D. B., & Lezak, M. D. (1992). The neuropsy-
learning disabled children. Pediatric Psychology, 3, chological evaluation. In S. C. Yudofsky & R. E. Hales,
23±28. (Eds.), The American Psychiatric Press textbook of
Hartlage, L. C. (1982). Neuropsychological assessment neuropsychiatry (2nd ed., pp. 127±150). Washington,
techniques. In C. R. Reynolds & T. B. Gutkin (Eds.), DC: American Psychiatric Press.
The handbook of school psychology (pp. 296±313). New Hurd, A. (1996). A developmental cognitive neuropsycho-
York: Wiley. logical approach to the assessment of information
Hartlage, L. C., & Long, C. J. (1997). Development of processing in autism. Child Language Teaching and
neuropsychology as a professional specialty: History, Therapy, 12, 288±299.
training, and credentialing. In C. R. Reynolds & E. Hynd, G. W. (1981). Neuropsychology in schools. School
Fletcher-Janzen (Eds.), Handbook of clinical child Psychology Review, 10, 480±486.
neuropsychology (2nd ed., pp. 3±16). New York: Plenum. Hynd, G. W. (1992). Neuropsychological assessment in
Hartlage, L. C. & Reynolds, C. R. (1981). Neuropsycho- clinical child psychology. Newbury Park, CA: Sage.
logical assessment and the individualization of instruc- Hynd, G. W., & Cohen, M. J. (1983). Dyslexia: Neuro-
tion. In G. W. Hynd & J. E. Obrzut (Eds.), psychological theory, research, and clinical differentiation.
Neuropsychological assessment of the school-aged child New York: Grune & Stratton.
(pp. 355±378). Boston: Allyn & Bacon. Hynd, G. W., Marshall, R. M., & Semrud-Clikeman, M.
Hartlage, L. C., & Telzrow, C. F. (1983). The neuropsy- (1991). Developmental dyslexia, neurolinguistic theory
chological basis of educational intervention. Journal of and deviations in brain morphology. Reading and
Learning Disabilities, 16, 521±528. Writing: An Interdisciplinary Journal, 3, 345±362.
Hartlage, L. C., & Telzrow, C. F. (1986). Neuropsychologi- Hynd, G. W., & Willis, W. G. (1988). Pediatric neuropsy-
cal assessment and intervention with children and adoles- chology. New York: Grune & Stratton.
cents. Sarasota, FL: Professional Resource Exchange. Iivaneihan, M., Launes, J., Pihko, H., Nikkinen, P., &
References 297

Lindroth, L. (1990). Single photon emission computed Korkman, M., & Hakkinen-Rihu, P. (1994). A new
tomography of brain perfusion: Analysis of 60 pediatric classification of developmental language disorders. Brain
cases. Developmental Medicine and Child Neurology, 32, & Language, 47(1), 96±116.
63±68. Korkman, M., Kirk, U., & Kemp, S. (1997). The
Jarvis, P. E., & Barth, J. T. (1984). Halstead±Reitan Test neuropsychological investigation for children. San Anto-
Battery: An interpretive guide. Odessa, FL: PAR. nio, TX: Psychological Corporation.
Jernigan, T. L., & Tallal, P. (1990). Late childhood changes Korkman, M., Liikanen, A., & Fellman, V. (1996).
in brain morphology observable with MRI. Develop- Neuropsychological consequences of very low birth
mental Medicine and Child Neurology, 32, 379±385. weight and asphyxia at term: Follow-up until school
Kail, R. (1984). The development of memory in children. San age. Journal of Clinical Neuropsychology, 18, 220±233.
Francisco: Freeman. Leark, R. A., Snyder, T., Grove, T., & Golden, C. J. (1983,
Kamphaus, R. W. (1993). Clinical assessment of children's August). Comparison of the KABC and standardized
intelligence. Boston: Allyn & Bacon. neuropsychological batteries: Preliminary results. Paper
Kamphaus, R. W., & Reynolds, C. R. (1987). Clinical and presented at the annual meeting of the American
research applications of the K-ABC. Circle Pines, MN: Psychological Association, Anaheim, CA.
American Guidance Service. Leu, P. W., & D'Amato, R. C. (1994, April). Right children,
Kane, R. L., & Kay, G. G. (1997). Computer applications wrong teachers? Using an ecological assessment for
in neuropsychological assessment. In G. Goldstein & T. placement decisions. Paper presented at the 26th Annual
M. Incagnoli (Eds.), Contemporary approaches to neu- Convention of the National Association of School
ropsychological assessment (pp. 359±392). New York: Psychologists, Seattle, WA.
Plenum. Levin, H. S., Culhane, K. A., Hartmann, J., Evankovich,
Kaplan, E. (1988). A process approach to neuropsycholo- K., Mattson, A. J., Harward, H., Ringholz, G., Ewing-
gical assessment. In T. Boll & B. K. Bryant (Eds.). Cobbs, L., & Fletcher, J. M. (1991). Developmental
Clinical neuropsychology and brain function (pp. 125±167). changes in performance on tests of purported frontal
Washington, DC: American Psychological Association. lobe functioning. Developmental Neuropsychology, 7,
Karras, D., Newton, D. B., Franzen, M. D., & Golden, C. 377±395.
J. (1987). Development of factor scales for Luria- Levine, M. D. (1993). Developmental variation and learning
Nebraska Neuropsychological Battery: Children's revi- disorders. Cambridge, MA: Education Publishers Service.
sion. Journal of Clinical Child Psychology, 16, 19±28. Lezak, M. D. (1995). Neuropsychological assessment (4th
Kaufman, A. S. (1976a). A new approach to the ed.). New York: Oxford University Press.
interpretation of test scatter on the WISC-R. Journal Little, S. G., & Stavrou, E. (1993). The utility of
of Learning Disabilities, 9, 160±167. neuropsychological approaches with children. The Be-
Kaufman, A. S. (1976b). Verbal-performance IQ discre- havior Therapist, 16, 104±106.
pancies on the WISC-R. Journal of Learning Disabilities, Livingston, R. B., Pritchard, D. A., Moses, J. A., Haak, R.
9, 739±744. A., Marshall, R., & Gray, R. (1997). Modal profiles for
Kaufman, A. S. (1979). Cerebral specialization and the Halstead±Reitan neuropsychological battery for
intelligence testing. Journal of Research and Development children. Archives of Clinical Neuropsychology, 12,
in Education, 12, 96±107. 450±476.
Kaufman, A. S., & Kaufman, N. L. (1983a). Kaufman Lovrich, D., Cheng, J. C., & Velting, D. M. (1996). Late
Assessment Battery for Children (K-ABC) administration cognitive brain potentials, phonological and semantic
and scoring manual. Circle Pines, MN: American classification of spoken words, and reading ability in
Guidance Services. children. Journal of Clinical Neuropsychology, 18,
Kaufman, A. S., & Kaufman, N. L. (1983b). Kaufman 161±177.
Assessment Battery for Children (K-ABC) interpretative Luria, A. R. (1966). Higher cortical functions in man. New
manual. Circle Pines, MN: American Guidance Services. York: Basic Books.
Kinsbourne, M. (1975). Cerebral dominance, learning, and Luria, A. R. (1970). Functional organization of the brain.
cognition. In H. R. Myklebust (Ed.), Progress in learning Scientific American, 222, 66±78.
disabilities. New York: Grune & Stratton. Luria, A. R., (1973). The working brain. New York: Basic
Kinsbourne, M. (1989). A model of adaptive behavior Books.
related to cerebral participation in emotional control. In Luria, A. R. (1980). Higher cortical functions in man (2nd
G. Gainotti & C. Caltagirone (Eds.), Emotions and the ed.). New York: Basic Books.
dual brain (pp. 248±260). New York: Springer-Verlag Majovski, L. V. (1984). The K-ABC: Theory and applica-
Klesges, R. C. (1983). The relationship between neuropsy- tions for child neuropsychological assessment and
chological, cognitive, and behavioral assessments of research. Journal of Special Education, 18, 266±268.
brain functioning in children. Clinical Neuropsychology, Matazow, G., & Hynd, G. W. (1992, February). Analysis of
5, 28±32. the anterior±posterior gradient hypothesis as applied to
Klonoff, H., & Low, M. (1974). Disordered brain function attention deficit disordered children. Paper presented at
in young children and early adolescents: Neuropsycho- the annual meeting of the International Neuropsycholo-
logical and electroencephalographic correlates. In R. M. gical Society, San Diego, CA.
Reitan & L. A. Davison (Eds.), Clinical neuropsychology: Mattarazzo, J. D. (1972). Wechsler's measurement and
Current status and application (pp. 76±94). Washington, appraisal of adult intelligence. Baltimore: Williams &
DC: Winston. Wilkins.
Knights, R. M., & Norwood, J. W. (1979). A neuropsycho- Maurer, R. G., & Damasio, A. R. (1982). Childhood
logical test battery for children: Examiner's manual. autism from the point of view of behavioral neurology.
Ottawa, Canada: Knights Psychological Consultants. Journal of Autism and Developmental Disorders, 12,
Koriath, U., Gualtieri, C. T., van Bourgondien, M. E., 195±205.
Quade, D., & Werry, J. S. (1985). Construct validity of Mayfield, J. W., & Reynolds, C. R. (1997). Black±white
clinical diagnosis in pediatric psychiatry: Relationship differences in memory test performance among children
among measures. Journal of the American Academy of and adolescents. Archives of Clinical Neuropsychology,
Child Psychiatry, 24, 429±436. 12, 111±122.
Korkman, M. (1988). NEPSY: An adaptation of Luria's McBurnett, K., Hynd, G. W., Lahey, B. B., & Town, P. A.
investigation for young children. Clinical Neuropsychol- (1988). Do neuropsychological measures contribute to
ogist, 2, 375±392. the prediction of academic achievement? The predictive
298 Neuropsychological Assessment of Children

validity of the LNNB-CR pathognomonic scale. Journal Hynd & J. E. Obrzut (Eds.), Child neuropsychology (Vol.
of Psychoeducational Assessment, 6, 162±167. 1, pp. 1±12). New York: Academic Press.
McGlone, J., & Davidson, W. (1973). The relation between Oehler-Stinnett, J., Stinnett, T. A., Wesley, A. L., &
spatial ability with special reference to sex and hand Anderson, H. N. (1988). The Luria Nebraska Neuro-
preference. Neuropsychologia, 11, 105±113. psychological Battery-Children's Revision: Discrimina-
Merola, J. L., & Leiderman, J. (1985). The effect of task tion between learning disabled and slow learner children.
difficulty upon the extent to which performance benefits Journal of Psychoeducational Assessment, 6, 24±34.
from between hemisphere division of inputs. Interna- Parsons, O. A., & Prigatano, G. P. (1978). Methodological
tional Journal of Neuroscience, 51, 35±44. considerations in clinical neuropsychological research.
Mesulam, M. M. (1985). Principles of behavioral neurology. Journal of Consulting and Clinical Psychology, 46,
Philadelphia: F. A. Davis. 608±619.
Milberg, W. B., Hebben, N., & Kaplan, E. (1986). The Passler, M., Isaac, W., & Hynd, G. W. (1985). Neuropsy-
Boston process approach to neuropsychological assess- chological development of behavior attributed to frontal
ment. In I. Grant & K. M. Adams (Eds.), Neuropsycho- lobe functioning in children. Developmental Neuropsy-
logical assessment and neuropsychiatric disorders (2nd chology, 1, 349±370.
ed., pp. 58±80). New York: Oxford University Press. Pedhazur, E. J. (1973). Multiple regression in behavioral
Miller, L. T., & Vernon, P. A. (1996). Intelligence, reaction research: Explanation and prediction. New York: CBS
time, and working memory in 4- to 6-year-old children. College Publishing.
Intelligence, 22, 155±190. Pfeiffer, S. I., Naglieri, J. A., & Tingstrom, D. H. (1987).
Mitchell, W. G., Chavez, J. M., Baker, S. A., Guzman, B. Comparison of the Luria Nebraska Neuropsychological
L., & Azen, S. P. (1990). Reaction time, impulsivity, and Battery-Children's Revision and the WISC-R with
attention in hyperactive children and controls: A video learning disabled children. Perceptual and Motor Skills,
game technique. Journal of Child Neurology, 5, 195±204. 65, 911±916.
Moffitt, T. E. (1993). The neuropsychology of conduct Phelps, L. (1995). Exploratory factor analysis of the
disorder. Developmental Psychopathology, 5, 135±151. WRAML with academically at-risk students. Journal of
Molfese, D. L. (1995). Electrophysiological responses ob- Psychoeducational Assessment, 13, 384±390.
tained during infancy and their relation to later language Phelps, L. (1996). Discriminative validity of the WRAML
development: Further findings. In M. G. Tramontana & with ADHD and LD children. Psychology in the Schools,
S. R. Hooper, (Eds.), Advances in Child Neuropsychology 33, 5±12.
(Vol. 3, pp. 1±11). New York: Springer-Verlag. Plaisted, J. R., Gustavson, J. C., Wilkening G. N., &
Morgan, S. B., & Brown, T. L. (1988). Luria-Nebraska Golden, C. J. (1983). The Luria Nebraska Neuropsy-
Neuropsychological Battery-Children's Revision: Con- chological Battery-Children's Revision: Theory and
current validity with three learning disability subtypes. current research findings. Journal of Clinical Child
Journal of Consulting and Clinical Psychology, 56, Psychology, 12, 13±21.
463±466. Powell, H. (1997). Comment on computerized assessment
Morris, J. M., & Bigler, E. (1985, January). An investigation of arithmetic computation skills with MicroCog. Journal
of the Kaufman Assessment Battery for Children (KABC) of the International Neuropsychological Society, 3, 200.
with neurologically impaired children. Paper presented at Powell, D. H., Kamplan, E. F., Thitla, D., Weintraub, S.,
the annual meeting of the International Neuropsycholo- Catlin, R., & Funkenstein, H. H. (1993). MicroCog
gical Society, San Diego, CA. assessment of cognitive functioning manual. San Antonio,
Morris, R. (1994). Multidimensional neuropsychological TX: Psychological Corporation.
assessment models. In G. R. Lyon (Ed.), Frames of Ramsey, M. C., & Reynolds, C. R. (1995). Separate digits
reference for the assessment of learning disabilities: New tests: A brief history, a literature review, and re-
views on measurement (pp. 515±522). Baltimore: Brookes examination of the factor structure of the Test of
Novak, G. P., Solanto, M., & Abikoff, H. (1995). Spatial Memory and Learning (TOMAL). Neuropsychology
orienting and focused attention in attention deficit Review, 5, 151±171.
hyperactivity disorder. Psychophysiology, 32, 546±559. Reitan, R. M. (1969). Manual for the administration of
Nussbaum, N. L., & Bigler, E. D. (1986). Neuropsycho- neuropsychological test batteries for adults and children.
logical and behavioral profiles of empirically derived Indianapolis, IN: Author.
subgroups of learning disabled children. International Reitan, R. M. (1974). Clinical neuropsychology: Current
Journal of Clinical Neuropsychology, 8, 82±89. status and applications. New York: Winston.
Nussbaum, N. L., & Bigler, E. D. (1990). Identification and Reitan, R. M. (1986). Theoretical and methodological bases
treatment of attention deficit disorder. Austin, TX: Pro-Ed. of the Halstead±Reitan Neuropsychological Test Battery.
Nussbaum, N. L., & Bigler, E. D. (1997). Halstead±Reitan Tucson, AZ: Neuropsychological Press.
neuropsychological test batteries for children. In C. R. Reitan, R. M. (1987). Neuropsychological evaluation of
Reynolds & E. Fletcher-Janzen (Eds.), Handbook of children. Tucson, AZ: Neuropsychological Press.
clinical child neuropsychology (2nd ed., pp. 219±236). Reitan, R. M., & Davison, L. A. (1974). Clinical
New York: Plenum. neuropsychology: Current status and applications. Wa-
Nussbaum, N. L., Bigler, E. D., Koch, W. R., Ingram, J. shington, DC: Winston.
W., Rosa, L., & Massman, P. (1988). Personality/ Reitan, R. M., & Wolfson, D. (1985). The Halstead Reitan
behavioral characteristics in children: Differential effects neuropsychological battery: Theory and clinical interpre-
of putative anterior versus posterior cerebral asymmetry. tation. Tucson, AZ: Neuropsychological Press.
Archives of Clinical Neuropsychology, 3, 127±135. Reitan, R. M., & Wolfson, D. (1988). The Halstead Reitan
Obrzut, J. E. (1981). Neuropsychological procedures with Neuropsychological Test Battery and REHABIT: A
school-age children. In G. W. Hynd & J. E. Obrzut model for integrating evaluation and remediation of
(Eds.), Neuropsychological assessment and the school-age cognitive impairment. Cognitive Rehabilitation, May-
child: Issues and procedures (pp. 237±275). New York: June, 10±17.
Grune & Stratton. Reschly, D., & Gresham, F. M. (1989). Current neuro-
Obrzut, J. E., & Hynd, G. W. (1983). The neurobiological psychological diagnosis of learning problems: A leap of
and neuropsychological foundations of learning disabil- faith. In C. R. Reynolds & E. Fletcher-Janzen (Eds.),
ities. Journal of Learning Disabilities, 16, 515±520. Handbook of clinical child neuropsychology (pp. 503±520).
Obrzut, J. E., & Hynd, G. W. (1986). Child neuropsychol- New York: Plenum.
ogy: An introduction to theory and research. In G. W. Reynolds, C. R. (1979). Interpreting the index of abnorm-
References 299

ality when the distribution of score differences is known: learning disabilities. Learning Disability Quarterly, 17,
Comment on Piotrowski. Journal of Consulting and 311±322.
Clinical Psychology, 47, 401±402. Riccio, C. A., Hall, J., Morgan, A., Hynd, G. W.,
Reynolds, C. R. (1981a). The neuropsychological basis of Gonzalez, J. J., & Marshall, R. M. (1994). Executive
intelligence. In G. W. Hynd & J. E. Obrzut (Eds.), function and the Wisconsin card sorting test: Relation-
Neuropsychological assessment and the school-aged child: ship with behavioral ratings and cognitive ability.
Issues and procedures (pp. 87±124). New York: Grune & Developmental Neuropsychology, 10, 215±229.
Stratton. Riccio, C. A., & Hynd, G. W. (1995). Contributions of
Reynolds, C. R. (1981b). Neuropsychological assessment neuropsychology to our understanding of developmental
and the habilitation of learning: Considerations in the reading problems. School Psychology Review, 24,
search for the aptitude 6 treatment interaction. School 415±425.
Psychology Review, 10, 343±349. Riccio, C. A., & Hynd, G. W. (1996). Neuroanatomical
Reynolds, C. R. (1982). The importance of norms and and neurophysiological aspects of dyslexia. Topics in
other traditional psychometric concepts to assessment Language Disorders, 16(2), 1±13.
in clinical neuropsychology. In R. N. Malathesha & Riccio, C. A., & Hynd, G. W., & Cohen, M. J. (1993).
L. C. Hartlage (Eds.), Neuropsychology and cognition Neuropsychology in the schools: Does it belong? School
(Vol. 3, pp. 55±76). The Hague, The Netherlands: Psychology International, 14, 291±315.
Nijhoff. Riccio, C. A., Hynd, G. W., & Cohen, M. J. (1996).
Reynolds, C. R. (1986a). Transactional models of intellec- Etiology and neurobiology of Attention-Deficit Hyper-
tual development, yes. Deficit models of process activity Disorder. In W. Bender (Ed.), Understanding
remediation, no. School Psychology Review, 15, 256±260. ADHD: A practical guide for teachers and parents
Reynolds, C. R. (1986b). Clinical acumen but psychometric (pp. 23±44). New York: Merrill.
naivete in neuropsychological assessment of educational Ris, M. D., & Noll, R. B. (1994). Long-term neurobeha-
disorders. Archives of Clinical Neuropsychology, 1(2), vioral outcome in pediatric brain tumor patients: Review
121±137. and methodological critique. Journal of Clinical and
Reynolds, C. R. (1992). Two key concepts in the diagnosis Experimental Neuropsychology, 16, 21.
of learning disabilities and the habilitation of learning. Rothlisberg, B. A., & D'Amato, R. C. (1988). Increased
Learning Disability Quarterly, 15(1), 2±12. neuropsychological understanding seen as important for
Reynolds, C. R. (1997a). Forward and backward memory school psychologists. Communique, 17(2), 4±5.
span should not be combined for clinical analysis. Rourke, B. P. (1984). Subtype analysis of learning
Archives of Clinical Neuropsychology, 12, 29±40. disabilities. New York: Guilford.
Reynolds, C. R. (1997b). Measurement and statistical Rourke, B. P. (1989). Nonverbal learning disabilities: The
problems in neuropsychological assessment of children. syndrome and the model. New York: Guilford.
In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Hand- Rourke, B. P. (1991). Neuropsychological validation of
book of clinical child neuropsychology (2nd ed., learning disability subtypes. New York: Guilford.
pp. 180±203). New York: Plenum. Rourke, B. P. (1994). Neuropsychological assessment of
Reynolds, C. R. (1997c). Postscripts on premorbid ability children with learning disabilities: Measurement issues.
estimation: Conceptual addenda and a few words on In G. R Lyon (Ed.), Frames of reference for the
alternative and conditional approaches. Archives of assessment of learning disabilities: New views on measure-
Clinical Neurpsychology, 12, 769±778. ment issues (pp. 475±514). Baltimore: Brookes.
Reynolds, C. R., & Bigler, E. D. (1994). Manual for the Rourke, B. P., Bakker, D. J., Fisk, J. L., & Strang, J. D.
Test of Memory and Learning. Austin, TX: PRO-ED. (1983). Child neuropsychology: An introduction to theory,
Reynolds, C. R., & Bigler, E. D. (1996). Factor structure, research, and practice. New York: Guilford.
factor indexes, and other useful statistics for inter- Rourke, B. P., Fisk, J. L., & Strang, J. D. (1986).
pretation of the Test of Memory and Learning (TO- Neuropsychological assessment of children: A treatment
MAL). Archives of Clinical Neuropsychology, 11, 29±43. oriented approach. New York: Guilford.
Reynolds, C. R., & Bigler, E. D. (1997a). Clinical Rutter, M. (1981). Psychological sequelae of brain damage
neuropsychological assessment of child and adolescent in children. American Journal of Clinical Neuropsychol-
memory with the Test of Memory and Learning. In C. R. ogy, 138, 1533±1544.
Reynolds & E. Fletcher-Janzen (Eds.), Handbook of Rutter, M. (1983). Developmental neuropsychiatry. New
clinical child neuropsychology (2nd ed., pp. 296±319). York: Guilford.
New York: Plenum. Rutter, M., Graham, P., & Yule, W. (1970). A neuropsy-
Reynolds, C. R., & Kamphaus, R. W. (1992). Behavior chiatric study in childhood. London: Lavenham Press.
assessment system for children. Circle Pines, MN: Saigal, S. (1995). Long term outcome of very low-birth-
American Guidance Services. weight infants: Kindergarten and beyond. Developmental
Reynolds, C. R., & Kamphaus, R. W. (1997). The Brain Dysfunction, 8, 109±118.
Kauffman assessment battery for children: Develop- Samuels, S. J. (1979). An outside view of neuropsycholo-
ment, structure and applications in neuropsychology. In gical testing. Journal of Special Education, 13, 57±60.
A. M. Horton, D. Wedding, & J. Webster, (Eds.), The Sandoval, J. (1981, August). Can neuropsychology con-
neuropsychology handbook (Vol. 1, pp. 291±330). New tribute to rehabilitation in educational settings? No. Paper
York: Springer. presented at the annual meeting of the American
Reynolds, C. R., Kamphaus, R. W., Rosenthal, B. L., & Psychological Association, Los Angeles, CA.
Hiemenz, J. R. (1997). Application of the Kaufman Sandoval, J., & Halperin, R. M. (1981). A critical
assessment battery for children (K-ABC) in neuropsy- commentary on neuropsychology in the schools: Are
chological assessment. In C. R. Reynolds & E. Fletcher- we ready? School Psychology Review, 10, 381±388.
Janzen (Eds.), Handbook of clinical child neuropsychology Segalowitz, S. (1983). Language functions and brain
(2nd ed., pp. 252±269). New York: Plenum. organization. New York: Academic Press.
Reynolds, C. R., Wilen, S., & Stone, B. (1997, November). Seidel, U. P., Chadwick, O., & Rutter, M. (1975).
The economy of neuropsychological evaluations. Paper Psychological disorders in crippled children: A compara-
presented at the annual meeting of the National tive study of children with and without brain damage.
Academy of Neuropsychology, Las Vegas, NV. Developmental Medicine and Child Neurology, 17, 563.
Riccio, C. A., Gonzalez, J. J., & Hynd, G. W. (1994). Seidman, L. J., Biederman, J., Faraone, S. V., Milberger,
Attention-deficit hyperactivity disorder (ADHD) and S., Norman, D., Seiverd, K., Benedict, K., Guite, J.,
300 Neuropsychological Assessment of Children

Mick, E., & Kiely, K. (1995). Effects of family history (1987). Incremental validity of the Halstead±Reitan
and comorbidity on the neuropsychological performance neuropsychological battery in predicting achievement
of children with ADHD: Preliminary findings. Journal of for learning disabled children. Journal of Psychoeduca-
the American Academy of Child and Adolescent Psychia- tional Assessment, 5, 157±165.
try, 34, 1015±1024. Sweet, J. J., Carr, M. A., Rossini, E., & Kasper, C. (1986).
Selz, M. (1981). Halstead±Reitan neuropsychological test Relationship between the Luria Nebraska Neuropsycho-
batteries for children. In G. W. Hynd & J. E. Obrzut logical Battery-Children's Revision and the WISC-R:
(Eds.), Neuropsychological assessment of the school-aged Further examination using Kaufman's factors. Interna-
child: Issues and procedures, (pp. 195±235). New York: tional Journal of Clinical Neuropsychology, 8, 177±180.
Grune & Stratton. Sweet, J. J., & Moberg, P. (1990). A survey of practices and
Selz, M. & Reitan, R. M. (1979a). Rules for neuropsycho- beliefs among ABPP and non-ABPP clinical neuropsy-
logical diagnosis: Classification of brain function in chologists. The Clinical Neuropsychologist, 4, 101±120.
older children. Journal of Consulting and Clinical Sweet, J. J., Moberg, P., & Westergaard, C. K. (1996). Five
Psychology, 47, 258±264. year follow-up survey of practices and beliefs of clinical
Selz, M. & Reitan, R. M. (1979b). Neuropsychological test neuropsychologists. The Clinical Neuropsychologist, 10,
performance of normal, learning disabled, and brain 202±221.
damaged older children. Journal of Nervous and Mental Talley, J. L. (1986). Memory in learning disabled children:
Disease, 167, 298±302. Digit span and the Rey Auditory Verbal Learning Test.
Shapiro, E. G., & Dotan, N. (1985, October). Neurological Archives of Clinical Neuropsychology, 1, 315±322.
findings and the Kaufman Assessment Battery for Taylor, H. G. (1988). Neuropsychological testing: Rele-
Children. Paper presented at the annual meeting of the vance for assessing children's learning disabilities.
National Association of Neuropsychologists, Philadel- Journal of Consulting and Clinical Psychology, 56,
phia, PA. 795±800.
Sheslow, D., & Adams, W. (1990). Wide Range Assessment Taylor, H. G., Barry, C. T., & Schatschneider, C. W.
of Memory and Learning. Wilmington, DE: Jastak (1993). School-age consequences of haemophilus influ-
Associates. enzae type b meningitis. Journal of Clinical Child
Shields, J., Varley, R., Broks, P., & Simpson, A. (1996). Psychology, 22, 196±206.
Hemisphere function in developmental language disor- Taylor, H. G., & Fletcher, J. M. (1990). Neuropsycholo-
ders and high level autism. Developmental Medicine & gical assessment of children. In G. Goldstein & M.
Child Neurology, 38, 473±486. Hersen (Eds.), Handbook of neuropsychological assess-
Shurtleff, H. A., Fay, G. E., Abbott, R. D., & Berninger, ment (pp. 228±255). New York: Plenum.
V. W. (1988). Cognitive and neuropsychological corre- Teeter, P. A. (1986). Standard neuropsychological batteries
lates of academic achievement: A levels of analysis for children. In G. W. Hynd & J. E. Obrzut (Eds.), Child
assessment model. Journal of Psychoeducational Assess- neuropsychology (Vol. 2, pp. 187±227). New York:
ment, 6, 298±308. Academic Press.
Snow, J. H., & Hooper, S. R. (1994). Pediatric traumatic Teeter, P. A. (1997). Neurocognitive interventions for
brain injury. Thousand Oaks, CA: Sage. childhood and adolescent disorders: A transactional
Snow, J. H., & Hynd, G. W. (1985a). Factor structure of model. In C. R. Reynolds & E. Fletcher-Janzen, Hand-
the Luria-Nebraska Neuropsychological Battery-Chil- book of clinical child neuropsychology (2nd ed.,
dren's Revision. Journal of School Psychology, 23, pp. 387±417). New York: Plenum.
271±276. Teeter, P. A., & Semrud-Clikemen, M. (1997). Child
Snow, J. H., & Hynd, G. W. (1985b). A multivariate neuropsychology: Assessment and interventions for neuro-
investigation of the Luria-Nebraska Neuropsychological developmental disorders. Boston: Allyn & Bacon.
Battery-Children's Revision with learning disabled chil- Telzrow, C. F., Century, E., Harris, B., & Redmond, C.
dren. Journal of Psychoeducational Assessment, 2, 23±28. (1985, April). Relationship between neuropsychological
Snow, J. H., Hynd, G. W., & Hartlage, L. H. (1984). processing models and dyslexia subtypes. Paper presented
Differences between mildly and more severely learning at the annual meeting of the National Association of
disabled children on the Luria Nebraska Neuropsycho- School Psychologists, Las Vegas, NV.
logical Battery-Children's Revision. Journal of Psycho- Temple, C. M. (1997). Cognitive neuropsychology and its
educational Assessment, 2, 23±28. application to children. Journal of Child Psychology,
Snyder, T. J., Leark, R. A., Golden, C. J., Grove, T., & Psychiatry, and Allied Disciplines, 38, 27±52.
Allison, R. (1983, March). Correlations of the K-ABC, Timmermans, S. R., & Christensen, B. (1991). The
WISC-R, and Luria Nebraska Children's Battery for measurement of attention deficit in TBI children and
exceptional children. Paper presented at the annual adolescents. Cognitive Rehabilitation, 9, 26.
meeting of the National Association of School Psychol- Torgesen, J. K. (1994). Issues in the assessment of executive
ogists, Detroit, MI. function: An information processing perspective. In G.
Snyderman, M., & Rothman, S. (1987). Survey of expert R. Lyon (Ed.), Frames of reference for the assessment of
opinion on intelligence and aptitude testing. American learning disabilities: New views on measurement issues
Psychologist, 42, 137±144. (pp. 143±162). Baltimore: Brookes.
Sperry, R. W. (1968). Hemisphere deconnection and unity Torkelson, R. D., Leibrook, L. G., Gustavson, J. L., &
in conscious awareness. American Psychologist, 23, Sundell, R. R. (1985). Neurological and neuropsycholo-
723±733. gical effects of cerebral spinal fluid shunting in children
Sperry, R. W. (1974). Lateral specialization in the with assumed arrested (ªnormal pressureº) hydrocepha-
surgically separated hemispheres. In F. O. Schmitt & lus. Journal of Neurology, Neurosurgery, and Psychiatry,
F. G. Worden (Eds.), The neurosciences: Third study 48, 799±806.
program. Cambridge, MA: MIT Press. Tramontana, M. G. (1983). Neuropsychological evaluation
Spreen, O., & Gaddes, W. H. (1979). Developmental norms in child/adolescent psychiatric disorders: Current status.
for 15 neuropsychological tests age 6±15. Cortex, 5, Psychiatric Hospital, 14, 158±162.
813±818. Tramontana, M., & Hooper, S. (Eds.) (1987). Neuropsy-
Spreen, O., Risser, A. H., & Edgell, D. (1995). Develop- chological assessment with children. New York: Plenum.
mental neuropsychology. London: Oxford University Tramontana, M., & Hooper, S. (1997). Neuropsychology
Press. of child psychopathology. In C. R. Reynolds & E.
Strom, D. A., Gray, J. W., Dean, R. S., & Fischer, W. E. Fletcher-Janzen (Eds.), Handbook of clinical child
References 301

neuropsychology (2nd ed., pp. 120±139). New York: C., Everett, B., & Vaught, L. (1993). Concurrent and
Plenum. discriminant validity of the Gordon Diagnostic System:
Tramontana, M. G., Hooper, S. R., Curley, A. S., & A preliminary study. Psychology in the Schools, 30,
Nardolillo, E. M. (1990). Determinants of academic 29±36.
achievement in children with psychiatric disorders. Whitten, C. J., D'Amato, R. C., & Chitooran, M. M.
Journal of the American Academy of Child and Adolescent (1992). The neuropsychological approach to interven-
Psychiatry, 29, 265±268. tions. In R. C. D'Amato & B. A. Rothlisberg (Eds.),
Tucker, D. M. (1989). Neural substrates of thought and Psychological perspectives on intervention: A case study
affective disorders. In G. Gainotti & C. Caltagirone approach to prescriptions for change (pp. 112±136). New
(Eds.), Emotions and the dual brain (pp. 225±234). New York: Longman.
York: Springer-Verlag. Williams, M. A., & Boll, T. J. (1997). Recent advances in
Turkheimer, E., Yeo, R. A., Jones, C., & Bigler, E. D. neuropsychological assessment of children. In G. Gold-
(1990). Quantitative assessment of covariation between stein & T. M. Incagnoli (Eds.), Contemporary approaches
neuropsychological function and location of naturally to neuropsychological assessment (pp. 231±267). New
occurring lesions in humans. Journal of Clinical and York: Plenum.
Experimental Neuropsychology, 12, 549±565. Willson, V. L., & Reynolds, C. R. (1982). Methodological
Voeller, K. K. S. (1995). Clinical neurologic aspects of the and statistical problems in determining membership in
right hemisphere deficit syndrome. Journal of Child clinical populations. Clinical Neuropsychology, 4,
Neurology, 10, 516±522. 134±138.
Vygotsky, L. S. (1980). Mind in society: The development of Wittelson, S. F. (1977). Early hemisphere specialization
higher psychological process. Cambridge, MA: Harvard and interhemispheric plasticity: An empirical and
University Press. theoretical review. In S. Segalowitz & F. A. Gruber
Waber, D. P., & McCormick, M. C. (1995). Late (Eds.), Language development and neurological theory
neuropsychological outcomes in preterm infants of (pp. 213±287). New York: Academic Press.
normal IQ: Selective vulnerability of the visual system. Woody, R. H. (1997). Psycholegal issues for clinical child
Journal of Pediatric Psychology, 20, 721±735. neuropsychology. In C. R. Reynolds & E. Fletcher-
Wasserman, J. (1995, February). Assessment and remedia- Janzen (Eds.), Handbook of clinical child neuropsychology
tion of memory deficits in children. Paper presented at the (2nd ed., pp. 712±725). New York: Plenum.
meeting of the Supervisors of School Psychologists for Ylvisaker, M., Chorazy, A. J. L., Cohen, S. B., Mastrilli, J.
the New York Board of Education. New York. P., Molitor, C. B., Nelson, J., Szekeres, S. F., Valko, A.
Wechsler, D. (1974). Wechsler Intelligence Scale for S., & Jaffe, K. M. (1990). Rehabilitative assessment
Children-Revised. New York: Psychological Corpora- following head injury in children. In M. Rosenthal, E. R.
tion. Griffith, M. R. Bond, & J. D. Miller (Eds.), Rehabilita-
Welsh, M. C., Pennington, B. F., & Grossier, D. B. (1991). tion of the adult and child with traumatic brain injury (2nd
A normative developmental study of executive function: ed., pp. 521±538). Philadelphia: Davis.
A window on prefrontal function in children. Develop- Zurcher, R. (1995). Memory and learning assessment:
mental Neuropsychology, 7, 131±139. Missing from the learning disabilities identification
Wherry, J. N., Paal, N., Jolly, J. B., Adam, B., Holloway, process for too long. LD Forum, 21, 27±30.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.11
Neuropsychological Assessment
of Adults
C. MUNRO CULLUM
University of Texas Southwestern Medical Center at Dallas,
TX, USA

4.11.1 INTRODUCTION
4.11.2 APPROACHES TO ASSESSMENT IN CLINICAL NEUROPSYCHOLOGY
4.11.2.1 The Standard Battery
4.11.2.2 Test Batteries for Specific Populations
4.11.2.3 The Hypothesis-driven Approach
4.11.2.4 Quantitative and Qualitative Examinations of Neurobehavioral Competence
4.11.2.5 Other Issues in Test Interpretation
4.11.2.5.1 Test cutoff scores
4.11.2.5.2 T-scores
4.11.2.6 Computer Interpretation of Neuropsychological Assessment Results
4.11.2.7 Cognitive Screening
4.11.3 METHODS OF NEUROPSYCHOLOGY
4.11.3.1 Clinical Interview and Background Information
4.11.3.2 Neuropsychological Measurement of Brain Function
4.11.4 PRINCIPAL COGNITIVE DOMAINS FOR ASSESSMENT
4.11.4.1 Global Cognitive/Intellectual Functioning
4.11.4.2 Academic Achievement
4.11.4.3 Executive Functioning, Problem-solving, and Reasoning
4.11.4.3.1 Tests of reasoning and problem-solving
4.11.4.4 Arousal and Orientation
4.11.4.5 Attention/Concentration
4.11.4.5.1 Assessment of attention
4.11.4.6 Language
4.11.4.6.1 Global language assessment
4.11.4.7 Visuospatial Skills
4.11.4.7.1 Visuospatial tasks
4.11.4.8 Memory
4.11.4.8.1 Clinical assessment of learning and memory
4.11.4.8.2 Clinical memory tests
4.11.4.8.3 Memory batteries
4.11.4.8.4 Verbal memory tests
4.11.4.8.5 Nonverbal memory tests
4.11.4.9 Motor and Sensory Function
4.11.4.9.1 Psychometric measures of motor and sensory function
4.11.4.10 Assessment of Motivation
4.11.4.11 Personality and Emotional Functioning
4.11.4.12 Test Selection Issues
4.11.4.13 Relationships Between Neuropsychometry and the Behavioral Geography of the Brain
4.11.4.14 Training in Neuropsychology
4.11.4.15 Challenges for Neuropsychological Assessment
4.11.5 REFERENCES

4.11.1 INTRODUCTION

Clinical neuropsychology represents one of the most rapidly growing areas within the field of psychology and was one of the first to gain specialty status from the American Psychological Association in 1996. At a basic level, the neuropsychological examination represents a combination of the traditional behavioral neurologic examination with a psychometric approach to the evaluation of brain–behavior relationships. From the broader field of psychology, neuropsychology derived its emphasis on the application of psychometric procedures to quantify behavior. From its other parent discipline of neurology came the interest in evaluating brain function. The term "neuropsychology" originally evolved in the 1930s and 1940s and its popularization is often attributed to Hans-Lukas Teuber. Many early neuropsychological procedures were developed during wartime to assess cognitive status and the suitability of individuals for special military service. Subsequently, penetrating missile wounds to the brain became the focus of localization studies. Accordingly, many tests were created with the goal of being sensitive to focal brain insults. Other measures were developed to assess for "organicity," a now archaic term that had been used to grossly refer to brain damage or neurological deficit. Prior to the advent of modern neuroimaging procedures, neuropsychological techniques emerged as a front-line assessment procedure for the identification and localization of acquired cerebral damage. While neuropsychological techniques continue to provide this aspect of neurodiagnostic assessment to some extent, more commonly the procedures are used to describe and quantify behavior. The results of these evaluations can be used to infer cerebral integrity vs. dysfunction, to delineate cognitive strengths and weaknesses, and to assist in differential diagnosis. Assisting other professionals in differential diagnostic situations and documenting level of impairment can be a primary role for these evaluations. Other goals of neuropsychological assessment include making treatment and rehabilitation recommendations, assisting in placement issues, and evaluating treatment response (e.g., in cases of neurosurgical or pharmacological intervention).

The neuropsychological examination can also provide useful information to patients regarding their functioning, and feedback may be used therapeutically in terms of adjustment issues. In many cases, informing the patient that their cognitive difficulties can be documented and explained (or at least are not unexpected) can be most reassuring. Discussing the nature of a disorder and the various cognitive and emotional sequelae it may have, as well as the typical course of recovery or changes that can be expected, is also helpful in terms of setting realistic expectations and goals. Furthermore, presenting information and potential compensatory strategies or interventions that are helpful given a patient's particular situation can be rewarding. In some medical settings in particular, the neuropsychologist may play a central role in helping patients understand the nature of their difficulties by providing this information in understandable terms within an emotionally supportive context.

Among existing neurodiagnostic procedures, the neuropsychological evaluation remains the most sensitive means of assessing human brain function. Knowing the presence and location of a given lesion, for example, provides only limited information about an individual's functioning. For example, the patient in Figure 1 sustained an infarction of the right posterior cerebral artery.

How such a patient might be functioning in his or her daily life, however, remains a question that simple structural neuroimaging cannot address. For example, the individual depicted in Figure 1 might be expected to show a contralateral visual field cut (hemianopsia) based on the location of the lesion, although the impact of this as well as any other associated cognitive processing difficulties would be unknown without clinical examination.

Another example is provided in the case of the 65-year-old patient whose magnetic resonance imaging (MRI) scan is presented in Figure 2. This individual demonstrated grossly normal brain structure, with no major evidence of atrophy or particular neuropathology. Despite the normal appearance of her brain, a severe level of dementia was observed as reflected by her score of 10/30 on the Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975). Although dementia is often associated with cortical atrophy, clinicians should keep in mind that dementia is a clinical diagnosis and the degree of atrophy on neuroimaging shows only a modest association
with level of cognitive impairment (Naugle, Cullum, Bigler, & Massman, 1986).

Figure 1 CT scan depicting posterior right hemisphere lesion (note that right is depicted on the left).

Given the utility of neuropsychological techniques in terms of documenting and understanding the neurobehavioral sequelae of cerebral dysfunction, a discussion of some of the major approaches and common measures used in clinical neuropsychology is in order.

4.11.2 APPROACHES TO ASSESSMENT IN CLINICAL NEUROPSYCHOLOGY

Contemporary clinical neuropsychology owes a debt to many of its early founders, and because of the youth of the field, many of these individuals remain heavily involved in the field. A listing of the major contributors to the field of clinical neuropsychology is beyond the scope of this chapter and involves a wide variety of individuals from different theoretical backgrounds and disciplines. In terms of major contributors to the development of neuropsychological assessment, such a list would have to include the following: Benton, Butters, Goldstein, Goodglass, Halstead, Kaplan, Meier, Parsons, Reitan, and Spreen. Citations and representative examples of their work can be found in various textbooks and merit review by those interested in obtaining more information about clinical neuropsychology (e.g., see Grant & Adams, 1996; Lezak, 1995; Naugle, Cullum, & Bigler, 1998; Spreen & Strauss, 1998).

The concept of "approaches" to neuropsychological assessment can be considered from several perspectives. Historically, the "fixed" vs. "flexible" test battery approach was a topic of discussion and even heated debate for many years. Since many contemporary neuropsychologists tend to utilize a "core" battery of favorite measures for many of the patients they examine, but modify the assessment by adding or deleting specific tests depending upon the clinical population and/or referral issues in question, this has become an essentially moot issue. Furthermore, the modification of even the most "fixed" of test batteries to address individual patient needs and referral questions is now commonplace. Thus, the distinction between fixed and flexible approaches to neuropsychological assessment is somewhat arbitrary, although some discussion of the central issues involved in test selection and the composition of "standard" or core test batteries is in order.

4.11.2.1 The Standard Battery

In its strictest sense, a fixed standard battery approach involves the administration of the same set of tests to all patients, regardless of
diagnostic or referral question, level of impairment, patient complaints, or clinical presentation. Applied too rigidly, such an approach may not yield appropriately detailed information regarding a specific area of deficiency (e.g., memory) unless that ability area is adequately represented in the standard or core battery. The application of an extensive omnibus test battery to all individual cases may also prove less efficient in some instances by oversampling behavior. For example, in the case of a rigid standard battery, a lack of impairment on a higher-level ability measure may be followed by the routine administration of a similar task that tends to be less sensitive to deficits in that same domain. However, it has also been argued that a standard battery approach may allow for a broader assessment of neurobehavioral abilities in some settings, and hence, may detect cognitive impairment that a less comprehensive battery might miss. For example, a patient referred for memory assessment might be administered only a test of global intelligence (e.g., Wechsler Adult Intelligence Scale-Revised; WAIS-R) and selected measures of memory (e.g., the Logical Memory and Visual Reproduction subtests from one of the versions of the Wechsler Memory Scale are commonly used in various settings). Such an evaluation may not be sensitive to deficits in executive functioning or higher-order cognitive integration that might be manifest upon more detailed assessment and that can impact memory test performance. It should be noted, however, that the limited comprehensiveness of the battery would be a more important issue than whether this represents a fixed or more flexible assessment approach.

Figure 2 MRI depicting normal gross brain structure in a patient with severe dementia.

Perhaps the best example of a standard neuropsychological battery is the group of tests that was originally created in the 1940s by Ward Halstead and further developed by one of his most prominent students and colleagues, Ralph Reitan. Halstead had strong interests in the assessment of biological intelligence and in the quantification of brain–behavior relationships, particularly at a time when localization of cerebral lesions was often a focus of neuropsychological evaluations. The work of Halstead and Reitan contributed monumentally to the development of American neuropsychology, and the Halstead–Reitan Neuropsychological Battery (HRB) continues to be used in many settings, whether used in toto or by its component tests.

The core HRB consists of the following tests and the primary corresponding cognitive functions (also see Dodrill, 1997): Category Test (abstract reasoning, logical analysis); Tactual Performance Test (complex psychomotor problem solving); Trail Making Test (psychomotor speed and cognitive sequencing); Aphasia Screening Test (brief assessment of language and graphomotor construction); Speech Sounds Perception Test (verbal/auditory attention); Seashore Rhythm Test (nonverbal auditory attention); Finger Tapping Test (fine motor speed); Grip Strength (gross motor strength); Sensory-Perceptual Examination (basic sensory and perceptual abilities); and Tactile Form Recognition (sensory perception, agnosia). Composite summary indices from the core
HRB can also be calculated in order to provide an overall index of cognitive impairment. These include the General Neuropsychological Deficit Scale (GNDS), which is derived from 42 scores from the HRB (0 = no impairment to a maximum possible score of 168; see Reitan & Wolfson, 1993), the Average Impairment Rating (Russell, Neuringer, & Goldstein, 1970), which ranges from 0.00 (no impairment) to 5.00, and the original Halstead Impairment Index (HII; Halstead, 1947), which utilizes a proportion of HRB test results based upon cutoff scores from selected tests (0.0 = no impairment to 1.0, reflecting impairment on all of the key variables). Like IQ scores, such summary indices suffer from a number of limitations (e.g., see Lezak, 1995, pp. 23–24), although they can arguably provide some useful reference information regarding an individual's overall functioning if not overinterpreted or used too rigidly.

Reitan and Wolfson (1996) provide a detailed overview of the HRB in terms of its application and interpretation, and many of the assessment and interpretation issues they discuss have relevance beyond the HRB per se. For example, the authors include a discussion of four of the principal ways in which test data can be interpreted: (i) level of performance; (ii) pathognomonic signs (i.e., specific test findings/deficits that strongly suggest brain dysfunction and are uncommon in nonbrain injured subjects); (iii) patterns of test results (e.g., those typifying different disorders); and (iv) comparison of test results using each side of the body (e.g., right vs. left hand) to examine hemispheric lateralization issues.

Clinicians vary with respect to their relative reliance upon each of these interpretive strategies, and depending upon the case, some findings may need to be given greater weight than others. For example, the pathognomonic sign of focal motor weakness in the absence of a peripheral motor injury clearly merits careful attention to the possibility of contralateral cerebral dysfunction. Similarly, a visual field cut (e.g., hemianopsia) should alert the clinician to focal cerebral involvement in the appropriate corresponding neuroanatomical area.

In other cases, the pattern of test results can be more important than the level of performance. Test findings that fall technically within normal limits may show a pattern that is suggestive of cerebral dysfunction. To illustrate this point using a more familiar measure, a patient may demonstrate overall scores within normal limits on the WAIS-III or WAIS-R, yet show relative weaknesses on those subtests most sensitive to cerebral dysfunction, which may reflect neurologically based deficits. Similarly, a normal summary score on other neuropsychological tests can occur even when subscores and/or qualitative performance characteristics suggest clear abnormality or demonstrate a pattern consistent with a particular disorder.

In practice, many clinicians who utilize the HRB administer a modified version of the battery that selectively includes individual HRB tests. For example, many settings that do not routinely use the core HRB nevertheless administer the Category Test as a measure of higher-order problem-solving and because of its high sensitivity to cerebral dysfunction. Alternatively, clinical evaluations relying upon the complete HRB often include additional measures to provide a more detailed assessment of areas of cerebral functioning not well represented in the core battery. For example, one area that often is supplemented is memory, since the core HRB does not provide much in the way of memory assessment, even though this represents a common complaint among neurological populations.

Luria's Neuropsychological Investigation (Christensen, 1984) represents a battery of measures that involves the systematic application of the assessment techniques originally described by Luria (1966). The battery is organized into various sections assessing an array of abilities, including basic and higher-level motor functions, receptive and expressive language, memory, and higher-order thinking. Even though it is presented here as a "standard" battery, it involves a sequential hypothesis-testing approach that includes a variety of individual tasks and relies upon qualitative judgments. Many clinicians utilize some of the component tasks selectively in their examinations. Some of the more commonly used tasks are the "go-no-go" paradigm (e.g., "When I hold up one finger, you hold up two," and vice-versa) and what has come to be called the "Luria 3-step" motor sequence by some (i.e., rapidly alternating the hand from flat to fist to edge) in order to assist in the examination of aspects of executive functioning and cognitive initiation/inhibition.

A very different type of standard test battery involves computer administration. Although several such test batteries exist, one example is the Microcog: Assessment of Cognitive Functioning (Powell et al., 1993). This is a computer-administered and scored battery of tests designed to screen for gross cognitive impairment in adults. It is available in short and long forms consisting of 12 and 18 subtests that assess the following general areas: attention/mental control, memory, reasoning/calculation, spatial processing, and reaction time. The test software calculates standard scores and percentile comparisons using age-referenced norms
that range from 18 to 89 years of age. While Microcog may be useful in some screening situations, the limitations of the brief and selective nature of the tasks, the lack of alternate forms, and the limitations inherent in computer-based assessment must be carefully considered in light of the specific clinical questions and patient populations in need of evaluation.

4.11.2.2 Test Batteries for Specific Populations

Various standard test batteries have been developed and/or assembled for specific purposes and populations, and many of these are reviewed in the excellent neuropsychological test compendiums by Spreen and Strauss (1998) and Lezak (1995). A few of the more commonly used batteries include the following:
(i) The Consortium to Establish a Registry for Alzheimer's Disease (CERAD) neuropsychological assessment battery (Morris, Heyman, Mohs, et al., 1989) was developed for use in patients with known or suspected dementia. This is a 30–45 minute examination of cognitive skills that includes measures of orientation, verbal fluency, naming, verbal learning and memory, and graphomotor constructional skills. Good reliability and validity have been demonstrated, and normative data for older adult populations are available (Welsh et al., 1994).
(ii) The NIMH core neuropsychological battery (Butters et al., 1990) was developed to evaluate cognitive changes associated with HIV infection. Both a brief (1–2 hour) and an extended (7–9 hour) battery comprised of standard clinical neuropsychological tests and measures used in cognitive psychology were assembled in order to assess the following cognitive ability areas: premorbid IQ, attention, speed of information processing, learning and memory, abstract reasoning, language, visuoperceptual and graphomotor constructional abilities, and psychomotor skills. It includes tasks that provide information regarding patterns of cognitive deficits as commonly seen in degenerative conditions that affect cortical and particularly subcortical cerebral functions.
(iii) The Pittsburgh Occupational Exposures Test battery (Ryan, Morrow, Bromet, & Parkinson, 1989) was developed to evaluate those cognitive functions most commonly affected by exposure to environmental toxins. It comprises a series of 15 standard and experimental neuropsychological measures that assess a variety of abilities in a reasonably brief amount of time (i.e., approximately 90 minutes). Factor analysis of the battery revealed the following five domains: general intelligence, attention, learning and memory, visuospatial, and psychomotor speed and manual dexterity.
As noted, many neuropsychologists develop their own groups of tests for selected populations, such that a core battery for dementia might have some overlap with a battery for epilepsy (e.g., perhaps in terms of intellectual assessment). Such batteries would likely have different components and stepdown procedures, however. For example, additional or more comprehensive memory measures might be selected for patients with epilepsy when lateralized temporal lobe dysfunction is known or suspected. Furthermore, an assessment of academic achievement skills would typically be more important in the young adult with epilepsy than in the older individual with Alzheimer's disease. Thus, the issue of “core” test batteries depends largely upon neuropsychological training and experience, and many clinicians will use somewhat different sets of tests for different populations. In any case, it is important for the neuropsychologist to be familiar with a wide array of neuropsychological measures, not only in order to stay current with new test developments and research, but also to have a large test repertoire from which to select measures for individual or unique cases, as well as to be best prepared to review test results from other centers.

4.11.2.3 The Hypothesis-driven Approach

The hypothesis-driven or more flexible approach to neuropsychological assessment espouses the selection of tests based on referral questions, known or suspected diagnoses/pathology, and individual patient complaints and symptom presentation. Such examinations tend to be more “customized” or individually tailored, even though a relatively standard “core” battery of tests may be included as part of the evaluation. Tests may also be added to or removed from the planned evaluation as the testing process progresses, depending upon performance results and patterns of strengths and weaknesses observed on various measures. As noted, many neuropsychologists administer a core group of tests to many or most of the patients they examine, subsequently tailoring aspects of the evaluation to follow up on areas of particular clinical interest. For example, a patient referred for evaluation of memory complaints might undergo a standard battery of measures that provides an evaluation of multiple cognitive domains, although the focus of the examination might be a more extensive assessment of various aspects of memory designed to thoroughly examine the patient's complaints or known/suspected pathology.
Care must obviously be used in test selection, lest an examination become too cursory or
overly focused, or, at the other extreme, too lengthy and redundant. For example, an examination that consisted only of memory measures to evaluate a patient's memory complaints would neglect other areas of functioning (e.g., attention/concentration, executive functioning, language disturbance) that might play a prominent role in an individual's complaints of memory problems. Another issue is that once an area of deficiency is identified, the question arises as to how much exploration/assessment of the deficit is necessary. Certainly addressing the referral question is of great importance, and to the extent that this has been addressed, the clinician may decide to stop the evaluation at that point. In other cases, however, such as in the differential diagnosis of Parkinson's disease-related cognitive decline vs. depression vs. early Alzheimer's disease, more detailed exploration of multifaceted aspects of memory and other neurocognitive functioning would be required beyond the question, “Is there evidence of cognitive impairment?” that is often posed in certain settings.

4.11.2.4 Quantitative and Qualitative Examinations of Neurobehavioral Competence

Quantitative assessment refers to the use of test scores for interpretation, and qualitative examination often refers more to observations that pertain to the process by which patients perform tasks. To illustrate this latter point, even though a particular score on a test reflects a certain level of proficiency, that level of achievement may be attained via different cognitive strategies. An inefficient trial-and-error approach to a problem-solving task, for example, may yield the same final result as a more systematized strategy, even though these approaches reflect different underlying processes. Furthermore, the reliance upon either of these processes may have very different implications for patients in terms of their effectiveness in everyday functioning and in their ability to cope with novel situations. This approach to neuropsychological assessment is perhaps nowhere better illustrated than in the Boston Process Approach to clinical neuropsychology (see Milberg, Hebben, & Kaplan, 1996, for an overview) which incorporates an examination of the process or manner in which tasks are performed rather than relying upon final or summary scores alone. In some approaches to neuropsychological assessment, there is an emphasis on specific test scores, with less attention paid to how scores are achieved, the cognitive functions underlying those scores, the relative preservation of component functions that may be indicated, and the manner in which those scores may relate to the patient's functioning in everyday life situations.
As noted, many clinical neuropsychologists implement aspects of flexible and standard, quantitative and qualitative approaches to assessment in practice, and these should not be viewed as mutually exclusive. Perhaps more important is the approach to test interpretation and performance analysis, which represents a multidimensional process. As noted earlier, the dimensions of level and pattern of performance on neuropsychological tests are very important to consider. In some cases, the level of performance may suggest normal functioning, while the pattern of test results suggests the presence of an abnormal process. Along these lines, it is important to keep in mind that scores in the “normal” or “average” range may reflect impairment in some individuals compared to their premorbid or baseline level of functioning. For example, the average IQ and memory scores in the previously high-functioning college professor who sustained a traumatic brain injury (TBI) may represent a significant decline in functioning, even though the obtained scores remain within the “normal” range (e.g., see Naugle, Cullum, & Bigler, 1990). Furthermore, if the patient achieves a good score on a particular test, but does so in an abnormal manner, this, too, may reflect altered mental function. Careful consideration of these factors, along with the pattern of test results and the degree to which the findings are consistent with deficits commonly seen in TBI, is essential in arriving at a correct diagnosis. Furthermore, the quantitative and qualitative information gained through a comprehensive neuropsychological examination may be very important in providing appropriate feedback and making realistic recommendations to the patient. Thus, as with the combination of standard and flexible test batteries, utilization of quantitative as well as qualitative data from the neuropsychological evaluation can be of extreme importance.

4.11.2.5 Other Issues in Test Interpretation

Regardless of a clinician's primary approach to neuropsychological assessment, the determination of what tests are to be given to each patient requires forethought and planning, and may vary depending upon the patient's age, level of education, socioeconomic status and ethnic background, and clinical presentation. For example, the neuropsychological evaluation of a retired physician following a mild brain injury would require the administration of a different set of measures than the evaluation of an 87-year-old laborer referred for diagnostic confirmation of
Alzheimer's disease who is known to have at least a 10-year history of progressive decline. As a more extreme example, the workup of an elderly individual with three years of education and limited English speaking abilities would require careful test selection to avoid measures with high cultural and educational biases.

4.11.2.5.1 Test cutoff scores

The use of “cutoff” or “cut” scores is another issue that merits comment, because this comes up frequently in clinical practice and often serves as a topic of debate. The cutoff score is based on the notion that most normal individuals tend to perform above a certain level on a given test, while most brain-injured individuals score below that level. From a statistical perspective, the notion of cutoff scores holds some merit, since many test results are normally distributed, and depending upon sensitivity and specificity qualities desired, cut points in a distribution can be set in order to maximize one or the other or both. This approach will produce definable true-positive and false-positive hit rates for groups, although statistical inferences may differ when applied to any given individual case. As a result, definitions of what constitutes “impairment” vs. “normalcy” may vary depending upon a variety of factors such as age and education, to take two of the more extensively studied demographic variables.
Different cutoff scores for impairment would be needed to adjust for the effects of age on a given test, since age has a significant effect on so many cognitive measures. Using the original cutoff score of 90 seconds for the Trail Making Test, Part B, for example, would result in many healthy elderly individuals being misclassified as “impaired,” when in fact their performances on this test may fall well within normal limits or even above average for their age. Similar considerations must be made with respect to different educational levels and estimated premorbid intellectual functioning, which can have a profound impact on the interpretation of certain neuropsychological test scores.

4.11.2.5.2 T-scores

Heaton, Grant, and Matthews (1991) provide normative reference values (t scores) for an extensive battery of tests that includes the HRB. This work, and the accompanying supplement for the WAIS-R (Heaton, 1992), represents a monumental contribution to the field by providing age-, education-, and where appropriate, gender-adjusted standard scores for an array of neuropsychological tests. The use of demographically-corrected t scores (derived from the same normative population) places all test results on the same metric (mean = 50, SD = 10, with lower scores reflecting poorer performances), thereby allowing for ready comparisons across areas of cognitive functioning. For example, questions such as, “Is memory more impaired than expected given an individual's background and overall level of functioning?” can be addressed through the use of such standardized scores. Also, the effects of age and education on specific test results can be more readily appreciated, even when such factors may be thought to have little influence on a given task. For example, if it takes a 30-year-old female with a high school education 90 seconds to complete Trails B, the corresponding t score is 38, which reflects a mild impairment, just over one standard deviation below average. If that individual were 75 years old, however, that same level of performance would fall in the above-average range (t score = 56).
The use of t scores allows for an individual's performance on specific tests to be compared with results from healthy groups of subjects of similar gender, age, and educational backgrounds rather than relying upon strict cutoff scores in order to help determine the degree of normality–abnormality of findings. If enough scores fall below expectation and/or form a consistent pattern of deficits, the likelihood of cerebral dysfunction is increased.
Given the error variance of any behavioral measure, t scores, like IQ or other standard scores, should not be used as rigid neurobehavioral markers that are absolute or “true” scores, but rather as interpretive guidelines to assist in evaluating levels and patterns of performance in a particular case. Furthermore, caution must be used in the strict application of interpretive scores and guidelines derived from normal populations to cases of brain-injured individuals (e.g., see Reitan & Wolfson, 1996). Another potential risk of over-reliance upon standardized scores in neuropsychological interpretation is that it can potentially give the (particularly inexperienced) clinician a false sense of security, when in fact, it is the neuropsychologist's training, knowledge base, experience, and skill in interpreting various aspects of test performance and behavior that result in valid neuropsychological conclusions regarding cerebral integrity.

4.11.2.6 Computer Interpretation of Neuropsychological Assessment Results

Computer-derived interpretive programs have been developed for several standard test batteries (e.g., HRB, Luria–Nebraska Neuropsychological Battery). While such programs
attempt to make general statements regarding the likelihood and even pattern of cerebral dysfunction based upon normative and neuropathological data and “typical” profiles, these programs may lend a false sense of interpretive security to those with more limited training and experience in clinical neuropsychology. While certain features of some such programs are arguably useful in certain situations (e.g., rapid scoring and normative referencing of results), they must be used with caution. Because brain damage affects individuals in different ways (i.e., a lesion in a specific location may produce different symptoms across individual patients depending upon a host of interindividual neuroanatomic, neuropathologic, genetic, experiential, and personality factors), any blanket statements about the nature of neurobehavioral disturbance in an individual must be examined carefully within the context of the patient in question. Particular care must be used when computer interpretations yield purported localization indices, since this process requires knowledge of underlying neuroanatomical systems, neuropsychiatric disorders, and individual neurobehavioral variations in order to optimize diagnostic accuracy. Attempting to distill numerically the complex process of clinically weighing various test scores, combinations of results, and qualitative performance features into “focal” interpretive patterns is a most challenging prospect, indeed, and erroneous interpretations can be rendered through over-reliance upon some of the available test interpretation programs.

4.11.2.7 Cognitive Screening

In keeping with the current managed-care Zeitgeist, neuropsychological evaluations should be appropriately detailed and comprehensive, while at the same time designed to provide the maximum amount of information in a time- and cost-efficient manner. It should be kept in mind, however, that brief cognitive assessments should not be carried out at the expense of thoroughness. In some cases, a brief examination may be adequate to address referral questions, as in the case of documenting cognitive impairment in a patient with severe dementia. A thorough examination of such a patient might be accomplished readily through the use of cognitive screening tools, and in fact, the administration of an eight- or 10-hour battery of measures in such a case might not yield any more diagnostic or clinically useful information than a brief assessment. However, evaluations that are overly brief may be insensitive to mild or even more significant cognitive deficits. Take, for example, the ever-popular MMSE, which is arguably the most widely used cognitive screening tool and provides a very brief assessment of orientation, simple language, and recent memory skills. Whereas such instruments have utility in quantifying gross level of impairment when more severe brain dysfunction exists, scores that fall in the “normal” range on this test do not rule out cognitive abnormality. That is, simple screening measures such as the MMSE tend to have a high false-negative rate (i.e., identifying patients as being intact, when in fact, they are not). For example, diagnostic error rates using traditional cutoff scores on the MMSE (i.e., <24/30) can be around 15% even among patients with a diagnosis of possible or probable Alzheimer's disease (Cullum & Rosenberg, 1998); in less impaired populations, the false-negative rate likely will be higher. Although the use of cognitive screening tools like the MMSE is preferable to a more idiosyncratic and nonstandardized clinical mental status evaluation, the limitations of any brief cognitive screening tool must be kept carefully in mind. Along these lines, as with other measures of cognitive function, the use of cutoff or specific reference scores on such tests must be carefully considered in light of the individual patient's background. Demographic factors often influence cognitive tests, and appropriate norms should be used even in the case of brief screening tools (e.g., see Crum, Anthony, Bassett, & Folstein, 1993 for age- and education-corrected MMSE scores).

4.11.3 METHODS OF NEUROPSYCHOLOGY

Neuropsychological assessment provides the most sensitive and comprehensive means by which brain function and cognition can be assessed given our current state of technology. The techniques are used widely in various settings across many patient populations, and are recognized by the American Academy of Neurology as accepted diagnostic procedures (Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology, 1996). Neuropsychologists often serve as consultants to neurologists, neurosurgeons, psychiatrists, clinical psychologists, and other medical and health professionals.
Neuropsychological tests are the means by which neurobehavioral samples are elicited, and specific test scores represent summary statements about observed behaviors and cognitive capabilities. Obtaining valid behavioral samples is critical, as is a thorough knowledge of the range of responses made by patients with and without various neurobehavioral disorders. As
noted in neuropsychology training programs, neuropsychological tests are no better than the clinician interpreting them. Thus, the best tests in poor or inadequately trained hands may fail to yield the correct neurobehavioral conclusions, and it is the clinician that is the instrument of evaluation, not the tests alone. Appropriate specialized training in neuropsychology is essential to conduct detailed neuropsychological evaluations beyond a cognitive screening level, and this is the topic of a section later in this chapter.

4.11.3.1 Clinical Interview and Background Information

Prior to a discussion of individual cognitive domains and specific representative measures, an overview of some of the other information that is essential to the neuropsychological examination is in order. In addition to a formal examination of mental functions, information pertaining to the patient's history, presenting complaint, and overall functional status is essential. In order to interpret results from any neurobehavioral task, an understanding of the behavioral geography of the brain and the underlying cerebrocortical systems is critical. First, it should be kept in mind that many cognitive processes represent inter-related phenomena and that individual tasks seldom assess highly specific skills or involve circumscribed cerebral areas in isolation. Second, background variables of the patient, including their age, level of education, SES, and cultural background, must be kept in mind while asking them to perform various tasks. Third, it should be kept in mind that the clinician interpreting the test data is the key to a good neuropsychological evaluation. Even the most sensitive tests in the hands of the inadequately trained clinician may yield erroneous conclusions. This is particularly true in cases of subtle behavioral abnormalities that may only be elicited through the use of specific cognitive tasks or detected via the careful and experienced observation of characteristic symptoms or behavior patterns. Table 3 provides an overview of much of the information that should be collected as part of the clinical interview within the context of the typical neuropsychological evaluation. Such data form a basis for the selection of assessment procedures as well as interpretation and integration of findings.
After obtaining this information, the selection of neuropsychological tests can be done, with particular areas of interest being stressed depending upon the referral question, nature of the disorder(s) under consideration, and patient complaints and presentation. In many settings, neuropsychological tests are administered by highly trained psychometrists or technicians. These individuals are responsible for standardized test administration and scoring under the supervision of the clinical neuropsychologist. This is a recognized standard practice in the field that is endorsed and/or utilized by a majority of neuropsychologists (Sweet & Moberg, 1990).

4.11.3.2 Neuropsychological Measurement of Brain Function

The comprehensive neuropsychological evaluation includes the administration of tests that cover multiple cognitive domains. Although the specific tests selected by individual neuropsychologists may vary, the domains listed in Table 1 are commonly represented.

Table 1 Cognitive domains typically assessed by the comprehensive neuropsychological evaluation.

1 Global cognitive/intellectual functioning
2 Academic achievement
3 Executive functioning/problem-solving
4 Language
5 Visuospatial processing
6 Attention/concentration
7 Learning and memory
8 Psychomotor functions
9 Sensory perceptual abilities
10 Psychological functioning

Excellent descriptions and reviews of the most frequently used cognitive measures (as well as normative data for some tests) can be found in Lezak (1995) and Spreen and Strauss (1998), and the reader is encouraged to consult these comprehensive reference sources. Table 2 presents some of the more commonly used tests to assess these cognitive domains. It should also be noted that there may be significant overlap between some of the listed tests, and across domains. The tests are listed for illustrative purposes and are not intended to provide an exhaustive list or to reflect a prototypical clinical neuropsychological test battery.
The question sometimes arises as to why a neuropsychological evaluation cannot be extremely focused, that is, “Why not administer a single test of memory to a patient if that is the only presenting complaint?” The answer obviously is that by using only one test or assessing only one cognitive domain, the clinician might completely overlook the actual cause of the patient's reported difficulties. For example, an individual presenting for memory disturbance may actually have primary difficulties in attention and concentration which preclude their ability to efficiently encode new material. When that individual attempts to recall information at

Table 2 Common neuropsychological measures.

Representative measures                                 Functions assessed

Intellectual
  Wechsler Adult Intelligence Scale-3 (WAIS-III) and    Global cognitive capacity/intelligence
    Wechsler Adult Intelligence Scale-Revised (WAIS-R)
  National Adult Reading Test-Revised (NART-R)          Premorbid IQ estimation
Global cognitive
  Dementia Rating Scale (DRS)                           Presence and level of dementia
  Mini-Mental State Examination (MMSE)                  Gross cognitive screening
Academic achievement
  Wide Range Achievement Test-3 (WRAT-3)                Reading, spelling, math
  Peabody Individual Achievement Test-Revised (PIAT-R)  Reading, spelling, math
  Wechsler Individual Achievement Test (WIAT)           Basic academic skills
Executive function and problem-solving
  Wisconsin Card Sorting Test (WCST)                    Cognitive flexibility/problem-solving
  Category Test                                         Abstraction and reasoning
  California Sorting Test (CST)                         Cognitive flexibility/idea generation
  Similarities (WAIS-III, WAIS-R)                       Abstraction
  Trail Making Test (Part B)                            Mental sequencing/flexibility
  Raven's Matrices (standard, colored, advanced)        Nonverbal reasoning
Language
  Vocabulary (WAIS-R)                                   Word knowledge
  Boston Naming Test (BNT)                              Confrontation naming
  Word fluency; Controlled Oral Word Association        Verbal fluency
  Token Test                                            Comprehension
  Boston Diagnostic Aphasia Examination (BDAE)          Overall language function
  Aphasia Screening Test                                Language screening
Visuospatial
  Block Design (WAIS-R)                                 Visuoconstructional ability
  Clock drawings (command and copy)                     Graphomotor construction
  Cross drawings (command and copy)                     Graphomotor construction
  Rey–Osterrieth Complex Figure (copy)                  Graphomotor construction/planning
  Hooper Visual Organization Test (HVOT)                Visuoperceptual integration
  Object Assembly (WAIS-R)                              Visuospatial integration
  Line bisection, Visual cancellation tests             Hemispatial inattention
  Ruff Figural Fluency Test                             Nonverbal fluency
Attention/Concentration
  Digit Span forward (WAIS)                             Simple attention
  Digit Vigilance (number cancellation)                 Sustained concentration and attention
  Paced Auditory Serial Addition Test (PASAT)           Auditory concentration/processing speed
  Digit Symbol (WAIS) or Symbol Digit Modalities Test   Visual attention/psychomotor speed
  Trail Making Test (Part A)                            Visual tracking/psychomotor speed
  Continuous Performance Test (various versions)        Sustained vigilance and reaction time
  Visual Sustained Attention Test (VSAT)                Sustained concentration and attention
Learning and memory
  Wechsler Memory Scale-3 (WMS-III) and                 Global memory function
    Wechsler Memory Scale-Revised (WMS-R)
  California Verbal Learning Test (CVLT)                Verbal learning and memory
  Logical Memory (WMS-III, WMS-R)                       Verbal memory
  Visual Reproduction (WMS-III, WMS-R)                  Nonverbal memory
  Rey–Osterrieth Complex Figure (recall)                Complex nonverbal memory
  Hopkins Verbal Learning Test                          Brief measure of verbal memory
  Benton Visual Retention Test                          Nonverbal memory
  Warrington Recognition Memory Test                    Verbal/nonverbal recognition memory
  Rey Auditory Verbal Learning Test                     Verbal learning and memory
  Buschke Selective Reminding Test                      Verbal learning and memory
Motor and sensory
  Finger Tapping Test                                   Fine motor speed
  Hand Dynamometer                                      Grip strength
  Grooved Pegboard                                      Fine motor dexterity
  Luria three-step hand sequence                        Coordinated/integrated sequencing
  Sensory-Perceptual Examination (Reitan–Klove)         Sensory and perceptual skills
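
Raw scores from measures such as those listed in Table 2 are commonly converted to standard scores before interpretation, and Section 4.11.2.5.2 describes t scores scaled to mean = 50 and SD = 10, with lower scores reflecting poorer performance. The following is a minimal sketch of that arithmetic only, assuming a simple linear conversion. The names NormGroup and t_score are hypothetical, and the normative means and SDs are invented placeholders chosen so that a 90-second completion time yields t scores of 38 and 56, echoing the example in the text. They are not the published demographically corrected norms, which are derived by their own procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormGroup:
    """Normative mean and SD of raw scores for one demographic group.

    All values used below are HYPOTHETICAL placeholders, not published norms.
    """
    label: str
    mean: float
    sd: float


def t_score(raw: float, norm: NormGroup, higher_raw_is_worse: bool = False) -> float:
    """Linearly convert a raw score to a T score (mean 50, SD 10).

    On timed tasks a larger raw value (more seconds) is a poorer result,
    so the sign is flipped to keep lower T scores meaning poorer performance.
    """
    if norm.sd <= 0:
        raise ValueError("normative SD must be positive")
    offset = 10.0 * (raw - norm.mean) / norm.sd
    return 50.0 - offset if higher_raw_is_worse else 50.0 + offset


# Hypothetical completion-time norms (seconds) for a timed sequencing task.
younger = NormGroup("younger adults (hypothetical)", mean=66.0, sd=20.0)
older = NormGroup("older adults (hypothetical)", mean=108.0, sd=30.0)

raw_seconds = 90.0
for group in (younger, older):
    t = t_score(raw_seconds, group, higher_raw_is_worse=True)
    print(f"{group.label}: raw = {raw_seconds:.0f} s, T = {t:.0f}")
```

With these placeholder values the same 90-second performance falls 1.2 SD below the younger reference group and 0.6 SD above the older one, which is the kind of shift that a single fixed cutoff score cannot capture.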

Table 3 Information to be collected from the history and behavioral observations.

History of symptoms
  Description of present illness and specific symptoms (e.g., Why is patient being evaluated?)
  Nature of onset (e.g., insidious vs. acute; are symptoms new or a recurrence?)
  Duration
  Course of symptoms
  Associated behavioral or personality/emotional changes
Past medical history
  Previous neurological disease
  Previous psychiatric disease (psychiatric treatment, known diagnoses, etc.)
  Significant head trauma (details: presence and length of LOC/coma; PTA; assess)
  Seizures (circumstances of onset, frequency, characterization of seizures)
  Other medical illnesses requiring hospitalization
  Toxic exposure (may relate to work history)
Birth and developmental history
  Birth complications
  Developmental delays (motor, language, intellectual)
Social history
  Abnormalities in behavior past and present
  History of and current family and peer interactions
Educational history
  Highest grade attained/degrees awarded
  Specific problems learning
  Problems in school (e.g., truancy, being expelled, etc.)
  Typical grades in school (e.g., below average, average, above average)
  Best and worst subjects in school
  Reasons for termination of education (if relevant; also, if not HS, did patient get GED?)
Vocational history
  Present occupational status (if retired, what was last position)
  Length of current job
  Nature of responsibilities
  Types of previous jobs
  Frequency of job changes
  Recent difficulties with work (self-report, work appraisals, raises, etc.)
Substance use history
  Duration, frequency, and amounts of typical alcohol intake
  Use of other drugs
  Use of prescription medications (list current medications)
Family history
  Neurologic disease in family (especially parents, siblings)
  Psychiatric disease in family
  Familial predilection for diseases (e.g., learning disabilities, dementia, epilepsy, etc.)
  Cause and age of death of parents and siblings
Psychiatric symptoms
  Unusual or bizarre behavior
  Attention/concentration problems, distractibility
  Problems with social judgment
  Depression
  Anxiety
  Paranoid ideation/suspiciousness
  Hallucinations
  Delusions
  Tangentiality, looseness of associations
General appearance
  Appropriateness of appearance for age
  Posture (examine for asymmetries)
  Facial expression
  Eye contact
  Personal cleanliness, grooming, dress
Mood
  Range and appropriateness of affect, responsiveness to conversation/interview
  Depression
  Apathy
  Lability
Principal Cognitive Domains for Assessment 315
Table 3 (continued)

  Mania or hypomanic symptoms
Behavioral
  Cooperativeness with examiner
  Anxiety
  Suspiciousness
  Anger
  Insight
  Response to interview
Motor
  General activity level
  Abnormal posturing, movements, tremors, etc.
  Gait (broad-based, shuffling, unsteady, symmetry of movement)
  Handedness (and familial history thereof, particularly if left-handed)
Speech
  Rate
  Volume
  Modulation
  Expression
  Dysarthria

a later time, they fail, thereby producing what subjectively seems like a deficit in memory. In actuality, their problem may not be at the retrieval or recall stage, but rather, at earlier stages of information processing. Similarly, a patient with aphasia may present with word-finding difficulty that might also mimic a memory impairment if the individual were not thoroughly evaluated. Some patients with frontal lobe damage demonstrate inefficient encoding secondary to organizational deficits, when in fact their ability to recall learned material may be normal.

The following section discusses the major cognitive domains typically assessed by the comprehensive neuropsychological evaluation, including an overview of some common representative measures from each domain. Each section begins with a clinical overview of the area to be examined and includes a brief discussion of relevant, less structured or “bedside” clinical examination/observation procedures prior to a presentation of specific psychometric assessment tools.

4.11.4 PRINCIPAL COGNITIVE DOMAINS FOR ASSESSMENT

4.11.4.1 Global Cognitive/Intellectual Functioning

The concept of intelligence has a long history and has been the topic of much discussion and controversy since the early 1900s. Various definitions of intelligence have been posited (Sternberg, 1997), and it is clear that intelligence is a multidimensional construct that is made up of many types of abilities that are demonstrated by each of us to varying degrees. Despite ongoing debate regarding the use of IQ measures, the concept of global intellectual capacity has utility within the context of assessing cognitive functioning and adaptive ability as it relates to brain function.

From a clinical bedside perspective, the clinician must rely upon the patient's educational and vocational history, in addition to their overall presentation style, use of language (when possible), and performance during various cognitive tasks to estimate a gross level of intellectual functioning. Next, one must begin to evaluate whether there seems to be evidence of gross intellectual decline based on their current presentation. Ideally, premorbid IQ and other cognitive test scores would be available for direct comparison, but this is rarely the case. Thus, detailed examination of this issue requires formal psychometric assessment, and based on specific cognitive test results and patterns of test scores, indices of current intellectual functioning as well as estimates of premorbid intelligence can be derived (Franzen, Burgess, & Smith-Seemiller, 1998).

The most commonly used intellectual assessment tools for adults are the Wechsler Intelligence Scales. In addition to the information provided by the various component subtests of the Wechsler Scales, some assessment of an individual's global level of cognitive capacity is important in establishing an interpretive background for other specific test results. For example, knowing that a patient has a Full-Scale IQ (FSIQ) score in the borderline range can be very useful in accurately interpreting their low average to borderline performance on memory tests. Although some variability across cognitive skills is commonly seen within individuals, higher intellectual levels tend to be associated with more advanced skills in various domains.

Thus, the 75-year-old retiree with eight years of formal education who obtains an FSIQ of 120 would be expected to perform at a higher level on memory or attention/concentration measures than the elderly individual with a similar educational background, but an IQ in the low average range.

One of the limitations of intellectual assessment that is worth noting is the use of global IQ scores in depicting an individual's cognitive capabilities. An FSIQ of 100 can be arrived at in various ways, for example, by achieving a very high Verbal IQ and low Performance IQ score, or vice versa. On the WAIS-R, note that each of the following sets of VIQ/PIQ scores results in an FSIQ of 100 for a 30-year-old: (i) VIQ = 121, PIQ = 79; (ii) VIQ = 80, PIQ = 138; (iii) VIQ = 100, PIQ = 101. Also, some subtest scaled scores may be very high and offset lower ones, resulting in an “average” overall IQ score. These examples represent significantly different cognitive processing styles and differential ability patterns, although these distinctions would be lost if there is an over-reliance upon global IQ scores.

From a neuropsychological perspective, intellectual tests such as the WAIS-R and WAIS-III can be quite useful in describing cognitive strengths and weaknesses and eliciting relevant neurobehavioral samples, even though these tests were not originally designed for assessment of cognitive dysfunction. However, many of the changes in the WAIS-R reflected in the WAIS-III, in fact, were influenced by neuropsychological principles, and the WAIS-III includes several new subtests that attempt to provide a more detailed exploration of cognitive abilities beyond “IQ” or “g” alone. Whereas intellectual tests tend to be less sensitive to cerebral dysfunction than many specialized neuropsychological measures of specific abilities, the relative sensitivity of some subtests to cerebral dysfunction has proven useful in detecting gross cognitive decline in some cases. For example, the Vocabulary subtest of the WAIS-R historically has been known as one of the tests that is least sensitive to acquired cerebral dysfunction. This subtest relies upon highly crystallized and overlearned knowledge, and the ability to define familiar words tends to be a function that is relatively less affected by acquired brain damage than many other tasks that require novel learning or reasoning. In contrast, the Digit Symbol subtest of the WAIS-R, which requires rapid psychomotor speed, new learning, and attention, is one of the more sensitive subtests to cerebral dysfunction. Knowledge regarding which subtests are more or less sensitive to brain injury, educational background, age, gender, and so on, is essential in making inferences about cerebral dysfunction (see Heaton, Ryan, Grant, & Matthews, 1996). Pattern analysis of the Wechsler scales can thus be quite useful in the assessment of brain damage, as the subtests are differentially susceptible to the effects of acquired brain injury and can be further impacted depending upon the nature of the disorder and/or location of primary cerebrocortical involvement. These tests are described now in detail.

(i) Wechsler scales. (a) Wechsler Adult Intelligence Scale-Revised. The Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) is the most popular measure of intellectual functioning in adults, and has enjoyed widespread use since its introduction in 1981. A plethora of studies using the WAIS-R or its components has been published in the neuropsychological literature, and a great deal has been learned about its utility as a neuropsychological tool. This is underscored by the development and publication of the WAIS-R as a neuropsychological instrument (WAIS-R-NI; Kaplan, Fein, Morris, & Delis, 1991). This modification of the WAIS-R reflects the application of specific neuropsychological principles and procedures to the administration of the WAIS-R. For example, in individuals with difficulty responding to open-ended questions or those having initiation difficulties, the Information subtest of the WAIS-R-NI includes a provision for multiple choice responding. Some of the timed subtests include extended time limits for those with motoric slowing to permit a more valid assessment of visuoconstructional skills apart from the effects of motor dysfunction.

In settings where an overall estimate of intellectual functioning is desired, various short forms of the WAIS-R have been developed. One very brief example is the combination of Vocabulary and Block Design in the estimation of FSIQ (Brooker & Cyr, 1986; Silverstein, 1982). Such brief forms and lengthier procedures (e.g., Adams, Smigielski, & Jenkins, 1984) have shown reasonable correlations with the complete WAIS-R, and these estimates may be adequate in some situations. For example, in a case of head injury wherein the focus is upon more subtle difficulties with attention/concentration and memory, less of an emphasis may be placed on global intellectual status, particularly when there are no questions of a decline in this area. In selecting the shortened version of any test such as the WAIS-R, clinicians must rely upon experience with a large number of cases and with appropriately supportive literature.

Among the individual subtests that show the strongest correlations with overall FSIQ scores is the Vocabulary subtest (r = 0.81), followed by Information (r = 0.81). At the other end of

the spectrum lies Digit Symbol, which, along with Object Assembly, shows a correlation of 0.57 with FSIQ. In general, the five Performance subtests, which rely more upon novel thinking and “fluid” cognitive skills, tend to be more sensitive to cerebral dysfunction than the more crystallized abilities tapped by the six Verbal subtests. Because the various subtests of the WAIS-R assess a variety of cognitive abilities, some of these will be discussed in more detail under specific cognitive domains.

(b) Wechsler Adult Intelligence Scale-III. The WAIS-III is a newly revised version of the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1997a). It contains a variety of changes and additional subtests, and some previously familiar measures have been dropped or are now optional. The addition of a matrix reasoning task and measures of sustained concentration and working memory promises to provide a more well-rounded evaluation of global cognitive functioning and may enhance the test's utility in the context of the neuropsychological evaluation. Most of the familiar subtests remain, although these have been updated for the 1990s, with improved norms. Many subtests that include scenes for visual material now are presented in color, with larger figures and a more representative depiction of our multicultural society. As a quick review of the test evaluators and consultants will indicate, there is a decidedly more neuropsychological flair to the WAIS-III. As we have learned more about brain–behavior relationships and the effects of acquired cerebral dysfunction on intellectual functioning, test developers felt it important to incorporate aspects of this knowledge into the current test design. Table 4 lists the subtests for the WAIS-III.

Table 4 WAIS-III subtests.

Vocabulary                   Picture Completion
Similarities                 Digit Symbol–Coding
Arithmetic                   Block Design
Digit Span                   Matrix Reasoning
Information                  Picture Arrangement
Comprehension                Symbol Search
Letter–Number Sequencing     Object Assembly

As reported in the WAIS-III technical manual (Wechsler, 1997a), correlations involving VIQ, PIQ, and FSIQ scores between the WAIS-III and WAIS-R are high (0.94, 0.86, and 0.93, respectively). WAIS-III VIQ, PIQ, and FSIQ scores were found to be an average of 1.2, 4.8, and 2.9 points lower than the corresponding values on the WAIS-R. Obviously, given the recent release of this test, many studies of its neuropsychological utility and psychometric relationships to other neuropsychological measures will be forthcoming.

(c) The Wechsler scales and patterns in neurological populations: VIQ–PIQ Differences. Despite the utility of instruments like the Wechsler scales in neuropsychological evaluations, appropriate interpretation of scores and patterns of scores in cases of brain dysfunction requires extensive training in neuropsychology and familiarity with a broad range of neuropathological, neuropsychiatric, and normal conditions. For example, deficits on PIQ subtests are often discussed in association with right hemisphere dysfunction. Whereas this may hold true in many cases, decreased nonverbal test scores may occur for a variety of reasons (e.g., psychomotor slowing on timed tasks, greater sensitivity to acquired brain injury than most of the VIQ tasks). The following cases serve to illustrate this point.

First it should be noted that all three cases depict at least a 20-point difference between VIQ and PIQ scores, although there is obviously some variability across the profiles in terms of specific scaled scores. One initial impression might be that all three cases likely represent some sort of acquired brain injury, as 20-point differences between IQ indices are rare in the general population. A second level of interpretation, based in part on the clinical lore associated with the test, might include an inference about lateralization, that is, all three cases might be suggestive of greater impairment of right hemisphere functions, given the large VIQ–PIQ differences and relative deficits on the “nonverbal” subtests. As alluded to earlier, knowledge that a variety of factors may result in decreased PIQ scores can provide the clinician with information that is essential in neurodiagnostic situations.

Case 1 is an engineer in his 40s who sustained a right hemisphere stroke. CT findings (Figure 3) depict a hemorrhage deep in the right hemisphere that included intraventricular bleeding. His WAIS-R profile of decreased nonverbal abilities and a 22-point VIQ–PIQ discrepancy is what is often associated with a “classic” pattern of right hemisphere dysfunction (see Figure 4). Whereas this holds true for this particular individual, Cases 2 and 3 illustrate that simplistic interpretations or overgeneralizations can lead to erroneous conclusions.

Case 2 is a 50-year-old nurse with a history of stroke four months prior to evaluation. MRI results (Figure 5) revealed a chronic egg-sized hemorrhage involving the left internal capsule and basal ganglia.

Her primary presenting symptoms included mild right-sided hemiparesis and mild difficulties with verbal fluency and word finding. Much of the decrease in PIQ subtest scores (Figure 6)

Figure 3 CT from Case #1 showing right hemisphere infarct.

[Bar chart of WAIS-R scaled scores by subtest: I, Dsp, V, Ar, Co, Si, PC, PA, BD, OA, Dgsy.]

Figure 4 WAIS-R scaled scores from case #1. VIQ = 108, PIQ = 86, FSIQ = 98.

Figure 5 MRI showing left basal ganglia/internal capsule hemorrhage in case #2 (note that left is
depicted on the right).

was felt to be related to the patient's hemiparesis and having to perform some tasks with her nondominant left hand. However, the relative sensitivity of nonverbal vs. verbal WAIS-R subtests to any type of acquired brain damage should be kept in mind, and in fact, additional time allowances on some of the PIQ subtests (e.g., Block Design) still resulted in evidence of cognitive impairment; thus, psychomotor factors alone could not completely explain the patient's difficulties in performing these tasks.

Case 3 results are from a 40-year-old long-term alcoholic who showed evidence of alcohol-related dementia. Axial MRI results (Figure 7) depict generalized sulcal widening and ventricular dilation.

The patient demonstrated generalized neuropsychological deficits, with particular difficulties in visuospatial abilities (see Figure 8). Results such as this are not uncommon in cases of extensive and long-term (i.e., 10+ years) alcohol abuse, and this pattern of findings on the WAIS-R was amply supported by the patient's impaired performances on other visuoconstructional tasks.

(d) WAIS subtest “scatter.” The issue of “scatter” among WAIS-R subtest scaled scores represents another area of clinical lore that merits comment within the context of assessing patients with known or suspected cerebral dysfunction. It has been observed by some that, in cases of brain damage, there tends to be a greater than average range of scores from highest to lowest across subtests. This represents a complex issue, since most individuals with or without brain damage have relative cognitive strengths and weaknesses, and as noted, the subtests differ with respect to their individual sensitivity to acquired cerebral dysfunction.

Finally, what constitutes clinically “significant” scatter across Wechsler subtest scores also

[Bar chart of WAIS-R scaled scores by subtest: I, Dsp, V, Ar, Co, Si, PC, PA, BD, OA, Dgsy.]

Figure 6 WAIS-R scaled scores from case #2. VIQ = 121, PIQ = 101, FSIQ = 112.

Figure 7 MRI results from case #3 showing generalized atrophy.



[Bar chart of WAIS-R scaled scores by subtest: I, Dsp, V, Ar, Co, Si, PC, PA, BD, OA, Dgsy.]

Figure 8 WAIS-R scaled scores from case #3. VIQ = 103, PIQ = 78, FSIQ = 90.
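
The VIQ–PIQ comparison across the three cases can be tallied directly from the summary scores given in the captions of Figures 4, 6, and 8. The following is only an illustrative sketch in Python: the function and variable names are hypothetical, and the 20-point threshold is taken from the remark in the text that 20-point differences between IQ indices are rare in the general population, not from any published scoring rule.

```python
# Summary IQ scores as reported in the captions of Figures 4, 6, and 8.
CASES = {
    "Case 1 (right hemisphere stroke)": {"VIQ": 108, "PIQ": 86, "FSIQ": 98},
    "Case 2 (left basal ganglia/internal capsule hemorrhage)": {"VIQ": 121, "PIQ": 101, "FSIQ": 112},
    "Case 3 (alcohol-related dementia)": {"VIQ": 103, "PIQ": 78, "FSIQ": 90},
}

# Assumption for illustration only: the text describes 20-point
# differences between IQ indices as rare in the general population.
LARGE_DISCREPANCY = 20


def viq_piq_discrepancy(scores):
    """Return VIQ minus PIQ (positive when Verbal IQ is the higher score)."""
    return scores["VIQ"] - scores["PIQ"]


def discrepancies(cases):
    """Map each case label to its VIQ-PIQ discrepancy."""
    return {name: viq_piq_discrepancy(s) for name, s in cases.items()}


if __name__ == "__main__":
    for name, diff in discrepancies(CASES).items():
        flag = "large" if abs(diff) >= LARGE_DISCREPANCY else "unremarkable"
        print(f"{name}: VIQ - PIQ = {diff:+d} ({flag})")
```

All three discrepancies are of similar size even though the lesions differ in side and type, which is the point of the case comparison: the size of the discrepancy by itself does not identify the lateralization or the cause.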

merits careful review. In his APA Presidential address, Matarazzo (1990) provides data that clearly call into question the notion of the relevance of “scatter.” For example, a table is presented (p. 1010) that was derived from previous work (Matarazzo & Prifitera, 1989) which includes the percentage of cases from the WAIS-R standardization sample that demonstrate various levels of scatter (the difference between highest and lowest of the 11 subtest scaled scores). While extreme score differences of greater than 10 points were rare (i.e., seen in less than 5% of the 1880 standardization sample subjects), discrepancies of 10 points were seen in approximately 9% of the sample, while differences of seven points occurred in almost half of the sample, and roughly 86% showed a difference of five points. Thus, it should be kept in mind that “scatter” among WAIS subtest scaled scores is commonly seen in non-neurological populations. With these numbers in mind, the issue of the neurological significance of scatter among Wechsler scaled scores should be considered cautiously.

(ii) Raven's Progressive Matrices. This series of tests includes the Colored (Raven, 1995), Standard (Raven, 1996), and Advanced (Raven, 1994) Progressive Matrices, providing different levels of assessment of nonverbal reasoning. In the Colored Matrices, subjects are shown a stimulus array at the top of a booklet, with one small piece cut away. Below this, six small pieces corresponding to the outline of the missing piece are presented; however, each typically has a different pattern depicted. The task is to select the piece that best completes the stimulus array. The Colored matrices are particularly well suited for children and impaired elderly, and the Advanced matrices are most appropriate for high functioning individuals when an assessment of nonverbal reasoning is desired. IQ equivalent scores can be derived from the matrices, and their nonverbal nature (particularly the Colored version) allows for those with language impairments or limited fluency in English to apprehend the nature of the task quite readily in many cases.

(iii) National Adult Reading Test-Revised (NART-R). This is a measure of estimated premorbid intellectual functioning (Blair & Spreen, 1989) as derived from the original NART (Nelson, 1982; Nelson & O'Connell, 1978). It comprises 60 printed words that are presented to subjects to read aloud. Thus, rather than relying upon any sort of integrative or associative abilities, the NART-R and its various derivations rely only upon sight-word reading. Most of the words included in the NART-R are phonetically irregular, and thus, correct pronunciation generally depends upon prior familiarity with the words. The stimuli are presented in order of increasing difficulty, and range from words like “asterisk” to “ennui.” As with any performance-based cognitive task, the NART-R does show some sensitivity to acquired brain dysfunction; nevertheless, it tends to remain relatively stable, even in the face of progressive dementia (Matt-Maddrey, Cullum,

Weiner, & Filley, 1996). The NART-R does have limitations in terms of its applicability to individuals with very high and very low intellectual status, but correlates well with Verbal and Full-Scale IQ in many cases when there is little or no intellectual decline.

(iv) Dementia Rating Scale (DRS). This is a popular instrument for the assessment of global cognitive functioning in patients with known or suspected dementia (Mattis, 1988). It has been widely used clinically as well as for research purposes, and has proven to be a reliable and valid screening tool that typically requires only 20–30 minutes (up to 45 minutes in some demented individuals). The total number of possible points on the DRS is 144, and Table 5 provides a general description of level of impairment on this measure. It should be noted, obviously, that any such listing provides only a global guideline for interpretation, as results on the DRS do vary according to age, education, and other background factors.

Table 5 Dementia Rating Scale (DRS) and level of dementia.

115–130     Mild
100–114     Moderate
90–99       Moderate to severe
<90         Severe

The DRS is made up of the following subtests: attention, initiation/perseveration, conceptualization, construction, and memory. Because of the screening nature of the test, higher functioning or minimally impaired individuals may achieve normal scores yet demonstrate evidence of impairment on more sensitive tests. For example, the memory component of the DRS assesses general orientation and includes some simplistic recognition memory tasks, while only two tasks focus on the assessment of delayed recall.

Although designed as a tool to screen for the presence or absence of dementia, patterns on the DRS have also been shown to be useful in helping to distinguish some types of dementias, although it should be noted that once cognitive impairment becomes severe, such differentiation on the basis of cognitive test results can be difficult or impossible. Despite its relative brevity, total scores from the DRS are quite useful in documenting cognitive decline over time and have been shown to be predictive of nursing home placement (Smith et al., 1994).

(v) Mini-Mental State Examination. This is a very popular screening measure of global cognitive status. It is particularly useful in cases of dementia and may be used even in severely impaired individuals (Folstein et al., 1975). It is very brief (i.e., 5–10 minutes) and comprises 30 total points. Test items include an assessment of orientation to time and place, attention, language, constructional skill, and three-word repetition and recall. It is recommended that standardized recall items (e.g., apple, table, penny or rose, ball, key) be used, as opposed to the original instructions of “name three objects” for patients to recall. Total MMSE scores correlate best with orientation and recall items, and significant variability can be seen among healthy subjects on three-word recall (Cullum, Smernoff, & Thompson, 1993). Despite the limitations inherent in any brief measure of overall cognition, the MMSE has proven to be a valid and reliable screening tool, with utility in a number of populations and settings. Although the traditional cutoff for impairment is <24/30, the MMSE is sensitive to age and education effects, and appropriate norms should be used (Crum et al., 1993; Tombaugh, McDowell, Krisjansson, & Hubley, 1996).

4.11.4.2 Academic Achievement

These tests sample a range of abilities related to academic or scholastic levels of attainment. A number of omnibus achievement batteries exist and each has its relative strengths and limitations depending upon the purpose of assessment. Within the context of the adult neuropsychological evaluation, various components of different achievement tests are often administered. The more common measures sample a variety of abilities related to academic achievement (e.g., spelling, sight reading, reading comprehension, math) which can be compared with other neuropsychological skills and performances on related tests. Assistance in the diagnosis of learning disabilities can also be provided by these measures to the extent that specific deficiencies are identified.

(i) Wide-Range Achievement Test-3rd Edition (WRAT-3). The WRAT-3, like its predecessors (WRAT, WRAT-R), provides for an assessment of written spelling, written mathematical, and sight-word reading skills (Wilkinson, 1993). It enjoys widespread use across the lifespan (age 5–75) and is relatively brief (i.e., approximately 20–30 minutes).

(ii) Woodcock–Johnson Psychoeducational Battery-Revised (WJ-R). This is one of the most comprehensive batteries when used in its entirety for the individual assessment of global cognitive skill, general information, academic achievement, and interests (Woodcock & Mather, 1989). It has been used with children through adults, ranging in age from two to 90 years. The WJ-R Achievement Supplemental Battery provides a more focused assessment of academic

achievement through the use of subtests that assess reading (both nonsense words and for meaning), math concepts, proof-reading, writing to dictation, and writing/sentence composition. Various cluster scores pertaining to different aspects of achievement and level of competence across domains can be examined in order to establish a profile of abilities.

(iii) Peabody Individual Achievement Test-Revised (PIAT-R). The PIAT-R is used primarily for children, but offers norms from kindergarten through high school (up to age 19). It comprises the following subtests: general information, reading recognition, reading comprehension, mathematics, spelling, and written expression (Markwardt, 1989). Various subtests may be administered in order to screen for specific areas of difficulty, and because it is infrequently assessed by many other tests of cognition, the reading comprehension subtest is perhaps one of the most commonly used subtests of the PIAT-R as part of the adult neuropsychological evaluation.

4.11.4.3 Executive Functioning, Problem-solving, and Reasoning

This group of abilities is referred to by various names, and encompasses a variety of skills and functions. Judgment, reasoning, problem-solving, and abstraction are some of the more common skills within this domain that involve some of the most complex and integrative functions of the human brain. While such abilities are not specifically localized and rely upon the integration of various cortical and subcortical regions and pathways, the frontal lobes tend to play a relatively prominent role in many of these skills.

Formal neuropsychological evaluation represents the most efficient means of assessing these functions, although more informal administration of similarities, proverb interpretation, inhibition, and reasoning and judgment tasks also can be useful. Fluency tasks have also been shown to be useful in assessing executive or anterior cerebral functions: letter or phonemic fluency (“tell me as many words as you can think of that start with ‘R’ within a minute”) to assess left anterior hemispheric functions, and design fluency for anterior right hemispheric function (e.g., having patients draw as many unique nonreplicated designs as possible within a minute). Such tasks may be modified to further tax executive systems, for example, by alternating between different procedures (“name an animal, then a city”), although normative data on such procedures are lacking.

Judgment or reasoning tasks include questions such as “What would you do if you were stranded at the airport with only one dollar?” or “Why should people visit the doctor when they are not ill?” or “The tiger was eaten by the lion. Who did the eating?” As with some other tests of higher cognitive function, proverb interpretation involves a variety of skills that cannot be readily distilled into a specific set of abilities. Administration of familiar (e.g., “A rolling stone gathers no moss”) as well as unfamiliar proverbs (e.g., “The used key is always brightest”) is recommended, however, since an individual's ability to interpret novel material may rely upon different skills compared to their ability to retrieve the common response to a more well-rehearsed task. Information about patients' planning abilities is also useful, whether assessed by a formal cognitive task or simply by inquiring about their plans for the future given their current situation. Malloy and Richardson (1998) outline a number of bedside and formalized procedures for the assessment of frontal/executive functioning. In evaluating results from various tests of executive abilities or frontal lobe function, it must be kept firmly in mind that such measures are not uniformly or invariably affected by damage to frontal systems (Bigler, 1988; Stuss & Benson, 1984), and that normal test results can be seen in some patients with clear evidence of frontal damage, just as impairment on the same tests can occur in patients with pathology outside the frontal lobes.

4.11.4.3.1 Tests of reasoning and problem-solving

(i) Wisconsin Card Sorting Test (WCST)

This is a popular measure of nonverbal reasoning and cognitive flexibility (Heaton, 1981; Heaton, Chelune, Talley, Kay, & Curtis, 1993). It requires subjects to sort a series of colored geometric designs into piles according to specific principles. Feedback is given as to whether each individual response is correct or incorrect, although subjects are not instructed as to what strategies they should be using. The WCST is sometimes touted as a measure of “frontal” functioning. Whereas the test has been shown to be relatively more sensitive to anterior brain damage in some studies (Heaton, 1981), it is important to keep in mind that the WCST is not exclusively a measure of frontal functioning, and can be sensitive to damage in other areas of the brain. Thus, impairment on the WCST in and of itself cannot necessarily be interpreted to suggest disruption of frontal brain systems. Nevertheless, patients who are disinhibited and have difficulty monitoring their responses will tend to be perseverative in their responses to this

task. Furthermore, given the appropriate supporting pattern of anterior dysfunction based on behavior and results from other measures, results and observations from the WCST may assist in determining whether or not there is evidence of frontal brain system involvement. The most sensitive score from the WCST has proven to be the number of perseverative responses (Heaton, 1981), as many brain-injured individuals will tend to repeat the same strategy, despite corrective feedback. The ability to shift response strategies and perform such trial-and-error learning may also be a good predictor of various aspects of everyday functioning.

(ii) Category Test

The Category Test (Reitan & Wolfson, 1993) is one of the more sensitive measures of cerebral dysfunction, probably because of its complexity. This task requires subjects to generate and implement various problem-solving strategies across its 208 items. It comprises seven subtests, most of which require different principles to arrive at correct solutions. The subtests are generally arranged in increasing levels of difficulty, although two of the subtests actually require the same solution strategy, and the final subtest relies more upon memory, insofar as subjects must recall which response was correct when they saw that item in a previous subtest. The Category Test tends to be sensitive to dysfunction regardless of primary site of damage, although executive function impairment tends to have a particularly disruptive effect on results. Several versions of the Category Test are available (see Spreen & Strauss, 1998), including the standard machine version, a booklet version, a pamphlet version, and several computerized adaptations of the original test. Various short forms and child versions are also available.

(iii) Delis–Kaplan Executive Function Scale (DKEFS)

This is a battery of executive function tasks based on the Boston process approach that is being published by The Psychological Corporation (Delis, Kaplan, & Kramer, in press). Among other measures, it includes the California Sorting Test (CST), a measure designed to delineate more qualitative and multidimensional aspects of problem-solving, concept generation, and mental flexibility than other sorting tasks. The test consists of a series of cards that differ in shape and color and have different words printed on them. The cards can be sorted according to any of eight different principles. Initially, subjects are asked to sort the cards into two equal groups, without any further instruction. This procedure is repeated until no new groupings are performed. After each sort, subjects are asked to indicate what rule or principle they used. Subsequent trials include stating the sorting principle used by the examiner, and sorting in different ways according to cues. The test provides for an assessment of verbal and nonverbal concept formation and perseveration and allows for information to be derived regarding which component process may be principally affected. Preliminary studies of the CST have been promising in terms of the test's clinical utility (e.g., Beatty, 1993; Delis, Squire, Bihrle, & Massman, 1992; Greve, Farrwell, Besson, & Crouch, 1995). Also, because of the flexibility of the procedure, it is appropriate for a variety of ages, and normative data on individuals aged 8 to 89 will be available.

4.11.4.4 Arousal and Orientation

Evaluation of these basic functions is a necessary prelude to the examination of higher-level cognitive skills. These abilities represent some of the basic building blocks upon which higher abilities rely. For example, memory cannot usually be adequately assessed in a patient who is unable to attend to stimuli. Arousal refers to the basic level of consciousness as a function of the reticular activating system. Arousal is often described as ranging from alert through drowsy, lethargic, and stuporous to comatose. Alternatively, hyperarousal can also be seen in terms of restlessness, agitation, and delirium. The delirious patient is confused and transiently disoriented. Patients may seem to come “in and out of it” from day to day or hour to hour, and mental status examination results may fluctuate accordingly. Assessing level of arousal is a clinical judgment, and often, if there are difficulties in this area, a behavioral description of what it takes to arouse a patient can be most important (particularly since a more detailed examination of cognitive abilities may not be possible).

Orientation is typically thought of in terms of the three general spheres of time, place, and person. Orientation to time is the most readily disrupted of these and can be assessed by asking patients questions about the current time, day, year, month, and date. Assessment of place is obvious (i.e., “Where are we now?”), although questioning about situational variables (“Why are you here?”) is also useful along these lines. Orientation to person is the most resistant to disrupted cerebral functioning relative to time and place, and refers to a patient's ability to
Principal Cognitive Domains for Assessment 325

provide personal identifying information as well as being able to identify others who are familiar to him or her.

4.11.4.5 Attention/Concentration

The concept of attention is complex (e.g., see Mirsky, Anthony, Duncan, Ahearn, & Kellam, 1991). As some of the more basic cognitive abilities, attention and concentration are required for adequate performance on essentially all other cognitive tasks. Thus, it should be kept in mind that if a severe attentional impairment is present, associated difficulties should be manifest on tests of other cognitive abilities as well. General or undirected attention is related to overall arousal level and one's awareness of the environment. The additional division of attention into immediate vs. sustained attention also has clinical merit, as some patients can perform simple tasks of immediate attention, yet may have difficulty when asked to maintain their concentration longer. Measures of sustained attention, often referred to as “concentration” tasks, include measures designed to focus an individual's attentional skills for a more extended period of time, for example, over the course of a minute or longer. Informal assessment procedures may include having the examiner say a long series of random letters or numbers and having the patient indicate whenever they hear a designated target number or letter (e.g., the “A” test of attention; Strub & Black, 1988, p. 58).

Although it reflects a “global” brain activity, attention has some focal and lateralized correlates. Subcortical systems such as the reticular activating system in the core of the brainstem and midbrain, as well as diencephalic structures (e.g., thalamus), play a central role in the brain's ability to maintain arousal and focus attention. In terms of laterality issues in the visual domain, the left hemisphere attends to right hemispace and the right hemisphere attends to left hemispace. Hemi-attentional disorders, less severe than hemispatial neglect, may require double simultaneous stimulation to elicit, although evidence of hemi-inattention can also be detected in many cases by using line bisection tasks, number or symbol cancellation tasks, or even simple drawings to command or copy. As an extreme example, the patient with hemi-neglect may ignore one-half of their visual world, denying that their arm or leg belongs to them upon questioning, or eating only the food on one side of their plate. Hemi-inattention and neglect phenomena most often occur on the left side of hemispace, and the right hemisphere has been implicated as perhaps playing a predominant role in some aspects of attention. One simple test of neglect involves line bisection, wherein the examiner draws a series of horizontal lines of different lengths randomly on a page. The task for the patient is to bisect each of these with a small mark. Patients with neglect will show a tendency to skew their marks to one side (i.e., away from the neglected side) of each stimulus. A wide array of standardized clinical and experimental assessment techniques is available to measure aspects of attention and concentration, and only a few of these will be discussed.

4.11.4.5.1 Assessment of attention

(i) Digit Span (WAIS-III, WMS-III, WAIS-R, WMS-R)

The familiar Digit Span task requires subjects to repeat series of digits of increasing length. Digit span forward is a good measure of simple attention, and most healthy individuals perform within the seven plus or minus two span-of-apprehension range. While some consider the digit span test to be representative of a memory task, the demands on memory per se are minimal, and digit span forward is best considered as a measure of attention. To illustrate this principle, amnestic patients with Alzheimer's disease and Korsakoff's syndrome often demonstrate a normal digit span forward despite severe anterograde amnesia. Digit Span backwards represents a qualitatively different type of task that relies more upon working memory skills and should be considered separately from digits forward (Reynolds, 1997).

(ii) Digit Symbol (WAIS-III, WAIS-R)

This is a complex measure of attention, working memory, visual scanning, and psychomotor speed. Subjects are required to learn to associate numbers with symbols and must fill in the symbol that is paired with each of nine numbers under time constraints (90 seconds). As noted, it tends to be the most sensitive of the WAIS-R subtests to brain dysfunction, and is often used separately from the Wechsler scales.

(iii) Symbol Digit Modalities Test

This is very similar to the Wechsler Digit Symbol test, in that it requires the written substitution of numbers and symbols. In this test, however, subjects must fill in the specified number that goes with the indicated symbol (Smith, 1982). An oral version is also available wherein the patient responds verbally with the number associated with each symbol they come to. Ninety seconds are allowed in this test as well, and it has been used in a large number of

neurobehavioral studies since its inception. In some research settings to assess medication effects, subjects complete the test repeatedly prior to the beginning of the study in order to decrease subsequent practice effects.

(iv) Digit Vigilance Test

This represents a popular and well-normed (Heaton, Grant, & Matthews, 1991) version of the traditional number cancellation task. In this test, subjects must cross out all of a specified number they come to on pages that are filled with rows and columns of single digits (Lewis & Kupke, 1977; Lewis & Rennick, 1979). The task requires visual scanning, psychomotor speed, and sustained concentration, insofar as it typically takes several minutes to complete the two-page version of this task. Additional information regarding lateralized inattention can also be derived from this and similar tasks.

(v) Visual Search and Attention Test (VSAT)

This provides similar information to Digit Vigilance regarding sustained visual concentration, but includes verbal and nonverbal stimuli (Trenerry, Crosson, DeBose, & Leber, 1990). The VSAT consists of four tasks (two for practice) that require subjects to cross out target items from 10 rows of 40 stimuli (letters or numbers) on each version. Scoring for left- vs. right-sided errors in addition to an overall attention index is included.

(vi) Bells test

This is a symbol cancellation task similar to Digit Vigilance, although the target in this case is a small picture of a bell that is repeated within a quasirandom array of 315 small visual figures on a page (Gauthier, Dehaut, & Joanette, 1989). One of the advantages of this task can be that patients who utilize more systematized visual search strategies (e.g., scanning left to right and from row to row) on more symmetrically organized tasks such as Digit Vigilance may show impairment more readily on the more randomly arranged bells.

(vii) Continuous Performance Test (CPT)

Several versions of this computerized test of sustained concentration have been developed, and variations on the basic task have been used in research and clinical settings since the 1950s. In one common format, the CPT presents several series of single visual stimuli (usually letters) projected at brief intervals. Subjects must respond when a letter is preceded by a designated target letter. In Conners' CPT (Conners, 1995), subjects are asked to press the appropriate key for any letter presented, except for the letter S. Multiple trials of different interstimulus intervals are presented, and scoring includes reaction times, number correct, and errors of omission and commission.

4.11.4.6 Language

Language is a central part of the human experience, and a careful evaluation of the core components of language function is an important part of the neurobehavioral examination. Because of its complexity and multifaceted nature, an assessment of “language” requires multiple tasks. Note that some components of language (e.g., basic comprehension and expression) are amenable to observation during even casual interactions with patients, as well as during most of the traditional mental status examination. Recall that the left hemisphere is dominant for language functions in most people (particularly in right-handers, wherein over 95% are left-hemisphere dominant for language), although a minority of individuals (particularly left-handers) have bilateral speech representation or may even have their primary language center in the right hemisphere.

In screening for language disturbance, several primary areas should be assessed: verbal expression, naming, repetition, comprehension (oral and written), reading, and writing. Depending upon the function disrupted, performances on tasks involving these skills may provide useful information in terms of localization of dysfunction. Anterior dominant hemisphere systems are involved in expressive language (e.g., Broca's area), while more posterior regions are involved in receptive language (e.g., Wernicke's area). Aphasic disorders can occur as a result of a disruption of either of these areas, interconnecting cortical and subcortical pathways, or in cortical association areas nearby. The major aphasic syndromes are listed in Table 6.

Several comprehensive batteries for assessing aphasia have been developed, including the Boston Diagnostic Aphasia Examination (BDAE; Goodglass & Kaplan, 1987), the Multilingual Aphasia Examination (MAE; Benton & Hamsher, 1989; Benton, Hamsher, Rey, & Sivan, 1994), and the Western Aphasia Battery (Kertesz, 1982). Routine neuropsychological evaluations vary widely in terms of the depth with which language is examined. At a basic level, to begin to assess language function, the following areas and procedures should be considered.

Table 6 Common aphasic syndromes and abilities affected.

Aphasia type            Fluency   Naming   Repetition   Comp.   Rdg. comp.
Broca's                 Poor      Poor     Poor         Good    Good
Wernicke's              Good      Poor     Poor         Poor    Poor
Conduction              Good      Poor     Poor         Good    Good
Global                  Poor      Poor     Poor         Poor    Poor
Transcortical motor     Poor      Poor     Good         Good    Good
Transcortical sensory   Fluent    Poor     Good         Poor    Poor
Mixed transcortical     Poor      Poor     Good         Poor    Poor
Anomic                  Good      Poor     Good         Good    Poor

Comp. = comprehension; Rdg. comp. = reading comprehension.
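
As an illustrative aside, the pattern logic of Table 6 can be written as a small lookup. The Python sketch below is only a minimal illustration and not a diagnostic tool. It assumes that each ability is coded as a binary Good/Poor judgment (with "Fluent" read as Good), which simplifies the graded impairments seen clinically, and the names TABLE_6 and matching_syndromes are hypothetical.

```python
from typing import Dict, List, Tuple

# Each profile follows the column order of Table 6:
# (fluency, naming, repetition, comprehension, reading comprehension).
# True = "Good" (or "Fluent"), False = "Poor". Values are transcribed from
# Table 6; the binary coding itself is an assumption made for illustration.
Profile = Tuple[bool, bool, bool, bool, bool]

TABLE_6: Dict[str, Profile] = {
    "Broca's": (False, False, False, True, True),
    "Wernicke's": (True, False, False, False, False),
    "Conduction": (True, False, False, True, True),
    "Global": (False, False, False, False, False),
    "Transcortical motor": (False, False, True, True, True),
    "Transcortical sensory": (True, False, True, False, False),
    "Mixed transcortical": (False, False, True, False, False),
    "Anomic": (True, False, True, True, False),
}


def matching_syndromes(
    fluency: bool,
    naming: bool,
    repetition: bool,
    comprehension: bool,
    reading_comprehension: bool,
) -> List[str]:
    """Return the Table 6 rows whose pattern equals the observed profile."""
    observed = (fluency, naming, repetition, comprehension, reading_comprehension)
    return [name for name, profile in TABLE_6.items() if profile == observed]


if __name__ == "__main__":
    # Poor fluency, naming, and repetition with preserved comprehension.
    print(matching_syndromes(False, False, False, True, True))
```

A profile that matches no row returns an empty list. In keeping with the cautions in the text, such a result (or any single match) would be a prompt for more detailed language assessment rather than a conclusion in itself.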

(i) Expressive language/fluency. Spontaneous speech can be examined during clinical interview and history-taking. This includes an assessment of rate of speech, prosody, grammatical complexity, and use of vocabulary.

(ii) Confrontation naming refers to the ability to name objects or representations of objects. This can be assessed in a rudimentary way by having the patient name things in the room or things you are wearing or holding. A much more sensitive means of assessing naming ability is to show the patient pictures of objects and request that they say what each one is. Clinical experience suggests that the further the representation is from real life (e.g., a line drawing vs. a photo or real object), the more sensitive the task will be to dysnomia. An important note is that some degree of dysnomia is seen in most types of aphasia.

(iii) Repetition is examined by having the patient repeat single words (e.g., “paper,” “ball,” “comb”) and phrases of increasing length and complexity (e.g., “The boys played baseball,” “The vat leaks,” “He is the one who did it”). The specific errors made (e.g., omissions, paraphasias) should be carefully noted, as these may merit further exploration using more standardized tasks.

(iv) Comprehension is also assessed in interview and during the performance of tasks upon request by the examiner (i.e., does the patient seem to understand what is being asked, or do tasks require additional explanation?). Examples of comprehension tasks include: “Point to the ceiling, then to the floor”; “Do two pounds of flour weigh more than one?”; “Place the quarter to the right of the penny and pick up the dime.”

(v) Reading is examined readily by having the patient read single words, something they have written, and/or sentences written by the examiner. Reading comprehension can also be tested by having the patient read a passage from a newspaper or magazine and then asking them questions about what they read (a number of standard brief tasks are available).

(vi) Writing is tested by having the patient write their name (a highly overlearned task), specific words or a sentence to dictation, or perhaps a sentence or paragraph on a topic of interest (which also allows for an assessment of sentence structure, punctuation, grammar use, etc.).

The assessment of general expressive and receptive language skills encompasses a large array of tests and procedures, and is well beyond the scope of this section. Nevertheless, various language skills should be sampled in the comprehensive neuropsychological evaluation, at least at a cursory level. This may entail the administration of specific tests (e.g., the Token Test to evaluate comprehension), or may reflect more of a basic examination of an individual's ability to repeat, speak, write, and comprehend. Various types of linguistic errors can be seen during speech and these various tasks and should be noted when they occur. Paraphasias can be semantic or verbal (i.e., the substitution of an incorrect word), phonemic or literal (i.e., the incorrect substitution of phonemes within a word or a similar sounding word), or neologistic (i.e., made-up, nonsense words).

4.11.4.6.1 Global language assessment

(i) The Aphasia Screening Test

This test is a popular and extremely brief measure that employs single- or several-item sampling of a variety of language skills (e.g., reading, writing, naming, comprehension) (Reitan & Wolfson, 1986). For tests such as this, it must be kept in mind that a single error on a particular item, while potentially significant, may not reflect an underlying deficit. Instead, this may reflect the brief nature of the task and its sensitivity to response variability during a neuropsychological examination. Hence, individual errors on such tasks should be followed up with additional tests, lest an overinterpretation of results occur.

(ii) The Wechsler Vocabulary Subtest

This is a subtest of the Wechsler scales (WAIS-III, WAIS-R) and provides a good

assessment of vocabulary and basic word knowledge. As previously noted, the Vocabulary subtest tends to be relatively insensitive to acquired cerebral damage in most cases, and may provide an index from the Wechsler scales regarding an individual's likely level of premorbid functioning. Unlike the NART, however, the Vocabulary subtest, with its reliance upon subject-generated word definitions, tends to be somewhat more susceptible to the effects of brain injury, and may also be impacted by conditions that affect initiation of responding. The WAIS-R-NI version includes provisions for multiple-choice responding for those patients who may have difficulty with verbal expression. The high correlations of WAIS-R Vocabulary with FSIQ (0.81; 0.85 with VIQ) and of WAIS-III Vocabulary with FSIQ (0.80; 0.83 with VIQ) also make the Vocabulary subtest appealing when a brief estimate of current verbal intellectual functioning is desired.

(iii) Boston Naming Test (BNT)

This is a popular quantitative measure of confrontation naming ability (Kaplan, Goodglass, & Weintraub, 1983). Because of the prominence of dysnomia in the various aphasic syndromes, the assessment of word-finding has become a component of many neuropsychological test batteries. The standard BNT consists of 60 line drawings of objects of decreasing familiarity (e.g., ranging from “comb” to “abacus”). Several short forms of the BNT are also available (Franzen, Haut, Ranking, & Keefover, 1995; Lansing, Ivnik, Cullum, & Randolph, in press) for use when a briefer assessment of naming is desired. As noted, clinical experience indicates that the ability to name drawings of items is a much more sensitive means of assessing dysnomia than having patients name articles that are physically present in the room.

(iv) Verbal fluency (letter and category)

Clinical tests of verbal fluency come in two primary varieties: letter or phonemic fluency and category or semantic fluency (Benton & Hamsher, 1967; Borkowski, Benton, & Spreen, 1967). For letter fluency, subjects are asked to generate as many words as possible in 60 seconds that begin with a specified letter. Norms and alternate forms are available (Kozora & Cullum, 1994; Spreen & Strauss, 1998), although the most common sets of letters in use today include F, A, S, and C, F, L, with P, R, W serving as an alternate form. Category fluency is most commonly assessed by having subjects generate as many animals as they can think of in 60 seconds, with other variations of this task including fruits, vegetables, first names of people, supermarket items, cities, and US states.

Performance on letter (phonemic) and category (semantic) fluency tasks has been analyzed in a number of populations, and it has been shown that the abilities involved in each task are dissociable. For example, in patients with Alzheimer's disease, it has been commonly reported that patients show a greater difficulty on category fluency, and often do somewhat better on letter fluency. This is thought to be related to the breakdown in semantic knowledge as seen in patients with Alzheimer's disease, unlike patients with subcortical dementias (e.g., see Chan et al., 1995). Along these lines, Moscovitch (1994) provides evidence to suggest that letter fluency is more of a frontally mediated task, while category fluency relies more upon temporal lobe structures.

4.11.4.7 Visuospatial Skills

This cognitive domain refers to the ability to process visual or nonverbal information. In the most common visuospatial task, patients are asked to draw a specific shape (e.g., intersecting pentagons). More difficult tasks include drawings of three-dimensional figures (cube, house). Drawings to command can be compared with patients' copies of drawings to see if the presence of a model facilitates performance. Three-dimensional constructional tasks also exist, such as replicating designs using blocks. These and related tasks require various levels of visuoperceptual and visual-integrative skills, in addition to a motor output (e.g., graphomotor or manipulospatial).

Various standard tasks and stimuli are available for clinical use. Prominent visuospatial errors can be readily detected on such tasks, most often in association with right hemisphere damage (primarily parietal dysfunction). Lateralized errors in drawings, embellishment or paucity of creations, perseveration, rotations, intrusions, and loss of gestalt or detail can all be observed and may provide useful diagnostic and descriptive information. For example, patients with right hemisphere damage will tend to have more difficulty maintaining the overall gestalt of designs, whereas left hemisphere dysfunction more often results in problems recreating details of figures. Because many of these tasks require visual perception and a motor response, the clinician must use caution in interpreting visuoconstructional impairments when primary visual, perceptual, or motor disturbances are present.

4.11.4.7.1 Visuospatial tasks

A variety of drawings are commonly used in neuropsychological settings to assess graphomotor visuoconstructional ability. Some drawings, such as clocks, crosses, and three-dimensional cubes, have gained widespread popularity. A dissociation can be seen in some patients between their ability to spontaneously draw a specified design or item to command and their ability to reproduce it from a model presented to them. Obviously, the production of any drawing involves a variety of skills, and an interpretation of impaired visuoconstructional ability should only be made once a thorough assessment of the various component processes has been undertaken. For example, patients with basic motor or movement disorders will often produce impaired drawings, which should not be interpreted as a visuoconstructional deficit if their level of motor impairment can just as easily explain their poor productions.

The term “constructional dyspraxia” is often used to describe impaired visuoconstructional abilities. Unfortunately, the term is often misused and overused in clinical practice. Technically, the term refers to an impairment in the ability to construct visuospatial figures or drawings, whether two- or three-dimensional. Thus, a patient who demonstrates an impaired copy of a key from the aphasia screening test should not be considered to have constructional dyspraxia if they show normal performances on other visuoconstructional tasks such as WAIS-R Block Design, for example. Furthermore, it is important to obtain more than one drawing sample from a patient, as the degree of complexity of one stimulus may fail to elicit an impairment, whereas another one of greater complexity or less familiarity may reveal evidence of difficulty in this domain. The clinician's familiarity with developmental and general norms on a given drawing procedure is critical. This is particularly important on drawing tasks that do not lend themselves to highly detailed scoring procedures. Even when such scoring procedures are available, clinical interpretation based on a careful review of the component processes and the actual drawings produced can be critical. For example, the Rey–Osterrieth complex figure has seen the development of a variety of scoring systems over the years (see Lezak, 1995; Spreen & Strauss, 1998). Although more recent attempts such as the Boston qualitative scoring system (Stern et al., 1994) have attempted to capture more of the essential details and qualitative aspects of the design, quantitative scoring alone may prove inadequate in the absence of careful inspection by a seasoned clinician.

(i) Clock drawing test

This is a staple for many clinicians across disciplines involved in mental status examinations and neurobehavioral assessment, and actually can provide a wealth of information regarding a variety of cognitive skills. First, the reproduction of the face of a clock from memory requires some familiarity with such a stimulus. In some cultures, this represents a highly overlearned icon, and thus should be reproduced even by children without significant difficulty, provided they are able to tell time. Typical instructions (Goodglass & Kaplan, 1987) for the clock drawing test are to “Draw the face of a clock showing the numbers and the hands set to ten after eleven.” The additional instruction to include all of the numbers on the clock can be helpful in some cases, particularly since some clocks and watches do not include a complete set of alphanumeric symbols. The instruction to set the hands at a specified time is also critical, and even when gross visuoconstructional abilities and the overall gestalt of the clock may be correct, time-setting errors may reflect problems with executive functions and/or memory. It is also important that the instructions include setting the hands to a time that is not too readily obvious or overlearned, so as to reduce the likelihood of an individual getting the time correct by chance. Various scoring procedures for clock drawings have been established (e.g., see Freedman, Leach, & Kaplan, 1994; Kozora & Cullum, 1994; Royall, Cordes, & Polk, 1998), although careful visual inspection and qualitative analysis, in addition to the application of a particular preferred scoring system, is important from a clinical interpretive perspective.

Along these lines, some of the more common errors that can be seen on the clock drawing test include confusion, gross distortion, and constructional dyspraxia (Figure 9(a)), aspects of unilateral neglect (Figure 9(b)), stimulus-boundedness and perseveration (Figure 9(c)), and errors in hand-setting (Figure 9(d)). Figure 9(d) was drawn by a patient with mild Alzheimer's disease and demonstrates a common conceptual stimulus-bound response by their setting of the hands at the 10 and 11 in order to depict “ten after eleven.” Comparison of clock drawings done to command and under copy conditions has also demonstrated utility in some differential diagnostic situations (e.g., Libon, Malamut, Swenson, Sands, & Cloud, 1996).

(ii) Aphasia Screening Test

Although designed primarily as a brief screening measure of gross language disturbance, the aphasia screening test contains several drawings,


Figure 9 Clock drawings from patients showing various error types to command (9a, 9b) and copy (9c, 9d).

including a square, triangle, Greek cross, and a skeleton key (Reitan, 1984). Traditional qualitative scoring procedures for these drawings exist, although as with other similar tasks, visual inspection by a trained clinician is extremely important, and one needs to be familiar with the wide range of variability seen in the normal population as well. The copy of the Greek cross or similar crosses is a very common graphomotor constructional task, and despite its relative simplicity, nevertheless does require a variety of skills, including visuoperception, planning, graphomotor control, and sequencing.

(iii) Block Design

The Block Design subtest from the WAIS-III or WAIS-R provides one of the most common means of assessing visuoconstructional ability. This familiar task requires the replication of red and white designs using three-dimensional colored blocks. As with other visuoconstructional tasks, careful observation of the patient's performance during the procedure can yield very important information, as can the qualitative nature of the designs that are produced. This serves as some of the rationale for including spaces for noting the designs produced by patients in the WAIS-III summary form, as the same score can be achieved for a variety of reasons. To illustrate, Figure 10 depicts the designs from two brain-injured patients. Technically speaking, both productions would receive a WAIS-III score of zero, yet they obviously reflect very different underlying aspects of dysfunction. As has been discussed by Kaplan and co-workers (Kaplan, 1990; Milberg, Hebben, & Kaplan, 1996), many patients with right hemisphere dysfunction tend to break the overall square configuration of the design, as they have difficulties in processing the gestalt of figures. In contrast, patients with damage to the left hemisphere tend to more frequently make internal rotation errors. In Figure 10, the design on the left was, in fact, produced by a patient with a large subcortical infarct in the left hemisphere, and the design on the right was produced by a patient with a large middle cerebral artery stroke.

(iv) Rey–Osterrieth complex figure

Although the Rey–Osterrieth figure is perhaps most well known as a measure of nonverbal memory, the copy of the figure provides an excellent sampling of visuoconstructional skills (Corwin & Bylsma, 1993; Osterrieth, 1944; Rey, 1941). In addition to providing data pertaining to an individual's organizational approach to the task (assuming the sequential replication of

Figure 10 Block Design reproductions (top = model) of a patient with left hemisphere damage (left) and right
hemisphere damage (right) showing characteristic internal detail and configural errors, respectively. Note that
technically, both reproductions would receive a score of zero, although they reflect very different types of errors.
332 Neuropsychological Assessment of Adults

the figure is somehow carefully tracked and recorded), visuoperceptual as well as visuospatial and graphomotor constructional skills are necessary components of this task. Information pertaining to lateralized dysfunction can also be obtained in some cases (Rapport, Dutra, Webster, Charter, & Morrill, 1995). This requires not only careful attention to the gestalt vs. internal details of the figure, but also to left–right and quadrant by quadrant analysis, in addition to an assessment of item and figural rotations and perseverations. Gross omission or distortion of part of the figure might also implicate lateralized dysfunction. For example, Figure 11(a) depicts the Rey–Osterrieth copy produced by a patient following the rupture of a right middle cerebral artery aneurysm. The Rey–Osterrieth stimulus figure is presented in Figure 11(b).

4.11.4.8 Memory

Memory is one of the most common clinical complaints of patients referred for neuropsychological evaluation. This represents a complex set of abilities that are all too often subsumed under a generic and unitary rubric of “memory.” While multiple memory systems and skills have been identified and discussed over the years (Schacter & Tulving, 1994), some of the more useful concepts from a clinical standpoint include immediate, recent, and remote memory. Because of the multifactorial nature of memory and the use of various labels and terms to describe this set of abilities, some basic definitions of memory concepts are in order before proceeding to a discussion of common clinical memory assessment tools.

Immediate or working memory refers to a very brief memory store where information is retained for a matter of seconds before it decays, in the absence of rehearsal. A prime example is going to the phone book to look up a new number, walking across the room to the phone, and dialing the number. Unless the number is of particular relevance or is rehearsed, it will probably be forgotten immediately or shortly after the call. Evidence suggests a particularly prominent role of the prefrontal cortex in working memory.

Recent memory is the ability to encode new information and remember things learned in the recent past. Temporal lobe structures (particularly the hippocampal formation) are particularly important in this stage of information acquisition. Examples include remembering a few words or an address presented several minutes ago, or remembering what you had for breakfast. This type of memory is what is routinely assessed clinically and is what allows us to learn new information each day.

Remote memory pertains to the ability to recall information from the distant past, for example, from many years ago. Remote memory involves the recollection of previously stored data (over the course of months or years) that has been rehearsed or retrieved multiple times and in various contexts (e.g., where you went to high school, aspects of your job, stories you have learned, etc.).

In most cases of acquired cerebral dysfunction, recent memory is the most vulnerable to damage (i.e., anterograde amnesia), while remote memory tends to remain relatively more intact. However, retrograde amnesia does occur, and often extends back into time (sometimes decades, as in Korsakoff's syndrome) in a temporal gradient fashion, with a greater disruption of more recent memories. The clinical assessment of memory involves multiple steps and procedures. At a basic level, obtaining a history from a patient involves their remote memory abilities. Questions pertaining to familiar information or historical events can also be useful (e.g., as in the WAIS-III Information subtest).

Recent memory is best informally assessed by presenting the patient with a few words, a name and address, or a paragraph to recall. Here it is essential to stress that delayed recall of such information must be examined. For example, amnestic patients can often repeat back even a brief story immediately after it is presented, and it is only after additional time has elapsed (e.g., 10–15 minutes) that their full amnestic deficit may be manifest. The use of multiple memory procedures is also recommended, since significant variability even among healthy normal subjects can be seen on single brief tests of memory. Furthermore, the use of standard or favorite memory stimuli is also encouraged (i.e., rather than inventing new stimuli for every patient), since the specific content of items can have a profound effect on their ability to be recalled (e.g., “rose, ball, key” or “apple, table, penny” are easier for most people to remember than “brown, honesty, tulip,” and these sets are no doubt easier to recall than “abstemious, benign, pusillanimous”). Regardless of what brief memory tasks are employed at the bedside, it must be kept in mind that more lengthy psychometric procedures may be required to elicit deficits in many cases.

4.11.4.8.1 Clinical assessment of learning and memory

Memory can fail for a wide variety of reasons. Problems with attention/concentration which result in an inefficiency in the ability to attend to and learn new information may be perceived as
Principal Cognitive Domains for Assessment 333
Figure 11 Rey–Osterrieth copy by a patient with right hemisphere damage (11a). Note the distortion of the left
side of the figure in particular. The original Rey–Osterrieth figure is (11b).

a problem with memory, for example. If information is not efficiently encoded, it obviously cannot be stored for later recall. Subjectively, this can be perceived as a problem with remembering. Other component abilities that can interfere with efficient memory functioning include language disturbance and problems in visuospatial processing, depending upon the nature of the material that is to be remembered.

Some of the different learning and memory types that have been proposed include episodic, semantic, procedural, implicit, explicit, and working memory. A discussion of these is beyond the purpose of this chapter, although detailed elaborations can be found in sources such as Schacter and Tulving (1994) and Squire (1987). At a basic level, most of the clinical assessment approaches to memory involve an examination of episodic memory, insofar as patients are required to recall material that was presented to them earlier. Typically this material is presented in an explicit memory paradigm wherein patients are instructed to learn and remember information, although in some procedures, implicit recall is required. One example of the latter would be the typical administration of the Rey–Osterrieth complex figure, wherein subjects are asked to copy the complex design, and once that is done, they are asked to recall it from memory, without ever having been instructed to “remember” the figure.

(i) Verbal and nonverbal memory

Verbal learning and memory tasks (e.g., word lists, verbal paired associates, and story recall) have been shown to be strongly associated with left temporal lobe functioning (Chelune, 1995; Hermann, Wyler, Richey, & Rea, 1987). More debate exists about so-called “nonverbal” measures of memory and their respective sensitivity to right temporal lobe dysfunction (Barr et al., 1997). Some of this discussion centers around the nature of nonverbal stimuli that are employed in many popular memory tests. For example, many of the figures that are used are verbally encodable or lend themselves quite readily to verbal labels, thereby perhaps involving left temporal lobe structures to a significant degree. Other debate exists with regard to the specificity of functioning of the right temporal lobe, as some question whether there is a “nonverbal” memory system at all. This is perhaps nowhere better illustrated than in cases of right temporal lobectomy in the treatment of intractable epilepsy, wherein scores on some nonverbal memory tests may show no significant change following surgery (Naugle, Chelune, Cheek, Luders, & Awad, 1993).

(ii) Forgetting rates

An important concept in memory assessment is the examination of delayed recall for newly learned information. The ability to immediately recall newly presented information, for example, may not be significantly impaired in individuals with even gross memory disorders. This can be illustrated by many patients with Alzheimer's disease, as their ability to recall a brief paragraph that was just presented may be relatively intact, even though they show a rapid rate of forgetting over time. The seminal work by Butters and colleagues (e.g., Butters, 1985) clearly demonstrated the importance of including delayed recall techniques in the assessment of memory disorders. Along these lines, the calculation of forgetting rates or percent loss over time has proven most useful in clinical memory assessment.

(iii) Patterns of memory dysfunction

In addition to examining immediate vs. delayed recall and looking at forgetting rates, it is important to carefully analyze patterns of memory performance across individual patients. Different disorders are now known to demonstrate characteristic memory profiles in many cases, and the degree to which an individual's profile matches what is known in the literature may be of great assistance in differential diagnostic situations. As one example, patients with Alzheimer's disease typically show impoverished learning, rapid forgetting, faulty encoding and storage of information, and a tendency to rely upon more passive, less efficient learning strategies (Bondi, Salmon, & Butters, 1994). Their recall of stimuli that have been recently presented to them is often marked by intrusion errors (i.e., inserting features of related or unrelated stimuli during their recall of recently presented material). Like the amnestic Korsakoff's disease patient, individuals with Alzheimer's disease may tend to confabulate during recall in an apparent attempt to “fill in” for the material which they are no longer able to effectively organize or recall.

Another example of a memory pattern that has been shown to be useful in clinical assessment is the distinction between recall and recognition. Free recall involves the recollection of previously presented material. Cued recall additionally involves the presentation of a cue to help “jog” an individual's memory. For example, if a recently presented story was about a woman who was robbed, a recall cue might be, “it was about a woman.” The next step in assisted memory is the recognition paradigm,

wherein subjects are presented with several possible stimulus items that were presented to them earlier, and the task is simply to identify which item was previously presented. It has been argued that such recognition paradigms represent “easier” tasks, insofar as the load on memory processing is diminished. An everyday analogy would be driving to a new location, being unable to remember the name of the street, but then quickly recognizing the name of the street once it is seen. Some patients with various forms of amnesia demonstrate the ability to learn, even in the absence of awareness (e.g., Schacter & Tulving, 1994), and such mechanisms may be involved in recognition memory performance.

4.11.4.8.2 Clinical memory tests

An enormous array of published measures exist for the clinical examination of memory skills, and an even larger number of tasks have been developed in the experimental and cognitive psychology literatures. A few of the tests that are commonly used in clinical practice are discussed below, and an attempt has been made to avoid too many overlapping measures. Before presenting a brief overview of individual memory assessment tools in common clinical use, one popular omnibus memory battery will be discussed. It should be noted, however, that many of the subtests which comprise the more global memory batteries are often used individually; hence, brief mention of some of the more popular subtests will be made in the following subsections.

4.11.4.8.3 Memory batteries

(i) Wechsler memory scales

This series of memory assessment batteries has a long tradition and represents some of the most popular clinical memory tests in use in the late 1990s. Because each version of the test contains various significant revisions, they will be discussed separately, although there is some overlap of core concepts and general approaches to memory testing.

(a) Wechsler Memory Scale (WMS). This was the original version (Wechsler, 1945) of the popular Wechsler Memory Scales. The WMS was designed to provide a brief assessment of overall memory function by combining several tests of attention and memory. The subtests include the following: Personal and Current Information, Mental Control (e.g., counting down from 20, reciting the alphabet, counting by threes), Digit Span, Logical Memory (paragraph recall), Visual Reproduction (immediate recall of a series of geometric figures), and Associate Learning (learning verbal associations such as “metal–iron,” “crush–dark”). A total Memory Quotient (MQ) score could be derived to permit a more direct comparison with traditional IQ scores. Thus, a significant discrepancy between MQ and IQ scores would indicate a particular impairment of recent memory. In its original version, the WMS assessed only immediate recall of verbal and nonverbal material, in addition to orientation/mental control and verbal paired associate learning. Importantly, from a memory assessment standpoint, Russell (1975) added delayed recall procedures to the Logical Memory and Visual Reproduction subtests, resulting in what became known as the Russell revision of the WMS.

(b) Wechsler Memory Scale-Revised (WMS-R). Administration and scoring procedures were revised and made more specific in this version of the WMS, and the floor effects of the previous test were addressed (Wechsler, 1987). Importantly, delayed recall measures were built into several of the subtests so as to assess forgetting over time. A number of new subtests to assess various aspects of memory were added, which allows for a more comprehensive assessment of different aspects of learning and memory, but also adds substantially to the administration time. Individual subtests include Information and Orientation, Mental Control, Digit Span, Visual Memory Span, Logical Memory I (immediate recall) and II (delayed recall), Verbal Paired Associates I and II, Figural Memory, Visual Paired Associates I and II, and Visual Reproduction I and II. The following memory index scores can be derived: Attention/Concentration, Verbal Memory, Visual Memory, General Memory, and Delayed Memory. The WMS-R was a welcome revision to the WMS, addressing some of the concerns about the earlier scale and providing much-needed improvements in standardization and norming procedures. While it has proven to be useful in differentiating various memory-disordered populations (Tröster, Jacobs, Butters, Cullum, & Salmon, 1989), the utility of the Verbal and Nonverbal Memory indices in distinguishing lateralized cerebral dysfunction is questionable (Loring, Lee, Martin, & Meador, 1989). In addition to the norms available in the test manual (age 16–74), normative data for the elderly (up to age 94) are available (Ivnik et al., 1992).

(c) Wechsler Memory Scale-III (WMS-III). This represents the most recent revision of the WMS (Wechsler, 1997). The WMS-III was normed along with the WAIS-III, thereby providing for IQ and memory scores derived from the same standardization population. This allows for more direct comparisons between IQ and memory scores which should facilitate

inferences regarding the likelihood of acquired memory impairment. The standardization sample was selected to be more representative of the US population, and included individuals up to age 89. New subtests were added, and some are optional in terms of the derivation of memory index scores. Scoring procedures were further refined, updated, and expanded, and the available computer-generated scoring greatly facilitates the comparison of WMS-III and WAIS-III scores alike. Separate scores comparing working memory and immediate memory, auditory (previously referred to as “verbal”) and visual (previously “nonverbal”) memory, immediate and delayed recall, single-trial vs. multi-trial learning, and percent retention composites in each domain are now provided. Recognition memory procedures were also added to several of the WMS-III subtests. The subtests of the WMS-III include: Information and Orientation, Logical Memory, Verbal Paired Associates, Word Lists, Faces, Family Pictures, Visual Reproduction, Letter–Number Sequencing, Spatial Span, Digit Span, and Mental Control. Whereas this is a lengthy list of measures, five of the subtests (Information and Orientation, Word Lists, Visual Reproduction, Mental Control, and Digit Span) are supplemental or optional and do not contribute to the primary index scores. From various combinations of the core subtests, the primary index scores can be derived: Auditory Immediate, Visual Immediate, Immediate (combined), Auditory Delayed, Visual Delayed, Auditory Recognition Delayed, General Memory, and Working Memory.

The WMS-III is obviously comprehensive in its scope, now including aspects of a variety of memory assessment stimuli and procedures. It is lengthy and overlaps with some other standard memory tests in common use, but contains some interesting new procedures based on research and clinical experience in neuropsychology. Because of its recent release, studies regarding its ultimate clinical utility in various settings and populations will be needed, in addition to the development of procedures for deriving short forms and estimating memory index scores. Perhaps even more so than with the WMS-R, it is anticipated that when administration of the entire test is not feasible, many clinicians will routinely utilize selected subtests of the WMS-III to fit their particular memory assessment needs.

4.11.4.8.4 Verbal memory tests

(i) California Verbal Learning Test (CVLT)

This is a 16-item word list-learning task that contains items from four semantic categories (Delis, Kramer, Kaplan, & Ober, 1987). The list is presented five times to subjects, in the same order, which allows for an assessment not only of learning but also of primacy-recency recall effects. The CVLT was developed with principles of cognitive neuroscience and clinical memory disorders in mind, and provides a quantitative assessment of a wide variety of qualitative features of memory performance. It has been shown to be reliable (Paolo, Tröster, & Ryan, 1997) and clinically useful in a variety of clinical populations, particularly in the differential diagnostic assessment of memory disorders (Cullum, Filley, & Kozora, 1995; Peavy et al., 1994).

After the fifth presentation of the word list, an interference list is presented to permit an examination of the effects of interference on recall. Next, free recall of the initial word list is examined, followed by cued recall that involves the presentation of semantic categories for the words (e.g., “Tell me all the items that were fruits”). Long delay free recall is examined following a 20-minute delay. This is immediately followed by another series of cued recall trials, and finally, a recognition trial wherein a longer list of items containing target items as well as distractors is presented, and subjects must simply indicate “yes” or “no” whether or not these items appeared in the originally presented list. As noted, the CVLT provides for a quantified assessment of a variety of qualitative aspects of verbal learning and memory, and scoring software is available to provide normative reference scores for a host of verbal learning and memory indices. An alternate form of the CVLT has been developed that has shown good correlations with the original version (Delis et al., 1991), and a revised CVLT is in preparation in 1998. A nine-item dementia version of the CVLT has also been derived for use in more impaired populations (Libon, Mattson, et al., 1996).

(ii) Hopkins Verbal Learning Test (HVLT)

This test was developed to provide an abbreviated assessment of word list-learning and memory (Brandt, 1991). It is particularly well suited for more impaired patients who might not be able to complete measures such as the CVLT. A list of 12 items from three semantic categories is presented across three trials, followed by recognition testing. A modification of the original HVLT provides for an assessment of delayed recall prior to recognition in order to assess forgetting over time. A unique aspect of the HVLT is that it includes six alternate forms, thereby reducing practice effects and making it a good choice in serial assessment situations.
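
The comparison of delayed recall with earlier recall, which underlies the assessment of forgetting over time, can be illustrated numerically. The following Python sketch computes a percent retention value and its complement, percent loss, from two raw scores. The function names, the simple ratio formula, and the example scores are assumptions made for illustration only; they are not the scoring rules of the HVLT, the CVLT, or any other published test, and clinical interpretation would still rest on the norms in the relevant test manual.

```python
def percent_retention(immediate_score, delayed_score):
    """Delayed recall expressed as a percentage of immediate (or final-trial) recall.

    Hypothetical helper: the ratio used here is an assumed convention,
    not a rule taken from any test manual.
    """
    if immediate_score < 0 or delayed_score < 0:
        raise ValueError("scores must be non-negative")
    if immediate_score == 0:
        # Nothing was recalled initially, so retention is undefined.
        return None
    return 100.0 * delayed_score / immediate_score


def percent_loss(immediate_score, delayed_score):
    """Percentage of initially recalled material that is lost after the delay."""
    retention = percent_retention(immediate_score, delayed_score)
    if retention is None:
        return None
    return 100.0 - retention


# Made-up example: 10 words recalled on the last learning trial, 4 after the delay.
print(percent_retention(10, 4))  # 40.0
print(percent_loss(10, 4))       # 60.0
```

Returning None when nothing was recalled initially is a design choice in this sketch: a retention ratio has no meaning in that case, and flagging it explicitly avoids reporting a misleading 0% or 100%.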

(iii) Logical Memory (WMS, WMS-R, WMS-III)

This is the classic story learning and memory test that is one of the most widely used clinical indices of memory function. It is a measure of the ability to learn and retain new structured verbal information. The original WMS included two stories that were read to patients, followed by an assessment of immediate recall. The WMS-R updated the stories, norms, and scoring procedures, and importantly, added a delayed recall trial. Each of the WMS-R stories contains 25 bits of information, and a standard 30-minute delayed recall procedure provides for a ready assessment of forgetting rates. The WMS-III version includes an initial story that is administered in standard fashion (i.e., one trial, followed by immediate and then delayed recall), and a second story that is repeated in order to assess the effects of learning on storage and retrieval of material.

Patients' responses should be written down verbatim in order to glean additional qualitative information, and to allow for cross-checking of scoring. Scoring for Logical Memory is done largely based upon identical or almost-identical recall of the story material, although several scoring procedures allow partial credits to be assigned for close approximations, and the WMS-III version also incorporates scores for thematic units or gist. Logical Memory has been shown to be highly useful in the assessment of a wide array of memory disorders and is sensitive to left hippocampal damage in particular (Sass et al., 1992).

4.11.4.8.5 Nonverbal memory tests

(i) Rey–Osterrieth complex figure

This is a very popular measure of nonverbal memory and is described and presented in several sources (e.g., Corwin & Bylsma, 1993; Lezak, 1995; Rey, 1941; Spreen & Strauss, 1998). It consists of a complex figure made up of various subcomponents. Administration procedures vary somewhat (see Meyers & Meyers, 1995), although in one of the more common administrations the task requires patients to first copy the figure, and then immediately upon its removal (and without instructions to remember the figure), reproduce it from memory. Fifteen minutes later, delayed recall is assessed. Recognition procedures have also been developed and used in some settings. Several scoring systems for the test have been developed (e.g., see Lezak, 1995; Loring, Martin, Meador, & Lee, 1990), although qualitative interpretation of patients' reproductions often provides much more detailed information than quantitative scores alone. The Boston scoring approach (Stern et al., 1994) to the Rey–Osterrieth attempts to provide a quantitative assessment of many of the qualitative features of the figure.

(ii) Visual Reproduction (WMS, WMS-R, WMS-III)

This is one of the most commonly used measures of visual or nonverbal memory, and is often used in conjunction with its verbal counterpart, Logical Memory, even when the WMS are not used in their entirety. Although each version of the WMS contains somewhat different stimulus figures (only two of the original WMS figures have been included in both the WMS-R and WMS-III, and Visual Reproduction is optional in the latest rendition), the administration procedures are similar. It consists of the presentation of several cards with geometric designs on each. Each card is presented to subjects for 10 seconds, and after each is removed from view, immediate recall is assessed by having the subject draw the design from memory. Following a delay of 20–30 minutes (25–35 minutes in the WMS-III), subjects are asked to reproduce the designs again from memory.

As noted, Visual Reproduction has long been used as a measure of nonverbal memory. While some literature indicates that the test does not readily lend itself to localization of cerebral dysfunction in large groups and may be insensitive to nondominant temporal lobe resection (Barr et al., 1997; Naugle et al., 1993), it can be useful in some individual cases as there may be a tendency for patients with right temporal lobe dysfunction to do more poorly (Chelune & Bornstein, 1988). Furthermore, it can be of assistance in providing a more detailed examination of memory function beyond the assessment of learning and memory for words and stories.

(iii) Warrington Recognition Memory Test (RMT)

This is a test of word and face recognition that allows for a comparison between basic verbal and nonverbal abilities (Warrington, 1984). First, a series of 50 single words is presented in a booklet, followed by yes/no recognition testing. The same procedure is performed for a series of 50 black and white pictures of unfamiliar male faces. While some studies of patients with lateralized brain damage have demonstrated the expected associations between left hemisphere/word recognition and right hemisphere/face recognition deficits (Warrington, 1984), this

relationship does not always hold true (Naugle, Chelune, Schuster, Luders, & Comair, 1994), and the test is perhaps best used as a supplemental test of memory in conjunction with more primary measures. In clinical practice, many patients seem to have more difficulty with the facial recognition portion of the RMT, and the association between words and faces with left vs. right hemisphere dysfunction is variable.

4.11.4.9 Motor and Sensory Function

A variety of clinical procedures have been developed in neuropsychology and behavioral neurology to examine simple and complex motor functions. These range in complexity, degree of standardization, and level of knowledge required for appropriate interpretation/analysis. At a most basic observational level, how a patient walks and moves during an evaluation can provide useful information regarding aspects of their motor functioning. Evidence of unilateral motor weakness, tremor, or other unusual movements may reflect a neurological condition or even suggest a focal cerebral deficit. These observations can also spur the astute observer to inquire more carefully about specific symptoms and may help guide the neuropsychological evaluation in terms of test selection. To take an obvious example, the patient presenting with unilateral left-sided motor weakness (e.g., foot drag and/or decreased limb mobility) should undergo careful inquiry about their symptoms, and depending upon the situation, the clinician may wish to follow up on this observation by including appropriate sensory, motor, and cognitive tasks in his/her evaluation in order to assess abilities associated with right hemisphere function.

The primary motor and sensory areas of the cortex, as well as their multiple ascending and descending fiber tracts, are well known and charted. It must be kept in mind, however, that sensory and motor regions have significant overlap in terms of cortical surface representation, and deficits are often multimodal in nature, with more complex integrative functions having a broader distribution. In addition to the assessment of two-point discrimination and simple touch, for example, more subtle cortical deficits may be detectable by the use of double simultaneous stimulation techniques. These are typically conducted in the visual, auditory, and haptic modalities, and a variety of simple quantifiable procedures can be used. Double simultaneous stimulation in the tactile modality involves repeated trials of unilateral light touch to the back of either hand, interspersed with bilateral touch. Subjects are asked to indicate which hand was touched. Tactile extinctions or suppressions occur when one side is not perceived, and if consistent, this may suggest contralateral parietal lobe dysfunction.

Basic motor function can be quantified using various techniques and instruments, depending upon the degree of detail that is desired about a given function. Simple finger tapping speed can be assessed crudely by having the patient tap their first finger to their thumb as quickly as possible, first with the dominant, then nondominant hands. Grip strength can be assessed by having the patient squeeze the examiner's fingers, but this technique tends to be extremely crude and is better left to a dynamometer when accurate information is desired. Fine motor dexterity can be assessed using a variety of procedures, including having the patient touch each finger consecutively to their thumb in rapid, repetitive fashion. Stereognosis can be readily examined by having the patient name objects placed in their hand without the aid of vision.

Whatever motor or sensory assessment task is used, it is important to note that noncortical or noncentral nervous system (non-CNS) (i.e., subcortical, spinal, peripheral) damage can produce deficits in motor and/or sensory function. Furthermore, other factors such as inattention or impairments in concentration or memory may result in poor scores or errors on sensorimotor tasks. In the absence of neuropathology, gross errors or variable responses should alert the clinician to the possibility of poor effort. In practice, motor and sensory deficits tend to be consistent, such that repeated trials later in the examination should yield similar results. Interpretation of results from these tests should always begin with these considerations in mind.

4.11.4.9.1 Psychometric measures of motor and sensory function

(i) Finger Tapping Test

This test is a simple assessment of index finger tapping speed using a mechanical tapping key device (Reitan & Wolfson, 1993). Patients are instructed to tap using their dominant hand index finger as fast as possible, and the number of taps performed within a series of 10-second intervals is recorded and averaged. These results are compared with similar scores from the nondominant hand. Most right-handed individuals tend to show approximately a 10% dominant hand superiority. Gross aberrations from this may reflect contralateral anterior

cerebral dysfunction, although it is important to being evaluated. Poor effort can result in
note that arthritis or peripheral injuries to the artificially low or otherwise abnormal neurop-
brachial plexus, arm, hand, or fingers can have sychological test scores, and the clinician must
profound effects on this test and must be always be alert to this potential confound
carefully ruled out before cerebral implications during test administration and interpretation.
are made. Careful analysis of levels and patterns of
neuropsychological performance in relation to
the disorder in question is critical (i.e., “Do the
(ii) Grooved Pegboard
results make sense given what has happened to
This is a test of fine motor dexterity that the patient?”). For example, the patient with an
requires subjects to pick up small (one inch) uncomplicated mild traumatic brain injury
metal pegs that are rounded on one side and should not demonstrate severe, widespread
squared on the other (Klùve, 1965). The task is neuropsychological deficits, and this finding
to pick them up one at a time and place them in in such a case should raise questions regarding
similarly shaped holes on the test platform as levels of motivation and effort. Furthermore, as
quickly as possible. First, the dominant hand is serial testing frequently occurs within the
tested, followed by the nondominant hand. The context of forensic neuropsychological evalua-
task requires fine motor coordination and tions, the examination of consistency of scores
speed. within and across test sessions can be a very
good source of information regarding the
(iii) Hand dynamometer likelihood of invalidity or unreliability of results
(Cullum, Heaton, & Grant, 1991; Reitan &
Various manufacturers provide grip strength Wolfson, 1997).
meters to assess basic hand strength. This is a The issue of malingering in relation to
device that requires patients to simply squeeze a neuropsychological assessment has been re-
handle as hard as they can, and the amount of viewed in several excellent sources (e.g., Fran-
force they exert is indicated in pounds or zen, Iverson, & McCracken, 1990; Nies &
kilograms on the instrument. Two or three trials Sweet, 1994). Suspected malingering or willful
are usually administered with the dominant and distortion/exaggeration of deficits continues to
nondominant hands. Dominant hand grip pose challenges in various clinical settings.
strength usually tends to be approximately Along these lines, a number of specialized tests
10% stronger than nondominant in most right- and procedures have been developed to help
handed individuals. detect exaggerated or feigned deficits. Some of
these involve the administration of simple tasks
(iv) Sensory±Perceptual Examination (SPE) that are presented as being difficult (e.g., Rey's
15-item test; Lezak, 1995). Others that show
This test is included in the HRB, but is often promise are based upon the time to complete
used apart from this battery in its entirety or in shorter or less difficult vs. longer tasks (e.g., the
portions (Reitan & Wolfson, 1985). The basic Dot Counting Test; Lezak, 1995; Rey, 1941).
functions assessed by the SPE include finger Another technique involves the comparison of
gnosis, graphesthesia, stereognosis, and basic unusual patterns of performance with results
sensation and extinctions to simple stimulation known to be associated with a particular deficit
and double simultaneous stimulation in the or disorder (e.g., Brandt, 1988). Perhaps the
visual, auditory, and tactile modalities. It most popular approach has been to utilize
essentially represents a quantified approach to forced-choice recognition testing (Guilmette,
some of the sensory testing often performed in Hart, & Giuliano, 1993; Hiscock & Hiscock,
the traditional neurological examination. In the 1989), which relies upon probability theory by
absence of peripheral damage factors, arthritis, identifying performances that fall below a
severe cerebral dysfunction, or prominent chance level (e.g., the Portland Digit Recogni-
attentional difficulties (which can make the tion Test; Binder, 1993).
interpretation of patient responses more diffi- The detection of malingering poses a complex
cult), deficits associated with these various tasks challenge to clinicians, particularly as a variety
may suggest some degree of impairment of the of psychological and situational factors (e.g.,
sensory cortex and/or associated pathways. gain issues) may further complicate the diag-
nostic picture. While it is difficult to definitively
4.11.4.10 Assessment of Motivation prove malingering in a particular case, the
existing research suggests that it is often possible
The sensitivity and clinical utility of neurop- to detect evidence of malingering, although this
sychological measures is largely dependent requires skillful and deliberate inquiry by the
upon the cooperation and effort of the patient clinician.
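The probability reasoning behind forced-choice recognition testing can be illustrated with a short calculation. The sketch below is an illustration only and is not taken from any published scoring procedure. It assumes a hypothetical two-alternative test in which each item has a 0.5 chance of being answered correctly by guessing, and it uses the cumulative binomial distribution to ask how likely a given score, or a lower one, would be from guessing alone. The function name, the 50-item test length, and the example score of 17 are assumptions chosen for demonstration.

```python
from math import comb


def probability_at_or_below(n_items: int, n_correct: int, p_chance: float = 0.5) -> float:
    """Cumulative binomial probability of obtaining n_correct or fewer
    correct answers on n_items independent items when every answer is a
    pure guess with success probability p_chance."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(n_correct + 1)
    )


# Hypothetical example: 17 correct out of 50 two-choice items.
# Guessing alone would be expected to yield about 25 correct.
p = probability_at_or_below(50, 17)
print(f"P(17 or fewer correct out of 50 by guessing) = {p:.4f}")
```

A very small probability indicates a score that is unlikely to arise from guessing alone, which is the sense in which a performance can be said to fall below chance. Such a result is one piece of evidence to be weighed alongside the clinical history and other findings, and does not by itself establish malingering.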
340 Neuropsychological Assessment of Adults

As with diagnosis, the clinical history can prove invaluable in establishing the likelihood of malingering. Consistency of reporting is important, and information from various sources can be compared, including clinical interview data and reports from the patient to other professionals. The consistency or “fit” between neuropsychological test results and reports of everyday functioning can also be highly useful. For example, the patient who performs in the severely impaired range on formal memory testing should demonstrate difficulties with memory in their everyday functioning. Likewise, neuropsychological test data obtained from the patient who has gross difficulty with coordination, sequencing, memory, and reaction times, yet rides skillfully away on his/her motorcycle following the evaluation, should be carefully scrutinized.

As indicated by Nies and Sweet in their review of the literature, “Note that, at present, multiple measures and methodologies are needed, in that no single measure or methodology has proven sufficient to date” (1994, p. 544). They recommend some of the following strategies that may assist in the clinical detection of malingering: (i) use specific tests of malingering as outlined above, (ii) evaluate patterns on clinical neuropsychological tests where below-chance performances can be assessed, (iii) examine unusual or aberrant test results, (iv) examine consistency of test scores within and across measures and test sessions, (v) obtain independent information regarding a patient's reports of their everyday functioning, and (vi) obtain a detailed clinical history and carefully examine the information provided, both in terms of accuracy and consistency.

4.11.4.11 Personality and Emotional Functioning

Changes in the functional integrity of brain systems can result in alterations in emotional reactivity (e.g., lability) and personality. In some cases, subtle changes in behavior and personality may represent the primary or earliest harbingers of brain dysfunction. The evaluation of emotional or personality change associated with neurological disease or damage poses a complex set of challenges. Damage to the frontal and prefrontal cortex can be particularly associated with personality change. This may present variably as decreased responsivity (e.g., apathy or amotivation) or increased responsivity (e.g., hypomania, increased or indiscriminate drives), depending upon the primary site of damage. In cases of lateralized brain dysfunction, left hemisphere lesions (particularly more anterior) have been shown to be associated with depressive symptomatology, while lesions of the right hemisphere tend to be more frequently associated with varying degrees of unawareness or denial (anosognosia) of deficit (e.g., see Lacritz & Cullum, 1998).

Right hemisphere damage can also result in aprosodia, which refers to the impaired ability to communicate meaning using intonations and inflections during speech (emotional language). Some data suggest that the aprosodic syndromes secondary to right hemisphere dysfunction parallel the aphasias (Ross, 1985), with anterior right hemisphere lesions being more associated with expressive dysprosody, and more posterior lesions associated with receptive dysprosody. Expressive prosody testing can be readily accomplished by asking the patient to pretend to be an actor and say a neutral sentence such as “I am going to the movie” using different affective intonations (happy, sad, angry, indifferent). Receptive prosody can be assessed by the examiner (while outside of the patient's view) saying a sentence in different tones of voice and having the patient indicate if the examiner sounded happy, sad, angry, indifferent, etc. The patient with expressive dysprosody will generally sound monotone and be unable to alter the affective tone of their voice across test conditions aside from perhaps becoming louder or softer at times. The patient with receptive dysprosody will be able to say the sentence with variable affective intonations, but will have difficulty guessing the affective valence of the examiner's sentences.

The issue of changes in emotional functioning following brain damage represents a complex biopsychosocial interaction that merits careful investigation, since multiple factors may be involved. Obviously, an insult to the brain can have a direct neurobiological effect upon the way in which information and affect is processed. However, individual psychological reactions to changes in cognitive and/or physical functioning are also commonly seen. To help establish whether neurobiologically-induced changes have occurred, the clinician must be careful to obtain information regarding patients' premorbid functioning from family members or others who know the patient, as well as patients themselves. Defensiveness or unawareness on the part of the patient and/or family may exist, and it must be kept in mind that the clinician may be hearing a biased picture of the situation. The issue of secondary gain and malingering as discussed above must also be considered, particularly in litigation situations. Finally, the potential effects of pre-existing psychopathology should be examined, and psychiatric and/or more detailed psychological evaluation may be useful along these lines. Direct questioning about symptoms and

behavioral changes obviously may be productive, although in some instances, the line of inquiry must be more subtle, requiring indirect probing of evidence for personality/emotional changes. Rephrasing questions and asking about related symptoms in a variety of contexts may also be a useful means of not only verifying information, but in some cases, uncovering important data.

Formal psychological assessment in patients with neurological disorders requires some adjustments to standard interpretations, insofar as the endorsement of certain symptoms may simply relate to the neurological condition rather than implying psychopathology. For example, patients with neurologic disease often demonstrate clinical elevations on the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) that should not necessarily be attributed to psychological disturbance. Scale 8 in particular contains a number of items dealing with unusual symptoms, and scales 1 and 3 relate to aspects of physical functioning which may be endorsed by neurological patients due to their physical status, in the absence of psychopathology. Table 7 presents some of the more commonly used measures of psychological functioning within the context of the neuropsychological evaluation.

4.11.4.12 Test Selection Issues

As noted, specialized core batteries of tests are often assembled by neuropsychologists for specific populations. Test selection is based on training and experience in combination with information obtained from the relevant literature. Table 8 presents an example of a test battery designed to assess dementia (both in terms of early detection as well as assistance with differential diagnosis and staging of disease). Note that individual measures can readily be added or dropped (i.e., a “stepdown battery”) from the “core” set of tests, depending upon specific patient needs and clinical questions.

An example of a modification to this battery in the case of a high functioning business executive referred for evaluation secondary to suspected incipient Alzheimer's disease might include the following: administration of a complete WAIS-III to obtain a more precise assessment of current intellectual functioning and allow for analysis of subtest patterns; possible omission of the DRS since it may not be sufficiently sensitive to subtle deficits in such an individual in the early stages of dementia; addition of the Category Test to provide a more challenging evaluation of higher-order cognitive abilities than the WCST; substitution of the Raven's Advanced Matrices for the coloured version, as the latter would likely be too simplistic to be of much value in such a case; administration of a complete WMS-III to provide a more thorough examination of memory function.

Alternatively, the test battery might be dramatically altered in the case of a patient with known dementia who is being followed to track their rate of cognitive decline (or perhaps to assess for any positive effects of cognitive enhancing medications). As a more extreme example, if an individual presents with clear evidence of severe dementia during clinical interview (or if the patient had a previous MMSE score of 10), it may suffice to administer the DRS and obtain only a brief screening assessment of cognitive domains (e.g., by administering all or parts of the CERAD neuropsychological test battery). In such a case, a brief neuropsychological examination may nevertheless be considered “comprehensive” insofar as a range of abilities is assessed, and the addition of multiple complex measures may actually yield little additional neurobehavioral data, depending on the individual patient and referral questions.

4.11.4.13 Relationships Between Neuropsychometry and the Behavioral Geography of the Brain

Neuropsychological procedures, like the neurological examination, are designed to provide information about the integrity of human brain function. While these measures are sensitive to impaired cognition and may provide information regarding lateralized or focal dysfunction, it

Table 7 Common measures of psychological and emotional functioning.

Minnesota Multiphasic Personality Inventory-2 (MMPI-2)   Psychopathology/emotional status
Personality Assessment Inventory (PAI)                   Psychopathology/emotional function
Millon Clinical Multiaxial Inventory (MCMI)              Personality styles, psychopathology
Beck Depression Inventory (BDI)                          Depression
Inventory of Depressive Symptomatology (IDS)             Depression
Profile of Mood States (POMS)                            Brief review of psychiatric symptoms
Symptom Checklist-90 (SCL-90)                            Brief review of physical and psychiatric symptoms

Table 8 Sample test battery for the evaluation of dementia.

Global cognitive functioning
    Dementia Rating Scale
    WAIS-III/WAIS-R IQ or IQ estimate (e.g., based on Vocabulary and Block Design)
    NART-R estimated premorbid IQ
Reasoning/Cognitive flexibility
    Wisconsin Card Sorting Test
    Similarities and proverbs testing (WAIS-III)
    Trail Making Test, Part B
Language
    WAIS-III Vocabulary
    Boston Naming Test
    Verbal fluency (letter and category, e.g., FAS, animals)
    Token Test and/or Comprehension subtest from BDAE
Visuospatial skills
    WAIS-III Block Design
    Clock drawing
    Rey–Osterrieth Complex Figure (copy)
Attention/concentration
    WAIS-III Digit Span (forward)
    Trail Making Test, Part A
Learning and Memory
    California Verbal Learning Test
    Wechsler Memory Scale-3 (WMS-III) or Wechsler Memory Scale-Revised (WMS-R) or portions thereof (e.g., Logical Memory and Visual Reproduction)
    Rey–Osterrieth Complex Figure (immediate and delayed recall)
Motor functioning
    Finger tapping test
    Rapid Alternating Movements/Luria three-step

should be kept in mind that the neuroanatomical specificity of most of the tests is limited (Dodrill, 1998). Furthermore, while the frontal, temporal, parietal, and occipital cortices subserve specific functions to some degree, the amount of overlapping functional representations and inter-relationships cannot be overstated. It is worth recalling that even “primary” motor and sensory regions of the cortex have overlapping cellular representations and functions to some degree. Thus, the extent to which various cognitive functions are said to be “localized” is somewhat relative, although a number of functional neuroanatomic correlations exist with various measures. Higher cognitive functions in particular tend to defy localization, although frontal systems (particularly prefrontal cortices) tend to be heavily involved in the many abilities associated with executive functioning (e.g., planning, reasoning, judgment, insight) and personality. Limbic structures of the temporal lobes play a predominant role in learning and memory for new information, with verbal memory skills being strongly associated with left hemisphere (specifically temporohippocampal) functions in most individuals. Although less well supported, an association is also often seen between nonverbal memory abilities and nondominant temporal lobe functioning. The limbic system is also intimately associated with aspects of emotional processing, and links between affective memory and limbic functioning have been shown. While functional neuroanatomic distinctions exist (by lobe as well as in the anterior–posterior and lateral planes), the myriad interconnections that exist between these areas and other cortical and subcortical systems indicate that various brain functions may be disrupted when specific circuits or systems are damaged, even when the cortical region known to be largely responsible for a given function may appear to be intact. Thus, a careful probing of multiple aspects of brain–behavior relationships is required to provide a thorough assessment of function.

4.11.4.14 Training in Neuropsychology

Because of the complexities inherent in the neurobehavioral assessment of human brain function, specialized training is needed for the clinical neuropsychologist. Neuropsychologists must have a firm understanding of basic brain organization and function, and need to be familiar with neurological disorders and neuropathology as well as their myriad behavioral sequelae. As such, specific knowledge bases must be mastered that go beyond the traditional training of the doctoral degree in psychology.
References 343

Several documents relevant to training and practice in the specialty of clinical neuropsychology have been published over the years (e.g., Adams & Rourke, 1992; Eubanks, 1997; Reports of the INS-Division 40 Task Force on Education, Accreditation, and Credentialing, 1987), although the most comprehensive and far-reaching proviso to date was published in 1997. The Houston Conference on Specialty Training and Education in Neuropsychology was held in September 1997 and outlines recommended training and education guidelines for the future in clinical neuropsychology (Hannay et al., 1998). For this conference, neuropsychologists representing various backgrounds, employment settings, and organizations were selected from applications across the USA and Canada and invited to participate. One of the goals of the Houston Conference was to develop the guidelines for an integrated model of education and training in clinical neuropsychology, and this includes the following generic definition of a clinical neuropsychologist:

    A clinical neuropsychologist is a professional psychologist trained in the science of brain–behavior relationships. The clinical neuropsychologist specializes in the application of assessment and intervention principles based on the scientific study of human behavior across the lifespan as it relates to normal and abnormal functioning of the central nervous system. (p. 161)

The recommendations for coursework, training, and skills are provided in some detail, including a model of integrated education across the doctoral, internship, and postdoctoral or residency levels. In addition to core courses in clinical psychology (e.g., psychopathology, assessment, ethics), coursework in areas pertaining specifically to brain–behavior relationships is recommended, including neuroanatomy, psychopharmacology, and various aspects of neuropsychology per se. It is noted in the Policy Statement of the Houston Conference that the proposed guidelines were not intended to be applied retroactively or to those currently in training at the time of the conference and proceedings, but rather, to serve as recommendations for future education and training in clinical neuropsychology. Whereas the specific functions of neuropsychologists vary depending upon setting, the following seven professional activities were put forth as being important for the clinical neuropsychologist: assessment, intervention, consultation, supervision, research and inquiry, consumer protection, and professional development. A postdoctoral residency of the equivalent of two years full time experience is included in these recommendations. Some of the exit criteria for the postdoctoral residency include demonstration of advanced skills in neuropsychology, eligibility for state or provincial licensure, and eligibility for board certification in clinical neuropsychology.

4.11.4.15 Challenges for Neuropsychological Assessment

The neuropsychological evaluation represents the most sensitive existing means by which to assess the functional integrity of the human brain. Modern structural imaging provides us with wonderfully detailed images of intracranial contents, but even knowing that an abnormality exists in a given region cannot provide much insight into how that patient is able to function and process information in day-to-day life. Functional imaging has seen major advancements in recent years and is able to indicate areas of relative dysfunction, but this, too, cannot indicate what specific cognitive abilities are impaired, or to what extent. Such techniques also offer little or no insight into how to help brain-injured patients adapt and better compensate for their altered cognitive processing, as this is the purview of clinical neuropsychology.

Neuropsychological assessment is faced with the ongoing challenge of improving diagnostic and prognostic procedures and integrating findings from clinical neuropsychology, cognitive psychology, and neuroscience into assessment tools that can provide increasingly detailed examinations of some of the intricacies of human brain function. Many of the neuropsychological tests in common use in the 1990s were developed years or decades ago, and even though some procedures are “tried and true” and continue to provide important neurodiagnostic and neurodescriptive information, the field must strive to keep up with the rapid advances being made in neuroimaging and neuroscience and be open to incorporating this new information into assessment procedures.

ACKNOWLEDGMENTS

Thanks to Kathy Saine, Ph.D., and Eric Smernoff, M.A., for their assistance with this manuscript.

4.11.5 REFERENCES

Adams, K. M., & Rourke, B. P. (1992). The TCN guide to professional practice in clinical neuropsychology. Amsterdam/Lisse: Swets & Zeitlinger.
Adams, R. L., Smigielski, J., & Jenkins, R. L. (1984). Development of a Satz-Mogel short form of the WAIS-R. Journal of Consulting and Clinical Psychology, 52, 908.

Baddeley, A. (1992). Working memory. Science, 255, 556–559.
Barr, W. B., Chelune, G. J., Hermann, B. P., Loring, D. W., Perrine, K., Strauss, E., Trenerry, M. R., & Westerveld, M. (1997). The use of figural reproduction tests as measures of nonverbal memory in epilepsy surgery candidates. Journal of the International Neuropsychological Society, 3, 435–443.
Beatty, W. W. (1993). Age differences on the California Card Sorting Test: Implications for the assessment of problem solving by the elderly. Bulletin of the Psychonomic Society, 31, 511–514.
Benton, A. L., & Hamsher, K. de-S. (1989). Multilingual Aphasia Examination. Iowa City, IA: AJA Associates.
Benton, A. L., Hamsher, K. de-S., & Sivan, A. B. (1994). Multilingual Aphasia Examination (3rd ed.). Iowa City, IA: AJA Associates.
Bigler, E. D. (1988). Frontal lobe damage and neuropsychological assessment. Archives of Clinical Neuropsychology, 3, 279–297.
Binder, L. M. (1993). An abbreviated form of the Portland digit recognition test. The Clinical Neuropsychologist, 7, 104–107.
Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision of the National Adult Reading Test. The Clinical Neuropsychologist, 3, 129–136.
Bondi, M. W., Salmon, D. P., & Butters, N. (1994). Neuropsychological features of memory disorders in Alzheimer's disease. In R. D. Terry, R. Katzman, & K. L. Bick (Eds.), Alzheimer's disease (pp. 41–63). New York: Raven Press.
Borkowski, J. G., Benton, A. L., & Spreen, O. (1967). Word fluency and brain damage. Neuropsychologia, 5, 135–140.
Brandt, J. (1988). Malingered amnesia. In R. Rogers (Ed.), Clinical assessment of malingering and deception. New York: Guilford Press.
Brandt, J. (1991). The Hopkins Verbal Learning Test: Development of a new verbal memory test with six equivalent forms. The Clinical Neuropsychologist, 5, 125–142.
Brooker, B. H., & Cyr, J. J. (1986). Tables for clinicians to use to convert WAIS-R short forms. Journal of Clinical Psychology, 42, 982–985.
Butters, N. (1985). Alcoholic Korsakoff's syndrome: An update. Seminars in Neurology, 4, 226–244.
Butters, N., Grant, I., Haxby, J., Judd, L. L., Martin, A., McClelland, J., Pequegnat, W., Schacter, D., & Stover, E. (1990). Assessment of AIDS-related cognitive changes: Recommendations of the NIMH Workshop on neuropsychological assessment approaches. Journal of Clinical and Experimental Neuropsychology, 12, 963–978.
Cermak, L. S. (1994). Neuropsychological explorations of memory and cognition. New York: Plenum.
Chan, A. S., Butters, N., Salmon, D. P., Johnson, S. A., Paulsen, J. S., & Swenson, M. R. (1995). Comparison of the semantic networks in patients with dementia and amnesia. Neuropsychology, 9, 177–186.
Chelune, G. J. (1995). Hippocampal adequacy versus functional reserve: Predicting memory functions following temporal lobectomy. Archives of Clinical Neuropsychology, 10, 413–432.
Chelune, G. J., & Bornstein, R. A. (1988). WMS-R patterns among patients with unilateral brain lesions. The Clinical Neuropsychologist, 2, 121–132.
Christensen, A.-L. (1984). The Luria method of examination of the brain-impaired patient. In P. E. Logue & J. M. Schear (Eds.), Clinical neuropsychology: A multidisciplinary approach. Springfield, IL: C.C. Thomas.
Conners, C. K. (1995). Conners' continuous performance test. New York: Multi-Health Systems.
Corwin, J., & Bylsma, F. W. (1993). Translations of excerpts from Andre Rey's Psychological examination of traumatic encephalopathy and P.A. Osterrieth's The Complex Figure Copy Test. The Clinical Neuropsychologist, 7, 3–21.
Crum, R. M., Anthony, J. C., Bassett, S. S., & Folstein, M. F. (1993). Population-based norms for the Mini-Mental State Examination by age and educational level. Journal of the American Medical Association, 269, 2386–2391.
Cullum, C. M., Filley, C. M., & Kozora, E. (1995). Episodic memory function in advanced aging and early Alzheimer's Disease. Journal of the International Neuropsychological Society, 1, 100–103.
Cullum, C. M., Heaton, R. K., & Grant, I. (1991). Psychogenic factors influencing neuropsychological performance: Somatoform disorders, factitious disorders, and malingering. In H. O. Doerr & A. Carlin (Eds.), Forensic neuropsychology (pp. 141–171). New York: Guilford Press.
Cullum, C. M., & Rosenberg, R. N. (1998). Memory loss – when is it Alzheimer disease? Journal of the American Medical Association, 279, 1689–1690.
Cullum, C. M., Thompson, L. L., & Smernoff, E. N. (1993). Three word recall as a measure of memory. Journal of Clinical and Experimental Neuropsychology, 15, 321–329.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (1987). California verbal learning test. Research edition. San Antonio, TX: The Psychological Corporation.
Delis, D. C., Massman, P. J., Kaplan, E., McKee, R., Kramer, J. H., & Gettman, D. (1991). Alternate form of the California Verbal Learning Test: Development and reliability. The Clinical Neuropsychologist, 5, 154–162.
Delis, D. C., Squire, L. R., Bihrle, A., & Massman, P. (1992). Componential analysis of problem-solving ability: Performance of patients with frontal lobe damage and amnesic patients on a new sorting test. Neuropsychologia, 30, 683–697.
Dodrill, C. B. (1997). Myths of neuropsychology. The Clinical Neuropsychologist, 11, 1–17.
Eubanks, J. D. (1997). Clinical neuropsychology summary information prepared by Division 40, Clinical Neuropsychology, American Psychological Association. The Clinical Neuropsychologist, 11, 77–80.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state.” Journal of Psychiatric Research, 12, 189–198.
Franzen, M. D., Burgess, E. J., & Smith-Seemiller, L. (1998). Methods of estimating premorbid functioning. Archives of Clinical Neuropsychology, 12, 711–738.
Franzen, M. D., Iverson, G. L., & McCracken, L. M. (1990). The detection of malingering in neuropsychological assessment. Neuropsychology Review, 1, 247–279.
Freedman, M., Leach, L., & Kaplan, E. (1994). Clock drawing: A neuropsychological analysis. New York: Oxford University Press.
Gauthier, L., Dehaut, F., & Joanette, Y. (1989). The Bells Test: A quantitative and qualitative test for visual neglect. International Journal of Clinical Neuropsychology, 11, 49–54.
Goodglass, H., & Kaplan, E. (1987). The assessment of aphasia and related disorders (2nd ed.). Philadelphia: Lea & Febiger.
Grant, I., & Adams, K. M. (Eds.) (1996). Neuropsychological assessment of neuropsychiatric disorders (2nd ed.). New York: Oxford University Press.
Greve, K. W., Farrwell, J. F., Besson, P. S., & Crouch, J. A. (1995). A psychometric analysis of the California Card Sorting Test. Archives of Clinical Neuropsychology, 10, 265–278.
Guilmette, T., Hart, K., & Giuliano, A. (1993). Malingering detection: The use of a forced choice method in identifying organic versus simulated memory impairment. The Clinical Neuropsychologist, 7, 59–69.
Halstead, W. C. (1947). Brain and intelligence. Chicago: University of Chicago Press.

Hannay, H. J., Bieliauskas, L. A., Crosson, B. A., Lewis, R. F., & Rennick, P. M. (1979). Manual for the
Hammeke, T. A., Hamsher, K. De-S., & Koffler, S. P. Repeatable Cognitive±Perceptual±Motor Battery. Clinton
(1998). Proceedings of the Houston Conference on Township, MI: Ronald F. Lewis.
Specialty Education and Training in Clinical Neuropsy- Lezak, M. D. (1995). Neuropsychological assessment (3rd
chology. Archives of Clinical Neuropsychology, 13, ed.). New York: Oxford University Press.
157±250. Libon, D. J., Malamut, B. L., Swenson, R., Sands, L. P., &
Heaton, R. K. (1981). Wisconsin Card Sorting Test Cloud, B. S. (1996). Further analyses of clock drawings
(WCST). Odessa, FL: Psychological Assessment Re- among demented and nondemented older subjects.
sources. Archives of Clinical Neuropsychology, 11, 193±205.
Heaton, R. K. (1992). Comprehensive norms for an Libon, D. J., Mattson, R. E., Glosser, G., Kaplan, E.,
expanded Halstead±Reitan battery: A supplement for the Malamut, B. L., Sands, L. P., Swenson, R., & Cloud, B.
Wechsler Memory Scale-Revised. Odessa, FL: Psycholo- S. (1996). A nine-word dementia version of the
gical Assessment Resources. California Verbal Learning Test. The Clinical Neuropsy-
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & chologist, 10, 237±244.
Curtis, G. (1993). Wisconsin Card Sorting Test (WCST) Loring, D.W., Lee, G.P., Martin, R.C., & Meador, K.J.
Manual Revised and expanded. Odessa, FL: Psychologi- (1989). Verbal and visual memory index discrepancies
cal Assessment Resources. from the Wechsler Memory Scale-Revised: Cautions in
Heaton, R. K., Grant, I., & Matthews, C. G. (1991). interpretation. Psychological Assessment, 1, 198±202.
Comprehensive norms for an expanded Halstead±Reitan Loring, D. W., Martin, R. C., Meador, K. J., & Lee, G. P.
battery: Demographic corrections, research findings, and (1990). Psychometric construction of the Rey±Osterrieth
clinical applications. Odessa, FL: Psychological Assess- complex figure: Methodological considerations and
ment Resources. interrater reliability. Archives of Clinical Neuropsychol-
Heaton, R. K., Ryan, L., Grant, I., & Matthews, C. G. ogy, 5, 1±14.
(1996). Demographic influences on neuropsychological Luria, A. R. (1966). The working brain: An introduction to
test performance. In K. M. Adams & I. Grant (Eds.), neuropsychology. New York: Basic Books.
Neuropsychological assessment of neuropsychiatric disor- Malloy, P. F., & Richardson, E. D. (1998). Assessment of
ders (pp. 141±163). New York: Oxford University Press. frontal lobe functions. Journal of Neuropsychiatry, 6,
Hermann, B. P., Wyler, A. R., Richey, E. T., & Rea, J. M. 399±410.
(1987). Memory function and verbal learning ability in Markwardt, F. C., Jr. (1989). The Peabody Individual
patients with complex partical seizures of temporal lobe Achievement Test-Revised. Circle Pines, MN: American
origin. Epilepsia, 28, 547±554. Guidance Service.
Hiscock, M., & Hiscock, C. K. (1989). Refining the forced- Matarazzo, J. D. (1990). Psychological assessment versus
choice method for the detection of malingering. Journal psychological testing: Validation from Binet to the
of Clinical and Experimental Neuropsychology, 11, school, clinic, and courtroom. American Psychologist,
967±974. 45, 999±1017.
Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., Matarazzo, J. D., & Prifitera, A. (1989). Subtest scatter and
Petersen, R. C., Kokman, E., & Kurland, L.T. (1992). premorbid intelligence: Lessons from the WAIS-R
Mayo's older Americans normative studies: WMS-R standardization sample. Psychological Assessment, 1,
norms for ages 56 to 94. The Clinical Neuropsychologist, 186±191.
6, 49±82. Mattis, S. (1988). Dementia Rating Scale (DRS). Odessa,
Kaplan, E. (1990). The process approach to neuropsycho- FL: Psychological Assessment Resources.
logical assessment of psychiatric patients. Journal of Matt-Maddrey, A., Cullum, C. M., Weiner, M. F., &
Neuropsychiatry, 2, 72±87. Filley, C. M. (1996). Premorbid intelligence estimation
Kaplan, E., Fein, D., Morris, R., & Delis, D. C. (1991). and level of dementia in Alzheimer's disease. Journal of
WAIS-R as a neuropsychological instrument. San Anto- The International Neuropsychological Society, 2, 1±5.
nio, TX: The Psychological Corporation. Meyers, J. E., & Meyers, K. R. (1995). Rey complex figure
Kaplan, E. F., Goodglass, H., & Weintraub, S. (1983). The test under four different administration procedures. The
Boston Naming Test (2nd ed.). Philadelphia: Lea & Clinical Neuropschologist, 9, 63±67.
Febiger. Milberg, W. P., Hebben, N., & Kaplan, E. (1996). The
Kertesz, A. (1982). Western Aphasia Battery. San Antonio, Boston process approach to neuropsychological assess-
TX: The Psychological Corporation. ment. In I. Grant & K. M. Adams (Eds.), Neuropsycho-
Klùve, H. (1963). Grooved pegboard. Lafayette, IN: logical assessment of neuropsychiatric disorders (2nd ed.,
Lafayette Instruments. pp. 58±80). New York: Oxford University Press.
Kozora, E., & Cullum, C.M. (1994). Qualitative features of Mirsky, A. F., Anthony, B. J., Duncan, C. C., Ahearn, M.
clock drawings in normal aging and Alzheimer's disease. B., & Kellam, S. G. (1991). Analysis of the elements of
Assessment, 1, 179±187. attention: A neuropsychological approach. Neuropsy-
Kozora, E., & Cullum, C. M. (1995). Generative naming in chology Review, 2, 109±145.
normal aging: Total output and qualitative changes Morris, J. C., Heyman, A., Mohs, R. C., et al. (1989). The
using phonemic and semantic constraints. The Clinical consortium to establish a registry for Alzheimer's disease
Neuropsychologist, 9, 313±320. (CERAD). Part I. Clinical and neuropsychological
Lacritz, L. H., & Cullum, C. M. (1998). Stroke and assessment of Alzheimer's disease. Neurology, 39,
depression. In A. J. Rush (Ed.), Mood & anxiety 1159±1165.
disorders (pp. 339±362). Philadelphia: Williams and Moscovitch, M. (1994). Cognitive resources and dual-task
Wilkins. interference effects at retrieval in normal people: The
Lansing, A. E., Ivnik, R. J., Cullum, C. M., & Randolph, role of the frontal lobes and medial temporal cortex.
C. (in press). An empirically derived short form of the Neuropsychology, 8, 524±534.
Boston Naming Test. Archives of Clinical Neuropsychol- Naugle, R. I., Chelune, G. J., Cheek, R., Luders, H., &
ogy. Awad, I. A. (1993). Detection of changes in material-
Lewis, R., & Kupke, T. (1977). The Lafayette Clinic specific memory following temporal lobectomy using the
Repeatable Neuropsychological Test Battery: Its develop- Wechsler Memory Scale-Revised. Archives of Clinical
ment and research applications. Paper presented at the Neuropsychology, 8, 381±395.
annual meeting of the Southeastern Psychological Naugle, R. I., Chelune, G. J., Schuster, J., Luders, H. O., &
Association, Hollywood, Florida. Comair, Y. (1994). Recognition memory for words and
346 Neuropsychological Assessment of Adults

faces before and after temporal lobectomy. Assessment, Royall, D. R., Cordes, J. A., & Polk, M. (1998). CLOX: An
1, 373±381. executive clock drawing task. Journal of Neurology,
Naugle, R. I., Cullum, C. M., & Bigler, E. D. (1998). Neurosurgery, and Psychiatry, 64, 588±594.
Introduction to clinical neuropsychology: A casebook. Russell, E. W. (1975). A multiple-scoring method for the
Austin, TX: ProEd. assessment of complex memory functions. Journal of
Naugle, R. I., Cullum, C. M., Bigler, E. D., & Massman, P. Consulting and Clinical Psychology, 43, 800±809.
J. (1986). Neuropsychological and computerized axial Russell, E.W., Neuringer, C., & Goldstein, G. (1970).
tomography volume characteristics in senile and pre- Assessment of brain damage: A neuropsychological key
senile dementia. Archives of Clinical Neuropsyhology, 1, approach. New York: Wiley.
219±230. Ryan, C. M., Morrow, L. A., Bromet, E., & Parkinson, D.
Nelson, H. E. (1982). The National Adult Reading Test K. (1987). Assessment of neuropsychological dysfunc-
(NART): Test Manual. Windsor, UK: NFER-Nelson. tion in the workplace: Normative data from the
Nelson, H. E., & O'Connell, A. (1978). Dementia: The Pittsburgh occupational exposures test battery. Journal
estimation of premorbid intelligence levels using the of Clinical and Experimental Neuropsychology, 9,
National Adult Reading Test. Cortex, 14, 234±244. 665±679.
Nies, K. J., & Sweet, J. J. (1994). Neuropsychological Sass, K. J., Sass, A., Westerveld, M., Lencz, T., Novelly, R.
assessment and malingering: A critical review of past and A., Kim, J. H., & Spencer, D. D. (1992). Specificity in
present strategies. Archives of Clinical Neuropsychology, the correlation of verbal memory and hippocampal
9, 501±552. neuron loss: Dissociation of memory, language, and
Osterrieth, P. A. (1944). Le test de copie d'une figure verbal intellectual ability. Journal of Clinical and
complexe. Archives de Psychologie, 30, 206±356; trans- Experimental Neuropsychology, 14, 662±672.
lated by J. Corwin & F.W. Bylsma (1993), The Clinical Schacter, D. L., & Tulving, E. (1994). What are the
Neuropsychologist, 7, 9±15. memory systems of 1994? In D. L. Schacter & E. Tulving
Paolo, A. M., TroÈster, A. I., & Ryan, J. J. (1997). Test- (Eds.), Memory systems 1994 (pp. 1±38). Cambridge,
retest stability of the California Verbal Learning Test in MA: MIT Press.
older persons. Neuropsychology, 11, 613±616. Silverstein, A. B. (1982). Two-and four-subtest short forms
Peavy, G., Jacobs, D., Salmon, D., Butters, N., Delis, D., of the Wechsler Adult Intelligence Scale-Revised. Journal
Taylor, M., Massman, P, Stout, J., Heindel, W., Kirson, of Consulting and Clinical Psychology, 50, 415±418.
E., Atkinson, J., Chandler, J., Grant, I., and the HIV Smith, G. E., Ivnik, R. J., Malec, J. F., Kokmen, Tangalos,
Neurobehavioral Research Center Group (1994). Verbal E., & Petersen, R. C. (1994). Psychometric properties of
memory performance of patients with human immuno- the Mattis Dementia Rating Scale. Assessment, 1,
deficiency infection: Evidence of subcortical dysfunction. 123±131.
Journal of Clninical and Exeperimental Neuropsychology, Spreen, O., & Strauss, E. (1998). Compendium of neurop-
16, 508±523. sychological tests (2nd ed.). New York: Oxford Uni-
Powell, D. H., Kaplan, E. F., Whitla, D., Weintraub, S., versity Press.
Catlin, R., & Funkenstein, H. H. (1993). Manual for Squire, L. R. (1987). Memory and brain. New York: Oxford
MicroCog: Assessment of cognitive functioning. San University Press.
Antonio, TX: The Psychological Corporation. Stern, R. A., Singer, E. A., Duke, L. M., Singer, N. G.,
Rapport, L. J., Dutra, R. L., Webster, J. S., Charter, R., & Morey, C. E., Daughtrey, E. W., & Kaplan, E. (1994).
Morrill, B. (1995). Hemispatial deficits on the Rey± The Boston qualitative scoring system for the Rey±
Osterrieth complex figure drawing. The Clinical Neu- Osterrieth Complex Figure: Description and interrater
ropsychologist, 9, 169±179. reliability. The Clinical Neuropsychologist, 8, 309±322.
Raven, J. C. (1994). Advanced progressive matrices sets I & Sternberg, R. J. (Ed.) (1997). Intelligence and lifelong
II. Oxford, UK: Oxford Psychologists Press. learning [Special issue]. American Psychologist, 52.
Raven, J. C. (1995). Colored progressive matrices sets A, Strub, R. L., & Black, F. W. (1988). Neurobehavioral
Ab, B. Oxford, UK: Oxford Psychologists Press. disorders: A clinical approach. Philadelphia: F.A. Davis.
Raven, J. C. (1996). Progressive Matrices: A perceptual test Stuss, D. T., & Benson, D. F. (1984). Neuropsychological
of intelligence. Individual Form. Oxford, UK: Oxford studies of the frontal lobes. Psychological Bulletin, 95,
Psychologists Press. 3±28.
Reitan, R. M., & Wolfson, D. (1993). The Halstead±Reitan Sweet, J. J., & Moberg, P. J. (1990). A survey of practices
Neuropsychological Test Battery: Theory and clinical and beliefs among ABPP and non-ABPP clinical neuro-
interpretation. Tucson, AZ: Neuropsychology Press. psychologists. The Clinical Neuropsychologist, 4, 101±120.
Reitan, R. M., & Wolfson, D. (1996). Relationships of age Tombaugh, T. N., McDowell, I., Krisjansson, B., &
and education to Wechsler Adult Intelligence Scale IQ Hubley, A. M. (1996). Mini-Mental State Examination
values in brain-damaged and non-brain-damaged (MMSE) and the Modified MMSE (3MS): A psycho-
groups. The Clinical Neuropsychologist, 10, 293±304. metric comparison and normative data. Psychological
Reports of the INS-Division 40 Task Force on Education, Assessment, 8, 48±59.
Accreditation, and Credentialing (1987). The Clinical Trenerry, M. R., Crosson, B., DeBoe, J., & Leber, W. R.
Neuropsychologist, 1, 29±34. (1990). Visual Search and Attention Test. Odessa, FL:
Reports of the Therapeutics and Technology Assessment Psychological Assessment Resources.
Subcommittee of the American Academy of Neurology TroÈster, A. I., Jacobs, D., Butters, N., Cullum, C. M., &
(1996). Assessment: Neuropsychological testing of Salmon, D. P. (1989). Differentiating Alzheimer's disease
adults. Neurology, 47, 592±599. from huntington's disease with the Wechsler Memory
Rey, A. (1941). L'examen psychologique dans les cas Scale-Revised. Clinics in Geriatric Medicine, 5, 611±632.
d'enceÂphalopathie traumatique. Archives de Psychologie, Warrington, E. K. (1984). Recognition Memory Test.
28, 286±340. Windsor, UK: NFER-Nelson.
Reynolds, C. R. (1997). Forward and backward memory Wechsler, D. (1945). A standardized memory scale for
span should not be combined for clinical analysis. clinical use. The Journal of Psychology, 19, 87±95.
Archives of Clinical Neuropsychology, 12, 29±40. Wechsler, D. (1981). Wechsler Adult Intelligence Scale-
Ross, E. D. (1985). Modulation of affect and nonverbal Revised. San Antonio, TX: The Psychological Corpora-
communication by the right hemisphere. In M. M. tion.
Mesulam (Ed.), Principles of behavioral neurology Wechsler, D. (1987). Wechsler Memory Scale-Revised. San
(pp. 239±257). Philadelphia: F.A. Davis. Antonio, TX: The Psychological Corporation.
References 347

Wechsler, D. (1997a). Wechsler Adult Intelligence Scale- consortium to establish a registry for Alzheimer's disease
Third Edition. San Antonio, TX: The Psychological (CERAD). Part V. A normative study of the neuropsy-
Corporation. chological battery. Neurology, 44, 609±614.
Wechsler, D. (1997b). Wechsler Memory Scale-Third Wilkinson, G. S. (1993). The Wide Range Achievement
Edition. San Antonio, TX: The Psychological Corpora- Test-3 manual. Wilmington, DE: Wide Range, Inc.
tion. Woodcock, R. W., & Mather, N. (1989). Woodcock±
Welsh, K. A., Butters, N., Mohs, R. C., Beekly, D., Johnson Tests of Achievement. Allen, TX: DLM Teach-
Edland, S., Fillenbaum, G., & Heyman, A. (1994). The ing Resources.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.12 Principles of Personality Assessment
JERRY S. WIGGINS and KRISTA K. TROBST
University of British Columbia, Vancouver, BC, Canada

4.12.1 INTRODUCTION
4.12.2 PSYCHODYNAMIC TRADITION
4.12.2.1 Historical Background
4.12.2.2 Conceptual Framework
4.12.2.3 Assessment Instruments
4.12.2.4 Interpretive Principles
4.12.2.4.1 Projective hypothesis
4.12.2.4.2 Levels of functioning
4.12.2.4.3 Psychological adjustment
4.12.2.5 Applications and Current Status
4.12.3 PERSONOLOGICAL TRADITION
4.12.3.1 Historical Background
4.12.3.2 Conceptual Framework
4.12.3.3 Assessment Instruments
4.12.3.4 Interpretive Principles
4.12.3.5 Applications and Current Status
4.12.4 MULTIVARIATE TRADITION
4.12.4.1 Historical Background
4.12.4.2 Conceptual Framework
4.12.4.3 Assessment Instruments
4.12.4.4 Interpretive Principles
4.12.4.5 Applications and Current Status
4.12.5 EMPIRICAL TRADITION
4.12.5.1 Historical Background
4.12.5.2 Conceptual Framework
4.12.5.3 Assessment Instruments
4.12.5.4 Interpretive Principles
4.12.5.4.1 Empirical strategy
4.12.5.4.2 Constructive strategy
4.12.5.4.3 Interpersonal strategy
4.12.5.5 Applications and Current Status
4.12.6 INTERPERSONAL PARADIGM
4.12.6.1 Historical Background
4.12.6.2 Conceptual Framework
4.12.6.3 Assessment Instruments
4.12.6.3.1 Interpersonal Adjective Scales
4.12.6.3.2 Inventory of Interpersonal Problems
4.12.6.3.3 Impact Message Inventory
4.12.6.4 Interpretive Principles
4.12.6.5 Applications and Current Status
4.12.7 CONCLUSION
4.12.8 REFERENCES

4.12.1 INTRODUCTION

Consider a hypothetical clinician who is both well trained and well informed about available personality tests. In her daily practice, she employs a range of assessment instruments (singly or in combination) to address the range of diagnostic issues posed by her clients. When "thought disorder" is suspected, she might occasionally administer a Rorschach and from most clients she will obtain a case history. Her clients themselves may be asked to fill out the NEO-Personality Inventory to clarify dimensions of normal personality functioning or the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) to assess gross abnormality. She might also routinely administer the Interpersonal Adjective Scales to assess the character and quality of clients' interpersonal relationships.

Like many of her colleagues, our clinician may describe her theoretical orientation as "eclectic" and indeed that is what she appears to be. But she is also, perhaps unknowingly, operating within larger historical contexts of assumptions and beliefs that may include traditions of thought as diverse as psychoanalysis, psychobiography, psychometrics, empirical realism, and symbolic interactionism (respectively for the tests just mentioned). We will refer to such contexts as different "traditions" of personality assessment, by which we mean the background of a set of unquestioned beliefs or orienting attitudes within and against which personality tests are administered and interpreted.

The traditions that will be considered in this chapter are referred to as psychodynamic, personological, multivariate, empirical, and interpersonal. Such traditions are related, but not equivalent, to certain theories of personality (Hall & Lindzey, 1978) and psychopathology (Millon, 1969). Similarly, these traditions are related, but not equivalent, to certain kinds of personality tests. Thus, the MMPI has been associated with a distinctive philosophy of science (empirical realism) rather than with a particular theory of psychopathology. The Rorschach test and the Wechsler scales have figured prominently within the ego psychological perspective of the psychodynamic tradition, although both tests were constructed for quite different purposes.

We believe that these traditions reflect different and potentially useful ways of viewing the entire psychodiagnostic enterprise. We also believe that these traditions have contributed to more or less invisible barriers to communication being erected among clinicians who work, knowingly or not, within them. This state of affairs is unfortunate because "membership" in a tradition, as it were, is mainly determined by where one went to graduate school, with whom one studied, and the particular clinical setting in which one finds oneself operating. Further, we are convinced that, at the present time, there are some remarkably similar ideas being advocated about the nature of personality and its assessment among those working within several different traditions, and that recognition of these commonalities might advance the field as a whole (see Wiggins, in press).

We would venture to suggest that most readers will have familiarity with a few of these traditions, but not with some of the others. For example, those intimately familiar with the psychodynamic tradition are less likely to be familiar with the multivariate and empirical traditions, and vice versa. Those who are fond of obtaining life history data from interviews may not be fully aware of the rich history of the personological tradition nor of the exciting recent developments that have occurred within that tradition. And many may not be aware that the interpersonal tradition has fostered a strong "underground movement" within clinical psychology for many decades.

4.12.2 PSYCHODYNAMIC TRADITION
4.12.2.1 Historical Background

David Rapaport (1911–1960) was a major systematist of psychoanalytic theory, the originator of a now standard psychodiagnostic test battery, and a highly influential mentor of the principal architects of the psychodynamic tradition in personality assessment. After receiving his PhD from the Royal Hungarian University, he emigrated to the USA in 1938 and shortly thereafter joined the staff of the Menninger Clinic where he eventually became chief psychologist and head of the research department. During and shortly after World War II, the results of an extensive program of collaborative research were summarized in a two-volume Manual of diagnostic psychological testing (Rapaport, 1944–46) that eventually "revolutionized clinical psychology and influenced clinical psychologists the world over" (Gill & Klein, 1967, p. 18).

Rapaport moved to the Austen Riggs Center in 1948 and continued his extensive collaborations with colleagues at institutions such as the Menninger Foundation, Yale University Department of Psychiatry, and the Research Center for Mental Health at New York University. Speaking collectively for workers at these and other institutions, Roy Schafer (1967) observed, "All of us are working within the psychoanalytic psychodiagnostic tradition crystallized by David Rapaport" (p. 2).

4.12.2.2 Conceptual Framework

It is difficult to make a coherent set of distinctions among the many "psychodynamic" theories that have been proposed as alternatives to Freud's classical psychoanalytic theory (Westen, 1990). It would seem to be generally true, however, that since, and even before, Freud's death, there has been an increased emphasis on the autonomous and adaptive functions of the ego and on the internal representation of self and others within the ego domain. The aspect of classical psychoanalysis that has been de-emphasized, or even abandoned by some, is the proposition that personality and its development are largely determined by unconscious libidinal and aggressive drives.

Rapaport (1960) must be counted among the most rigorous systematists of classical psychoanalytic theory and he was also a major contributor to the alternative perspective known as "ego psychology." In both his theoretical and assessment work, his principal interest was in thought processes and their course of development from drive-oriented to reality-oriented processes. In this context, Hartmann's (1939) concept of the independence (from drive) of the ego and its role in fostering active processes of adaptation were appealing to him, as was Erikson's (1950) emphasis on the cultural and historical contexts in which the ego develops. Rapaport (1960) developed and expanded the metaconcept of structure which he felt was the "prime requisite for progress toward dimensional quantification" (p. 98) and which he applied broadly to many aspects of psychological functioning and cognitive organization (see Gill & Klein, 1967).

The generally increased emphasis upon ego functions in postclassical psychoanalysis was also reflected in the currently important psychodynamic alternative of object relations theory. Proponents of this view challenged the classical idea that cathexes of external "objects" (including persons) served mainly as vehicles through which instinctual energies were discharged. Instead, it was postulated that early interactions with significant others ("objects") led to internalized representations of both others (Jacobson, 1954) and self (Kohut, 1971) that serve as "internal working models" (Bowlby, 1973) for later interpersonal relationships. From this perspective, the person is, from birth (Main, Kaplan, & Cassidy, 1985), an object seeker (Fairbairn, 1952) who establishes a "gratifying involvement" with other persons (Behrends & Blatt, 1985).

The shift in emphasis within psychoanalytic theory from drives to internalized representations has been accompanied by a corresponding shift in rationales for personality assessment, as exemplified by the ego-psychological approach of Allison, Blatt, and Zimet (1968) and the object-relations approach of Blatt and Lerner (1983).

4.12.2.3 Assessment Instruments

Prior to Rapaport's influential contributions in the 1940s, assessment psychologists were primarily technicians who administered IQ tests. Since that time, they have become clinicians who administer batteries of both structured and projective tests of personality and cognition. The multitest battery was advocated in view of the apparent complexity of personality and cognition and their interrelated functions, as well as for the purpose of gathering normative data that would shed light on those complexities (Rapaport, Gill, & Schafer, 1946). At Menninger, the composition of the battery reflected judgments regarding the potential of each instrument to yield measures that might be interpreted within the ego psychological framework of Rapaport and his associates.

The projective component of the original Menninger battery included the Rorschach Test (Rorschach, 1921), the Thematic Apperception Test (TAT; Morgan & Murray, 1935), and a locally constructed Word Association Test. The nonprojective component included the Bellevue Scale (Wechsler, 1941), the Babcock Test of mental efficiency (Babcock, 1933), the Sorting Test of concept formation (Goldstein & Scheerer, 1941), and the Hanfmann–Kasanin test of concept formation (Hanfmann & Kasanin, 1937). As this test battery evolved over a 20-year period, the Rorschach, TAT, and WAIS (Wechsler, 1958) became the more or less standard core of the psychodynamic test battery (e.g., Allison et al., 1968).

4.12.2.4 Interpretive Principles

The vast literature on interpretive principles associated with the psychodynamic paradigm
defies easy summarization. Rapaport's theoretical work (e.g., Gill, 1967) and the writings of those whom he influenced (e.g., Holt, 1967) are, understandably, highly technical in nature. Of greater help to the clinician are the case summaries (Allison et al., 1968; Prelinger & Zimet, 1964; Rapaport et al., 1946; Schafer, 1948) and the individual treatments of the Rorschach (Schafer, 1954), TAT (Holt, 1951), and WAIS (Blatt & Allison, 1981). The following statements regarding the projective hypothesis, levels of functioning, and psychological adjustment are meant only to convey some of the flavor of three, among many, important interpretive principles.

4.12.2.4.1 Projective hypothesis

The projective hypothesis states that "All behavior manifestations of the human being, including the least and the most significant, are revealing and expressive of his personality, by which we mean that individual principle of which he is the carrier" (Rapaport, 1942, p. 92). Thus, people's possessions – clothes, automobiles, furniture – are expressive of their personalities and reflect single acts of choice; in their totality, they reflect the organization of such choices. Responses to the ambiguous stimuli of projective tests may also be thought of in terms of "choice," although such choices are much less conscious or volitional in nature. Thus, responses to a Rorschach inkblot may be thought of as reflecting "choices" between forms, colors, shadings, and so forth, to which a subject imparts meaning through organization. Responses to a TAT card also involve both choice (e.g., which figure to identify with) and organization (e.g., sequence of events). Responses to intelligence and concept formation tests involving choice and organizational processes may be used as "nonprojective tests of personality," given an adequate theory of "functions underlying the reactions and achievements on these tests" (Rapaport, 1946, p. 228).

4.12.2.4.2 Levels of functioning

The psychoanalytic model of primary (pleasure principle) and secondary (reality principle) modes of thought is meant to account for both a developmental sequence and characteristics of the mature adult (Rapaport, 1951). Consequently, there is a continuum of adult psychological functioning that may be assessed with an appropriate battery of tests.

This continuum ranges from functioning in situations which put a premium on highly logical, reality-oriented secondary modes of thought (WAIS) to those which allow for more personal, less conventionally constrained thinking (TAT) and finally those which allow for considerably novel, personalized, and regressive modes of thinking (Rorschach). (Allison et al., 1968, p. vii)

Assessment of level of functioning has been greatly facilitated by Holt's innovative procedures for assessing primary and secondary process in the Rorschach (Holt & Havel, 1960).

4.12.2.4.3 Psychological adjustment

The Rorschach, TAT, and WAIS may be employed to assess both adaptive capacities and impairments in psychological functioning and to identify the functions impaired in different psychiatric diagnostic groups. Adjustment and maladjustment may be assessed with reference to the following postulated sequence:

Certain patterns of defense mechanisms are adopted and these determine specific strengths and weaknesses in psychological functioning which then become characteristic of the adjustment of the personality; with the onset of maladjustment, an exaggeration or breakdown in these strengths and weaknesses characteristic for that maladjustment occurs which can be measured; this leads to a diagnostic differentiation. (Rapaport, Menninger, & Schafer, 1947, p. 249)

4.12.2.5 Applications and Current Status

Since the late 1960s, theory and method within the psychodynamic paradigm of personality assessment have evolved into an object relations perspective that is highly compatible with contemporary formulations of social cognition, information processing, attachment research, and ego development (see Westen, 1990). Sidney Blatt and his associates at Yale have been primarily responsible for this paradigm shift. From the early 1950s until his untimely death in 1960, Rapaport contributed to the development of ego psychology and its application to personality assessment; since the early 1970s, Blatt has contributed to the development of object relations theory and its application to current personality assessment methods.

"The study of the representation of the human form on the Rorschach is an ideal data base for assessing an individual's representational world – his conception of people, including himself, and their actual and potential interactions. The representation of people, that is, object representations, have both structure and content" (Blatt & Lerner, 1983, p. 8). The structural aspects of object representations are
emphasized in the Rorschach scoring system developed by Blatt, Brenneis, Schimek, and Glick (1976) that provides a developmental analysis of object representations in terms of such categories as differentiation, articulation, and integration of object and action. The content and affective themes of object representations are emphasized in the Rorschach scoring system developed by Mayman (1967) that emphasizes phenomenological dimensions such as affect states, ego states, experience of self, and sense of identity.

An object relations scoring system for the TAT has also been developed by Westen (1991) in which stories are rated for complexity of representations of people, affective tone of relationship paradigms, capacity for emotional investment, and understanding of social causality. A similar scoring system has been developed for the Picture Arrangement subtest of the WAIS (Westen, 1991, p. 72). In conclusion, the three major instruments of the original Menninger battery continue to show promise under a revised conceptual orientation that is most compatible with current thinking in personality, social, clinical, and developmental psychology.

4.12.3 PERSONOLOGICAL TRADITION
4.12.3.1 Historical Background

The "storied" nature of human conduct (Sarbin, 1986) has been recognized for many centuries within the disciplines of history, literature, and most of the social sciences. Within the much more recent personological tradition in personality assessment, the person is taken to be the basic unit of observation and the person's life story is considered to be the preferred way in which the person is to be understood (McAdams, 1994). The origins of the personological tradition, and in fact the origins of academic personality psychology itself, can be traced most directly to Harvard University in the 1930s. At that institution, Gordon Allport (1937) wrote the first textbook on personality, which defined the person as the basic unit of observation. Subsequently, he emphasized the use of personal documents in psychobiography (Allport, 1942, 1965) and spent a lifetime pondering the question of "How shall a psychological life history be written?" (Allport, 1967). At the Harvard Psychological Clinic, Murray (1938) introduced a multiform organismic method for studying "lives in progress" (White, 1952), in which an interdisciplinary team of investigators applied a variety of assessment procedures to the study of individuals over the course of their lives. During World War II, a variant of this procedure was employed in the selection of intelligence agents and saboteurs (Office of Strategic Services, 1948), and subsequently this approach spawned a variety of successful peacetime assessment programs (e.g., MacKinnon, 1975; Stern, Stein, & Bloom, 1956).

Murray's enduring contributions to the personological paradigm go well beyond those just mentioned. During his 60 years at Harvard, Murray was a source of personal inspiration for an extraordinarily talented and diverse group of students, colleagues, and visitors who, over the years, have applied and expanded his "personology" up until the present time. Early associates at the clinic included Samuel Beck, Erik Erikson, Jerome Frank, Daniel Levinson, Donald MacKinnon, Silvan Tomkins, and Robert White. Dan McAdams and William McKinley Runyan are more recent products of this Harvard tradition.

4.12.3.2 Conceptual Framework

Murray's (1938) personology provided a set of orienting attitudes that guided the multiform (many assessors, many tests), organismic (holistic) approach to the study of lives over time (e.g., White, 1966, 1975). Murray's (1959) conceptual framework was meant to serve as "the scaffold of a comprehensive system," rather than as a completed formal theory. Nevertheless, Murray (1938) introduced a number of concepts that have subsequently proven useful in guiding the study of lives. By taking the life cycle of the individual as the largest unit of study, Murray was committed to studying personality "the long way" (White, 1981), and he introduced units of time varying in duration and complexity from the proceeding (single episode) to the unity-thema (central life motif). A thema may be understood in terms of the interaction of needs (for which Murray developed his famous taxonomy) and press (environmental facilitation or obstruction of a need). The TAT (Murray, 1943) was developed as a projective measure of need–press interaction and has been utilized in large-scale studies of needs for achievement (McClelland, 1961), power (Winter, 1973), and intimacy (McAdams, 1989).

Erik Erikson's (1950) psychodynamic theory of personality development provided a more explicit account of the stages of psychosocial development over the entire life span (from the original psychosocial crisis of "trust versus mistrust" to the final crisis of "ego-integrity versus despair"). His theory informed his own classic psychobiographies, such as those of
Martin Luther (Erikson, 1958) and Gandhi (Erikson, 1969), as well as contributing to more recent psychobiographical efforts (e.g., Stewart, Franz, & Layton, 1988). More recently, in a book entitled The stories we live by: Personal myths and the making of the self, McAdams (1993) presented a neo-Eriksonian theory of identity development that is enriched by more contemporary concepts such as image, prototype, and script, and that is guided by the metatheoretical concepts of agency and communion (Bakan, 1966). This work has been hailed as “the most original and important new book on personality theory since George Kelly's Psychology of personal constructs” (Hogan, 1994, p. 356). The heuristic potential of McAdams' conceptual framework is suggested by the quality and quantity of critical response it has elicited from personality psychologists representing diverse theoretical perspectives (see McAdams, 1996 and peer commentary therein). Although McAdams' framework is more formal, explicit, and testable than earlier efforts, its roots in the formulations of Allport and Murray are easily traceable (McAdams, 1994).

4.12.3.3 Assessment Instruments

In what is now widely recognized as the definitive general treatise on life histories and psychobiography, Runyan (1982) observed:

It should be clear that there is no single life history method, any more than there is a single personality research method, and that life histories may be studied through phenomenological self-reports, archival research, prospective longitudinal research, and experimental research. (p. 6)

Personality assessment psychologists from Murray to McAdams have recognized the need for methodological pluralism (Craik, 1986) in the study of lives, and for that reason the “boundaries” of the personological paradigm have historically been somewhat more permeable than those of the other paradigms considered in this chapter. Nevertheless, it would be inaccurate to characterize this paradigm as “eclectic,” because the diversity of methods employed reflects the different levels of analysis encompassed by the term “life history,” rather than the selective use of theories and methods to “explain” a given life. Following Kluckhohn and Murray (1953), Runyan (1982) distinguished three levels of generality in the social sciences: the universal (general laws of human behavior); the group (differences in sex, social class, culture); and the individual (distinctive, distinguishing characteristics of a particular person) and surveyed attempts to establish acceptable criteria for interpretations at each level. Although personologists in general, and clinicians in particular, are primarily concerned with the individual, an awareness of the larger enterprise that Runyan portrays serves to clarify the nature and goals of individual personality assessment. And for those few who might seriously be considering becoming psychobiographers, Alan Elms' (1994) book is essential reading.

The 25 different assessment procedures employed for the study of individuals at the Harvard Clinic (Murray, 1938) were indeed “multiform” in terms of both methods and the assessors who employed them; for example, autobiography (Murray), hypnosis (White), level of aspiration (Frank), dramatic productions (Erikson), and Rorschach (Beck). Perhaps of greatest interest to the contemporary clinician is the case study method in which some form of interview is employed with all the attendant advantages and limitations of retrospective, introspective, and qualitative methods (see Runyan, 1982, chap. 8). The procedures employed by McAdams (1993, chap. 10) provide an excellent example of a theory-driven approach to the interview.

4.12.3.4 Interpretive Principles

The variety of personality research methods for studying an individual's life history is reflected in a similar variety of interpretive principles associated with different methods. In this context, the interpretive principles elucidated by Alexander (1990) would appear to be “mainstream” in the sense that they evolved directly from the framework of Murray and his co-workers (especially Tomkins, 1947, 1979) at the Harvard Psychological Clinic.

In analyzing data from directed interviews or autobiographical essays, Alexander (1990) looks for recurring dynamic sequences (scripts, themas, guiding messages) that may be revealed by “letting the data speak” or by “asking the data a question.” In the former method, the typically large data set may be reduced by identifying “significant” material according to the following nine criteria of salience: (i) primacy (that which occurs first), (ii) frequency (that which occurs often), (iii) uniqueness (that which is unusual or odd), (iv) negation (that which is denied or disavowed), (v) emphasis (that which is either overemphasized or underemphasized), (vi) omission (that which is missing by normative standards or by implication), (vii) error or distortion (factual errors and Freudian slips), (viii) isolation (that which does
not “fit” and non sequiturs), and (ix) incompletion (that which is not finished or lacks closure).

The method of “asking the data a question” is similar to that employed in the analysis of TAT protocols for predetermined categories such as “power motivation” (Winter, 1973). The database of interview material is reduced by selecting every sequence or incident related to the questions posed by the investigator. When applied to interview material, this method retains the “projective” advantage of addressing issues that the storyteller did not consciously intend to describe or deal with, while circumventing problems such as the identification of the “hero” of the narrative (Alexander, 1990).

4.12.3.5 Applications and Current Status

In recent years, there appears to have been a “back to basics” movement in personality assessment research (Wiggins & Pincus, 1992) that has rekindled interest in the fundamental assumptions of several paradigms, including the personological: “Once, again, it is okay to study the 'whole person.' Better, contemporary personologists insist, as did pioneers like Gordon Allport, that such an endeavor is the personologist's raison d'être” (McAdams, 1988, p. 1).

In terms of clinical applications, it should be borne in mind that the personological tradition has tended to focus on relatively normal, nonclinical samples and that within this tradition, life histories have been viewed as relatively veridical accounts of what has happened in a person's life (Runyan, 1982) or as imaginative reconstructions of the past (McAdams, 1993). It is also important to recognize that personality assessment within the personological paradigm occurs on a different level than traditional psychodiagnostic assessment (Alexander, 1990). Whereas personological assessment attempts to understand current personality functioning in terms of an individual's own life history, traditional psychodiagnostic assessment is concerned with determining the individual's membership in a group or class of individuals that differs from other groups in terms of psychopathology.

A major difference between the two approaches is that the results of the former [personological] can easily be directed toward answering the questions intended by the latter [psychodiagnostic]. The reverse is, unfortunately, a low probability event in anything other than a global sense. To designate someone as obsessive or hysteric, or depressive, or schizophrenic will place the focus on particular salient aspects of functioning but say little about the dynamics in that individual leading to that particular form of functioning. (Alexander, 1990, p. 7)

4.12.4 MULTIVARIATE TRADITION
4.12.4.1 Historical Background

Sir Francis Galton's (1888) method for analyzing the “co-relations” between twin pairs was refined by Pearson (1896) and utilized effectively by Spearman (1904) in identifying a “general factor” of intelligence (g) that appeared to “underlie” various tests of mental abilities. Factors are statistical abstractions that summarize the relations among test scores in terms of underlying or “latent” variables which may be thought of as operating at several levels. For example, the intelligence quotient (IQ) could be considered to be a general higher-order factor that summarizes relations between verbal and performance major group factors, which in turn, summarize relations among multiple minor group factors (Vernon, 1950).

Raymond B. Cattell, the founding father of the multivariate tradition of personality assessment, was a student of Spearman and was among the first to apply factor-analytic methodology to the study of temperament (e.g., Cattell, 1933). Using cluster-analytic procedures, Cattell (1943) reduced Allport and Odbert's (1936) exhaustive compilation of trait-descriptive terms from Webster's unabridged dictionary to 35 “surface traits” (see John, 1990). He administered the 35 surface-trait variables (e.g., talkative vs. silent) to respondents in a series of peer-rating studies and eventually concluded that no less than 16 “source traits” could account for the interrelations among the 35 surface clusters (Cattell, 1949).

The Guilfords (1936, 1939) were also pioneers in the application of factor analysis to personality data, and they assembled evidence for the four interpretable factors of “shyness,” “rhathymia,” “depression,” and “liking for thinking” which would now be called “extraversion,” “conscientiousness,” “neuroticism,” and “intellect,” and which (together with “agreeableness”) constitute the currently important five-factor model (FFM) of personality (Digman, 1996). A decade later, Eysenck (1947) factored the intercorrelations among presenting symptoms of psychiatric patients and found two factors (extraversion and neuroticism) that had both theoretical (Jung, 1971) and empirical (MacKinnon, 1944) precedents (see Eysenck, 1990, pp. 95–98). By the 1950s, the systems of Eysenck (1953), Cattell (1957), and Guilford (1959) had become extensive alternative theories of personality structure.

The history of the multivariate tradition is a curious one that has recently been reconstructed
by Digman (1996). Briefly, Digman argues that as early as the 1930s, there was evidence for the generality and replicability of a FFM of personality that consisted of surgency/extraversion (E), agreeableness (A), conscientiousness (C), neuroticism (N), and intellect/openness (O). Over a period of almost 50 years, a series of cumulative studies may be traced (e.g., Fiske, 1949; Norman, 1963; Tupes & Christal, 1961) that led to an emerging consensus on the utility and generalizability of this five-factor representation (e.g., Digman, 1979; Goldberg, 1977; Wiggins, 1973b). By the 1980s, there were three distinctive interpretations of the nature of these five factors (Costa & McCrae, 1985; Goldberg, 1981; Hogan, 1986), that were followed in the 1990s by still more interpretive perspectives (e.g., Buss, 1996; Wiggins & Trapnell, 1996). It should be noted, however, that although there are a number of workers within the multivariate paradigm who agree on the number, if not the nature, of the factors in the FFM, there are many who strongly contest both the number of factors and their significance (e.g., Block, 1995; Eysenck, 1992; Hough, 1992; Waller & Ben-Porath, 1987).

4.12.4.2 Conceptual Framework

Cattell's (1957) master plan for constructing a theory of personality presupposed a rigorous and representative sampling of the language personality sphere that, in effect, defined the universe of content of behaviors to be explained (Wiggins, 1984). To make certain that this universe of content was properly defined, Norman (1967) replicated the earlier lexical research of Allport and Odbert (1936), using a more recent unabridged dictionary, and subsequently employing more rigorous clustering procedures than those used by Cattell (see John, 1990). Norman's taxonomy of personality attributes was subsequently employed by Goldberg (1981, 1990, 1993) in a program of methodologically elegant research on the natural language of personality that firmly established what he termed the “Big Five” dimensions identified by earlier investigators.

Goldberg's taxonomic research program revitalized interest in the five-factor model and within a relatively brief period of time a variety of new instruments were developed (e.g., Botwin & Buss, 1989; Costa & McCrae, 1985; Goldberg, 1992; Hogan, 1986; Trapnell & Wiggins, 1990) that measured these five dimensions within an equal variety of conceptual frameworks, for example, evolutionary theory (Buss, 1996), trait theory (McCrae & Costa, 1996), lexical theory (Saucier & Goldberg, 1996), socioanalytic theory (Hogan, 1996), and interpersonal theory (Wiggins & Trapnell, 1996). Moreover, a series of investigations by Costa and McCrae revealed that instruments employed in the other four traditions of personality assessment – psychodynamic (Myers & McCaulley, 1985), personological (Jackson, 1984), empirical (Hathaway & McKinley, 1983), and interpersonal (Wiggins, Trapnell, & Phillips, 1988) – showed meaningful convergences with some or all of the dimensions of the FFM (Wiggins & Trapnell, 1997).

4.12.4.3 Assessment Instruments

The NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992b) is the instrument of choice for measuring the dimensions of the FFM of personality. This test provides global measures (domain scores) for each of the five factors and six more specific measures (facet scores) within each of the five domains: (i) neuroticism (anxiety, angry hostility, depression, self-consciousness, impulsiveness, vulnerability); (ii) extraversion (warmth, gregariousness, assertiveness, activity, excitement-seeking, positive emotions); (iii) openness (fantasy, aesthetics, feelings, actions, ideas, values); (iv) agreeableness (trust, straightforwardness, altruism, compliance, modesty, tender-mindedness); and (v) conscientiousness (competence, order, dutifulness, achievement striving, self-discipline, deliberation). The “astonishingly fruitful research collaboration” (Block, 1995) of Costa and McCrae has produced scores of publications that document the substantive, structural, and empirical validity of this instrument.

4.12.4.4 Interpretive Principles

There are notable differences in interpretive principles suggested by proponents of different multivariate instruments and these would appear to reflect different claims as to the extensiveness and importance of the “universe of content” measured by each instrument, and the theoretical basis on which that content is interpreted for each instrument. With respect to the former, the differences among alternative theories of personality structure within the multivariate tradition have often reflected preferences for different methods of factoring correlations among the phenotypic variables of personality study. Thus, Cattell, who argued for 16 primary source traits, as well as the originator of the multiple-factor method himself (Thurstone, 1934), have both been accused of “overfactoring” (Digman, 1996); while Eysenck (1992), who disputes anything beyond three factors, has been accused of “underfactoring” (Costa & McCrae, 1992a). That almost half a
century of essentially methodological disagreements may be approaching resolution can be seen in the emerging consensus that the FFM of personality is an appropriate “working model” of the universe of content of personality structure that may (and should) be expanded, as necessary, whenever the incremental validity of additional dimensions is demonstrated (Wiggins, 1996).

The theoretical bases on which a five-factor profile may be interpreted are of greater interest to the practitioner. And, as is true of a number of instruments that originated in other personality assessment traditions, it is possible to interpret the FFM from a variety of theoretical perspectives. The NEO PI-R (Costa & McCrae, 1992b) provides explicit principles for interpreting normal and abnormal behavior from the standpoint of modern trait theory (McCrae & Costa, 1996). The Hogan Personality Inventory (Hogan & Hogan, 1992) provides explicit interpretive principles for personnel selection in the workplace from the standpoint of socioanalytic theory (Hogan, 1996). A new edition of the venerable Sixteen Personality Factor Questionnaire (Cattell, Cattell, & Cattell, 1993) provides interpretive principles for five “global factor scales” (Cattell, 1994) from the standpoint of a long-standing “mainstream” personality theory (Wiggins, 1984); and the possibilities for interpreting other well-established instruments, such as the Personality Research Form (Jackson, 1984), from an FFM perspective, are under active investigation (e.g., Jackson, Paunonen, Fraboni, & Goffin, 1996).

4.12.4.5 Applications and Current Status

Historically, there has been little consensus regarding the relation between the normal dimensions of personality, measured by inventories developed within the multivariate tradition, and the dimensions and/or categories of psychopathology. Recently, however, there has been an increased conceptual and empirical focus on this issue, stimulated in part by the definition of personality disorders in terms of personality traits by the American Psychiatric Association (APA; 1980) and by the availability of new instruments and techniques for addressing this problem (Strack & Lorr, 1994).

It would appear that the global dispositions measured by the NEO PI-R provide a comprehensive assessment of emotional (N), interpersonal (E & A), motivational (C), and cognitive (O) styles that are not measured by traditional clinical instruments. As Costa (1991b) noted, “This portrait can be used in psychotherapy: to formulate a tentative diagnosis; to aid the clinician's development of empathy; to help select appropriate treatments; to identify the client's strengths; and to anticipate the outcome, duration, and course of therapy” (p. 395). A detailed and perceptive account of how this might be done has been provided by Miller (1991). The notable longitudinal stability of NEO PI-R dimensions in the adult personality (McCrae & Costa, 1990) has led to a revised view of the therapeutic enterprise:

we expect that clients will bring these dispositions with them to psychotherapy, and that the therapist should (a) take them into account when trying to understand the individual and his or her problems, and (b) tailor the therapy to fit the needs and styles of the client. (McCrae, 1991, p. 406)

The most extensive psychodiagnostic application of the FFM in general, and of the NEO PI-R in particular, has been to the diagnosis and understanding of personality disorders (Costa & Widiger, 1994). The definition of these disorders in terms of dysfunctional personality traits, by a committee of the APA (1980), stimulated an unprecedented collaborative effort among psychiatrists, psychologists, and psychometricians to find common grounds for characterizing these disorders. It was found, for example, that both the APA diagnostic criteria and the clinical literature on personality disorders were compatible with descriptions of each disorder generated from the facet and domain scales of the NEO PI-R (see Widiger, Trull, Clarkin, Sanderson, & Costa, 1994, Table 1, p. 42).

Subsequent advances in relating the FFM to the personality disorders are too numerous to summarize, but may be found in current journals, particularly the Journal of Personality Disorders. The current literature may be viewed as reflecting yet another milestone in the long-standing relation between the FFM and the multivariate paradigm, which Digman (1994) characterized as follows:

Now, after many years of lying on the closet shelf of personality theory, the model has been dusted off, “as good as new,” and appears to be for many researchers . . . a very meaningful theoretical structure for organizing the myriad specifics implied by the term personality. (p. 13)

4.12.5 EMPIRICAL TRADITION
4.12.5.1 Historical Background

The MMPI (Hathaway & McKinley, 1943) is, and has been for many years, the most widely used inventory of personality and psychopathology. To understand the 50-year history
of this instrument is to understand the history of objective personality assessment for the same time period. As Craik (1986) put it, in his historical survey of personality research methods, “the MMPI came to serve as the centerpiece of this period's [post-WWII] predominant or mainstream agenda” (p. 21). Within that agenda, one can discern the operation of two separate dialectical processes over time: ongoing disputes between the developers of the MMPI and their critics and bipolar shifts in conceptualization that have occurred within the evolving conceptual and interpretive frameworks of the developers of the MMPI. The ongoing disputes reflect the fact that, because of its prominence, the MMPI came to be associated with such contentious issues as clinical versus statistical prediction (Meehl, 1954) and response styles (Wiggins, 1962). The shifts in conceptualization reflect the virtuosity and intellectual flexibility of the developers of the MMPI who were able to shift from typological categories to continuous trait dimensions; from discriminant validity of differential diagnoses to the construct validity of scales and profiles; and from denigration of self-reports to the canonization of item content (Ben-Porath, 1994).

The dialectical processes just described were enacted by a remarkably diverse group of scientists/practitioners whose program of clinical research had its roots in the unique combination of behavioral, biological and psychometric thinking that prevailed at the University of Minnesota in the 1930s and 1940s (Meehl, 1989). Starke R. Hathaway was an originator of the MMPI, a mentor for successive generations of Minnesota students and a merciless critic of his own instrument (e.g., Hathaway, 1972). Paul E. Meehl – learning theorist, philosopher of science, psychoanalyst, and taxonometrician – is a legend in his own time, the principal theorist of the MMPI, and arguably the major figure in the field of personality assessment since the late 1950s. W. Grant Dahlstrom is the Talmudic scholar of the MMPI, whose meticulous organizations of that vast literature have informed and influenced generations of clinicians and researchers (e.g., Dahlstrom & Dahlstrom, 1980; Dahlstrom, Welsh, & Dahlstrom, 1972–1975; Welsh & Dahlstrom, 1956). James N. Butcher is a distinguished editor, scholar, and researcher who is principal author of the revised MMPI (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), the new MMPI content scales (Butcher, Graham, Williams, & Ben-Porath, 1990), the adolescent version of the MMPI (MMPI-A; Butcher et al., 1992), and the MMPI automated clinical report (Butcher, 1993).

4.12.5.2 Conceptual Framework

The rationales underlying four distinguishable strategies of objective (as opposed to projective) test construction are based on differing views regarding the meaning of subjects' responses to objective test items (Wiggins, 1973b):

(i) The rational (correspondence) strategy. This assumes a one-to-one correspondence between a subject's “report” (“I am anxious”) and a palpable internal state of the subject (the experience of anxiety) (Buchwald, 1961). This assumption was the cornerstone of introspectionism in early experimental psychology and it was the principal rationale underlying the earliest personality inventories (e.g., Woodworth, 1917).

(ii) The empirical (instrumental) strategy. This views a subject's “utterance” as “an intrinsically interesting and significant bit of verbal behavior, the nontest correlates of which must be discovered by empirical means” (Meehl, 1945, p. 297, emphasis added). If it were established empirically that a false response to the item “I am anxious” was more characteristic of hospitalized hysterics than of a normal control group, that particular bit of verbal behavior would become meaningful, and would illustrate the instrumental value of verbal behavior as a tool in psychodiagnostics. This philosophy was evident in the early behavioristic critique of introspectionism in experimental psychology and in later critiques of existing rational personality inventories of the 1930s (Humm & Wadsworth, 1935; Landis & Katz, 1934; Landis, Zubin, & Katz, 1935). The earliest example of a major inventory constructed under an empirical strategy was the Vocational Interest Blank (Strong, 1927), which served as the prototype for the construction of the MMPI.

(iii) The constructive (substantive) strategy. This views a subject's response as a manifestation of an underlying personality construct that is embedded in an interlocking system of laws that relate constructs to one another and to observable properties of the environment (Cronbach & Meehl, 1955). The meaning of a construct is given by the empirical laws into which it enters and the significance of a construct is given by the number of such lawful relations discovered. The construct validity of a personality scale is estimated from “the proportion of test score variance that is attributable to the construct variable” (p. 289).

(iv) The interpersonal (self-presentational) strategy. This views a subject's response as an interpersonal communication between the subject and the tester (or the institution that the tester represents) (Carson, 1969b; Leary, 1957). The meaningfulness of such a communication is
dependent upon both the subject's and the tester's views of the meaning of test responses in a particular testing situation (e.g., hospital admission, vocational guidance). Thus, the subject may mean to communicate a complaint (“I am anxious”) and the tester (or the scoring system employed by the institution) may encode this as “neuroticism.” The subject may view the testing situation as an opportunity for self-disclosure (Jourard, 1964) and the examiner may view it as an opportunity for impression management (Goffman, 1959). Elsewhere it has been argued that the interpersonal view of test responses has much to commend it, not the least of which is the likelihood that this is the frame of reference which subjects themselves adopt (Wiggins, 1966).

4.12.5.3 Assessment Instruments

Construction of the MMPI began with the generation of a large pool of items that were considered representative of “behaviors of significance to the psychiatrist” (Hathaway & McKinley, 1940). These items were presented in true–false format to several groups of diagnosed psychiatric inpatients (e.g., depressives) and to a large group of normal control subjects. The proportion of patients in a given diagnostic group who responded “true” (“endorsement”) to each item (e.g., 80%) was compared with the proportion of normal subjects who had responded “true” to that item (e.g., 10%), in order to identify discriminating items that might be keyed (in this instance, “true”) on a clinical scale (e.g., depression scale; Hathaway & McKinley, 1942). Innovative procedures were then applied in the development of three validity scales that served to identify sources of invalidity or “faking” in test responding: a “lie scale” (L) with items of high desirability and low endorsement, an “infrequency scale” (F) with items of low desirability and low endorsement, and a subtle “correction scale” (K) that measures defensiveness when high and self-criticalness when low. The resultant three validity scales, eight clinical scales (hypochondriasis, depression, hysteria, psychopathic deviate, paranoia, psychasthenia, schizophrenia, hypomania), a masculinity/femininity scale, and a social introversion scale constitute the MMPI profile that was normed and standardized on the original sample of normal control subjects.

4.12.5.4 Interpretive Principles

4.12.5.4.1 Empirical strategy

The original purpose of the MMPI was to aid psychiatrists in their assignment of patients to the traditional psychiatric diagnostic categories of that time. The conservative rationale of the empirical strategy permitted such statements as “This subject's pattern of verbal behavior resembles more closely that of a group of hospitalized schizophrenics than it does that of a normal control group.” This fact, together with evidence from other sources, was suggestive of the working diagnosis of “schizophrenia.” Unfortunately, the individual clinical scales were not successful in subsequent attempts to discriminate diagnostic groups from normals (Hathaway, 1960), due most likely to lack of power in the original statistical comparisons and to unreliability of both predictor and criterion measures.

4.12.5.4.2 Constructive strategy

Ben-Porath (1994) provides an excellent historical summary of the “reinvention of the MMPI” following the failure of the empirical strategy and of its subsequent “evolution into an omnibus measure of personality” within a construct-oriented perspective. The highlights of this shift over a 50-year period of unprecedented research and clinical work include emphases upon: (i) MMPI profile configurations, rather than on single scales (e.g., Gough, 1946); (ii) normal personality correlates of profile types (e.g., Black, 1956); (iii) additional empirically and substantively derived scales that broadened the nomological network of MMPI investigation (e.g., Morey, Waugh, & Blashfield, 1985), and (iv) actuarial prediction systems based on configural profile types (e.g., Marks & Seeman, 1963).

4.12.5.4.3 Interpersonal strategy

Loevinger's (1957) formulation of the substantive component of construct validity provided an even greater contrast to the original empirical perspective of the MMPI. Nevertheless, workers within the MMPI tradition for the most part continued to pursue the other components of construct validity just described. Interest in the “content” of patients' communication of complaints via the MMPI remained minimal until the appearance of Wiggins' (1966) MMPI Content Scales. In terms of Loevinger's distinctions, these content scales appeared to provide: (i) representative coverage of the universe of content of the MMPI item pool (see Johnson, Butcher, Null, & Johnson, 1984), (ii) psychometrically sound measures of 13 dimensions of self-report that were interpretable with reference to previous studies of the factorial structure of the MMPI (Welsh, 1956), and (iii) empirical evidence of convergent and
discriminant validity with reference to psychiatric diagnostic categories (Wiggins, 1966; Payne & Wiggins, 1972). Perhaps of greater interest to the MMPI community was the potential usefulness of these content scales in individual psychodiagnostic appraisal (e.g., Nichols, 1987). Variants of the original content scales are now an official supplement to the clinical scales of MMPI-2 (Butcher et al., 1990) and their incremental contribution to the differential diagnosis of psychopathology appears highly promising (e.g., Ben-Porath, Butcher, & Graham, 1991).

4.12.5.5 Applications and Current Status

The MMPI developed in the 1940s was, by contemporary standards, based on inadequate and outdated norms, replete with offensive and anachronistic items, and patently unsuited to the diagnostic task for which it was devised. The clinical scales were derived by less than optimal contrasted group procedures (Jackson, 1971) that, from the beginning, appeared unreplicable (Benton, 1949). In the absence of a coherent theoretical rationale, the profiles, code-types, and actuarial systems based on these scales have an ambiguous conceptual status (Helmes & Reddon, 1993). Pressures to restandardize, revise, or replace this relic have existed almost from its inception, culminating in a historically important summit meeting (Butcher, 1972) in which the third option was seriously considered (Wiggins, 1973a). Once again, however, the resilient and perdurable Minneapolis Group (Butcher et al., 1989) rose to the challenges from within and without to produce what was optimistically dubbed “MMPI-2” (leaving open the possibility of MMPI-3 in the year 2035). Although the original MMPI is a disgruntled psychometric critic's dream come true, it is difficult to ignore the past and present status of the MMPI as our most widely used objective personality test. Given what appeared to be the lesser of two evils, the personnel of the MMPI Restandardization Project opted to emphasize continuity with the vast interpretive and empirical database accumulated for the original clinical and validity scales while simultaneously removing inappropriate items, correcting psychometric deficiencies where possible, and generating new content scales. Shorn of its dated and politically incorrect items, the lamb-like content of the MMPI-2 is now clothed in the lupine garments of computer technology and state-of-the-art methodology (e.g., uniform T-scores, adaptive testing, measurement of variable response inconsistency). MMPI-2 appears to be replacing the original in most clinical settings (e.g., Webb, Levitt, & Rojdev, 1993), although there continue to be criticisms from both without and within the MMPI community (e.g., Caldwell, 1997).

In 1972, Hathaway concluded: “So, in summary, we are stuck, I think, with the MMPI . . . for a dreary while longer, although a prophet may even now be wandering in the psychometric wilderness” (1972, p. 40). There is little doubt that we will be “stuck” with MMPI-2 for many years to come, although the paths of the prophets are now more discernible (e.g., Costa, 1991a; Morey, 1991).

4.12.6 INTERPERSONAL PARADIGM

4.12.6.1 Historical Background

The interpersonal circumplex model had its origins in the writings of Harry Stack Sullivan (1953) who introduced a number of original and radical ideas into the American psychiatry of the 1940s. Although not entirely liberated from Freud's spatial-hydraulic metapsychology, Sullivan was “behavioral” in the sense that his primary emphasis was upon the things that persons do to one another in interpersonal transactions, rather than upon internal processes that persons might “have.” His oft-cited definition of personality as “the relatively enduring pattern of recurrent interpersonal situations which characterize a human life” represents an even greater departure from psychoanalytic thought, as well as a unique perspective within personality theory itself.

Sullivan was influenced by Lewin's (1938) field-theoretical conceptualization of contemporaneous, bidirectional influences in “psychological fields,” which led him to define the interpersonal situation as the basic unit of observation in psychiatry. This idea does not connote a separate “person” or “persons” who may be considered independently from the complex field (situation) of bidirectional causalities in which they are embedded; nor does it hold out the hope of capturing such complexities as “interaction” terms in analyses of variance.

The Kaiser Foundation Research Project, in which Timothy Leary (1957) was the most pre-eminent investigator, attempted to operationalize Sullivanian concepts in terms of concrete measurement procedures. An ordinary language analysis of clinicians' observations of the things that patients did to each other (and to themselves) in group psychotherapy led to a taxonomy of interpersonal behaviors that appeared to be empirically well-captured by a circular arrangement of interpersonal variables organized around the coordinates of
“dominance” and “affiliation.” LaForge, Leary, Naboisek, Coffey, and Freedman (1954) derived the basic trigonometry of what would later be called a “circumplex,” without knowledge of Guttman's (1954) related work on that topic.

Notable contributions during the 1960s included the conceptualization of both maternal and child behavior within an interpersonal circumplex framework (Schaefer, 1961), a psychometrically sophisticated replication of the circumplex within a clinical population (Lorr & McNair, 1963), and a highly influential integration of the circumplex with the clinical, social, and experimental psychology of that time (Carson, 1969a). Alternative conceptual formulations of the interpersonal model were presented in the 1970s (e.g., Benjamin, 1974; Kiesler, 1979; Wiggins, 1979), and during the 1980s the model was applied to psychotherapy (Anchin & Kiesler, 1982), complementarity (Kiesler, 1983), and interpersonal problems (Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988). Among the contributions of the 1990s have been an updated and psychometrically-sound version of the original interpersonal checklist (Wiggins, 1995), a comprehensive exposition of contemporary interpersonal theory and research (Kiesler, 1996), and a presentation of the impressive variety of contexts in which the interpersonal circumplex model has been applied (Plutchik & Conte, 1997).

4.12.6.2 Conceptual Framework

The radical interpersonalism of Sullivan (1953) may be difficult to comprehend on first exposure because of our ingrained “individualist language” (p. 50) for describing personality as reflecting attributes of a discrete individual who is “separate” from others and from a shared social environment. Theorists within the interpersonal paradigm have attempted to operationalize Sullivan's conceptualization of personality in ways that avoid this individualistic bias, by defining personality as “nothing more (or less) than the patterned regularities that may be observed in an individual's relations with other persons, who may be real in the sense of actually being present, real but absent and hence ‘personified’ or ‘illusory’” (Carson, 1969a, p. 26).

On a metatheoretical level, it is helpful to think of the dominance and affiliation axes of the interpersonal circumplex in terms of Bakan's (1966) concepts of “agency” (mastery, self-assertion, self-expansion) and “communion” (union with a larger social or spiritual entity), respectively (Wiggins, 1991). These concepts are compatible with Sullivan's emphasis on the communal context of interpersonal transactions and with his belief in the importance of the social sciences for understanding interpersonal situations. When conjoined with Foa and Foa's (1974) theory of the exchange of love (communion) and status (agency) in interpersonal transactions, these concepts provide a rich conceptual framework for the measurement and understanding of interpersonal behavior (Wiggins & Trapnell, 1996).

4.12.6.3 Assessment Instruments

4.12.6.3.1 Interpersonal Adjective Scales

The Interpersonal Adjective Scales (IAS; Wiggins, 1995) evolved from a psychological taxonomy of trait-descriptive terms that was developed within the framework of a larger program of collaborative research on language and personality (Goldberg, 1977). Within a representative pool of trait-descriptive adjectives selected from an unabridged dictionary, an “interpersonal domain” was distinguished from other domains, such as characterological, temperamental, and cognitive domains (Wiggins, 1979). Approximately 800 interpersonal adjectives were assigned to the original categories of the interpersonal circumplex (Leary, 1957) on both conceptual and empirical grounds. Using computer-based multivariate procedures it was found that scales based on the original categories failed to meet certain circumplex criteria that could be better met with scales based on the revised categories that now constitute the IAS (Wiggins, 1995). The IAS consists of 64 adjectives that respondents rate for self-descriptive accuracy on an eight-place Likert scale ranging from “extremely inaccurate” to “extremely accurate.” Eight scales, or “octants,” of eight items each, assess the interpersonal dispositions listed in the first column of Table 1. The PA octant, for example, includes items such as “dominant,” “forceful,” and “assertive.”

4.12.6.3.2 Inventory of Interpersonal Problems

Horowitz (1979) transcribed statements of interpersonal problems expressed by psychiatric outpatients in the course of videotaped intake interviews and employed these statements as items in the construction of the Inventory of Interpersonal Problems (IIP), in which respondents are required to indicate the extent to which each of 127 statements is problematic on a five-place Likert scale ranging from “not at
all” to “extremely” (Horowitz et al., 1988). Subsequently, Alden, Wiggins and Pincus (1990) developed a circumplex version of this inventory (IIP-C) that consists of eight scales, of eight items each, that assess the interpersonal problems listed in the second column of Table 1. The PA octant, for example, includes items such as “I try to control other people too much.”

Table 1 Octant scales from three interpersonal assessment instruments.

      Interpersonal Adjective   Inventory of Interpersonal   Impact Message
      Scales (IAS)              Problems (IIP-C)             Inventory (IMI)

PA    Assured–dominant          Domineering                  Dominant
BC    Arrogant–calculating      Vindictive                   Hostile–dominant
DE    Cold-hearted              Cold                         Hostile
FG    Aloof–introverted         Socially avoidant            Hostile–submissive
HI    Unassured–submissive      Nonassertive                 Submissive
JK    Unassuming–ingenuous      Exploitable                  Friendly–submissive
LM    Warm–agreeable            Overly nurturant             Friendly
NO    Gregarious–extraverted    Intrusive                    Friendly–dominant

4.12.6.3.3 Impact Message Inventory

The Impact Message Inventory (IMI) is a highly original and promising method of assessment that is based on Kiesler's (1988) theory of interpersonal communication in psychotherapy. The theory postulates that disordered individuals are unaware of the unintended, inappropriate, and ambiguous messages they repetitively “send” to others and that they are therefore confused and distressed by the pattern of negative responses they consistently evoke or “pull” from others. The IMI attempts to identify the location within the interpersonal circumplex of these patterns of negative response evoked in others, as a means of gaining insight into a client's maladaptive transactional behavior.

Items were generated from a content analysis of free responses to 15 interpersonal vignettes that described characters enacting 15 different interpersonal styles, similar to those found in the first column of Table 1. Respondents were asked to imagine themselves in the company of each of these characters and to record their covert reactions using the stem, “He makes me feel . . .” Content analysis of responses suggested three categories of covert reaction: direct feelings, action tendencies, and perceived evoking messages. The Octant version of the IMI consists of six items for each of the target stimuli listed in the third column of Table 1. For each octant, there are two items for each of the three categories of covert reaction to the target stimulus. Thus, for example, the PA scale includes two items each for direct feelings (e.g., “bossed around”), action tendencies (e.g., “I want to tell him to give someone else a chance to make a decision”), and perceived evoking messages (e.g., “He thinks he's always in control of things”). Respondents (e.g., a psychotherapist) are asked to imagine themselves in the company of a particular person (e.g., a psychotherapy patient) and to indicate the extent to which they experience the covert reactions on a four-place scale. Covert reactions to a target person (e.g., feeling “bossed around”) are thus revealing of the personality and the interpersonal impact of that target person (e.g., rigidly dominant, PA).

4.12.6.4 Interpretive Principles

One of the most valuable features of the circumplex model is the opportunity it provides for interpreting interpersonal variables with reference to the geometric principles of a circle (LaForge et al., 1954). The variables of traditional multiscale inventories are typically displayed as “factor lists” (Hogan, 1983) in which the ordering and interrelations among variables lack both conceptual and interpretive significance. In contrast, the trigonometric procedures that can be applied to the variables of an interpersonal circumplex permit descriptions and diagnostic inferences that cannot be generated from traditional scales (Wiggins, Phillips, & Trapnell, 1989).

Some general principles of circumplex interpretation may be illustrated with reference to Figure 1, which presents the IAS profile of a 44-year-old woman who was employed as a senior bank manager. At the bottom of the figure are T-scores on the octant variables, expressed with reference to an appropriate normative group. These scores have been plotted on shaded sectors of the circle and they are interpreted as representing eight vectors in two-dimensional space. The mean or average directionality of these eight vectors is of critical diagnostic significance because it determines
the “typological” category to which this woman will be assigned. By trigonometric procedures, this average directionality was determined to be 90°, which falls exactly at the midpoint of the assured-dominant category (PA). Such an angular location is considered to be prototypical of individuals classified as pure “PA types.” Had the location been at 110°, a more hostile manifestation of dominance (BC) would have been suggested; had it been at 65°, a more affiliative expression of dominance (NO) would have been suggested. But the prototypicality of the present classification increases our confidence in asserting that the principal interpersonal style of this woman is one that emphasizes the exercise of power over others in a social context, by such activities as taking charge, making decisions, and winning arguments (Wiggins, 1995, p. 22).

The general shape of the profile in Figure 1 closely resembles the characteristic configuration of IAS profiles that is found in the average profiles of subjects in all IAS typological groups (Wiggins et al., 1989). The characteristic configuration of an IAS profile is one in which the principal elevation occurs on the defining octant (in this case, PA), with secondary and approximately equal elevations occurring on adjacent octants (BC and NO), and diminishing and approximately equal elevations occurring on subsequent pairs of octants (DE and LM), (FG and JK), down to a highly truncated “opposite” (to the principal) octant (HI). Thus, across interpersonal situations, we would expect this woman frequently to behave in a forceful, assertive, dominant, and self-confident manner (PA); to somewhat less frequently behave in aggressive (BC) and gregarious (NO) ways; to seldom, if ever, behave in a submissive (HI) fashion, and so forth.

The vector length of an interpersonal profile is a measure of “deviance” in both a statistical and a psychiatric sense. In the former sense, vector length is the standard deviation of the eight interpersonal variables that indicates the “intensity” with which a pattern of interpersonal behavior is expressed. In the latter sense, high-variance profiles are associated with interpersonal “rigidity” and are found most often in psychiatric groups. The length of the arrow indicating angular location in Figure 1 is the vector length of the profile, which is approximately 1.8 standard deviations above that of the normative group. This would suggest that the pattern of interpersonal behaviors described above would be expressed vigorously, and quite possibly rigidly or intemperately.

Having established that this individual's profile is representative of those obtained by assured-dominant “types,” and that this pattern of behavior is expressed in a clearly differentiated fashion, interpretation would proceed with reference to the empirical literature of the interpersonal tradition that has examined both the dimension of assured-dominant (PA) and the characteristics of dominant “types” in both normal and psychiatric populations (Kiesler, 1996). This would include the literature of other instruments that have studied this location (PA) in contexts such as interpersonal problems (IIP-C), impact messages (IMI), and other contexts that have been studied by interpersonal researchers.

4.12.6.5 Applications and Current Status

The definition of the personality disorders in terms of personality traits by the APA (1980), and the availability of new instruments and techniques for assessing personality disorder, have resulted in an increased understanding of this diagnostic axis. For example, many of the Diagnostic and statistical manual of mental disorders (DSM) personality disorder categories have been shown to be well captured by the two-dimensional structures of the IAS (e.g., Wiggins & Pincus, 1989, 1994) and the IIP-C (e.g., Pincus & Wiggins, 1990; Soldz, Budman, Demby, & Merry, 1993). Similarly, clinicians' ratings of DSM personality disorder categories have been found to be well captured by an interpersonal circumplex model (e.g., Blashfield, Sprock, Pinkston, & Hodgin, 1985; Plutchik & Conte, 1986). And more recently, Benjamin (1995) has provided a detailed and perceptive description of procedures for the diagnosis of personality disorders within the framework of her variant of the interpersonal circumplex model.

The interpersonal tradition originated in the context of psychotherapy and, perhaps more than any other tradition, has contributed to an understanding of the therapeutic process itself (e.g., Anchin & Kiesler, 1982; Benjamin, 1995; Kiesler, 1988; Safran & Segal, 1990; Sullivan, 1953). The most notable contribution of this tradition to psychotherapy has been the circumplex structural model that provides a framework for representing dyadic interactions in the therapeutic relationship. A number of different assessment devices based on the interpersonal circumplex, and variants of that model (e.g., Benjamin, 1974), have proven useful in studies of psychotherapy process and outcome (Kiesler, 1996).

Gurtman (1996) has emphasized the construct validity of the circumplex version of Horowitz's Inventory of Interpersonal Problems (IIP-C; Alden et al., 1990) within the psychotherapy context. Henry (1996) has made
[Figure 1: circumplex diagram. Octant midpoints: LM (Warm–agreeable) 0°, NO (Gregarious–extraverted) 45°, PA (Assured–dominant) 90°, BC (Arrogant–calculating) 135°, DE (Cold-hearted) 180°, FG (Aloof–introverted) 225°, HI (Unassured–submissive) 270°, JK (Unassuming–ingenuous) 315°. Sector boundaries are marked at 22°, 67°, 112°, 157°, 202°, 247°, 292°, and 337°.]

Octant    PA  BC  DE  FG  HI  JK  LM  NO
T-score   65  56  52  40  30  41  49  60

Angular location = 90°; vector length T-score = 68.

Figure 1 IAS profile of a 44-year-old bank manager.
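
The trigonometric procedure described under Interpretive Principles can be illustrated with the octant T-scores printed beneath Figure 1. The following Python sketch is offered only as an illustration and is not the published IAS scoring routine. It assumes that each T-score is converted to a z-score as (T − 50)/10, that each octant is treated as a vector pointing at its midpoint angle, and that the eight vectors are simply summed; the function names `profile_vector` and `nearest_octant` are hypothetical. Because the normative data behind the reported vector-length T-score of 68 are not given in the text, the sketch returns only an unscaled resultant length.

```python
import math

# Octant midpoints in degrees, as labeled in Figure 1.
OCTANT_ANGLES = {"PA": 90, "BC": 135, "DE": 180, "FG": 225,
                 "HI": 270, "JK": 315, "LM": 0, "NO": 45}

# Octant T-scores reported beneath Figure 1.
FIGURE_1_T_SCORES = {"PA": 65, "BC": 56, "DE": 52, "FG": 40,
                     "HI": 30, "JK": 41, "LM": 49, "NO": 60}


def profile_vector(t_scores):
    """Return (angle in degrees, unscaled length) of the summed octant vectors."""
    x = y = 0.0
    for octant, t in t_scores.items():
        z = (t - 50) / 10  # assumption: T-scores have mean 50 and SD 10
        theta = math.radians(OCTANT_ANGLES[octant])
        x += z * math.cos(theta)  # affiliation (horizontal) component
        y += z * math.sin(theta)  # dominance (vertical) component
    angle = math.degrees(math.atan2(y, x)) % 360
    return angle, math.hypot(x, y)


def nearest_octant(angle):
    """Map an angle to the octant whose 45-degree sector contains it."""
    by_angle = {a: name for name, a in OCTANT_ANGLES.items()}
    return by_angle[(round(angle / 45) % 8) * 45]


angle, length = profile_vector(FIGURE_1_T_SCORES)
print(f"angular location = {angle:.1f} degrees, octant = {nearest_octant(angle)}")
```

Run on the Figure 1 scores, the sketch yields an angular location within about one degree of the reported 90° and assigns the profile to the PA octant. Any positive rescaling of the summed components would leave the angle unchanged, which is why the angular classification does not depend on the choice of length scaling assumed here.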

a similar case for the Structural Analysis of Social Behavior (Benjamin, 1974) as a common metric for programmatic psychotherapy research. Kiesler, Schmidt, and Wagner (1997) have stressed the conceptual advantages of the IMI in analyzing the psychotherapeutic relationship. And Kiesler (1996) has also emphasized the potential of the revised Check List of Psychotherapy Transactions (Kiesler, Goldston, & Schmidt, 1991) for measuring the interpersonal behavior of interactants in the therapy relationship. Taken together, the studies summarized in the aforementioned review articles constitute an impressive empirical literature that attests to the heuristic potential of the interpersonal circumplex model in psychotherapy research. The future of the interpersonal circumplex tradition for research in both psychodiagnostics and psychotherapy appears to be a bright one (Wiggins & Trobst, 1997).

4.12.7 CONCLUSION

In the introduction to this chapter, we considered the diagnostic work of a hypothetical clinician who was both well trained and
well informed about personality tests, but who was perhaps unaware of the conceptual and interpretive richness, and particularly of the distinctiveness, of the five personality assessment traditions just considered. Assuming that she persevered in reading through this highly condensed overview, there are a few “take home” messages that might now be conveyed to her.

First, in Kelly's (1955) terms, the five traditions arose and were developed within different “foci of convenience” and therefore are often most usefully applied within their specific areas of focus. The psychodynamic tradition of Freud arose from the revolutionary idea that the causes and reasons for behavior are frequently not apparent and that we must look “within” the individual, as it were, to discover the sources (often historical) of contemporary problems of living. The personological tradition arose in response to the question of “How shall a psychological life history be written?” The multivariate tradition arose as an extension of the multifactor assessment of intelligence to the personality domain, and was subsequently concerned with the determination of the basic descriptive dimensions of personality. The empirical tradition arose in the context of descriptive psychiatry, and was originally most concerned with the differential diagnosis of psychopathology. The interpersonal tradition was originally focused on the character and quality of interpersonal relationships and on what persons did to one another within such relationships.

Second, instruments developed or employed within the five traditions have been found to have different “ranges of convenience” (Kelly, 1955), that is, to be generally or specifically applicable to a particular spectrum of assessment problems. Here it must be reiterated that there is not a one-to-one relation between instruments and traditions in this respect. A few examples must suffice: The Rorschach test may be useful in assessing cognitive and perceptual deficits; the WAIS may be useful in detecting covert motivational conflicts; the MMPI item pool has proven a fruitful source for the construction of literally hundreds of scales serving different assessment purposes; the TAT, and its associated need-press framework, has proven useful within many different traditions, as has the NEO PI-R, and the IAS.

Finally, our hypothetical clinician should be aware of the armamentarium of conceptual orientations and instruments that may be brought to bear on a specific referral or assessment task. It is extremely unlikely that there is one test, or one orientation, that will be optimal for all assessment tasks. Practical limitations, especially managed care in the USA, often preclude the once-favored idea of a “standard battery” of tests. It would also not be fair to expect our hypothetical clinician to master all available orientations and tests. But perhaps she could become familiar with a few more. Attending a workshop or two (outside of her present area of expertise) might enhance the professional life of our hypothetical clinician.

ACKNOWLEDGMENTS

We would like to express our indebtedness to Yossef S. Ben-Porath, Sidney J. Blatt, Dan P. McAdams, David S. Nichols, and William McKinley Runyan for their helpful suggestions concerning earlier drafts of sections within their own areas of expertise. However, we accept full responsibility for any errors of fact or interpretation that remain in the present version.

4.12.8 REFERENCES

Alden, L. E., Wiggins, J. S., & Pincus, A. L. (1990). Construction of circumplex scales for the Inventory of Interpersonal Problems. Journal of Personality Assessment, 53, 521–536.
Alexander, I. E. (1990). Personology: Method and content in personality assessment and psychobiography. Durham, NC: Duke University Press.
Allison, J., Blatt, S. J., & Zimet, C. N. (1968). The interpretation of psychological tests. New York: Harper and Row (Reprinted in 1988 by Taylor & Francis).
Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt, Rinehart, & Winston.
Allport, G. W. (1942). The use of personal documents in psychological science. New York: Social Science Research Council.
Allport, G. W. (1965). Letters from Jenny. New York: Harcourt, Brace, & World.
Allport, G. W. (1967). Autobiography. In E. G. Boring & G. Lindzey (Eds.), A history of psychology in autobiography (Vol. 5, pp. 1–25). New York: Appleton-Century-Crofts.
Allport, G. W., & Odbert, H. S. (1936). Trait names: A psycho-lexical study. Psychological Monographs, 47 (Whole Issue No. 211).
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Anchin, J. C., & Kiesler, D. J. (Eds.) (1982). Handbook of interpersonal psychotherapy. Elmsford, NY: Pergamon.
Babcock, H. (1933). A short form of the Babcock examination for the measurement of mental deterioration. Chicago: Stoelting.
Bakan, D. (1966). The duality of human existence: Isolation and communion in Western man. Boston: Beacon Press.
Behrends, R. S., & Blatt, S. J. (1985). Internalization and psychological development through the life cycle. Psychoanalytic Study of the Child, 40, 11–39.
Benjamin, L. S. (1974). Structural analysis of social behavior. Psychological Review, 81, 392–425.
Benjamin, L. S. (1995). Interpersonal diagnosis and treatment of personality disorders (2nd ed.). New York: Guilford Press.
Ben-Porath, Y. S. (1994). The MMPI and MMPI-2: Fifty
years of differentiating normal and abnormal personality. In S. Strack & M. Lorr (Eds.), Differentiating normal and abnormal personality (pp. 361–401). New York: Springer.
Ben-Porath, Y. S., Butcher, J. N., & Graham, J. R. (1991). Contribution of the MMPI-2 content scales to the differential diagnosis of psychopathology. Psychological Assessment, 3, 634–640.
Benton, A. L. (1949). Review of Minnesota Multiphasic Personality Inventory. In O. K. Buros (Ed.), The third mental measurements yearbook (pp. 104–107). Highland Park, NJ: Gryphon Press.
Black, J. D. (1956). Adjectives associated with various MMPI codes. In G. S. Welsh & W. G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 151–172). Minneapolis, MN: University of Minnesota Press.
Blashfield, R., Sprock, J., Pinkston, K., & Hodgin, J. (1985). Exemplar prototypes of personality disorder diagnoses. Comprehensive Psychiatry, 26, 11–21.
Blatt, S. J., & Allison, J. (1981). The intelligence test in personality assessment. In A. I. Rabin (Ed.), Assessment with projective techniques (pp. 137–231). New York: Springer.
Blatt, S. J., Brenneis, C. B., Schimek, J. G., & Glick, M. (1976). The normal development and psychopathological impairment of the concept of the object on Rorschach. Journal of Abnormal Psychology, 85, 364–373.
Blatt, S. J., & Lerner, H. D. (1983). The psychological assessment of object representation. Journal of Personality Assessment, 47, 7–28.
Block, J. (1995). A contrarian view of the five-factor approach to personality. Psychological Bulletin, 117, 187–215.
Botwin, M. D., & Buss, D. M. (1989). Structure of act-report data: Is the five-factor model of personality recaptured? Journal of Personality and Social Psychology, 56, 988–1001.
Bowlby, J. (1973). Attachment and loss, Vol. 2: Separation. New York: Basic Books.
Buchwald, A. M. (1961). Verbal utterances as data. In H. Feigl & G. Maxwell (Eds.), Current issues in the philosophy of science (pp. 461–468). New York: Holt.
Buss, D. M. (1996). Social adaptation and five major factors of personality. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 180–207). New York: Guilford Press.
Butcher, J. N. (Ed.) (1972). Objective personality assessment: Changing perspectives. New York: Academic Press.
Butcher, J. N. (1993). User's guide for the Minnesota Clinical Report. Minneapolis, MN: National Computer Systems.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). The Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1990). Development and use of the MMPI-2 content scales. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R. P., Tellegen, A., Ben-Porath, Y. S., & Kaemmer, B. (1992). Minnesota Multiphasic Personality Inventory (MMPI-A): Manual for administration, scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.

In J. N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 41–53). New York: McGraw-Hill.
Cattell, H. E. P. (1994). Development of the 16 PF fifth edition. In S. R. Conn & M. L. Rieke (Eds.), The 16 PF fifth edition technical manual (pp. 3–20). Champaign, IL: Institute of Personality and Ability Testing.
Cattell, R. B. (1933). Temperament tests. II: Tests. British Journal of Psychology, 23, 308–329.
Cattell, R. B. (1943). The description of personality: Basic traits resolved into clusters. Journal of Abnormal and Social Psychology, 38, 476–506.
Cattell, R. B. (1949). The Sixteen Personality Factor Questionnaire (1st ed.). Champaign, IL: Institute of Personality and Ability Testing.
Cattell, R. B. (1957). Personality and motivation structure and measurement. Yonkers-on-Hudson, NY: World Book.
Cattell, R. B., Cattell, A., & Cattell, H. E. P. (1993). Sixteen Personality Factor Questionnaire (5th ed.). Champaign, IL: Institute of Personality and Ability Testing.
Costa, P. T., Jr. (Ed.) (1991a). Clinical use of the five-factor model of personality [Special series]. Journal of Personality Assessment, 57(3), 393–464.
Costa, P. T., Jr. (1991b). Clinical use of the five-factor model: An introduction. Journal of Personality Assessment, 57, 393–398.
Costa, P. T., Jr., & McCrae, R. R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., & McCrae, R. R. (1992a). Four ways five factors are basic. Personality and Individual Differences, 13, 653–665.
Costa, P. T., Jr., & McCrae, R. R. (1992b). NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., & Widiger, T. A. (Eds.) (1994). Personality disorders and the five-factor model of personality. Washington, DC: American Psychological Association.
Craik, K. H. (1986). Personality research methods: An historical perspective. Journal of Personality, 54, 18–51.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Dahlstrom, W. G., & Dahlstrom, L. E. (Eds.) (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis, MN: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972–1975). An MMPI handbook. (Vols. 1 & 2). Minneapolis, MN: University of Minnesota Press.
Digman, J. M. (1979, November). The five major domains of personality variables: Analysis of questionnaire data in light of the five robust factors emerging from studies of rated characteristics. Paper presented at the annual meeting of the Society of Multivariate Experimental Psychology, Los Angeles, CA.
Digman, J. M. (1994). Historical antecedents of the five-factor model. In P. T. Costa, Jr. & T. A. Widiger (Eds.), Personality disorders and the five-factor model of personality (pp. 13–18). Washington, DC: American Psychological Association.
Digman, J. M. (1996). The curious history of the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 1–20). New
Caldwell, A. B. (1997). Whither goest our redoubtable York: Guilford Press.
mentor, the MMPI/MMPI-2? Journal of Personality Elms, A. C. (1994). Uncovering lives: The uneasy alliance of
Assessment, 68, 47±68. biography and psychology. New York: Oxford University
Carson, R. C. (1969a). Interaction concepts of personality. Press.
Chicago: Aldine. Erikson, E. H. (1950). Childhood and society. New York:
Carson, R. C. (1969b). Interpretive manual to the MMPI. Norton.
References 367

Erikson, E. H. (1958). Young man Luther: A study in study of concept formation. Journal of Psychology, 3,
psychoanalysis and history. New York: Norton. 521±540.
Erikson, E. H. (1969). Ghandi's truth: On the origins of Hartmann, H. (1939). Ego psychology and the problem of
militant nonviolence. New York: Norton. adaptation. New York: International Universities Press.
Eysenck, H. J. (1947). Dimensions of personality. London: Hathaway, S. R. (1960). Forward. In W. G. Dahlstrom, G.
Routledge & Kegan Paul. S. Welsh, & L. E. Dahlstrom (Eds.), An MMPI
Eysenck, H. J. (1953). The structure of human personality. handbook: A guide to use in practice and research
New York: Wiley. (pp. vii±xi). Minneapolis, MN: University of Minnesota
Eysenck, H. J. (1990). Rebel with a cause. London: Allen. Press.
Eysenck, H. J. (1992). Four ways five-factors are not basic. Hathaway, S. R. (1972). Where have we gone wrong? The
Personality and Individual Differences, 6, 667±673. mystery of the missing progress. In J. N. Butcher (Ed.),
Fairbain, W. R. D. (1952). Psychoanalytic studies of the Objective personality assessment: Changing perspectives
personality: The object relation theory of personality. (pp. 21±43). New York: Academic Press.
London: Routledge & Kegan Paul. Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic
Fiske, D. W. (1949). Consistency of the factorial structure personality schedule (Minnesota): I Construction of the
of personality ratings from different sources. Journal of schedule. Journal of Psychology, 10, 249±254.
Abnormal and Social Psychology, 44, 329±344. Hathaway, S. R., & McKinley, J. C. (1942). A multiphasic
Foa, U. G., & Foa, E. B. (1974). Societal structures of the personality schedule (Minnesota): III. The measurement
mind. Springfield, IL: Charles C. Thomas. of symptomatic depression. Journal of Psychology, 14,
Galton, F. (1888). Co-relations and their measurement. 73±84.
Proceedings of the Royal Society, 45, 135±145. Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota
Gill, M. M. (Ed.) (1967). The collected papers of David Multiphasic Personality Inventory. Minneapolis, MN:
Rapaport. New York: Basic Books. University of Minnesota Press.
Gill, M. M., & Klein, G. S. (1967). The structuring of drive Hathaway, S. R., & McKinley, J. C. (1983). The Minnesota
and reality: David Rapaport's contributions to psycho- Multiphasic Personality Inventory manual. New York:
analysis and psychology. In M. M. Gill (Ed.), The Psychological Corporation.
collected papers of David Rapaport (pp. 8±34). New Helmes, E., & Reddon, J. R. (1993). A perspective on
York: Basic Books. developments in assessing psychopathology: A critical
Goffman, E. (1959). The presentation of self in everyday life. review of the MMPI and MMPI-2. Psychological
Garden City, NY: Doubleday Anchor. Bulletin, 113, 453±471.
Goldberg, L. R. (1977, August). Language and personality: Henry, W. P. (1996). The structural analysis of social
Developing a taxonomy of trait-descriptive terms. Invited behavior as a common metric for programmatic
address to the Division of Evaluation and Measurement psychotherapy research. Journal of Consulting and
at the 86th annual meeting of the American Psycholo- Clinical Psychology, 64, 1263±1275.
gical Association, San Francisco. Hogan, R. (1983). A socioanalytic theory of personality. In
Goldberg, L. R. (1981). Language and individual differ- M. M. Page (Ed.), 1982 Nebraska symposium on
ences: The search for universals in personality lexicons. motivation: PersonalityÐcurrent theory and research
In L. Wheeler (Ed.), Review of personality and social (pp. 55±89). Lincoln, NE: University of Nebraska Press.
psychology (Vol. 2, pp. 141±165). Beverly Hills, CA: Hogan, R. (1986). Hogan Personality Inventory manual.
Sage. Minneapolis, MN: National Computer Systems.
Goldberg, L. R. (1990). An alternative ªDescription of Hogan, R. (1994). Reinventing ourselves (Review of D. P.
personalityº: The Big-Five factor structure. Journal of McAdams, The stories we live by). Contemporary
Personality and Social Psychology, 59, 1216±1229. Psychology, 39, 355±356.
Goldberg, L. R. (1992). The development of markers for Hogan, R. (1996). A socioanalytic perspective on the five-
the Big-Five factor structure. Psychological Assessment, factor model. In J. S. Wiggins (Ed.), The five-factor
4, 26±34. model of personality: Theoretical perspectives
Goldberg, L. R. (1993). The structure of phenotypic (pp. 163±179). New York: Guilford Press.
personality traits. American Psychologist, 48, 26±34. Hogan, R., & Hogan, J. (1992). Hogan Personality
Goldstein, K., & Scheerer, M. (1941). Abstract and Inventory manual. Tulsa, OK: Hogan Assessment Sys-
concrete behavior: An experimental study with special tems.
tests. Psychological Monographs, 53(2) (Whole Issue No. Holt, R. R. (1951). The Thematic Apperception Test. In H.
239). A. Anderson & G. L. Anderson (Eds.), An introduction
Gough, H. G. (1946). Diagnostic patterns on the MMPI. to projective techniques (pp. 181±229). New York:
Journal of Clinical Psychology, 2, 23±37. Prentice-Hall.
Guilford, J. P. (1959). Personality. New York: McGraw- Holt, R. R. (Ed.) (1967). Motives and thought: Psycho-
Hill. analytic essays in honor of David Rapaport. Psycholo-
Guilford, J. P., & Guilford, R. B. (1936). Personality gical Issues, 5, Monograph 18/19.
factors S, E, and M, and their measurement. Journal of Holt, R. R., and Havel, J. (1960). A method for assessing
Personality, 34, 21±36. primary and secondary process in Rorschach responses.
Guilford, J. P., & Guilford, R. B. (1939). Personality In M. A. Rickers-Ovsiankina (Ed.), Rorschach psychol-
factors D, R, T, and A. Journal of Abnormal and Social ogy (pp. 263±315). New York: Wiley.
Psychology, 34, 21±36. Horowitz, L. M. (1979). On the cognitive structure of
Gurtman, M. B. (1996). Interpersonal problems and the interpersonal problems treated in psychotherapy. Jour-
psychotherapy context: The construct validity of the nal of Consulting and Clinical Psychology, 47, 5±15.
Inventory of Interpersonal Problems. Psychological Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureno, G.,
Assessment, 8, 241±255. & Villasenor, V. S. (1988). The Inventory of Interperso-
Guttman, L. (1954). A new approach to factor analysis: nal Problems: Psychometric properties and clinical
The radex. In P. R. Lazarsfeld (Ed.), Mathematical applications. Journal of Consulting and Clinical Psychol-
thinking in the social sciences (pp. 258±348). Glencoe, IL: ogy, 56, 885±892.
Free Press. Hough, L. (1992). The ªBig Fiveº personality variablesÐ
Hall, C. S., & Lindzey, G. (1978). Theories of personality construct confusion: Description versus prediction. Hu-
(3rd ed.). New York: Wiley. man Performance, 5, 139±155.
Hanfmann, E., & Kasanin, J. (1937). A method for the Humm, D. G., & Wadsworth, G. W. (1935). The Humm-
368 Principles of Personality Assessment

Wadsworth temperament scale. Journal of Applied Journal of Educational Psychology, 26, 321±330.
Psychology, 92, 163±200. Leary, T. (1957). Interpersonal diagnosis of personality.
Jackson, D. N. (1971). The dynamics of structured New York: Ronald.
personality tests: 1971. Psychological Review, 78, Lewin, K. (1938). The conceptual representation and
229±248. measurement of psychological forces. Durham, NC: Duke
Jackson, D. N. (1984). Personality Research Form manual University Press
(3rd ed.). Fort Huron, MI: Research Psychologists Press. Loevinger, J. (1957). Objective tests as instruments of
Jackson, D. N., Paunonen, S. V., Fraboni, M., & Goffin, psychological theory. Psychological Reports, 3, 635±694.
R. D. (1996). A five-factor versus six-factor model of Lorr, M., & McNair, D. M. (1963). An interpersonal
personality structure. Personality and Individual Differ- behavior circle. Journal of Abnormal and Social Psychol-
ences, 20, 33±45. ogy, 67, 68±75.
Jacobson, E. (1954). The self and the object world. New MacKinnon, D. W. (1944). The structure of personality. In
York: International Universities Press. J. McV. Hunt (Ed.), Personality and the behavior
John, O. P. (1990). The ªBig Fiveº factor taxonomy: disorders (Vol. 1, pp. 3±48). New York: Ronald.
Dimensions of personality in the natural language and in MacKinnon, D. W. (1975). IPAR's contribution to the
questionnaires. In L. A. Pervin (Ed.), Handbook of conceptualization and study of creativity. In I. A. Taylor
personality: Theory and research (pp. 66±100). New & J. W. Getzels (Eds.), Perspectives in creativity
York: Guilford Press. (pp. 60±89). Chicago: Aldine.
Johnson, J. H., Butcher, J. N., Null, C., & Johnson, K. N. Main, M., Kaplan, N., & Cassidy, J. (1985). Security in
(1984). Replicated item level factor analysis of the full infancy, childhood, and adulthood: A move to the level
MMPI. Journal of Personality and Social Psychology, 47, of representation. Monographs of the Society for
105±114. Research in Child Development, 50, 66±104.
Jourard, S. M. (1964). The transparent self. Princeton, NJ: Marks, P. A., & Seeman, W. (1963). The actuarial
Van Nostrand. description of abnormal personality: An atlas for use with
Jung, C. G. (1971). Psychological types (H. G. Baynes, the MMPI. Baltimore, MD: Williams & Wilkins.
Trans.; revised by R. F. C. Hull). Princeton, NJ: Mayman, M. (1967). Object representations and object
Princeton University Press (Original work published relationships in Rorschach responses. Journal of Projec-
1923). tive Techniques, 31, 17±25.
Kelly, G. A. (1955). The psychology of personal constructs McAdams, D. P. (1988). Biography, narrative, and lives:
(2 vols.). New York: Norton. An introduction. Journal of Personality, 56, 1±18.
Kiesler, D. J. (1979). An interpersonal communication McAdams, D. P. (1989). Intimacy: The need to be close.
analysis of relationship in psychotherapy. Psychiatry, 42, New York: Doubleday.
299±311. McAdams, D. P. (1993). The stories we live by: Personal
Kiesler, D. J. (1983). The 1982 Interpersonal Circle: A myths and the making of the self. New York: William
taxonomy for complementarity in human transactions. Morrow.
Psychological Review, 90, 185±214. McAdams, D. P. (1994). The person: An introduction to
Kiesler, D. J. (1988). Therapeutic metacommunication: personality psychology (2nd ed.). Fort Worth, TX:
Therapist impact disclosure as feedback in psychotherapy. Harcourt Brace.
Palo Alto, CA: Consulting Psychologists Press. McAdams, D. P. (1996). Personality, modernity, and the
Kiesler, D. J. (1996). Contemporary interpersonal theory storied self: A contemporary framework for studying
and research: Personality, psychopathology, and psy- persons (Target article). Psychological Inquiry, 7,
chotherapy. New York: Wiley. 295±321.
Kiesler, D. J., Goldston, C. S., & Schmidt, J. A. (1991). McClelland, D. C. (1961). The achieving society. New
Manual for the Check List of Interpersonal Transactions- York: Van Nostrand.
Revised (CLOIT-R) and the Check List of Psychotherapy McCrae, R. R. (1991). The five-factor model and its
transactions-revised (CLOPT-R). Richmond, VA: Virgi- assessment in clinical settings. Journal of Personality
nia Commonwealth University. Assessment, 57, 399±414.
Kiesler, D. J., & Schmidt, J. A. (1993). The Impact Message McCrae, R. R., & Costa, P. T., Jr. (1990). Personality in
Inventory: Form IIA Octant Scale version. Palo Alto, CA: adulthood: Emerging lives, enduring dispositions. New
Mind Garden. York: Guilford Press.
Kiesler, D. J., Schmidt, J. A., & Wagner, C. C. (1997). A McCrae, R. R., & Costa, P. T., Jr. (1996). Toward a new
circumplex inventory of impact messages: An opera- generation of personality theories: Theoretical contexts
tional bridge between emotion and interpersonal beha- for the five-factor model. In J. S. Wiggins (Ed.), The five-
vior. In R. Plutchik & H. R. Conte (Eds.), Circumplex factor model of personality: Theoretical perspectives.
models of personality and emotions (pp. 221±244). Wa- (pp. 51±87). New York: Guilford Press.
shington, DC: American Psychological Association. Meehl, P. E. (1945). The dynamics of ªstructuredº
Kluckhohn, C., & Murray, H. A. (1953). Personality personality tests. Journal of Clinical Psychology, 1,
formation: The determinants. In C. Kluckhohn, H. A. 296±303.
Murray, & D. M. Schneider (Eds.), Personality in nature, Meehl, P. E. (1954). Clinical versus statistical prediction: A
society, and culture (pp. 53±67). New York: Knopf. theoretical analysis and review of the evidence. Minnea-
Kohut, H. (1971). The analysis of the self: A systematic polis, MN: University of Minnesota Press.
psychoanalytic approach to the treatment of narcissistic Meehl, P. E. (1989). Autobiography. In G. Lindzey
personality disorders. New York: International Univer- (Ed.), A history of psychology in autobiography (Vol. 3,
sities Press. pp. 337±389). Stanford, CA: Stanford University Press.
LaForge, R., Leary, T. F., Naboisek, H., Coffey, H. S., & Miller, T. R. (1991). The psychotherapeutic utility of the
Freedman, M. B. (1954). The interpersonal dimension of five-factor model of personality: A clinician's experience.
personality: II. An objective study of repression. Journal Journal of Personality Assessment, 57, 415±433.
of Personality, 23, 129±153. Millon, T. (1969). Modern psychopathology. Philadelphia:
Landis, C., & Katz, S. E. (1934). The validity of certain Saunders.
questions which purport to measure neurotic tendencies. Morey, L. C. (1991). Personality Assessment Inventory
Journal of Applied Psychology, 18, 343±356. manual. Odessa, FL: Psychological Assessment Re-
Landis, C., Zubin, J., & Katz, S. E. (1935). Empirical sources.
validation of three personality adjustment inventories. Morey, L. C., Waugh, M. H., & Blashfield, R. K. (1985).
References 369

MMPI scales for DSM-III personality disorders: Their Bircher. (Trans. Hans Huber Verlag, 1942).
derivation and correlates. Journal of Personality Assess- Runyan, W. McK. (1982). Life histories and psychobio-
ment, 49, 245±251. graphy: Explorations in theory and method. New York:
Morgan, C. D., & Murray, H. A. (1935). A method for Oxford University Press.
investigating fantasies: Thematic Apperception Test. Safran, J. D., & Segal, Z. V. (1990). Interpersonal process in
Archives of Neurology and Psychiatry, 34, 209±306. cognitive therapy. New York: Basic Books.
Murray, H. A. (1938). Explorations in personality. New Sarbin, T. R. (Ed.) (1986). Narrative psychology: The
York: Oxford University Press. storied nature of human conduct. New York: Oxford
Murray, H. A. (1943). The Thematic Apperception Test: University Press.
Manual. Cambridge, MA: Harvard University Press. Saucier, G., & Goldberg, L. R. (1996). The language of
Murray, H. A. (1959). Preparations for the scaffold of a personality: Lexical perspectives on the five-factor
comprehensive system. In S. Koch (Ed.), Psychology: A model. In J. S. Wiggins (Ed.), The five-factor model of
study of science (Vol. 3, pp. 7±54). New York: McGraw- personality: Theoretical perspectives (pp. 21±50). New
Hill. York: Guilford Press.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A guide Schaefer, E. S. (1961). Converging conceptual models for
to the development and use of the Myers-Briggs Type maternal behavior and for child behavior. In J. G.
Indicator. Palo Alto, CA: Consulting Psychologists Glidewell (Ed.), Parental attitudes and child behavior
Press. (pp. 124±146). Springfield, IL: Charles C. Thomas.
Nichols, D. S. (1987). Interpreting the Wiggins MMPI Schafer, R. (1948). The clinical application of psychological
content scales. In K. L. Moreland & J. N. Butcher (Eds.), tests. New York: International Universities Press.
Clinical notes on the MMPI (No. 10, pp. 3±26). Schafer, R. (1954). Psychoanalytic interpretation in
Minneapolis: National Computer Systems. Rorschach testing: Theory and application. New York:
Norman, W. T. (1963). Toward an adequate taxonomy of Grune & Stratton.
personality attributes: Replicated factor structure in peer Schafer, R. (1967). Projective testing and psychoanalysis.
nomination personality ratings. Journal of Abnormal and New York: International Universities Press.
Social Psychology, 66, 574±583. Soldz, S., Budman, S., Demby, A., & Merry, J. (1993).
Norman, W. T. (1967). 2800 personality trait descriptors: Representation of personality disorders in circumplex
Normative operating characteristics in a university popu- and five-factor space: Explorations with a clinical
lation. Ann Arbor, MI: University of Michigan, Depart- sample. Psychological Assessment, 5, 53±63.
ment of Psychology. Spearman, C. (1904). General intelligence, objectively
Office of Strategic Services Assessment Staff (1948). determined and measured. American Journal of Psychol-
Assessment of men. New York: Rinehart. ogy, 15, 201±293.
Payne, F. D., & Wiggins, J. S. (1972). MMPI profile types Stern, G. G., Stein, M. I., & Bloom, B. S. (1956). Methods
and the self-report of psychiatric patients. Journal of in personality assessment. Glencoe, IL: Free Press.
Abnormal Psychology, 79, 1±8. Stewart, A. J., Franz, C., & Layton, L. (1988). The
Pearson, K. (1896). Mathematical contributions to the changing self: Using personal documents to study lives.
theory of evolution: Regression, heredity, and panmixia. Journal of Personality, 56, 41±74.
Philosophical Transactions, 187a, 253±318. Strack, S. & Lorr, M. (1994). Introduction. In S. Strack &
Pincus, A. L., & Wiggins, J. S. (1990). Interpersonal M. Lorr (Eds.), Differentiating normal and abnormal
problems and conceptions of personality disorders. personality (pp. xiii±xviii). New York: Springer.
Journal of Personality Disorders, 4, 342±352. Strong, E. K., Jr. (1927). Vocational Interest Blank (Form
Plutchik, R., & Conte, H. R. (Eds.) (1997). Circumplex A). Stanford, CA: Stanford University Press.
models of personality and emotions. Washington, DC: Sullivan, H. S. (1953). The interpersonal theory of
American Psychological Association. psychiatry. New York: Norton.
Plutchik, R., & Conte, H. R. (1986). Quantitative assess- Thurstone, L. L. (1934). The vectors of mind. Psychological
ment of personality disorders. In R. Michels & J. O. Review, 41, 1±32.
Cavenar, Jr. (Eds.), Psychiatry (Vol. 1, pp. 1±13). Tomkins, S. S. (1947). The Thematic Apperception Test.
Philadelphia: Lippincott. New York: Grune & Stratton.
Prelinger, E., & Zimet, C. N. (1964). An ego-psychological Tomkins, S. S. (1979). Script theory: Differential magnifi-
approach to character assessment. Glencoe, IL: The Free cation of affects. In H. E. Howe, Jr. & R. A. Dienstbier
Press. (Eds.), Nebraska symposium on motivation (Vol. 26,
Rapaport, D. (1942). Principles underlying projective pp. 201±236). Lincoln, NE: University of Nebraska
techniques. Character and Personality, 10, 213±219 Press.
(reprinted in Gill, 1967). Trapnell, P. D., & Wiggins, J. S. (1990). Extension of the
Rapaport, D. (1944±1946). Manual of diagnostic psycholo- Interpersonal Adjective Scales to include the Big Five
gical testing (2 vols.). New York: Josiah Macy, Jr. dimensions of personality. Journal of Personality and
Foundation. Social Psychology, 59, 781±790.
Rapaport, D. (1946). Principles underlying nonprojective Tupes, E. C., & Christal, R. E. (1961). Recurrent
tests of personality. Annals of the New York Academy of personality factors based on trait ratings (USAF ASD
Sciences, 46, 643±652 (reprinted in Gill, 1967). Tech. Rep. No. 61±97). Lackland Air Force Base, TX:
Rapaport, D. (1951). The conceptual model of psycho- US Air Force.
analysis. Journal of Personality, 20, 56±81 (reprinted in Vernon, P. E. (1950). The structure of human abilities.
Gill, 1967). London: Methuen.
Rapaport, D. (1960). The structure of psychoanalytic Waller, N. G., & Ben-Porath, Y. S. (1987). Is it time for
theory: A systematizing attempt. Psychological Issues, clinical psychology to embrace the five-factor model of
2, Monograph 6. personality? American Psychologist, 42, 887±889.
Rapaport, D., Gill, M., & Schafer, R. (1946). Diagnostic Webb, J. T., Levitt, E. E., & Rojdev, R. (1993, March).
psychological testing (2 vols.). Chicago: Year Book. After three years: A comparison of the clinical use of the
Rapaport, D., Menninger, K. A., & Schafer, R. (1947). The MMPI and MMPI-2. Paper presented at the 53rd
new role of psychological testing in psychiatry. American Annual Meeting of the Society for Personality Assess-
Journal of Psychiatry, 103, 473±476 (reprinted in Gill, ment, San Francisco, CA.
1967). Wechsler, D. (1941). The measurement of adult intelligence.
Rorschach, H. (1921). Psychodiagnostik. Bern, Switzerland: Baltimore: Williams & Wilkins.
370 Principles of Personality Assessment

Wechsler, D. (1958). The measurement and appraisal of Wiggins, J. S. (1991). Agency and communion as con-
adult intelligence (4th ed.). Baltimore: Williams & ceptual coordinates for the understanding and measure-
Wilkins. ment of interpersonal behavior. In W. Grove & D.
Welsh, G. S. (1956). Factor dimensions A and R. In G. S. Cicchetti (Eds.), Thinking clearly about psychology:
Welsh & W. G. Dahlstrom (Eds.), Basic readings on the Essays in honor of Paul E. Meehl (Vol. 2, pp. 89±113).
MMPI in psychology and medicine (pp. 264±281). Min- Minneapolis, MN: University of Minnesota Press.
neapolis, MN: University of Minnesota Press. Wiggins, J. S. (1995). Interpersonal Adjective Scales:
Welsh, G. S., & Dahlstrom, W. G. (1956). Basic readings on Professional manual. Odessa, FL: Psychological Assess-
the MMPI in psychology and medicine. Minneapolis, ment Resources.
MN: University of Minnesota Press. Wiggins, J. S. (1996). Preface. In J. S. Wiggins (Ed.), The
Westen, D. (1990). Psychoanalytic approaches to person- five-factor model of personality: Theoretical perspectives
ality. In L. Pervin (Ed.), Handbook of personality theory (pp. vii±xi). New York: Guilford.
and research (pp. 21±65). New York: Guilford Press. Wiggins, J. S. (in press). Paradigms of personality assess-
Westen, D. (1991). Clinical assessment of object relations ment. New York: Guilford Press.
using the TAT. Journal of Personality Assessment, 56, Wiggins, J. S., Phillips, N., & Trapnell, P. (1989). Circular
56±74. reasoning about interpersonal behavior: Evidence con-
White, R. W. (1952). Lives in progress (1st ed.). New York: cerning some untested assumptions underlying diagnos-
Holt, Rinehart, & Winston. tic classification. Journal of Personality and Social
White, R. W. (1966). Lives in progress (2nd ed.). New York: Psychology, 56, 296±305.
Holt, Rinehart, & Winston. Wiggins, J. S., & Pincus, A. L. (1989). Conceptions of
White, R. W. (1975). Lives in progress (3rd ed.). New York: personality disorders and dimensions of personality.
Holt, Rinehart, & Winston. Psychological Assessment, I, 303±316.
White, R. W. (1981). Exploring personality the long way: Wiggins, J. S., & Pincus, A. L. (1992). Personality:
The study of lives. In A. I. Rabin, J. Arnoff, A. M. Structure and assessment. In M. R. Rosenzweig & L.
Barclay, & R. A. Zucker (Eds.), Further explorations in W. Porter (Eds.), Annual review of psychology (Vol. 43,
personality (pp. 3±19). New York: Wiley. pp. 473±504). Palo Alto, CA: Annual Reviews.
Widiger, T. A., Trull, T. J., Clarkin, J. F., Sanderson, C., & Wiggins, J. S., & Pincus, A. L. (1994). Personality structure
Costa, P. T., Jr. (1994). A description of the DSM-III-R and the structure of personality disorders. In P. T. Costa,
and DSM-IV personality disorders with the five-factor Jr. & T. A. Widiger (Eds.), Personality disorders and the
model of personality. In P. T. Costa, Jr. & T. A. Widiger five-factor model of personality (pp. 73±93). Washington,
(Eds.), Personality disorders and the five-factor model of DC: American Psychological Association.
personality (pp. 41±71). Washington, DC: American Wiggins, J. S., & Trapnell, P. D. (1996). A dyadic-
Psychological Association. interactional perspective on the five-factor model. In J.
Wiggins, J. S. (1962). Strategic, method, and stylistic S. Wiggins (Ed.), The five-factor model of personality:
variance in the MMPI. Psychological Bulletin, 59, Theoretical perspectives (pp. 88±162). New York: Guil-
224±242. ford.
Wiggins, J. S. (1966). Substantive dimensions of self-report Wiggins, J. S., & Trapnell, P. D. (1997). Personality
in the MMPI item pool. Psychological Monographs, 80, structure: The return of the Big Five. In R. Hogan, J. A.
(22, Whole No. 630). Johnson, & S. R. Briggs (Eds.), Handbook of personality
Wiggins, J. S. (1973a). Despair and optimism in Minnea- psychology (pp. 737±764). San Diego, CA: Academic
polis (A review of J. N. Butcher (Ed.), Objective Press.
personality assessment: Changing perspectives). Contem- Wiggins, J. S., & Trapnell, P. D., & Phillips, N. (1988).
porary Psychology, 18, 605±606. Psychometric and geometric characteristics of the revised
Wiggins, J. S. (1973b). Personality and prediction: Principles Interpersonal Adjective Scales (IAS-R). Multivariate
of personality assessment. Reading, MA: Addison- Behavioral Research, 23, 517±530.
Wesley. Wiggins, J. S., & Trobst, K. K. (1997). Prospects for the
Wiggins, J. S. (1979). A psychological taxonomy of trait- assessment of normal and abnormal interpersonal
descriptive terms: The interpersonal domain. Journal of behavior. Journal of Personality Assessment, 68, 110±126.
Personality and Social Psychology, 37, 395±412. Winter, D. G. (1973). The power motive. New York: Free
Wiggins, J. S. (1984). Cattell's system from the perspective Press.
of mainstream personality theory. Multivariate Behavior- Woodworth, R. S. (1917). Personal data sheet. Chicago:
al Research, 19, 176±190. Stoelting.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.13 Observations of Parents, Teachers, and Children: Contributions to the Objective Multidimensional Assessment of Youth
DAVID LACHAR
University of Texas-Houston Medical School, Houston, TX, USA

4.13.1 ISSUES IN OBJECTIVE MULTIDIMENSIONAL MULTISOURCE ASSESSMENT
4.13.1.1 Overview
4.13.1.2 Multidimensional Measurement
4.13.1.3 Multisource Assessment
4.13.1.4 Psychometric and Technical Considerations
4.13.1.4.1 Scale stability and homogeneity
4.13.1.4.2 Scale construction methods and scale character
4.13.1.4.3 Evidence of scale and questionnaire validity
4.13.2 MULTIDIMENSIONAL MULTISOURCE RATING SCALES
4.13.2.1 Overview
4.13.2.2 Conners' Rating Scales-Revised
4.13.2.2.1 Conners' Parent Rating Scale-Revised
4.13.2.2.2 Conners' Teacher Rating Scale-Revised
4.13.2.2.3 Conners–Wells' Adolescent Self-Report Scale
4.13.2.2.4 Commentary
4.13.2.3 Child Behavior Checklist/4–18 and 1991 Profile (CBCL); Teacher's Report Form and 1991 Profile (TRF); Youth Self-Report and 1991 Profile (YSR)
4.13.2.3.1 Child Behavior Checklist/4–18 (CBCL) and 1991 Profile
4.13.2.3.2 Teacher's Report Form (TRF) and 1991 Profile
4.13.2.3.3 Youth Self-Report (YSR) and 1991 Profile
4.13.2.3.4 Commentary
4.13.2.4 Behavior Assessment System for Children
4.13.2.4.1 BASC Parent Rating Scales
4.13.2.4.2 BASC Teacher Rating Scales
4.13.2.4.3 BASC Self-Report of Personality (SRP)
4.13.2.4.4 BASC Monitor for ADHD
4.13.2.4.5 Commentary
4.13.2.5 Personality Inventory for Children, 2nd Edition (PIC-2); Personality Inventory for Youth (PIY); Student Behavior Survey (SBS)
4.13.2.5.1 Personality Inventory for Children, 2nd Edition (PIC-2); Personality Inventory for Youth (PIY)
4.13.2.5.2 Student Behavior Survey
4.13.2.5.3 Commentary
4.13.3 CONCLUSIONS
4.13.4 REFERENCES

4.13.1 ISSUES IN OBJECTIVE MULTIDIMENSIONAL MULTISOURCE ASSESSMENT

4.13.1.1 Overview

The use of objective questionnaires in the clinical assessment of child and adolescent adjustment continues to grow in popularity. The five years that followed my last review of objective assessment of children and adolescents (Lachar, 1993) have brought additional psychometric advances in instrument development and application. The importance of objective measures in current practice has also been demonstrated by the relative coverage provided in monographs and textbooks available for graduate education and continuing education (Kamphaus & Frick, 1996, is an excellent example that supports this observation).

It is also important to consider the effect that recent changes in the provision of health care may have upon the use of psychological testing in the diagnostic process. Managed care may influence the use of objective assessment in two quite different ways. First, in the pursuit of cost containment, the use of time-intensive individually administered psychological tests is less likely to be authorized in the routine evaluation of youth, quite independently of the documented value of such procedures. Relatively inexpensive diagnostic questionnaires, in contrast, are more likely to be accepted as a routine component of initial evaluations because they improve the efficiency of the intake data collection process. Second, these families of multidimensional measures can contribute substantial value to the program evaluation aspects of managed care. Objective questionnaires have the potential both to document independently the need for treatment and to quantify treatment efficacy through repeated application.

This chapter will expand upon many of the themes presented in Lachar (1993) and will describe in some detail four "families" of assessment instruments that meet current standards for intake assessment and treatment

4.13.1.2 Multidimensional Measurement

Multidimensional instruments require standardized responses to similarly formatted descriptive statements, which generate estimates of a variety of dimensions of child behavior or psychopathology. These estimates are usually displayed visually as a profile of standard scores. These standard scores represent a common metric that has been derived from one normative reference sample, or from overlapping or similar samples, and are therefore directly comparable (in contrast to scores from several single-dimension measures that have been standardized on normative samples collected in very different places and times, as well as under different conditions).

Advances in psychopathology and measurement technology demonstrate the clinical advantage of multidimensional measurement. It is the rule, rather than the exception, in clinical assessment that several scaled dimensions of adjustment, such as depression and anxiety or inattention and noncompliance, are concurrently elevated in the clinical range. Not only are these dimensions, often classified as externalizing (behavioral excess) or internalizing (behavioral deficit), problematic concurrently; the externalizing and internalizing diagnoses are also often found to be comorbid. For example, adolescents hospitalized on a psychiatric unit are often described by more than one diagnosis or by a diagnosis that reflects different dimensions of psychopathology. It is not unusual for such adolescents to demonstrate significant levels of depression or anxiety and at the same time demonstrate pathologies of behavioral excess (oppositional defiant disorder, conduct disorder, attention deficit hyperactivity disorder (ADHD); see, for example, O'Connor, McGuire, Reiss, Hetherington, & Plomin, 1998). The most useful assessment procedure will provide a quantified estimate of the pattern of problems in need of remediation and answer relevant questions such as the role that family conflict may play in symptom development or maintenance, and the
evaluation. These materials: (i) assess multiple context of such symptoms.
dimensions of problem behavior, (ii) collect Multidimensional assessment is uniquely
observations from parents, teachers, and youth, designed to describe patterns of adjustment
and (iii) provide standard scores based upon characterized by dimensions of both poor
contemporary national samples. In 1993 only adjustment and relatively symptom-free adjust-
one set of measures met these criteria (Achen- ment. In many cases the documented absence of
bach, 1992); at the conclusion of 1997 four sets of certain problems is as important as is the
such scales were either commercially available or documented presence of others. The following
in press. common clinical challenge documents the
importance of multidimensional assessment. Children and adolescents are often referred for evaluation by parents, educators, physicians, and mental health professionals because they have been observed to be hyperactive and/or inattentive. Exclusive application of homogeneous measures of ADHD or hyperactivity in such cases can only prove to be problematic. These referred children are often found to be "inattentive" for reasons other than ADHD (depression, anxiety, situational adjustment, learning disability, acquired cognitive disability, etc.). When significant elevation is not obtained on focused hyperactivity scales, guidance in the search for likely alternative diagnoses is not provided. On the other hand, youth found to meet all of the criteria of ADHD often present with coexisting conditions (see, for example, August, Realmuto, MacDonald, Nugent, & Crosby, 1996; Vaughn, Riccio, Hynd, & Hall, 1997). When such comorbid conditions remain unrecognized, their presence cannot be considered in treatment planning. In such cases, the resulting treatment may be compromised by the absence of such information.

4.13.1.3 Multisource Assessment

Although the evaluation of adjustment or psychopathology in adults usually relies on self-report, such self-description is rarely adequate in the evaluation of children and adolescents. Indeed, the context of assessment is fundamentally different for children and adolescents who are unlikely to refer themselves for evaluation or treatment and may not possess the academic, cognitive, or motivational status to complete a comprehensive self-report instrument. The first consideration in this regard is that young elementary school children, perhaps students in K through 3, may be unable to describe themselves adequately through response to questionnaire statements. These children are unlikely to have mastered the range of vocabulary necessary to adequately describe dimensions of adjustment; such language competence is not usually attained before the fourth or fifth grade.

Another consideration is the reality that youth are most often referred for evaluation because they are either noncompliant with the requests of the significant adults in their lives or exhibit problems in academic achievement, often presenting with inadequate reading skills. It is therefore not unusual that completion of a self-report inventory of several hundred items by a high school student (e.g., the Minnesota Multiphasic Personality Inventory-Adolescent consisting of 478 statements) could present quite an assessment challenge.

Parents and teachers not only refer youth for assessment, they are also the primary sources of useful systematic observation. Certainly adults should be the ideal informants to report noncompliance of a child to their requests. Parents are the only consistently available source for report of early childhood development and description of child behavior in the home. Teachers offer the most accurate observations of the age-appropriateness of a child's adjustment in the classroom and academic achievement, as well as of the attentional, motivational, and social phenomena unique to the classroom and to the school. It is likely, however, that such observational accuracy decreases after the elementary school grades, as teachers have very little continuous observation of students who appear for only 45 minutes a day in a classroom of 30 or more other students. (An exception perhaps would be the ratings of counselors, resource room teachers who work with small groups of students, and teachers in self-contained special education classrooms.) Youth self-description, regardless of problems that have been documented for this source of information (Greenbaum, Dedrick, Prange, & Friedman, 1994; Jensen et al., 1996), still must be seen as the most direct and accurate expression of personal thoughts and feelings. (Note that Michael & Merrell (1998) have demonstrated adequate short-term temporal stability for the self-report of third to fifth graders.)

The availability of a coordinated "family of instruments" in which uniquely designed questionnaires are completed by two or three different informant sources, each providing independent sets of child descriptions, offers a natural opportunity for critical comparison of the value of these sources of information. Achenbach, McConaughy, and Howell (1987) conducted a comprehensive literature review and found, although relatively greater between-source agreement was obtained for scales representing externalizing behaviors, very limited concordance in general between the report of parent, teacher, and youth. A review of similar studies that evaluated the responses to objective interviews from parent and child concluded that greater agreement was obtained for visible behaviors and for child–parent pairs with increasing child age (Lachar & Gruber, 1993).

Although one reasonable approach to the interpretation of differences between parent, teacher, and youth is to assign such differences to situation-specific variation (e.g., the child is actually oppositional at home, not in the classroom), other explanations are equally plausible. One alternative explanation of cross-informant variance requires a close examination of the measures being applied. It is a frequent
occurrence to find that scales with similar names contain significantly different content. In this situation an apparent lack of agreement between informants on similarly named scales (such as "depression") may more accurately reflect the differences in statement manifest content. At one extreme a clinician might consider accepting any evidence of symptom presence from any informant source; at the other extreme it would also be possible to focus exclusively on problems demonstrated by at least two or even all three informant sources. On the other hand, the development and application of valid identical across-source measures may not provide optimal assessment data. Such an approach may restrict the diagnostic potential of each informant source by excluding the measurement of attributes that may be uniquely obtained from only one informant source (e.g., a parent-completed measure of developmental delay, a teacher-completed measure of academic skills, or a youth-completed measure of achievement motivation).

Another explanation for poor cross-informant agreement for data gathered in a clinical context acknowledges the potential effect of response sets on the accuracy of such information. Starting with the child or adolescent being assessed, adequate compliance with questionnaire instructions may not be obtained, reflecting either inadequate language or reading skills, or lack of adequate motivation for the task. It is also likely that a youth may not want to share a personal history of maladaptive behavior and current internal discomfort with mental health professionals, although a negative presentation of parent adjustment and home conflict may be more readily provided. At times, youth may also be motivated to admit to problems and symptoms that are not present. These same motivations and conditions may also influence parent report. Indeed, there has been some concern that poor parent adjustment may compromise the validity of parent report (Achenbach, 1981). However, a subsequent review (Richters, 1992) and the specific analysis of this issue with the Personality Inventory for Children (PIC) (Lachar, Kline, & Gdowski, 1987) have found no empirical support for this consideration. Some instruments, however, incorporate validity scales to identify such response sets. These scales are designed to measure random or inadequate responses to scale statements, defensive denial of existing problems, as well as admission of symptoms that are unlikely to be present or exaggeration of actual problems. (Examples of the effect of informant defensiveness on profile pairs are presented in Table 5 and discussed in associated text for the Personality Inventory for Youth (PIY) and the Personality Inventory for Children, 2nd Edition (PIC-2). See also Lachar, in press.)

For the clinician there is a distinct pragmatic advantage in using an assessment system with comparable parent, teacher, and youth versions. Research may document one day that one source of descriptive report is superior to another in the assessment of a specific problem domain (only surveys of the opinions of clinicians and parents are currently available: Loeber, Green, & Lahey, 1990; Phares, 1997). Until that day, however, conditions regularly occur in the conduct of psychological evaluations that on some occasions make it necessary to collect one, and on other occasions another, of these parent-, teacher-, and youth-report questionnaires. Children may be too young, uncooperative, or language-impaired to complete a questionnaire. The evaluation may occur during the summer vacation, or the youth may have not consistently attended one school, or may have left school permanently, making collection of a teacher rating difficult or impossible. Parents may fail family appointments when their child is hospitalized, or children may be under agency guardianship rather than parent supervision. In such not infrequent cases, a "family" of objective questionnaires separately normed for parents, teachers, and youth provides the flexibility to facilitate the collection of data from the source or sources that are available.

4.13.1.4 Psychometric and Technical Considerations

The topics discussed in this section are unlikely to spark much enthusiasm in clinicians who also demonstrate little interest in reading the technical sections of test manuals. These issues, however, are important as they influence test validity and ease of interpretation and must be considered in the selection of instruments for specific clinical applications.

4.13.1.4.1 Scale stability and homogeneity

It may seem quite reasonable to apply the technical standards and statistical techniques first developed in the construction of ability and achievement measures to scales of adjustment and personality. For example, in the development of an age-normed measure of verbal IQ or a grade-normed measure of arithmetic calculation skill, certain assumptions are made. As each of these measures would be conceptualized as relatively homogeneous and stable over at least weeks or months, item components would need to demonstrate substantial intercorrelation as well as little change over time (coefficient alphas
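The two statistics named in this subsection, coefficient alpha (scale homogeneity) and the test–retest correlation (scale stability), can be sketched in a few lines. This is a minimal illustration only: the ratings below are invented for the purpose (six hypothetical respondents, four items rated 0 to 3), the function names are assumptions, and only the 0.80 benchmark comes from the text.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    # Sample variance of the total (summed) scale score.
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


# Hypothetical ratings (rows = respondents, columns = items scored 0-3).
ratings_time1 = [
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [2, 1, 2, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 2],
    [0, 1, 0, 0],
]
totals_time1 = [sum(person) for person in ratings_time1]
# Hypothetical total scores for the same respondents at a later retest.
totals_time2 = [2, 3, 5, 9, 10, 1]

alpha = cronbach_alpha(ratings_time1)
retest_r = pearson_r(totals_time1, totals_time2)

BENCHMARK = 0.80  # value cited in the text for ability and achievement measures
print(f"coefficient alpha = {alpha:.2f} (meets benchmark: {alpha >= BENCHMARK})")
print(f"test-retest r     = {retest_r:.2f} (meets benchmark: {retest_r >= BENCHMARK})")
```

A state measure of mood would be expected to produce a much lower retest correlation than this invented example without that being a defect, which is why the text argues that both statistics must be read in light of what each scale is meant to measure.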
and test–retest correlations of one week/three months of at least 0.80).

Measures of adjustment or psychopathology, in contrast, can vary in degree of homogeneity and may in fact in some cases be expected to be relatively unstable over time or sensitive to the effects of treatment. A measure of trait adjustment might consist of responses that describe previous behaviors and therefore demonstrate considerable temporal stability, while a measure of mood that consists of self-statements made in the present tense may provide a state measure that is expected to vary considerably over time. An empirically constructed scale in which items have been selected because they have statistically separated well-defined samples (e.g., normal adolescents vs. adolescents adjudicated delinquent) is likely to be multidimensional and limited in homogeneity. In contrast, a scale constructed by factor analysis consists of items that correlate highly with each other and such scales demonstrate substantial homogeneity. The importance of scale homogeneity as measured by coefficient alpha and scale stability represented by the test–retest correlation must be evaluated in light of the expressed meaning and application of each measure. Of importance in any case is the demonstrated validity of these scales; that is, evidence that each scale measures what it is supposed to and that interpretive guidelines are derived from empirically based relationships between scale values and independent observations.

4.13.1.4.2 Scale construction methods and scale character

It is important to be aware of both scale content and scale performance when applying scale standard scores in clinical assessment. Scale content has gained in judged importance and scale items have been selected frequently because of specific content or written to measure specific dimensions (see, for example, the Conners DSM-IV scales (Conners, 1997) and the Disruptive Behavior DSM-IV scales of the Student Behavior Survey (Lachar, Kline, Wingenfeld, & Gruber, 1999; Pisecco, Lachar, Gallen, Gruber, & Huzinec, 1998)). This recent focus on item content is in sharp contrast to previously developed empirically keyed scales in which item performance took preference over item content. Current consensus is that scale content is necessary but insufficient to assure scale validity. It is recommended that clinicians first read the items on each scale and examine the content dimensions provided as well as the content equivalence for parent, teacher, and youth questionnaires before attempting any clinical application.

It is of course generally true that increased scale length will improve the depth or variety of expressed content. Against this reality must be balanced concerns about the time required of informants to complete a questionnaire, an issue that appears to be significant to psychologists who work in test development and those who work in test application. One must also consider the context in which evaluations are to be conducted. A baseline or initial evaluation would justify a more comprehensive evaluation than would a repeated evaluation or some approach to treatment evaluation in which daily or weekly observations are required.

Each clinician will need to judge the value of the information obtained vs. the time needed to generate these measures. In this author's experience, within a clinical (vs. research) context, parents view these questionnaires as appropriate and appreciate such an offer to make this sort of contribution to the diagnostic process. Teachers, unless asked to complete lengthy forms on a whole classroom of students, are quite willing to spend 15 or 20 minutes completing a form that either initiates a referral to a school psychologist or provides necessary classroom observation for an independent or agency-conducted evaluation. The instruments of the families of questionnaires presented in this chapter vary in length from a "short form" of 10 items that generates one standard score to a "short form" of eight 12-item shortened scales that is as long as the full-length versions of some other questionnaires.

Scale length (number of items) and rate of item endorsement (infrequently endorsed items in short scales) within normative and clinical samples directly influence scale performance, such as range of standard scores and the shape of score distributions. Response format is also relevant in that a scale score increment of one point may represent the endorsement of one additional symptom or characteristic (true/false; present/absent) or a subjective increase in symptom observed certainty or frequency (somewhat true to very true; sometimes true to often true). It may be useful to identify the minimum descriptive content necessary to generate a clinically significant standard score. It may also be useful to determine the amount of response difference necessary to move a scale value from normal limits to the clinical range. Note that a scale with a 0–18 raw score range can be generated from either six symptoms rated from 0 to 3, or from 18 true/false statements that vary in both symptom intensity and frequency. Although it will take longer to respond to 18 than to six items, it is obvious that the content basis for standard scores will vary substantially for these two measures. A comprehensive
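The suggestion to identify the minimum response content needed to reach a clinically significant standard score can be tried out numerically. The sketch below assumes the conventional linear T-score transformation (mean 50, standard deviation 10) and uses the normative mean of 1.13 and standard deviation of 2.78 that this section quotes for the CTRS-R Oppositional scale. Whether the test manual applies exactly this formula and rounding is an assumption, and the function names are illustrative. Under this assumption the sketch reproduces the quoted conversions for raw scores 0 to 5 and gives a value about one point below the quoted 69T for a raw score of 6.

```python
NORM_MEAN = 1.13  # normative mean quoted in the text (CTRS-R Oppositional, female 12-14)
NORM_SD = 2.78    # normative standard deviation quoted in the text
MAX_RAW = 18      # six items, each rated 0-3


def linear_t(raw, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Conventional linear T score: mean 50, SD 10 (assumed, not taken from the manual)."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def min_raw_exceeding(threshold_t, max_raw=MAX_RAW):
    """Smallest raw score whose linear T score exceeds the threshold, or None."""
    for raw in range(max_raw + 1):
        if linear_t(raw) > threshold_t:
            return raw
    return None


# Raw-to-T conversion table for the low end of the raw score range.
for raw in range(7):
    print(f"raw {raw} -> {round(linear_t(raw))}T")

# How little endorsed content moves a protocol into the clinical range (> 65T)?
print("minimum raw score above 65T:", min_raw_exceeding(65))
```

On a six-item scale rated 0 to 3, the minimum raw score reported by the last line can be reached either by a low rating on every item or by the highest rating on just two items, which is the interpretive concern raised in the text.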
evaluation of any dimension is likely to require more than a handful of observations.

Although these scale scores are usually positively skewed (symptoms and dimensions of psychopathology are not normally distributed), scale distribution may be excessively restricted due to scale length, item response frequency, or normative sample character so that the scale mean and standard deviation that generate standard scores are very limited. (See Rowe & Rowe (1997) for a contemporary discussion of this issue with a focus on the Conners' Global Index.) For example, a scale with a normative mean of 1.13 and a standard deviation of 2.78 (Conners' Teacher Rating Scale-Revised (CTRS-R): Oppositional, female 12–14 years) would generate the following raw to T-score conversions: 0 = 46T, 1 = 50T, 2 = 53T, 3 = 57T, 4 = 60T, 5 = 64T, 6 = 69T. Because the Oppositional scale consists of six items rated from 0 (not true at all/never, seldom) to 3 (very much true/very often, very frequent), the following interpretive issues are suggested. Considering that the scale raw scores for Oppositional can range from 0 to 18, responses of "not true" or "just a little true" predominate in the normative sample. Only a scale score of "0" extends below the mean. Of most interest, a clinically elevated score (>65T) can be obtained by responding "just a little true" to each item, or "very much true" to only two statements. Please note that other CTRS-R scales actually obtain mean raw scores below 1.0 and standard deviations below 2.0, but the CTRS-R is far from the only questionnaire to exhibit these troublesome psychometric characteristics. Users are well advised to review a test's raw score-to-T score translation table or profile where the issue can be evaluated.

4.13.1.4.3 Evidence of scale and questionnaire validity

All too often test evaluations and reviews discuss internal consistency and reliability at length and then give short shrift to test validity. This is perhaps understandable in that it is easier to make comparisons and establish numerical guidelines for evaluating these characteristics. Unfortunately, and particularly so for instruments rating psychopathology or other infrequent behavior, the emphasis should be quite the opposite. Validity is far more difficult to define, measure, and establish, but it really is the only bottom line. Even a scale that demonstrates somewhat weak or even impaired reliability can be extremely valuable clinically if it can successfully identify a rare or transient characteristic such as violent labile aggression. It is necessary to establish the validity of specific applications of each scale before it can be interpreted with certainty. It should be understood that standard scores derived from the most representative and comprehensive of normative data simply provide the frequency or infrequency of a given score in this normative sample. Just because a pattern of scale standard scores is infrequent does not necessarily demonstrate that this pattern is clinically meaningful or significant. It is also important to acknowledge that scale validity is not a general attribute. The validity of a scale can only be demonstrated within a specific context for a specific purpose.

The construction or selection of a normative sample for each questionnaire is also a process requiring critical review, especially when questionnaires provide a variety of normative options (see, for example, Reynolds & Kamphaus, 1992). In the evaluation of school-aged children, scale raw scores may be converted to standard scores based upon normative (not receiving counseling or special education services), representative (incorporated an accurate proportion of all children), or referred (clinical or special education) samples. Normative samples also differ as to whether they are gender specific or combine the protocols of both boys and girls. It should also be noted that normative samples may be constructed to reflect child age. Such age-referenced samples may represent broadly defined groups such as preadolescent/adolescent, or provide norms that reflect a much more narrow age range (see Conners, 1997). The relative utility of any normative sample rests upon the demonstration that derived standard scores are comparable in validity. For example, the scales of the PIC have been found to be equivalent in their ability to predict external criteria for boys vs. girls and for children vs. adolescents (Kline & Lachar, 1992; Kline, Lachar, & Sprague, 1985). Test authors who promote the use of an atypical normative procedure should demonstrate that the resulting interpretive guidelines are as accurate as those obtained by more traditional means. For example, if a questionnaire provides both traditional gender-specific and novel mixed-gender norms, evidence should be provided that mixed-gender norms generate interpretive guidelines which are either more accurate or as accurate as more traditional gender-specific guidelines.

The construction of a new set of scales for a multidimensional inventory or set of inventories must follow a path in which fundamental evidence of validity must first be obtained. If obtained, additional study of more specific and focused evidence of accurate scale performance must be pursued. For example, when items are placed on dimensions on the basis of manifest
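This section on validity distinguishes the likelihood of a test result given a clinical criterion from the likelihood of the criterion given a test result. The sketch below works through that distinction with invented counts for a hypothetical inpatient unit of 200 adolescents, 40 of whom carry a diagnosis of major depression. The counts, scenario labels, and function names are all assumptions for illustration and are not data from any study cited here.

```python
def sensitivity(true_pos, false_neg):
    """P(elevated scale | diagnosis present)."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """P(diagnosis present | elevated scale)."""
    return true_pos / (true_pos + false_pos)


# Hypothetical unit: 40 depressed adolescents, 38 of whom (95%) obtain an elevated score.
DEPRESSED_ELEVATED = 38
DEPRESSED_NOT_ELEVATED = 2
OTHER_PATIENTS = 160

# Scenario A: 95% of the remaining patients are also elevated (general distress).
# Scenario B: only 10% of the remaining patients are elevated (a specific scale).
scenarios = {
    "A: nonspecific scale": round(0.95 * OTHER_PATIENTS),  # 152 elevated non-depressed
    "B: specific scale": round(0.10 * OTHER_PATIENTS),     # 16 elevated non-depressed
}

sens = sensitivity(DEPRESSED_ELEVATED, DEPRESSED_NOT_ELEVATED)
for label, other_elevated in scenarios.items():
    ppv = positive_predictive_value(DEPRESSED_ELEVATED, other_elevated)
    print(f"{label}: P(elevated | depressed) = {sens:.2f}, "
          f"P(depressed | elevated) = {ppv:.2f}")
```

In both scenarios 95% of the depressed patients are elevated, yet in scenario A an elevated score says no more about depression than the unit's base rate of 20%. Only the second probability reflects how a test is used in clinical assessment.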
content, the accuracy of this placement could be demonstrated through item-to-total correlations in which each item will be expected to obtain the greatest correlation with the dimension on which it has been placed. If the scales are constructed to measure adjustment or psychopathology, each scale could then be applied to regular education and special education and/or clinical samples to determine if meaningful separation occurs between these groups for each scale. Here "meaningful" is more than a statistically significant difference, which when samples are large could be so small as to be of no pragmatic value. Scales in this context should demonstrate at least a moderate effect size, generally about one-third of a standard deviation or greater (Cohen, 1988).

Unlike ability tests that gain initial construct validity by correlation with similar although established measures, scales of adjustment are less likely to demonstrate comparable concordance (0.70s–0.80s) because of differences in content and format. Indeed, substantial correlations obtained between pairs of scales from different tests completed by the same informant most likely reflect substantial shared item content. Adjustment scales for children and adolescents acquire validity through demonstration of external correlates that may take a variety of forms including independent ratings, special education placement, diagnoses, or specific treatment. For example, in students referred for evaluation, elevation of a parent-completed attention deficit hyperactivity scale should correlate with content-congruent teacher ratings, and students diagnosed with ADHD and assigned to receive stimulant therapy should obtain clinically elevated scale scores that subsequently fall within the normative range following successful treatment.

The accumulation of substantial construct validity and the development of empirically based interpretive guidelines are prerequisite for a questionnaire to achieve status as an established clinical measure. Unfortunately, the accumulation of a substantial bibliography provides an insufficient basis for the development of this status. Tests are far more often applied in the quantification of phenomena under study rather than applied in the study of scale score meaning. For example, contrast "How many children with mild mental retardation also demonstrate the DSM-IV criteria for a disruptive behavior disorder?" with "How do children with mild mental retardation who also obtain an elevated parent-informant hyperactivity scale differ in some way from children with mild mental retardation who do not obtain such an elevated scale?" That is, tests of adjustment are usually applied in the paradigm in which the investigator asks the question "What is the likelihood of a given test result given certain clinical criteria?" vs. "What is the likelihood of certain clinical phenomena given a certain test score, score range, or score pattern?" It is the latter question (and associated research design) that replicates the use of tests in clinical assessment and hence offers the better evidence of validity.

The clinician who reads that 95% of a sample of adolescents hospitalized on a psychiatric unit with a discharge diagnosis of major depression obtain an elevated score on a certain depression scale should understand that this information is insufficient either to demonstrate construct validity or to provide an interpretive guideline. If, for example, a comparable proportion of the remaining adolescents on that psychiatric unit also received similar test results, these additional data would suggest that this "depression" scale measures general distress rather than depression. In addition, clinicians should pay attention to any evidence of a scale's ability to make clinically meaningful distinctions, not the separation of normal and referred subjects (see, for example, Forbes (1985) in which PIC profiles of children with the diagnosis of ADHD were compared to the PIC profiles of behaviorally disturbed children whose symptom presentation did not support such a diagnosis). It is also important to focus on studies that document the independent correlates of scale elevations or profile patterns (see, for example, LaCombe, Kline, Lachar, Butkus, & Hillman (1991) and the clinician, teacher, and self-report correlates of parent-report PIC-2 shortened scales presented in Table 4). In general, such studies assemble a sizable data set that includes both the test scores under study and other clinically meaningful independent information, with the goal of determining the degree to which test variables accurately predict clinically meaningful variables across all subjects. In this way a scale's convergent validity (scale elevation for patients afflicted with the problem to which the scale is directed) and discriminant validity (scale nonelevation for nonafflicted subjects or patients afflicted with unrelated problems) (Campbell & Fiske, 1959) are established.

4.13.2 MULTIDIMENSIONAL MULTISOURCE RATING SCALES

4.13.2.1 Overview

The second section of this chapter describes the four families of commercially available multidimensional, multisource questionnaires. Although considerable detail is provided for each set of scales, clinicians are urged to review
all materials that may be appropriate for their suggests that the comparison of sources (tea-
specific applications. The conduct of this review cher, parent, adolescent) can determine the
has been difficult. Within these four sets of reliability of the generated diagnostic hypoth-
scales, one (Behavior Assessment System for eses. Conners also suggests that the CRS-R
Children; Reynolds & Kamphaus, 1992) was scales should be augmented when their inter-
first published in 1992, two (Child Behavior nalizing scales are elevated by using other
Checklist and related measures: Achenbach, measures, such as the Children's Depression
1991; Conners Rating Scales (CRS): Conners, Inventory (CDI) (Kovacs, 1992) and the
1997) have recently been revised and expanded, Revised Children's Manifest Anxiety Scale
while the fourth set of scales includes one (Reynolds & Richmond, 1985).
recently published questionnaire (Personality Conners details his concern for the accuracy
Inventory for Youth: Lachar & Gruber, 1995), of each protocol and the possible effect of
one questionnaire just revised (Personality response sets, such as defensiveness, although
Inventory for Children, 2nd Edition (PIC-2): the CRS-R does not provide measures of such
Lachar, 1999), and one new measure (Student response sets. Conners suggests that it is useful
Behavior Survey (SBS): Lachar, Kline, Wingen- to review the within-scale consistency of the
feld, & Gruber, 1999). Although the established content of items rated ª3º (very much true/very
materials have been collectively applied in often, very frequent) within elevated (464T)
approximately 3000 studies, no systematic re- scales/indices, although the efficiency of this
view has been conducted to identify those studies process would be vastly improved through the
that could form the basis for the development of development of a consistency scale. Although
interpretive guidelines (note in contrast the the manual suggests that a clinically elevated
actuarial studies of profile scales and profile scale exceeds 64T, case examples present scale
types for the PIC: Lachar & Kline, 1994). elevations below 60T as contributing to the
diagnostic process. The manual presents and
analyzes raw scores (mean, standard deviation)
4.13.2.2 Conners' Rating Scales-Revised rather than providing T-score ranges and T-
score frequencies within well-defined clinical or
[Multi-Health Systems Inc., 908 Niagara special education samples (e.g., establishing the
Falls Boulevard, North Tonawanda, NY scale T-score ranges for the dimensions that best
14120-2060, USA] define children diagnosed with ADHD).
The Conners scales were derived from teacher
and parent measures first used in research
4.13.2.2.1 Conners' Parent Rating Scale-
conducted during the 1960s in the study of the
Revised
pharmacological treatment of disruptive beha-
viors. The Conners' Rating Scale-Revised The 80-statement form is the most compre-
(CRS-R) emerged from a considerable research hensive of the Conners' parent measures.
base generated over more than 30 years. An Although the manual notes the reading level
annotated bibliography of over 450 studies is of this form as ninth grade, Conners' Parent
available from the publisher (Wainwright & Rating Scale-Revised (CPRS-R) items range in
MHS Staff, 1996). Parent, teacher, and adoles- complexity from one-word stems such as
cent response to questionnaire items takes the ªIrritableº to a 28-word descriptive phrase:
same four-choice options: 0 = Not True at All ªDoes not follow through on instructions and
(Never, Seldom), 1 = Just a Little True (Occa- fails to finish school work, chores or duties in
sionally), 2 = Pretty Much True (Often, Quite a the workplace (not due to oppositional beha-
Bit), and 3 = Very Much True (Very Often, viour or failure to understand instructions).º
Very Frequent). CRS-R standard scores are The CPRS-R generates seven factor-derived
linear T scores derived from contiguous three- nonoverlapping scales apparently generated
year segments of the normative sample. The from the ratings of regular education students
CRS-R hand scoring forms are easily completed from the normative sample ages 3±17 years
and scored, and computer support for scoring (Oppositional, Cognitive Problems, Hyperac-
and interpretation is available. The 1997 tivity, Anxious-Shy, Perfectionism, social pro-
manual (Conners, 1997) and most current blems, and psychosomatic), Conners' ADHD
review chapter (Conners, in press) suggest that Index, Conners' Global Index, and DSM-IV
these scales continue to focus on ADHD and scales Inattentive, Hyperactive-Impulsive, and
strengthen their assessment of related or Total. The ADHD Index consists of the 12 items
comorbid disorders. empirically determined to best identify children
Conners in these publications emphasizes the at risk for an ADHD diagnosis (this 12-item set
relative importance of teacher observation in varies by informant source across this family of
making these diagnostic determinations and tests). The Conners' Global Index includes the
10 items historically labeled the "Hyperactivity Index" in previous published and unpublished versions of the Conners parent and teacher scales. This index is presented as the best general measure of behavioral adjustment and the one most sensitive to treatment effects.

Three additional forms are derived from these 80 items: (i) a shortened form of 27 items (CPRS-R:S) that generates the 12-statement ADHD Index and shortened forms of the Oppositional, Cognitive Problems, and Hyperactivity scales, (ii) a 26-item format that provides the ADHD Index and the three DSM-IV scales (CADS-P), and (iii) a sheet that only presents the 10-item Global Index. These three shortened forms are provided as options to monitor treatment effects.

Table 1 presents the pattern of scale and Index overlap (the Oppositional, 10 items; Anxious-Shy, 8 items; Perfectionism, 7 items; Social Problems, 5 items; and Psychosomatic, 6 items, scales are each formed from unique items). The DSM-IV Hyperactive-Impulsive and Hyperactivity scales exhibit considerable overlap, as do the DSM-IV Inattentive and Cognitive Problems scales, suggesting the need for additional guidelines for application and interpretation of the full-length and shortened parent and teacher scales. It is also interesting to note that the items of the Global Index, presented as the 10 items most sensitive to behavioral disturbance, almost never appear on any other scale (eight out of 10 items are unique).

4.13.2.2.2 Conners' Teacher Rating Scale-Revised

The full-length version of the Conners' Teacher Rating Scale-Revised (CTRS-R) consists of 59 items. Across-informant analysis may be facilitated in that the six CTRS-R factor-derived scales from 3- to 17-year-old normative children have been given the same names as the CPRS-R scales (there is not a teacher version of the CPRS-R Psychosomatic scale). The Conners' Global Index, Conners' ADHD Index, and three DSM-IV scales of the CPRS-R are also incorporated in this form. Three short teacher forms are also available: (i) a 28-item version that provides shortened Oppositional and Cognitive Problems scales, the Hyperactivity scale, and the ADHD Index, (ii) a 27-item version that provides the 12-item ADHD Index and the three DSM-IV scales, and (iii) a single sheet that presents the 10-item Global Index.

A comparison of CPRS-R and CTRS-R scales and index item content and scale length may shed some light on CRS-R construction and suggest issues in clinical application. Because the CTRS-R is shorter (59 items) than the CPRS-R (80 items), CTRS-R factor-derived scales are also shorter, with individual responses contributing more to score derivation. The greatest shortening is present on factor scales Oppositional (10 to six items) and Cognitive Problems (12 to eight items). Assignment of identical titles to CPRS-R and CTRS-R scales should not necessarily lead the clinician to assume comparable item content for parent–teacher scale pairs. While the Global Index and DSM-IV scales do consist of items with identical or comparable content, the remaining CTRS-R score dimensions, except for the Cognitive Problems and Hyperactivity scales, consist of no more than 50% comparable content for similarly titled CPRS-R dimensions (six of the eight CTRS Cognitive Problems items have equivalents on the identically named 12-item CPRS dimension; six of the seven CTRS Hyperactivity items have equivalents on the identically named nine-item CPRS dimension). Therefore, it was not unexpected to observe that the greatest parent–teacher agreement for normative cases occurred on the Global Index and DSM-IV measures (Conners, 1997, p. 126).

4.13.2.2.3 Conners–Wells' Adolescent Self-Report Scale

This 87-item questionnaire has been designed for students ages 12–17 with at least a sixth grade reading competence. Conners–Wells'

Table 1 CPRS-R item overlap.

Measure (no. of items/no. of unique items)    B   C   H   K   L   M

B: Cognitive Problems (12/6)                  -   -   3   -   5   -
C: Hyperactivity (9/3)                        -   -   -   1   -   5
H: Conners' ADHD Index (12/6)                 3   -   -   1   2   2
K: Conners' Global Index (10/8)               -   1   1   -   -   -
L: DSM-IV Inattentive (9/4)                   5   -   2   -   -   -
M: DSM-IV Hyperactive-Impulsive (9/2)         -   5   2   -   -   -
Adolescent Self-Report Scale (CASS) includes six nonoverlapping factor-derived scales with lengths of either eight items (Anger Control Problems; Hyperactivity) or 12 items (Family Problems; Emotional Problems, Conduct Problems, Cognitive Problems). This form can also be scored for a 12-item ADHD Index and two DSM-IV ADHD scales (as in the parent and teacher versions). Two shortened versions are available. In one, 27 items yield shortened versions of three factor-derived scales (Conduct Problems, Cognitive Problems, and Hyperactivity) and the 12-item ADHD Index, while the other provides the ADHD Index and the two DSM-IV scales Inattentive and Hyperactive-Impulsive in 30 items.

Comparison of item content between adolescent and parent report demonstrates excellent concordance for DSM-IV scales. Although reasonable agreement would be expected to be found for ADHD Indexes, Cognitive Problems and Hyperactivity scales, and between CPRS-S Oppositional and CASS Anger Control Problems, very few items from the parent scale can be identified in the self-report version.

4.13.2.2.4 Commentary

Table 2 displays average T-score group estimates derived from mean raw scores for parent, teacher, and self-report scales from three samples: general clinical, ADHD, and matched normative (Conners, 1997, pp. 135–136).

It would be very useful for the CRS-R Manual to provide for each sample the proportion of scale scores that equaled or exceeded the minimum T-score for the clinical range, whether this is 60T or 65T. Such lack of clinically relevant information is also demonstrated in a journal presentation of the six factor-derived CASS scales (Conners et al., 1997). In comparing 86 adolescents with a sole diagnosis of ADHD combined type to matched normative controls, an overall correct classification rate of 82.6% was obtained using all six scales. Although all six scales were statistically significant in group contrasts (demonstrating comorbidity and need for multidimensional assessment), no indication of the relative importance of each scale's contribution to this discrimination is given, nor is the proportion of elevated scores per scale provided for ADHD and contrast samples. Table 2 documents substantial normative/clinical differences as well as the relative superiority of parent report when it is compared to the companion self-report form. Brevity and ease of application would recommend the CRS-R measures in monitoring the treatment of children with ADHD. Versions of DSM-IV scales for parent, teacher, and youth will allow the study of the accuracy of each informant source in diagnosing ADHD as well as the study of across-informant agreement.

The majority of CRS-R measures appear to have been developed out of distinctly different item pools, resulting in sets of scales that may complement rather than duplicate each other. It would be useful for the manual to explain why the CRS norms incorporate three-year intervals. The raw-to-T-score conversions displayed on the current profiles suggest substantial and variable gender and age effects. The magnitude of this age effect is easily demonstrated by tracking on a profile the T-score equivalents for one scale raw score across the five sets of age norms. For example, a DSM-IV Hyperactive-Impulsive raw score of 12 for males resulted in the following five T scores: 3–5, 60T; 6–8, 63T; 9–11, 66T; 12–14, 70T; 15–17, 81T. In contrast, the ADHD Index demonstrated far less age-related variation.

4.13.2.3 Child Behavior Checklist/4–18 and 1991 Profile (CBCL); Teacher's Report Form and 1991 Profile (TRF); Youth Self-report and 1991 Profile (YSR)

[University Associates in Psychiatry, 1 South Prospect Street, Burlington, VT 05401-3456, USA]

The CBCL, the first published measure of these three (other forms now cover preschool, direct observation, and young adults), was initially published in 1983 as a direct extension of the factor analytic study of child problems and symptoms first published in 1966 (Achenbach, 1966). The 1983 CBCL consisted of two parts, an assessment of competence and a series of factor-derived behavior problem scales constructed from 120 brief descriptions. Each checklist item is rated by a parent as either 0 = Not True (as far as you know), 1 = Somewhat or Sometimes True, or 2 = Very True or Often True, with several items requiring individual elaboration when these items are positively endorsed (scores of 1 or 2). Because of these 17–21 items in each checklist, the informant must be closely monitored and an experienced clinician must review these comments before each pamphlet is submitted to more automatic scoring procedures (Drotar, Stein, & Perrin, 1995). A companion teacher rating form became available in 1986 and a self-report checklist was published in 1987. These forms and subsequent revisions have been used in over 2000 citations (Vignoe & Achenbach, 1997). The popularity of the CBCL and related instruments in research application has had a significant influence on the study of child and adolescent psychopathology since 1986.
Table 2 Average approximate T-scores for ADHD, clinical (CLIN), and normative
(NORM) samples.

ADHD CLIN NORM

CPRS-R
Oppositional 59 63 47
Cognitive Problems 67 57 46
Hyperactivity 69 58 48
Anxious-Shy 56 58 49
Perfectionistic 49 57 50
Social Problems 63 68 48
Psychosomatic 56 57 48
Global Index 67 64 46
ADHD Index 66 59 47
DSM-IV Total 69 59 47
CTRS-R
Oppositional 61 64 47
Cognitive Problems 56 50 47
Hyperactivity 69 68 49
Anxious-Shy 62 62 46
Perfectionistic 56 62 49
Social Problems 51 58 53
Global Index 66 62 46
ADHD Index 65 58 48
DSM-IV Total 64 58 46
CASS
Family Problems 53 53 46
Emotional Problems 56 56 47
Conduct Problems 57 54 45
Cognitive Problems 60 56 44
Anger Control Problems 57 57 47
Hyperactivity 57 52 45
ADHD Index 59 53 44
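
As an illustration only, the informant comparison that the commentary draws from Table 2 can be checked with simple arithmetic. The sketch below transcribes the three scales that share a name across the parent, teacher, and self-report forms and expresses each ADHD-minus-normative gap in T points and in SD units, assuming the conventional T-score scaling (mean 50, SD 10). The dictionary layout and the function name `adhd_vs_norm` are illustrative choices and are not part of the CRS-R materials.

```python
# Average approximate T-scores transcribed from Table 2, as (ADHD, CLIN, NORM).
# Only the scales that appear under the same name on all three forms are included.
TABLE_2 = {
    "CPRS-R": {  # parent report
        "Cognitive Problems": (67, 57, 46),
        "Hyperactivity": (69, 58, 48),
        "ADHD Index": (66, 59, 47),
    },
    "CTRS-R": {  # teacher report
        "Cognitive Problems": (56, 50, 47),
        "Hyperactivity": (69, 68, 49),
        "ADHD Index": (65, 58, 48),
    },
    "CASS": {  # adolescent self-report
        "Cognitive Problems": (60, 56, 44),
        "Hyperactivity": (57, 52, 45),
        "ADHD Index": (59, 53, 44),
    },
}

T_SD = 10  # assumption: T scores are scaled to mean 50, SD 10


def adhd_vs_norm(form, scale):
    """Return the ADHD-minus-normative gap as (T points, SD units)."""
    adhd, _clin, norm = TABLE_2[form][scale]
    gap = adhd - norm
    return gap, gap / T_SD


if __name__ == "__main__":
    for scale in ("Cognitive Problems", "Hyperactivity", "ADHD Index"):
        parts = []
        for form in TABLE_2:
            gap, sd_units = adhd_vs_norm(form, scale)
            parts.append(f"{form} {gap:+d}T ({sd_units:.1f} SD)")
        print(f"{scale}: " + ", ".join(parts))
```

On these three scales the parent-report gap (21, 21, and 19 T points) exceeds the self-report gap (16, 12, and 15 T points), which is the pattern the commentary describes. Because the table entries are approximate group averages, the differences are descriptive only and say nothing about the proportion of individual scores in the clinical range.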

In each of these three forms competence has been measured through rationally developed scales that vary by informant source. The factor-derived dimensions of child problems and symptoms followed a similar general structure for each informant source (each designated an "axis" by Achenbach). Each form's profile recorded several narrow-band scales (designated "syndromes" by Achenbach), summative Internalizing and Externalizing scales derived from second-order factor analysis, and a Total Symptoms scale. A series of item analyses (principal components analysis followed by varimax rotation) was applied to independent clinical samples of boys or girls within several age-range defined samples, resulting in dimensions often unique for one gender or one age group, as well as scales with the same titles but differing in content. The products of what appears to be a direct (vs. guided) unreplicated application of a data reduction technique of scale construction proved to be problematic. These item analyses apparently identified at times either sample-specific or age-/gender-specific variance rather than broadly demonstrable dimensions. For example, only the profile for 12–16-year-old boys did not include a depression dimension. (Certainly this result could not be taken as support for the conclusion that adolescent boys do not demonstrate problematic depression!) Obvious problems presented themselves in applying these scales in research and clinical assessment. Repeated administration to the individual case over time could result in changes in scores as likely to be related to instrument or normative variation as to actual changes in subject status.

The 1991 revision, documented in five monographs totaling over 1000 pages, represented a major departure (Achenbach, 1991a, 1991b, 1991c, 1991d, 1993). This revision, in contrast to the earlier versions of these checklists, emphasized consistencies in scale dimensions and scale content across child age (4–18 years), gender, and respondent/setting. Because the actual checklist items remained essentially the same, this revision also provides substantial continuity, allowing either the continued use of the original sex-age-specific narrow-band
scales, or the rescoring of previously obtained protocols to generate the new scales. (Readers may find it useful to first read Chapter 3 of Achenbach (1993) to gain an overall appreciation of these checklists.)

Of primary importance is that a series of within-instrument item analyses was conducted to obtain dimensions common across gender and age groups ("core syndromes"). Substantial samples of protocols from each form obtained from clinical and special education settings were evaluated (4455 CBCLs, 2815 TRFs, and 1272 YSRs). This current version also provides eight narrow-band scales for each informant derived from analysis of the 89 items common to parent-, teacher-, and self-report forms, as well as Internalizing, Externalizing, and Total Scales. The CBCL Internalizing scale score is derived from the responses of three narrow-band scales: Withdrawn, Somatic Complaints, and Anxious/Depressed. The Externalizing scale score includes the responses from two narrow-band scales: Delinquent Behavior and Aggressive Behavior. Three other common scales were not placed on these two broad-band dimensions: Social Problems, Thought Problems, and Attention Problems. Review of scale content reveals some item overlap across these eight scales within each form, substantial difference in scale length within a profile (e.g., TRF scales range from eight to 25 items), and scales that are identically named yet vary in content across informants. Close examination reveals that only one of these narrow-band scales ("cross-informant syndromes"), Somatic Complaints, presents the same nine items to parent, teacher, and youth.

The 1991 forms now rely on national rather than regional normative (nonreferred) samples. Although the CBCL and YSR are routinely self-administered in clinical application, the CBCL normative data and some undefined proportion of the YSR norms were obtained through interview of the appropriate informant. It may be that the interactive process of interview administration of the CBCL inhibited informant response to checklist items. For example, six of eight parent informant scales obtained average normative raw scores of less than 2 with associated restricted scale variance. It is important to note that increased problem behavior scale elevation reflects increased problems, although these scales do not consistently extend below 50T. Because of the idiosyncratic manner in which T scores are assigned to scale raw scores, it is useful to follow each manual's suggestion to use scale raw scores, rather than T scores, in research applications. It is difficult to determine the interpretive meaning of checklist T scores, the derivation of which has been of concern (Lachar, 1993; Kamphaus & Frick, 1996; Kline, 1994, 1995). Scale values under 70T are normalized and therefore compressed and unrepresentative of actual scale score distributions in the normative sample. Values above 70T have been arbitrarily assigned by the checklists' author with the goal of placing all possible scale scores evenly across available profile space and do not represent any specific probabilistic or statistical relationship with scores under 70T. The CBCL checklists do not provide any measure of informant response set or protocol validity. Not only is the identification of either symptom denial or random response problematic, but the T score assignment process above 70T can result in protocols that are the result of a response set to exaggerate problems but appear as valid estimates of child adjustment.

4.13.2.3.1 Child Behavior Checklist/4–18 (CBCL) and 1991 Profile

Parents complete a four-sided pamphlet and clinicians complete one of two two-sided profile forms (one for girls, one for boys). Norms are provided for two age ranges (4–11 and 12–18 for problem scales, 6–11 and 12–18 for competence scales) separately by gender. Rated competence items are organized by manifest content into three narrow scales (Activities, Social, and School) which are then summed into a total score. Parents are asked to list and then rate (frequency, performance level) a child's participation in sports, hobbies, organizations, and chores. Parents also describe the child's friendships, social interactions, performance in academic subjects, need for special assistance in school, and history of retention in grade. As standard scores for these scales increase with demonstrated ability, a borderline range is suggested at 30–33T and the clinical range is designated at T < 30. It has been suggested that a child's social and economic opportunities as well as race and ethnicity significantly affect these values (Drotar et al., 1995). These scales have also been compared to those of the PIC in their ability to predict adaptive level as defined by the Vineland Adaptive Behavior Scales (Pearson & Lachar, 1994).

The behavior problem scales consist of one to eight word descriptions ("Overtired," "Argues a lot," "Feels others are out to get him/her"). Appendix A of the 1991 CBCL manual provides scoring directions for the 21 items that incorporate informant individual comment. Clinicians must understand checklist content and child psychopathology to correctly score these items. For example, item 113 is "Please write in any
problems your child has that were not listed above." The person who scores the checklist must review the up to three examples that can be elicited, determine if they duplicate other checklist content, and then score the one unique symptom that receives the highest rating (Very True or Often True rather than Somewhat or Sometimes True). The importance of this issue cannot be minimized as almost all items of the Thought Problems scale for all three informants are provided in this format. Syndrome T scores are interpreted as borderline at 67–70T and in the clinical range at T > 70. There is some evidence, however, which suggests that ADHD and comorbid conditions are more accurately identified with a lower cutting point (Biederman et al., 1993; Chen, Faraone, Biederman, & Tsuang, 1994; Faraone, Biederman, Weber, & Russell, 1998; Steingard, Biederman, Doyle, & Sprich-Buckminster, 1992). The Total Problem scale is interpreted as borderline at 60–63T and in the clinical range at T > 63. Although the summary Internalizing and Externalizing scales are discussed in terms of their relative elevation when compared to each other, absolute interpretive guidelines are not provided.

Profile and pamphlet formats facilitate review of response content at the item level. Although scales are developed to overcome the unreliability of item responses, CBCL content may be important for a number of reasons. On several scales a "very true or often true" response to as few as two or three statements places a scale T score in the clinical range. When scales are limited in length and relatively heterogeneous in content, the same scale elevation can be generated by quite different observational content. A casual review of scale content reveals some inconsistencies that follow into TRF and YSR dimensions. Obsessions and compulsions, often associated with anxiety disorders, join strange ideas, behaviors, and hallucinations on the Thought Problems dimension; "Nervous, highstrung, or tense" appears on Attention Problems as well as Anxious/Depressed; while "Unhappy, sad, or depressed" appears on Withdrawn as well as Anxious/Depressed. The 19–25 item Aggressive Behavior dimension appears to include a number of items more readily associated with ADHD phenomena than any "aggressive" dimension: "Demands a lot of attention," "Showing off or clowning," "Talks too much," as well as two items only on the TRF: "Talks out of turn," and "Disrupts class discipline." These observations suggest that although a factor analysis can identify clinically meaningful dimensions, subsequent efforts may be necessary to assure the stability of item scale placement. (See Erford (1996) for an example of an attempt to replicate the structure of a short factor-derived teacher rating form.) It may also be useful to review the item by dimension correlation matrix that underlies each factor analysis to determine the appropriateness of each item's scale placement.

4.13.2.3.2 Teacher's Report Form (TRF) and 1991 Profile

The TRF emerged from the CBCL and the comparability of the 120 symptom/problem items is readily apparent, as are the eight syndrome, two broad-band, and one summary score. However, comparison of the competence sections of the TRF and the CBCL reveals a marked difference. TRF measures are based upon very limited data: an average rating of academic performance based on up to six subjects identified by the teacher, individual seven-point ratings on four topics (how hard working, behaving appropriately, amount learning, and how happy), as well as a summary score from these four items. The TRF designates a borderline interpretive range for the mean academic performance and the summary score of 40–37T, with clinical range of T < 37. Even a cursory review of the TRF pamphlet suggests that the equivalence of CBCL and TRF forms was given priority over collection of teacher observations that would be unique to the classroom and the school environment. If observations of social and academic skills were integrated into the TRF, the teacher could also provide estimates of adaptive behaviors necessary for success in the classroom.

4.13.2.3.3 Youth Self-report (YSR) and 1991 Profile

The elements of this self-report measure have most of the basic characteristics of the CBCL and TRF. The YSR requires a fifth grade reading ability and provides competence and problem behavior standard scores normed separately by gender for the age range of 11 through 18. Seven adaptive competency items are scored for Activities, Social, and a Total Competence scale. The YSR manual's Appendix A is necessary to score these items (the addition of scoring examples would be very useful). These seven multipart items tap competence and level of involvement in sports, activities, organizations, jobs, and chores. Items also provide self-report of academic achievement, interpersonal adjustment, and level of socialization.

The remainder of the pamphlet presents 120 statements, of which 16 are socially desirable items endorsed by most students. These items replace potential YSR equivalents of CBCL items that presented either problem behaviors
of very young children or school-related difficulties. The presence of these 16 unscored items represents an unused opportunity to develop a semantic inconsistency scale to measure random response or inadequate comprehension, or some sort of defensiveness measure. The lack of validity scales in this self-report of symptoms and problems is considered problematic by several reviewers (Kamphaus & Frick, 1996; Kline, 1995; Lachar, 1993). Two reviewers cite literature that documents the YSR's diagnostic insensitivity (Kamphaus & Frick, 1996; Merrell, 1994).

The eight problem scales range from 50 to 100T, with 67–70T assigned the borderline range and T > 70 the clinical range. The Activities and Social Competency scales range from 55 to 20T, while the Total Competence score is profiled from 10 to 80T. Activities and Social are in the borderline range from 33 to 30T, while the clinical range includes scales < 30T. On the Total Competence scale, T scores of 40–37 are designated as borderline, while T scores < 37 form the clinical range.

4.13.2.3.4 Commentary

The CBCL, TRF, and YSR format is responsible for both the broad success that these instruments have achieved as well as their shortcomings. Each form can be completed in 10–20 minutes and appears easily scored (if competence items and items requiring informant explanation are avoided). Brevity is especially attractive for research applications in which subject and human subject research committee tolerance for multiple measures is of specific concern. Scoring software provided by the CBCL/TRF/YSR author not only facilitates the processing of individual instruments, but also provides estimates of agreement over multiple reports on the same instrument, as well as agreement across different instruments. It is almost certain that the CBCL, TRF, and YSR common format will stimulate meaningful across-informant research that will document the effect of item and scale content as well as informant characteristic (gender, age, ethnicity, clinical status) on both degree of agreement and relative contribution to the diagnostic process. Such research should generate useful guidelines for the clinical application of these instruments. On the other hand, the primary focus on parallel dimensions and consistency in the problem behavior scale construction technique limits collection of informant-specific observation. The other instrument families presented in this chapter vary in dimensions assessed to some degree to facilitate the collection of such information.

The decision to revise scales without modification of the original problem statement pool assured continuity but at the same time restricted potential improvement. These forms continue to demonstrate their original limitations in psychometric character. It is difficult to understand why the author has taken an idiosyncratic and statistically unsupportable approach both in subjecting summations of symptoms to a normalization process and in providing profiles that assign T scores to all physically possible raw scores. Certainly problematic item performance in the form of restricted range in symptom frequency within these normative samples is at the core of attempts to modify score distributions to improve psychometric performance. The absence of validity scales continues to present random, defensive, and exaggerated protocols as profiles that should be interpreted rather than excluded from this process.

The most remarkable observation, in contrast to the continuing research growth of CBCL/TRF/YSR application, is the apparent absence of interpretive guidelines for the clinician (certainly percentiles derived from normative samples do not provide evidence of scale meaning beyond that suggested by manifest item content). These 1991 manuals present as primary evidence of validity that items and scales differentiate clinical and normative samples. Such evidence is only clinically useful if such a decision is frequently made in practice. (Clinicians know whether a student has or has not been referred for an evaluation.) Certainly there should be sufficient opportunity within 2000 published studies to document the answers to specific clinically relevant questions. (Indeed, it is quite clear that many of the results of research in child psychopathology over the last 15–20 years have been dependent on the structure and psychometric characteristics of the CBCL.) It should be quite easy to identify and summarize those studies that used a contrasted-groups design to identify those scales that are effective in making specific distinctions.

A review of CBCL citations would easily find carefully selected samples relevant to clinical practice and scale validity. These protocols could then be scored for 1991 dimensions and subsequent profile types to establish the proportion of each sample that is elevated on each dimension or obtains a specific classification for each instrument. For example, one investigation would be presented in tabular form by profile scales that would be represented by 11 columns (eight narrow-band, two broad-band, one summary) and samples that would be represented by individual rows. In this manner,
one could determine not only if Anxious/Depressed is usually clinically elevated for students who receive solitary or combined anxiety and/or depression diagnoses, but that an Anxious/Depressed clinical elevation is infrequent in samples that are not characterized by either depression or anxiety. A review of this literature may also demonstrate which scales may be used to differentiate between conditions frequently addressed in the diagnostic process, as well as external correlates of these scales. (Apparently some of the best available published support for scale validity is confined to contrasted group analysis instead of continuous heterogeneous clinical samples and to the collection of nonindependent criteria from the same informant who completed the CBCL measure: see for example, Edelbrock & Costello, 1988; Weinstein, Noam, Grimes, Stone, & Schwab-Stone, 1990.)

4.13.2.4 Behavior Assessment System for Children

[American Guidance Service Inc., 4201 Woodland Road, Circle Pines, MN 55014-1796, USA]

The Behavior Assessment System for Children (BASC) differs from the three other systems presented in this chapter in that the entire system (Parent Rating Scales, PRS; Teacher Rating Scales, TRS; Self-report of Personality, SRP) was published at one time. (The BASC also includes a developmental history form and a classroom observation form, the Student Observation System). The BASC conveniently provides one integrated manual

always), while SRP items are rated as either true or false. Raw scale scores are easily obtained from the self-scoring form, although selecting and transferring scale and composite standard scores in the form of linear T scores from 77 pages of conversion tables appears to require significant effort and concentration. Computer scoring and interpretation greatly facilitate this process.

For all three informants, scales are provided that measure clinical dimensions and adaptive dimensions (positive, desirable). All items are placed on only one substantive scale following an iterative process in which item content, item to scale correlations, normative–clinical endorsement differences and factor structure determined scale structure. In addition, these scales are combined on the basis of their intercorrelations into broad-band measures or scale composites, Externalizing Problems, Internalizing Problems, and Adaptive Skills. The relative diagnostic utility of scales vs. composites remains to be demonstrated. Children of school age are evaluated by the use of parent (PRS) and teacher (TRS) forms designated for ages 6–11 or 12–18. Scale item composition differs by both informant and age group, although substantial similarity is demonstrated across age groups within each informant-specific form. The Self-report of Personality (SRP) is similarly divided into forms for ages 8–11 and 12–18. In contrast to the PRS and TRS that emphasize across-informant similarities, the SRP has been designed to complement parent and teacher reports as a measure focused upon emotions and self-perceptions rather than as reports of overt behaviors.

BASC forms are also notable for their
for all rating instruments (Reynolds & Kam- incorporation of validity scales to estimate the
phaus, 1992). Considering that these materials accuracy of obtained standard scores as well as
first became available in 1992 (with the provision of a brief Critical Items set. (The
advanced computer program first available in BASC manual does not, however, explain the
1994), its application in the literature has been rationale for item selection or provide relevant
limited, although assessment applications con- item performance statistics for these critical
tinue to be developed. For example, the BASC items.) Three sets of norms are provided:
Behavior Monitor for ADHD, brief parent and gender-specific (including representative pro-
teacher forms intended for repeated use to portions of students receiving special education
demonstrate whether the behaviors of children services), combined gender, and clinical.
already diagnosed as ADHD change as a result The use of a combined gender normative
of treatment, became available in 1998. sample to generate standard scores is the
In a manner similar to the Achenbach and the exception to general assessment practice, as
Conners' scales, the BASC ratings completed by most measures of adjustment and personality
parent, teacher, and youth are marked directly compare each student to same-gender norms. It
on self-scoring pamphlets or on one-page forms would seem that such a procedure would further
that allow the recording of responses for minimize the significance of disruptive beha-
subsequent computer entry. Each of these viors in girls and internalizing symptoms in
forms is relatively brief (126±186 items) and boys, as well as present boys as relatively lacking
can be completed in 10±30 minutes. PRS and in adaptive competence when compared to girls
TRS items are rated on a four-point frequency (see BASC manual, Figure 11.2). One could
scale (never, sometimes, often, and almost easily calculate the effect of any cutting point,
such as T > 60, on each BASC scale using both normative approaches within a sample of referred students. After students would be so identified, the accuracy of this classification could be established using information independent of BASC data.

In contrast to the BASC comprehensive normative sample, the sample used to generate clinical standard scores appears to both be limited in size (in some age/form samples less than 100 students) and restricted in composition. Yet test materials provide complete support for scoring results on clinical norms. Clinical norms are only routinely used for assessment measures that cannot be effectively applied within general population samples (e.g., a rating scale for severely disturbed hospitalized patients in which 99% of a regular population would obtain a summary score of 0). It is also difficult to define the parameters necessary to construct a "representative" clinical sample. Certainly TRS, PRS, and SRP samples of 30 or fewer children diagnosed as depressed or autistic would be inadequate for anything but initial exploration of scale or composite construct validity.

4.13.2.4.1 BASC Parent Rating Scales

PRS scale composites and associated narrow-band scales are Internalizing Problems (Anxiety, Depression, and Somatization), Externalization Problems (Hyperactivity, Aggression, and Conduct Problems), and Adaptive Skills (Adaptability (ages 6–11 only), Social Skills, and Leadership). Additional profiled scales include Atypicality, Withdrawal, and Attention Problems. The PRS also provides a Behavioral Symptoms Index, consisting of those clinical scales common to all age levels on both the PRS and TRS that load the highest on the first unrotated factor in a common-factor analysis. This combination of scales is presented in the manual as a measure of overall level of problem behavior. The BASC manual suggests that clinical scales should be considered of potential significance when T > 60, while adaptive scale scores should receive diagnostic attention when T < 40.

4.13.2.4.2 BASC Teacher Rating Scales

TRS scale composites and associated narrow-band scales include Internalizing Problems and Externalization Problems composites with the same narrow-band scales as for the PRS. The TRS Adaptive Skills composite includes the PRS scale components as well as a Study Skills scale that provides unique classroom observation (e.g., "Reads assigned chapters," "Studies with other students"). Both TRS forms provide a School Problems Composite that incorporates scales Attention Problems and Learning Problems ("Does not complete tests," "Makes careless errors"). In this manner the TRS provides 21 or 22 items that are unique to the classroom.

Both the PRS and TRS incorporate a "fake bad" (F) index to assess the possibility that a teacher or parent rated a child in an inordinately negative fashion. The F index includes 16–20 items per form that represent either maladaptive behaviors to which the respondent answered "Almost always" or adaptive behaviors to which the respondent answered "Never." These extreme scores (neither item nor scale statistics are provided in the manual) may represent either problem exaggeration or the presence of severe behavioral disturbance.

4.13.2.4.3 BASC Self-Report of Personality (SRP)

SRP scales and scale composites vary from both PRS and TRS dimensions. Narrow-band scales Attitude to School, Attitude to Teachers, and (only for ages 12–18) Sensation Seeking form the School Maladjustment Composite. Scales Atypicality, Locus of Control, Social Stress, Anxiety, and (only for ages 12–18) Somatization form the Clinical Maladjustment Composite. A Personal Adjustment Composite is formed by scales Relations with Parents, Interpersonal Relations, Self-Esteem, and Self-Reliance. Two other unassigned narrow-band scales are included: Depression and Sense of Inadequacy. An additional measure, the Emotional Symptoms Index (ESI) is, according to the BASC manual, the SRP's most global indicator of serious emotional disturbance, particularly internalized disorders. The ESI is composed of two scales from the Clinical Maladjustment composite (Social Stress and Anxiety), two scales from the Personal Adjustment composite (Interpersonal Relations and Self-Esteem), and the two clinical scales that do not appear on any SRP composite (Depression and Sense of Inadequacy). Interpretive guidelines presented in the BASC manual generally suggest that scores of 60–69T suggest an at-risk or mild/moderate range and scores above 69T a "clinical range" interpretation for clinical scales. Because the adaptive scales are worded and scored in the positive direction, low elevations are given interpretive significance, at comparable ranges of 31–40T and below 31T.

The SRP also provides three scales to measure the presence of response sets that are likely to compromise scale validity. As in the PRS and TRS, the SRP includes an F-Index
composed of infrequent responses that may reflect symptom exaggeration ("Nobody likes me," "Nothing about me is right"). In addition, a 14-item L-Index has been incorporated to measure problem denial ("I always go to bed on time," "My parents are always right"). The third measure, V-Index, is not a scale but a small set of items that, when responded to in the True direction, are likely to reflect either lack of cooperation or inadequate comprehension ("I have never been in a car," "Television does not really exist").

4.13.2.4.4 BASC Monitor for ADHD

The BASC Monitor (Kamphaus & Reynolds, 1998) consists of brief parent and teacher forms that, along with the BASC Student Observation System and other variables defined by the clinician, provide a multimethod system designed to evaluate the effectiveness of treatments used with ADHD. The primary purpose is to demonstrate whether the behaviors of children already diagnosed as ADHD change as a result of treatment. These forms are intended for repeated use with the same child. Items are derived from 1992 PRS and TRS forms, other items collected during form development but not placed on a 1992 scale, as well as a few items written for this specific purpose that reflect DSM-IV diagnostic criteria. Norms were directly derived or extrapolated from BASC data and are organized into five age groups in order to be sensitive to age effects: 4–5, 6–7, 8–11, 12–14, and 15–18.

The Parent Monitor Ratings (PMR) consists of 46 items and the Teacher Monitor Ratings (TMR) consists of 47 items. All items are rated on the BASC four-choice frequency format of Never, Sometimes, Often, or Almost Always. Each form provides scales named Attention Problems, Hyperactivity, Internalizing Problems, and Adaptive Skills, and a listing of DSM-IV Items, some of which appear on other PRS and TRS scales, had been studied but not retained, or were written specifically to reflect DSM criteria. Item content was selected to be appropriate across the 4–18 years span, and some variation between PMR and TMR was allowed to reflect the unique opportunities for observation associated with each rater (e.g., TMR: "Bothers other children when working," PMR: "Fiddles with things while at meals").

4.13.2.4.5 Commentary

It is difficult for a newly published set of instruments to compete with those with a 20- to 30-year history of application. Certainly the BASC cannot draw upon an extensive research literature to demonstrate scale validity and to suggest appropriate clinical and research applications. The BASC family of instruments are attractive for several reasons, however, especially if all three ratings (parent-, teacher-, and self-report) are obtained for a student. Scale and composite distributions appear to be less skewed than many Achenbach and Conners' measures. Cluster analysis of the TRS normative sample, for example, resulted in seven clusters. In these clusters mean clinical scale T scores ranged from a low 40s to a high at or above 70, while mean adaptive scale T scores ranged from a low of 33 to a high of 60 (Kamphaus, Huberty, DiStefano, & Petoskey, 1997). The second reason, SRP T scores easily are obtained below 50T, in contrast to the YSR, and substantial item content is usually represented in T scores within the "clinical range." Another reason to consider these scales is that application of the BASC in school psychology settings in which ease of collection of both teacher and parent ratings is greater than in other settings would take advantage of the complementary nature of the SRP when the PRS and TRS are routinely collected. The final reason to consider the BASC is that the SRP scales appear especially attractive in the dimensions they generate, if parent and teacher ratings are available to document comorbid disruptive behavior phenomena.

For example, it would be clinically meaningful in an adolescent described by parent and teacher as aggressive and conduct disordered if he would describe himself as seeing others as responsible for his problems (Locus of Control) and as focused on the experiences of thrill-seeking and excitement (Sensation Seeking).

As with any first manual, however, the psychologist will be left with many questions regarding application of BASC scales to the individual student. First, it is unclear from what source interpretive guidelines are derived, unless from scale item content. For example, elevation of SRP Sensation Seeking is presented to be associated with potential for alcohol and drug use or experimentation, while elevation of SRP Social Stress in young children with scores higher than 70T suggests children who "may turn inward in an unsuccessful attempt to cope with these tensions" (BASC manual, pp. 60–61). Perhaps these conclusions are not derived from scale correlates, but from an extrapolation of the phenomena associated with the clinical dimension measured by a given scale.

Although validity scales are provided, especially for SRP, the psychometric details of their construction and their performance in data presented in the manual are not provided. Does the L-Index correlate negatively with SRP clinical scales and positively with SRP adaptive
scales? Were the clinical samples presented in the manual first screened by these scales to exclude potentially invalid protocols? The evidence of scale validity primarily takes the form of factor structure and correlations with other published measures. Readers of the manual will not find a series of case studies, nor are tabled and profiled mean scale values for a variety of referred samples presented in terms of proportion that exceed some value presented as clinically meaningful. In this regard, it is most interesting to contrast manual tables 12.28, 13.25, and 14.22 in that not one SRP clinical scale obtains a mean of at least 60T, while only one adaptive scale score obtains a mean of 40T or less. Similar normal limits results were obtained for a relatively small sample of children in state custody due to abuse or neglect, although the L-Index was apparently not scored (Dalton, 1996).

A brief review of published studies using the PRS and TRS, in contrast, provides some encouraging evidence of scale validity. BASC parent and teacher scales demonstrated convergent validity with CBCL and TRF scales (Vaughn, Black, Hall, Hynd, & Riccio, 1995). PRS scales were found to demonstrate convergent validity in nonreferred kindergarten students with scales of the Social Skills Rating System for Children (Flanagan, Alfonso, Primavera, & Povall, 1996) and demonstrated similar validity in elementary school students at risk for conduct disorder with scales of the CBCL (Doyle, Ostrander, Skare, Crosby, & August, 1997). In a study of cognitive problem-solving, increasing PRF Hyperactivity Scale elevation was associated with more problem solutions with aggressive or noncompliant content as well as fewer reasons or consequences justifying the selection of the "best solution" (Bloomquist, August, Cohen, Doyle, & Everhart, 1997). Four TRS scales (Aggression, Conduct Problems, Depression, and Social Skills) significantly classified ADHD and nondisabled students, as well as ADHD students with and without a comorbid condition (Lett & Kamphaus, 1997). TRS Learning Problems correlated in referred elementary students with the Freedom from Distractibility factor, although no relation was obtained for the Attention Problems or Hyperactivity scales (Lowman, Schwanz, & Kamphaus, 1996). In addition, TRS Externalizing Problems correlated significantly with a peer rating dimension "aggressive–disruptive," while TRS Internalizing Problems correlated significantly with a peer rating dimension "sensitive–isolated" (Realmuto, August, Sieler, & Pessoa-Brandao, 1997). In a longitudinal study, teacher/parent composite measures of Hyperactivity/Attention Problems and Aggression/Conduct Problems were strongly intercorrelated and negatively related to adaptive functioning 3.5 years later (August, MacDonald, Realmuto, & Skare, 1996).

4.13.2.5 Personality Inventory for Children, 2nd Edition (PIC-2); Personality Inventory for Youth (PIY); Student Behavior Survey (SBS)

[Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, CA 90025-1251, USA]

Forty years ago two University of Minnesota psychologists began the development of a new inventory approach to the evaluation of children and adolescents. They assembled a 600-item administration booklet and named it the "Personality Inventory for Children. For use with children from six through adolescence." The directions stated that each item was to be answered as either "true" or "false" by the child's mother in order to describe both the child and family relationships. Professors Wirt and Broen accumulated administration booklet descriptive statements following an outline. To ensure comprehensive coverage of child behavior and adjustment, 50 items were sorted into each of 11 separate content areas: aggression, anxiety, asocial behavior, excitement, family relations, intellectual development, physical development, reality distortion, social skills, somatic concern, and withdrawal. To these 550 potential scale items, 50 items were added in an effort to strengthen or clarify the meaning of certain areas of concern. Following many of the general procedures employed in development of the Minnesota Multiphasic Personality Inventory, PIC scales were identified from this item pool over a span of 20 years.

The initial 1977 published profile included three validity scales, a general screening scale, and 12 measures of child ability and adjustment and family function developed through either empirical item selection techniques or through iterative content valid procedures. In 1981 the administration booklet was revised and items were sorted into one of four parts. Completion of part I (items 1–131) allowed the scoring of four broad-band factor-derived scales (Lachar, Gdowski, & Snyder, 1982). Completion of parts I and II (items 1–280) generated the entire clinical profile with "shortened scales" (Lachar, 1982), while parts I–III (420 items) allowed the scoring of the original length scales. The final 180 items of this booklet were eventually dropped from the administration booklet because they did not appear on any of the standard full-length profile scales.
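
The part structure of the 1981 booklet described above can be sketched as a small lookup. The Python below is only an illustration: the item ranges are taken from the text, but the names `PARTS`, `BOOKLET_ITEMS`, and `scorable_output` are hypothetical, and the sketch assumes the respondent completes items in booklet order. It is not part of any PIC scoring software.

```python
from typing import List

# Total length of the original administration booklet (from the text).
BOOKLET_ITEMS = 600

# (last item of the cumulative part, what completing through it allows).
# Ranges follow the text: part I = items 1-131, parts I-II = items 1-280,
# parts I-III = 420 items. Labels are paraphrases, not official names.
PARTS = [
    (131, "four broad-band factor-derived scales"),
    (280, "entire clinical profile with shortened scales"),
    (420, "original-length profile scales"),
]


def scorable_output(items_completed: int) -> List[str]:
    """Return every output that can be scored after completing the
    first `items_completed` items in booklet order."""
    return [label for last_item, label in PARTS if items_completed >= last_item]


# Items beyond part III appeared on no standard full-length profile scale.
unused_items = BOOKLET_ITEMS - PARTS[-1][0]  # 600 - 420 = 180
```

For example, `scorable_output(280)` returns the first two entries, reflecting that completing parts I and II yields both the factor scales and the shortened-scale profile.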
From the beginning, the task of PIC scale and profile interpretation attributed relatively little importance to item content (except for the construction of a Critical Items list), but instead established external correlates and interpretive guidelines for individual profile scales (Lachar & Gdowski, 1979) and replicated profile patterns (Gdowski, Lachar, & Kline, 1985; Kline, Lachar, & Gdowski, 1987; LaCombe et al., 1991). A diagnostic procedure in which similarity coefficients are calculated between the individual PIC profile to be interpreted and the mean profiles of students receiving specific special education services has also been incorporated into profile scoring and interpretation software (Kline, Lachar, Gruber, & Boersma, 1994). Special effort has focused on determining that PIC scale validity is not restricted by a child's age, gender, or ethnicity status (Kline & Lachar, 1992; Kline et al., 1985).

The development efforts for the PIC-2 started in 1989 with the rewriting of the first 280 items of the PIC booklet into a self-report format for the PIY (Lachar & Gruber, 1993, 1995). Many dimensions were easily translated from the "my child" to the "I" format, although this process was either difficult or impossible for some dimensions. For example, accurate self-report of early developmental phenomena and severe developmental delay would be unlikely or inappropriate for students who can complete a comprehensive self-report inventory ("My child was difficult to toilet train," "My child can comb his/her own hair"). Such PIC items were replaced with other self-report statements likely to be used in validity scales or to supplement coverage of dimensions such as depression and defective reality testing (see PIY technical manual for an elaboration of this process). The PIY was developed with a subscale-within-scale structure that facilitates profile interpretation and has also incorporated a set of validity scales constructed from the responses of children and adolescents.

Development of the PIY facilitated concurrent critical review of the structure and content of PIC scales and the PIC profile. This review motivated substantial data collection using a PIC research edition administration booklet that allowed both the scoring of the PIC and the collection of data on revised and new inventory items. Data collected from application of this booklet have been subjected to considerable statistical analysis through which the PIC (now PIC-2) and PIY have achieved a great deal of similarity to facilitate comparison. The PIC validity scales, already a significant component of profile interpretation, have been improved in the PIC-2 and one consistent clinical scale and subscale construction methodology has been applied to the assessment of all revised clinical dimensions. In addition, a national normative sample has been collected concurrently with that for the third diagnostic component, a new teacher rating form, the Student Behavior Survey (SBS). Although these revisions have increased the focus on content validity in this family of parent, teacher, and self-report questionnaires, the 20-year tradition of deriving interpretive guidelines from the empirical relations between test measures and independent phenomena of adjustment and behaviour continues in the PIC-2 and SBS.

4.13.2.5.1 Personality Inventory for Children, 2nd Edition (PIC-2); Personality Inventory for Youth (PIY)

The PIC-2 and PIY profiles graph the linear T scores of three parallel validity scales and nine substantive clinical scales. PIC-2 norms are provided for students in grades K through 12 (a preschool version is under development), while PIY norms are available for students in grades 4 through 12. Clinical scales were constructed using an iterative process in which initial item placement based upon previous PIC scale placement and/or substantive item content was then empirically supported within item-to-scale correlation matrices based upon large clinical samples. Items retained in each final PIC-2 and PIY clinical scale demonstrated both statistical significance and the most substantial item-to-scale correlation among the nine potential scale placements. All other items, unless placed on an inventory validity scale or a critical item list, were dropped from the final version of each inventory. This process, in contrast to item selection through contrasted groups, placed an emphasis on item content and scale homogeneity.

The majority of PIY clinical scale items (86–100%, mean = 95%) appear on the identically named PIC-2 clinical scale, facilitating comparison of parent- and youth-report. The items of the PIY clinical scales are placed on only one scale and one subscale. Of the 264 items that comprise the nine PIC-2 clinical scales, only 16 (6.1%) appear on two scales. Each clinical scale is also divided into two or three nonoverlapping subscales. Guided by a series of item factor analyses, these subscales consist of item subsets that are more homogeneous than their associated total scales. Clinical and validity scale content remains the same across child age or gender, allowing expected age- and gender-related variance to appear. (For example, greater elevation on the Impulsivity and Distractibility scale will be found for boys in comparison to girls and for
younger children in comparison to older children.) PIY clinical scales become interpretable at 60T, and subscales usually at 65T. (Analyses of PIC-2 scale standard scores suggest comparable guidelines.) Table 3 presents the scale/subscale structure, representative items, and internal consistency estimates of the PIC-2 parent-report and PIY self-report clinical scales. Table 3 demonstrates that PIC-2 and PIY item content varies from the description of commonly occurring difficulties to statements suggestive of severe psychopathology. It should be noted that many PIC-2 and PIY subscales incorporate more descriptors than the full scales found in the other three families of objective questionnaires.

The PIY and PIC-2 both incorporate a screening or short assessment procedure. The first 80 items of the PIY comprise a 32-item screening scale chosen to provide an optimal identification of those regular education students who, when administered the full PIY, produced clinically significant results. These items also include three "scan items" for each scale. Scan items were selected in such a manner so that individuals who endorse two or more of each set of three items would be those with a high probability of scoring > 59T on the corresponding clinical scale. Shortened versions of three validity scales can also be derived from these items.

The PIC-2 provides a short form to measure change in clinical status associated with therapeutic intervention. Although PIC scales may have demonstrated such sensitivity to change (see therapeutic case example in Lachar & Kline, 1994), a brief form tailored specifically for this purpose was constructed. Selected items were (i) written in the present tense, (ii) frequently endorsed in the context of clinical assessment, and (iii) described clinical phenomena often the focus of short-term intervention. Using these guidelines, the 12 most favourable items from each of eight PIC-2 clinical scales were selected. (Cognitive Impairment was excluded due to historical components, lack of appropriate therapeutic focus, and the global/stable nature of most descriptions on this dimension.) These 96 inventory statements have been placed at the beginning of the 275-item PIC-2 booklet to serve as both a short form and a method of efficient re-evaluation of a child following short-term intervention. These 96 items are also available in a self-scoring format. It is intended for these scale scores to be graphed on the same profile at baseline and at appropriate interim and post-treatment intervals to demonstrate both dimensions of change and stability.

The initial concurrent validity of these shortened scales was established through correlation of these scale scores with clinician ratings, teacher descriptions, and self-report descriptions. These data were drawn from an ongoing clinical project in which the PIY, SBS, clinician ratings, diagnoses, and individually administered psychometric results are being collected to some degree with over 1000 PIC-2 assessments. Even though an exploratory analysis, some care was taken in the selection of these correlates. Each obtained scale descriptor was placed on only the one scale with which it received the largest correlation, all being at least significant at p < 0.01. Table 4 summarizes the number of ratings identified in this manner from each source (clinician, teacher, and student) and provides three examples from each rating source for each of these eight scales. Correlations between these shortened scales and their full-length versions are also presented.

Table 4 demonstrates that these brief scales correlated substantially with full-length versions and obtained independent correlates from nonparent observers that matched expressed scale content and diagnostic intent. Clinicians provided the greatest support for ADH, DLQ, and DIS, focusing on problems of disruptive and noncompliant behavior and intense and dysphoric affect that often form the basis of clinical referral. These analyses demonstrate that ADH, as previously demonstrated for the PIC Hyperactivity scale (Lachar & Gdowski, 1979), assesses those behaviors most related to problems in classroom adjustment. In addition, observations obtained directly from the student under study provide those internal and subjective judgments that demonstrate the clinical value of PIC-2 dimensions which do not receive robust correlates from clinicians or teachers.

PIY and PIC-2 profiles incorporate three validity scales. The Inconsistency scale (INC) evaluates the likelihood that responses to items are random or reflect in some manner inadequate comprehension of inventory statements or compliance with test instructions. The Defensiveness scale (DEF) identifies profiles likely to demonstrate the effect of minimization or denial of current problems. The third validity scale, Dissimulation (FB), identifies profiles that may result from either exaggeration of current problems or a malingered pattern of atypical or infrequent symptoms.

PIY and PIC-2 Inconsistency scales measure semantic inconsistency through the classification of response to 35 pairs of highly correlated items drawn from all nine clinical scales (for example, "I have many friends/I have very few friends"; "My child has a lot of talent/My child has no special talents"). For each item pair, two response combinations are consistent and two are inconsistent. Each inconsistent pair

Table 3 PIC-2 and PIY Clinical Scale and Subscale Composition.

Cognitive Impairment
PIC-2: 39 items, alpha = .881
PIY: 20 items, alpha = .740
Items in common: 20 (only 1 on PIC-2 COG3)
PIC-2 Subscales:
COG1/Inadequate Abilities (13 items, alpha = .792)
My child is rather absent-minded.
COG2/Poor Achievement (13 items, alpha = .785)
Reading has been a problem for my child.
COG3/Developmental Delay (13 items, alpha = .799)
My child could eat with a fork before age four years.
PIY Subscales:
COG1/Poor Achievement and Memory (8 items, alpha = .649)
School has been easy for me.
COG2/Inadequate Abilities (8 items, alpha = .673)
Other people think that I am talented.
COG3/Learning Problems (4 items, alpha = .441)
I have been held back a year in school.

Impulsivity and Distractibility
PIC-2: 27 items, alpha = .921
PIY: 17 items, alpha = .773
Items in common: 17
PIC-2 Subscales:
ADH1/Disruptive Behavior (21 items, alpha = .917)
My child jumps from one activity to another.
ADH2/Fearlessness (6 items, alpha = .693)
My child will do anything on a dare.
PIY Subscales:
ADH1/Brashness (4 items, alpha = .535)
I like to show off.
ADH2/Distractibility/Overactivity (8 items, alpha = .613)
I cannot keep my attention on anything.
ADH3/Impulsivity (5 items, alpha = .543)
I often act without thinking.

Delinquency
PIC-2: 47 items, alpha = .957
PIY: 42 items, alpha = .923
Items in common: 39
PIC-2 Subscales:
DLQ1/Antisocial Behavior (13 items, alpha = .881)
My child has been in trouble with the police.
DLQ2/Dyscontrol (17 items, alpha = .921)
When my child gets mad, watch out!
DLQ3/Noncompliance (17 items, alpha = .927)
My child often disobeys me.
PIY Subscales:
DLQ1/Antisocial Behavior (15 items, alpha = .831)
I have run away from home.
DLQ2/Dyscontrol (16 items, alpha = .839)
What people say often makes me angry.
DLQ3/Noncompliance (11 items, alpha = .828)
I tend to see how much I can get away with.

Family Dysfunction
PIC-2: 25 items, alpha = .882
PIY: 29 items, alpha = .869
Items in common: 25
PIC-2 Subscales:
FAM1/Conflict Among Members (15 items, alpha = .847)
Our family argues a lot at dinner time.
FAM2/Parent Maladjustment (10 items, alpha = .781)
One of the child's parents drinks too much alcohol.
PIY Subscales:
FAM1/Parent–Child Conflict (9 items, alpha = .817)
I am unhappy about my home life.
FAM2/Parent Maladjustment (13 items, alpha = .738)
My parents often argue.
FAM3/Marital Discord (7 items, alpha = .701)
My parents are now divorced or living apart.

Reality Distortion
PIC-2: 29 items, alpha = .898
PIY: 22 items, alpha = .831
Items in common: 20
PIC-2 Subscales:
RLT1/Developmental Deviation (14 items, alpha = .842)
My child does not understand other people.
RLT2/Hallucinations and Delusions (15 items, alpha = .817)
My child sometimes sees things that are not there.
PIY Subscales:
RLT1/Feelings of Alienation (11 items, alpha = .768)
I am different from most kids.
RLT2/Hallucinations and Delusions (11 items, alpha = .709)
People secretly control my thoughts.

Somatic Concern
PIC-2: 28 items, alpha = .829
PIY: 27 items, alpha = .853
Items in common: 24
PIC-2 Subscales:
SOM1/Psychosomatic Preoccupation (17 items, alpha = .793)
My child seems tired most of the time.
SOM2/Muscular Tension and Anxiety (11 items, alpha = .675)
Recently my child has complained of chest pains.
PIY Subscales:
SOM1/Psychosomatic Syndrome (9 items, alpha = .730)
I often have headaches.
SOM2/Muscular Tension and Anxiety (10 items, alpha = .740)
At times I have trouble breathing.
SOM3/Preoccupation with Disease (8 items, alpha = .596)
I am worried about disease.

Psychological Discomfort
PIC-2: 39 items, alpha = .909
PIY: 32 items, alpha = .864
Items in common: 31
PIC-2 Subscales:
DIS1/Fear and Worry (13 items, alpha = .728)
My child is often afraid of little things.
DIS2/Depression (18 items, alpha = .881)
My child tends to feel sorry for himself/herself.
DIS3/Sleep Disturbance/Preoccupation with Death (8 items, alpha = .770)
My child's sleep is calm and restful.
PIY Subscales:
DIS1/Fear and Worry (15 items, alpha = .779)
I worry about things that adults worry about.
DIS2/Depression (11 items, alpha = .730)
The future looks good to me.
DIS3/Sleep Disturbance (6 items, alpha = .703)
I often get up at night.

Social Withdrawal
PIC-2: 19 items, alpha = .796
PIY: 18 items, alpha = .800
Items in common: 18
PIC-2 Subscales:
WDL1/Social Introversion (11 items, alpha = .781)
My child worries about talking to others.
WDL2/Isolation (8 items, alpha = .656)
My child often stays in his/her room for hours.
PIY Subscales:
WDL1/Social Introversion (10 items, alpha = .781)
Shyness is my biggest problem.
WDL2/Isolation (8 items, alpha = .591)
I keep my thoughts to myself.

Social Skill Deficits
PIC-2: 28 items, alpha = .915
PIY: 24 items, alpha = .855
Items in common: 23
PIC-2 Subscales:
SSK1/Limited Peer Status (13 items, alpha = .853)
My child often brings friends home.
SSK2/Conflict with Peers (15 items, alpha = .888)
Other children often get mad at my child.
PIY Subscales:
SSK1/Limited Peer Status (13 items, alpha = .788)
I am sure of myself in a group.
SSK2/Conflict with Peers (11 items, alpha = .804)
Other kids make fun of my ideas.

Note: coefficient alphas based upon 1178 PIY and 901 PIC-2 protocols obtained in the context of clinical
evaluation.
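
The coefficient alphas reported in Table 3 summarize the internal consistency of each scale and subscale. As a minimal sketch, assuming the standard Cronbach's alpha formula and a small made-up response matrix (the function name and the data are hypothetical and are not taken from the PIC-2 or PIY manuals), alpha for a k-item scale could be computed along these lines:

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Return Cronbach's coefficient alpha for one scale.

    item_scores: list of respondents, each a list of k numeric item scores
    (for a true/false inventory, 1 = scored direction, 0 = otherwise).
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four respondents answering a three-item scale.
demo = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
alpha = cronbach_alpha(demo)
```

Because alpha depends on the number of items as well as their intercorrelation, short subscales such as the 4-item PIY COG3 would be expected to show lower values than the longer total scales.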

identified in a given protocol contributes one point to the INC raw score. Application of a cutting raw score of > 12 resulted in correct identification of 90–95% of clinical protocols and 92–96% of a random sample. The DEF is an expanded version of the PIC Lie scale. DEF items represent denials of common problems ("Sometimes I put off doing a chore. False"; "My child almost never argues. True") and attributions of improbable positive adjustment ("My child always does his/her homework on time. True"; "I am almost always on time and remember what I am supposed to do. True"). DEF elevations above 59T, even in hospitalized psychiatric patients, result in profiles that either minimize current problems or consistently deny the presence of most or all problems in adjustment.

The third validity scale, Dissimulation (FB), was empirically constructed through item analysis comparing clinical protocols and two sets of protocols completed by nonreferred regular education students or their mothers. First the PIY or PIC-2 was completed with directions to provide an accurate description, then a second questionnaire was completed as if the student was in need of mental health counseling or hospitalization. Selected items in the scored direction were very infrequent in the accurate normal (PIY, 11%; PIC-2, 4%) and clinical (PIY, 18%; PIC-2, 15%) condition, while characteristic (PIY, 83%; PIC-2, 55%) of the "fake bad" or dissimulated condition. The pattern of these three scales readily identifies profiles in which caution must be applied to their interpretation. Table 5 presents PIC-2 and PIY profile scores from three hospitalized 12-year-old adolescents.

The evaluation of Case A resulted in PIC-2 and PIY validity scale results which did not suggest that the accuracy of these profiles had been compromised in any systematic manner. It is important to observe that both profiles present clinically elevated scale scores. First consider similarities in scale and subscale pattern and scale and subscale clinical elevations (e.g., apparent disagreement for COG3 represents the reality that PIC-2 COG3 consists of parent report of developmental delay, while PIY COG3 represents learning problems, a dimension much more similar to PIC-2 COG2). Case A is a 12-year-old in his fifth psychiatric hospitalization who had been placed in a self-contained special education classroom for behavioral maladjustment. He carries diagnoses of ADHD combined type, Oppositional Defiant Disorder, and Conduct Disorder. His behavior at home and in the hospital demonstrated serious behavioral dyscontrol. He had been noncompliant in taking psychotropic medication to improve his emotional and behavioral control. He had threatened to kill himself (see

Table 4 Correlates of PIC-2 Brief Clinical Scales.

Impulsivity and Distractibility (ADH12: .96a)


Clinician Ratings: total = 14
Hyperactive, overactive
Restless
Frequently frustrated
Teacher Ratings: total = 34
Disobeys class or school rules
Impulsive, acts without thinking
Misbehaves unless closely supervised
Student Ratings: total = 9
Brags about being sent to the principal
School has sent notes home about bad behavior
Likes to show off

Delinquency (DLQ12: .93)


Clinician Ratings: total = 44
Argues, Oppositional
Disobedient to parents
Poorly modulated anger
Teacher Ratings: total = 11
Does not demonstrate polite behavior
Blames others for his/her problems
Becomes upset for little or no reason
Student Ratings: total = 24
Gives parents a lot of trouble
Sometimes swear at parents
Ran away from home

Family Dysfunction (FAM12: .93)


Clinician Ratings: total = 6
Conflict between parents
Parent divorce/separation
Emotionally abused
Teacher Ratings: total = 6
Uses alcohol or drugs
Strikes or pushes school personnel
Preoccupied with sex
Student Ratings: total = 18
Family does not enjoy being with each other more than other families
A lot of tension in the home
Parents often argue

Reality Distortion (RLT12: .95)


Clinician Ratings: total = 3
Auditory hallucinations
Inappropriate emotion, affect
Delusions, paranoia
Teacher Ratings: total = 0
Student Ratings: total = 13
Don't get along with others most of the time
Need a lot of help from others
Do strange or unusual things

Somatic Concern (SOM12: .92)


Clinician Ratings: total = 6
Excessive sleeping
Sexually abused
Continually tired, listless
Teacher Ratings: total = 0


Student Ratings: total = 33
Often get very tired
At times have trouble breathing
Don't have as much energy as most other kids

Psychological Discomfort (DIS12: .93)


Clinician Ratings: total = 22
Depressed, sad, unhappy
Moodiness
Inadequate self-esteem
Teacher Ratings: total = 4
Appears sad or unhappy
Doesn't take successes and failures in stride
Worries about little things
Student Ratings: total = 25
Shy with kids my own age
Not often in a good mood
Often afraid of little things

Social Withdrawal (WDL12: .96)


Clinician Ratings: total = 4
Withdrawn
Shy
Uncommunicative, seldom talks
Teacher Ratings: total = 2
Pessimistic about the future
Overcritical of himself/herself
Student Ratings: total = 12
Shyness is my biggest problem
Hardly ever talk
Stay in the house for days at a time

Social Skill Deficits (SSK12: .96)


Clinician Ratings: total = 5
Teased by peers
Isolated, few or no friends
Attends self-contained special education class
Teacher Ratings: total = 17
Does not seem to have fun
Avoids social interaction in class
Unaware of the feelings of others
Student Ratings: total = 28
Often rejected by other kids
Have very few friends
Not very popular with other kids

a Indicates correlation between total and 12-item version of PIC-2 clinical scale in a sample of 905 referred students. Note text for explanation of correlate selection procedure.

DIS3 scores) and had assaulted his mother who he said did not want him at home (see PIY FAM1 and FAM2). This patient attempted to escape from the hospital and required multiple time-outs and seclusions to control rages, threats, and aggressive and inappropriate behavior (see DLQ2). Off medication, A demonstrated an attention span of less than 15 minutes (see ADH values). He had a history of impulsive and disruptive behavior, fighting with peers, noncompliance with adults, verbal and physical aggression, and running away from home (elevated DLQ values for PIY and PIC-2).

Case B, in contrast to Case A, documents considerable parent/child disagreement. The

Table 5 The influence of respondent defensiveness on PIY/PIC-2 profile pairs.

                                    Case A        Case B        Case C
Scale/subscale                    PIC-2    PIY  PIC-2    PIY  PIC-2    PIY

Inconsistency                        53     49     67     57     50     68
Dissimulation                        60     72     81     48     47     72
Defensiveness                        30     39     30     64     65     50

Cognitive Impairment                 61     63     67     57     43     47
COG1                                 56     57     68     52     49     42
COG2                                 67     50     70     45     41     45
COG3                                 53     85     48     85     43     67

Impulsivity and Distractibility      79     69     81     39     44     64
ADH1                                 75     77     83     49     46     53
ADH2                                 81     62     64     37     41     62
ADH3                                  -     57      -     41      -     66

Delinquency                          83     75     86     41     42     49
DLQ1                                 71     75     63     43     46     49
DLQ2                                 82     71     98     41     43     56
DLQ3                                 79     67     76     43     42     42

Family Dysfunction                   60     69     58     57     67     49
FAM1                                 58     69     58     52     62     47
FAM2                                 60     74     54     57     72     49
FAM3                                  -     53      -     58      -     53

Reality Distortion                   67     71     73     48     50     79
RLT1                                 69     64     83     53     55     74
RLT2                                 62     75     56     41     44     79

Somatic Concern                      82     75     74     49     41     65
SOM1                                 82     66     76     38     42     59
SOM2                                 72     71     65     53     41     65
SOM3                                  -     73      -     59      -     65

Psychological Discomfort             90     59     97     63     53     68
DIS1                                 70     64     81     64     51     66
DIS2                                 88     38     92     52     49     58
DIS3                                 83     70     83     63     63     68

Social Withdrawal                    47     83     85     41     46     59
WDL1                                 46     77     72     46     49     53
WDL2                                 50     78     88     38     42     65

Social Skill Deficits                57     43     86     56     49     57
SSK1                                 43     40     68     53     50     56
SSK2                                 74     50     97     59     48     55

Note: Table 3 provides the names and examples of PIC-2 and PIY subscales.
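
The validity screening described in the text can be sketched in code. The following is a minimal illustration, not the published scoring procedure: the function names, the item identifiers, and the pair format are hypothetical, while the Defensiveness T-scores are copied from Table 5 and the 59T threshold is the one stated in the text.

```python
# The text reports that DEF elevations above 59T signal minimization or denial.
DEF_CUTOFF_T = 59

# Defensiveness (DEF) T-scores from Table 5, keyed by (case, form).
DEF_T = {
    ("Case A", "PIC-2"): 30, ("Case A", "PIY"): 39,
    ("Case B", "PIC-2"): 30, ("Case B", "PIY"): 64,
    ("Case C", "PIC-2"): 65, ("Case C", "PIY"): 50,
}


def inc_raw_score(responses, opposite_pairs):
    """Count inconsistent item pairs, one point per pair.

    responses: dict mapping a (hypothetical) item id to a True/False answer.
    opposite_pairs: pairs of items with opposite content, so answering both
    the same way (True/True or False/False) counts as inconsistent.
    """
    return sum(1 for a, b in opposite_pairs if responses[a] == responses[b])


def defensive_profiles(def_t, cutoff=DEF_CUTOFF_T):
    """Return the (case, form) keys whose DEF T-score exceeds the cutoff."""
    return sorted(key for key, t in def_t.items() if t > cutoff)


flagged = defensive_profiles(DEF_T)

# Hypothetical responses to two opposite-content item pairs.
demo_responses = {"many_friends": True, "few_friends": True,
                  "talent": True, "no_talent": False}
demo_pairs = [("many_friends", "few_friends"), ("talent", "no_talent")]
demo_inc = inc_raw_score(demo_responses, demo_pairs)
```

With the Table 5 values, only the Case B PIY and the Case C PIC-2 exceed the threshold, which corresponds to the two defensive respondents discussed in the text.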

only clear agreement as to problem area is in academic achievement (PIC-2 COG2, PIY COG3). History and psychometric assessment document retention in grade, special class placement, and achievement substantially below assessed ability. Clearly the PIC-2 most accurately describes a 12-year-old male who demonstrates multiple handicaps. The elevation of the PIY DEF scale (T = 64) is the most likely explanation for a PIY profile that is essentially within normal limits. Indeed, review of the medical record details this patient's repeated denial and minimizing of his problems during this hospitalization in an attempt to facilitate his early discharge from treatment. This child's psychiatric history was secondary to a traumatic motor vehicle accident. He felt lonely and scared, he frequently cried, sobbed, shook, avoided others, and was preoccupied with excessive worries (DIS, WDL). He externalized his problems (DLQ) and had difficulties with peers and had little insight into his role in

these conflicts (SSK2). This pattern of youth defensiveness is fairly common in inpatient settings.

Case C is quite unusual in that the mother-completed PIC-2 is essentially within normal limits, with the exception of FAM1, FAM2, and DIS3 obtaining some elevation. This normal-limits profile is quite unusual in light of the fact that the mother referred her 12-year-old daughter for this hospitalization. C presented with suicidal ideation, low self-esteem, depression, crying spells, poor appetite, and associated weight loss (DIS, WDL2). She actively demonstrated somatic concern and somatic symptoms in response to conflict during this hospitalization (SOM) and said that she would not talk about her problems with her mother, because she was afraid that such discussion would distress her mother who was under psychiatric care (PIC-2 FAM2). Clinicians were sufficiently concerned with C's internalizing problems to assign discharge diagnoses of Generalized Anxiety Disorder and Depressive Disorder NOS and to prescribe antidepressant medication. Why was C's mother more defensive in describing her daughter's problems than C was herself? It was clearly documented in the medical record that C's mother was concerned that, because of her and C's psychiatric problems, she would be seen as an inadequate mother and consequently would lose custody of her child to another adult family member.

4.13.2.5.2 Student Behavior Survey

The Student Behavior Survey (SBS) differs from the other teacher rating scales described in this chapter in that no particular effort was made to develop the SBS into a parallel teacher version of the PIC-2 and PIY. Indeed, the development of SBS items consisted of several iterations in which the authors focused on content appropriate to teacher observation. SBS items and their rating options appear on a self-scoring form. These items are sorted into content meaningful dimensions under 11 scale headings rather than presented in random order. The resulting items demonstrate a school focus in that 58 of 102 items specifically refer to in-class or in-school behaviors and judgments that can only be made by school staff (Wingenfeld, Lachar, Gruber, & Kline, 1998). The SBS consists of 102 items profiled on to 14 scales that assess student academic status and work habits, social skills, parental participation in the educational process, and problems such as aggressive or atypical behavior and emotional stress. Norms that generate linear T scores are gender-specific and divided into two age groups: 5–11 and 12–18 years.

The SBS consists of three sections (Lachar et al., 1999). In the first section the teacher circles one rating out of five (Deficient, Below Average, Average, Above Average, Superior) for eight areas of achievement such as reading comprehension and mathematics that are summed for an estimate of current academic performance. The remaining SBS items are rated on a four-point frequency scale: Never, Seldom, Sometimes, and Usually. The next two scales consist of positively worded descriptions of student adaptive behaviors. Academic Habits (13 items: "Completes class assignments"; "Follows the teacher's directions") and Social Skills (eight items: "Helps other students"; "Participates in class activities") scales document positive behaviors. Ratings of parents are very school-specific in that the teacher is asked to judge the degree to which parents support the student's educational program (six items: "Parent(s) encourage achievement"; "Parent(s) meet with school staff when asked").

The seven Problems in Adjustment scales consist of negatively worded items organized into Health Concerns (six items: "Complains of headaches"; "Talks about being sick"); Emotional Distress (15 items: "Appears sad or unhappy"; "Worries about little things"); Unusual Behavior (seven items: "Seems lost or disoriented"; "Says strange or bizarre things"); Social Problems (12 items: "Criticized by other students"; "Avoids social interaction in class"); Verbal Aggression (seven items: "Insults other students"; "Teases or taunts other students"); Physical Aggression (five items: "Destroys property when angry"; "Starts fights with other students"); and Behavior Problems (15 items: "Disrupts class by misbehaving"; "Talks excessively"). In initial samples of 1173 regular education students and 601 students referred for evaluation or receiving special education services in grades K through 12, 99 of 102 items significantly (p < 0.001) separated these two samples. All items demonstrated that they had been placed on the scale with which each obtained the largest correlation. Scale scores of regular education and referred students obtained meaningful three-factor solutions.

Additional effort (Pisecco et al., 1998) resulted in three additional, 16-item nonoverlapping scales consisting of SBS items first nominated as fitting DSM-IV diagnoses of ADHD (combined type), Oppositional Defiant Disorder, and Conduct Disorder. Item-to-scale correlations and a three-factor solution of these 48 SBS items empirically supported the placement of these scale items.
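
The SBS scoring steps described above can be sketched as follows. This is only an illustration under stated assumptions: the 1 to 5 point values assigned to the five achievement ratings, the function names, and the normative mean and standard deviation are hypothetical and are not taken from the SBS manual; the linear T-score uses the conventional definition (mean 50, standard deviation 10).

```python
# Assumed point values for the five achievement ratings (hypothetical).
RATING_POINTS = {
    "Deficient": 1,
    "Below Average": 2,
    "Average": 3,
    "Above Average": 4,
    "Superior": 5,
}


def academic_performance(ratings):
    """Sum the teacher's ratings for the eight achievement areas."""
    if len(ratings) != 8:
        raise ValueError("expected ratings for eight achievement areas")
    return sum(RATING_POINTS[r] for r in ratings)


def norm_age_group(age):
    """Return the normative age band named in the text."""
    if 5 <= age <= 11:
        return "5-11"
    if 12 <= age <= 18:
        return "12-18"
    raise ValueError("age outside the 5-18 normative range")


def linear_t(raw, norm_mean, norm_sd):
    """Conventional linear T-score: 50 + 10 * z."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


raw = academic_performance(["Average"] * 6 + ["Above Average", "Below Average"])
group = norm_age_group(12)
# Hypothetical norms; in practice they would be looked up by gender and age group.
t_score = linear_t(raw, norm_mean=24.0, norm_sd=6.0)
```

Keeping the norm lookup separate from the T-score arithmetic reflects the text's point that the norms are gender-specific and split into two age bands, so the same raw score can yield different T-scores for different students.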

4.13.2.5.3 Commentary

The revision of the PIC, the addition of a teacher rating scale, and the collection of a national representative normative sample for each have gone a long way to respond to the concerns raised by reviewers regarding the PIC's age (Kamphaus & Frick, 1996; Knoff, 1989; Merrell, 1994). The evaluations of the SBS and PIC-2 manuals and the review of their subsequent demonstrated ability to evaluate emotional adjustment at baseline and to quantify response to intervention will appear well into the twenty-first century.

The emphasis on evaluating response accuracy using validity scales and the empirical determination of interpretive guidelines continues to characterize these measures. Many psychologists unconvinced of the importance of these concepts may not value their contributions to assessment. Although the PIC has been reduced from 420 to 275 items, into which a set of subscales and a brief form have been incorporated, some clinicians may still judge the length of these questionnaires to be problematic. Although this chapter's author is obviously biased against such a position, it is certain that the breadth and depth of a measure's content establishes the potential boundaries for its utility. Even the 270 items of the PIY are easily completed in less than 45 minutes by children as young as students in the fourth grade. Efficiency has been improved by rejecting any item not actively used in the interpretive process as well as providing computer software for scoring and interpretation.

As either "new" or "improved" measures, the approximately 400 PIC-relevant publications (bibliography available from this author) at best now suggest diagnostic potential for these forms on the dimensions that retain the greatest similarity from original to revised formats. A great deal of effort will be necessary to establish the diagnostic utility of the new and revised forms to achieve the standards of the original inventory (Lachar & Kline, 1994). Such efforts have begun. For example, matched samples of adolescents with discharge diagnoses of either Conduct Disorder or Major Depression were correctly classified by PIY subscales in 83% of cases (Lachar, Harper, Green, Morgan, & Wheeler, 1996).

4.13.3 CONCLUSIONS

Clinician interest in applying objective questionnaires in the evaluation of children and adolescents continues to grow. These measures are efficiently obtained, easily scored, and at least appear to be relatively easy to interpret (although a sophisticated knowledge of both developmental psychopathology and psychometrics is clearly a prerequisite). The routine application of such measures in clinics, hospitals, and private practices will continue to be a growing trend. This chapter documents that a number of quality measures are currently available to meet this need. Clinicians should review each set of measures to select the specific family of instruments best suited for them to be applied as a routine procedure, or the specific instruments to be applied to a specific clinical activity.

Routine collection of information that is both multidimensional and multisource will influence future research and clinical practice. Efforts to document the effect of comorbid conditions and to establish interpretive guidelines for individual profiles and sets of profiles will continue. Interpretive handbooks, similar to those available for popular adult personality inventories, will be developed to support training and clinical applications. Projects will identify research designs in which the meaning of scale values and profile patterns can be established through the analysis of independent criteria. Given the infrequency of studies designed to evaluate tests, as compared to studies that use tests to evaluate clinical phenomena, efforts will be made to identify and collect substantial data that will allow such analyses. Indeed, the concurrent application of parent, teacher, youth, and clinician forms, as demonstrated in Table 4, will allow the establishment of external correlates and the determination of scale standard score interpretive ranges.

Studies will routinely investigate the relative efficacy of informant sources (parent, teacher, and youth) to support a given diagnostic decision, as well as compare the instruments across these four families. Some early efforts include Pearson and Lachar's (1994) comparison of CBCL and PIC scales in their ability to predict the standard scores of the Vineland Adaptive Behavior Scale, and the comparison of BASC and CBCL forms in identifying students with ADHD and in differentiating between ADHD types (Ostrander, Weinfurt, Yarnold, & August, in press; Vaughn et al., 1997). The former study established the relative value of the BASC PRS Attention Problem Scale for identifying students likely to demonstrate nonspecific ADHD, although both measures demonstrated limited ability to identify ADHD subtype. In the second study, both BASC parent and teacher scales demonstrated significant superiority over the CBCL and TRF in accurately classifying ADHD type.

The routine use of these measures will expand to repeated assessment paradigms. The interest

in measuring treatment effectiveness as a clinical as well as a research priority will make a meaningful contribution to identifying optimal treatment strategies as well as assigning specific treatment options for the individual case. These trends will facilitate retrospective "file drawer" study within clinical settings and expand the investigation of clinically referred youth.

4.13.4 REFERENCES

Achenbach, T. M. (1966). The classification of children's psychiatric symptoms: A factor-analytic study. Psychological Monographs, 80 (Whole Issue No. 615).
Achenbach, T. M. (1981). A junior MMPI? (Review of multidimensional description of child personality: A manual for the Personality Inventory for Children and Actuarial assessment of child and adolescent personality: An interpretive guide for the Personality Inventory for Children profile.) Journal of Personality Assessment, 45, 332–333.
Achenbach, T. M. (1991a). Integrative guide for the 1991 CBCL/4-18, YSR, and TRF profiles. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991b). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991c). Manual for the Teacher's Report Form and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991d). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1992). New developments in multiaxial empirically based assessment of child and adolescent psychopathology. In J. Rosen & P. McReynolds (Eds.), Advances in psychological assessment (Vol. 8, pp. 75–102). New York: Plenum.
Achenbach, T. M. (1993). Empirically based taxonomy: How to use syndromes and profile types derived from the CBCL/4-18, TRF, and YSR. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213–232.
August, G. J., MacDonald, A. W., Realmuto, G. M., & Skare, S. S. (1996). Hyperactive and aggressive pathways: Effects of demographic, family, and child characteristics on children's adaptive functioning. Journal of Clinical Child Psychology, 25, 341–351.
August, G. J., Realmuto, G. M., MacDonald III, A. W., Nugent, S. M., & Crosby, R. (1996). Prevalence of ADHD and comorbid disorders among elementary school children screened for disruptive behavior. Journal of Abnormal Child Psychology, 24, 571–595.
Biederman, J., Faraone, S. V., Doyle, A., Lehman, B. K., Kraus, I., Perrin, J., & Tsuang, M. T. (1993). Convergence of the Child Behavior Checklist with structured interview-based psychiatric diagnoses of ADHD children with and without comorbidity. Journal of Child Psychology and Psychiatry, 34, 1241–1251.
Bloomquist, M. L., August, G. J., Cohen, C., Doyle, A., & Everhart, K. (1997). Social problem solving in hyperactive-aggressive children: How and what they think in conditions of automatic and controlled processing. Journal of Clinical Child Psychology, 26, 172–180.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Chen, W. J., Faraone, S. V., Biederman, J., & Tsuang, M. T. (1994). Diagnostic accuracy of the Child Behavior Checklist scales for Attention-Deficit Hyperactivity Disorder: A receiver-operating characteristic analysis. Journal of Consulting and Clinical Psychology, 62, 1017–1025.
Conners, C. K. (1997). Conners' Rating Scales-Revised technical manual. North Tonawanda, NY: Multi-Health Systems.
Conners, C. K. (in press). Conners' Rating Scales-Revised. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (2nd ed.). Hillsdale, NJ: Erlbaum.
Conners, C. K., Wells, K. C., Parker, J. D. A., Sitarenios, G., Diamond, J. M., & Powell, J. W. (1997). A new self-report scale for assessment of adolescent psychopathology: Factor structure, reliability, validity, and diagnostic sensitivity. Journal of Abnormal Child Psychology, 25, 487–497.
Dalton, J. E. (1996). Juvenile male sex offenders: Mean scores on the BASC Self-Report of Personality. Psychological Reports, 79, 634.
Doyle, A., Ostrander, R., Skare, S., Crosby, & August, G. J. (1997). Convergent and criterion-related validity of the Behavior Assessment System for Children–Parent Rating Scale. Journal of Clinical Child Psychology, 26, 276–284.
Drotar, D., Stein, R. E., & Perrin, E. C. (1995). Methodological issues in using the Child Behavior Checklist and its related instruments in clinical child psychology research. Journal of Clinical Child Psychology, 24, 184–192.
Edelbrock, C., & Costello, A. J. (1988). Convergence between statistically derived behavior problem syndromes and child psychiatric diagnoses. Journal of Abnormal Child Psychology, 16, 219–231.
Erford, B. T. (1996). Analysis of the Conners' Teacher Rating Scale-28 (CTRS-28). Assessment, 3, 27–36.
Faraone, S. V., Biederman, J., Weber, W., & Russell, R. L. (1998). Psychiatric, neuropsychological, and psychosocial features of DSM-IV subtypes of Attention-Deficit/Hyperactivity Disorder: Results from a clinically referred sample. Journal of the American Academy of Child and Adolescent Psychiatry, 37, 185–193.
Flanagan, D. P., Alfonso, V. C., Primavera, L. H., & Povall, L. (1996). Convergent validity of the BASC and SSRS: Implications for social skills assessment. Psychology in the Schools, 33, 13–23.
Forbes, G. B. (1985). The Personality Inventory for Children (PIC) and hyperactivity: Clinical utility and problems of generalizability. Journal of Pediatric Psychology, 10, 141–149.
Gdowski, C. L., Lachar, D., & Kline, R. B. (1985). A PIC profile typology of children and adolescents: I. An empirically-derived alternative to traditional diagnosis. Journal of Abnormal Psychology, 94, 346–361.
Greenbaum, P. E., Dedrick, R. F., Prange, M. E., & Friedman, R. M. (1994). Parent, teacher, and child ratings of problem behaviors of youngsters with serious emotional disturbances. Psychological Assessment, 6, 141–148.
Jensen, P. S., Watanabe, H. K., Richters, J. E., Roper, M., Hibbs, E. D., Salzberg, A. D., & Liu, S. (1996). Scales, diagnoses, and child psychopathology: II. Comparing the CBCL and the DISC against external validators. Journal of Abnormal Child Psychology, 29, 151–168.
Kamphaus, R. W., & Frick, P. J. (1996). Clinical assessment of child and adolescent personality and behavior. Boston, MA: Allyn & Bacon.

Kamphaus, R. W., Huberty, C. J., DiStifano, C., & Petoskey, M. D. (1997). A typology of teacher-rated child behavior for a national U.S. sample. Journal of Abnormal Child Psychology, 25, 453–463.
Kamphaus, R. W., & Reynolds, C. R. (1998). BASC Monitor for ADHD manual. Circle Pines, MN: American Guidance Service.
Kline, R. B. (1994). New objective rating scales for child assessment, I. Parent- and teacher-informant inventories of the Behavior Assessment System for Children, the Child Behavior Checklist, and the Teacher Report Form. Journal of Psychoeducational Assessment, 12, 289–306.
Kline, R. B. (1995). New objective rating scales for child assessment, II. Self-report scales for children and adolescents: Self-Report of Personality of the Behavior Assessment System for Children, the Youth Self-Report, and the Personality Inventory for Youth. Journal of Psychoeducational Assessment, 13, 169–193.
Kline, R. B., & Lachar, D. (1992). Evaluation of age, sex, and race bias in the Personality Inventory for Children (PIC). Psychological Assessment, 4, 333–339.
Kline, R. B., Lachar, D., & Gdowski, C. L. (1987). A PIC typology of children and adolescents: II. Classification rules and specific behavior correlates. Journal of Clinical Child Psychology, 16, 225–234.
Kline, R. B., Lachar, D., Gruber, C. P., & Boersma, D. C. (1994). Identification of special education needs with the Personality Inventory for Children (PIC): A profile-matching strategy. Assessment, 1, 301–313.
Kline, R. B., Lachar, D., & Sprague, D. J. (1985). The Personality Inventory for Children (PIC): An unbiased predictor of cognitive and academic status. Journal of Pediatric Psychology, 10, 461–477.
Knoff, H. M. (1989). Review of the Personality Inventory for Children, Revised Format. In J. C. Connolly & J. C. Kramer (Eds.), The tenth mental measurements yearbook (pp. 624–630). Lincoln, NE: Buros Institute of Mental Measurements.
Kovacs, M. (1992). Children's Depression Inventory (CDI) manual. Toronto: Multi-Health Systems.
Lachar, D. (in press). Personality Inventory for Children-

for Youth: Contribution to diagnosis. Paper presented at the 104th Annual Convention, American Psychological Association, Toronto, Canada.
Lachar, D., & Kline, R. B. (1994). Personality Inventory for Children and Personality Inventory for Youth. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment. Hillsdale, NJ: Erlbaum.
Lachar, D., Kline, R. B., & Gdowski, C. L. (1987). Respondent psychopathology and interpretive accuracy of the Personality Inventory for Children: The evaluation of a "most reasonable" assumption. Journal of Personality Assessment, 51, 165–177.
Lachar, D., Kline, R. B., Wingenfeld, S. A., & Gruber, C. P. (1999). Student Behavior Survey (SBS) manual. Los Angeles: Western Psychological Services.
LaCombe, J. A., Kline, R. B., Lachar, D., Butkus, M., & Hillman, S. B. (1991). Case history correlates of a Personality Inventory for Children (PIC) profile typology. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 678–687.
Lett, N. J., & Kamphaus, R. W. (1997). Differential validity of the BASC Student Observation System and the BASC Teacher Rating Scale. Canadian Journal of School Psychology, 13, 1–14.
Loeber, R., Green, S. M., & Lahey, B. B. (1990). Mental health professionals' perception of the utility of children, mothers, and teachers as informants on childhood psychopathology. Journal of Clinical Child Psychology, 19, 136–143.
Lowman, M. G., Schwanz, K. A., & Kamphaus, R. W. (1996). WISC-III third factor: Critical measurement issues. Canadian Journal of School Psychology, 12, 15–22.
Merrell, K. W. (1994). Assessment of behavioral, social, & emotional problems. Direct & objective methods for use with children and adolescents. New York: Longman.
Michael, K. D., & Merrell, K. W. (1998). Reliability of children's self-reported internalizing symptoms over short to medium-length time intervals. Journal of the American Academy of Child and Adolescent Psychiatry,
2nd Edition (PIC-2), Personality Inventory for Youth 37, 194±201.
(PIY), and the Student Behavior Survey (SBS). In M. O'Connor, T. G., McGuire, S., Reiss, D., Hetherington, E.
Maruish (Ed.), The use of psychological testing for M., & Plomin, R. (1998). Co-occurrence of depressive
treatment planning and outcome assessment (2nd ed.). symptoms and antisocial behavior in adolescence: A
Hillsdale, NJ: Erlbaum. common genetic liability. Journal of Abnormal Psychol-
Lachar, D. (1982). Personality Inventory for Children ogy, 107, 27±37.
(PIC) revised format manual supplement. Los Angeles: Ostrander, R., Weinfurt, K. P., Yarnold, P. R., & August,
Western Psychological Services. G. J. (in press). Diagnosing attention deficit disorders
Lachar, D. (1993). Symptom checklists and personality using the BASC and the CBCL: Test and construct
inventories. In T. R. Kratochwill & R. J. Morris (Eds.), validity analyses using optimal discriminant classifica-
Handbook of psychotherapy with children (pp. 38±57). tion trees. Journal of Consulting and Clinical Psychology.
New York: Allyn & Bacon. Pearson, D. A., & Lachar, D. (1994). Using behavioral
Lachar, D. (1999). Personality Inventory for Children-2nd questionnaires to identify adaptive deficits in elementary
Edition (PIC-2) manual. Los Angeles: Western Psycho- school children. Journal of School Psychology, 32, 33±52.
logical Services. Phares, V. (1997). Accuracy of informants: Do parents
Lachar, D., & Gdowski, C. L. (1979). Actuarial assesment think that mother knows best? Journal of Abnormal Child
of child and adolescent personality: An interpretive guide Psychology, 25, 165±171.
for the Personality Inventory for Children profile. Los Pisecco, S., Lachar, D. Gallen, R. T., Gruber, C. P., &
Angeles: Western Psychological Services. Huzinec, C. (1998). Development of disruptive behavior
Lachar, D., Gdowski, C. L., & Snyder, D. K. (1982). DSM-IV scales from teacher ratings. Manuscript sub-
Broad-band dimensions of psychopathology: Factor mitted for publication.
scales for the Personality Inventory for Children. Journal Realmuto, G. M., August, G. J., Sieler, J. D., & Pessoa-
of Consulting and Clinical Psychology, 50, 634±642. Brandao (1997). Peer assessment of social reputation in
Lachar, D., & Gruber, C. P. (1993). Development of the community samples of disruptive and nondisruptive
Personality Inventory for Youth: A self-report compa- children: Utility of the revised class play method. Journal
nion to the Personality Inventory for Children. Journal of Clinical Child Psychology, 26, 67±76.
of Personality Assessment, 61, 81±98. Reynolds, C. R., & Kamphaus, R. W. (1992). Behavior
Lachar, D., & Gruber, C. P. (1995). Personality Inventory Assessment System for Children manual. Circle Pines,
for Youth (PIY) manual technical guide. Los Angeles: MN: American Guidance Service.
Western Psychological Services. Reynolds, C. R., & Richmond, B. O. (1985). Revised
Lachar, D., Harper, R. A., Green, B. A., Morgan, S. T., & Children's Manifest Anxiety Scale manual. Los Angeles:
Wheeler, A. C. (1996, August). The Personality Inventory Western Psychological Services.
References 401

Richters, J. E. (1992). Depressed mothers as informants (1997). Diagnosing ADHD (predominantly inattentive
about their children: A critical review of the evidence for and combined subtypes): Discriminant validity of the
distortion. Psychological Bulletin, 112, 485±499. Behavior Assessment System for Children and the
Rowe, K. S., & Rowe, K. J. (1997). Norms for parental Achenbach parent and teacher rating scales. Journal of
ratings on Conners' Abbreviated Parent±Teacher Ques- Clinical Child Psychology, 26, 349±357.
tionnaire: Implications for the design of behavioral Vignoe, D., & Achenbach, T. M. (1997). Bibliography of
rating inventories and analyses of data derived from published studies using the Child Behavior Checklist and
them. Journal of Abnormal Child Psychology, 25, related materials: 1997 edition. Burlington, VT: Uni-
425±451. versity of Vermont, Department of Psychiatry.
Steingard, R., Biederman, J., Doyle, A., & Sprich- Wainwright, A., & MHS Staff (1996). Conners' Rating
Buckminster, S. (1992). Psychiatric comorbidity in Scales: Over 25 years of researchÐAn annotated biblio-
attention deficit disorder: Impact on the interpretation graphy. Toronto: Multi-Health Systems.
of Child Behavior Checklist results. Journal of the Weinstein, S. R., Noam, G. G., Grimes, K., Stone, K., &
Academy of Child and Adolescent Psychiatry, 31, Schwab-Stone, M. (1990). Convergence of DSM-III
449±454. diagnoses and self-reported symptoms in child and
Vaughn, M., Black, K., Hall, J., Hynd, G., & Riccio, C. A. adolescent inpatients. Journal of the American Academy
(1995). Use of the BASC and Achenbach for assessment of Child and Adolescent Psychiatry, 29, 627±634.
and intervention planning. Paper presented at the 1995 Wingenfeld, S. A., Lachar, D., Gruber, C.P., & Kline, R.
Convention of the National Association of School B. (1998). Development of the teacher-informant Stu-
Psychologists, Chicago. dent Behavior Survey. Manuscript submitted to Journal
Vaughn, M. L., Riccio, C. A., Hynd, G. W., & Hall, J. of Psychoeducational Assessment.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.14
Objective Personality Assessment
with Adults
JAMES N. BUTCHER and JEANETTE TAYLOR
University of Minnesota, Minneapolis, MN, USA
and
G. CYNTHIA FEKKEN
Queen's University, Kingston, ON, Canada

4.14.1 INTRODUCTION
4.14.1.1 Intelligence and Insight
4.14.1.2 Veridicality of Self-report
4.14.1.3 Personality Stability
4.14.1.3.1 Definitions
4.14.1.3.2 Influences on personality stability
4.14.2 ASSESSING ADULTS IN CLINICAL SETTINGS
4.14.2.1 The Minnesota Multiphasic Personality Inventory-Revised
4.14.2.1.1 Origin of the MMPI/MMPI-2
4.14.2.1.2 Measurement dimensions
4.14.2.1.3 Recent validity research for the MMPI-2
4.14.2.2 Basic Personality Inventory
4.14.2.3 Personality Assessment Inventory
4.14.3 SPECIALIZED OR FOCUSED CLINICAL ASSESSMENT MEASURES
4.14.3.1 Millon Clinical Multiaxial Inventory
4.14.3.2 The Beck Depression Inventory
4.14.3.3 The State-Trait Anxiety Inventory
4.14.3.4 Whitaker Index of Schizophrenic Thinking
4.14.4 NORMAL RANGE PERSONALITY ASSESSMENT
4.14.4.1 Objective Personality Measures in Research
4.14.4.1.1 The Five Factor Model (the Big Five)
4.14.4.1.2 NEO Personality Inventory
4.14.4.1.3 Multidimensional Personality Questionnaire
4.14.4.1.4 Personality Research Form
4.14.4.1.5 Sixteen Personality Factor Test
4.14.4.1.6 California Psychological Inventory
4.14.4.2 Objective Personality Measures in Educational/Vocational Assessment
4.14.4.2.1 FFM
4.14.4.2.2 NEO-PI
4.14.4.2.3 CPI
4.14.4.2.4 MMPI/MMPI-2
4.14.4.3 Personnel Screening
4.14.4.4 Other Personality Measures in Personnel Selection
4.14.4.5 The 16PF
4.14.4.6 FFM
4.14.4.7 NEO-PI
4.14.4.7.1 CPI
4.14.5 SUMMARY
4.14.6 REFERENCES

4.14.1 INTRODUCTION

Objective personality tests form a standard part of most applied psychologists' toolboxes
when it comes to measuring personality. In contrast to projective techniques and to
subjective approaches to personality assessment, objective personality questionnaires are
made up of relatively unambiguous stimuli or items; offer the respondent relatively
restricted response options; and present a scoring scheme that involves few, if any, scoring
judgments, resulting in high scoring reliability (Wiggins, 1973). Most objective personality
assessment instruments used by clinicians and researchers employ a self-report format. In
contemporary psychology, the Minnesota Multiphasic Personality Inventory (MMPI) and the
MMPI-2 are the most widely used personality assessment instruments in both clinical and
research settings (Butcher and Rouse, 1996). The next most popular instruments used to
assess personality, the Rorschach and the Thematic Apperception Test, are projective (not
objective), but each still requires the clinician to obtain self-reported behavior. In
essence, the clinician is dependent on the client's ability and willingness to make accurate
self-reports when assessing personality with standardized instruments. It is, therefore,
important to determine whether adults being assessed can competently reveal information
about their personalities through self-report instruments. In order to address this
question, the client's intelligence and insight in sharing self-information and the success
of personality tests for eliciting self-information that can be externally verified need to
be considered. We will examine each of these considerations below.

4.14.1.1 Intelligence and Insight

The existing literature on intelligence and personality is largely focused on the predictive
relationship between the two constructs. However, clinicians are generally not concerned
about the ability of a personality profile to predict an intelligence quotient or vice
versa. Instead, most are concerned with the more basic question of how intellectual
functioning impacts the ability of a client to self-disclose on an objective personality
test. No literature exists which directly addresses the issue of the relationship between a
client's level of intelligence and his or her ability to competently reveal information
about personality through a self-report instrument.

Objective personality measures leave the issue of the client's intellectual functioning in
the hands of the clinician. None of the widely used personality assessment instruments
require a pre-screening for normal or above normal intelligence. However, most self-report
personality instruments require that the examinee have a minimum reading level competency
(often expressed as a minimum grade level). Thus, a minimum level of school achievement is
implicit in the validity of a client's score, and it follows that the clinician should use
care when administering a self-report instrument to a client with known or suspected
intellectual impairment. It is also essential for the clinician to be aware of the
educational range of the standardization sample for any self-report measure, and to use
caution when interpreting protocols from clients who fall outside of that range.

The clinician needs to be aware of a client's intellectual functioning as it may influence
the client's ability to provide information about his or her personality on a self-report
measure. If a client appears to be low in intellectual functioning, then the results of
self-report measures should be interpreted with caution and perhaps supplemented with
reports from other sources.

While the capacity for insight cannot readily be extrapolated from an intelligence quotient
in a linear fashion, insight into one's problems might be discernible from the personality
profile itself. As Butcher and Rouse (1996) note in their recent review of personality
research, several of the widely used objective personality measures have attitudinal
measures that aid the clinician in evaluating the client's disclosure capability. Crookes
and Buckley (1976) found a relationship between the Eysenck Personality Inventory Lie (L)
scale score and diagnosis among both psychiatric inpatients and outpatients. They found that
high L scale scores were positively related to diagnosis of disorders associated with low
insight. They concluded that the L score (a validity indicator found on several objective
personality instruments) is highly related to a person's awareness of his or her behavior.
Moreover, this finding is in agreement with the interpretive guidelines for the MMPI-2 L
scale, which suggests the general
utility of this measure at appraising a client's self-awareness.

The quality of insight that a client brings to the assessment situation may greatly
influence the validity of the assessment. Some instruments are designed to assess clients
through behavioral items that require low insight or self-observation. These items are
specifically selected because of their ability to predict relevant criteria and may appear
at face value totally unrelated to the particular aspects of personality functioning
intended to be predicted. This highlights one advantage of using objective personality
measures in clinical settings: clients who may yield poor information during a clinical
interview that requires self-evaluation and/or introspection may be fully capable of
providing valuable clinical information when a personality instrument comprising empirically
selected items is employed.

4.14.1.2 Veridicality of Self-report

Aside from the issues of intellectual functioning and insight, the utility of self-report
measures can be assessed by examining their success at corroborating independent facts about
a client. For example, in an early study of the MMPI, Payne and Wiggins (1972) examined the
relationship between content-based profiles and external descriptors of a large group of
psychiatric inpatients. Interpretation of content-based profiles is based on combinations of
“obvious” test items which correspond to traditional self-report instruments (e.g., “I am a
high-strung person”). That is, the client endorses the item as a means of directly relaying
information about him- or herself. The authors found that the patients' self-reported MMPI
profiles matched quite well with interview report and external observation.

In a similar vein, Koss and Butcher (1973) found that psychiatric inpatients identified
(through their observed behavior and presenting symptoms) as belonging to one of six major
operationally defined crisis situations could be discriminated from one another on the basis
of their endorsed MMPI content. The authors interpreted their results as providing evidence
for the competence and willingness of adults in clinical settings to reflect information
accurately about their personality and psychological functioning through self-report test
items.

Similar evidence of a positive relationship between self-reported personality measures and
external criteria has been found for certain scales from the California Psychological
Inventory (CPI; e.g., Hindelang, 1972), the Millon Clinical Multiaxial Inventory (MCMI;
e.g., Jaffe and Archer, 1987), and the Basic Personality Inventory (BPI; Holden, Fekken,
Reddon, Helmes, & Jackson, 1988), to name three.

Although the research literature shows that people can competently disclose information
about their personalities through standardized, objective, self-report measures, some
individuals may not be motivated to cooperate in their psychological assessment. That is,
some respondents, because of a need or motivation to present themselves in a particular way,
do not respond in a truthful, open manner. The most common examples of individuals being
motivated to appear different from the way they actually are in today's assessment settings
are applicants for employment, parents involved in custody disputes, and other individuals
being evaluated as part of a court case. A self-report assessment instrument must contain an
effective means of identifying respondents who are dissimulating. Perhaps the most common
way of detecting whether people are not motivated to report accurately about themselves is
via validity scales or control scales. In some assessment situations the validity scales
provide the most important and useful information about the client. A self-report
personality measure without effective validity scales is too limited to operate across a
broad range of assessment situations. Furthermore, tests that possess validity indicators,
such as the L scale on the MMPI/MMPI-2, offer the clinician information regarding both the
client's willingness to cooperate with the assessment and his or her level of insight. These
points argue in favor of the continued use of standardized, objective self-report measures
in the assessment of personality in clinical settings.

4.14.1.3 Personality Stability

One of the assumptions of objective personality assessment is that the personality
characteristic being measured is stable. Various studies have come to the conclusion that
personality changes little in adulthood, particularly after age 30 (Conley, 1985; Finn,
1986; McCrae & Costa, 1990; Schuerger, Zarrella, & Hotz, 1989). Typical levels of stability
on personality tests are in the 0.5–0.7 range. This generalization bears some comment.

4.14.1.3.1 Definitions

What exactly is meant by stable responding? On an objective personality questionnaire, a set
of responses that are consistent over time may be defined in at least three ways: (i) as an
identical scale score, (ii) as a set of identical item
responses, or (iii) as a scale score that signifies the same relative standing on the
personality characteristic.

Conclusions about the stability of personality are most commonly based on scale scores. That
is, estimates of personality stability result from comparing scale scores obtained at two
points in time. Presumably the same scale score could result even if the test respondent
endorsed somewhat different subsets of items. The consistency of responses to individual
test items has been studied as a meaningful individual differences variable and as an index
for establishing the interpretability of individual response protocols. Item response
consistency has not been examined as a definition of longitudinal personality stability per
se.

Many researchers have focused on changes in average scale scores. Their goal is to
understand normative changes in personality. On the other hand, some researchers examine
personality stability by correlating scale scores. Thus, when coefficients of personality
stability are reported to be in the 0.5–0.7 range, this should be understood to be a
statement about relative, and not absolute, scale score stability.

4.14.1.3.2 Influences on personality stability

Although the typical stability of personality scale scores is quite high, there are
nonetheless particular variables that affect score stability in predictable ways. The first
consideration is the influence of instrument characteristics. Specifically, the average
number of items per scale and the homogeneity of the scales are the important predictors of
scale score stability (Finn, 1986; Schuerger et al., 1989). Variance associated with
specific instruments (e.g., the MMPI vs. the Sixteen Personality Factor Test [16PF])
apparently does not contribute to greater personality stability beyond the effects of scale
length and homogeneity (Schuerger et al., 1989). Such findings underscore the need for test
users to select reliable measures, particularly if test data are intended to evaluate
long-term personality change.

Second, estimates of personality stability are affected by the length of the retest
interval. Not surprisingly, personality appears to be more stable over short intervals than
long ones. In particular, score stability drops over the first year before it levels off
(Schuerger et al., 1989; Windle, 1954) and then stays stable at the same level for very long
periods of time (McCrae & Costa, 1990).

One simple reason for relatively high short-term personality stability may be that people
remember and repeat their responses to the items on a questionnaire. Lumsden (1977),
however, proposed a substantive rather than a methodological explanation. He argued that
particular personality characteristics “swell” into prominence over the short term and then
fade again. For a student entering college, for example, “independence” may become a salient
dimension and hence, self-reports on this dimension are likely to have exaggerated
consistency over the (short) time period when “independence” is prominent.

A third influence on personality stability has to do with the operationalization of
personality. Most objective personality questionnaires are intended to measure traits that
are by definition stable and enduring. The construction of such questionnaires favors
selection of items that will yield stable scores. In addition, the overall instructions as
well as the wording of specific items prompt people to make broad generalizations about
themselves and to downplay variations. Thus, personality may appear to be stable because the
constructs of personality are explicitly conceptualized and measured as stable. There are
certainly empirical studies that challenge the idea of personality stability by providing
evidence of developmental change over long time periods based on personality constructs and
measures that expressly incorporate the notion of change (e.g., Whitbourne, Zuschlag,
Elliot, & Waterman, 1992).

Fourth, some personality constructs are associated with more stability than others.
Schuerger et al.'s (1989) data show that measures of psychopathology have less scale score
stability than measures of “normal” personality characteristics. Perhaps “normal”
personality constructs are more crystallized than psychopathological constructs and thus
less error is built into the construct itself. Similarly, psychopathological constructs may
be more confounded with the effects of response style variance such as social desirability
and acquiescence than normal personality constructs, again compounding the noisiness of the
construct.

Many researchers have tried to highlight differences in stability among the specific
constructs that they chose to study. For example, Finn (1986) demonstrated that mood-related
constructs (e.g., depression) had comparatively low levels of consistency, whereas Helson
and Moane (1987) found that constructs related to socialization had high stability. There
are shortcomings to such an approach. The identification of those dimensions on which people
are changeable (i.e., states) as distinct from the more permanent individual differences
dimensions needs to be tackled systematically if a comprehensive list is to be obtained
(Cattell, 1963).
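
The distinction drawn in Section 4.14.1.3.1 between relative and absolute scale score
stability can be made concrete with a small numerical sketch. The example below is only an
illustration, not a procedure taken from the studies cited: the function names
(pearson_r, stability_summary) and the six pairs of scores are hypothetical, and the
Pearson correlation is used here as one common way of computing a retest coefficient.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scale scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def stability_summary(time1, time2):
    """Summarize retest stability for one scale given to the same people twice."""
    return {
        # Relative stability: do respondents keep their standing in the group?
        "retest_r": pearson_r(time1, time2),
        # Normative (absolute) change: did the group average shift?
        "mean_change": mean(time2) - mean(time1),
        # Share of respondents whose scale score is identical at both times.
        "identical_scores": sum(a == b for a, b in zip(time1, time2)) / len(time1),
    }


# Hypothetical data: six respondents, each scoring exactly 5 points higher at retest.
time1 = [40, 45, 50, 55, 60, 65]
time2 = [score + 5 for score in time1]
summary = stability_summary(time1, time2)
print(summary)
```

In this made-up case the retest correlation is essentially 1.0 even though no respondent
obtains an identical score and the group mean rises by 5 points. A reported coefficient in
the 0.5–0.7 range therefore describes how well relative standing is preserved and says
nothing by itself about normative change.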

The interpretation of differences in stability between constructs is further complicated by
age and cohort effects. Studies such as that of Helson and Moane (1987) estimate personality
stability using information obtained from a single group of subjects who have been assessed
at different points in time. There are limits to the generalizability of this information to
other groups of people. Other studies such as that of Finn (1986) use basically a
cross-sectional approach. They assess two or more groups of subjects having different ages
at a single time. Here it is difficult to attribute differences in the stability of
particular personality constructs to age as opposed to generational factors. More
complicated designs may be needed in order to understand such age and cohort effects
(Conley, 1985). At the very least, any generalizations about which personality constructs
are more or less stable need to be carefully qualified.

A fifth influence on personality stability has to do with features of the persons being
evaluated. Test respondents exhibit remarkable individual differences in personality
stability (Assendorp, 1992; McCrae & Costa, 1990). Do some subgroups of people show more
personality stability? Across studies, patient and prisoner samples show less personality
test score stability than “normals” (Schuerger et al., 1989). Perhaps “non-normals” have
systematically higher scale scores than “normals” and, hence, their retest scores show
regression toward the mean (Windle, 1954). Alternatively, “non-normal” groups also may be
more likely than normal groups to seek actively to change personality.

The other two person variables that have received attention in the literature on personality
stability are age and sex differences. Some researchers report that older adults are more
stable than younger adults on many traits (Finn, 1986; Schuerger et al., 1989) although
others argue that stability differences after about age 30 are minimal (McCrae & Costa,
1990). Finn (1986) has asked whether older adults might just be more rigid in their
self-perceptions. However, the picture of personality stability that results from
self-reports does tend to be substantiated by ratings on personality information collected
from others, such as spouses (McCrae & Costa, 1990). Thus, personality may indeed be more
stable in older adults.

Many studies in the area of personality stability sample exclusively men or women. Yet there
seems to be little evidence of an overall difference in the level of personality stability
for men and women (Schuerger et al., 1989). Nonetheless, researchers interested in normative
change often argue that men (Finn, 1986) and women (Helson & Moane, 1987) may be expected to
change on different constructs as a function of distinct life paths or societal pressures.
Few studies have been explicitly designed to compare sex differences in personality
stability.

Overall, personality shows considerable stability in adulthood. The objective measures
traditionally used in personality assessment may predispose us to find evidence of stability
as a function of the kinds of constructs, instructions, and test construction techniques
associated with these measures. Aspects of the test instrument itself, such as length or
homogeneity, may moderate stability estimates, as could the length of the interval between
personality assessments. Samples of “normals” and of older adults may exhibit more
personality stability; sex differences in general personality stability appear to be
minimal. The assumption of personality stability, at least over the short run, is central to
objective personality assessment if test scores are going to predict relevant behaviors and
have an impact on planning how to manage that behavior.

4.14.2 ASSESSING ADULTS IN CLINICAL SETTINGS

In this section we will provide an overview of a number of objective personality inventories
that have been designed to assess adults in clinical settings. More detail on the use of the
MMPI-2 will be given because it is the most widely researched and used measure. Two other
measures that have recently been published to measure essentially the same clinical problem
areas as the MMPI-2, the BPI (Jackson, 1989) and the Personality Assessment Inventory (PAI;
Morey, 1991), will be briefly described. In the next section, several specialized measures
will then be surveyed that have been developed to measure more specific clinical problems or
behaviors: the MCMI to assess personality disorders; the Beck Depression Inventory (BDI;
Beck, Steer, & Garbin, 1988) to assess depression; the State–Trait Anxiety Inventory (STAI;
Spielberger, Gorsuch, & Lushene, 1970); and the Whitaker Index of Schizophrenic Thinking
(WIST; Whitaker, 1973).

4.14.2.1 The Minnesota Multiphasic Personality Inventory-Revised

The MMPI-2 is the most widely used personality test with adults (Lubin, Larsen, & Matarazzo,
1984; Watkins, 1996). Keilen and Bloom (1986) found that the MMPI/MMPI-2 was the most
frequently used test in custody evaluations. Developed originally in the late
1930s by Starke Hathaway and J. C. McKinley and redeveloped by Butcher, Dahlstrom, Graham,
Tellegen, and Kaemmer (1989), this instrument provides a comprehensive survey of personality
characteristics and clinical problems. In the original MMPI, a strictly empirical scale
construction approach was followed to develop scales that would assess a patient's probable
“membership” in a clinical group. An extensive amount of research has been published on the
effectiveness of these measures at predicting and describing problems in adults. Research on
the original version of the MMPI covered a very broad range of people: psychiatric and
medical patients, substance abusers, incarcerated felons, and many other clinical groups.

In addition, a very broad range of “normals” has been studied, including applicants for
various jobs such as airline pilots, US Navy submariner crew members, and police and
security personnel. Moreover, the instrument came to be widely employed as a personality
research instrument. Butcher and Rouse (1996), in a survey of 20 years of research in
clinical assessment, found that the MMPI/MMPI-2 is the most widely researched instrument
with nearly twice the number of articles as the seven next leading tests. What has made the
MMPI/MMPI-2 the most widely used assessment technique in the personality area? We will
examine the make-up, utility, and limitations of the instrument in assessing adults.

4.14.2.1.1 Origin of the MMPI/MMPI-2

A growing dissatisfaction with subjective methods of evaluating patients in clinical
situations (such as interviews and projective tests) led the original MMPI developers,
Hathaway (a psychologist) and McKinley (a psychiatrist), to experiment with an objective
method of clinical problem assessment. They accumulated a large number of items (symptoms,
beliefs, attitudes, etc.) from tests and case material and administered them to a large
group of “normal” people to serve as a comparison group. They then administered the items to
well-defined and homogeneous groups of clinical patients. They empirically contrasted the
responses of the patient groups with the normals to obtain items that significantly
separated the groups. These items were then combined into scales for the various clinical
problem areas such as depression, schizophrenia, and so forth. Once the scales were
assembled, scores were standardized so that they had a mean score of 50 and a standard
deviation of 10. These standard score distributions then allowed the scores to be plotted on
a profile so that the interpreter would have a visual picture of how extreme a particular
score was when compared with the normal. Their empirical approach produced highly valid and
effective scales that predicted or described the likelihood that a person's score on a scale
was in the clinical range or similar to the patient groups.

The original MMPI clinical scales have undergone very substantial study and cross-validation
since their publication. They have become a standard means of objective symptom
classification since their development. The clinical scales and configurations of scales,
referred to as code types, have undergone substantial documentation as an objective
classification schema, often referred to as “cookbooks.” Researchers, for example, have
cataloged the behavioral characteristics associated with their test indices allowing for
automatic interpretation. That is, when a particular scale score or cluster of scales is
obtained, then a well-validated set of behavioral descriptions (known as descriptors) is
applied. Their objective classification approach to interpretation fostered the development
of computer-based interpretation methods so popular today (Butcher, 1995; Butcher et al.,
1998).

4.14.2.1.2 Measurement dimensions

(i) Validity scales

As noted earlier, it is essential to any self-reported personality assessment to appraise
carefully possible invalidating conditions or circumstances. Structural elements of the test
administration require evaluation to determine if extratest factors influenced the item
responses. For example, were the instructions clearly presented, was the individual able to
read and comprehend the items?

The MMPI-2 contains a number of scales and indexes that provide the test interpreter with
information on the individual's cooperativeness and honesty in responding to the items (see
Table 1 for a summary of the validity or control scales for the MMPI-2). A clear picture is
obtained as to whether the person has attempted to present a false picture on the test.
These scales are of several types. First, noncontent-oriented measures can provide
information on whether the person was inconsistent in responding (the
derived, the clinical scales were normed on the True Response Inconsistency or TRIN and
non-patient sample and T scores were developed Variable Response Inconsistency or VRIN
to enable the test interpreter to determine how scales). Additionally, the Cannot Say or ª?'
extreme a particular person's score was on a scale provides information about the person's
given scale. All of the scales were normed so that cooperation in completing the items. Two other
Assessing Adults in Clinical Settings 409

indexes that can provide clues to uncooperative test-taking behavior are the percentage of
true and percentage of false endorsement. Records with a nearly all true or all false response
pattern suggest uncooperativeness in completing the task.
Three scales have been developed to provide information about the tendency of some people to
claim excessive virtue or to deny problems. The L scale assesses an unsophisticated tendency
to claim excessive virtue; the K scale measures test defensiveness; and the S scale
(Superlative Self-presentation) assesses the tendency of some people to present themselves in
a highly favorable manner (Butcher & Han, 1995).
Three scales have been developed to assess the tendency that some respondents have to claim
excessive mental health symptoms. The original F scale developed by Hathaway and McKinley
(1940) assesses excessive symptom checking by attending to extreme or rare responses. Those
who endorse a large number of rare items are thought to be presenting an unselective or
exaggerated complaint pattern. High scores on F have been associated with malingering (Berry,
Baer, & Harris, 1991; Schretlen, 1988).
The F(b) scale, an infrequency scale that covers extreme items in the back of the MMPI-2
items, operates much like the original F scale (Butcher et al., 1989). The newest validity
measure, the F(p) scale developed by Arbisi and Ben-Porath (1995), is an extremely valuable
type of infrequency scale that differs from the original F scale in an important way: it
assesses infrequent responding in a clinical sample and thereby, when extreme, suggests
malingering of psychiatric symptoms.

(ii) Clinical scales

The traditional MMPI clinical scales (Hathaway & McKinley, 1940) have been updated in the
MMPI-2 (Butcher et al., 1989) and have been extensively revalidated in their revised form
(Archer, Griffin, & Aiduk, 1995; Butcher & Williams, 1992; Graham & Ben-Porath, 1990; Graham &
Butcher, 1988). The MMPI-2 clinical scales are empirically derived measures of several
well-established clinical patterns such as hypochondriasis, depression, paranoia, and
schizophrenia (see Table 2).
These scales have been well studied over more than 50 years. They were kept nearly intact in
MMPI-2 because of their descriptive power and ability to generalize across groups.
Two scales on the clinical profile, Mf and Si, are not empirically derived scales as were the
original clinical scales but were included on the instrument to assess personality
characteristics of masculine–feminine interests and social introversion.

(iii) Content scales

The expanded item pool of the MMPI-2 allowed for the development of a new set of content-based
scales to assess an expanded array of problems compared with the traditional clinical
measures. The 15 content scales (see Table 3) were developed according to a
multimethod/multistage strategy. The 567 items were subjected to a content analysis in order
to derive homogeneous content groups rationally. This approach is not as theoretically blind
as it seems, since the original MMPI content had been well delineated by Wiggins (1969) and
many of these constructs were still available in a modified or reduced form in the revised
instruments. Moreover, the MMPI-2 committee also wrote new items to cover contents that were
limited or not available in the original MMPI, for example, suicide ideation, substance abuse,
and type A behavior. It therefore became possible to develop scales for a number of problem
areas that have clinical relevance and are sufficiently large to assess reliably. Following
provisional derivation, other scale construction strategies such as internal consistency, item
scale correlation, and external validation were followed to define and improve the item
groups.
Content scales add a different type of information to the clinical interpreter's resources.
These scales are viewed as direct communications between the client and the clinician. The
obvious content on the scales enables the client to address problems he or she has that are
considered important to address in the clinical intervention.
In addition to their value as summaries or themes considered pertinent by the patient, the
content scales have clearly established external validity. A number of studies have provided
data on the external validity of the content scales (Ben-Porath, Butcher, & Graham, 1991;
Butcher, Graham, Williams, & Ben-Porath, 1990).

4.14.2.1.3 Recent validity research for the MMPI-2

An empirically derived instrument requires substantial research validation in order to be
useful. The MMPI-2 has been substantially researched and validated. The fact that the MMPI-2
clinical scales comprise the same items as in the original instrument means that all of the
research on the traditional measures applies to the revised form.
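The contrasted-groups procedure described under the origin of the MMPI (items are retained when they significantly separate a clinical group from the normal comparison group) can be illustrated with a minimal sketch. The text does not state which statistical criterion Hathaway and McKinley applied, so the two-proportion z test, the 1.96 cutoff, and the names `select_items`, `clinical`, and `normal` are assumptions chosen for illustration only, and the response data are invented.

```python
import math
from typing import List, Sequence


def endorsement_rate(responses: Sequence[Sequence[bool]], item: int) -> float:
    """Proportion of respondents who answered True to the given item."""
    return sum(1 for person in responses if person[item]) / len(responses)


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic (an assumed stand-in criterion)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


def select_items(clinical: Sequence[Sequence[bool]],
                 normal: Sequence[Sequence[bool]],
                 z_crit: float = 1.96) -> List[int]:
    """Return indexes of items whose endorsement rates separate the two groups."""
    selected = []
    for item in range(len(clinical[0])):
        z = two_proportion_z(endorsement_rate(clinical, item), len(clinical),
                             endorsement_rate(normal, item), len(normal))
        if abs(z) >= z_crit:
            selected.append(item)
    return selected


# Invented data: 50 patients and 50 normals answering three items.
# Item 0 is endorsed by 40 patients but only 10 normals; items 1 and 2 do not differ.
clinical = [[i < 40, i < 25, False] for i in range(50)]
normal = [[i < 10, i < 25, False] for i in range(50)]
print(select_items(clinical, normal))
```

Only the item that discriminates between the groups survives; in the empirical approach the retained items would then be combined into a scale regardless of their apparent content.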
410 Objective Personality Assessment with Adults

Table 1 Personality characteristics associated with validity indicator elevations.

? Cannot say score
The total number of unanswered items. A defensive protocol with possible attenuation of scale scores is
suggested if the ? raw score is more than 35.

L (Lie) scale
A measure of an unsophisticated or self-consciously "virtuous" test-taking attitude. Elevated scores (above 70
T) suggest that the individual is presenting himself or herself in an overly positive light, attempting to create an
unrealistically favorable view of his or her adjustment.

F (Infrequency) scale
A high score (T above 90) suggests an exaggerated pattern of symptom checking that is inconsistent with
accurate self-appraisal and suggests confusion, disorganization, or actual faking of mental illness. T scores
above 100 invalidate the profile.

F(B) scale
A second infrequency scale that appears toward the back of the MMPI-2 item pool is used to assess exaggerated
responding at the end of the test. This scale operates much like the original MMPI F scale in detecting
malingering, random responding, or motivation to exaggerate symptoms.

F(p) scale
The Psychopathology Infrequency Scale F(p) was developed by Arbisi and Ben-Porath (1995) to assess
infrequent responding in psychiatric settings. This scale is valuable in appraising the tendency for some people to
exaggerate mental health symptoms in the context of patients with genuine psychological disorder.

K (Defensiveness) scale
Measures an individual's willingness to disclose personal information and discuss his or her problems. High K
scores (T above 65) reflect an uncooperative attitude and an unwillingness or reluctance to disclose personal
information. Low scores (T below 45) suggest openness and frankness. This scale is positively correlated with
intelligence and educational level, which should be taken into account when interpreting the scores.

VRIN (Variable Response Inconsistency) scale
The VRIN scale consists of 49 pairs of specially selected items. The members of each VRIN item pair have either
similar or opposite content; each pair is scored for the occurrence of an inconsistency in responses to the two
items. The scale score is the total number of item pairs answered inconsistently. High VRIN scores are a warning
that a test subject may have been answering the items in the inventory in an indiscriminate manner, and raise the
possibility that the protocol may be invalid and that the profile is essentially uninterpretable.

TRIN (True Response Inconsistency) scale
The TRIN is made up of 20 pairs of items that are opposite in content. If a subject responds inconsistently by
answering True to both items of certain pairs, one point is added to the TRIN score; if the subject responds
inconsistently by answering False to certain item pairs, one point is subtracted. A very high TRIN score
indicates a tendency to give True answers to the items indiscriminately ("acquiescence"), and a very low TRIN
score indicates a tendency to answer False indiscriminately ("nonacquiescence"). (Negative TRIN scores are
avoided by adding a constant to the raw score.) Very low or very high TRIN scores are a warning that the test
subject may have been answering the inventory indiscriminately so that the profile may be invalid and
uninterpretable.

S scale
A new MMPI-2 scale was developed in an effort to improve the assessment of highly virtuous responding. The
subjects used in the development of the scale were 274 male airline applicants and the MMPI-2 normative
sample (N = 1138 men and 1462 women). The S scale was initially developed by examining item response
differences between airline pilot applicants, who tend to engage in superlative self-description in order to
impress examiners, and normative men from the MMPI-2 restandardization sample. The scale was refined by
using internal consistency methods to ensure high scale homogeneity. A factor analysis of the resulting 50-item
scale (S) yielded five factors named: Beliefs in "Human Goodness"; Serenity; Contentment with Life; Patience
and Denial of Irritability and Anger; and Denial of Moral Flaws. Linear T score conversion tables were
computed for both men and women separately and combined (unisex) using the MMPI-2 restandardization
data sets. The S scale was shown to have a number of behavioral correlates reflecting the presentation of oneself
as a well-controlled, problem-free person.
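
The scoring rules that Table 1 gives for the ?, VRIN, and TRIN indicators can be sketched as follows. The item numbers, the pair lists, and the constant are hypothetical placeholders (the actual MMPI-2 item pairs and constant are not given here), and leaving a pair unscored when one of its items is unanswered is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# item number -> True, False, or None (left unanswered)
Answers = Dict[int, Optional[bool]]


def cannot_say(answers: Answers) -> int:
    """The ? raw score: total number of unanswered items."""
    return sum(1 for a in answers.values() if a is None)


def vrin_raw(answers: Answers, pairs: List[Tuple[int, int, bool]]) -> int:
    """Count item pairs answered inconsistently.

    Each pair is (item_a, item_b, similar). For similar-content pairs,
    differing answers are inconsistent; for opposite-content pairs,
    identical answers are inconsistent.
    """
    score = 0
    for item_a, item_b, similar in pairs:
        a, b = answers.get(item_a), answers.get(item_b)
        if a is None or b is None:
            continue  # assumption: incomplete pairs are not scored
        inconsistent = (a != b) if similar else (a == b)
        score += int(inconsistent)
    return score


def trin_raw(answers: Answers,
             true_pairs: List[Tuple[int, int]],
             false_pairs: List[Tuple[int, int]],
             constant: int) -> int:
    """Add a point for True-True on true_pairs, subtract one for False-False
    on false_pairs, then add a constant so the score cannot go negative."""
    score = constant
    for a, b in true_pairs:
        if answers.get(a) is True and answers.get(b) is True:
            score += 1
    for a, b in false_pairs:
        if answers.get(a) is False and answers.get(b) is False:
            score -= 1
    return score


# Hypothetical eight-item record; item 5 was left blank.
answers = {1: True, 2: False, 3: True, 4: True, 5: None, 6: False, 7: False, 8: False}
vrin_pairs = [(1, 2, True), (3, 4, False), (6, 7, True), (5, 8, True)]
print(cannot_say(answers))
print(vrin_raw(answers, vrin_pairs))
print(trin_raw(answers, [(3, 4)], [(6, 7), (7, 8)], constant=5))
```

A TRIN score well above the constant points toward indiscriminate True responding and one well below it toward indiscriminate False responding, which is why both extremes are treated as warnings in Table 1.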

Table 2 Empirical correlates for MMPI-2 clinical scales.

Scale 1 (Hypochondriasis)
High-scoring people show: excessive bodily concern; somatic symptoms that tend to be vague and undefined;
fatigue, pain, weakness; manifest anxiety; selfish, self-centered, and narcissistic behavior; pessimistic, defeatist,
cynical outlook on life; dissatisfied and unhappy; make others miserable with whining, complaining behavior;
demanding and critical of others; expresses hostility indirectly; rarely act out; dull, unenthusiastic, unambitious;
ineffective in oral expression, longstanding health concerns; function at a reduced level of efficiency without
major incapacity; not very responsive to therapy, tend to terminate therapy when therapist is seen as not giving
enough attention and support; tend to seek medical solutions to life problems.

Scale 2 (Depression)
High-scoring people show: depressed, unhappy, and dysphoric behavior; they are pessimistic and self-
deprecating; tend to feel guilty; report being sluggish; have somatic complaints such as weakness, fatigue, and
loss of energy; are agitated, tense, highly-strung, and irritable; prone to worry; lack self-confidence; feel useless
and unable to function; feel like a failure at school or on the job; introverted, shy, retiring, timid, and seclusive;
aloof, maintain psychological distance; avoid interpersonal involvement; cautious and conventional; have
difficulty making decisions; nonaggressive; overcontrolled, deny impulses; make concessions to avoid conflict;
motivated for therapy.

Scale 3 (Hysteria)
High-scoring people show: poor response to stress; they avoid responsibility through development of physical
symptoms; have headaches, chest pains, weakness, and tachycardia, anxiety attacks; symptoms appear and
disappear suddenly; lack insight about causes of symptoms; lack insight about own motives and feelings; lack
anxiety, tension, and depression; rarely report delusions, hallucinations, or suspiciousness; psychologically
immature, childish, and infantile; self-centered, narcissistic, and egocentric; expect attention and affection from
others; use indirect and devious means to get attention and affection; do not express hostility and resentment
openly; socially involved; friendly, talkative, and enthusiastic; superficial and immature in interpersonal
relationships; show interest in others for selfish reasons; occasionally act out in sexual or aggressive manner with
little apparent insight; initially enthusiastic about treatment; respond well to direct advice or suggestion; slow to
gain insight into causes of own behavior; resistant to psychological interpretations.

Scale 4 (Psychopathic deviate)
High-scoring people show: antisocial behavior; rebellious toward authority figures; stormy family relationships;
blame parents for problems; history of under-achievement in school; poor work history; marital problems;
impulsive; strive for immediate gratification of impulses; do not plan well; act without considering consequences
of actions; impatient; limited frustration tolerance; poor judgment; take risks; do not profit from experience;
immature, childish, narcissistic, self-centered, and selfish; ostentatious, exhibitionistic; insensitive; interested in
others in terms of how they can be used; likeable and usually create a good first impression; shallow, superficial
relationships, unable to form warm attachments; extroverted, outgoing; talkative, active, energetic, and
spontaneous; intelligent; assert self-confidence; have wide range of interests; lack definite goals; hostile,
aggressive; sarcastic, cynical; resentful, rebellious; act out; antagonistic; aggressive outbursts, assaultive
behavior; little guilt over negative behavior; may feign guilt and remorse when in trouble; free from disabling
anxiety, depression, and psychotic symptoms; likely to have personality disorder diagnosis (antisocial or
passive-aggressive); prone to worry; dissatisfied; show absence of deep emotional response; feel bored and
empty; poor prognosis for change in therapy; blame others for problems; intellectualize; may agree to treatment
to avoid jail or some other unpleasant experience but are likely to terminate before change is effected.

Scale 5 (Masculinity–femininity)
MALES
Scores of T over 80: Show conflicts about sexual identity; insecure in masculine role; effeminate; aesthetic and
artistic interests; intelligent and capable; value cognitive pursuits; ambitious, competitive, and persevering;
clever, clear-thinking, organized, logical; show good judgment and common sense; curious; creative,
imaginative, and individualistic in approach to problems; sociable; sensitive to others; tolerant; capable of
expressing warm feelings toward others; passive, dependent, and submissive; peace-loving; make concessions to
avoid confrontations; good self-control; rarely act out. (The interpretation of high 5 scores should be tempered
for males with advanced academic degrees.)

High T score between 70 and 79: May be viewed as sensitive; insightful; tolerant; effeminate; showing broad
cultural interests; submissive, passive. (In clinical settings, the patient might show sex role confusion; or
heterosexual adjustment problems.)

Low T < 35: "Macho" self-image, present self as extremely masculine; overemphasize strength and physical
prowess; aggressive, thrill-seeking, adventurous, and reckless; coarse, crude, and vulgar; harbor doubts about
own masculinity; have limited intellectual ability; narrow range of interests; inflexible and unoriginal approach
to problems; prefer action to thought; are practical and nontheoretical; easy-going, leisurely and relaxed;
cheerful, jolly, humorous; contented; willing to settle down; unaware of social stimulus value; lack insight into
own motives; unsophisticated.

FEMALES
Scores of T over 70: Reject traditional female roles and activities; masculine interests in work, sports, hobbies;
active, vigorous, and assertive; competitive, aggressive, and dominating; coarse, rough, and tough; outgoing,
uninhibited, and self-confident; easy-going, relaxed, balanced; logical, calculated; unemotional, and unfriendly.

Low T < 35: Describe self in terms of stereotyped female role; doubts about own femininity; passive, submissive,
and yielding; defer to males in decision-making; self-pity; complaining, fault finding; constricted; sensitive;
modest; idealistic. (This interpretation for low 5 females does not apply for females with postgraduate degrees.)

Scale 6 (Paranoia)
Extremely high-scoring people (T > 80) show: frankly psychotic behavior; disturbed thinking; delusions of
persecution and/or grandeur; ideas of reference; feel mistreated and picked on; angry and resentful; harbor
grudges; use projection as defense; most frequently diagnosed as schizophrenia or paranoid state.

Moderate elevations (T = 65–79 for males; 71–79 for females): Paranoid predisposition; sensitive; overly
responsive to reactions of others; feel they are getting a raw deal from life; rationalize and blame others;
suspicious and guarded; hostile, resentful, and argumentative; moralistic and rigid; overemphasizes rationality;
poor prognosis for therapy; do not like to talk about emotional problems; difficulty in establishing rapport with
therapist.

Extremely low (T < 35): should be interpreted with caution. In a clinical setting, low 6 scores, in the context of a
defensive response set, may suggest frankly psychotic disorder; delusions, suspiciousness, ideas of reference;
symptoms less obvious than for high scorers; evasive, defensive, guarded; shy, secretive, withdrawn.

Scale 7 (Psychasthenia)
High-scoring people show: anxious, tense, and agitated; high discomfort; worried and apprehensive; high strung
and jumpy; difficulties in concentrating; introspective, ruminative; obsessive, and compulsive; feel insecure and
inferior; lack self-confidence; self-doubting, self-critical, self-conscious, and self-derogatory; rigid and
moralistic; maintain high standards for self and others; overly perfectionistic and conscientious; guilty and
depressed; neat, orderly, organized, and meticulous; persistent; reliable; lack ingenuity and originality in
problem solving; dull and formal; vacillate; are indecisive; distort importance of problems, overreact; shy; do
not interact well socially; hard to get to know; worry about popularity and acceptance; sensitive; physical
complaints; show some insight into problems; intellectualize and rationalize; resistant to interpretations in
therapy; express hostility toward therapist; remain in therapy longer than most patients; make slow but steady
progress in therapy.

Scale 8 (Schizophrenia)
Very high scorers (T over 80–90) show: blatantly psychotic behavior; confused, disorganized, and disoriented;
unusual thoughts or attitudes; delusions; hallucinations; poor judgment.

High (T = 65–79): schizoid lifestyle; do not feel a part of social environment; feel isolated, alienated, and
misunderstood; feel unaccepted by peers; withdrawn, seclusive, secretive, and inaccessible; avoid dealing with
people and new situations; shy, aloof, and uninvolved; experience generalized anxiety; resentful, hostile, and
aggressive; unable to express feelings; react to stress by withdrawing into fantasy and daydreaming; difficulty
separating reality and fantasy; self-doubts; feel inferior, incompetent, and dissatisfied; sexual preoccupation,
and sex role confusion; nonconforming, unusual, unconventional, and eccentric; vague, long-standing physical
complaints; stubborn, moody, and opinionated; immature, and impulsive; highly-strung; imaginative; abstract,
vague goals; lack basic information for problem-solving; poor prognosis for therapy; reluctant to relate in
meaningful way to therapist; stay in therapy longer than most patients; may eventually come to trust therapist.

Scale 9 (Hypomania)
High-scoring people (T > 80) show: overactivity; accelerated speech; may have hallucinations or delusions of
grandeur; energetic and talkative; prefer action to thought; wide range of interest; do not utilize energy wisely;
do not see projects through to completion; creative, enterprising, and ingenious; little interest in routine or
detail; easily bored and restless; low frustration tolerance; difficulty in inhibiting expression of impulses;
episodes of irritability, hostility, and aggressive outbursts; unrealistic, unqualified optimism; grandiose
aspirations; exaggerates self-worth and self-importance; unable to see own limitations; outgoing, sociable, and
gregarious; like to be around other people; create good first impression; friendly, pleasant, and enthusiastic;
poised, self-confident; superficial relationships; manipulative, deceptive, unreliable; feelings of dissatisfaction;
agitated; may have periodic episodes of depression; difficulties at school or work, resistant to interpretations in
therapy; attend therapy irregularly; may terminate therapy prematurely; repeat problems in stereotyped
manner; not likely to become dependent on therapists; becomes hostile and aggressive toward therapist.

Moderately elevated scores (T > 65 and ≤ 79): Over-activity; exaggerated sense of self-worth; energetic and
talkative; prefer action to thought; wide range of interest; do not utilize energy wisely; do not see projects
through to completion; enterprising, and ingenious; lack interest in routine matters; become bored and restless
easily; low frustration tolerance; impulsive; has episodes of irritability, hostility, and aggressive outbursts;
unrealistic, overly optimistic at times; shows some grandiose aspirations; unable to see own limitations;
outgoing, sociable, and gregarious; like to be around other people; create good first impression; friendly,
pleasant, and enthusiastic; poised, self-confident; superficial relationships; manipulative, deceptive, unreliable;
feelings of dissatisfaction; agitated; view therapy as unnecessary; resistant to interpretations in therapy; attend
therapy irregularly; may terminate therapy prematurely; repeat problems in stereotyped manner; not likely to
become dependent on therapists; become hostile and aggressive toward therapist.

Low scorers (T below 35): Low energy level; low activity level; lethargic, listless, apathetic, and phlegmatic;
difficult to motivate; report chronic fatigue, physical exhaustion; depressed, anxious, and tense; reliable,
responsible, and dependable; approach problems in conventional, practical, and reasonable way; lack self-
confidence; sincere, quiet, modest, withdrawn, seclusive; unpopular; overcontrolled; unlikely to express feelings
openly.

Scale 10 (Social introversion)
High-scoring people (T > 65) show: social introversion; more comfortable alone or with a few close friends;
reserved, shy, and retiring; uncomfortable around members of opposite sex; hard to get to know; sensitive to
what others think; troubled by lack of involvement with other people; overcontrolled; not likely to display
feelings openly; submissive and compliant; overly accepting of authority; serious, slow personal tempo; reliable,
dependable; cautious, conventional, unoriginal in approach to problems; rigid, inflexible in attitudes and
opinions; difficulty making even minor decisions; enjoys work; gain pleasure from productive personal
achievement; tend to worry; are irritable and anxious; moody, experience guilt feelings; have episodes of
depression or low mood.

Low (T < 45): sociable and extroverted; outgoing, gregarious, friendly and talkative; strong need to be around
other people; mix well; intelligent, expressive, verbally fluent; active, energetic, vigorous; interested in status,
power and recognition; seek out competitive situations; have problems with impulse control; act without
considering the consequences of actions; immature, self-indulgent; superficial, insincere relationships;
manipulative, opportunistic; arouse resentment and hostility in others.

Adapted from Butcher (1989).
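
Table 2 is keyed to T-score elevations, and the text refers to configurations of elevated scales as code types. A minimal sketch of both ideas follows, assuming a linear T score (mean 50, standard deviation 10 in the normative sample) and a hypothetical rule that names a code type by the highest scales at or above T = 65. The normative mean and standard deviation, the cutoff, and the function names are illustrative assumptions, not values taken from the MMPI-2 norms.

```python
from typing import Dict, List


def linear_t(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to a linear T score (mean 50, SD 10 in the norm group)."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def code_type(t_scores: Dict[str, float],
              cutoff: float = 65.0,
              points: int = 2) -> List[str]:
    """Return the labels of the highest scales at or above the cutoff.

    The cutoff and the number of points are assumptions for illustration;
    ties keep the order in which the scales were supplied.
    """
    elevated = [(t, scale) for scale, t in t_scores.items() if t >= cutoff]
    elevated.sort(key=lambda pair: -pair[0])
    return [scale for _, scale in elevated[:points]]


# Hypothetical raw score of 30 on a scale whose norm group has mean 20, SD 5.
print(linear_t(30, 20, 5))

# Hypothetical profile of T scores keyed by clinical scale number.
profile = {"1": 55, "2": 78, "4": 60, "7": 71, "8": 66}
print(code_type(profile))
```

Once a code type is identified, the cookbook approach described earlier looks up the descriptors documented for that configuration rather than relying on clinical impression.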

In addition, prior to publication of the MMPI-2 there were a number of validity studies
conducted on the revised form. For example, the MMPI revision committee collected personality
ratings on more than 800 couples included in the normative sample. These personality ratings
clearly cross-validated a number of the original scales. Moreover, validation research was
conducted on a number of samples including schizophrenics and depressives (Ben-Porath,
Butcher, & Graham, 1991); marital problem families (Hjemboe & Butcher, 1991); potential
child-abusing parents (Egeland, Erickson, Butcher, & Ben-Porath, 1991); alcoholics (Weed,
Butcher, Ben-Porath, & McKenna, 1992); airline pilot applicants (Butcher, 1994); and military
personnel (Butcher, Jeffrey et al., 1990).
Since the MMPI-2 was published in 1989, a number of other validation studies have been
published (Archer, Griffin, & Aiduk, 1995; Ben-Porath, McCully, & Almagor, 1993; Blake et al.,
1992; Clark, 1996; Husband & Iguchi, 1995; Keller & Butcher, 1991; Khan, Welch, & Zillmer,
1993).

4.14.2.2 Basic Personality Inventory

The BPI (Jackson, 1989) was published as an alternative to the MMPI-2 for the global
assessment of psychopathology. The key aims in developing the BPI were to produce a broad-band
measure of psychological dysfunctioning as measured by the MMPI that was: (i) relatively
short, (ii) incorporated modern principles of test construction, and (iii) showed

Table 3 Description of the MMPI-2 content scales.

1. Anxiety (ANX)
High scorers report general symptoms of anxiety including tension, somatic problems (e.g., heart pounding and
shortness of breath), sleep difficulties, worries, and poor concentration. They fear losing their minds, find life a
strain, and have difficulties making decisions. They appear to be readily aware of these symptoms and problems,
and are willing to admit to them.
2. Fears (FRS)
A high score indicates an individual with many specific fears. These specific fears can include blood; high places;
money; animals such as snakes, mice, or spiders; leaving home; fire; storms and natural disasters; water; the
dark; being indoors; and dirt.
3. Obsessiveness (OBS)
High scorers have tremendous difficulties making decisions and are likely to ruminate excessively about issues
and problems, causing others to become impatient. Having to make changes distresses them, and they may
report some compulsive behaviors such as counting or saving unimportant things. They are excessive worriers
who frequently become overwhelmed by their own thoughts.
4. Depression (DEP)
High scorers on this scale show significant depression. They report feeling blue, uncertain about their future,
and uninterested in their lives. They are likely to brood, be unhappy, cry easily, and feel hopeless and empty.
They may report thoughts of suicide or wishes that they were dead. They may believe that they are condemned
or have committed unpardonable sins. Other people may not be viewed as a source of support.
5. Health concerns (HEA)
Individuals with high scores report many physical symptoms across several body systems. Included are gastro-
intestinal symptoms (e.g., constipation, nausea and vomiting, stomach trouble), neurological problems (e.g.,
convulsions, dizzy and fainting spells, paralysis), sensory problems (e.g., poor hearing or eyesight),
cardiovascular symptoms (e.g., heart or chest pains), skin problems, pain (e.g., headaches, neck pains),
respiratory troubles (e.g., coughs, hay fever, or asthma). These individuals worry about their health and feel
sicker than the average person.
6. Bizarre mentation (BIZ)
Psychotic thought processes characterize individuals high on the BIZ scale. They may report auditory, visual, or
olfactory hallucinations and may recognize that their thoughts are strange and peculiar. Paranoid ideation (e.g.,
the belief that they are being plotted against or that someone is trying to poison them) may be reported as well.
These individuals may feel that they have a special mission or powers.
7. Anger (ANG)
High scorers tend to have anger control problems. These individuals report being irritable, grouchy, impatient,
hotheaded, annoyed, and stubborn. They sometimes feel like swearing or smashing things. They may lose self-
control and report having been physically abusive towards people and objects.
8. Cynicism (CYN)
High scorers tend to show misanthropic beliefs. They expect hidden, negative motives behind the acts of others,
for example, believing that most people are honest simply for fear of being caught. Other people are to be
distrusted, for people use each other and are only friendly for selfish reasons. They likely hold negative attitudes
about those close to them, including fellow workers, family, and friends.
9. Antisocial practices (ASP)
High scorers tend to show misanthropic attitudes like high scorers on the CYN scale. The high scorers on the
ASP scale report problem behaviors during their school years and other antisocial practices like being in trouble
with the law, stealing or shoplifting. They report sometimes enjoying the antics of criminals and believe that it is
all right to get around the law, as long as it is not broken.
10. Type A (TPA)
High scorers report being hard-driving, fast-moving, and work-oriented individuals, who frequently become
impatient, irritable, and annoyed. They do not like to wait or be interrupted. There is never enough time in a day
for them to complete their tasks. They are direct and may be overbearing in their relationships with others.
11. Low self-esteem (LSE)
High scores on LSE characterize individuals with low opinions of themselves. They do not believe that they are
liked by others or that they are important. They hold many negative attitudes about themselves including beliefs
that they are unattractive, awkward, and clumsy, useless, and a burden to others. They certainly lack self-
confidence, and find it hard to accept compliments from others. They may be overwhelmed by all the faults they
see in themselves.
Table 3 (continued)

12. Social discomfort (SOD)
High scorers tend to be very uneasy around others, preferring to be by themselves. When in social situations,
they are likely to sit alone, rather than joining in the group. They see themselves as shy and dislike parties and
other group events.
13. Family problems (FAM)
High scorers tend to show considerable family discord. Their families are described as lacking in love,
quarrelsome, and unpleasant. They even may report hating members of their families. Their childhood may be
portrayed as abusive, and marriages seen as unhappy and lacking in affection.
14. Work interference (WRK)
High scorers tend to show behaviors or attitudes that are likely to contribute to poor work performance. Some
of the problems relate to low self-confidence, concentration difficulties, obsessiveness, tension and pressure, and
decision-making problems. Others suggest lack of family support for the career choice, personal questioning of
career choice, and negative attitudes towards co-workers.
15. Negative treatment indicators (TRT)
High scorers tend to show negative attitudes towards doctors and mental health treatment. High scorers do not
believe that anyone can understand or help them. They have issues or problems that they are not comfortable
discussing with anyone. They may not want to change anything in their lives, nor do they feel that change is
possible. They prefer giving up, rather than facing a crisis or difficulty.

Adapted from: Butcher, Graham, Williams, & Ben-Porath (1990).

empirical evidence of being able to discriminate between normal and dysfunctional persons as well as
being able to predict pathological behavior.

The BPI is made up of 240 items, grouped into 12 × 20-item scales. Neurotic tendencies are measured
through the scales of Hypochondriasis, Depression, Anxiety, Social Introversion, and
Self-depreciation. Aspects of sociopathy are measured by Denial, Interpersonal Problems, Alienation,
and Impulse Expression scales. Psychotic behavior is assessed by scales labeled Persecutory Ideas,
Thinking Disorder, and to some degree, by the Deviation scale. The Deviation scale comprises 20
critical items that are intended to serve as the basis for further clinical follow up. In contrast,
the definitions of the constructs reflected in the other 11 scales on the BPI are based on the
results of a multivariate analysis of the content underlying the MMPI and the Differential
Personality Inventory (Jackson & Messick, 1971).

The strong internal psychometric properties of the BPI attest to its careful construction (Jackson,
1989). Item properties and item factor analyses support the internal structure of the instrument.
Both internal consistencies and test–retest reliability estimates fall in the 0.70 to 0.80 range.
Various validity studies (some published in the manual, others in the literature) show that the BPI
can indeed discriminate between normal and non-normal (e.g., delinquent) persons and can, within
psychiatric populations, predict a variety of clinical criteria. Norms exist for adolescents and
adults, spanning a variety of sample types such as community, psychiatric, college, and forensic.
However, these norms are almost entirely based on white populations. Moreover, most of the normative
sample was collected using nonstandard data collection procedures. For example, booklets were mailed
to subjects instead of being administered under standard conditions. In addition, each of the
subjects in the normative sample responded to one-third of the items in the booklet, an artifact
that makes it difficult to perform some analyses on the normative sample (e.g., alpha coefficients).

Among the difficulties that have been associated with the BPI is a lack of validity scales for
identifying invalid response protocols. The one content scale that would logically appear to have
some bearing on the issue of protocol validity is the Denial scale, which is described in the manual
as a measure of lack of insight and lack of normal affect. Unfortunately, Denial appears to be a
relatively weak scale in terms of its reliability and validity in various empirical studies (Holden
et al., 1988; Jackson, 1989). Two other complaints that have been voiced are a lack of an
established link to diagnostic categories and a lack of work on profile interpretation.

Overall, the BPI has shown psychometric potential as a general measure of psychopathology. The basic
developmental work on it is sound. What the BPI now needs are more extensive norms along with
further work on its clinical applicability.

4.14.2.3 Personality Assessment Inventory

The PAI (Morey, 1991) is another inventory of general psychopathology. The PAI is a 344
item self-report measure designed to screen for approximately the same pathological domains as the
MMPI/MMPI-2. It is used to collect information related to diagnosis, and to provide input on
treatment planning.

The PAI includes four validity scales, 11 clinical scales, five treatment scales and two
interpersonal scales. The clinical scales are Somatic Complaints, Anxiety, Anxiety-related
Disorders, Depression, Mania, Paranoia, Schizophrenia, Borderline Features, Antisocial Features,
Alcohol Problems, and Drug Problems. Treatment scales are Aggression, Suicidal Ideation, Stress,
Nonsupport, and Treatment Rejection. Interpersonal scales are Dominance and Warmth. Of these scales,
10 are further divided into subscales that are intended to measure distinct constructs. A total of
27 items are designated critical items which, according to the author, should be followed up. This
does seem to be a relatively large number of scales to interpret meaningfully in view of the total
number of items.

Evidence for the reliability of the 22 scales is generally good. Across normative, clinical and
college samples, median alpha coefficients are all in the 0.80–0.90 range. One-month test–retest
coefficients are reported to be in the 0.80s in the manual and in the 0.70s in the literature (Boyle
& Lennon, 1994). Interestingly, some of the lowest reliabilities are found for the validity scales,
a phenomenon that has been found for the validity scales on other instruments such as the MMPI
(e.g., Fekken & Holden, 1991) and may well be a function of range restriction in the scores on such
scales.

Normative PAI data for the USA are extensive. Standardization samples for a census-matched group, a
clinical sample representing 69 sites, and a college sample drawn from seven different US
universities each number over 1000 respondents. Normative data in the manual are also reported
separately by age, education, gender, and race. Normative information for other countries (Canada,
Australia, the UK, etc.) would be desirable.

In view of the recency of the publication of the PAI, there are only a handful of studies that bear
on its validity. The manual reports evidence of the concurrent validity of the PAI in the form of
correlations with other measures of psychopathology. A number of other studies show that the PAI
can discriminate between diagnostic groups. Boyle and Lennon (1994) showed that the PAI can
distinguish normals, alcoholics, and schizophrenics. Schinka (1995) was further able to use the PAI
to develop a typology for alcoholics that had validity with several external variables. Finally,
Alterman and colleagues (1995) demonstrated that methadone maintenance patients scored differently
from the normative populations.

Generally, preliminary data suggest that the PAI is a well-constructed and brief measure of
psychopathology. More research on its clinical validity needs to be completed before it can be
considered to be a useful measure of psychopathology in clinical settings. There are no data to
support the PAI's use in lieu of the MMPI-2, which has a more substantial empirical database.

4.14.3 SPECIALIZED OR FOCUSED CLINICAL ASSESSMENT MEASURES

In this section we will address several other measures that have been developed for clinical
assessment and research to assess specific or more narrowly focused characteristics rather than
omnibus instruments such as the MMPI-2. Several of these instruments will be examined to illustrate
their application and potential utility for evaluating specific problems in clinical settings.

4.14.3.1 Millon Clinical Multiaxial Inventory

The MCMI (Millon, 1977, 1987, 1994) was developed by Theodore Millon for making clinical diagnoses
on patients. The MCMI was intended to improve upon the long-established MMPI. In contrast to the
MMPI/MMPI-2, the MCMI was designed with fewer items; is based on an elaborate theory of personality
and psychopathology; and explicitly focuses on diagnostic links to criteria from the Diagnostic and
statistical manual of mental disorders (DSM).

The MCMI was developed rationally rather than empirically. Millon has stated in his three test
manuals (1977, 1987, 1994) as well as elsewhere (Millon & Davis, 1995) that development of the MCMI
is to be an ongoing process. To keep the MCMI maximally useful for clinical diagnosis and
interpretation, it must be continually updated in view of theoretical refinements, empirical
validation studies, and evolutions in the official DSM classification systems. Most updated test
manuals leave the test user with the impression that the developer considered test revision a
necessary evil. Very rarely do you see continuous improvement as a test developer's goal, in part
because this ever-changing process makes the accumulation of a solid research base difficult.

All three MCMI versions comprise 175 true/false items. However, across versions, the exact test
items have evolved through revision or replacement. The number of scales and validity
indices that can be calculated from these items has also increased. The original version, the
MCMI-I, had 20 clinical scales and two validity scales. The MCMI-II yielded 22 clinical scales and
three validity scales. The current MCMI-III has 24 clinical scales, three modifying indices and a
validity index. Many items appear on several scales making for great item overlap.

On the MCMI-III, 14 clinical scales assess personality patterns that relate to DSM-IV Axis II
disorders. Another 10 scales measure clinical syndromes related to DSM-IV Axis I disorders. The
modifying indices, Disclosure, Desirability, and Debasement, are correction factors applied to
clinical scale scores to ameliorate respondents' tendencies to distort their responses. The validity
index comprises four bizarre or highly improbable items meant to detect careless, random, or
confused responding.

The relationship between scales and items is explicated in detail in the manual. Millon started with
a theory-based approach to writing items, followed by an evaluation of the internal structure of the
items, and finally engaged in an assessment of the diagnostic efficiency of each item for
distinguishing among diagnostic groups before final placement of an item on a scale. Millon departed
from usual psychometric practice in a way that results in some unfortunate complications. Item
overlap across scales is permitted: on average, items appear on three different scales with
differential weights. This makes scoring inordinately complex, which makes assessment of scale
homogeneity complex, and in turn this makes evaluation of the empirical structure underlying the
scales complex. There are technical solutions for these problems but the result is that the MCMI is
not an easy instrument with which to work.

Despite its psychometric drawbacks, the theory underlying the MCMI is generally agreed to be elegant
and a substantial asset (McCabe, 1987; Reynolds, 1992). Each dimension measured by the test has a
clear conceptual link to Millon's theory of psychopathology. Such a theory allows for the generation
of clinical inferences based on a small number of fundamental principles (Millon & Davis, 1995). Not
only do these inferences guide measurement, but also they enhance understanding of the constructs,
bear on practical treatment decision, and produce research hypotheses.

One of the stated goals of the MCMI is to place patients into target diagnostic groups. To this end,
the MCMI scales are directly coordinated with the DSM diagnostic categories. How well does the MCMI
live up to its aim? The manuals report good evidence of diagnostic efficiency. However, the recent
literature (available on the MCMI-I and MCMI-II) suggests the following three generalizations.
First, the MCMI has only modest accuracy for assigning patients to diagnostic groups across a
variety of clinical criteria (e.g., Chick, Martin, Nevels, & Cotton, 1994; Chick, Sheaffer, Goggin,
& Sison, 1993; Flynn, 1995; Hills, 1995; Inch & Crossley, 1993; Patrick, 1993; Soldz, Budman, Demby,
& Merry, 1993). Second, the MCMI may be better at predicting the absence than the presence of a
disorder (Chick et al., 1993; Hills, 1995; Soldz et al., 1993). Third, the MCMI may be better at
predicting some types of disorders than others but there is little agreement on which ones (Inch &
Crossley, 1993; Soldz et al., 1993).

One source of the difficulty may be the base rate scores. Raw scores on scales are weighted and
converted to base rate scores. The base rate scores reflect the prevalence of a particular
personality disorder or pathological characteristic in the overall population. Their use is intended
to maximize the number of correct classifications relative to the number of incorrect
classifications when using the MCMI to make diagnoses (Millon & Davis, 1995). If the estimated base
rates for the various diagnostic categories are poor, then the predictive accuracy of the MCMI can
be expected to be poor (Reynolds, 1992).

One negative consequence of the type of norms used in the development of the MCMI inventories is
that they do not discriminate between patients and normals. Use of the MCMI assumes that the subject
is a psychiatric patient. Consequently, the MCMI overpathologizes individuals who are not actually
patients. The MCMI should not be used where issues of normality need to be addressed. For example,
if the test were used in family custody evaluations or personnel screening, the test interpretation
would appear very pathological; it cannot do otherwise.

How does the MCMI compare to the MMPI? The MCMI publisher appears to emphasize that the MCMI and
MMPI-2 measure different characteristics and the MCMI is shorter to administer to patients. Whereas
the MMPI measures a broad range of psychopathology, the MCMI has its premier focus on the assessment
of personality disorders. Consonant with its rational construction, the elaborate theoretical
underpinnings of the MCMI are impressive. In contrast, however, the test literature that supports
the validity of the MMPI/MMPI-2 is not available for the MCMI. Validation research on it has not
proceeded at a very high pace. Whether the MCMI will have either the clinical utility or the
heuristic value that the MMPI enjoys remains unanswered until more clinical research, and perhaps
more refinements, are undertaken with the MCMI.
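
The dependence of diagnostic accuracy on base rates, discussed above for the MCMI, can be made
concrete with Bayes' rule. The following is a minimal sketch in Python and is not part of any MCMI
scoring procedure: the function name predictive_values and the sensitivity, specificity, and base
rate figures are hypothetical values chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a diagnostic cut-off via Bayes' rule.

    sensitivity: P(test positive | disorder present)
    specificity: P(test negative | disorder absent)
    base_rate:   P(disorder present) in the population being tested
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)   # P(disorder | positive)
    npv = true_neg / (true_neg + false_neg)   # P(no disorder | negative)
    return ppv, npv


# Hypothetical operating characteristics, NOT taken from the MCMI manuals.
for base_rate in (0.05, 0.30):
    ppv, npv = predictive_values(0.80, 0.80, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these assumed figures, the positive predictive value is about 0.17 when the base rate is 5% and
about 0.63 when it is 30%, while the negative predictive value stays at 0.90 or above. This is one
way to see why a scale may predict the absence of a disorder better than its presence, and why a
test calibrated on patients would be expected to yield many false positives in a low base rate
group such as job applicants.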
4.14.3.2 The Beck Depression Inventory

The BDI was first introduced in 1961, and it has been revised several times since (Beck et al.,
1988). The BDI has been widely used as an assessment instrument in gauging the intensity of
depression in patients who meet clinical diagnostic criteria for depressive syndromes. However, the
BDI has also found a place in research with normal populations, where the focus of use has been on
detecting depression or depressive ideation.

The BDI was developed in a manner similar to the MMPI: clinical observations of symptoms and
attitudes among depressed patients were contrasted to those among nondepressed patients in order to
obtain differentiation of the depressed group from the rest of the psychiatric patients. The 21
symptoms and attitudes contained in the BDI reflect the intensity of the depression; items receive a
rating of zero to three to reflect their intensity and are summed linearly to create a score which
ranges from 0 to 63. The 21 items included reflect a variety of symptoms and attitudes commonly
found among clinically depressed individuals (e.g., Mood, Self-dislike, Social Withdrawal, Sleep
Disturbance). The BDI administration is straightforward, and it can be given as an interview by the
clinician or as a self-report instrument (requiring a fifth or sixth grade reading level).

The BDI is interpreted through the use of cut-off scores. Cut-off scores may be derived based on the
use of the instrument (i.e., if a clinician wishes to identify very severe depression, then the
cut-off score would be set high). According to Beck et al. (1988), the Center for Cognitive Therapy
has set the following guidelines for BDI cut-off scores to be used with affective disorder patients:
scores from 0 through 9 indicate no or minimal depression; scores from 10 through 18 indicate mild
to moderate depression; scores from 19 through 29 indicate moderate to severe depression; and scores
from 30 through 63 indicate severe depression.

Two important issues must be considered by clinicians regarding the results of the BDI. Unlike the
MMPI/MMPI-2 and other major self-report instruments, the BDI has no safeguards against faking,
lying, or variable response sets. Thus, clinicians are warned against this drawback of the BDI in
assessing depressive thoughts and symptoms. In settings where faking or defensiveness are probable
threats to the validity of the test, clinicians may need to reconsider their use of the BDI. The
other issue pertains to the state–trait debate in assessment. The BDI is extremely sensitive to
differences in the instructions given to an examinee such that certain instructions yield a
state-like index of depressive thinking, whereas other instructions yield a more trait-like index of
depressive thinking. Again, clinicians are encouraged to use caution when administering the BDI and
to tailor the administration instructions to the type of index (state or trait) that is desired.

In their review of the psychometric properties of the BDI, Beck et al. (1988) reported high internal
consistency reliability of the instrument among both psychiatric and nonpsychiatric populations. The
authors also reported that the BDI closely parallels the changes in both patient self-report and
clinicians' ratings of depression (i.e., the BDI score accurately reflects changes in depressive
thinking). Finally, they also presented evidence for the content, concurrent, discriminant,
construct, and factorial validity of the BDI.

The acceptable reliability and validity of the BDI have helped make it a widely used objective index
of depressive thinking among clinicians. Perhaps the most obvious use of the BDI is as an index of
change in the level or intensity of depression. With an increasing focus on managed healthcare and
accountability by psychotherapeutic service providers, the BDI offers a reliable and valid index of
depressive symptoms and attitudes which can be used effectively to document changes brought about in
therapy.

4.14.3.3 The State-Trait Anxiety Inventory

The STAI was developed by Spielberger et al. (1970) to measure anxiety from the perspective of
states vs. traits. The state measurement assesses how the individual feels "right now" or at this
moment. Subjects are asked to rate the intensity of their anxious feelings on a four-point scale as
to their experience of feelings in terms of: not at all, somewhat, moderately so, or very much so.
The trait anxiety measure addresses how the individuals generally feel by rating themselves on a
four-point scale: almost never, sometimes, often, or almost always.

Since it was developed in 1966 the STAI has been translated into over 48 different languages and has
been widely researched in a variety of clinical and school settings (Spielberger, Ritterband,
Sydeman, Reheiser, & Unger, 1995). The evidence for construct validity of the STAI comes from a
variety of sources, for example, correlations with other anxiety measures (Spielberger, 1977),
clinical settings (Spielberger, 1983), and medical and surgical patients (Spielberger, 1976).
Nonetheless, there has been general debate in the literature about the conceptual and practical
benefits of the trait vs. state distinction.
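
As a worked illustration of the BDI scoring and cut-off guidelines described in Section 4.14.3.2,
the sketch below sums 21 item ratings and maps the total onto the Center for Cognitive Therapy
bands reported by Beck et al. (1988). The names score_bdi, interpret_bdi, and CUTOFF_BANDS are
hypothetical, and the code is an illustrative sketch rather than an official scoring routine; as
noted in that section, cut-offs may be set differently depending on the purpose of the assessment.

```python
# Bands for affective disorder patients as summarized in the text
# (Center for Cognitive Therapy guidelines, per Beck et al., 1988).
CUTOFF_BANDS = [
    (0, 9, "no or minimal depression"),
    (10, 18, "mild to moderate depression"),
    (19, 29, "moderate to severe depression"),
    (30, 63, "severe depression"),
]


def score_bdi(item_ratings):
    """Sum 21 item ratings, each an integer from 0 to 3 (total range 0 to 63)."""
    if len(item_ratings) != 21:
        raise ValueError("the BDI has 21 items")
    if any(rating not in (0, 1, 2, 3) for rating in item_ratings):
        raise ValueError("each item is rated 0, 1, 2, or 3")
    return sum(item_ratings)


def interpret_bdi(total):
    """Map a total score onto the descriptive band that contains it."""
    for low, high, label in CUTOFF_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total must lie between 0 and 63")


# Hypothetical protocol: twelve items rated 1 and nine items rated 0.
ratings = [1] * 12 + [0] * 9
total = score_bdi(ratings)
print(total, interpret_bdi(total))
```

Because the BDI has no validity scales, a routine of this kind can only check that the ratings are
well formed; it cannot detect faking or careless responding.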
The STAI has become a very widely used measure in personality and psychopathology research in the
USA and in other countries. However, the state-trait inventories have not been as broadly used as
clinical assessment instruments.

4.14.3.4 Whitaker Index of Schizophrenic Thinking

The WIST (Whitaker, 1973, 1980) was developed to measure the type of thought impairment that
differentiates between schizophrenic and "normal" thought processes. The WIST is intended to be
individually administered as a screening tool or as one part of a battery of tests. Its multiple
choice format makes the WIST a test that is easy to give and to score.

On the WIST, schizophrenic thought is defined as a discrepancy between actual and potential
performance on cognitive reasoning tasks. This impaired thinking has three components: (i) a degree
of illogicality, as reflected in inappropriate associations and false premises; (ii) a degree of
impairment relative to previous performance, as reflected in slowness; and (iii) a degree of
unwittingness, as reflected in unawareness of the incorrectness of responses. Whitaker's definition
of schizophrenic thought is carefully explicated in the manual, making it possible for the test user
to understand exactly what the WIST is measuring. Whitaker, however, has been criticized for basing
his definition on a narrow reading of the literature on schizophrenic thought disorder (Payne,
1978).

The WIST itself is made up of 25 multiple choice items that are divided into three subtests:
Similarities, Word Pairs, and New Inventions. Each item consists of a stimulus and five response
options that differ in degree of illogicality. To illustrate, consider this sample item for the
Similarities subtest: car: automobile, tires, my transportation, jar, smickle. The correct answer
receives a score of 0; a loose association, reference idea, clang association, and nonsense
association would receive scores of 1, 2, 3, and 4, respectively. The test administrator presents
for a second time any items that the respondent answered incorrectly on the initial test taking.

Three scores are calculated for the WIST: total Score for all response alternatives selected either
in the original or in the second enquiry phase; total Time for the initial completion of the WIST;
and an overall Index that combines WIST Score and WIST Time. Presumably the WIST Score and Time
components are added together because they both relate to schizophrenic thought disorder. However,
a review of the empirical data supporting this conceptualization finds mixed results at best
(Fekken, 1985). One view is that the predictive efficiency of the WIST is not enhanced by including
the Time component.

There are two parallel forms of the WIST which differ in content. Items on Form A are intended to be
stressful or anxiety provoking. They assess oral-dependency, hostility, and manifestly sexual
content. The content of Form B is neutral. There has been little formal evaluation of the
comparability of the two forms. Although some studies have reported similar rates of diagnostic
efficiency for the two forms (Evans & Dinning, 1980; Leslie, Landmark, & Whitaker, 1984), overall
validity evidence would suggest that Form A is stronger than Form B. Additional work to clarify the
comparability of Forms A and B would have implications for selecting which form to use and for using
these alternate WIST forms to assess changes in symptomatology.

There is relatively little information on reliability available on the WIST. The manual does report
intratest reliabilities of the two WIST forms as Hoyt's reliability coefficients of around 0.80.
Test–retest reliability data, either on the WIST subscores or subscales, are not provided in the
manual nor do they appear to be readily available in the literature. Similarly, alternate-forms
reliability estimates, calculated as correlations between subscales or subscores, are not readily
available.

There is reasonable evidence for the convergent validity of the WIST. Two reviews report that WIST
scores tend to have a 60–70% agreement with systems for diagnosing schizophrenia (Fekken, 1985;
Grigoriadis, 1993). The WIST has been empirically associated with other indices of schizophrenia
such as the MMPI/MMPI-2 Sc scale (Evans & Dinning, 1980; Fishkin, Lovallo, & Pishkin, 1977;
Grigoriadis, 1993) although not with other schizophrenia indices including conceptually relevant
SCL-90 scales (Dinning & Evans, 1977) and the New Haven Schizophrenic Index (Knight, Epstein, &
Zielony, 1980).

Discriminant validity of the WIST has been harder to establish. The WIST has difficulty
distinguishing among different psychiatric diagnostic groups (e.g., Burch, 1995; Pishkin, Lovallo, &
Bourne, 1986). This is a serious shortcoming because the WIST is likely to be used in clinical
settings precisely to help make such distinctions. A second problem with discriminant validity has
been the tendency of the WIST to correlate negatively with measures of general cognitive ability.
Based on such data one could share the view of at least one pessimistic reviewer and claim that the
WIST has no demonstrated use (Payne, 1978).
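
The WIST scoring scheme described in this section can be sketched in a few lines of Python. This is
an illustrative sketch only: the response-type labels, the function name wist_scores, the
expression of Time in minutes, and the simple addition of Score and Time to form the Index are
assumptions based on the description in the text, not scoring rules quoted from the manual.

```python
# Item weights as described in the text: the correct answer scores 0, and
# increasingly illogical response types score 1 through 4.
RESPONSE_WEIGHTS = {
    "correct": 0,
    "loose association": 1,
    "reference idea": 2,
    "clang association": 3,
    "nonsense association": 4,
}


def wist_scores(first_pass, second_pass, time_minutes):
    """Return the three WIST scores (Score, Time, Index).

    first_pass:   response type chosen for each of the 25 items
    second_pass:  response types chosen when missed items are re-presented
    time_minutes: time for the initial completion (unit is an assumption)
    """
    if len(first_pass) != 25:
        raise ValueError("the WIST has 25 items")
    missed = sum(1 for response in first_pass if response != "correct")
    if len(second_pass) != missed:
        raise ValueError("only missed items are presented a second time")
    score = sum(RESPONSE_WEIGHTS[r] for r in first_pass)
    score += sum(RESPONSE_WEIGHTS[r] for r in second_pass)
    # Assumption: the Index is the plain sum of Score and Time.
    return {"score": score, "time": time_minutes, "index": score + time_minutes}


# Hypothetical protocol: three errors on the first pass, one repeated error.
first = ["correct"] * 22 + ["loose association", "clang association",
                            "nonsense association"]
second = ["correct", "loose association", "correct"]
print(wist_scores(first, second, time_minutes=10.0))
```

Keeping Score and Time as separate fields alongside the Index reflects the point made above that
the Time component may not add predictive efficiency, so a user may wish to examine Score alone.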
Alternatively, the WIST may have a role in a comprehensive assessment battery. The WIST has never
been promoted as a stand-alone test, nor has it ever been promoted as a comprehensive measure of the
full range of schizophrenic symptomatology. Instead, the data suggest that the WIST may be thought
of as a measure of general cognitive deficit rather than of cognitive deficit specific to
schizophrenia. Thus, it may provide one objective source of data for accepting or rejecting a more
general diagnosis of psychosis.

4.14.4 NORMAL RANGE PERSONALITY ASSESSMENT

Identification and description of psychological disorder is but one reason to administer a
personality measure to an adult. Objective personality measures also are widely used outside the
clinic or hospital setting for normal range assessment situations. Several objective personality
measures will be discussed regarding their use in research, educational, vocational assessment, and
personnel selection.

4.14.4.1 Objective Personality Measures in Research

4.14.4.1.1 The Five Factor Model (the Big Five)

One of the most popular current conceptualizations being studied today involves taxonomies of
personality traits based on factor analytic methods and is commonly referred to as the "Big Five,"
or the Five Factor Model (FFM). As Butcher and Rouse (1996) note in their review, some personality
researchers have rejected the FFM as an end-all to personality trait theory, while others in the
area of personality research continue to embrace it. Among the proponents of the FFM are Costa and
his colleagues, who have proposed the NEO Personality Inventory (NEO-PI) as a self-report measure of
the FFM personality dimensions (Costa & McCrae, 1992). The FFM consists of five factor-analytically
derived dimensions of personality. The five dimensions are Neuroticism (N), Extroversion (E),
Openness to Experience (O), Agreeableness (A), and Conscientiousness (C). Neuroticism has long been
a familiar adjective among clinicians for describing people who tend to experience psychological
distress. At the opposite end of the Neuroticism dimension is Emotional Stability, which represents
the tendency to stay on a psychologically even keel. Extroversion encompasses the concepts of
positive emotionality, sociability, and activity. The Extroversion dimension is rather broad in its
scope, although lay people often use the term narrowly to refer to general "outgoingness." Openness
to Experience represents, broadly, a person's level of constriction in their experiencing of the
world; it is often associated with creativity (and even hypnotizability). Agreeableness represents
the dimension of interpersonal behavior; the counterpart of Agreeableness is Antagonism. Finally,
Conscientiousness represents a dimension of scrupulous organization of behavior.

4.14.4.1.2 NEO Personality Inventory

The NEO-PI is a 181-item inventory designed to index the FFM personality dimensions (N, E, O, A, and
C); the NEO-PI also yields several subscales of the N, E, and O dimensions (Costa & McCrae, 1992).
There is a self-report form of the NEO-PI as well as an observer-rating form. Although the authors
of the NEO-PI argue for its utility in clinical settings, the instrument has been studied almost
exclusively in nonclinical populations. One example of a recent investigation using the NEO-PI in
research with psychiatric samples is the examination of differences in stimulus intensity modulation
among older depressed individuals with and without mania features (Allard & Mishara, 1995). The
NEO-PI was used to examine the hypothesis that stimulus intensity augmenters with unipolar
depression would be introverted, whereas reducers with bipolar depression would be extroverted.
Determinations of depression with and without mania features were based on scores on the MMPI. Costa
and McCrae (1992) point to the NEO-PI as a useful assessment tool in aiding the clinician with
understanding the client, selection of treatment, and even anticipating the course of therapy.
However, the NEO-PI has not found wide use among clinicians to date (Butcher & Rouse, 1996).

Unfortunately, the NEO-PI is very susceptible to faking (Bailley & Ross, 1996) and does not contain
validity indices to detect deviant response sets.

4.14.4.1.3 Multidimensional Personality Questionnaire

The Multidimensional Personality Questionnaire (MPQ) is a 300-item self-report instrument that was
developed by Tellegen (unpublished manuscript) in an attempt to clarify the "self-view domain" in
personality research. Tellegen used a classical iterative test construction approach involving
several rounds of factor analysis to come up with the 11 primary scales, six validity scales, and
three "higher-order"
factors. The 11 primary scales (which include the dimensions of Social Potency, Control, Harm
Avoidance, Well-being, Aggression, and others) load onto the three higher-order scales, which
represent the familiar personality domains of Positive Affectivity, Negative Affectivity, and
Constraint. The six validity indices include VRIN and TRIN, which are conceptually similar to those
on the MMPI-2.

The MPQ is not as widely used as other objective personality measures in either the clinical or the
research domain. However, recent investigations suggest a place for the MPQ in normal range and
clinical personality assessment. For example, Kuhne, Orr, and Baraga (1993) demonstrated the utility
of certain MPQ scales for discriminating among veterans with and without post-traumatic stress
disorder. Krueger et al. (1994) administered the MPQ to adolescents in their community-based
longitudinal study and found that certain MPQ scales were useful in distinguishing those who engaged
in delinquency from those who abstained. These two reports suggest the utility of the MPQ both in
clinical settings and in normal range personality assessment.

4.14.4.1.5 Sixteen Personality Factor Test

The 16PF was originally developed in the 1940s by Raymond Cattell to measure the primary factors of
normal personality. At that time, Cattell's unique contribution was to apply factor analysis as a
method for uncovering the full scope of personality. The fifth and current edition of the 16PF
(Cattell, Cattell, & Cattell, 1993) measures the well-known 16 personality factors, plus it
summarizes these factors into five global factors which again bear similarity to the well-known "Big
Five." The global factors are: extroversion, anxiety, tough-mindedness, independence, and
self-control. Relative to earlier editions, the fifth edition includes updated language; fewer
items; and improved reliability and response style scales to measure impression management,
infrequent responding, and acquiescence. Because of the recency of the publication of the fifth
edition, there are few studies pertaining to its validity. However, a large database supports the
validity of earlier editions of the 16PF. The 16PF has applicability in clinical, educational,
organizational, and research settings. With the expansion of its interpretive reports and profiles,
the 16PF would appear to be particularly useful in personal or vocational counseling settings.

4.14.4.1.4 Personality Research Form
One well-known measure of normal person-
4.14.4.1.6 California Psychological Inventory
ality is the Personality Research Form (PRF).
Developed by Douglas N. Jackson (1984), the The CPI developed by Gough (1957) is a
PRF is a true/false, multiscale measure of 20 of multiscale, objective, self-report instrument
the psychosocial needs (e.g., achievement, used with normal range and psychiatric popula-
aggression, sociability) originally defined by tions (Megargee, 1972). The CPI is similar in
Henry Murray (1938). Many psychometrics structure and content to the MMPI (and the
texts hold up the PRF as a model of the MMPI-2). In fact, many of the items on the CPI
construct approach to test construction. Indeed, are identical in wording to the items on the
much of the appeal of the PRF lies in its ability MMPI. The CPI focuses on ªeverydayº con-
to measure a large number of normal person- cepts about personality, such as dominance and
ality characteristics while minimizing both scale responsibility among others. Eighteen scales
intercorrelations and the influence of social (divided into four classes) are derived from the
desirability and acquiescence. Moreover, a CPI; the results of the test are interpreted by
variety of validity studies have been published reference to the plotted profile of standard
in the 1980s and 1990s supporting the psycho- scores. The body of literature using the CPI in
metric soundness of the PRF. Critics of the PRF normal range assessment is quite extensive in
complain that despite its technical elegance the both quantity and breadth. Investigations
PRF does not reflect an integrated model of include the use of the CPI to identify personality
personality, which limits its applicability to real- types, based on profiles, among a group of
life testing situations. Recent research, however, college students (Burger and Cross, 1979) and
shows that the content assessed by the PRF may an examination of the underlying personality
be well described by the Five Factor structure structure as measured by the CPI, again using a
(Paunomen, Jackson, Trzebinski, & Fosterling, sample of college students (Deniston and
1992). In research the popularity of the PRF Ramanaiah, 1993). The CPI also has been
remains high as attested to by the number of employed in the assessment of psychiatric
references in bibliographies, such as the one samples. Especially noteworthy is the utility
produced by MacLennan in 1991, which lists of the CPI with criminal samples (see Laufer,
over 375 studies featuring the PRF in the Skoog, and Day, 1982, for a relevant review of
literature. this literature).
422 Objective Personality Assessment with Adults

Some specific applications of psychological tests in "normal range" settings are now described.

4.14.4.2 Objective Personality Measures in Educational/Vocational Assessment

The use of objective personality measures has become increasingly popular among professionals in the field of educational/vocational assessment. Research has identified the FFM, the NEO-PI, and the CPI as particularly useful in educational/vocational assessment.

4.14.4.2.1 FFM

The Big Five personality dimensions of the FFM have been studied in several contexts related to educational/vocational assessment. Moreover, because the FFM is a theoretical concept, various instruments have been used to assess the relationship between the Big Five personality dimensions and various educational/vocational assessment variables. Two recent investigations illustrate the utility of the Big Five personality dimensions in identifying candidates for admission to educational institutions and identifying characteristics among students in various university programs. Williams, Munick, Saiz, and Formy-Duval (1995) found that a mock graduate school admissions board (composed of graduate school faculty) favored for admission those hypothetical candidates whose applications reflected high ratings on the Big Five dimensions of Conscientiousness and Openness to Experience. Conversely, the Big Five dimensions of Extroversion and Agreeableness were not associated with a favorable impression of hypothetical candidates. Kline and Lapham (1992) assessed the Big Five personality dimensions among a group of college students to examine personality differences among the various fields of study (i.e., between different college majors). Students of various majors were not discriminated by levels of either Neuroticism or Extroversion. However, students in two fields of study (science and engineering) were marked by high ratings on the Big Five personality dimensions of Conscientiousness and Conventionality.

4.14.4.2.2 NEO-PI

The personality dimensions that underlie the NEO-PI appear to be related to at least one well-known typology of vocational personalities (which, in turn, correspond to vocational preferences). Gottfredson, Jones, and Holland (1993) found a relationship between the Big Five personality dimensions assessed by the NEO-PI and the six vocational personality dimensions proposed by Holland (assessed with the Vocational Preferences Inventory). In general, there was overlap among two to four of the significant factors extracted from each of the assessment instruments. However, the NEO-PI Neuroticism, Likability, and Control factors were not represented in the Holland vocational personality dimensions, which suggests a distinctive and qualitatively different role for objective personality assessment in vocational counseling. Other work with the NEO-PI has shown that two of the Big Five personality dimensions, Neuroticism and Agreeableness, are strongly related to occupational burnout among healthcare workers (Piedmont, 1993). Healthcare workers with higher ratings on the Neuroticism dimension were more likely to experience occupational burnout. Conversely, workers with higher ratings on the Agreeableness dimension were less likely to succumb to occupational burnout.

4.14.4.2.3 CPI

The CPI is one of the most widely used personality instruments in normal range assessment in educational/vocational settings. Walsh (1974) examined personality traits among college students identified as making hypothetical career choices that were either congruent or incongruent with their vocational personality type (as proposed by Holland). It was found that congruent students could be described by their CPI profiles as socially accepted, confident, and planful, whereas incongruent students could be described as impulsive, unambitious, and insecure. The well-known Strong Vocational Interest Blank (SVIB) used commonly among vocational and educational counselors is related to various personality traits as well. Johnson, Flammer and Nelson (1975) found a relationship among SVIB factors and CPI personality factors, especially those related to the global introversion/extroversion personality dimension.

4.14.4.2.4 MMPI/MMPI-2

Although it was not designed for educational research or placement purposes the MMPI/MMPI-2 has been among the most frequently employed instruments in this context. In a recent survey of test use in personnel and educational screening the MMPI has been employed effectively in a number of studies, for example: Anderson (1949), Appleby and Haner (1956), Applezweig (1953), Barger and Hall (1964), Barthol and Kirk (1956), Burgess
(1956), Centi (1962), Clark (1953, 1964), and Frick (1955) to mention only a few.

It appears evident that personality assessment contributes information independent of direct educational/vocational interest measures. There is extensive information available on the use of the MMPI/MMPI-2 in this setting. As demonstrated in the research with the NEO-PI and the CPI, objective personality assessment instruments yield information relevant to vocational assessment over and above that given by narrowly defined vocational personality assessments (like the one proposed by Holland). Moreover, elements of the FFM and independent personality constructs of the CPI have proven their utility in educational assessment in diverse areas such as admissions to educational institutions and choice of major fields of study. Clearly, objective personality assessment has carved out a valuable niche in normal range educational/vocational assessment.

4.14.4.3 Personnel Screening

Among the earliest and most extensive uses of personality tests with normals has been for the purpose of employment screening. The first formal North American, English language personality inventory, the Woodworth Personnel Data Sheet, was developed to screen out unfit draftees during World War I. A number of other personality questionnaires were developed in the 1930s to aid in personnel selection decisions.

The development and use of the MMPI during World War II provided a means for assessment psychologists to detect psychological problems that might make people unsuitable for key military assignments. Early research on the use of the MMPI centered around pilot selection and selection of nuclear submarine crewmen. Following World War II, the MMPI came to be widely used in personnel selection, particularly for occupations that required great responsibility or involved high stress, such as air flight crews (Butcher, 1994; Cerf, 1947; Fulkerson, Freud, & Raynor, 1958; Fulkerson & Sells, 1958; Garetz & Tierney, 1962; Geist & Boyd, 1980; Goorney, 1970; Jennings, 1948); police and other law enforcement personnel (Beutler, Nussbaum, & Meredith, 1988; Beutler, Storm, Kirkish, Scogin, & Gaines, 1985; Butcher, 1991; Dyer, Sajwaj, & Ford, 1993; Hargrave & Hiatt, 1987; Saxe & Reiser, 1976; Scogin & Beutler, 1986; Scogin & Reiser, 1976); and nuclear power employees (Lavin, Chardos, Ford & McGee, 1987).

Following the revision of the MMPI and publication of the MMPI-2, the revised instrument has been used in personnel screening (Butcher & Rouse, 1996) and currently is the most frequently used personality measure in personnel screening situations, particularly when the position is one that requires good mental health, emotional stability, and responsible behavior.

The MMPI-2 is usually employed in personnel selection to screen out candidates who are likely to have psychological problems from critical occupations such as police officer, airline pilot, and nuclear power control room, fire department, or air traffic control personnel.

4.14.4.4 Other Personality Measures in Personnel Selection

Objective measures of personality characteristics have a role in personnel selection similar to that in educational/vocational assessment. Namely, professionals in the field of personnel selection are interested in knowing which personality variables aid in the selection of quality employees. The literature suggests a valuable role for objective personality measures in the normal range assessment field of personnel selection.

4.14.4.5 The 16PF

One of the most widely used personality scales for employment screening is the 16PF. This inventory, with broad-ranging employment-relevant personality items, has been widely used in different contexts including: law enforcement (Burbeck & Furnham, 1985; Fabricatore, Azen, Schoentgen, & Snibbe, 1978; Hartman, 1987; Lawrence, 1984; Lorr & Strack, 1994; Topp & Kardash, 1986); pilots (Cooper & Green, 1976; Lardent, 1991); cabin crew personnel (Furnham, 1991); managers (Bartram, 1992; Bush & Lucas, 1988; Chakrabarti & Kundu, 1984; Henney, 1975); occupational therapists (Bailey, 1988); church counselors (Cerling, 1983) and teachers (Ferris, Bergin, & Wayne, 1988). The 16PF provides information about personality functioning and is typically employed to screen employees for positive personality features.

4.14.4.6 FFM

Barrick and Mount (1991) conducted a meta-analysis of the Big Five personality dimensions and their relationship to three criterion variables (job proficiency, training proficiency, and personnel data) within five occupational groups (professionals, police, managers, skilled/semiskilled and sales). Conscientiousness was related to each of the five occupational
groups. Additionally, Conscientiousness was related to the three criterion variables. These findings led the authors to conclude that Conscientiousness is a personality trait related to job performance across occupational types. In a similar vein, Dunn, Mount, Barrick, and Ones (1995) found that the Conscientiousness personality dimension was related to managers' ratings of applicant hireability. This held true for various job types (medical technologist, carpenter, secretary, etc.). Taken together, these two reports suggest that professionals in the field of personnel selection can gain valuable information from objective personality measures of the FFM. Specifically, the Conscientiousness dimension, when taken with other relevant application information, may be a valuable discriminator among prospective job applicants.

4.14.4.7 NEO-PI

While research suggests that the Big Five personality dimensions add qualitative information to the personnel selection process, it may be that the various instruments used to measure the Big Five have differential validity across populations. Schmit and Ryan (1993) administered a shortened version of the NEO-PI to a sample of college students and to a sample of government job applicants. The FFM structure fitted the student population better than it fitted the job applicant population, which suggests a note of caution in putting too much weight on the Big Five personality dimensions in personnel selection. The authors note that job applicants are under different situational demands, which may affect their approach to a personality questionnaire (e.g., they may adopt a defensive response style). The lack of validity scales to assess response styles clearly limits the NEO-PI for this application. A recent study by Bailley and Ross (1996) showed that the NEO-PI is quite vulnerable to faking and is limited in not having scales to detect deviant response attitudes.

4.14.4.7.1 CPI

The NEO-PI and other personality measures have been examined in personnel selection studies covering a wide range of occupational groups (managers, secretaries, carpenters, etc.). However, the CPI has a long tradition of use in the selection of a specific occupation: law enforcement officers. Hargrave and Hiatt (1989) examined the ability of the CPI scales to differentiate police cadets rated by their instructors as suitable or unsuitable for the job of law enforcement officer. Several scales (including Sense of Well-being, Sociability, and Social Presence) differentiated suitable from unsuitable cadets. In the second study of the report, the authors compared incumbent law enforcement officers who either had or had not experienced serious on-the-job problems (e.g., providing drugs to inmates, excessive use of force). The Socialization scale was among the best discriminators between groups. The authors concluded that the CPI is a useful aid to the selection of law enforcement officers.

In addition to the literature on law enforcement screening, the CPI may be useful as an index of work performance in other fields. Toward this end, Hoffman and Davis (1995) validated the Work Orientation and Managerial Potential scales for the CPI on groups of employees in an entertainment facility. However, several of the original CPI scales performed as well as the two new scales in predicting job performance, which calls into question the need for additional CPI scales in the selection of personnel.

In summary, the use of objective personality measures in the selection of job applicants has proven a worthwhile endeavor overall. The Big Five personality dimensions, especially Conscientiousness, might be useful to the personnel selection process, especially in the assessment of conscientiousness. Professionals involved in the screening of law enforcement candidates can use the CPI to differentiate suitable from unsuitable candidates. The CPI may also be useful in discriminating among groups of job applicants (although its primary use has been in the law enforcement field).

The scales on the 16PF have been shown to be relevant to personality descriptions that are useful in personnel selection. Finally, when it comes to evaluating potential psychopathology in potential employees, the MMPI-2 is usually the instrument most employed.

Although objective personality measures appear to have a place in personnel selection, all information on the candidate must be weighed, as several personality researchers have noted the problem of response sets associated with the situational demands of the application process. Specifically, job applicants may feel under pressure to make a good impression on their prospective employer and they may subsequently present themselves as overly virtuous or defensive.

4.14.5 SUMMARY

Human fascination with the concept of personality led to the nineteenth century invention of the first objective personality test.
References 425

As with those very early personality measures, many of today's objective personality inventories are self-reports. The clinician's first concern when utilizing an objective personality measure is whether or not a client can accurately reveal information about his or her personality through a self-report instrument. Many objective personality instruments incorporate indices of test-taking attitudes (e.g., the Lie scale of the MMPI/MMPI-2) which allow the clinician to gauge a client's level of insight and willingness to self-disclose. Additionally, research shows that clients who do cooperate with testing produce personality profiles that match external criteria (e.g., clinician's notes and observations regarding the patient). Essentially, most clients are able to reveal their personalities competently through self-report measures.

Once a client is able to self-disclose information regarding his or her personality, the scale scores that are produced appear to be quite stable over time. Thus, most of the objective personality measures manage to capture trait (as opposed to state) characteristics. Five factors exhibit influence on the stability of personality: (i) instrument characteristics, (ii) length of retest interval, (iii) operationalization of personality as a stable construct for test construction, (iv) the extent to which a particular personality construct is associated with stability, and (v) person variables of the test-taker.

Several objective personality measures are designed to assess adults in clinical settings. The most well-known and widely used objective personality inventory in clinical settings is the MMPI-2, which provides a comprehensive survey of personality characteristics and clinical problems. The Basic Personality Inventory (BPI) and the Personality Assessment Inventory (PAI) are alternatives to the MMPI-2, but neither is as widely used as the MMPI-2. Several other objective measures have been developed for specialized or focused clinical use including the Millon Clinical Multiaxial Inventory (MCMI) designed for making personality diagnoses; the Beck Depression Inventory (BDI), which assesses depressive ideation; the State-Trait Anxiety Inventory (STAI) designed to assess both long-term and short-term anxiety features; and the Whitaker Index of Schizophrenic Thinking (WIST) designed to measure the type of thought impairment that differentiates between schizophrenic and "normal" thought processes.

While certain objective personality measures were developed for and have been used widely in clinical settings, several inventories have gained use in research and other normal range assessment settings. Many researchers utilize inventories that measure the Big Five (or Five Factor Model, FFM) personality traits: Neuroticism (N), Extroversion (E), Openness to Experience (O), Agreeableness (A), and Conscientiousness (C). The NEO Personality Inventory (NEO-PI) assesses N, E, and O and has been used as an index of the Big Five in research. The Multidimensional Personality Questionnaire (MPQ) assesses the broad personality domains of Positive Affectivity, Negative Affectivity, and Constraint. The MPQ contains two validity indexes (VRIN and TRIN) common to the MMPI-2, which make the MPQ appealing to many researchers. Other objective personality instruments that have been used widely in research include the Personality Research Form (PRF), a measure of several psychosocial needs; the 16PF, which measures global factors similar to the Big Five; and the California Psychological Inventory (CPI), designed to assess "everyday" concepts about personality.

Finally, professionals working in the fields of educational/vocational assessment and personnel selection have found use for several objective personality measures. The 16PF, the CPI, and, most commonly, the MMPI/MMPI-2 have been used in both of these settings. Objective personality instruments give professionals information about an individual's personality which would not be obtained through standard applications or interviews. Thus, professionals can use objective personality inventories as an efficient method of obtaining more information to aid them in their task of advising clients about educational/vocational decisions or advising employers in the selection of personnel.

Whether one needs a tool to assess adult personality in clinical, research, or industry settings, there is probably an objective personality inventory to fit the bill. Many of today's objective inventories offer the efficiency of providing a comprehensive assessment of personality functioning, and some even provide computerized interpretation of the personality profile. Perhaps most importantly, most of the objective personality inventories available commercially are standardized instruments that can assess adult personality validly and reliably, which means these instruments can be used over the course of a client's treatment to document goals for change, a feature that is becoming increasingly important to clinicians in this era of managed healthcare.

4.14.6 REFERENCES

Allard, C., & Mishara, B. L. (1995). Individual differences in stimulus intensity modulation and its relationship to two styles of depression in older adults. Psychology and Aging, 10, 395–403.
Alterman, A. I., Zaballero, A. R., Lin, M. M., Siddiqui, N., Brown, L. S., Jr., Rutherford, M. J., & McDermott, P. A. (1995). Personality Assessment Inventory scores of lower-socioeconomic African American and Latino methadone maintenance patients. Assessment, 2, 91–100.
Anderson, W. F. (1949). Predicting success in nurses training. Unpublished master's thesis, University of Nebraska, Lincoln, NE.
Appleby, T. L., & Haner, C. F. (1956). MMPI profiles of a college faculty group. Proceedings of the Iowa Academy of Sciences, 53, 605–609.
Applezweig, M. H. (1953). Educational levels and Minnesota Multiphasic profiles. Journal of Clinical Psychology, 9, 340–344.
Arbisi, P., & Ben-Porath, Y. S. (1995). An MMPI-2 infrequency scale for use with psychopathological populations: The Infrequency–Psychopathology Scale, F(p). Psychological Assessment, 7, 424–431.
Archer, R. P., Griffin, R., & Aiduk, R. (1995). Clinical correlates for ten common code types. Journal of Personality Assessment, 65, 391–408.
Asendorpf, J. B. (1992). Beyond stability: Predicting inter-individual differences in intra-individual change. European Journal of Personality, 6, 103–117.
Bailey, D. M. (1988). Occupational therapy administrators and clinicians: Differences in demographics and values. Occupational Therapy Journal of Research, 8, 299–315.
Bailley, S. E., & Ross, S. R. (1996, May). The effects of simulated faking on the five-factor (NEO-FFI). Paper given at the 68th Annual Meeting of Midwestern Psychological Association, Chicago.
Barger, B., & Hall, E. (1964). Personality patterns and achievement in college. Educational and Psychological Measurement, 24, 339–346.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Barthol, R. P., & Kirk, B. A. (1956). The selection of graduate students in public health education. Journal of Applied Psychology, 40, 159–163.
Bartram, D. (1992). The personality of UK managers: 16PF norms for short-listed applicants. Journal of Occupational and Organizational Psychology, 65, 159–172.
Beck, A. T., Steer, R. A., & Garbin, M. G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77–100.
Ben-Porath, Y. S., Butcher, J. N., & Graham, J. R. (1991). Contribution of the MMPI-2 scales to the differential diagnosis of schizophrenia and major depression. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 634–640.
Ben-Porath, Y. S., McCully, E., & Almagor, M. (1993). Incremental validity of the MMPI-2 Content Scales in the assessment of personality and psychopathology by self-report. Journal of Personality Assessment, 61, 557–575.
Beutler, L. E., Nussbaum, P. D., & Meredith, K. E. (1988). Changing patterns of police officers. Professional Psychology: Research and Practice, 19, 503–507.
Beutler, L. E., Storm, A., Kirkish, P., Scogin, F., & Gaines, J. A. (1985). Parameters in the prediction of police officer performance. Professional Psychology: Research and Practice, 16, 324–335.
Blake, D. D., Penk, W. E., Mori, D. L., Kleespies, P. M., Walsh, S. S., & Keane, T. M. (1992). Validity and clinical scale comparisons between MMPI and MMPI-2 with psychiatric inpatients. Psychological Reports, 70, 323–332.
Boyle, G. J., & Lennon, T. J. (1994). Examination of the reliability and validity of the Personality Assessment Inventory. Journal of Psychopathology and Behavioral Assessment, 16, 173–187.
Burbeck, E., & Furnham, A. (1985). Police officer selection: A critical review of the literature. Journal of Police Science and Administration, 13, 58–69.
Burch, J. W. (1995). Typicality range deficit in schizophrenics' recognition of emotion in faces. Journal of Clinical Psychology, 51, 140–150.
Burger, G. K., & Cross, D. T. (1979). Personality types as measured by the California Psychological Inventory. Journal of Consulting and Clinical Psychology, 47, 65–71.
Burgess, E. (1956). Personality factors of over- and under-achievers in engineering. Journal of Educational Psychology, 47, 89–99.
Bush, A. J., & Lucas, G. H. (1988). Personality profiles of marketing vs. R&D managers. Psychology and Marketing, 5, 17–32.
Butcher, J. N. (1989). MMPI-2 scale correlates. MMPI-2 Workshop Materials. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N. (1991). Screening for psychopathology: Industrial applications of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2). In J. Jones, B. D. Steffey, & D. Bray (Eds.), Applying psychology in business: The manager's handbook. Boston: Lexington.
Butcher, J. N. (1994). Psychological assessment of airline pilot applicants with the MMPI-2. Journal of Personality Assessment, 62, 31–44.
Butcher, J. N. (1995). Clinical use of computer-based personality test reports. In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches (pp. 78–94). New York: Oxford University Press.
Butcher, J. N., Berah, E., Ellertsen, B., Miach, P., Lim, J., Nezami, E., Pancheri, P., Derksen, J., & Almagor, M. (1998). Objective personality assessment: Computer-based MMPI-2 interpretation in international clinical settings. In C. Belar (Ed.), Comprehensive clinical psychology: Sociocultural and individual differences. New York: Elsevier.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. S. (1990). Development and use of the MMPI-2 Content Scales. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., & Han, K. (1995). Development of an MMPI-2 scale to assess the presentation of self in a superlative manner: The S Scale. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 10, pp. 25–50). Hillsdale, NJ: Erlbaum.
Butcher, J. N., Jeffrey, T., Cayton, T. G., Colligan, S., DeVore, J., & Minnegawa, R. (1990). A study of active duty military personnel with the MMPI-2. Military Psychology, 2, 47–61.
Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology, 47, 87–111.
Butcher, J. N., & Williams, C. L. (1992). MMPI-2 and MMPI-A: Essentials of clinical interpretation. Minneapolis, MN: University of Minnesota Press.
Cattell, R. B. (1963). Personality, role, mood, and situation perception: A unifying theory of modulators. Psychological Review, 70, 1–18.
Cattell, R. B., Cattell, A. K. S., & Cattell, H. E. P. (1993). Sixteen Personality Factor Questionnaire (5th ed.). Champaign, IL: Institute for Personality and Ability Testing.
Centi, P. (1962). Personality factors related to college success. Journal of Educational Research, 55, 187–188.
Cerf, A. Z. (1947). Personality inventories. In J. P. Guilford
(Ed.), Printed classification tests. Washington, DC: Army Air Force Aviation Psychology Program Research Reports.
Cerling, G. L. (1983). Selection of lay counselors for a church counseling center. Journal of Psychology and Christianity, 2, 67–72.
Chakrabarti, P. K., & Kundu, R. (1984). Personality profiles of management personnel. Psychological Studies, 29, 143–146.
Chick, D., Martin, S. K., Nevels, R., & Cotton, C. R. (1994). Relationship between personality disorders and clinical symptoms in psychiatric inpatients as measured by the Millon Clinical Multiaxial Inventory. Psychological Reports, 74, 331–336.
Chick, D., Sheaffer, C. I., Goggin, W. C., & Sison, G. F. (1993). The relationship between MCMI personality scales and clinician generated DSM-III personality disorder diagnoses. Journal of Personality Assessment, 61, 264–276.
Clark, D. L. (1964). Exploring behavior in men's residence halls using the MMPI. Personnel Guidance Journal, 43, 249–251.
Clark, J. H. (1953). Grade achievement of female college students in relation to non-intellective factors: MMPI items. Journal of Social Psychology, 37, 275–281.
Clark, M. E. (1996). MMPI-2 negative treatment indicators content and content component scales: Clinical correlates and outcome prediction for men with chronic pain. Psychological Assessment, 8, 32–47.
Conley, J. J. (1985). Longitudinal stability of personality traits: A multitrait–multimethod–multioccasion analysis. Journal of Personality and Social Psychology, 49, 1266–1282.
Cooper, C. L., & Green, M. D. (1976). Coping with occupational stress among Royal Air Force personnel on isolated island bases. Psychological Reports, 39, 731–734.
(Eds.), Test critiques (Vol. III, pp. 717–725). Kansas City, KS: Test Corporation of America.
Fekken, G. C., & Holden, R. R. (1991). The construct validity of person reliability. Personality and Individual Differences, 12, 69–77.
Ferris, G. R., Bergin, T. G., & Wayne, S. J. (1988). Personal characteristics, job performance, and absenteeism of public school teachers. Journal of Applied Social Psychology, 18, 552–563.
Finn, S. E. (1986). Stability of personality self-ratings over 30 years: Evidence for an age/cohort interaction. Journal of Personality and Social Psychology, 50, 813–818.
Fishkin, S. M., Lovallo, W. R., & Pishkin, V. (1977). Relationship between schizophrenic thinking and MMPI for process and reactive patients. Journal of Clinical Psychology, 33, 116–119.
Flynn, P. M. (1995). Issues in the assessment of personality disorder and substance abuse using the Millon Clinical Multiaxial Inventory (MCMI-II). Journal of Clinical Psychology, 51, 415–421.
Frick, J. W. (1955). Improving the prediction of academic achievement by use of the MMPI. Journal of Applied Psychology, 39, 49–52.
Fulkerson, S. C., Freud, S. L., & Raynor, G. H. (1958, February). The use of the MMPI in psychological evaluation of pilots. Aviation Medicine, 122–128.
Fulkerson, S. C., & Sells, S. B. (1958). Adaptation of the MMPI for aeromedical practice norms for military pilots. USAF School of Aviation Medicine Reports, 58–128.
Furnham, A. (1991). Personality and occupational success: 16PF correlates of cabin crew performance. Personality and Individual Differences, 12, 87–90.
Garetz, F. K., & Tierney, R. W. (1962). Personality variables in army officer candidates. Military Medicine, 127, 669–672.
Costa, P. T., Jr., & McCrae, R. R. (1992). Normal Geist, C. R., & Boyd, S. T. (1980). Personality character-
personality assessment in clinical practice: The NEO istics of Army helicopter pilots. Perceptual Motor Skills,
Personality Inventory. Psychological Assessment, 4, 51(1), 253±254.
5±13. Goorney, A. B. (1970). MMPI and MMPI scores,
Crookes, T. G., & Buckley, S. J. (1976). Lie score and correlations and analysis for military aircrew population.
insight. Irish Journal of Psychology, 3, 134±136. British Journal of Social and Clinical Psychology, 9,
Deniston, W. M., & Ramanaiah, N. V. (1993). California 164±170.
Psychological Inventory and the five-factor model of Gottfredson, G. D., Jones, E. M., & Holland, J. L. (1993).
personality. Psychological Reports, 73, 491±496. Personality and vocational interests: The relation of
Dinning, W. D., & Evans, R. G. (1977). Discriminant and Holland's six interest dimensions to five robust dimen-
convergent validity of the SCL-90 in psychiatric inpa- sions of personality. Journal of Counseling Psychology,
tients. Journal of Personality Assessment, 41, 304±310. 40, 518±524.
Dunn, W. S., Mount, M. K., Barrick, M. R., & Ones, D. S. Gough, H. G. (1957). Manual for the California Psycholo-
(1995). Relative importance of personality and general gical Inventory. Palo Alto, CA: Consulting Psychologists
mental ability in managers' judgments of applicant Press.
qualifications. Journal of Applied Psychology, 80, Graham, J. R., & Ben-Porath, Y. S. (1990, June).
500±509. Congruence between the MMPI and MMPI-2. Paper
Dyer, J. B., Sajwaj, T. E. G., & Ford, T. W. X. (1993, given at the 25th Annual Symposium on Recent
March). MMPI-2 normative and comparative data for Developments in the Use of the MMPI/MMPI-2,
nuclear power plant personnel who were approved or Minneapolis, MN.
denied security clearances for psychological reasons. Graham, J. R., & Butcher, J. N. (1988, March). Differ-
Paper presented at the 28th Annual Symposium on entiating schizophrenic and major affective disorders with
Recent Developments in the Use of the MMPI/MMPI-2, the revised form of the MMPI. Paper presented at the
St. Petersburg, FL. 23rd Annual Symposium on Recent Developments in the
Egeland, B., Erickson, M., Butcher, J. N., & Ben-Porath, Use of the MMPI, St. Petersburg, FL.
Y. S. (1991). MMPI-2 profiles of women at risk for child Grigoriadis, S. (1993). Sources of inconsistency on tests of
abuse. Journal of Personality Assessment, 57, 254±263. psychopathology. Unpublished doctoral dissertation,
Evans, R. G., & Dinning, W. D. (1980). A validation of Queen's University, Kingston, ON.
Forms A and B of the Whitaker Index of Schizophrenic Hargrave, G. E., & Hiatt, D. (1987, May). Use of the
Thinking. Journal of Personality Assessment, 44, MMPI to predict aggression in law enforcement officer
416±419. applicants. Paper presented at the 22nd Annual Sympo-
Fabricatore, J., Azen, S. P., Schoentgen, S., & Snibbe, H. sium on Recent Developments in the Use of the MMPI,
(1978). Predicting performance of police officers using Seattle, WA.
the Sixteen Personality Factor Questionnaire. American Hargrave, G. E., & Hiatt, D. (1989). Use of the California
Journal of Community Psychology, 6, 63±70. Psychological Inventory in law enforcement officer
Fekken, G. C. (1985). The Whitaker Index of Schizo- selection. Journal of Personality Assessment, 53, 267±277.
phrenic Thinking. In D. J. Keyser & R. C. Sweetland Hartman, B. J. (1987). Psychological screening of law
428 Objective Personality Assessment with Adults

enforcement candidates. American Journal of Forensic clinical information. Journal of Research in Personality,
Psychology, 5, 5±10. 7, 225±236.
Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic Krueger, R. F., Schmutte, P. S., Caspi, A., Moffitt, T. E.,
personality schedule (Minnesota): 1. Construction of the Campbell, K., & Silva, P. A. (1994). Personality traits are
schedule. Journal of Psychology, 10, 249±254. linked to crime among men and women: Evidence from a
Helson, R., & Moane, G. (1987). Personality change in birth cohort. Journal of Abnormal Psychology, 103,
women from college to midlife. Journal of Personality 328±338.
and Social Psychology, 53, 176±186. Kuhne, A., Orr, S., & Baraga, E. (1993). Psychometric
Henney, A. S. (1975). Personality characteristics of a group evaluation of post-traumatic stress disorder: The Multi-
of industrial managers. Journal of Occupational Psychol- dimensional Personality Questionnaire as an adjunct to
ogy, 48, 65±67. the MMPI. Journal of Clinical Psychology, 49, 218±225.
Hills, H. A. (1995). Diagnosing personality disorders: An Lardent, C. L. (1991). Pilots who crash: Personality
examination of the MMPI-2 and MCMI-II. Journal of constructs underlying accident prone behavior of fighter
Personality Assessment, 65, 21±34. pilots. Multivariate Experimental Clinical Research, 10,
Hindelang, M. J. (1972). The relationships of self-reported 1±25.
delinquency to scales of the CPI and MMPI. Journal of Laufer, W. S., Skoog, D. K., & Day, J. M. (1982).
Criminal Law, Criminology, and Police Science, 63, Personality and criminality: A review of the California
75±81. Psychological Inventory. Journal of Clinical Psychology,
Hjemboe, S., & Butcher, J. N. (1991). Couples in marital 38, 562±573.
distress: A study of demographic and personality factors Lavin, P. F., Chardos, S. P., Ford, W. T., & McGee, R. K.
as measured by the MMPI-2. Journal of Personality (1987). The MMPI profiles of troubled employees in
Assessment, 57, 216±237. relation to nuclear power plant personnel norms.
Hoffman, R. G., & Davis, G. L. (1995). Prospective Transactions of the American Nuclear Society, 54,
validity study: CPI work orientation and managerial 146±147.
potential scales. Educational and Psychological Measure- Lawrence, R. A. (1984). Police stress and personality
ment, 55, 881±890. factors: A conceptual model. Journal of Criminal Justice,
Holden, R. R., Fekken, G. C., Reddon, J. R., Helmes, E., 12, 247±263.
& Jackson, D. N. (1988). Clinical reliabilities and Leslie, B. A., Landmark, J., & Whitaker, L. C. (1984). The
validities of the Basic Personality Inventory. Journal of Whitaker Index of Schizophrenic Thought (WIST) and
Consulting and Clinical Psychology, 56, 766±768. thirteen systems for diagnosing schizophrenia. Journal of
Husband, S. D., & Iguchi, M. (1995). Comparison of Clinical Psychology, 40, 636±648.
MMPI-2 and MMPI clinical scales and high point scores Lorr, M. & Strack, S. (1994). Personality profiles of police
among methadone maintenance clients. Journal of candidates. Journal of Clinical Psychology, 50, 200±207.
Personality Assessment, 64, 371±375. Lubin, B., Larsen, R. M., & Matarazzo, J. D. (1984).
Inch, R., & Crossley, M. (1993). Diagnostic utility of the Patterns of psychological test usage in the United States:
MCMI-I and MCMI-II with psychiatric outpatients. 1935±1982. American Psychologist, 39, 451±454.
Journal of Clinical Psychology, 49, 358±366. Lumsden, J. (1977). Person reliability. Applied Psychologi-
Jackson, D. N. (1984). Personality Research Form manual cal Measurement, 1, 477±482.
(3rd ed.). Port Huron, MI: Research Psychologists Press. MacCabe, S. P. (1987). Millon Clinical Multiaxial Inven-
Jackson, D. N. (1989). Basic Personality Inventory manual. tory. In D. J. Keyser & R. C. Sweetland (Eds.), Test
Port Huron, MI: Research Psychologists Press. Critiques Compendium (pp. 304±315). Kansas City, KS:
Jackson, D. N., & Messick, S. (1971). The Differential Test Corporation Of America.
Personality Inventory. London, ON: Authors. MacLennan, R. N. (1991). Personality Research Form
Jaffe, L. T. & Archer, R. P. (1987). The prediction of drug annotated research bibliography. Regina, AL: University
use among college students from MMPI, MCMI, and of Regina, Department of Psychology.
sensation seeking scales. Journal of Personality Assess- McCrae, R. R., & Costa, P. T., Jr. (1990). Personality in
ment, 51, 243±253. adulthood. New York: Guilford.
Jennings, L. S. (1948). Minnesota Multiphasic Personality Megargee, E. I. (1972). The California Psychological
Inventory; differentiation of psychologically good and Inventory handbook. San Francisco: Jossey-Bass.
poor combat risks among flying personnel. Journal of Millon, T. (1977). Manual for the Millon Clinical Multiaxial
Aviation Medicine, 19, 222. Inventory. Minneapolis, MN: National Computer Sys-
Johnson, R. W., Flammer, D. P., & Nelson, J. G. (1975). tems.
Multiple correlations between personality factors and Millon, T. (1987). Manual for the Millon Clinical Multiaxial
SVIB occupational scales. Journal of Counseling Psychol- Inventory-II. Minneapolis, MN: National Computer
ogy, 22, 217±223. Systems.
Keilen, W. G., & Bloom, L. J. (1986). Child custody Millon, T. (1994). Manual for the Millon Clinical Multiaxial
evaluation practices: A survey of experienced profes- Inventory-III. Minneapolis, MN: National Computer
sionals. Professional Psychology: Research and Practice, Systems.
17, 338±346. Millon, T., & Davis, R. (1995). Putting Humpty Dumpty
Keller, L. S., & Butcher, J. N. (1991). Use of the MMPI-2 together again: Using the MCMI in psychological
with chronic pain patients. Minneapolis, MN: University assessment. In L. E. Beutler & M. R. Berren (Eds.),
of Minnesota Press. Integrative assessment of adult personality (pp. 240±279).
Khan, F. I., Welch, T., & Zillmer, E. (1998). MMPI-2 New York: Guilford.
profiles of battered women in transition. Journal of Morey, L. (1991). Personality Assessment Inventory:
Personality Assessment, 60, 100±111. Professional Manual. Odessa, FL: Psychological Assess-
Kline, P., & Lapham, S. L. (1992). Personality and faculty ment Resources.
in British universities. Personality and Individual Differ- Murray, H. A. (1938). Explorations in personality. Cam-
ences, 13, 855±857. bridge, MA: Harvard University Press.
Knight, R. A., Epstein, B., & Zielony, R. D. (1980). The Patrick, J. (1993). Validation of the MCMI-I Borderline
validity of the Whitaker Index of Schizophrenic Think- Personality Disorder scale with a well-defined criterion
ing. Journal of Clinical Psychology, 36, 632±639. sample. Journal of Clinical Psychology, 49, 28±32.
Koss, M. P., & Butcher, J. N. (1973). A comparison of Paunonen, S. V., Jackson, D. N., Trzebinski, J., &
psychiatric patients' self-report with other sources of Fosterling, F. (1992). Personality structures across
References 429

cultures: A multi method evaluation. Journal of Person- Spielberger, C. D. (1976). Stress and anxiety and cardio-
ality and Social Psychology, 62, 447±456. vascular disease. Journal of South Carolina Medical
Payne, F. D., & Wiggins, J. S. (1972). MMPI profile types Association (Supplement 15), 72, 15±22.
and the self-report of psychiatric patients. Journal of Spielberger, C. D. (1983). Manual for the State Trait
Abnormal Psychology, 79, 1±8. Anxiety Inventory: STAI (form Y). Palo Alto, CA:
Payne, R. W. (1978). Review of Whitaker Index of Consulting Psychologists Press.
Schizophrenic Thought. In O. K. Buros (Ed.), The Spielberger, C. D., Gorsuch, R. L., & Lushene, R. D.
eighth mental measurements yearbook (pp. 1146±1147). (1970). The STAI: Manual for the State±Trait Anxiety
Highland Park, NJ: Gryphon Press. Inventory. Palo Alto, CA: Consulting Psychologists
Piedmont, R. L. (1993). A longitudinal analysis of burnout Press.
in the health care setting: The role of personal Spielberger, C. D., Ritterband, L. M., Sydeman, S. J.,
dispositions. Journal of Personality Assessment, 61, Reheiser, E. C., and Unger, K. K. (1995). Assessment of
457±473. emotional states and personality traits: Measuring
Pishkin, V., Lovallo, W. R., & Bourne, L. E. (1986). psychological vital signs. In J. N. Butcher (Ed.), Clinical
Thought disorder and schizophrenia: Isolating and personality assessment: Practical approaches (pp. 43±58).
timing a mental event. Journal of Clinical Psychology, New York: Oxford University Press.
42, 417±424. Tellegen, A. Brief manual for the Differential Personality
Reynolds, C. R. (1992). Review of the Millon Clinical Questionnaire. Unpublished manuscript.
Multiaxial Inventory-II. In J. J. Kramer & J. C. Conoley Topp, B. W., & Kardash, C. A. (1986). Personality,
(Eds.), The eleventh mental measurements yearbook achievement, and attrition: Validation in a multiple-
(pp. 533±535). Lincoln, NE: Buros Institute of Mental jurisdiction police academy. Journal of Police Science
Measurements. and Administration, 14, 234±241.
Saxe, S. J., & Reiser, M. (1976). A comparison of three Walsh, W. B. (1974). Consistent occupational preferences
police applicant groups using the MMPI. Journal of and personality. Journal of Vocational Behavior, 4,
Police Science and Administration, 4, 419±425. 145±153.
Schinka, J. A. (1995). Personality Assessment Inventory Watkins, C. E. (1996). On Hunsley, Harangue, and
scale characteristics and factor structure in the assess- Hoopla. Professional Psychology: Research and Practice,
ment of alcohol dependency. Journal of Personality 27, 316±318.
Assessment, 64, 101±111. Weed, N. C., Butcher, J. N., Ben-Porath, Y. S., &
Schmit, M. J., & Ryan, A. M. (1993). The Big Five in McKenna, T. (1992). New measures for assessing alcohol
personnel selection: Factor structure in applicant and and drug abuse with the MMPI-2: The APS and AAS.
nonapplicant populations. Journal of Applied Psychol- Journal of Personality Assessment, 58, 389±404.
ogy, 78, 966±974. Whitaker, L. C. (1973). The Whitaker Index of Schizo-
Schretlen, D. (1988). The use of psychological tests to phrenic Thinking. Los Angeles: Western Psychological
identify malingered symptoms of mental disorder. Services.
Clinical Psychology Review, 8, 451±476. Whitaker, L. C. (1980). Objective measurement of schizo-
Schuerger, J. M., Zarrella, K. L., & Hotz, A. S. (1989). phrenic thinking: A practical and theoretical guide to the
Factors that influence the temporal stability of person- Whitaker Index of Schizophrenic Thinking. Los Angeles:
ality by questionnaire. Journal of Personality and Social Western Psychological Services.
Psychology, 56, 777±783. Whitbourne, S. K., Zuschlag, L. B., Elliot, L. B., &
Scogin, F., & Beutler, L. (1986). Psychological screening of Waterman, A. S. (1992). Psychosocial development in
law enforcement candidates. In P. A. Keller & L. G. Ritt adulthood: A 22-year sequential study. Journal of
(Eds.), Innovations in clinical practice (Vol. 5, Personality and Social Psychology, 63, 260±271.
pp. 317±330). Sarasota, FL: Professional Resources Wiggins, J. S. (1969). Content dimensions in the MMPI. In
Exchange. J. N. Butcher (Ed.), MMPI: Research developments and
Scogin, F., & Reiser, M. (1976). A comparison of three clinical applications (pp. 127±180).
police applicant groups using the MMPI. Journal of Wiggins, J. S. (1973). Personality and prediction: Principles
Police Science and Administration, 4, 419±425. of personality assessment. Reading, MA: Addison-
Soldz, S., Budman, S., Demby, A., & Merry, J. (1993). Wesley.
Diagnostic agreement between the Personality Disorder Williams, J. E., Munick, M. L., Saiz, J. L., & Formy-
Examination and the MCMI-II. Journal of Personality Duval, D. L. (1995). Psychological importance of the
Assessment, 60, 486±499. ªBig Fiveº: Impression formation and context effects.
Spielberger, C. D. (1977). Anxiety, theory and research. In Personality and Social Psychology Bulletin, 21, 818±826.
B. Wolman (Ed.) International encyclopedia of neurology, Windle, C. (1954). Test±retest effect on personality
psychiatry, psychoanalysis, and psychology. New York: questionnaires. Educational and Psychological Measure-
Human Sciences Press. ment, 14, 617±633.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.15 Projective Assessment of Children and Adolescents
IRVING B. WEINER and KATHRYN KUEHNLE
University of South Florida, Tampa, FL, USA

4.15.1 INTRODUCTION
4.15.2 OBJECTIVITY AND SUBJECTIVITY IN PERSONALITY ASSESSMENT METHODS
4.15.3 STRUCTURE AND AMBIGUITY IN PROJECTIVE TECHNIQUES
4.15.4 VALUE OF PROJECTIVE ASSESSMENT
4.15.4.1 Conceptual Basis
4.15.4.2 Empirical Basis
4.15.5 UTILITY OF PROJECTIVE ASSESSMENT
4.15.6 APPLICABILITY OF PROJECTIVE METHODS TO CHILDREN AND ADOLESCENTS
4.15.7 REVIEW OF PROJECTIVE ASSESSMENT METHODS
4.15.7.1 Rorschach Inkblot Method
4.15.7.1.1 Administration and scoring
4.15.7.1.2 Psychometric foundations
4.15.7.1.3 Clinical utility
4.15.7.2 Thematic Apperception Test
4.15.7.3 Administration and scoring
4.15.7.3.1 Psychometric foundations
4.15.7.3.2 Clinical utility
4.15.7.4 Children's Apperception Test
4.15.7.4.1 Administration and scoring
4.15.7.4.2 Psychometric foundations
4.15.7.4.3 Clinical utility
4.15.7.5 Roberts Apperception Test for Children
4.15.7.5.1 Administration and scoring
4.15.7.5.2 Psychometric foundations
4.15.7.5.3 Clinical utility
4.15.7.6 Tell-me-a-story
4.15.7.6.1 Administration and scoring
4.15.7.6.2 Psychometric foundations
4.15.7.6.3 Clinical utility
4.15.7.7 Draw-a-person
4.15.7.7.1 Administration and scoring
4.15.7.7.2 Psychometric foundations
4.15.7.7.3 Clinical utility
4.15.7.8 House-tree-person
4.15.7.8.1 Administration and scoring
4.15.7.8.2 Psychometric foundations
4.15.7.8.3 Clinical utility
4.15.7.9 Kinetic Family Drawing
4.15.7.9.1 Administration and scoring
4.15.7.9.2 Psychometric foundations
4.15.7.9.3 Clinical utility
4.15.7.10 Sentence Completion Methods
4.15.7.10.1 Administration and scoring
4.15.7.10.2 Psychometric foundations
4.15.7.10.3 Clinical utility
4.15.8 FUTURE DIRECTIONS
4.15.9 SUMMARY
4.15.10 REFERENCES

4.15.1 INTRODUCTION

Projection, as first formulated by Freud (1962) and later elaborated in his presentation of the Schreber case (Freud, 1958), consists of attributing one's own characteristics to external objects or events without adequate justification or conscious awareness of doing so. Frank (1939) suggested that personality tests in which there is relatively little structure induce a subject to “project upon that plastic field . . . his private world of personal meanings and feelings” (pp. 395, 402). By linking the concept of projection to the response process in such measures as the Rorschach Inkblot Method and the Thematic Apperception Test (TAT), Frank gave birth to the so-called projective hypothesis in personality assessment. His observations about what he called “projection measures” led to the Rorschach, the TAT, and other assessment methods involving some ambiguity being routinely designated as projective tests.

This chapter begins with some observations concerning the nature of objectivity and subjectivity in personality assessment methods and the role of structure and ambiguity in different types of projective techniques. It then turns to the value and utility of projective assessment and the applicability of projective methods in clinical work with young people. Information is given on the composition, administration, scoring, psychometric foundations, and clinical utility of nine projective techniques widely used in assessing children and adolescents.

4.15.2 OBJECTIVITY AND SUBJECTIVITY IN PERSONALITY ASSESSMENT METHODS

As a legacy of Frank's projective hypothesis, tests designated as projective methods came to be regarded as subjective in nature and hence quite different from objective methods, such as self-report inventories. This presumed subjectivity of projective methods fostered a commonplace conviction that these methods are inherently less scientific and less valid than objective methods. In actuality, however, ambiguity is a dimensional rather than a categorical characteristic of tests, and there is little basis for regarding projective methods as inherently unscientific and invalid or as sharply distinct from objective measures.

Being scientific does not inhere in the nature of a method or instrument, whether subjective or not, but only in whether it can be studied scientifically. When projective tests are used to generate personality descriptions that can be independently and reliably assessed for their accuracy, they function as a scientific procedure.

Likewise, the validity of a test inheres not in its nature, but rather in the extent to which it generates significant correlations with personality characteristics or behaviors it can identify or predict. Abundant research attests that projective methods, when properly used, can yield valid inferences (Hibbard et al., 1994; Hibbard, Hilsenroth, Hibbard, & Nash, 1995; Parker, Hanson, & Hunsley, 1988; Weiner, 1996).

Regarding sharp distinctions between projective and objective measures, the subject's task on commonly used personality tests injects considerable objectivity into many projective methods and substantial subjectivity into most objective methods. For example, responses on the Rorschach, which is the most widely used projective measure, are routinely inquired by asking subjects “Where did you see it?”, which is a concrete, unambiguous request for a specific item of objective information. When subjects reply that they used the whole blot for a percept, their response is given a location choice code of W, which is an objective and unambiguous procedure on which coders achieve virtually 100% agreement. Numerous other features of how subjects choose to look at the inkblots can also be objectively coded with good inter-rater reliability, such as the percentage of responses given to the multicolored cards (affective ratio), the number of commonly given percepts reported (populars), and the kinds of objects the blots are said to resemble (people, animals, etc.) (Exner, 1991, pp. 459–460; McDowell & Acklin, 1996; Seaton & Allen, 1996).
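
The kind of coding agreement described above can be made concrete with a small sketch. The Python below is illustrative only and is not drawn from any published scoring system: the function names and the two coders' location codes are hypothetical, and Cohen's kappa is added here as one common chance-corrected index of inter-rater reliability rather than a statistic this chapter reports. The last function follows the chapter's loose description of the affective ratio as the share of responses given to the multicolored cards (Cards VIII, IX, and X); formal scoring systems may define the ratio differently.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of responses on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders corrected for chance (Cohen's kappa)."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions per code.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same code throughout
    return (observed - expected) / (1.0 - expected)


def multicolored_share(card_numbers, multicolored_cards=(8, 9, 10)):
    """Share of all responses that were given to the multicolored cards."""
    if not card_numbers:
        raise ValueError("at least one response is required")
    hits = sum(card in multicolored_cards for card in card_numbers)
    return hits / len(card_numbers)


# Hypothetical location codes (W = whole blot, D = common detail,
# Dd = unusual detail) assigned by two coders to the same ten responses.
coder_a = ["W", "D", "D", "Dd", "W", "D", "W", "D", "Dd", "D"]
coder_b = ["W", "D", "D", "D", "W", "D", "W", "D", "Dd", "D"]

# Hypothetical record: the card number on which each response was given.
cards = [1, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]

if __name__ == "__main__":
    print(f"percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
    print(f"multicolored share: {multicolored_share(cards):.2f}")
```

Percent agreement is the simpler figure and matches the way agreement is described in the text; kappa is included because raw agreement can look high merely because one code (such as D) is very common.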

Elements of objectivity mark the interpretation as well as the coding of many Rorschach variables. In the case of whole responses, for example, an unusual preponderance of W in a record correlates with objectively observable tendencies to attend to experience in a global fashion; a low affective ratio identifies an inclination to withdraw from emotionally charged situations; a small number of popular responses correlates with behavioral manifestations of unconventionality; and numerous human percepts are associated with an active interest in people. In these and many other ways, Rorschach responses identify personality characteristics through an objective process of coding response features and relating these coded features to their known corollaries in observable behavior.

There are aspects of Rorschach interpretation that may be highly subjective, especially when inferences are drawn from the thematic imagery subjects produce when they associate to the inkblots along with describing them. Moreover, most other projective methods have not been as extensively codified as the Rorschach and depend more on qualitative than quantitative analysis. The present point is merely that the basic nature of projective methods does not preclude their being codified and interpreted to some extent along objective lines.

Turning now to aspects of subjectivity in objective methods, consider the uncertainty that characterizes many items in the most widely used objective measure of personality, the adult and adolescent forms of the Minnesota Multiphasic Personality Inventory (MMPI-2/MMPI-A). Although MMPI-2/MMPI-A instructions to respond true or false are unambiguous and the coding of these responses is completely objective, Weiner (1993) has previously called attention to the idiography that is embedded in asking subjects to interpret such items as “I often lose my temper.” Items of this type provide no benchmarks for the frequency of “often” or for what constitutes loss of temper. In many instances, consequently, responses to self-report items involve subjectivity on the part of respondents, who must define for themselves what certain terms mean before they can decide how to answer.

Subjectivity influences the interpretation as well as the response process in objective assessment. Granted, the hallmark of objective tests is an extensive array of quantified scale scores having empirically demonstrated behavioral correlates. Nevertheless, the interpretation of self-report measures in clinical practice typically goes beyond identifying known corollaries of scale scores to include consideration of complex patterns of interaction among these scores and between the test profile and aspects of a subject's clinical history, interview behavior, and performance on other tests. Some of these complex interactions have been examined empirically, such as various two- and three-point codes on the MMPI-2/MMPI-A, but many have not.

This is not to say that the MMPI-2/MMPI-A and other self-report instruments are basically subjective in nature or that they derive their utility primarily from clinical judgment. The point is merely that, just as projective instruments are not entirely subjective, self-report methods are not completely objective, but instead involve some aspects of ambiguity in how subjects respond to them and how examiners interpret them.

It is for this reason that ambiguity is not a categorical function that characterizes some tests called projective measures, but not others called objective measures. Instead, ambiguity is a dimensional function that characterizes most tests to some degree, in relation to how structured they are. Generally speaking, objective tests are more structured than projective tests and therefore less ambiguous; projective tests are generally less structured than objective tests and hence more ambiguous; and there is no sharp objective/subjective dichotomy between relatively structured and relatively unstructured instruments.

4.15.3 STRUCTURE AND AMBIGUITY IN PROJECTIVE TECHNIQUES

Projective techniques comprise inkblot methods, story-telling methods, figure drawing methods, and sentence completion methods. In addition to being less structured than objective measures, these four types of projective methods differ from each other in their degree of ambiguity and in whether their ambiguity resides mainly in their stimuli or in their instructions.

Thus in the case of the Rorschach Inkblot Method, subjects are asked to look at relatively ambiguous stimuli but are given fairly specific instructions to indicate what they see, where they see it, and what makes it look as it does. Story-telling methods such as the TAT involve showing subjects real pictures that are much less ambiguous than inkblots; however, by using general instructions (“Tell me a story”) and open-ended questions (“What will happen next?”), examiners provide only minimal guidance in how subjects should respond. If the Rorschach instructions were “Tell me a story about this inkblot,” the Rorschach would be more ambiguous and more of a projective test
than it is. If the TAT instructions were “Tell me what you see here,” the TAT would lose most of its ambiguity and function only barely as a projective test.

Figure drawing techniques use no stimuli at all, save a blank piece of paper, and provide little guidance to subjects, other than some instructions concerning the figures to be drawn (e.g., yourself, a family). Sentence completion methods, in common with story-telling techniques, call for subjects to provide thematic content in response to real and relatively unambiguous test stimuli. Unlike story-telling techniques, however, sentence completion methods do not ordinarily involve querying subjects about their responses or encouraging them to elaborate those that are brief or unrevealing. Thus a stem of “I AM” may be completed with “a happy person,” in which case some subjectivity has been allowed to enter the response, or simply with “here,” in which case only a completely objective response has been given. On balance, figure drawing techniques are the most ambiguous of projective tests and sentence completion methods the least, with inkblot and story-telling techniques in between.

These differences in ambiguity among projective methods were originally noted by Stone and Dellis (1960), who proposed “a levels hypothesis” to take practical account of this variability. According to the levels hypothesis, the degree to which a test is structured is directly related to the level of conscious awareness at which it taps personality processes. The more structured and less ambiguous a test is, the more likely it is to yield information about relatively conscious and superficial levels of personality; conversely, the less structured and more ambiguous a test is, the more likely it is to provide information about deeper levels of personality and characteristics of which subjects themselves may not be consciously aware.

Research reported by Stone and Dellis (1960) and subsequently replicated by Murstein and Wolf (1970) provided empirical support for a relationship between the ambiguity of a test and its likelihood of measuring deeper levels of personality, especially in normally functioning persons. These findings mirrored the basic conception of TAT assessment articulated by Murray (1951), who regarded the virtue of the instrument as residing not in its revelations about what subjects are able and willing to say

typically include both relatively objective and relatively subjective elements. As elaborated by Weiner (1977), the objective elements of projective test data involve structural features of the manner in which responses are formulated, whereas the subjective elements consist of thematic features of the imagery with which responses are embellished.

When projective test data are being interpreted objectively, structural aspects of the subject's responses, such as focusing on wholes and seeing numerous human figures on the Rorschach, are taken as being directly representative of similar behavioral tendencies in the person's life, that is, attending to experience globally and paying close attention to people. When projective test data are being interpreted subjectively, thematic imagery is taken as being indirectly symbolic of a subject's underlying needs, attitudes, conflicts, and concerns. Thus the Rorschach response of “Two girls who are really mad at each other fighting over something they both want” may identify a subject's experiencing peer or sibling rivalry, viewing social interactions as aggressive confrontations in which people are only concerned with what they can get for themselves, or feeling angry or resentful about being in such situations.

On story-telling measures, an example of a structural response feature is giving long stories, which can be objectively scored (by counting the number of words) and which provides a representative indication of inclinations to be verbose. As for subjectively interpreted features, a TAT story in which two people are described as about to separate, leaving one of them sad and lonely for the rest of his or her life, exemplifies thematic imagery that appears to symbolize concerns about suffering the loss of love objects and facing an unhappy future.

On figure drawing measures, which as previously noted are the most ambiguous of projective tests, structural features of the data are limited. Some variables, such as the size of figures drawn, how complete they are, and whether they are clothed, are objective facts that can usually be coded with good agreement. However, interpretation of such objective characteristics of figure drawings, as well as of subjective impressions of drawing qualities, is based mostly on their being symbolic rather than representative of behavior. Interpreting the
about themselves, but in what it conveys about way figures are drawn or placed is thus primarily
personality characteristics: ªthe patient is un- thematic. For example, unusual emphasis on a
willing to tell or is unable to tell because he is particular part of the body may be interpreted as
unconscious of themº (p. 577). suggesting concern about functions associated
In addition to differing from objective tests with that part of the body, and a family drawing
and from each other in their degree of structure in which the self is located on one side of the
and ambiguity, individual projective measures page and the other family members are closely
grouped on the other side of the page may be interpreted as symbolizing feelings of isolation
or rejection in the family setting.

In sentence completion responses, frequent self-referencing is an example of an objectively
scorable, behaviorally representative structural index of tendencies to focus attention on
oneself rather than others. Consider the difference between the completions “WHAT PAINS ME
is seeing how many unfortunate people there are in the world” and “WHAT PAINS ME is not
being able to get the things that I want.” An accumulation of the latter as opposed to the
former type of response is objectively representative of self-centeredness. At the same time,
the thematic content of both completions suggests in a more subjective way certain underlying
concerns, such as worries about the welfare of the human race in the first instance and
feelings of being personally deprived in the second.

To bring these introductory observations full circle, the opportunities that projective
methods create for subjects to project aspects of themselves into their responses have
frequently led to their being associated with psychoanalytic theories of personality, in the
context of which the notion of projection was first elaborated. However, there is no necessary
relationship between psychoanalytic theory and projective testing, nor is there any reason for
clinicians who conceptualize behavior in other ways to view projective methods as incompatible
with their frame of reference. The basic principle underlying projective techniques is that
something can be learned about people from sampling how they respond in ambiguous situations.
This principle is not prisoner to any personality theory, and its utility transcends the
theoretical persuasions of individual examiners. Inferences from projective data can be
couched equally well in psychodynamic, behavioral, cognitive, and humanistic terms, and the
use to which these inferences can be put depends less on theoretical differences in
terminology than on the nature of the assessment issues being addressed.

4.15.4 VALUE OF PROJECTIVE ASSESSMENT

Projective test data provide valuable information about how people are likely to think, feel,
and act that is difficult to obtain from objective assessment procedures. This contribution of
projective methods to the personality assessment process has both a conceptual and an
empirical basis.

4.15.4.1 Conceptual Basis

Because of their relatively unstructured nature, projective tests measure personality
characteristics in subtle and indirect ways. Even those features of projective test data that
can be objectively scored and interpreted involve responses that seldom have obvious meaning.
Subjects in the process of responding usually have little awareness of the interpretive
significance that attaches to their seeing numerous human figures on the Rorschach, giving
long stories on the TAT, drawing themselves on the far side of the page from the rest of their
family, or repetitively referring to “I” in their sentence completions; indeed, they may not
even be aware of having responded in these ways.

By contrast, relatively structured objective tests measure personality characteristics in
direct ways that often have obvious interpretive significance. Adolescents who answer “true”
to such MMPI-A statements as “At times I feel like smashing things” and “I am easily downed
in an argument” will usually have a good idea of what they are indicating about themselves.

The distinction between subtle, indirect measurement of personality characteristics with
projective techniques and relatively direct assessment through questionnaire methods has been
formulated by McClelland, Koestner, and Weinberger (1989) in terms of differences between
self-attributed and implicit motives. According to McClelland et al., self-attributed motives
are measured by self-report instruments and are influenced by social incentives in a person's
external environment. Implicit motives, however, are measured by such indirect techniques as
story-telling procedures and are influenced by the internal pleasure derived from various
activities in which a person engages. Whereas self-attributed motives are comparatively good
predictors of immediate specific responses to structured situations, McClelland et al.
continue, implicit motives are comparatively good predictors of long-term trends in behavior
across various types of situations. Research findings described by McClelland et al. confirmed
that indirect assessments of underlying motives have greater validity for predicting long-term
trends in behavior than self-report assessments of motives that people directly attribute to
themselves.

More recently, Bornstein (1995) has used a metaanalysis of 97 studies of measures of
dependency to demonstrate further this difference between objective and projective assessment.
With respect to differential prediction, according to Bornstein, available research indicates
that objectively measured dependency correlates better with symptoms and the
diagnosis of dependent personality disorder than does projectively measured dependency,
whereas projectively measured dependency correlates better with dependency-related behaviors.

The conceptual analysis formulated by McClelland et al. and elaborated by Bornstein has direct
bearing on the contribution of projective methods to personality assessment. As previously
described, projective assessment taps implicit motives and underlying personality
characteristics that may not be readily apparent and may not be within a subject's conscious
awareness. Because these covert motives and characteristics exert a powerful influence on
long-term behavioral trends, this type of indirect measurement adds an important dimension to
personality evaluations that would not be tapped in its absence.

Finally, with respect to what projective methods contribute to assessment batteries, the
relative ambiguity of these methods makes them less subject than structured instruments to
influence by test-taking attitudes. This is not to say that projective methods are immune to
subjects' efforts to present themselves in a positive or negative light. The relatively
open-ended nature of projective testing situations and the dialogue they frequently elicit
give subjects abundant opportunity to voice attitudes toward the tests, the examiner, and being
examined. However, as long as subjects continue to give responses, neither their attitudes nor
their expression of them is likely to prevent their projective test responses from revealing
their personality characteristics. Simply put, the limited face validity of projective measures
makes them more difficult to fake than objective measures, which means that they can balance a
test battery to particularly good effect when self-presentation effects are of concern.

4.15.4.2 Empirical Basis

Projective measures vary in the extent to which they have been examined in well-designed
research studies, and in many instances adequate empirical support for these instruments has
lagged behind the uses to which clinicians sometimes put them. Nevertheless, the two most
frequently used projective methods, the Rorschach and the TAT, have for the past generation
been among the three most frequently studied personality assessment instruments, exceeded in
this respect only by the MMPI/MMPI-2 (Butcher & Rouse, 1996). For both the Rorschach and the
TAT, substantial evidence has accumulated to attest their validity for describing aspects of
personality structure and dynamics and applying these descriptions in differential diagnosis
and treatment planning (Abraham, Lepisto, Lewis, & Schultz, 1994; Alvarado, 1994; Bornstein,
1995; Cramer & Blatt, 1990; Exner & Andronikoff-Sanglade, 1992; Ornduff & Kelsey, 1996; Ronan,
Colavito, & Hammontree, 1993; Weiner, 1996; Weiner & Exner, 1991).

Later in this chapter, specific information is presented concerning the psychometric
foundations and demonstrated corollaries of the projective measures most frequently used with
young people. Suffice it to say in summary at this point that these measures prove valuable in
personality assessment because they add information that would otherwise be unavailable and
because they withstand relatively well efforts to exaggerate or conceal.

4.15.5 UTILITY OF PROJECTIVE ASSESSMENT

Whereas the value of projective techniques lies in the previously elaborated reasons why they
should be used in personality assessment, their utility relates to decisions concerning when
these methods should be included in a test battery. The kinds of information provided by
projective test data indicate that projective measures should be used whenever a thorough
personality assessment is considered relevant to formulating a differential psychodiagnosis or
recommending alternative intervention strategies. Because of their relatively unstructured
nature and indirect format, projective measures balance a test battery by tapping personality
characteristics at a less conscious level than relatively structured measures. Assessments
lacking such balance sample personality functioning from a limited perspective that will
usually fail to paint a complete picture of the individual being examined. Batteries limited
solely to projective techniques are similarly imbalanced and ill-advised in comprehensive
personality assessments.

The previously mentioned conceptualization of McClelland et al. (1989) bears closely on the
importance of a balanced test battery in clinical assessment. McClelland and his colleagues
noted that measures of self-attributed and implicit motives seldom correlate with each other
and should not be expected to do so, because they are measuring different aspects of
personality. Moreover, given that directly and indirectly measured motives each predict certain
kinds of behavior better than the other, they concluded that “Separate measures of
self-attributed and implicit motives may be combined to yield a better understanding and
prediction of certain types of behavior” (p. 692).
These formulations concerning different types of measures have subsequently been elaborated
and confirmed for clinical purposes with respect to relationships between the Rorschach and
the MMPI. Rorschach structural variables and MMPI scales have been found to show only a few
modest correlations in both adult and adolescent samples (Archer & Krishnamurthy, 1993a,
1993b). At the same time, however, apparent contradictions between Rorschach and MMPI findings
have been conceptualized by Weiner (1993, 1995b) not as invalidating either instrument or
challenging the incremental utility of administering them in tandem, but rather as generative
data. Specifically, Weiner argues, apparently discrepant findings between personality
assessment instruments of different kinds can be generative by virtue of complementing each
other. Whereas findings on two tests that concur in suggesting the same personality
characteristic are confirmatory and support definite conclusions, he continues, findings that
diverge raise important questions to which they may also suggest helpful answers, especially
if one of the tests is a relatively structured and the other a relatively unstructured
instrument.

Consider, for example, a youngster who appears depressed on the Rorschach, with a high
depression index (DEPI), but does not elevate on Scale 2 or the depression content scale of the
MMPI-A. This divergent finding could well provide a useful clue to the adolescent's having an
underlying or emerging depression that is not yet being keenly felt or manifest in
well-structured situations. Alternatively, it could be that the subject is trying to deny or
repress depressive affects and cognitions, or is making a conscious decision not to report
manifestations of depression that nevertheless emerge in the absence of supportive structure
or are revealed when the subject is uncertain how to conceal them.

The situation described in this example is familiar to assessment psychologists, who not
infrequently work with psychologically troubled adolescents who can remain reasonably
comfortable and controlled in relatively structured situations but become upset and
disorganized in relatively unstructured situations and who may accordingly produce a benign
MMPI-A protocol and a disturbed Rorschach. In such circumstances the objective measure has not
erroneously overlooked psychopathology, nor has the projective measure mistakenly exaggerated
it. Instead, the two types of test have combined in complementary fashion to provide valid
information concerning the subject's likelihood of behaving in a relatively adaptive or
maladaptive fashion, depending on the situational context in which the behavior appears.

Similarly, subjects in some circumstances may produce a clinically unremarkable Rorschach
while showing numerous elevations on the clinical and content scales of the MMPI-A. Such
divergence is best understood not as error variance, but as a possible clue to the
psychological stance of subjects whose degree of disturbance is minimal but who, when asked
about themselves in language they can understand, want to make sure that others fully
appreciate whatever problems and concerns they do have. Further illustrations of the clinical
utility of divergence as well as convergence between a projective measure, such as the
Rorschach, and an objective measure, such as the MMPI-2, are provided by Finn (1996) and
Ganellen (1996).

4.15.6 APPLICABILITY OF PROJECTIVE METHODS TO CHILDREN AND ADOLESCENTS

Except for an occasional example, this chapter has thus far made no specific reference to
young people. This apparent oversight is warranted by the fact that the nature of projective
methods, the way in which they function, the kinds of information they provide, and the
reasons for using them, are identical for persons of almost all ages. Hence the discussion of
projective assessment to this point is as applicable to children and adolescents as to adults,
and requires no modification or qualification as our focus now shifts specifically to young
people.

Indeed, assessors who have learned to interpret projective test data provided by adults do not
need to learn any new ways of working with the data should they begin to examine children and
adolescents. By and large, the basic interpretive conclusions and hypotheses that attach to
projective test variables apply regardless of the age of the subject. Whether they are age 8,
18, or 80, subjects who see numerous human figures on the Rorschach are likely to be quite
interested in people; those who give long TAT stories are likely to be verbose; those who
refer frequently to themselves in sentence completions are likely to be self-centered; and
those who draw grotesquely distorted human figures probably harbor some disturbing concerns
about their own nature or that of other people.

However, in order to determine the implications of these and other personality characteristics
suggested by projective test data, examiners assessing young people must take
into account normative developmental expectations. For example, the data of developmental
psychology indicate that children are more self-centered than adults, and subsequently become
increasingly aware of and concerned about the needs of others as they grow through adolescence
and approach maturity. Accordingly, test data that identify a high degree of self-centeredness
may imply maladaptive narcissistic personality traits in an adult, but reflect normal
development and adaptation in a child; conversely, minimal self-centeredness may indicate
altruism and good adjustment in an adult but suggest deviant development and low self-esteem
in a child.

Developmental psychology similarly provides some normative expectations for how children are
likely to make drawings. Preschool age children commonly draw with what is called
“intellectual realism,” which means that they draw what they know to be there regardless of
whether it would actually be visible. Thus, in x-ray fashion, young children often draw
transparencies, such as people who are visible through walls (Di Leo, 1983). At about age 7 or
8, this intellectual realism gradually gives way to “visual realism,” in which what is drawn
resembles what realistically can be seen. Di Leo (1983, p. 38) observes that this
developmental shift mirrors a metamorphosis in thinking from an egocentric to an increasingly
objective view of the world. Hence a human figure drawing by a preschool child showing a belly
button through clothing is much less likely to imply maladaptive functioning than the same
drawing done by an adolescent.

As these examples indicate, familiarity with and adequate attention to normative expectations
hold the key to valid and useful applications of projective methods in the assessment of young
people. Ideally, projective methods manuals should include normative reference data that
delineate quantitative as well as qualitative expectations for such developmental phenomena as
maturational changes in self-centeredness. Regrettably, even though numerous projective test
variables have been quantified in various ways, little progress has been made in generating
age-graded norms for them.

The main exception to this dearth of normative developmental data for projective techniques is
the Rorschach. Developmental trends in Rorschach responses from early childhood through
adolescence were initially charted many years ago by Ames and her colleagues (Ames, Metraux,
Rodell, & Walker, 1974; Ames, Metraux, & Walker, 1971). More recently, the Rorschach
comprehensive system has provided reference data for each of its major codified variables on
samples of approximately 100 nonpatient young people at each age from 5 to 16 (Exner & Weiner,
1995, chap. 3).

Aside from identifying needs for further research, an analysis of available data can guide
clinicians in choosing which projective methods to include in a battery for assessing a young
person's functioning. The more thorough and reliable the normative developmental data
available for the instrument, the better the choice it will make. Similarly, the better
established an instrument's correlates are in relation to behaviors that are central to the
purpose of an assessment, the more reason there is to include it. Thus an instrument that has
been demonstrated to be particularly helpful in identifying youthful depression may be a good
choice in one case, whereas an instrument known to be especially sensitive in revealing family
dynamics may be a good choice in another case.

Similarly, available empirical data and reported clinical experience should be drawn on to
determine whether a particular instrument is likely to yield useful information concerning the
personality functioning of individuals at certain ages. Thus the Children's Apperception Test
(CAT) depicting animal figures may be a more effective story-telling measure for a young child
than the TAT, but certainly not for an adolescent (Bellak, 1993, p. 237).

As these observations indicate, projective methods provide sound clinical data only if they
are employed in appropriate ways. First, examiners should have recourse to standardized
procedures for administering and scoring any test they use. Lack of such standard methodology
compromises the value of the data obtained, and inattention to standardized methods by
examiners who opt instead for personalized approaches to administration and scoring is
clinically disadvantageous and professionally questionable.

Second, clinical interpretations should be derived from test variables with demonstrated
reliability and validity. Inadequate psychometric foundations limit the use to which test data
can be put, and examiners who draw conclusions in the absence of supporting empirical
evidence, without framing such conclusions as speculative hypotheses, are doing their patients
and their methods a disservice.

Third, the adequacy of projective assessment of young people will be limited in the absence of
normative reference data for test responses of both adjusted and maladjusted children and
adolescents and for developmental changes in these responses over time.

This chapter continues with reviews of the inkblot, story telling, figure drawing, and
sentence completion methods used in assessing
young people. The composition, administration, and scoring of each of these measures are
described; what is known about their reliability, validity, and normative database is
reported; and the clinical purposes they are likely to serve are discussed. The specific
measures reviewed are selected primarily on the basis of their emphasis and frequency of use
in clinical and school settings, as reported in surveys by Archer, Imhof, Maruish, and
Piotrowski (1991), Elbert and Holden (1987), Hutton, Dubes, and Muir (1992), Kennedy, Faust,
Willis, and Piotrowski (1994), Piotrowski and Keller (1989), Stinnett, Havey, and
Oehler-Stinnett (1994), and Watkins, Campbell, Nieberding, and Hallmark (1995).

4.15.7 REVIEW OF PROJECTIVE ASSESSMENT METHODS

4.15.7.1 Rorschach Inkblot Method

The Rorschach inkblot method comprises 10 cards that are inked in shades of black and gray
(five cards); black, gray, and red (two cards); and various pastel colors (three cards). The
cards are reproduced in standard fashion, but the inkblot stimuli were originally designed at
random and do not portray any specific objects (Rorschach, 1942). When subjects respond to the
Rorschach, they draw on the shape, shading, and color of the blots to form impressions of what
they might be, and in so doing they treat the instrument as a cognitive–perceptual task (e.g.,
“It looks like a bat, because it's got a body here and wings here and it's black”). In
addition, subjects frequently elaborate their responses beyond the stimulus properties of the
blots, and in so doing they treat the instrument as an associational task (e.g., “This bird is
flying around looking for something to eat”).

The cognitive–perceptual aspects of responses constitute structural data in Rorschach
assessment and provide representative indications of the resources and coping style that a
person generally brings to bear in problem-solving situations. The associational aspects of
responses constitute thematic data in Rorschach assessment and provide symbolic clues to the
underlying needs, attitudes, conflicts, and concerns that are likely to influence a person's
actions and state of mind. The basic nature of the Rorschach in these respects is discussed
further by Exner and Weiner (1995, chap. 1) and Weiner (1986, 1994).

4.15.7.1.1 Administration and scoring

The Rorschach is introduced by telling subjects that the inkblots they are about to see are
not anything in particular, but that people see many different things in them and that their
task will be to indicate what the inkblots look like to them. The 10 cards are then given to
subjects one at a time with the instruction “What might this be?” Requests for structure
(e.g., “Can I turn the card?”) are deflected back to the subject (e.g., “It's up to you”;
“Any way you like”). The unguided responses to the 10 cards constitute the free association
phase of the administration, following which there is an inquiry phase in which the examiner
reads back each response and asks subjects where they saw it and what made it look as it did.
The purpose of the inquiry is to facilitate coding of the structural features of the
responses, and associations during this phase are not requested or encouraged. Responses are
recorded verbatim, however, and the content of any spontaneous thematic elaborations is
carefully noted.

Numerous approaches to codifying Rorschach responses have emerged during the long history of
this instrument. However, for many years the comprehensive system of Exner (1993) has been by
far the most widely used and researched (Piotrowski, 1996). Rorschach responses are coded in
the comprehensive system for various aspects of where percepts are seen (location), why they
look as they do (determinants), what they consist of (content), how commonly they occur (form
level and populars), and whether they involve pairs of objects, organization of parts, or
special kinds of elaborations, such as cooperative or aggressive interaction. These codes are
then tallied and combined in various ways to yield a large number of indices, ratios, and
percentages that guide the interpretive process, as elaborated in detail by Exner (1991,
chaps 5–10).

4.15.7.1.2 Psychometric foundations

The psychometric foundations of an assessment instrument comprise the extent to which it can
demonstrate adequate interscorer agreement, reliable measurement, valid correlates, and a
representative normative database. The Rorschach inkblot method, as already indicated in part
by examples used earlier in the chapter, rests on a solid psychometric basis. Interscorer
agreement for the types of variables coded in the comprehensive system typically ranges from
80% to 100%. The reliability of Rorschach data has been demonstrated in a series of retest
studies conducted over intervals ranging from seven days to three years and involving child,
adolescent, and adult subjects. Most of the core variables associated with trait dimensions of
personality show stability coefficients greater
than 0.80 in these studies, and some, including the affective ratio and the egocentricity
index, consistently hover around 0.90 (Exner, 1991, pp. 459–460; Exner & Weiner, 1995,
pp. 21–27; McDowell & Acklin, 1996; Weiner, 1997).

The validity of Rorschach assessment was confirmed in a series of metaanalytic studies that
led Parker, Hanson, and Hunsley (1988) to conclude that the Rorschach meets usual psychometric
standards for validity and is comparable to the MMPI in this respect. Specifically, Parker et
al. used the effect sizes reported in 411 studies to derive population estimates of convergent
validity of 0.41 for the Rorschach and 0.46 for the MMPI. Subsequent further confirmations of
the validity of this instrument are noted by Weiner (1996).

With respect to its normative database, available information for the comprehensive system
includes data on 700 nonpatient adults demographically representative of the 1980 US census,
1390 nonpatient children and adolescents age 5 to 16, and large samples of schizophrenic,
depressed, and character disordered patients (Exner, 1993, chap. 12). In addition,
longitudinal data reported by Exner, Thomas, and Mason (1985) on a group of young people
tested every two years from age 8 to 16 provide useful reference information concerning
developmental stability and change in Rorschach variables during childhood and adolescence.

As implied by the nature of the normative data, the Rorschach comprehensive system is
applicable to young people from age five. Preschool age children have ordinarily not yet
matured sufficiently to deal with the cognitive–perceptual aspects of the Rorschach situation
in ways that lend themselves to the codification that is central to the comprehensive system
interpretive process. An unusually mature four-year-old might on occasion produce a useful
record, and immature five- and six-year-olds may produce records that have limited
interpretive significance within the framework of the comprehensive system. Working within
other frameworks, Ames et al. (1974) discuss and provide some normative findings for Rorschach
responses of young children, and Leichtman (1996) has recently presented a developmental
rationale for deriving information from the records of preschoolers.

4.15.7.1.3 Clinical utility

In common with projective techniques in general, the Rorschach serves clinical purposes
primarily as a result of the information it provides about the structure and dynamics of
personality functioning. With respect to personality structure, the Rorschach has proved
especially helpful in identifying and quantifying states of subjectively felt distress that
combine elements of anxiety and depression and in reflecting trait dimensions of how people
typically think, process information, handle emotions, manage stress, feel about themselves,
and relate to others. Regarding personality dynamics, the thematic content of Rorschach
responses, as previously noted, is often quite revealing of underlying needs, attitudes,
conflicts, and concerns that influence how people are likely to think, feel, and act at
particular points in time and in particular situations.

In addition, Rorschach data can frequently contribute to differential diagnosis in clinical
settings. The comprehensive system provides indices for schizophrenia and depression (DEPI)
that can help to identify these conditions in children and adolescents as well as in adults;
for basic deficits in coping capacity that point to developmental arrest in young people; and
for numerous features of conduct and anxiety and/or withdrawal disorders (Exner & Weiner,
1995, chaps 5–8; Weiner, 1986). Rorschach findings have also demonstrated considerable
clinical utility in the treatment process by clarifying treatment targets, identifying
potential obstacles to progress in therapy, and providing a basis for evaluating treatment
change and outcome (Abraham et al., 1994; Weiner, 1994).

4.15.7.2 Thematic Apperception Test

The most widely known and used story telling technique is the TAT. It was developed by Morgan
and Murray (1935) in the belief that the content of imagined stories would provide clues to
the underlying dynamics of a subject's interpersonal relationships and self-attitudes. As
elaborated by Murray (1943, 1971) and Bellak (1993, chap. 4), TAT data are expected to reveal
the hierarchy of a person's needs and the nature of his or her dominant emotions and
conflicts.

The TAT stimuli comprise 19 black-and-white illustrations of people or scenes and one blank
card. The cards are intended for use with persons age five or older of both genders, and for
nine of the cards there are alternate versions for use with adult and child/adolescent males
and with adult and child/adolescent females. Because of the time required to administer the
full set of TAT cards, examiners typically select a subset of 8–12 cards that they anticipate
will elicit themes relevant to the assessment issues in a particular case. The themes usually
elicited by the individual cards and the selection of subsets suited for children and
adolescents are reviewed
Review of Projective Assessment Methods 441

by Bellak (1993, chap. 3), Dana (1985), and Obrzut and Boliek (1986). Regrettably with respect to standardization, however, there are no specific short forms of the instrument, and how many and which cards are typically chosen vary from one examiner to another and from one examination to the next.

4.15.7.3 Administration and scoring

The TAT cards are given to subjects one at a time with instructions to make up a story for each picture that includes (i) what is happening at the moment, (ii) what the characters are thinking and feeling, (iii) what led up to the situation, and (iv) what the outcome will be. The narrated stories are recorded verbatim by the examiner.

Murray (1943) originally proposed a scoring scheme in which each TAT story is rated for the presence and strength of a long list of needs that are being experienced by the central figure in the story and presses that are being exerted by the environment. This scoring system proved too elaborate and time consuming for clinical work, and numerous alternative approaches to clinical interpretation were subsequently developed for the instrument. As reviewed by Chandler (1990), Murstein (1963), and Vane (1981), some of these interpretive approaches, like Murray's, have consisted of formal quantitative ratings of story characteristics. However, most interpretive approaches have eschewed quantitative scoring in favor of qualitative analyses of story content, and no scoring system has gained widespread use either clinically or in research studies.

The most commonly employed methods of interpreting the TAT in clinical practice appear to be variations of an "inspection technique" proposed by Bellak (1993, chap. 4). This technique consists simply of reading through subjects' stories to identify repetitive themes and recurring elements that appear to fall together in meaningful ways. Because this approach lacks any quantification and rests on the capacity of individual examiners to relate story themes and elements to aspects of personality functioning, Dana (1985) was moved to observe that "TAT interpretation has become a clinical art form" (p. 90).

Bellak's influential approach stresses 10 aspects of a story, each of which is taken to have implications for how subjects view and are likely to deal with interpersonal events and what they anticipate the future to hold for them. These include the main theme of the story, the identity of the central figure or hero, the main needs of the hero, the way in which the environment is conceived, the identity and intentions of other figures in the story, the nature of any anxiety or other affect that is being experienced, the nature of any conflict that is described or suggested, the ways in which conflicts and fears are defended against, the ways in which misbehavior is punished, and the level of ego integration.

With respect to research studies, the most productive utilization of the TAT has derived from quantitative scoring systems developed by McClelland, Atkinson, and their colleagues to measure needs for achievement, affiliation, and power (Atkinson & Feather, 1966; McClelland, Atkinson, Clark, & Lowell, 1953). Although scoring for achievement, affiliation, and power motivation has had little clinical impact, other schemes for coding specific personality characteristics reflected in TAT thematic content have subsequently emerged. These include scales for level of ego development (Sutton & Swenson, 1983), preferred defense mechanisms (Cramer, 1987), quality of interpersonal affect (Thomas & Dudek, 1985), problem-solving style (Ronan et al., 1993), and object relations capacities (Westen, Lohr, Silk, Kerber, & Goodrich, 1985).

Particularly promising among these is the use of the TAT to assess aspects of object relatedness through the Westen et al. (1985) measure, known as the social cognition and object relations scale (SCORS). The SCORS provides quantitative indices of the affective tone subjects ascribe to relationships, their capacity for emotional investment in relationships and social standards, their understanding of social causality, and the complexity of their representations of people. By including ratings of subjects along dimensions of maturity as well as normality/pathology, the SCORS is proving especially relevant to the assessment of young people (Westen et al., 1991).

4.15.7.3.1 Psychometric foundations

Efforts to demonstrate the reliability and validity of global approaches to interpreting the TAT have been handicapped by the previously noted proliferation of scoring systems, by clinicians' preferences for a strictly qualitative and uncoded approach to the data, and by enormous variation in how the test is administered, including which subset of cards is selected for use. Because of this long-standing lack of standardization, there has been little opportunity for systematic accumulation of data bearing on the reliability and validity of the TAT in general, nor has it been possible to develop any substantial normative database.
442 Projective Assessment of Children and Adolescents

Accordingly, for both inspection techniques and overall scoring systems developed in the tradition of Murray, the psychometric literature on the TAT is generally acknowledged to comprise a mix of positive and negative findings that cannot easily be compared with one another. Hence, despite the widespread use of inspection techniques in clinical practice, neither these nor other global approaches have been demonstrated to show adequate psychometric properties.

However, research studies with TAT scales developed to measure specific personality characteristics have demonstrated that the instrument can generate reliable and valid findings when it is used in a standardized manner. The previously mentioned scales of Cramer (1987) and Westen et al. (1985) are cases in point. Cramer's scale reliably identifies three major mechanisms of defense (denial, projection, and identification) and has shown valid corollaries in changes observed in patients undergoing psychotherapy (Cramer & Blatt, 1990). The Westen et al. SCORS has been found to provide reliable identification of developmental variables related to disturbed object relations in children and, as already mentioned, is therefore especially relevant to the assessment of young people. Validation studies with SCORS have involved psychiatrically disturbed, borderline, physically abused, and sexually abused young people (Freedenfeld, Ornduff, & Kelsey, 1995; Ornduff, Freedenfeld, Kelsey, & Critelli, 1994; Westen, Ludolph, Block, Wixom, & Wiss, 1990; Westen, Ludolph, Lerner, Ruffins, & Wiss, 1990).

4.15.7.3.2 Clinical utility

The clinical utility of the TAT lies mainly in its potential for elucidating dynamic aspects of personality functioning, particularly with respect to the feelings and attitudes that subjects hold towards other people, themselves, and possible turns of fortune in their lives for better or worse. Based on the assumption that children and adolescents identify with the central figures in their TAT stories and project fantasies and realities regarding their own lives into the events and circumstances they describe, the obtained data can shed light on a broad range of underlying influences on how young people are likely to think, feel, and act.

As previously noted in commenting on the research of McClelland et al. (1989), the implicit types of motives measured by the TAT are more likely to correlate with persistent dispositions to behave in certain ways rather than with immediate actions or symptom formation. Accordingly, TAT findings will usually not add very much to structural diagnosis of adjustment problems in young people, but they can be extremely helpful in suggesting possible dynamic origins of adjustment problems. In this regard, the psychometrically sound SCORS may be a useful scale to include in forensic assessment batteries when issues of custody or adoption are being addressed. This TAT scale can frequently assist examiners in grasping a young person's representations of people and his or her capacities for emotional investment in relationships.

The thematic content of TAT stories has additional potential to facilitate planning and conducting psychotherapy with young people, particularly with respect to identifying treatment targets and monitoring progress in therapy. The TAT can also be used in treatment as a play therapy tool, as in Gardner's (1971) story telling technique. For example, after a youngster has told TAT stories, the therapist and child can act out the stories in play, or the therapist can create stories to the same picture stimuli for comparisons with the child's stories. Hoffman and Kupperman (1990) describe such an intervention with a 13-year-old boy in which both therapists wrote stories to the same TAT cards to which the patient had responded. As it turned out, Hoffman's stories emphasized the main character's maladaptive coping mechanisms, whereas Kupperman's stories emphasized positive and healthy aspects of the central character's coping capacities. Over a number of sessions, this boy and his therapist engaged in discussions concerning whose version of the story was most accurate.

4.15.7.4 Children's Apperception Test

Consistent with the purpose of the TAT, Bellak (1993, chap. 13) developed the CAT to facilitate understanding of personality processes in children, including their "dynamic way of reacting to and handling the problems of growth" (Bellak & Siegel, 1989, p. 102). The CAT pictures were designed to elicit fantasies about aggression, sibling rivalry, fears of being alone at night, attitudes toward parental figures, and eating problems.

The CAT-Animal (CAT-A) form, originally published in 1949 and designed for children 3–10 years old, consists of 10 pictures depicting animals in human situations. The use of animal figures was based on the assumption that young children identify more readily with animals than with people and will accordingly tell more meaningful stories about animal than human figures. Moreover, according to Bellak (1993,

chap. 13), the use of animal figures makes the CAT-A a culture-free test that is equally applicable to Caucasian, African-American, and other minority group youngsters as well as to children from different countries, except where there is little familiarity with some of the inanimate objects depicted, such as bicycles.

There is also a human form of the CAT (the CAT-H) that was developed by Bellak and Hurvich (1966) in response to criticism of the assumption that children identify more easily with animal than with human figures. Studies reviewed by Bellak and Hurvich indicate little difference in stimulus value between the original CAT-A and the CAT-H, in which human figures are substituted for the animals in the CAT-A scenes. However, the CAT-H does not appear ever to have become much used in clinical practice.

4.15.7.4.1 Administration and scoring

Children being administered the CAT are told that they are going to take part in a game in which they will tell stories about pictures. Subjects who appear to regard the CAT as a test are informed that it is not the type of test in which they will be graded for correct or incorrect answers. Standard procedures call for all 10 CAT pictures to be administered in numerical order, from Card 1 to Card 10. Children are told to narrate what the animals are doing in the pictured scenes and are asked at appropriate points to say what went on previously and what will happen next. The examiner encourages and prompts the subject as necessary but avoids being suggestive or asking leading questions. Examiners may also query each story by asking the child to elaborate specific points such as the ages of characters and why they were given particular names.

In clinical work the CAT is typically interpreted along the lines proposed by Bellak for the TAT, that is, with an inspection technique used to form qualitative impressions of various dimensions of the subject's personality functioning (Bellak, 1993, chap. 14). There is an alternative but rarely used quantitative approach developed by Haworth (1965), called the schedule of adaptive mechanisms, in which CAT responses are rated numerically for the degree of adaptability or disturbance they reflect. Haworth also stressed the importance of recognizing that young children are highly reactive to the immediate circumstances in their lives and less likely than adolescents or adults to have formed well-established personality traits. She accordingly emphasized careful interpretation of CAT responses in the context of adequate information concerning subjects' home situation and the nature of any recent or impending crises in their lives.

4.15.7.4.2 Psychometric foundations

There has regrettably been little accumulation of empirical data bearing on the reliability and validity of the CAT. The widespread use of Bellak's qualitative inspection technique in CAT interpretation and a corresponding lack of quantification have precluded examination of the instrument's psychometric foundations. Although it has sometimes been suggested that the idiographic nature of CAT as well as TAT data makes traditional psychometric criteria difficult to apply or even irrelevant, there is nothing in the nature of the data generated by story telling techniques that prevents their being reliably coded for various types of feelings, motives, attitudes, and capacities that can in turn be validated against meaningful correlates. The previously noted development of psychometrically sound TAT scales for such specific aspects of personality as achievement motivation and social cognition proves the point that clinical interpretation of stories can go beyond being an art form and attain respectability as a scientific procedure as well.

Regarding what research is available concerning the CAT, Bellak (1993, chap. 16) provides a review of studies comparing the responses typical of children at different ages and examining special features in the stories of maladjusted, schizophrenic, speech disordered, retarded, brain-damaged, and chronically ill children. Almost all of these studies date from the 1950s, however, and none provides an adequate basis for developing any formal normative standards or diagnostic guidelines for the instrument.

4.15.7.4.3 Clinical utility

As in the case of the TAT, the CAT is useful in clinical assessment primarily as a source of hypotheses concerning subjects' personality dynamics, particularly with respect to how they view themselves and other important people in their lives, the nature of their hopes and fears, and what they expect will happen to them. Although not suitable for adolescents, the CAT is more useful than the TAT in work with younger children, who are likely to relate more easily to the familiar situations and youthful figures depicted in the CAT illustrations than to the primarily adult figures and unpopulated scenes shown in the TAT.

Like the TAT, the CAT may also contribute to treatment planning, by suggesting areas of concern on which to focus in the therapy, and it

may itself serve as a play technique. In diagnostic assessments, however, hypotheses generated by this instrument require support from other data prior to being addressed to questions that necessitate empirical decision making.

4.15.7.5 Roberts Apperception Test for Children

The Roberts apperception test for children (RATC) is intended for use with young people of ages 6–15 and was designed to improve on the TAT and CAT by presenting familiar stimuli and employing a standardized scoring system (McArthur & Roberts, 1990).

Instead of illustrations primarily of adults, such as those used in the TAT, or illustrations of animals, as used in the CAT, the RATC primarily portrays child and adolescent figures engaged in everyday interactions, including scenes of parental affection, disagreement, school and peer relationships, and observation of nudity. There are 27 RATC cards, 11 of which are alternate versions for male or female subjects. There is in addition an alternate set of cards portraying African-American individuals in similar scenes.

4.15.7.5.1 Administration and scoring

The standard 16 cards that compose the RATC are administered individually in numerical order, using male or female versions as appropriate. Subjects are instructed to make up a story about each picture and to tell what is happening in the picture, what led up to the scene, how the story ends, and what the people are talking about and feeling. Responses are recorded verbatim.

If subjects tell an incomplete story or omit certain aspects, such as how the characters are feeling, additional inquiry may be used to help teach them to give scorable responses. This inquiry may be used liberally with the first two cards but not thereafter, which means that responses on cards 3 through 16 may at times have limited scorable data. The possibility of limited scorable data is the price to be paid for maintaining careful standardization of the RATC procedure. McArthur and Roberts sanction deviations from the standard procedures only in specific instances, such as the examination of severely disturbed children. When standard procedures are not or cannot be followed, they recommend considerable caution in using the scoring procedures.

The scoring system for the RATC consists mainly of coding the content of each story for the presence or absence of 13 profile scales that have implications for personality characteristics. Eight of these are adaptive scales that relate to thematic indications of reliance on others, giving support to others, supporting oneself, limit-setting by authority figures, identification of problem situations, and resolving problems in unrealistic, constructive, or particularly insightful ways. The other five are clinical scales that pertain to thematic manifestations of anxiety, aggression, depression, experiences of rejection, and inability to resolve problems. The test manual provides guidelines and examples that promote reliable scoring of each of these scales.

To facilitate interpretation of the data, the raw scores for each of the 13 profile dimensions are summed over the 16 cards and then plotted on a normatively scaled profile form to yield a visual representation of the data, much in the manner of an MMPI-A profile. The interpretive yield of the data is further enriched by use of an interpersonal matrix, which consists of a tabular representation of the frequency of convergence between particular scales and the figures identified (e.g., co-occurrence of themes of reliance on others with description of interaction with a maternal figure).

4.15.7.5.2 Psychometric foundations

As with the careful coding of specific variables on the TAT, the development of standardized administration and scoring procedures for the RATC has demonstrated that story telling projective methods can achieve psychometric respectability. McArthur and Roberts (1990) report inter-rater agreement ranging from 0.80 to 0.93 in the scoring of the various dimensions in their system and split-half reliabilities ranging from 0.44 to 0.86 on their profile scales, with half of these scales showing reliability coefficients of 0.73 or higher. The adaptive scales appear to work better in this regard than the clinical scales: only one of the five clinical scales (inability to resolve problems) shows a split-half reliability greater than 0.55, whereas all but one of the eight adaptive scales (reliance on others) show a split-half reliability above 0.60.

Validity studies conducted with 200 nonpatient and 200 outpatient youngsters aged 6–15 indicate that subjects generally tell stories that fit the expectations for the cards; for example, that a card intended to depict parental affection typically elicits stories involving affection. In addition to thus showing content validity, the RATC profiles have been found to distinguish between well-adjusted and clinic youngsters, to be more resistant to efforts at manipulation than self-report measures, and to correlate well

with a behavior problem checklist completed by subjects' parents (McArthur & Roberts, 1990; Worchel, Rae, Olson, & Crowley, 1992). Significantly, however, in the comparisons between well-adjusted and clinic youngsters, significant differences were found on all eight of the adaptive scales and on two of the five clinical scales, but not on the clinical scales for anxiety, aggression, and depression.

With respect to its normative database, the RATC manual provides the mean and standard deviation for each of the profile scales for 200 nonpatient youngsters divided by gender and by four age categories (6–7, 8–9, 10–12, and 13–15).

4.15.7.5.3 Clinical utility

Similar to the TAT, CAT, and other story-telling measures, the RATC contributes to personality assessment of young people primarily by casting light on their underlying attitudes and concerns. Although the measure includes specifically designated clinical scales that might be expected to assist in clinical diagnosis as well as dynamic analysis of a youngster's personality functioning, its clinical scales as noted appear less reliable than its adaptive scales and less capable of discriminating between patient and nonpatient populations.

The RATC is especially valuable to clinicians as well as researchers by virtue of its careful standardization and adequate psychometric properties. Aside from facilitating the interpretive process and providing the basis for systematic accumulation of data, these features of the instrument make it particularly attractive to examiners conducting forensic evaluations. Having used the RATC in their projective assessment of a child or adolescent, rather than the TAT or CAT, psychologists will be better prepared to justify their procedures and conclusions when giving testimony as an expert witness.

However, examiners may sometimes question whether the information they get from the RATC warrants the amount of time required to administer it, which in our experience approximates an hour on the average. For some young children, moreover, there may be difficulties in sustaining their investment in the task. There is no short form of the RATC to use and no shortcut, such as selecting just a few cards to use or discontinuing a protracted administration before giving all 16 cards. Doing so eliminates the standardization of the instrument and prevents any meaningful scoring or comparisons with normative data, which means that the test loses its advantage over a TAT or CAT interpreted by inspection. Our view in this matter is that the RATC warrants the time required to administer and score it in the standardized manner and, more generally, that psychologists should if at all possible take whatever time is necessary to use standardized instruments when they are available.

The alternate form of the RATC involving African-American figures may prove useful in assessing African-American young people. However, there are as yet no data establishing the reliability and validity of this alternate form, nor have any representative normative data been published for it. Hence the RATC should be used cautiously in multicultural assessments, for which at present the most promising instrument is Tell-me-a-story.

4.15.7.6 Tell-me-a-story

The Tell-me-a-story (TEMAS) was designed as a multicultural story-telling test based on the concept that personality development occurs within a sociocultural system in which individuals internalize the cultural values of their family and society (Costantino, Malgady, & Rogler, 1988). The TEMAS is intended for use with African-American, Hispanic, and Caucasian children and adolescents aged 5–18 and comes in two parallel sets, one for minority and the other for nonminority youngsters. The two sets feature either predominantly Hispanic and African-American characters or predominantly nonminority characters all shown in urban environments. Each set comprises 23 cards, 11 of which have alternate versions for males and females and one of which has alternate versions for children and adolescents.

As a distinctive feature of the TEMAS, many of the pictures portray a split scene showing contrasting or conflicting intrapersonal and interpersonal situations that require some resolution, much in the manner of Kohlberg's (1976) moral dilemma stories. For example, one side of a scene may depict apparent delay of gratification, and the other side an inability to delay gratification. How subjects resolve this conflict in their stories speaks to the adaptiveness of their personality functioning and their stage of moral development.

The TEMAS provides quantitative scales for measuring the adequacy of subjects' adaptation with respect to nine aspects of personality functioning: interpersonal relations, aggression, anxiety/depression, achievement motivation, delay of gratification, self-concept, sexual identity, moral judgment, and reality testing. There are also quantitative scales for four cognitive functions related to how individuals process information (reaction time, total time, fluency as reflected in the number of words used

in a story, and omissions of relevant visual details) and for four affective functions as indicated by mood states attributed to the main characters in a story (happy, sad, angry, and fearful). These quantitative scales are supplemented by several qualitative indicators used to describe various other characteristics of the stories.

As stressed by Costantino, Malgady, and Rogler (1988), the TEMAS was developed to overcome limitations of traditional thematic apperception tests and differs from the TAT in several significant ways. These include a focus on interpersonal relationships rather than on intrapsychic dynamics; the use of personally relevant and culturally sensitive stimuli that emphasize meaning rather than ambiguity; the representation of both positive and negative poles of emotions, cognitions, and interpersonal functions, as opposed to the heavy weighting of the TAT stimuli with representations of depression, gloom, anger, and hostility; and the introduction of the joint depiction of contrasting circumstances to elicit expressions of conflict resolution and moral judgment.

4.15.7.6.1 Administration and scoring

Subjects are administered either the minority or nonminority form of the TEMAS, and examiners can choose between a long form, which comprises all 23 cards and requires approximately two hours to administer, and a standard short form, which consists of nine cards and requires one hour to give. In keeping with generally recommended practice in multicultural assessment (Dana, 1993, chap. 6, 1996), young people should be tested in their primary language, and those who are bilingual should be tested by a similarly bilingual examiner. Subjects are instructed to tell a complete story about each picture that indicates what is happening now, what happened before, and what will happen in the future. These instructions may be repeated as often as necessary and supplemented with structured inquiries to elicit information concerning who and where the characters are and what they are thinking and feeling.

The TEMAS is scored by rating stories for the presence of the various cognitive and affective functions and on a four-point scale for the level of adjustment reflected in thematic indications of the personality functions. The resulting scores are then totaled and translated into normalized T scores that are graphed to provide a readily interpretable visual profile. The TEMAS manual provides detailed guidelines and case examples to illustrate these scoring procedures.

4.15.7.6.2 Psychometric foundations

As described in the test manual (Costantino, Malgady, & Rogler, 1988), the TEMAS was carefully developed and standardized over several years prior to its publication. Unfortunately, except for continued work by the test's authors, much of it in the form of paper presentations, there has been little published research concerning the psychometric adequacy of the instrument. On balance, however, the data provided in the test manual appear to indicate a promising beginning in demonstrating its soundness.

With respect to inter-rater agreement, studies with Hispanic, African-American, and Caucasian subjects have indicated that both the minority and nonminority versions of TEMAS can be scored reliably, with agreements between trained examiners generally ranging from 75% to 95% across the various scales. Regarding issues of reliability, on the long form of the TEMAS 11 of the 17 personality, cognitive, and affective scales showed internal consistency (alpha) correlations of 0.74 or higher in the standardization data, and the median for all 17 was 0.83. Internal consistency was lower on the short form, with a median of 0.68. The internal consistency data for the short form must be considered preliminary, however, because in this analysis the short form scores were extracted from the protocols of subjects who had in fact completed the long form, rather than from actual administration of the short form. In an assessment of its retest reliability, the short form was administered twice to 51 behavior problem children over an 18 week interval. Very little stability over time was demonstrated in this study. Costantino, Malgady, and Rogler (1988) suggest several plausible explanations for this disappointing result, such as the narrow range of scores among the subjects. Nevertheless, test–retest reliability remains to be demonstrated for both the short and long forms of the instrument.

Turning to the validity of the TEMAS, ratings by psychologists of the types of personality functions pulled by the stimulus pictures have indicated good agreement with the intent in designing them, thus attesting that the test measures what it purports to measure. In terms of its criterion validity, the TEMAS has been found in several studies to discriminate between patient and nonpatient youngsters in Hispanic and African-American as well as Caucasian samples, and to accomplish this distinction in inner city as well as middle-class settings (Costantino, Malgady, Bailey, & Colon-Malgady, 1989; Costantino, Malgady, Colon-Malgady, & Bailey, 1992; Costantino,
Malgady, Rogler, & Tsui, 1988). The TEMAS has not demonstrated any capacity to differentiate specific kinds of disorders, but there is some evidence to suggest that certain story characteristics, including the omission of main characters or events and failure to notice a conflict in the picture (due to lack of attention), may be sensitive to the presence of attention deficit hyperactivity disorder (Costantino, Colon-Malgady, Malgady, & Perez, 1991).

Initial studies reported in the TEMAS manual indicate further that many of its scales correlate significantly with behavior ratings by subjects' mothers and teachers and with behavioral observations of their inclinations toward aggressive or disruptive behavior and their capacities for self-confidence and delay of gratification. Preliminary data indicate further that TEMAS profiles of young people prior to their entering therapy can significantly predict aspects of their treatment outcome.

The TEMAS standardization sample comprised 281 male and 361 female youngsters from the New York City area who ranged in age from 5 to 13 years. This sample of 642 children and adolescents included groups of Caucasian, African-American, Puerto Rican, and other Hispanic subjects from predominantly lower- and middle-income families. The test manual lists the mean and standard deviation for each scale by gender and ethnicity for the age groups 5–7, 8–10, and 11–13.

4.15.7.6.3 Clinical utility

The TEMAS brings to clinical assessments the distinct advantages of a well conceptualized and quantitatively standardized story-telling technique that is also culturally sensitive and normed for minority groups of young people. As in the case of the RATC, its coded scales and profile graphs facilitate interpretation and provide the type of documentation that typically proves valuable in forensic cases. Moreover, this measure stands alone as a story-telling test proved applicable in the assessment of African-American and Hispanic children and adolescents. The TEMAS accordingly merits serious consideration for inclusion in a test battery for evaluating personality functioning in young people, especially if they come from an urban minority background and if there are forensic issues in the case.

At this stage in its development, however, the TEMAS has some drawbacks that examiners should keep in mind. First, although it appears to be useful for assessing level of adjustment and incorporation of societal norms among both minority and nonminority youth, TEMAS does not provide the breadth of information concerning specific types of concerns and relationships that emerge from TAT, CAT, and RATC analyses. Second, the normative data available thus far are limited to 5–13-year-old city dwellers and thus do not provide a psychometric basis for drawing conclusions about adolescents aged 14–18, or suburban and rural dwelling youngsters. Third, the length of time (two hours) required to administer the 23 card long-form of the test is often impractical in clinical evaluations. The nine card short-form is an attractive alternative but, as noted, adequate reliability has not yet been demonstrated for the short-form.

4.15.7.7 Draw-a-person

The use of human figure drawings as a projective method of personality assessment is based on the expectation that how subjects draw people will reveal aspects of how they perceive themselves and feel about others. Clinical use of drawing techniques originated in work with children, and the first formal drawing test, called the draw-a-man, was developed by Goodenough (1926) and later refined by Harris (1963) and Naglieri (1988) for use as a nonverbal measure of intellectual development.

Machover (1948) introduced the notion of using human figure drawings as a projective device that generates nonverbal, symbolic messages concerning subjects' impulses, anxieties, and conflicts, and she proposed numerous possible meanings for structural features of drawings (e.g., where figures are placed on the page) and the manner in which various parts of the body are drawn (e.g., a disproportionately large head).

Koppitz (1968, 1984) subsequently focused attention specifically on draw-a-person (DAP) assessment of young people and drew on Machover's interpretive hypotheses to formulate 30 specific indicators of emotional disturbance involving the quality of drawings (e.g., asymmetry, transparencies), special features (e.g., teeth showing, arms clinging to body), and omission of body parts (e.g., no eyes, no arms). Naglieri and his colleagues (Naglieri, McNeish, & Bardos, 1991; Naglieri & Pfeiffer, 1992) have developed further coding refinements to produce the draw-a-person: screening procedure for emotional disturbance (DAP:SPED). The DAP:SPED is an actuarially derived and normatively based system comprising 55 objectively scorable items, such as the measured dimensions and placement of figures. It is intended as a screening test for classifying young people aged 6–17 with respect to their
likelihood of having adjustment difficulties that call for further evaluation.

4.15.7.7.1 Administration and scoring

The DAP is administered by giving subjects a plain 8.5 × 11 inch piece of paper and asking them to draw a person. When they have finished the drawing, they are given another piece of paper and asked to draw a person of the opposite sex from the one they have just drawn. Subjects are further instructed to draw the figure of a whole person rather than a cartoon or stick figure. In keeping with further suggestions by Machover (1951), subjects are also typically asked to draw a picture of themselves and to provide some thematic content concerning the figures they have drawn.

As elaborated by Handler (1985), there are three alternative ways of eliciting such thematic content: by asking subjects to associate to their drawings, by asking them to make up a story about the people they have drawn, or by asking them specific questions about their drawings. Machover (1951) provided 31 specific questions to be used for this purpose with children, such as “What is their ambition?,” “How happy are they?,” “What do they worry about?,” and “What are their good points?”

As with story-telling techniques, figure drawing methods are most commonly interpreted in clinical practice by an inspection in which personality characteristics are inferred primarily from subjective impressions of noteworthy or unusual features of the figures drawn. Items included in the Koppitz and DAP:SPED scales may often enter into these impressionistic assessments. However, these scales serve only to identify emotional disturbance without contributing in other ways to personality description, such as by indicating how individuals process information, handle emotion, and manage stress. They have consequently not been widely adopted clinically, and there are no other DAP scoring systems that have attracted much attention in the research literature.

4.15.7.7.2 Psychometric foundations

There is very little psychometric foundation for traditional applications of the DAP, and clinical use of this instrument in assessing young people frequently goes well beyond any justification in empirical data. The influential inspection approach used by Machover is neither standardized nor codified, which precludes any systematic evaluation of its reliability or the accumulation of a normative database. Additionally, although it may be reasonable to expect that unusual emphasis on or omission of some body part will reflect some particular concern about the nature or functions of that body part, most of Machover's specific hypotheses concerning the symbolic significance of figure drawing characteristics lack consistent research support (Kahill, 1984; Roback, 1968; Swensen, 1957, 1968).

The Koppitz system is adequately codified and intended to comprise items that have demonstrably low occurrence in the normal population. However, there is no normative database for the system, its reliability is yet to be demonstrated, and there is some question as to whether it can differentiate between well-adjusted and emotionally disturbed children. In a carefully done study in which Tharinger and Stark (1990) compared groups of mood disordered, anxiety disordered, mood/anxiety disordered, and well-adjusted youngsters aged 9.5–14.75 years, the Koppitz signs showed good interscorer agreement but did not differentiate among the subject groups either in mean total score or in the frequency of any of the 30 individual items.

Such findings warrant concern that the DAP may be a faulty projective technique with questionable propriety for continued clinical use. However, it could be that the DAP is a potentially sound method for which there has not yet been sufficiently sophisticated development and evaluation to document its capacities. The previously mentioned work of Naglieri et al. (1991) on the DAP:SPED appears to speak to this point. The DAP:SPED was standardized on a representative national sample of 2355 children and adolescents aged 6–17 years; its objective scoring procedures have generated inter-rater agreements above 90%; its internal consistency (alpha) reliability estimates are 0.76 among 6–8-year-olds, 0.77 among 9–12-year-olds, and 0.71 among 13–17-year-olds; and its normalized total score has shown substantial capacity to differentiate nonpatient youngsters from those with identified behavioral or emotional problems (McNeish & Naglieri, 1991; Naglieri & Pfeiffer, 1992).

Also noteworthy is the work of Tharinger and Stark (1990), who paired their failure to validate the Koppitz system with an investigation of a proposed new system of their own, the DAP integrative system. The integrative DAP is based on a qualitative holistic scoring approach in which drawings are given an overall adjustment rating on a scale from 1 (absence of psychopathology) to 5 (severe psychopathology). Examiners are instructed to base their impressions on their integrated sense of four characteristics of a drawing, with the pathological end of the scale involving (i) inhumanness
of the drawing suggesting feelings of being incomplete, grotesque, or monstrous; (ii) lack of agency as conveyed by a sense of powerlessness; (iii) lack of well-being as reflected in negative facial expressions; and (iv) a hollow, vacant, or stilted sense indicating lack of capacity to interact.

In the same study in which Tharinger and Stark failed to validate the Koppitz system, their integrative system total score significantly discriminated the mood disordered and anxiety/mood disordered subjects from the well-adjusted youngsters and also correlated significantly with the Coopersmith self-esteem inventory, thus attesting the capacity of the DAP to depict a youngster's sense of self.

Contemporary literature abounds with sharply divided opinion concerning whether the DAP is a worthless test that should no longer be used (e.g., Gresham, 1993; Motta, Little, & Tobin, 1993) or is instead a potentially valuable clinical tool that has too often been carelessly used or inadequately researched (e.g., Bardos, 1993; Holtzman, 1993). Results to date with the DAP:SPED and the DAP integrative system give some reason to believe that improved methodology may yet establish sound psychometric foundations for carefully specified applications of the DAP.

4.15.7.7.3 Clinical utility

Despite its widespread use in describing subjects' personality characteristics, there is virtually no empirical evidence that traditional DAP interpretation has any clinical utility. Smith and Dumont (1995) asked a group of experienced psychologists and graduate students who had been trained in the use of the DAP to review and comment on a case file. They found that these clinicians routinely utilized specific symbolic representations in the drawings to draw inferences about the client's personality characteristics and diagnostic status, even though research does not support any such isomorphic correspondence of specific drawing characteristics to specific features of personality. Aside from the general ethical issues of using test instruments in unwarranted ways, examiners who are preparing forensic testimony jeopardize their credibility by employing the DAP in this manner.

This does not mean that human figure drawings are without utility in the clinical assessment of children and adolescents. First, by contrast with the practical difficulties of administering the full RATC and TEMAS, the DAP is an easily administered measure that requires no test stimuli and usually takes less than 10 minutes.

Second, as a nonverbal measure the DAP can prove especially useful in the evaluation of frightened, reticent, or otherwise uncommunicative young people, and it is unaffected by language difficulties or bilingualism. As noted by Cummings (1986), moreover, many young children may be more capable of expressing their thoughts and feelings in drawings than in words, and drawing pictures is a more familiar activity to most youngsters than most of the tasks that are set for them in a psychological examination.

Third, unusual characteristics of drawings can suggest avenues for further exploration, even in the absence of definitive conclusions about what these features signify. Examiners need to be circumspect in pursuing such avenues, however, lest they prematurely conclude that they are exploring in correct directions. The pursuit of speculative hypotheses is just as likely to lead into blind alleys as down fruitful paths.

Finally, there is ample indication in the contemporary literature that such approaches as the DAP:SPED and the DAP integrative system can help to identify the presence and severity of emotional disturbance in general and can accordingly contribute to treatment recommendations, planning, and monitoring. In the future, codifiable and interpretable thematic content in subjects' stories about their drawings, comparable to the data derived from story-telling techniques, may further enhance the utility of the DAP.

4.15.7.8 House-tree-person

The House-tree-person (HTP) test was devised by Buck (1948, 1985) as a means of tapping the concerns, interpersonal attitudes, and self-perceptions of young people more fully than is possible with human figure drawings alone. The HTP is intended for use with anyone over the age of three and was regarded by Buck as a nonthreatening instrument that can serve well to minimize a child's anxiety in testing situations and to assess personality functioning in multicultural and bilingual settings. As postulated by Buck and subsequently elaborated by Hammer (1958, 1985), subjects' drawings of these three objects are considered to provide symbolic representations of important aspects of their world.

Specifically, the house, as a dwelling place, is expected to arouse feelings toward the subject's home life and family relationships and, particularly for children, attitudes toward their parents and siblings. The tree is seen as encouraging projection of personal feelings
about the self that would be more anxiety-provoking to express in drawing a person, because the latter is more obvious in its representation as a self-portrait. More specifically, the way the trunk of the tree is drawn is considered to portray a subject's feeling of basic power and inner strength; the branches are seen as depicting the subject's ability to derive satisfaction from the environment; and the overall organization of the drawing is taken as a reflection of the individual's feeling of intrapersonal balance. Finally, the drawing of the person is expected to reveal aspects of how subjects view themselves, how they would like to be, and what they think about significant other people in their lives.

4.15.7.8.1 Administration and scoring

Although Buck recommends a four-page booklet with pages measuring 7 × 8.5 inches, most HTP examiners use four sheets of standard 8.5 × 11 inch paper on which subjects are asked first to draw “as good a picture of a house as you can,” then a tree, and then a person of each sex. Subjects are told they can take as long as they wish and draw any kind of house, tree, or person they like. Completion of the drawings is followed by an interrogation phase in which numerous questions devised by Buck are used to encourage subjects to define, describe, and associate to their drawings (e.g., “About how old is that tree?,” “What does that house make you think of?,” “Is that person happy?”). Buck also recommended a chromatic phase of the HTP in which subjects would do a second rendering of their drawings in crayon rather than pencil, followed by another interrogation. There are no data to indicate the frequency with which examiners conduct Buck's full HTP administration or instead limit the test to the pencil drawings, without either an inquiry or a chromatic phase.

Following the example of Goodenough, Buck originally proposed an elaborate quantitative system for objective coding of structural features of the HTP drawings, such as their size and proportions. As best as can be determined from the literature, however, quantitative coding of the HTP has rarely been employed in clinical practice. Instead, clinicians using this instrument typically rely on a qualitative inspection technique to identify symbolic implications of drawing characteristics for aspects of personality functioning. As in the case of interpreting the DAP in the tradition of Machover, many of the interpretive hypotheses for the HTP suggested by Buck and by Hammer are quite specific. A door that is tiny in relation to the size of the house is interpreted as indicating reluctance to make contact with the environment and an inhibited capacity for social relations, for example, and overemphasis on the roots of the tree where they make contact with the ground is taken as evidence of subjects' concerns about losing their grip on reality.

4.15.7.8.2 Psychometric foundations

The entire psychological literature contains only a handful of articles bearing on the psychometric foundations of the HTP. Most of these are over 25 years old, and none of them provides convincing supportive evidence for the interpretive uses of the instrument recommended by its leading proponents. Buck's (1985) 350-page revised manual contains extensive guidelines and case illustrations to facilitate interpretation, but neither reliability nor validity appears in the index. As intriguing as the rationale for the instrument may be to clinicians, especially those who are psychodynamically oriented, there is at present no empirical basis to warrant inferring personality characteristics from it.

4.15.7.8.3 Clinical utility

Similar to the DAP, the HTP offers the potential advantages in clinical practice of a brief, easily administered, nonverbal, and largely culture-free assessment instrument, along with perhaps being even less anxiety-provoking than the DAP. Emerging refinements with the DAP suggest that the HTP as well might prove useful in identifying maladjustment in general, monitoring progress and change in psychotherapy, and even in pointing to possible specific areas of conflict and concern. Moreover, like the DAP inquiry, the HTP interrogation can produce story-telling content that in turn can be codified and suggest topics for further exploration. As matters presently stand, however, any such utility remains an unfulfilled potential, and examiners should be circumspect about including the HTP in their test batteries and basing any firm conclusions on it.

4.15.7.9 Kinetic Family Drawing

Machover (1948) and numerous other clinicians who pioneered in using the DAP to assess young people suggested that useful information might also be obtained by asking individuals to draw members of their family. This suggestion was formalized by Burns and Kaufman (1970, 1972) as the kinetic family drawing (KFD) technique, in which subjects are instructed to draw a picture of everyone in their family, including themselves, doing something. These
drawings are then examined for such objective features as omissions of body parts or of members of the family and interpreted according to the actions, styles, and symbols represented in them.

Actions in the Burns and Kaufman approach refer to the ways in which the figures drawn are behaving toward each other, which are thought to provide clues to the intensity and emotional tone of their relationships. Styles concern barriers between family members that prevent them from interacting at all, which may be expressed by drawing some of them at a far distance from the others or encasing them in a circle or a box. Symbols comprise a list of specific items, such as beds, flowers, stoves, cats, and the like, the inclusion of which is considered to reflect various specific unconscious impulses or concerns.

Publication of the KFD was followed by a school adaptation of the technique by Prout and Phillips (1974), known as the kinetic school drawing (KSD), in which children are asked to draw a school picture of themselves, their teacher, and a friend or two in which everyone is doing something. The KSD was intended to provide information about peer relationships and about attitudes and concerns related to school in the same manner as the KFD does for family relationships and feelings about the home.

Knoff and Prout (1985a, 1985b) subsequently recommended combining the KFD and KSD and administering both measures for purposes of analysis and comparison. This combined approach, which they call the kinetic drawing system, is expected to identify adjustment difficulties both at home and in school, to clarify causal or reciprocal relationships between family and school-related issues, and to indicate which people in subjects' lives (e.g., father, sister, teacher) are sources of support or tension. In a further extension of this approach, Burns (1987) has developed a kinetic house-tree-person, in which subjects are instructed to draw on a single page a picture of a house, a tree, and a person in “some kind of action.”

4.15.7.9.1 Administration and scoring

Similar to procedures followed in other drawing techniques, administration of the KFD consists of giving subjects an 8.5 × 11 inch piece of plain paper and asking them to draw a picture of everyone in their family, including themselves, doing something. Beginning with an interpretive approach modeled basically after Machover's qualitative system, Burns (1982) developed a long list of action, style, and symbol characteristics for examiners to consider. As summarized by Knoff and Prout (1985a, 1985b), at least four objective methods for coding characteristics of kinetic drawings have also been proposed by various investigators, but none of these has become consistently visible in the literature. A review of the KFD literature by Handler and Habenicht (1994) indicates in general that cumulative knowledge concerning the adequacy and utility of this instrument has been limited by considerable variation in whether and how it has been scored in research studies and applied in clinical practice.

Also of note is a thematic elaboration of the KFD proposed by McConaughy and Achenbach (1994) and called the semistructured interview protocol. In this approach subjects are given the following questions to answer after they have completed their family drawing: (i) what are they doing, (ii) what kind of person is [each member], (iii) what are three words that describe [each member], (iv) how does [each member] feel in this picture, (v) what is [each member] thinking, (vi) who do you get along with best, (vii) who do you get along with least, and (viii) what is going to happen next in your picture?

4.15.7.9.2 Psychometric foundations

Handler and Habenicht (1994) were able to reference a substantial number of publications concerning the KFD. However, they were forced to conclude that, despite its widespread use, this projective method has not yet been adequately developed with respect to its psychometric properties. Most of the systems previously proposed for coding KFD characteristics can achieve substantial inter-rater reliability, with levels of agreement generally ranging well above 0.85 in various studies (Cummings, 1986; Handler & Habenicht, 1994). However, none of these scoring systems has demonstrated satisfactory retest reliability, even after very brief intervals, and there is little empirical basis for challenging the opinion that “the KFD still remains primarily a clinical instrument with inadequate norms and questionable validity” (Handler & Habenicht, 1994, p. 441).

With this in mind, Handler and Habenicht have recommended a holistic, integrative approach to KFD interpretation, much in the manner of the previously noted integrative DAP method used by Tharinger and Stark (1990), rather than the coding and summation of lists of signs and symbols. In the same study in which Tharinger and Stark demonstrated the superiority of their integrative DAP to the Koppitz scoring system, they also found that a holistic
method of evaluating kinetic drawings discriminated adjusted from maladjusted children more effectively than a traditional scoring guide for KFD interpretation developed by Reynolds (1978).

In developing his scoring guide, Reynolds urged clinicians and researchers to avoid cookbook approaches that fail to go beyond positing direct and absolute links between individual drawing characteristics and specific personality features. Instead, scores for drawing characteristics should be only a first step in interpreting drawings as gestalts that take on meaning only in relation to a youngster's personal, family, and cultural context. It can be hoped that eventual attention to this sound advice, perhaps through the standardization of integrated coding systems, will establish for kinetic drawings the validity that is presently lacking for them to be used with confidence in clinical evaluations.

4.15.7.9.3 Clinical utility

Conceived as a means of understanding how children and adolescents conceptualize their family and perceive themselves within the family context, the KFD is a potentially useful instrument for elaborating the interpersonal dynamics of young people and orchestrating individual or family therapy for those with adjustment difficulties. By employing the kinetic drawing system, examiners can explore school-based as well as family-related issues for such purposes. Of further potential benefit, subjects from a variety of ethnic and minority group backgrounds have been found to express dimensions of their family culture in their KFDs (Handler & Habenicht, 1994), which suggests a role for this instrument in multicultural assessment.

Unfortunately, however, the previously noted psychometric limitations of the KFD make its clinical use problematic at present, particularly with respect to basing any firm conclusions on what and how subjects draw. Until such time as adequate research has clarified what the KFD and other projective drawing methods can and cannot do, clinicians are well advised to regard their figure drawing findings as suggesting but not confirming any notions about the subject. We would endorse in this regard the conclusion of Knoff (1990) that “The hypothesis generating use of projective drawings and their ability to be interpreted within various psychological orientations remains both viable and defensible [but] the validity of hypotheses tied to specific drawing characteristics must still be determined” (p. 99).

4.15.7.10 Sentence Completion Methods

Sentence completion methods consist of initial words or phrases, called stems, that subjects are asked to extend either orally or in writing into complete sentences. Typically used stems vary in length and structure from just one or two words, such as “People . . .” or “I wish . . .,” to detailed specification of people or situations, such as “If only my mother would . . .” or “When he found he had failed the examination, he . . .” As in the case of responses on other projective methods, the manner in which subjects complete sentences is expected to provide an indirect source of information concerning their underlying feelings, attitudes, and level of adjustment. In addition to eliciting general information in these respects, the specificity in many sentence completion stems encourages subjects to reveal their orientation toward particular events and circumstances in their lives.

Sentence completion methods originated in word association tests dating back to the 1890s, and, as reviewed by Haak (1990) and Lah (1989b), were developed during the 1940s and 1950s into a large number of formal and informal versions, of which the most noteworthy published scales were the Rohde, the Sacks, the Forer, and the Miale-Holsopple sentence completion tests and the Rotter incomplete sentences blank (RISB). Of these, the RISB has become the best known and most widely used and includes an adult, a college, and a high school form (Rotter, Lah, & Rafferty, 1992). Also available are more recently developed sentence completion forms constructed by Brown and Unger (1992) for use with adolescents and adults and by Hart (1986) for evaluating school age children. Further comments focus primarily on the RISB, as the predominant and most recently revised exemplar of the sentence completion method, and on the Hart sentence completion test (HSCT), because of its specificity for assessing young people.

4.15.7.10.1 Administration and scoring

The RISB is a 40-item test printed on the front and back of a one-page form and comprised mainly of brief stems (eight have just a single word, 21 have two words, six have three words, and the remaining five have four words). Subjects are given a pencil and asked to “complete these sentences to express your real feelings.” The test can be administered either individually or in a group setting; however, Rotter et al. (1992) caution against taking oral rather than written responses, primarily because
doing so injects an interpersonal component into the testing administration that can influence subjects' responses in various confounding ways.

The RISB and other sentence completion tests are typically interpreted in clinical practice by the inspection methods we have described previously; that is, examiners read the content of the items and form impressions of what they signify concerning a subject's probable personality characteristics. Rotter et al. (1992) also provide a scoring system for rating each item on a seven-point scale from 0 (most positive adjustment) to 6 (most indication of conflict). These ratings are totaled to yield an overall adjustment score. There is little indication that this or any other codification has received much attention in clinical practice or research studies, even though some work with college students indicated that the overall RISB score can discriminate those who are receiving counseling or psychotherapy from their nonpatient peers (Lah, 1989a).

The HSCT is also a 40-item measure in which the item stems were designed specifically for use with children and to sample family, social, school, and self dimensions of their lives (Hart, Kehle, & Davies, 1983). HSCT scoring involves rating each item as negative, neutral, or positive with respect to adjustment and also rating 10 scales composed of various clusters of items concerned with perceptions of self, family, and school on a five-point negative-to-positive continuum. Criteria are provided for each item to guide examiners in their ratings and thereby to enhance scoring objectivity and the prospects for achieving inter-rater agreement.

Rotter et al. approach (Lah, 1989a). Moreover, as discussed in the introductory portions of this chapter, sentence completion responses have numerous objective, structural, and behavioral components (e.g., time to completion, length of sentences, frequency of self- vs. other-reference, items omitted) that have rarely been considered in codifying these methods or attempting to establish a psychometric foundation for them.

The HSCT coding system has also achieved substantial inter-rater agreement and some preliminary success differentiating between emotionally disturbed and well-adjusted children. However, Hart (1986, p. 269) observed that it remained for further research to demonstrate adequate reliability and criterion validity for his instrument and to develop sufficiently broad normative data. The available literature, circa 1996, suggests that not much, if any, progress has been made in this regard.

4.15.7.10.3 Clinical utility

Sentence completion methods bring to the personality assessment battery an easily administered projective test that can be given to groups as well as individuals and requires only about 30 minutes on average for subjects to complete. The scorable 40-item RISB high school form is suitable for high school and most middle school youngsters, and the 40-item HSCT can be used comfortably with younger children. Barring a language or reading difficulty, the meaning of the sentence completion stems is clear, and subjects are not asked to explain, elaborate, or otherwise account for
their responses. Hence, incomplete sentence
methods are often less threatening or anxiety-
4.15.7.10.2 Psychometric foundations
provoking than inkblot, story-telling, and figure
Although there is a substantial RISB research drawing procedures. However, the previously
literature, most of the published studies have noted relative lack of ambiguity in sentence
used the instrument as a measure of adjustment completion stems means that subjects are more
but, as pointed out by Lah (1989a), they were aware than on other projective methods of how
not designed to evaluate its properties. Parti- they are presenting themselves through their
cularly with respect to its validity, there has been responses.
virtually no accumulation of empirical evidence At this point in time, the RISB and HSCT can
to support any diagnostic or predictive infer- be used with justification to form general
ences about personality functioning beyond the impressions of a young person's level of
modestly demonstrated capacity of the RISB to adjustment and to formulate hypotheses con-
identify maladjustment. Likewise, little sys- cerning possible conflicts or concerns the young
tematic progress has been made in assessing person has and how he or she may feel about
the reliability of RISB data and establishing self, other people, and certain situations. Such
normative standards for them. hypotheses can justifiably be expressed only as
This is a regrettable state of affairs because, speculations, however, not as conclusions, and
aside from being a psychodynamically compel- they should be considered reasonably correct
ling method of enriching personality assess- only to the extent that they are supported by
ment, the RISB is an eminently codeable other reliable sources of information.
instrument for which excellent interscorer Haak (1990) has asserted that sentence
agreement has been demonstrated using the completion methods can be used effectively
with children to rule out intellectual difficulties, attention deficit disorder, and stress and to rule in depression, anxiety, thought disturbance, and defensiveness, and provides detailed clinical guidelines for doing so. These suggestions seem sensible in each case, but they are entirely unverified by empirical data. Hence, like other impressions formed by experienced clinicians, Haak's diagnostic guidelines provide an agenda for confirmatory research, but they should not be elevated from speculations to the status of conclusions until that empirical confirmation becomes available.

4.15.8 FUTURE DIRECTIONS

Projective methods have been extensively used, taught, and studied for many years, and it seems likely that they will continue in the future to be regularly included in test batteries for assessing personality functioning in young people. Survey data cited in this chapter document that over the last 30 years psychologists have continued with much the same frequency to apply these instruments in practice and utilize them in research. During this same period the psychometric adequacy and clinical contributions of projective methods have been regularly and vigorously challenged, and this chapter indicates that the uses to which some of these instruments are put often go beyond available justification in empirical data.

How do we account for the persistent use by clinicians of many such as yet unvalidated methods? One could say that there just happen to be legions of uninformed or unethical clinicians in practice who do not hesitate to employ useless methods if doing so serves their purpose in some way. However, it seems doubtful that a vast segment of the profession deserves to be tarred with such a broad brush of evil. Rather, at least the majority of professionals who use projective methods must have good reason on the basis of their clinical and individual case experience to believe sincerely that these methods provide valid and useful information about personality functioning in ways that have not yet been translated into supporting research data. One could then say that such clinical confidence in projective methods is based solely on illusory correlation. However, it seems doubtful that a vast segment of professional personality assessors could be so thoroughly deluded.

These observations suggest that projective techniques have been and will continue to be used because they yield valuable information and generate fruitful hypotheses in clinical assessments. Hopefully, adequate research methods will lead the way toward improved standardization and codification of projective methods, which will in turn enhance the psychometric foundations on which they rest. As Kuehnle (1996) has pointed out, inappropriate use of projective instruments for purposes for which they have not been validated, such as identifying children as having been traumatized or sexually abused, violates ethical standards and risks causing harm to young people and their families.

More than any of the other projective methods, the Rorschach has had the benefit of rigorous attention to research methodology, as demonstrated in publications by Exner (1995) and Weiner (1995a) and, as indicated earlier in this chapter, this method presently rests on a solid psychometric foundation. What the future of projective methods needs is similar methodological attention to documenting the reliability of other techniques and their validation for various purposes. To say that sophisticated research methods are not applicable to projective methods is as unwarranted as asserting that these methods are by nature invalid. To say that projective test responses cannot be examined scientifically without detracting from their idiographic richness sells projective methods short and prevents them from realizing their full potential.

Along with these needs for improved research and a narrowed gap between data and practice, the most important future direction for projective testing of young people lies in developing adequately representative normative data that will facilitate age-specific and multicultural assessment. There is no lack of projective techniques, but there is a decided lack of reliable information concerning how children and adolescents of different ages, from diverse backgrounds, and with various kinds of personality strengths and weaknesses, should be expected to respond to them. In addition to the nearly empty coffers of cross-sectional normative data of these kinds, the cupboard is virtually bare with respect to longitudinal data indicating how young people are likely to change over time and with maturation in how they respond to projective tests. Collecting and disseminating such data is the number one agenda item for the future development of projective assessment of children and adults.

4.15.9 SUMMARY

Projective tests are methods of personality assessment in which some degree of ambiguity in the test stimuli or instructions creates opportunities for subjects to structure their
responses in terms of their individual personality characteristics, and thereby provide information about the nature of these characteristics. Although projective methods are accordingly more ambiguous and less structured than so-called objective methods, the differences between these methods are relative rather than absolute. All projective tests contain objective as well as subjective features and elicit responses that are representative as well as symbolic of behavior, and they differ from each other in the extent to which they are ambiguous.

Because of their relatively unstructured nature, projective tests measure personality functioning in subtle and indirect ways and tap underlying psychological characteristics at a less conscious level than relatively structured measures. Projective test data consequently provide valuable information about how people are likely to think, feel, and act that is difficult to obtain from objective assessment procedures, and they are also less susceptible than objective test data to the influence of test-taking attitudes.

Projective methods can be used to good effect with children and adolescents as well as adults. The basic interpretive conclusions and hypotheses that attach to projective test variables apply regardless of the age of the subject, provided that examiners determine the implications of their data in the light of normative developmental expectations. Surveys of clinical and school settings indicate that the projective instruments most frequently administered in evaluating young people are the Rorschach inkblot method, the thematic apperception test (TAT), the children's apperception test, the Roberts apperception test for children (RATC), the tell-me-a-story (TEMAS) test, the draw-a-person, the house-tree-person, the kinetic family drawing, and alternate forms of the sentence completion test.

For each of these nine projective tests, this chapter reviews their composition, administration, scoring, psychometric foundations, and clinical utility. Despite widespread utilization of these nine tests in clinical practice to draw conclusions about the personality characteristics, level of adjustment, and treatment needs of young people, only the Rorschach presently rests on a solid empirical foundation. Properly collected and interpreted, Rorschach data provide numerous demonstrably reliable and valid indices that facilitate differential diagnosis and treatment planning for children and adolescents with adjustment difficulties.

The RATC and TEMAS have shown through the development of standardized administration and scoring procedures that story-telling methods have the potential to achieve psychometric respectability. However, further research on the RATC and TEMAS is needed with regard to multicultural and adolescent norms, respectively. With regard to the TAT, recently emerging psychometrically sound schemes for coding specific personality characteristics reflected in thematic content, such as SCORS, appear to provide a basis for empirical decision making.

The other measures reviewed have in various ways also shown potential to be codified and refined on the basis of empirical data, and there is reason to be hopeful that advances in research methodology will eventually close a currently regrettable gap between what is known for sure about these projective methods and what is frequently assumed to be true about them in clinical practice. Until this gap is narrowed, and especially until such time as more extensive normative and multicultural data become available, most clinical inferences from projective data should be regarded as hypotheses to be confirmed rather than as facts on which to base conclusions and recommendations.

4.15.10 REFERENCES

Abraham, P. P., Lepisto, B. L., Lewis, M. G., & Schultz, L. (1994). Changes in Rorschach variables of adolescents in residential treatment: An outcome study. Journal of Personality Assessment, 62, 505–514.
Alvarado, N. (1994). Empirical validity of the Thematic Apperception Test. Journal of Personality Assessment, 63, 59–79.
Ames, L. B., Metraux, R. W., Rodell, J. L., & Walker, R. N. (1974). Child Rorschach responses (Rev. ed.). New York: Brunner/Mazel.
Ames, L. B., Metraux, R. W., & Walker, R. N. (1971). Adolescent Rorschach responses (Rev. ed.). New York: Brunner/Mazel.
Archer, R. P., Imhof, E. A., Maruish, M., & Piotrowski, C. (1991). Psychological test usage with adolescent clients: 1990 survey findings. Professional Psychology, 22, 247–252.
Archer, R. P., & Krishnamurthy, R. (1993a). A review of MMPI and Rorschach interrelationships in adult samples. Journal of Personality Assessment, 61, 277–293.
Archer, R. P., & Krishnamurthy, R. (1993b). Combining the Rorschach and MMPI in the assessment of adolescents. Journal of Personality Assessment, 60, 132–140.
Atkinson, J. W., & Feather, N. T. (1966). A theory of achievement motivation. New York: Wiley.
Bardos, A. N. (1993). Human figure drawings: Abusing the abused. School Psychology Quarterly, 8, 177–181.
Bellak, L. (1993). The T.A.T., C.A.T., and S.A.T. in clinical use (5th ed.). Boston: Allyn & Bacon.
Bellak, L., & Hurvich, M. (1966). A human modification of the Children's Apperception Test. Journal of Projective Techniques, 30, 228–242.
Bellak, L., & Siegel, H. (1989). The Children's Apperception Test (CAT). In C. S. Newmark (Ed.), Major psychological assessment instruments (Vol. II, pp. 99–127). Boston: Allyn & Bacon.
Bornstein, R. F. (1995). Sex differences in objective and projective dependency tests: A meta-analytic review. Assessment, 2, 319–331.
Buck, J. N. (1948). The H-T-P technique, a qualitative and quantitative method. Journal of Clinical Psychology, 4, 317–396.
Buck, J. N. (1985). The House-Tree-Person technique: Revised manual. Los Angeles: Western Psychological Services.
Burns, R. C. (1982). Self-growth in families: Kinetic Family Drawings (K-F-D) research and application. New York: Brunner/Mazel.
Burns, R. C. (1987). Kinetic-House-Tree-Person drawings (K-H-T-P). New York: Brunner/Mazel.
Burns, R. C., & Kaufman, S. H. (1970). Kinetic Family Drawings (K-F-D): An introduction to understanding children through kinetic drawings. New York: Brunner/Mazel.
Burns, R. C., & Kaufman, S. H. (1972). Actions, styles, and symbols in Kinetic Family Drawings (K-F-D). New York: Brunner/Mazel.
Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology, 47, 87–111.
Chandler, L. A. (1990). The projective hypothesis and the development of projective techniques for children. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children (Vol. 2, pp. 55–69). New York: Guilford Press.
Costantino, G., Colon-Malgady, G., Malgady, R. G., & Perez, A. (1991). Assessment of attention deficit disorder using a thematic apperception technique. Journal of Personality Assessment, 57, 97–95.
Costantino, G., Malgady, R. G., Bailey, J., & Colon-Malgady, G. (1989). Clinical utility of TEMAS: A projective test for children. Paper presented at the meeting of the Society for Personality Assessment, New York.
Costantino, G., Malgady, R. G., Colon-Malgady, G., & Bailey, J. (1992). Clinical utility of the TEMAS with non minority children. Journal of Personality Assessment, 59, 433–438.
Costantino, G., Malgady, R. G., & Rogler, L. H. (1988). TEMAS (Tell-Me-A-Story) manual. Los Angeles: Western Psychological Services.
Costantino, G., Malgady, R. G., Rogler, L. H., & Tusi, E. C. (1988). Discriminant analysis of clinical outpatients and public school children by TEMAS: A thematic apperception test for Hispanics and Blacks. Journal of Personality Assessment, 52, 670–678.
Cramer, P. (1987). The development of defense mechanisms. Journal of Personality, 55, 597–614.
Cramer, P., & Blatt, S. J. (1990). Use of the TAT to measure change in defense mechanisms following intensive psychotherapy. Journal of Personality Assessment, 54, 236–251.
Cummings, J. A. (1986). Projective drawings. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 199–204). New York: Guilford Press.
Dana, R. H. (1985). Thematic Apperception Test (TAT). In C. Newmark (Ed.), Major psychological assessment instruments (pp. 89–134). Boston: Allyn & Bacon.
Dana, R. H. (1993). Multicultural assessment perspectives for professional psychology. Boston: Allyn & Bacon.
Dana, R. H. (1996). Culturally competent assessment practice in the United States. Journal of Personality Assessment, 66, 472–487.
Di Leo, J. H. (1983). Interpreting children's drawings. New York: Brunner/Mazel.
Elbert, J. C., & Holden, E. W. (1987). Child diagnostic assessment: Current training practices in clinical psychology internships. Professional Psychology, 18, 587–596.
Exner, J. E., Jr. (1991). The Rorschach: A comprehensive system. Vol. 2. Interpretation (2nd ed.). New York: Wiley.
Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system. Vol. 1. Basic foundations (3rd ed.). New York: Wiley.
Exner, J. E., Jr. (Ed.) (1995). Issues and methods in Rorschach research. Mahwah, NJ: Erlbaum.
Exner, J. E., Jr., & Andronikoff-Sanglade, A. (1992). Rorschach changes following brief and short-term therapy. Journal of Personality Assessment, 59, 59–71.
Exner, J. E., Jr., Thomas, E. A., & Mason, B. (1985). Children's Rorschachs: Description and prediction. Journal of Personality Assessment, 49, 13–20.
Exner, J. E., Jr., & Weiner, I. B. (1995). The Rorschach: A comprehensive system. Vol. 3. Assessment of children and adolescents (2nd ed.). New York: Wiley.
Finn, S. E. (1996, March). Assessment feedback integrating MMPI-2 and Rorschach findings. Paper presented at the annual meeting of the Society for Personality Assessment, Denver, CO.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389–413.
Freedenfeld, R., Ornduff, S., & Kelsey, R. M. (1995). Object relations and physical abuse: A TAT analysis. Journal of Personality Assessment, 64, 552–568.
Freud, S. (1958). Psycho-analytic notes upon an autobiographical account of a case of paranoia (dementia paranoides). Standard edition (Vol. XII, pp. 9–82). London: Hogarth. (Original work published in 1911.)
Freud, S. (1962). Further remarks on the neuro-psychoses of defence. Standard edition (Vol. III, pp. 162–185). London: Hogarth. (Original work published in 1896.)
Ganellen, R. J. (1996). Integrating the Rorschach and the MMPI-2 in personality assessment. Mahwah, NJ: Erlbaum.
Gardner, R. A. (1971). Therapeutic communication with children: The mutual storytelling technique. Northvale, NJ: Aronson.
Goodenough, F. L. (1926). Measurement of intelligence by drawings. New York: Harcourt, Brace & World.
Gresham, F. M. (1993). “What's wrong in this picture?”: Response to Motta et al.'s review of human figure drawings. School Psychology Quarterly, 8, 182–186.
Haak, R. A. (1990). Using the sentence completion to assess emotional disturbance. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children (Vol. 2, pp. 147–167). New York: Guilford Press.
Hammer, E. F. (1958). The clinical application of projective drawings. Springfield, IL: Charles C. Thomas.
Hammer, E. F. (1985). The House-Tree-Person test. In C. S. Newmark (Ed.), Major psychological assessment instruments (pp. 135–164). Boston: Allyn & Bacon.
Handler, L. (1985). The clinical use of the Draw-a-person test (DAP). In C. S. Newmark (Ed.), Major psychological assessment instruments (pp. 165–216). Boston: Allyn & Bacon.
Handler, L., & Habenicht, D. (1994). The Kinetic Family Drawing technique: A review of the literature. Journal of Personality Assessment, 62, 440–464.
Harris, D. B. (1963). Children's drawings as a measure of intellectual maturity. New York: Harcourt, Brace & World.
Hart, D. H. (1986). The sentence completion techniques. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 245–272). New York: Guilford Press.
Hart, D. H., Kehle, T. J., & Davies, M. V. (1983). Effectiveness of sentence completion techniques: A review of the Hart Sentence Completion Test. School Psychology Review, 12, 428–434.
Haworth, M. (1965). A schedule of adaptive mechanisms in CAT responses. Larchmont, NY: CPS.
Hibbard, S., Farmer, L., Wells, C., Difillipo, E., Barry, W., Korman, R., & Sloan, P. (1994). Validation of Cramer's defense mechanism manual for the TAT. Journal of Personality Assessment, 63, 197–210.
Hibbard, S., Hilsenroth, M. J., Hibbard, J. K., & Nash, M. R. (1995). A validity study of two projective object representation measures. Psychological Assessment, 7, 432–439.
Hoffman, S., & Kupperman, N. (1990). Indirect treatment of traumatic psychological experiences: The use of TAT cards. American Journal of Psychotherapy, 44, 107–115.
Holtzman, W. H. (1993). An unjustified, sweeping indictment by Motta et al. of human figure drawings for assessing psychological functioning. School Psychology Quarterly, 8, 189–190.
Hutton, J. B., Dubes, R., & Muir, S. (1992). Assessment practices of school psychologists: Ten years later. School Psychology Review, 21, 271–284.
Kahill, S. (1984). Human figure drawings in adults: An update of the empirical literature. Canadian Psychology, 25, 269–292.
Kennedy, M. L., Faust, D., Willis, W. G., & Piotrowski, C. (1994). Social-emotional assessment practices in school psychology. Journal of Psychoeducational Assessment, 12, 228–240.
Knoff, H. M. (1990). Evaluation of projective drawings. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational assessment of children (Vol. 2, pp. 89–146). New York: Guilford Press.
Knoff, H. M., & Prout, H. T. (1985a). The Kinetic Drawing System: A review and integration of the Kinetic Family and Kinetic School Drawing techniques. Psychology in the Schools, 22, 50–59.
Knoff, H. M., & Prout, H. T. (1985b). The Kinetic drawing system: Family and school. Los Angeles: Western Psychological Services.
Kohlberg, L. (1976). Moral stage and moralization: The cognitive developmental approach. In T. Lickona (Ed.), Moral development and behavior: Theory, research, and social issues (pp. 31–53). New York: Holt, Rinehart & Winston.
Koppitz, E. M. (1968). Psychological evaluation of children's human figure drawings. New York: Grune & Stratton.
Koppitz, E. M. (1984). Psychological evaluation of human figure drawings by middle school pupils. New York: Grune & Stratton.
Kuehnle, K. (1996). Assessing allegations of child sexual abuse. Sarasota, FL: Professional Resource Press.
Lah, M. I. (1989a). New validity, normative, and scoring data for the Rotter Incomplete Sentences Blank. Journal of Personality Assessment, 53, 607–620.
Lah, M. I. (1989b). Sentence completion tests. In C. S. Newmark (Ed.), Major psychological assessment instruments, Volume II (pp. 133–163). Boston: Allyn & Bacon.
Leichtman, M. (1996). The Rorschach: A developmental perspective. Hillsdale, NJ: Analytic Press.
Machover, K. (1948). Personality projection in the drawing of the human figure. Springfield, IL: Charles C. Thomas.
Machover, K. (1951). Drawing of the human figure: A method of personality investigation. In H. H. Anderson & C. L. Anderson (Eds.), An introduction to projective techniques (pp. 341–369). New York: Prentice-Hall.
McArthur, D. S., & Roberts, G. E. (1990). Roberts Apperception Test for Children manual. Los Angeles: Western Psychological Services.
McClelland, D. C., Atkinson, J. W., Clark, R. A., & Lowell, E. L. (1953). The achievement motive. New York: Appleton-Century-Crofts.
McClelland, D. C., Koestner, R., & Weinberger, J. (1989). How do self-attributed and implicit motives differ? Psychological Review, 96, 690–702.
McConaughy, S. H., & Achenbach, T. M. (1994). Manual for the semistructured clinical interview for children and adolescents. Burlington, VT: University of Vermont Department of Psychiatry.
McDowell, C., & Acklin, M. W. (1996). Standardizing procedures for calculating Rorschach inter-rater reliability: Conceptual and empirical foundations. Journal of Personality Assessment, 66, 308–320.
McNeish, T. J., & Naglieri, J. A. (1991). Identification of the seriously emotionally disturbed using the Draw A Person: Screening Procedure for Emotional Disturbance. Journal of Special Education, 27, 115–121.
Morgan, C. D., & Murray, H. A. (1935). A method for investigating fantasies: The Thematic Apperception Test. Archives of Neurology and Psychiatry, 34, 289–306.
Motta, R. W., Little, S. G., & Tobin, M. I. (1993). The use and abuse of human figure drawings. School Psychology Quarterly, 8, 162–169.
Murray, H. A. (1943). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press.
Murray, H. A. (1951). Uses of the Thematic Apperception Test. American Journal of Psychiatry, 107, 577–581.
Murray, H. A. (1971). Thematic Apperception Test: Manual (Rev. ed.). Cambridge, MA: Harvard University Press.
Murstein, B. I. (1963). Theory and research in projective techniques (Emphasizing the TAT). New York: Wiley.
Murstein, B. I., & Wolf, S. R. (1970). Empirical test of the “levels” hypothesis with five projective techniques. Journal of Abnormal Psychology, 75, 38–44.
Naglieri, J. A. (1988). Draw A Person: A quantitative scoring system. New York: Psychological Corporation.
Naglieri, J. A., McNeish, T. J., & Bardos, A. N. (1991). Draw-A-Person: Screening Procedure for Emotional Disturbance. Austin, TX: ProEd.
Naglieri, J. A., & Pfeiffer, S. I. (1992). Performance of disruptive behavior disordered and normal samples on the Draw A Person: Screening Procedure for Emotional Disturbance. Psychological Assessment, 4, 156–159.
Obrzut, J. E., & Boliek, C. A. (1986). Thematic approaches to personality assessment with children and adolescents. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 173–198). New York: Guilford Press.
Ornduff, S. R., Freedenfeld, R., Kelsey, R. M., & Critelli, J. (1994). Object relations of sexually abused female subjects: A TAT analysis. Journal of Personality Assessment, 63, 223–228.
Ornduff, S. R., & Kelsey, R. M. (1996). Object relations of sexually and physically abused female children: A TAT analysis. Journal of Personality Assessment, 66, 91–105.
Parker, K. C. H., Hanson, R. K., & Hunsley, J. (1988). MMPI, Rorschach and WAIS: A meta-analytic comparison of reliability, stability, and validity. Psychological Bulletin, 103, 367–373.
Piotrowski, C. (1996). The status of Exner's Comprehensive System in contemporary research. Perceptual and Motor Skills, 82, 1341–1342.
Piotrowski, C., & Keller, J. W. (1989). Psychological testing in outpatient facilities: A national study. Professional Psychology, 20, 423–425.
Prout, H. T., & Phillips, P. D. (1974). A clinical note: The kinetic school drawing. Psychology in the Schools, 11, 303–306.
Reynolds, C. R. (1978). A quick scoring guide to the interpretation of children's Kinetic Family Drawings (KFD). Psychology in the Schools, 15, 489–492.
Roback, H. B. (1968). Human figure drawings: Their utility in the clinical psychologist's armamentarium for personality assessment. Psychological Bulletin, 70, 1–19.
Ronan, G. F., Colavito, V. A., & Hammontree, S. R. (1993). Personal problem-solving system for scoring TAT responses: Preliminary validity and reliability data. Journal of Personality Assessment, 61, 28–40.
Rorschach, H. (1942). Psychodiagnostics. Bern, Switzerland: Hans Huber. (Original work published in 1921.)
Rotter, J. B., Lah, M. I., & Rafferty, J. E. (1992). Manual: Rotter Incomplete Sentences Blank (2nd ed.). Orlando, FL: Psychological Corporation.
Seaton, B., & Allen, J. (1996, March). Interscorer reliability of Rorschach structural summary data. Poster session presented at the annual meeting of the Society for Personality Assessment, Denver, CO.
Smith, D., & Dumont, F. (1995). A cautionary study: Unwarranted interpretations of the Draw-A-Person test. Professional Psychology, 3, 298–303.
Stinnett, T. A., Havey, J. M., & Oehler-Stinnett, J. (1994). Current test usage by practicing school psychologists: A national survey. Journal of Psychoeducational Assessment, 12, 331–350.
Stone, H. K., & Dellis, N. P. (1960). An exploratory investigation into the levels hypothesis. Journal of Projective Techniques, 24, 333–340.
Sutton, P. M., & Swenson, C. H. (1983). The reliability and concurrent validity of alternative methods for assessing ego development. Journal of Personality Development, 47, 468–475.
Swensen, C. H. (1957). Empirical evaluations of human figure drawings. Psychological Bulletin, 54, 431–466.
Swensen, C. H. (1968). Empirical evaluations of human figure drawings: 1957–1966. Psychological Bulletin, 70, 20–44.
Tharinger, D. J., & Stark, K. (1990). A qualitative versus quantitative approach to the Draw-a-Person and Kinetic Family Drawing: A study of mood- and anxiety-disorder children. Psychological Assessment, 2, 365–375.
Thomas, A. D., & Dudek, S. Z. (1985). Interpersonal affect in TAT responses: A scoring system. Journal of Personality Assessment, 49, 30–37.
Vane, J. R. (1981). The Thematic Apperception Test: A review. School Psychology Review, 1, 319–336.
Watkins, C. E., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology, 26, 54–60.
Weiner, I. B. (1977). Projective tests in differential diagnosis. In B. B. Wolman (Ed.), International encyclopedia of neurology, psychiatry, psychoanalysis, and psychology (pp. 112–116). Princeton, NJ: Van Nostrand Reinhold.
Weiner, I. B. (1986). Assessing children and adolescents with the Rorschach. In H. M. Knoff (Ed.), The assessment of child and adolescent personality (pp. 141–171). New York: Guilford Press.
Weiner, I. B. (1993). Clinical considerations in the conjoint use of the Rorschach and the MMPI. Journal of Personality Assessment, 60, 148–152.
Weiner, I. B. (1994). Rorschach assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome evaluation (pp. 249–278). Hillsdale, NJ: Erlbaum.
Weiner, I. B. (1995a). Methodological considerations in Rorschach research. Psychological Assessment, 7, 330–337.
Weiner, I. B. (1995b). Psychometric issues in forensic applications of the MMPI-2. In Y. S. Ben-Porath, J. R. Graham, G. C. N. Hall, R. D. Hirschman, & M. S. Zaragoza (Eds.), Forensic applications of the MMPI-2 (pp. 48–81). Thousand Oaks, CA: Sage.
Weiner, I. B. (1996). Some observations on the validity of the Rorschach Inkblot Method. Psychological Assessment, 8, 206–213.
Weiner, I. B. (1997). Current status of the Rorschach Inkblot Method. Journal of Personality Assessment, 68, 5–19.
Weiner, I. B., & Exner, J. E., Jr. (1991). Rorschach changes in long-term and short-term psychotherapy. Journal of Personality Assessment, 56, 453–465.
Westen, D., Klepser, J., Ruffins, S. A., Silverman, M., Lifton, N., & Boekamp, J. (1991). Object relations in childhood and adolescence: The development of working representations. Journal of Consulting and Clinical Psychology, 59, 400–409.
Westen, D., Lohr, N., Silk, K., Kerber, K., & Goodrich, S. (1985). Object relations and social cognition TAT scoring manual. Ann Arbor, MI: University of Michigan.
Westen, D., Ludolph, P., Block, M. J., Wixom, J., & Wiss, F. C. (1990). Developmental history and object relations in psychiatrically disturbed adolescent girls. American Journal of Psychiatry, 147, 1061–1068.
Westen, D., Ludolph, P., Lerner, H., Ruffins, S., & Wiss, F. C. (1990). Object relations in borderline adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 338–348.
Worchel, F. F., Rae, W. A., Olson, T. K., & Crowley, S. L. (1992). Selective responsiveness of chronically ill children to assessments of depression. Journal of Personality Assessment, 59, 605–615.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.16
Assessment of Schema and
Problem-solving Strategies with
Projective Techniques
HEDWIG TEGLASI
University of Maryland, College Park, MD, USA

4.16.1 INTRODUCTION
4.16.2 PATTERNS OF USE
4.16.3 BASIC ASSUMPTIONS OF PROJECTIVE TECHNIQUES AND RESEARCH FROM OTHER PSYCHOLOGY SUBFIELDS
4.16.3.1 Schema Theory and Projective Techniques
4.16.3.2 Unconscious Processing of Information
4.16.3.3 Inert Knowledge vs. Usable Knowledge
4.16.3.4 Affective and Motivational Influences on Cognition
4.16.3.5 Narrative Psychology and Thematic Apperception Techniques
4.16.4 LEVELS OF PERSONALITY ASSESSED BY PROJECTIVE AND QUESTIONNAIRE METHODS
4.16.5 SELF-REPORT VS. PROJECTIVE PERSONALITY TESTING
4.16.6 CONTRIBUTION OF PROJECTIVE TECHNIQUES TO DIAGNOSIS AND INTERVENTION
4.16.7 PROJECTIVE TECHNIQUES AS PERFORMANCE MEASURES OF PERSONALITY
4.16.7.1 Total Personality
4.16.7.2 Many Correct Solutions
4.16.7.3 Generalize to Different Criteria
4.16.7.4 Differences in Conditions of Learning and Performance
4.16.8 SPECIFIC PROJECTIVE TECHNIQUES
4.16.8.1 Stimulus
4.16.8.2 Response
4.16.8.3 Task Demand
4.16.8.4 Interpretation
4.16.9 THEMATIC APPERCEPTIVE TECHNIQUES
4.16.9.1 Response
4.16.9.2 Stimuli
4.16.9.3 Interpretation
4.16.10 RORSCHACH TECHNIQUE
4.16.10.1 Stimulus Features and Response Parameters
4.16.10.2 Interpretation
4.16.11 DRAWING TECHNIQUES
4.16.11.1 Interpretation
4.16.12 CASE ILLUSTRATION


4.16.13 VALIDATION ISSUES
4.16.13.1 Reliability
4.16.13.1.1 Scorer reliability
4.16.13.1.2 Decisional reliability
4.16.13.1.3 Test–retest reliability
4.16.13.1.4 Internal consistency
4.16.13.2 Construct Validation
4.16.13.3 Multitrait, Multimethod Validation
4.16.13.4 Criterion Validation
4.16.13.5 Part–Whole Relationships
4.16.13.6 Convergence of Psychometric and Conceptual Treatment of Data
4.16.13.6.1 Normative
4.16.13.6.2 Nomothetic
4.16.13.6.3 Case study
4.16.14 FUTURE DIRECTIONS
4.16.15 SUMMARY
4.16.16 REFERENCES

4.16.1 INTRODUCTION

Clinical assessment practices typically lag behind the conceptualizations psychologists use to understand adaptive and maladaptive patterns in human functioning. We recognize that personality is a functional whole, but cling to assessment procedures that resemble the proverbial blind men and the elephant. Each method of assessment pertains to one dimension or level of personality, yet we expect different sources of data to confirm one another rather than to reveal the different patterns contributing to the larger mosaic. A clarification of the part–whole relationships with regard to personality assessment promotes an understanding of the place of projective techniques in a comprehensive evaluation. The aims of this chapter are: (i) to facilitate the process of integration of projective techniques with self-report approaches to personality assessment; (ii) to reframe projective techniques as performance measures of personality that differ in their demands from other tasks within the battery; (iii) to show that the central tenets of projective techniques embodied in the projective hypothesis are supported by empirical research in various psychology subfields; and (iv) to demonstrate the utility of projective techniques for assessing schema (the internal representation of reality) and problem-solving strategies to meet the reality demands posed by the task.

That projective techniques do not belong to a particular theory of personality was asserted early on (Auld, 1954). Projective methods yield open-ended responses, and their interpretation can accommodate changing paradigms in various subdisciplines, such as the cognitive and neurosciences as well as in the study of emotion and social cognition. It is easy to say that the framework to understand and interpret responses to projective tasks should coincide with key concepts in the various subdisciplines, but the task of continuously updating the conceptual frameworks that are applied to projective methods remains an ongoing challenge.

Responses to projective tests yield information about the internal structure of personality and provide the context for understanding overt behavior and symptomatology. This emphasis on inner structure is currently shared by social, cognitive, and clinical psychology. Newer appreciation is emerging of the wide role of emotions and unconscious processes (e.g., Bargh, 1994; Clore, Schwarz, & Conway, 1994), as well as of meaning structures (e.g., Bruner, 1986) underlying behavioral expression. Projective methods reveal the structure and process of personality that are largely unconscious, and they differ in important ways from checklists or self-report interviews. In addition, the competencies required by projective tasks can be distinguished from those demanded by other measures in the typical assessment battery such as the Wechsler scales.

Therefore, projective techniques can be viewed as performance measures of personality that reveal inner structures or schema as resources needed in daily life. Each projective method sets a task that requires the application of these structures to respond adaptively. All factors contributing to personality development are involved in shaping these inner structures of meaning. Therefore, the interpretation of responses to projective techniques promotes the integration of knowledge from various subdisciplines.

4.16.2 PATTERNS OF USE

Despite criticisms of projective tests, there continues to be broad interest in the Rorschach and the Thematic Apperception Test (TAT;
Butcher & Rouse, 1996). Most clinical psychology doctoral training programs include formal instruction in the Rorschach (Piotrowski & Zalewski, 1993). In addition, the majority of internship sites approved by the American Psychological Association (APA) value knowledge in Rorschach technique (Durand, Blanchard, & Mindell, 1988). The Exner Comprehensive System (Exner, 1993; Exner & Weiner, 1995) has become widely accepted because it provides a more reliable and objective basis for interpretation than was previously available. Surveys find the Rorschach, TAT, and various drawing methods among the top 10 most frequently used assessment techniques (Archer, Maruish, Imhof, & Piotrowski, 1991; Piotrowski & Keller, 1989; Piotrowski, Sherry, & Keller, 1985; Watkins, Campbell, & McGregor, 1988; Watkins, Campbell, Nieberding, & Hallmark, 1995). Interest in the Rorschach and TAT is also evident outside of the USA (Weiner, 1994).

The contrast between the negative evaluation of projective methods by researchers and their continued popularity with practitioners (Piotrowski, 1984) begs for a scrutiny of the assumptions and methods applied to judging the utility of these techniques. This state of affairs calls for an analysis not only of psychometric properties but also of the conceptual and methodological underpinnings pertaining to projective testing. Experienced practitioners continue to use these techniques because they find them helpful. For instance, an advantage of the Rorschach is that the validity of its administration is independent of the client's reading level (Archer et al., 1991).

Shneidman's (1951) edited volume presented the conclusions of 16 clinicians who blindly interpreted the same TAT protocol using their own preferred method. Despite different approaches and foci, they were remarkably accurate and consistent with each other. Also included in this volume were Klopfer's conclusions, based on a blind interpretation of the same individual's Rorschach record, which were highly consistent with those based on the TAT protocols. It should be noted that the interpreters all had one thing in common: they were "experts." In light of what we know about how experts perform (e.g., Chi, Glaser, & Farr, 1987), we would not expect to replicate such findings with more recently trained individuals. In evaluating the evidence of reliability and validity in the clinical use of the TAT, the training and experience of interpreters was a primary consideration (Karon, 1981). Research designs that isolate response variables and subject them to separate statistical analyses do not mirror the interpretation of projective methods in clinical practice. The clinician simultaneously evaluates multiple dimensions of the response in reference to a psychological construct rather than adding individual response variables. More sophisticated validation efforts are needed that focus on the integrative approach rather than piecemeal summation of isolated response characteristics (e.g., Handler & Habenicht, 1994). Finally, more refined approaches to validation suggest that different facets of psychological constructs are measured by different techniques. For instance, a comprehensive meta-analysis indicated that projective measures of achievement motivation predict long-term behavioral trends, whereas self-report measures predict immediate choices (Spangler, 1992). Refinements in conceptualization and approaches to validation are calling into question earlier criticism based on psychometric shortcomings.

4.16.3 BASIC ASSUMPTIONS OF PROJECTIVE TECHNIQUES AND RESEARCH FROM OTHER PSYCHOLOGY SUBFIELDS

The term "projective technique" was coined by Frank (1939) to describe tests using ambiguous stimuli and/or tasks that are less obvious in their intent and are therefore less subject to faking. The aim of any projective technique is to set a task that allows the individual to express characteristic ways of perceiving and organizing experiences. The uniqueness of the individual's style of responding is particularly evident when stimuli are ambiguous and there is no ready response. Under such conditions, there are many possible ways to approach the task or situation, and the person must actively organize the response. A central assumption of all projective methods is that stimuli from the environment are perceived and organized by the individual's specific needs, motives, feelings, perceptual sets, and cognitive structures, and that in large part this process occurs automatically and outside of awareness (Frank, 1948).

These assumptions about projective testing, first elaborated by Frank (1939) in his analysis of the projective hypothesis, are compatible with converging trends across various subfields in psychology pointing to the role of previously organized inner structures or mental sets in the interpretation of stimuli. Since the cognitive revolution of the 1960s, psychology has undergone a significant transformation by moving away from an emphasis on stimulus–response units to describe human perception and action to increasing emphasis on how humans impose
meaning and organization on their experiences (Singer & Salovey, 1991). Within cognitive psychology, simple associational models (e.g., paired associate learning) have become less important in describing ongoing information processing than the organized meaning structures, such as schema or scripts that are brought to bear on the interpretation of experiences (Piaget, 1954; Taylor & Crocker, 1981). At the same time, developments in social psychology (Abelson, 1976; Festinger, 1957; Heider, 1958; Kelley, 1967) led to the exploration of organized meaning structures as the fundamental bases of social action (Wyer & Srull, 1994). In the psychodynamic paradigm, the emergence of object relations theory (Blatt & Wild, 1976) implicitly relies on internal structures similar to schema to guide information processing about the self and other. The considerable work in cognitive, social, personality, and clinical psychology bearing on schema theory and supporting the projective hypothesis is briefly highlighted next.

4.16.3.1 Schema Theory and Projective Techniques

Earlier cognitive work within clinical psychology focused on the role of self-statements and inner thoughts. But clinical scientists interested in the role of cognition have increasingly adopted an information-processing approach to clinical phenomena (Ingram, 1986). This approach focuses on the schema that provide rules guiding behavior in social relationships and appear to influence how information about relationships is stored in memory (Fiske, Haslam, & Fiske, 1991). Well-learned, organized knowledge structures (schema or scripts) have considerable impact on memory storage and retrieval. Projective tasks require the superimposition of previously acquired schematic structures to the perception of stimuli and the organization of the response. Acklin (1994) presented a reformulation of the responses to the Rorschach on the basis of schema theory and information processing. Within this framework, emphasis is placed on the nature and adaptiveness of the schema activated by the stimulus properties.

Memory structures such as schema or scripts that store information about situations, persons, and events guide the interpretation of experiences by providing criteria for regulating attention to lend focus to the process of encoding, storage, and retrieval of information in specific domains (Taylor & Crocker, 1981). Event schema refer to the type of script that organizes understanding of a sequence of events in a stereotyped or routine situation such as ordering dinner in a restaurant (Abelson, 1981). Such a "script is a set of expectations about what will happen next in a well understood situation" (Schank, 1990, p. 7). As long as we know what script others are following, we know how to act and to predict the actions of others. Upon entering a restaurant, we are seated, order from the menu, receive our meal, pay, and depart. This process of comparing experience with the internal schema enables us to perceive rapidly unstated information and to anticipate what will happen next. Some individuals may view situations for which they have no scripts (novel) as challenging, whereas others are stymied (e.g., "speechless") in such conditions.

Person schema include information about the self, others, and their interactions. They deal with the individual's rules for predicting, interpreting, responding to, and controlling affectively charged encounters (Tomkins, 1979). For Tomkins (1979, 1987, 1991), emotion is a key organizer of personal scripts. Once the script has been formed, the script organizes and modifies new experiences to fit the preexisting structure. Person schema develop through the individual's synthesis of past experiences according to the individual's style of processing information (Horowitz, 1991). Therefore the construct of person schema brings together models of perception, cognition, memory, affect, action, and feedback. Furthermore, it has the potential to integrate psychodynamic formulations and cognitive perspectives in understanding personality (Horowitz, 1991; Stein & Young, 1992). This process of interpreting new information according to previously acquired sets takes place without conscious awareness and may also lead to systematic distortions in perception, interpretation, and action in interpersonal encounters.

Causal schema give rise to attributions about the causes of events. Schema theory and attribution theory overlap in that attribution theory maintains that individuals are motivated to make causal inferences about experiences and that attributions are made in ways that are congruent with existing schema of self and other, and of the assumed relationships between causes and effects (Tunis, 1991). The relationship between such causal attributions and depression has been studied.

Evidence for the existence of schema is seen in their consequences for information processing, particularly in how information is remembered and retrieved. Schema-driven information processing involves the application of organized knowledge structures akin to theories that govern the perception and interpretation of facts. Thus, schema function to allow general
knowledge to influence perception of specific experiences. Schema increase efficiency in identifying perceptions, organizing them into manageable units, filling in missing information, and selecting a strategy for how to obtain further information if needed. Because schema guide behavior in adaptive and maladaptive ways, their study pertains to clinical psychology as well as to the general study of personality, cognition, and social perception. Understanding the recurrence of maladaptive interpersonal patterns is facilitated by studying the operation of schema along with the realistic properties of the current situation (Horowitz, 1991).

The overlap between schema or script theory and the projective hypothesis is evident in their common features. The development of schema or scripts and their retrieval from memory represent an unconscious process through which past perceptions influence the interpretation of current situations. These are precisely the processes that are assessed with projective techniques. The focus of projective methods is on the application of schema to the task demands (e.g., how schema influence the response process) and, by extension, to life challenges requiring similar adaptations.

Even though schema are structures of the mind, they must be sufficiently malleable to adapt to new situations and new configurations of events (Rumelhart, Smolensky, McClelland, & Hinton, 1986). Therefore, it is appropriate to view schema as emerging when they are needed "from the interaction of large numbers of much simpler elements all working in concert with one another" (p. 20). A distinction has been made between enduring schema and working models (Horowitz, 1991). The former are intrapsychic meaning structures containing generalized formats of knowledge that can be activated by other mental activities related to that knowledge such as motivational concerns (Fiske & Taylor, 1984). Working models combine internal and external sources of information, such as when the individual is actively contemplating an interpersonal situation or task such as the TAT. Working models actively integrate stimuli from a current situation with past knowledge through triggering of an associative network of ideas drawn from enduring person schema.

A person's repertoire may include several different enduring person schema in relation to a given type of relationship, situation, or activity. The working model may incorporate different elements from each of the person schema. The wider the repertoire of enduring person schema, the more flexibility in constructing working models of interpersonal situations. Discrepancies between working models and active, enduring schema may contribute to upsetting emotional experiences. Working models may not match the actual qualities of current social situations, leading to errors in judgment and behavior and to subsequent negative emotions. The aim of many psychotherapeutic techniques is to promote conscious awareness of unconscious, schematic functions. Such awareness may permit the individual to actively override some of the influence of the unconscious information processing.

Scripts or schema may be further organized into metascripts (Singer & Salovey, 1991) that reflect the individual's style of dealing with scripts. These superordinate schema may explain resilience because they influence how individuals reshape their schema when confronted with daily stress, unforeseen failure, or unexpected upheaval. Those with more complex schema may be more willing to replay negative information to explore a variety of alternative schema or scripts. More flexible schema involve the realization that current ideas that seem so clear and self-evident at the moment may change as a function of new experience. This process cannot be taught directly, but is a byproduct of experiencing changes in perception, understanding, and feeling in light of new information. Consequently, it is important to assess the style of synthesizing experience through the organization and reorganization of schema.

Analysis of responses elicited by projective techniques can be informed by models of schema development. Abelson (1976) assumes that schema begin as representations of single concrete examples and become more abstract. Rudimentary schema are compilations of single, concrete examples that are used to make snap judgments about seemingly similar instances. Stereotypic schema encompass the most representative features of events, persons, or groups. More abstract schema recognize inconsistencies between reality and the activated schema. The individual is aware of complexities and ambiguities rather than simply encoding schema-congruent information. Highly developed schema permit the individual to engage in complex information processing seemingly effortlessly and without awareness. The more abstract the schema, the more flexibly they can be applied because they include conditional and inferential concepts, abstract rules, and affective information. The progression from rudimentary to more complex schema is the product of experience and the degree of active, strategic effort brought to the structuring and organization of experience. Schema theory recognizes the role of affect, motivation, and biological
factors (genetics, temperament), as well as environmental influences (stress and supports) as they interact with cognitive processes. The shaping of schema, especially social scripts, by culture is also acknowledged and has been studied by anthropologists and linguists (Quinn & Holland, 1987).

Responses to projective methods may be evaluated according to aspects of schema described above. When telling stories to picture stimuli, some individuals superimpose associative elements without an organizational network; others impose stereotypes that may or may not precisely fit the stimulus configurations; still others draw creatively on various elements of their experiences to tell a cohesive story that captures the gist of the stimulus and complies with the instructions. If an individual is presented with a TAT picture that cannot be readily explained by a stereotypic script or an intact schema, then the respondent is called upon to actively construct a script to fit the situation. Similarly, when faced with inkblots that are rough approximations of real objects, the individual must remain flexible in applying schema to answer the question "What might this be?" Those who rely on more rudimentary schema will make associations directly from discrete portions of the stimuli. However, those who apply more cohesive schema find a meaningful framework with which to integrate disparate or complex stimuli.

Two aspects of long-term memory structures (schema), procedural and declarative knowledge (Anderson, 1983; Kihlstrom, 1984), are relevant to the interpretation of projective responses. Procedural knowledge refers to unconscious processes or skills such as the structure of language, the organization of music, or other implicit rules that order information or perception. Declarative schematic structures entail the recall of factual information such as names, locations, and historical events. Responses to projective techniques reveal the implicit organization of knowledge as well as the content that enters awareness in response to the stimuli presented. Interpretive approaches can focus on the organization of schema by examining the structural aspects of responses and the sequence of ideas expressed, as well as by analyzing the content.

4.16.3.2 Unconscious Processing of Information

The idea that human behavior is based on a considerable storehouse of organized knowledge structures operating outside of conscious awareness is now widely accepted by cognitive psychologists (Bargh, 1994). Since the 1970s, increasing evidence has suggested that information not accessible to conscious awareness influences memory, perception, and thinking (Kihlstrom, 1990). Efforts to refine assumptions about unconscious processes continue across the various subfields of psychology and are not restricted to projective testing or even to clinical psychology.

Social cognitive psychology (social cognition) has become increasingly interested in unconscious thought processes of mental representations of situations, self, and others (schema) and in the interaction of cognition and affect (Wyer & Srull, 1994). Attitudes, expectations, or schema that are strong enough to be automatically activated have been described in terms of their "chronic accessibility" (Fazio, Sanbonmatsu, Powell, & Kardes, 1986). Chronically accessible schema may be cued by emotions arising in ambiguous situations, including reactions to projective stimuli. According to Westen (1993), "Research on chronic schema accessibility has confirmed the projective hypothesis that in ambiguous situations enduring interests, concerns, needs, and ways of experiencing reality are likely to be expressed" (p. 381). Bargh (1994) suggests that temporarily accessible schema due to a recent experience can mimic chronic accessibility and constitute a pitfall for studies of enduring cognitive structures. Indeed, this pitfall is common in the use of personality tests, be they questionnaire or projective devices. The distinction between state and trait variables acknowledges states as temporary conditions of mentality, mood, levels of arousal or drive, and these states may be related to temporarily evoked schema. One of the advantages of projective measures is the opportunity to examine both the structure and content of schema even if temporarily evoked. The content of the schema may be more susceptible to influence by such variations in experience than the structure. However, data are needed to support this contention.

A dramatic demonstration of the effect of a recent experience occurred one semester when the author's students administered assessment batteries to incarcerated youths (generally between the ages of 18 and 22) to determine educational strategies. A great number of TAT stories ended with the characters "talking about it and solving their problem." After some investigation, it was learned that a communication program had been recently instituted to teach the value of resolving conflict by talking things out. A close inspection of the stories themselves revealed that the schema about communication were not well developed but verbalized as a stock ending apart from other
components of the problem-solving process. For example, the problems were poorly understood, and characters' feelings, perspectives, and intentions were vaguely or inaccurately deciphered, if at all. The idea for talking about the problem became accessible to conscious awareness through training as declarative knowledge but without the understandings or procedural knowledge needed for its implementation. Conscious awareness of the benefits of communication was not combined with the implicit and largely unconscious understandings and affective reactions that promote such interactions. This example illustrates the importance of studying the inner logic and cohesiveness of schematic structures to discover how story content derived from recent experiences (including vicarious ones such as books or movies) is integrated with other aspects of the task demand such as accurately perceiving the stimuli and following instructions. Both declarative and procedural knowledge must be addressed to overcome the pitfall of attributing too much weight to content that is not meaningfully incorporated into guiding structures.

4.16.3.3 Inert Knowledge vs. Usable Knowledge

Inert knowledge has been described as information that the person knows, but does not apply unless explicitly cued or prompted to do so (Bransford, Franks, Vye, & Sherwood, 1989). The reason why a person fails to use relevant knowledge to solve particular problems is that this information does not spontaneously come to awareness in the context of that problem, or that the schema for applying the information are not sufficiently developed. The manner in which knowledge is encoded and organized in the first place determines its subsequent accessibility and applicability in other situations.

Effective learning in general requires active strategies to organize and recall information such as rehearsal or breaking down the task into manageable units. Torgesen (1977), who introduced the concept of the inactive learner, reported that learning-disabled students exhibit a learning style characterized by minimal planning and limited use of strategies. Individuals may process information from one domain more thoroughly than in others, or the inactive stance may be more pervasive. This inactive style is manifested by processing information exactly as it comes in (memorization) without organizing and restructuring it. In the absence of active, effortful, strategic processing of information, the knowledge may be dormant in memory and not available for use unless externally prompted.

Different types of tasks are needed to differentiate knowledge that is used spontaneously to solve problems from knowledge that is available only when cued by the context. Everyday behavior requires spontaneous use of prior knowledge to make social judgments or other decisions; the possession of factual knowledge is far removed from problem solving in everyday encounters. The TAT stories of the incarcerated youths referred to earlier show how knowledge may be stated without being embedded in a usable framework. Projective measures differ from typical cognitive tasks and self-report measures of personality in ways that have relevance for distinguishing between inert and useful knowledge. Rather than being asked to provide specific information about the self, the respondent is provided with a relatively unstructured task and asked to give an open-ended response that is evaluated by a professional. Likewise, solving verbally communicated problems such as those on intelligence tests does not resemble the conditions of everyday situations. Such problems tend to contain words that act as cues for accessing relevant knowledge, whereas everyday problem-solving requires the individual to size up relevant features of the situation to notice that there is a problem (Sternberg, 1985). Real problems are accompanied by emotions that shape the way the individual thinks about the problem. Individuals may verbalize several acceptable alternatives for handling a situation, if prompted. Nevertheless, when faced with real-life dilemmas, they may be unable to translate this knowledge into appropriate action. The projective tests reveal the spontaneous accessibility of the schema that guide the application of what comes to awareness.

4.16.3.4 Affective and Motivational Influences on Cognition

Most studies in this area focus on the way mood or emotion influences memory and judgment (Isen, 1984, 1987) and on unconscious processing of information (Uleman & Bargh, 1989). Epstein (1994) describes the impact of emotion on thinking and argues for the existence of two fundamentally different but interactive modes of information processing: a rational system and an experiential system driven by emotion. Thinking in the experiential system is shaped by emotion and tends to occur without deliberate effort or conscious awareness. The rational system is characterized by a more deliberate process of thinking and of
466 Assessment of Schema and Problem-solving Strategies with Projective Techniques

acquiring information such as through textbooks and direct teaching. Epstein explains that
stories and anecdotes spice up an otherwise dry lecture because they increase emotional
engagement and thereby appeal to the experiential system.

Emotions dramatically alter the thought process and influence behavior. Thinking is
transformed by intense emotion in the direction of being categorical, personal, concrete,
unreflective, and action oriented. The product of such thinking is likely to be considered as
self-evident and not requiring proof (Epstein, 1994). Depressed affect is associated with
preference for immediate gratification in place of delayed but more substantial rewards
(Wertheim & Schwartz, 1983). Anxiety tends to bias the interpretation of ambiguous text
(MacLeod & Cohen, 1993). Chronically experienced negative emotions appear to have a
cumulative influence on the development of cognitive structures amenable to assessment with
projective techniques. Children temperamentally prone to experience negative affect told
stories to TAT stimuli that reflected less complex information processing and more reactive
self-regulatory styles dominated by more immediate concerns than those of their less
emotional peers (Bassan-Diamond, Teglasi, & Schmitt, 1995).

The following vignette shows how intricately problem solving is tied to emotions (Reader's
Digest, May, 1982, p. 79. Quoted from Jim Whitehead, quoted by Seymour Rosenberg in
Spartanburg, SC Herald):

Two hikers were walking through the woods when they suddenly confronted a giant bear.
Immediately, one of the men took off his boots, pulled out a pair of track shoes and began
putting them on. "What are you doing?" cried his companion. "We can't outrun that bear, even
with jogging shoes." "Who cares about the bear?" the first hiker replied. "All I have to
worry about is outrunning you."

Now what if the two were husband and wife, parent and child, or very close friends? The
emotional investment in the relationship would change the problem to how can "we" escape or
even how can "you" reach safety. A parent may suggest that the child run for it. The
automatic influence of affect in defining the range of acceptable solutions to problems is
evident. Westen (1993) observes that cognitive psychologists do not focus sufficiently on the
motivational and emotional factors of unconscious processing of information. Affect and
motivation can enhance or disrupt the degree of organization and resourcefulness in the
acquisition and subsequent application of knowledge structures.

4.16.3.5 Narrative Psychology and Thematic Apperception Techniques

The analysis of narrative has been a rich source of data in studying the schema or scripts
that individuals bring to current understanding of experiences. The social-cognitive approach
examines the memory system by analyzing the individual's style of processing and organizing
information. Memories such as self-defining experiences (Moffit & Singer, 1994) or incidents
of being rejected (Baumeister, Wotman, & Stillwell, 1993) are examined to extract abstract
rules and cognitive principles. A study on script formulation (Demorest & Alexander, 1992)
first asked participants to generate autobiographical memories. Scripts extracted from these
memories were subsequently (one month later) compared with those derived from participants'
invented stories in response to affective stimuli. The scripts drawn from fictional stories
were similar to those derived from autobiographical memories. These findings suggest that
scripted knowledge structures are superimposed on new affective stimuli.

Schank (1990) addressed the question of how stories exchanged during the course of social
conversation are accessed in memory. Understanding how a person is reminded of a story to
tell at the right moment in a conversation requires breaking it down into themes, plans,
goals, actions, and outcomes (Schank, 1990). These elements constitute a filing system for
indexing the relevant components of experiences to be stored in memory and subsequently
retrieved. A higher level organization of these concepts involves a prediction, a moral, or a
lesson learned. When a lesson repeatedly occurs, it becomes a type of structure that exists
apart from the specific stories from which the lesson arose.

For Schank, telling a good story at the right time is a hallmark of intelligence because it
represents an understanding of what is being talked about now through its connection with the
lessons that the listener has stored in memory. Recalling an appropriate story to tell and
understanding how it relates to what is currently being spoken about depends on how the
stories are catalogued in memory. It seems logical that individuals who engage in more active
and reflective classification of experience for storage into memory are also more likely to
be reminded of an appropriate story at the right moment in a conversation. This framework is
directly applicable to the interpretation of stories told to TAT-like stimuli. Respondents
are reminded of a story by the stimulus, but the particular story details are shaped by
convictions or lessons learned from experience.

One way to validate constructs is through the congruence of models drawn from various lines
of research. Arnold (1962) proposed that individuals express their basic attitudes or
convictions in their TAT stories, and that these convictions constitute their motivational
set. Like Schank, she focused on goals, plans, actions, and outcomes as the basic ingredients
of the story import or lesson learned.

The rise of narrative psychology has the potential to enrich story-telling methods of
personality assessment. Analysis of narratives such as those elicited by the TAT is
considered useful for clarifying the degree of complexity and organization of information
processing pertaining to the synthesis of experience (Carlson & Carlson, 1984). These
cognitive and perceptual processes can also be assessed through the Rorschach technique. It
has been suggested that the Rorschach should combine its emphasis on perception with a focus
on cognitive representation (Blatt, 1992).

4.16.4 LEVELS OF PERSONALITY ASSESSED BY PROJECTIVE AND QUESTIONNAIRE METHODS

Projective methods measure different dimensions or levels of personality than self-reports.
One of the clearest ways to understand the unique contribution of projective techniques is in
the context of understanding these levels. McAdams (1995) proposed three conceptually
distinct levels of understanding personality: level one (traits or stylistic, habitual
tendencies); level two (developmental or motivational constructs such as goals, plans, and
strivings); and level three (the life narrative or the evolving story that is internalized to
provide broader meaning, purpose, and cohesiveness to specific experiences). McAdams does not
describe relationships among the levels, though it seems logical to assume that they are not
independent since they are different perspectives (or units) on the functional whole. For
Klopfer (1981), a trait itself can be understood at three levels: first, as viewed by
significant others or the public image (with or without awareness); second, as viewed by the
individual or conscious self-concept; and third, as manifested (with or without awareness) in
behaviors such as responses to projective tests. This emphasis on the source of the data is
consistent with a fundamental distinction between two levels of personality, based on the
perspectives of the actor and observer (Hogan, 1987). Projective tests were better able to
predict peer rankings on four personality traits (publicly observed image) than was the
individual's self-report of those traits (McGreevy, 1962). Different techniques may be useful
for elucidating different levels or dimensions of personality. However, a comprehensive
understanding requires a framework for how the different levels relate to one another.

Measures of achievement motives based on self-report of intentions, goals, and reasons for
actions (self-attributed) and those based on inferences drawn from stories written to picture
stimuli (implicit) are not equivalent (Koestner, Weinberger, & McClelland, 1991; McClelland,
Koestner, & Weinberger, 1989). McClelland and colleagues propose that the general absence of
significant correlations between measures of self-attributed motives derived through
self-report, and implicit motives inferred from stories, cannot be ascribed to the
worthlessness of projective measures nor to poorly designed questionnaires, but must be taken
seriously as evidence that these are in fact different variables with the same name. Rather
than seeking to remedy deficits in either type of measure, they suggest that it may be more
fruitful to acknowledge them as reflecting two qualitatively different kinds of human
motivation. They contend that implicit motives develop through intrinsic enjoyment generated
when doing tasks or experiencing activities, and predict spontaneous goal-directed actions
sustained over time even in the absence of specific social demands. However, self-attributed
motives are built around explicit social incentives or demands of socializing others, and
predict responses to situations structured to provide such incentives. The implicit
achievement motivation does not give information about the area of life to which a person
will direct efforts to succeed. Self-attributed motives, plans, and goals may express the
person's conscious intentions but do not give information about the person's commitment and
capacity to follow through. Under specific conditions, self-attributed motives act like
implicit motives in that they energize, direct, and select behavior. However, the problem
with predicting long-term behaviors from self-attributed motives is that the social
incentives may not be salient enough to elicit the behavior. Implicit motives exert
relatively greater influence because they drive activity that is inherently enjoyable even in
the absence of specific social demands.

The acquisition of implicit and self-attributed motives such as proposed by McClelland and
colleagues may relate, respectively, to the experiential and rational systems discussed
earlier (Epstein, 1994). Learning in the experiential system takes place in an affective
context where emotions and interests drive attention
and information processing. Additionally, the development of these two types of motives may
involve different principles of learning (Meissner, 1974, 1981; Raynor & McFarlin, 1986;
Sandler & Rosenblatt, 1962). One pertains to the acquisition of cognitive skills that are
functional and enlarge the individual's repertoire of adaptive capacities through the
teaching and reinforcement of these competencies. The other relates to the development of the
inner world or the self-system that provides relative independence from external incentives.
The advantage of developing this inner structure is to shift from dependence on external
supports to increased reliance on self-regulatory mechanisms (Meissner, 1981). These
regulatory functions increase the individual's capacity to master the environment, endure
stress, and delay gratification of impulses.

The development of increasingly complex and differentiated self-regulatory structures or
schema takes place at a level of organization that is different from that of acquiring
behavioral patterns through reinforcement. Such development relies on the individual's
integration of life experience to seek consistency within the self (Raynor & McFarlin, 1986).
However, emotional, cognitive, and attentional limitations interfere with synthesis of life
experiences and detract from the development of inner self-regulatory structures. For
example, the domination of awareness by that which is immediately striking or concrete is not
compatible with the development of long-range interests, values, or aims. The primary
affective impediment to the internalization of such self-regulatory values is the limitation
in the capacity to sustain emotional involvement in interests that are relatively remote from
personal needs (Shapiro, 1965).

Motivation can be understood in relation to two necessary ingredients: the goal or intention
and the self-regulatory capacity to sustain goal-directed action (Arnold, 1962; Kuhl &
Beckmann, 1994). Thus, a stated goal or intention such as playing violin in a symphony
orchestra may be difficult to implement and sustain, regardless of musical talent, because of
the rigorous independent practice required. Such sustained effort is facilitated by the
enjoyment of and interest in the activity. Otherwise, careful programming of external
incentives is needed.

Understanding the gaps between a person's stated goals or intentions and the individual's
self-regulative resources is crucial to planning interventions. A 19-year-old college student
was referred by her therapist for an evaluation because no meaningful discussions were taking
place after six months of regular attendance of sessions. Sandra was in therapy at the
insistence of her family because "she had no direction in life." During the initial interview
with Sandra, she described her erratic schedule of eating and sleeping, her general lethargy
and boredom, as well as sporadic class attendance. She reported having similar problems in
high school such as difficulty getting up, missing school once a week, not studying, and
cheating to get by. She stated emphatically that she was not depressed. When asked directly
about drug use, Sandra reported significant involvement starting in the fifth grade. She did
not discuss this issue with her previous therapist because "no one asked me." She described
the therapist as not providing enough structure during the sessions so she just talked about
meaningless topics. Sandra verbalized her desire to achieve and recognized that she was not
accomplishing her goals. At the same time, she admitted that she found school work aversive.

As part of the evaluation, the TAT, the Wechsler Adult Intelligence Scale (WAIS-R), and
Rorschach were administered among other measures. The first five TAT stories that she told
are given below to illustrate the lack of connection between Sandra's goal setting and means
toward their attainment. The emphasis is not on developing a comprehensive clinical picture
but on presenting general conclusions about each story that are pertinent to Sandra's
motivational schema:

Card 1. Don't like pictures. Not all of them . . . Umm . . . like story, like once upon a
time. This is a boy . . . He's . . . OK, he . . . is it a violin? I don't know. Can I say two
things? (Whatever you want). He's either frustrated because he can't play it or he's sad
because he's not allowed to play it anymore. He wants to play it but for some reason he
can't . . . (stares at picture) (TO) He becomes a great (did I say that was a violin?) violin
player, and he's happy. (OK, I think you have the idea.)

Sandra is immediately put off by the task and vacillates in her interpretation of the picture
as she does in her own intentions. The story content is appropriate to the stimulus, but
Sandra cannot determine whether the boy is experiencing internal (frustration) or external
(not allowed) barriers to achievement. In any case, the boy takes no heed of these problems
and becomes a great violin player, seemingly without effort. This unrealistic connection
among circumstances, intentions, actions, and outcomes suggests that Sandra's schema do not
include a clear grasp of the obstacles to achievement nor of the steps needed to attain
goals.

Card 2. God! These things are . . . OK, there's . . . (stares) . . . This girl on her way to
school, she's walking to school, and she sees this man working in the field, and that's his
wife watching him. His wife's talking to him I guess, may be talking to him. He's working
with his horse on the farm, and she's thinking that she doesn't want to look like that when
she gets older (like the lady). She doesn't want to live on the farm. She feels sorry for
that lady, for both, for the lady. So she goes to school to get an education so she's not
like them.

The girl goes to school not because she really wants to learn but to avoid becoming what she
fears. Sandra's schema about the importance of education does not include a positive
affective quality nor the spontaneous accessibility of the effort involved. Sandra does not
delve into the purposes and inner life of the characters but relies on the stimulus, on
external appearances (doesn't want to look like the lady), and social convention (getting an
education). Thus external structures and supports are important in guiding her reactions.

Card 3BM. Don't know what this is . . . umm . . . guess it's a girl. A boy or girl, either
sleeping, crying, or . . . umm . . . (frustrated) don't even know what. (repeat directions) I
guess she's crying because . . . OK, this doesn't make sense though. (stares as if answer is
in the card) I think they're car keys. No idea. This is dumb. She's crying because . . . what
is that thing is . . . someone stole her car. She has no way home from this place. So then
she calls the police and tells them that her car is missing. Eventually they find her car,
and then she's happy. Can you tell me what that is? (It can look like different things to
different people.)

Sandra initially vacillates before settling on a plot. She became frustrated, stated that the
task was dumb, but responded to encouragement. This tendency to blame outside factors is
consistent with her reliance on the environment to regulate her behavior. The outcome of
getting the car back only deals with part of the problem. Having "no way home from this
place" was not addressed. Sandra's incomplete processing of information (only dealing with
part of the problem, leaving out details called for in the directions) and limited resources
(no internal representation of possible support; indecision and disjointed approach to
narrating the story) suggest why she cannot regulate her behavior.

Card 4. OK, . . . this is a man and a woman who . . . they either just had a fight, or he
wants to go somewhere, and she doesn't want him to go. OK, she's trying to persuade him
either not to go or to forgive her. And then, I guess eventually he does it, he forgives her
or he doesn't, or he might go, or he'll stay. And they live happily ever after.

Again, Sandra cannot commit to one story line, and the characters' intentions or reasons for
acting are not examined. As with the previous stories, she seems unable to develop her ideas
clearly beyond the initial identification of the tension depicted in the card, suggesting
dependence on external stimuli. Her vacillation about what's going on in the picture (noted
in several picture cards) suggests that she feels uncertain about her judgment in situations
that are somewhat ambiguous. As in previous stories, the ending is unrealistically positive
as Sandra does not make appropriate connections among intentions, actions, and outcomes.

Card 5. OK, can you make up people who aren't there? This is a lady who is coming to tell her
children it's time for their nap. So she sticks her head in and says it's dinner time, and
all the kids come to dinner. Everybody eats dinner, and kids go back and play. In the
library, den, I think it's supposed to be the library.

Sandra's query about introducing additional characters further points to her need for
external guidance. The lady intends to tell the children that it's time to take their nap but
ends up calling them to dinner. This twist in the story suggests that Sandra is not
monitoring story details (also seen in other stories) and that she has similar difficulty
keeping track of her own intentions.

Generally, Sandra is able to recognize the conflict or tension in the pictures but has
difficulty sticking to one explanation. Sometimes, she does not monitor story details
sufficiently, leaving gaps or contradictory details. She does not examine feelings, thoughts,
or intentions and has difficulty connecting planful, realistic actions with outcomes. She
cannot develop the themes that she introduces beyond a vague and Pollyanna-like story. Sandra
persists in staying in school because she believes that the way to ensure her future is to
get an education but is not able to get invested in the process of learning. Sandra is
floundering in the unstructured campus setting because she has not developed the inner
structures (implicit motives) to regulate her attention and behavior.

When she received the results of the evaluation, Sandra easily identified with the example of
someone who thinks an exercise program is needed and joins an aerobics class but does not
attend for one reason or another (too tired, too cold). She agreed that even the thought of
doing school work is aversive, and the key was in finding ways to side-step the aversive
elements, possibly through social supports. She acknowledged that by joining a sorority, she
would have peers to eat with on a regular basis, and the structure would help her keep a
schedule and attend class. At first, she worked on seeking the external incentives that would
motivate her behavior and through these initiatives began to feel she had some choices and
control over her life. Subsequently, therapy sessions focused on ways to build these
structures and prompts into her daily routines to gain increasing independence.

Sandra's schema as assessed with the TAT did not include connections among goals, plans,
actions, and outcomes, and these resources were not available to guide her daily behavior.
The therapeutic aim was to help Sandra deal with the factors that impeded the development of
these connections. She could not have verbalized these impediments in an interview or
self-report measure because she was not aware of them. Likewise, the more structured Wechsler
scales did not reveal the sources of her difficulty. Therefore, sole reliance on such
measures would have been insufficient.

4.16.5 SELF-REPORT VS. PROJECTIVE PERSONALITY TESTING

The foregoing discussion has already demonstrated that self-reports, even if candid and
accurate, may predict behavior only when social incentives are operating or when the
situation is structured to prompt the response. However, it is useful to contrast
questionnaire and projective devices as direct and indirect methods of personality assessment
(Vane & Guarnacia, 1989). Direct methods comprise observations of behavior, inventories, or
interviews. The indirect method utilizes relatively unstructured stimuli such as the TAT,
Rorschach, or drawings. In the latter, the information sought is not readily apparent and,
therefore, difficult to manipulate. Vane and Guarnacia cite the following problems with the
direct method: (i) limitations of self-knowledge preclude accurate self-reports; (ii) test
items are open to misinterpretation; (iii) real-life situations cannot be represented by
pencil and paper items; (iv) desire of the respondent to manage impressions needs to be
overcome by including scales to detect faking or response sets; (v) inventories yield
information without clues for understanding the reason; and (vi) the relationship between
traits on inventories and behaviors is complex.

Multiple choice or other highly constrained response formats do not give clues about the
style of organizing the response. The respondent is asked to tell versus show. Thus, no
evidence exists beyond self-report to judge response parameters despite availability of
norms. Finally, responses to structured questions only tap conscious phenomena. Therefore,
methods such as interviews or self-report inventories provide what the individual is capable
of perceiving and is willing to share. However, responses to projective testing can be
evaluated by analysis of various parameters.

Both direct and indirect sources of information are useful depending on the purpose.
Self-reports are susceptible to conscious or unconscious distortion and reveal only one
dimension of the personality. Nevertheless, it is important to obtain information that the
respondent willingly discloses. Self-reports are easily scored and amenable to traditional
methods for estimating reliability and validity. Projective methods require extensive
training, are time-consuming, and require more cumbersome methods to establish psychometric
credibility. However, their purpose is not evident to the respondent, and they provide
information that is different from other sources of data. The use of both direct and indirect
methods is likely to yield the most accurate picture.

4.16.6 CONTRIBUTION OF PROJECTIVE TECHNIQUES TO DIAGNOSIS AND INTERVENTION

The process of responding to the projective task can be conceptualized as an attempt to
impose meaning on the stimuli presented and to comply with the instructions given. Analysis
of the response reveals the individual's schema or scripts, both in terms of the structural
organization of experience and the content of awareness. Appreciation that the manner in
which individuals acquire, organize, and represent knowledge into cognitive schema has a
fundamental influence on the way they behave guides the use of projective techniques. Rather
than looking for one-to-one correspondence between test responses and inferences, such as
searching for aggressive content in projective tests, it is more fruitful to explore
processes that promote aggression in particular situations. The social-information-processing
model of children's social behavior (Crick &
Dodge, 1994) details the social-information-processing steps that precede aggressive behavior
in specific situations. According to this model, aggressive behavior can be understood in
terms of how individuals encode social cues, how they interpret those cues, how they select
and clarify goals, how they generate possible solutions to perceived problems or dilemmas,
how they make decisions about the selection of responses, and how they execute and monitor
the selected behavior. Information regarding each of these steps in social information
processing can be gleaned from responses to projective techniques.

Thematic apperceptive methods can provide information about the attribution of intentions for
others' reactions, the interpretation of social cues, the anticipated consequences of
alternative actions, as well as about the cognitive flexibility and organization of the
individual's meaning structures. Information about the execution of the behaviors is provided
at two levels: how the characters follow through on intentions and resolve problems, and how
the narrator develops and monitors the story. Any aggressive content that is expressed can
then be better understood in light of the aforementioned processes.

Responses to the Rorschach also clarify social information processing that promotes
aggression. For example, a preponderance of answers that simplify the stimulus (e.g., high
lambda) suggests that the individual may be involved in confrontational situations because of
a tendency to make decisions without considering important cues (Exner, 1993). The connection
between such responses to projective tasks and aggressive behavior is borne out by research
suggesting that aggressive children selectively attend to and recall aggressive cues, partly
because they respond prior to processing all of the available information (Dodge & Feldman,
1990).

A taxonomy of psychopathology comprised of a descriptive, atheoretical compilation of symptom
categories (e.g., Diagnostic and statistical manual of mental disorders, 4th ed. [DSM-IV])
could be enhanced by a framework that relates these symptoms to basic structures of
psychological organization. Within a diagnostic category such as autism, there exists a wide
variability of impairment in functional capacities. Responses to projective techniques can
range on a continuum from being flexibly adaptive to demonstrating various levels of
impairment that can go beyond a specific diagnosis. The manner in which an individual uses
the cues in the stimuli and responds in an organized way to the task demands is an index of
the differentiation and organization of previously developed schema. Furthermore, responses
reveal whether the person must be prompted to respond or actively restructures knowledge for
adaptive use in stressful or ambiguous situations.

Horowitz (1991) envisions a future DSM Axis II as incorporating three interconnected
components: (i) recurrent maladaptive patterns of self-regard and interpersonal behavior;
(ii) person schematic characteristics explaining that behavior (this would include the
developmental level of self-organization); and (iii) style of self-regulation comprised of
habitual control processes leading to coping and/or defense. An example would be how a person
regulates emotionality by inhibiting or facilitating the activation of schema. Horowitz's
conceptualization implicitly recognizes that any Axis I diagnosis, such as attention-deficit
hyperactivity disorder, relates to the development of inner schema. Difficulty regulating the
attentional process may interfere with the active and strategic synthesis of knowledge gained
through experience. Without organized schematic structures, the individual may not be able to
govern behavior according to internalized rules and standards, a prominent characteristic of
individuals with attentional deficits (Barkley, 1990). Responses to open-ended projective
tasks represent the convergence of all of the individual's traits on the organization of
experience, including compensatory strategies that could mitigate risk factors.

A changing emphasis within psychoanalytic theory favors the view of psychopathology as
representing an impairment in the formation of psychic structures that are analogous to the
schema of social and cognitive psychology. This focus has spurred diagnostic efforts to
evaluate the quality of these structures (self-system, object relations). Blatt (1991)
assumed that the symptomatic manifestations of various forms of psychopathology are
associated with different types of impairments of cognitive-affective structures in the
representational world. For example, the growing consensus about the existence of two basic
subtypes of depression (empty and guilty) is based on the phenomenology of experience
associated with each. One type involves exaggerated preoccupation with issues of
interpersonal relatedness, feelings of depletion, dependency, helplessness, or loss. The
second type, occurring later developmentally, is associated with issues of self-definition,
autonomy, guilt, and feelings of failure. The two types of depressed individuals change in
various ways in the treatment process and are differentially responsive to different forms of
therapy (Blatt et al., 1988). Wilson (1988) showed how each of these forms of
472 Assessment of Schema and Problem-solving Strategies with Projective Techniques

depression is demonstrated differently on the Rorschach and TAT protocols. For example, stories told to TAT pictures with single characters that describe the person as empty, worthless, or helpless without introducing other characters exemplify the first type of depression. Such stories suggest that the narrator is dependent on what is immediately evident in the surroundings without being able to use inner resources to represent the support needed. The vulnerability to depression of an individual lacking such resources may be moderated by a supportive environment.

The profession is in need of a diagnostic system that identifies variables useful for the treatment process (Leve, 1995). Such a system would emphasize characteristics that interact with methods of intervention and, therefore, would serve as guidelines for their selection. The variables that Leve tentatively proposed are grouped into three categories: cognitive, environmental, and social-emotional. The cognitive and social-emotional categories include the following characteristics, among others: adequacy of causal reasoning, social-emotional reality testing, moral reasoning, ability to form and maintain close relationships, ability to identify and express emotions, as well as internal or external locus of control. Leve (1995) reasoned that cause-effect thinking is essential to the success of psychodynamic therapies since developing insight is part of the process. Cognitive treatments, however, do not require as high a level of causal reasoning because the therapist is more actively training the client either by explaining explicitly the causal connections or bypassing explanations altogether through giving behavioral instructions. Likewise, different therapies require different degrees of social intimacy for success, although all therapies have an interpersonal component. The interface of psychological variables and therapy processes needs to be refined through further research. It is evident, however, that projective techniques can provide important data about these types of variables.

A fundamental consequence of schematic processing is the perpetuation of the schema in the face of conflicting evidence. Distorted schema that bias attention and information processing to conform to the original conception are related to psychopathology (Horowitz, 1991). The assessment of schema can guide psychotherapy by identifying these unconscious schematic distortions (Blatt, 1991). Bringing to awareness different schema of self (e.g., actual self, ideal self, ought self, dreaded self) and other resulted in positive therapeutic changes in both psychodynamic and cognitive therapies (Singer & Salovey, 1991).

The clinician needs a flexible repertoire of assessment techniques including projective methods because the information that each one provides is not equivalent. No method supplants or takes priority over others but contributes its part to the functional whole. The contribution of projective techniques is best addressed by looking at their place among other measures. The content and structure of responses to projective techniques can be compared with information from self and other reports. The problem-solving strategies and organization of ideas as expressed in projective responses can be compared with how these processes are expressed in more structured tasks. Each source of data is conceptualized according to its contribution to the cohesive patterns that include different levels of personality and variability in functioning under different conditions. A synthesis of these multiple layers permits more precise insight about functioning in various contexts.

Projective techniques clarify the organization and structure of the inner world. Their use is warranted whenever inner life and complex self-regulatory functions are important considerations, particularly when respondents are lacking insight or would be motivated to distort self-report. The task itself involves the utilization of previously organized knowledge structures to provide a response that is not well rehearsed. The linkages among the various elements of the response promote an understanding of the individual's organizational framework. Projective techniques add a unique dimension to the assessment by revealing the respondent's strategies for accomplishing the task and, at the same time, showing the content and organization of ideas that occupy awareness.

4.16.7 PROJECTIVE TECHNIQUES AS PERFORMANCE MEASURES OF PERSONALITY

The projective hypothesis implies that every human action and reaction bears the characteristic features of individuality (Rappaport, Gill, & Schafer, 1968). Therefore, projection is basic to every response process rather than specific to a certain set of stimuli. Essentially, any test is a performance measure designed as an analog to a life task. Projective devices are simply stimuli with known qualities that are used to elicit samples of behaviors that correlate with other behaviors.

The question has been raised about what aspects of responses to the Rorschach stimuli reflect compliance to task demands and what elements entail projection. Exner (1989) suggested that most responses are simply best-fit
answers, and that projection on the Rorschach occurs only in responses that deviate from the norm or are elaborated beyond the stimulus field. This view assumes that projection on the Rorschach is an attribute of the response and not inherent in the task. Exner explains that projection is not encouraged by the instructions, which simply call for answering "what might this be," nor by the blots, which are not completely ambiguous. He points out that techniques such as the TAT force projection by asking the respondent to develop a story that goes well beyond the stimulus provided. Yet, the story-telling task also makes problem-solving demands (Holt, 1961); the narratives are expected to be organized productions rather than fantasies or random associations. Just as identification of blot contours is expected to be guided by reality constraints, so are perceptions of emotions and relationships as well as the sequence of events in stories told to TAT stimuli.

Exner claims that the operations contributing to the formulation of responses to Rorschach stimuli such as scanning, encoding, classifying, refining, evaluating, discarding, and selecting are cognitive and not projective. However, in keeping with the projective hypothesis, individuals are expected to superimpose unique styles of organization on each of these cognitive operations. The distinction between problem-solving and projection is not an either/or, but a matter of relative emphasis. All projective tasks have problem-solving elements, and all responses are influenced by individual sets or internalized schema. Those who favor the projective viewpoint insist that such inner structures are fundamental to the response process. The conceptualization of the Rorschach as a perceptual-cognitive-behavioral task has been criticized on the grounds that this approach does not give sufficient weight to the role of stimuli from the inner world in the interpretation of stimuli from the external world (Willock, 1992).

As discussed earlier, projection is an ongoing process that comes to the fore if the situation is novel. In a routine, overlearned situation, individuals will respond with familiar, scripted knowledge. However, even a highly structured situation can be misinterpreted. When ideas on the more structured cognitive tasks are associative or disorganized, these cognitive difficulties will be magnified in responding to the less structured performance tests. One youngster who demonstrated serious problems in the application of rote knowledge in a relatively structured task also showed impaired thought process on the projective tests. To the Wechsler Intelligence Scale for Children (WISC III) comprehension item, "what would you do if you saw thick smoke coming from your neighbor's home?" he responded: "Stop, drop, and roll." This response was an association cued by some of the words in the item without understanding the entire question. The fire was in the neighbor's home, yet the child produced a rote response as if he were personally experiencing the fire. He had rehearsed this scenario at school, but was unable to apply the concept even under relatively structured conditions.

The capacity to organize the inner world and deal effectively with distracting thoughts, emotions, or motives should translate into the ability to tolerate and to deal successfully with ambiguity, complexity, and apparent contradiction in a variety of situations. Therefore, the experience of ambiguity and complexity may stem not only from reality demands but also from the individual's emotions and motives (Blatt, Allison, & Feirstein, 1969). A framework is needed for conceptualizing how projective techniques such as the Rorschach and TAT shed light on cognitions in ways that differ from more structured tests. Differences and similarities in task demands of various projective techniques also need to be understood.

The response to the Rorschach task involves the matching of ambiguous stimuli with a memory trace (Exner, 1993). The goodness-of-fit or perceptual match between the blot contours and the object reported constitutes the basic element of reality testing. When telling a story about a picture portraying one or more people, what is demanded is not simply perceptual matching required by the Rorschach task (Beck, 1981) but experiential matching. The narrator searches for a relevant explanation for the picture, then marshals possible details from memory to satisfy the instructions and meet criteria for an acceptable story. If the stimulus is highly stereotypic, then finding an exact story from memory would be sufficient to accomplish the task. However, if the narrator encounters a picture that cannot be explained by existing schema, then the individual actively constructs the response and fills in details from various schema. Schank (1990) suggests that what exists in memory is a database of partial stories or story elements rather than whole ones. When telling a story to a pictured scene, the narrator draws on this database to construct a set of events and inner processes (characters' thoughts and feelings) that captures the gist of the stimulus.

All tasks are on a continuum of how much organization is required in making meaning of the stimulus and developing the response. The more open-ended the task, the greater the demand to construct the response actively
rather than to respond in rote fashion to the stimuli. Therefore, projective techniques are particularly useful when adjustment to routine circumstances is good, but the individual is experiencing problems in less structured situations. Task-by-task analysis of the assessment battery permits the identification of the specific processes or competencies required for each task and the life situations to which these competencies can generalize.

A wide range of human ability lies outside the domain of standard cognitive tests. It is these competencies in the broad sense that are within the purview of projective testing. The primary advantage in regarding projective methods as performance measures of personality is that various tasks and measures can be differentiated in their demands. Generalization from one test performance to another and to real-life performance then can be based on similarity of the demands. If projective techniques are treated as performance measures of personality, then it is possible to delineate task requirements, norms, and expectations about the product as seen with the Rorschach. In addition, it is possible to delineate linkages of that product with the acquisition of prior knowledge structures and with the application of these resources to similar problem-solving situations.

The term "performance test of personality" is proposed because it captures two key dimensions of projective techniques: (i) the problem-solving aspect of meeting specified performance expectations such as form quality on the Rorschach; and (ii) the organization of inner resources or schema brought to bear on the production. This dual approach to understanding projective techniques is shown in Table 1.

The distinction between self-report and projective measures has been described earlier. Self-reports ask the respondent to tell about the self, whereas projective measures require the performance of a task from which psychological processes are inferred. If we want to know how a person solves certain types of math problems, rather than ask the individual to report on how well he or she can solve linear algebraic equations, we would present some sample problems. Similarly, if we want to assess personality, we could obtain information through self-report by asking questions such as "How flexible are you?" or "How do you interpret cues in an unstructured situation?" In contrast, we could request that the individual perform a task that demonstrates how various psychological processes are applied to meet the problem-solving demand.

Tests such as the Wechsler Scales are generally seen as measures of cognition, whereas the TAT and Rorschach techniques are tasks that are thought to reveal personality. This distinction is not entirely a function of the task but includes how the performance is evaluated. Human figure drawings, for example, have been used to estimate both cognitive level and personality functioning. The essential characteristics of tasks that measure personality, in contrast to cognition, are described below.

4.16.7.1 Total Personality

As do life situations, tasks vary in terms of the amount of structure, cues, or prompts provided for responding. As the expression goes, "It's not what you know but when you know it that matters." To be useful, knowledge structures must be spontaneously available when circumstances warrant. In the psychological test battery, personality performance tasks provide minimal guidance to meet the problem-solving demands and, thereby, permit the assessment of spontaneous strategies.

Generally, cognitive measures such as tasks on Wechsler scales do not reproduce the conditions encountered in everyday situations. Sternberg (1985) distinguished between analytic intelligence that is assessed by more structured tasks and practical intelligence. Analytic problems provide all necessary information and have a single correct solution separate from the emotional and social context. Real-life problems are less clearly defined and typically call for information seeking. Note the following question on the WAIS-R: "In a movie theater, you are the first person to notice smoke and fire. What should you do?" The respondent is cued that the expectation is to do something. If this were a real situation, the person would not know that he or she was the first to see the smoke and fire. The person may not take responsibility with others around and, therefore, delay action. The individual may consider specifics of the context such as proximity to the fire, its size, or the availability of a fire extinguisher. However, the individual's emotional reaction may disrupt thinking. The actual response would involve the total personality, not merely the intellectual component.

The limitations on generalizing from responses to structured tasks to real-life conditions are evident when we consider the attributes that these measures do not assess. These include being aware that a problem exists or that a task needs to be done; setting priorities and planning toward their implementation; pacing and self-monitoring; seeking or utilizing feedback; sustaining long-term interest in independent activities; taking necessary risks; organizational skills; and recognizing and
Table 1 Task analysis of performance measures of personality.

Technique: Drawing
Ideal performance: Symmetry, coherence, realism, match with instructions
Competencies needed: Plan execution of drawings within allotted space and limits on drawing ability; organize details with context, handle frustration; investment in the task
Inner life (how experience is organized and internally represented): Reality testing and quality of thought process; use of schema and organizational strategies; specific preoccupations or concerns

Technique: Narrative
Ideal performance: Story matches picture and incorporates instructions, appropriate transitions and cause-effect connections; synthesis of various dimensions of experience within individual characters such as integration of inner life with external circumstances; balance among views and needs of all characters depicted; balance between excessive detail and vagueness; realistic, cohesiveness among intentions, actions, and outcomes; appropriate time perspectives
Competencies needed: Draw on personal experience to construct a story that adequately explains the picture; modulate affects and recognize tensions depicted in the stimulus; organize, plan, and monitor details of story for cohesiveness and inner logic and for compliance with instructions; direct attention from one aspect of experience such as feelings to actions and outcomes; initiative to transcend the pictured cues to describe intentions and purposeful actions to resolve tensions; bring general abstract principles to bear on the task for optimal integration of multiple dimensions of the stimulus and coordination of inner and outer aspects of experience (thought, feeling, action, and outcome); investment in the task
Inner life (how experience is organized and internally represented): Reality testing and quality of thought process; understanding social cues; strategies to organize experiences; specific preoccupations and concerns; nature of inter- and intrapersonal schema or scripts brought to interpretation of experiences and picture stimuli (experiential matching)

Technique: Inkblot
Ideal performance: Accuracy and specificity in matching percept to form; organizing various components of the blots; balance among form and other determinants as well as between precision and vagueness in form definitions; logical and responsive communication during inquiry
Competencies needed: Adequate investment in responding; realistic perception and organization of stimulus components; understanding of hierarchical relationships implicit in balancing form with other determinants; comfort with matching precision of percepts to relative imprecision of blots; confidence with ambiguous stimuli; systematic approach to the task
Inner life (how experience is organized and internally represented): Reality testing; strategic processing of information and connecting new input with previously acquired sets or schema (perceptual matching)

Note: The task is somewhat different for each specific set of instructions and stimuli (e.g., draw-a-person in the rain vs. draw-a-person). Similarly, each Rorschach card and/or TAT picture presents a unique stimulus configuration.
responding appropriately to subtle interpersonal cues. These attributes are traditionally viewed as pertaining to the personality domain and reflect the interplay of cognitive and affective processes. A response to the TAT involves the interpretation of the emotions and tensions depicted to recognize a problem, whereas a verbally communicated question or problem often contains cues to facilitate the response, and such cues may not exist in a real-life version of the scenario. Likewise, the Rorschach presents the respondent with options for interpreting the blot contours, organizing the percepts, and communicating them to the examiner.

4.16.7.2 Many Correct Solutions

Just as real-life dilemmas are amenable to diverse resolutions, personality performance measures can be approached in several ways. Therefore, many correct solutions are possible. Responses can differ according to the individual's interpretation of the stimuli and organization of the response. Variability, rather than uniformity, is the expectation with personality performance tasks. Rather than imposing specific criteria for correct responses, such tasks set a general expectation to deal with the stimulus realistically, follow instructions, and logically organize the response. In contrast, most cognitive tasks have one correct solution regardless of whether they call for rote knowledge or a more flexible application of prior knowledge. These distinctions between cognitive and personality performance tests are not absolute; various cognitive tasks share some common processes with personality performance measures. Reading comprehension, for instance, bears a resemblance to personality performance measures when previous sets influence the understanding of the text. There can be little argument about the facts presented in a passage, but individuals can justifiably differ to some extent on their inferences based on prior learning.

4.16.7.3 Generalize to Different Criteria

The distinction between typical vs. maximal performance (Cronbach, 1970) has been used to differentiate between personality and ability measures. This dichotomy contrasts responses to immediate cues and external motivating structures with performance that is regulated and maintained by the individual's long-term investment and initiative. A structured situation such as an achievement or cognitive test would generally tap maximum performance, but such performance generalizes only to similarly structured situations. Because most real-life tasks provide less structure, measures of typical performance are also needed in a comprehensive assessment. Correlations are low between measures tapping maximal and typical performance (Sackett, Zedeck, & Fogli, 1988). Therefore, each provides unique information.

The following conditions were suggested by Sackett and colleagues as generally yielding estimates of maximum performance: (i) there is a heightened level of effort and attention because the task is seen as important; (ii) expectations and performance standards are clear; and (iii) the observation takes place over a relatively short time where the individual can exhibit an uncharacteristic spurt of effort that could not be sustained over the long haul. In contrast, the characteristics of measures of typical performance are as follows: (i) individuals are unaware that they are being observed or evaluated, so they are not trying deliberately to perform to the best of their ability; (ii) performance is monitored over a long period of time; (iii) the performance tasks require skills that have to be learned through continuous past efforts: if the task is highly complex, the individual has to bring a great deal of past learning (typical performance) to the current effort; and (iv) performance guidelines are not clear and, therefore, individuals impose their characteristic way of organizing and dealing with the situation.

Performance measures of personality meet several criteria for assessing typical performance. Prior knowledge (schema) is superimposed on current task demands; individuals are unaware of what aspects of their performance are evaluated; and in the absence of structure, they are required to organize and plan their response according to their typical mode of functioning. Understanding different requirements of different tasks helps explain variations in the assessment of similar constructs with different methods. Johnston and Holzman (1979) designated indices of thought disorder in the WAIS and the Rorschach which they tested with a sample of schizophrenics, their parents, and controls. The IQ score and the Thought Disorder Index (TDI) derived from the WAIS were negatively correlated such that the higher the IQ score, the lower the TDI. In contrast, IQ scores and the TDI derived from the Rorschach were uncorrelated. The authors conclude that the tasks make different demands. The WAIS calls for "habitual reactions and the social frame of reference is clear" (p. 61). The social expectations on the Rorschach are less obvious. Therefore, the task sets different requirements and has different implications for the assessment of impaired thought process.
In their sample of schizophrenics, those who were more intelligent were able to limit expression of disordered thinking on the WAIS more easily than on the Rorschach, in part, because WAIS items can be answered with overlearned responses. The feeling of ambiguity versus security determines efficiency of functioning on problem-solving tasks and in real life. Individuals who exhibit thought problems only on unstructured tasks may be more capable of functioning with the supports of structures and clear social cues.

A criticism of laboratory research has been that results do not generalize to nonlaboratory situations (Fromkin & Streufert, 1976; Snow, 1974). This is the case because precise linkages between processes engaged by laboratory activities and other life domains are not specified. Likewise, predictions from test performance to real-life adjustment can be accurate only if they are functionally similar. By understanding patterns of strength and weakness across various tasks, the professional can point to areas where the individual can and cannot function adaptively.

4.16.7.4 Differences in Conditions of Learning and Performance

Responses to performance measures of personality, like most tasks, require previously organized knowledge and strategies. However, we distinguish between learning that is promoted by direct teaching (e.g., lecture, textbook) and learning mediated by the individual's synthesis of experience (Epstein, 1994). Personality performance tasks are guided more by self-regulated learning than by formal education. Again, this distinction in the conditions of learning does not apply in a dichotomous fashion to personality versus cognitive measures. For example, an individual's general fund of information is a joint function of direct teaching and the individual's interest and active, effortful processing of information. Depending on the individual, knowledge is acquired through some combination of direct teaching and self-regulated synthesis of experience.

Task requirements also differ on the basis of the conditions of performance. These conditions pertain to the spontaneous versus cued accessibility of prior knowledge and degree of organization required to produce the response. Cognitive measures that assess general fund of information often elicit previously acquired knowledge in piecemeal fashion (highly structured). Personality performance measures such as the TAT and Rorschach set conditions of performance that demand spontaneous accessibility to prior knowledge to interpret the stimulus and formulate the response according to the directions. The more open-ended the response, the greater the need for self-regulated strategies to organize the product.

Performance measures of personality maximize the imprint of organization so that the principles by which experiences are structured and the inner organization of the personality are revealed. A comprehensive assessment battery provides tasks that vary on the continuum of structure provided. Given such variation, it is possible to relate competencies required in each task to performance in life situations requiring similar competencies. This linkage is accomplished by understanding how processes exhibited during test performance are carried over to everyday functioning in various situations and incentive conditions.

4.16.8 SPECIFIC PROJECTIVE TECHNIQUES

Three basic aspects of projective techniques have been recognized (Rabin, 1981). First, the task entails the presentation of an ambiguous set of stimuli and a request to give an open-ended response. The Rorschach and TAT techniques include both of these stimulus and response attributes. Therefore, the focus of interpretation is on the perception of the stimuli presented and the organization of the response. Drawing techniques also demand open-ended responses but usually do not provide a stimulus. Second, the response is shaped by processes that are outside of conscious awareness. An important factor making the response less amenable to conscious manipulation (faking) is that the respondent does not comprehend the meaning of the answers given. Third is the complexity of the interpretive process. Each of these components is briefly addressed below.

4.16.8.1 Stimulus

In general, projective techniques present stimuli that are amenable to various interpretations, and instructions that can be addressed in a variety of ways. The degree of ambiguity has been a prime consideration, although other features of stimuli are also important determinants of the response. A systematic accounting of how the respondent uses stimuli is the essence of the Rorschach technique and must be given greater weight in thematic apperceptive methods (Henry, 1956; Teglasi, 1993).

4.16.8.2 Response

All projective tasks require the individual to draw on internal images, ideas, and relationships to create a response. The respondent must
dredge forth past experiences, direct or vicarious, and organize them to meet the task demands. The greater the stimulus ambiguity and the more open-ended the response, the greater the reliance on the organizational structures of the personality rather than on rote knowledge. Yet, the stimulus must have sufficient structure to permit evaluation of the plausibility of the respondent's interpretation. The projective task demand is an analogue of other unstructured tasks and situations where available cues are subject to interpretation. The manner in which the individual interprets the stimuli and organizes the response shows how they will respond under similar conditions (Bellak, 1975, 1993).

Responses to projective tasks involve complex, interrelated processes that have conscious and unconscious components. These include the interplay of cognition-emotion-action tendencies that coordinate perceptions of the outward world with experience of the inner. Projective techniques may reveal aspects of emotion, motivation, and cognition that a person may not wish to expose. These unconscious aspects of responses may relate to issues of invasion of privacy. Faking, malingering, or defensiveness are problematic for any form of assessment but are assumed to be less so with projective techniques. However, the issue of fakability of projective tests has been inadequately investigated (Rogers, 1997).

4.16.8.3 Task Demand

The stimuli presented together with the instructions set the task demands (Teglasi, 1993). Projective techniques impose task demands that cannot be met with a simple response such as a request for specific information. They require respondents to apply what they know to produce a story or a drawing or to identify an object that may fit the ambiguous contours of an inkblot. Various projective tasks have features in common yet differ in important ways. Projective methods have been understood as problem-solving tasks with designated performance expectations as well as measures of personality with emphasis on individual variation.

4.16.8.4 Interpretation

The interpretive task is complex even when the scoring categories and interpretive guidelines are straightforward as in the Comprehensive System for the Rorschach. Despite the relatively clear guidelines and availability of norms (not to mention computerized reports), the Rorschach requires the examiner's trained inference along with the more objective coding. Interpretation is in keeping with the clinician's theoretical framework and understanding of the task demand. Therefore, the interpretation of responses to a projective test cannot be more satisfactory than the adequacy of the theory informing the interpretation of the evidence and the examiner's skill in evaluating that evidence. The complexity of the interpreter's job is maintained because conclusions rest on the understanding of meaningful patterns rather than isolated response elements. Personality performance tests yield products where the whole is more than the sum of the parts. The evaluation of the response must account for the organization and cohesiveness of the different components. Therefore, various units abstracted from the whole cannot be treated in a piecemeal fashion. The more open-ended the response, the more amenable to separate analysis of structure, form, and style. Although these aspects of the response can be conceptually separated, their interpretive value lies in their relationship to the task demand and to each other. Content, for example, cannot be properly understood apart from the manner in which it is organized. Finally, the examiner must be aware of connections between empirical findings and theory to avoid speculation. Yet, empirical support of theory is not always conceptualized in ways that are useful in making decisions about one individual.

The sign approach attempts to provide empirical evidence for interpretations through the identification of features that occur most frequently in specified clinical populations. However, using a list of signs in an atheoretical cook book fashion is inadequate because a particular sign derives its meaning from the context of other responses. For example, in a TAT protocol, a stereotyped approach or meticulous listing of details in story telling may be viewed as resistance or as representing the respondent's best efforts. Likewise, concern with minutiae that most respondents disregard may be viewed as an index of hypervigilance or of concrete functioning. The appropriate interpretation depends on the pattern of responses within the story telling task and across tests in the battery.

Inferences are drawn from the clients' behavior during the evaluation as well as from the product. Any changes in demeanor or emotional reactions in response to the various tasks are noted as are spontaneous comments, time elapsed, expressions of uncertainty about performance, or attempts to seek structure. Aspects of the response process are also considered. These refer to the manner of working, compliance with instructions, sequence of ideas (e.g., planful, trial and error,
organized, haphazard), as well as to dysfluencies, pauses, or hesitations. The product itself is amenable to analysis in relation to the process of the response, content, and structure. The formal or structural aspects of the product, of course, reflect the response process.

Next is a brief review of three major projective techniques in terms of the following elements: (i) task demands that include the stimuli and instructions; (ii) response process; and (iii) general interpretive approaches and issues.

4.16.9 THEMATIC APPERCEPTIVE TECHNIQUES

Apperception tests generally use pictures and standard instructions to elicit stories. The TAT (Morgan & Murray, 1935) is the most popular of these. Although introduced as a method to assess a particular theory of personality (Murray, 1938), this technique did not remain wedded to the theory of personality from which it sprang. The thematic approach described in the standard manual (Murray, 1943) was amenable to use with a wide range of scoring methods and has been adapted for clinical and research purposes. Spin-offs of thematic apperceptive approaches have used different picture stimuli and diverse scoring approaches. A variety of nonclinical coding systems for content analysis of the TAT appears in an edited volume by Smith (1992a). More clinically oriented systems also abound (e.g., Bellak, 1975; Cramer, 1996; Henry, 1956; Karon, 1981; Rappaport, Gill, & Schafer, 1968; Teglasi, 1993; Tomkins, 1947; Westen, 1991; Wyatt, 1947). Typically, clinicians have used the TAT by applying broad units of inference in an idiographic, qualitative manner. In contrast, researchers have scored TAT responses for various personality characteristics using specific and narrow scoring criteria. As of this date, there is no widely agreed upon scoring system for the clinical use of thematic methods. However, it is generally acknowledged that classification into diagnostic categories is not a chief purpose of the TAT.

4.16.9.1 Response

When given TAT-like pictures, the scene sets the topic. The story telling directions call for a description of what is happening in the picture, what happened before, what people are thinking, how they're feeling, and how everything turns out at the end. These instructions require the narrator to attribute thoughts, feelings, and motives to the characters depicted and embed these inner aspects of experience into an appropriate time frame and sequence of events that coordinate inner life with appropriate actions and outcomes.

4.16.9.2 Stimuli

Picture stimuli play a major role in determining the story content (Kenny, 1964; Murstein, 1965), and reviews of frequency of themes specific to TAT stimuli are available (Bellak, 1975; Henry, 1956; Holt, 1978; Murstein, 1968; Stein, 1955). Ambiguity was a primary consideration in designing the TAT stimuli (Murray, 1938). However, the degree of ambiguity varies within the set of TAT cards. The manner of defining ambiguity has ranged from judges' estimates of the number of interpretations that can apply to each card (Kenny & Bijou, 1953) to actual degree of variability in responses (Campus, 1976). Many of the TAT pictures clearly show who the characters are, what they are doing, and the emotions they are experiencing (Murstein, 1965). Yet, they permit a great deal of variation in the style of expressing similar themes. While stimulus ambiguity is essential to the projective hypothesis, other issues pertinent to stimuli such as similarity of main character(s) to the narrator, emotional tone, complexity, and latent meaning have been researched (see review by Teglasi, 1993).

Broad conclusions about the advantages of various degrees of stimulus ambiguity cannot be drawn without specifying the nature of the scoring system, population, and purpose. Low ambiguity has been favored in the assessment of specific motives (Singer, 1981). However, cards with high ambiguity have been preferred to measure the relative strength of two motives (Atkinson, 1992). When studying hostility, it was found that aggressive intent was most effectively measured by a picture with low relevance for hostility, whereas guilt over hostility was best measured with a picture of high relevance for hostility (Saltz & Epstein, 1963). The authors concluded that pictures with low relevance for unacceptable behavior such as hostility measure drive toward its expression, and those with high relevance measure inhibition or guilt about its expression.

Three degrees of ambiguity within the TAT set were estimated by a group of judges: high, medium, and low (Kenny & Bijou, 1953). The richness of personality content obtained varied as a function of card ambiguity. Cards in the medium ambiguous set yielded stories with the most personality information. The authors discussed two dimensions of ambiguity in card stimuli: (i) the number of cue constellations
available to guide the response; and (ii) the definitiveness of the cues available. Sets of stimuli with graded levels of ambiguity permit the evaluation of responses to situations along a continuum of structure.

Studies of stimulus ambiguity are inconclusive because investigators compare various degrees of ambiguity that are not on the same points on the ambiguity dimension. Furthermore, conclusions about stimulus variables have generally been drawn regarding their adequacy to elicit content germane to the assessment of specific motives or relatively narrow psychological processes rather than their effectiveness to assess formal qualities of organization that reflect the respondent's capacity to deal with the task demands.

For clinical purposes, stimuli must permit the simultaneous evaluation of multiple psychological processes and analysis of both structural and content properties. Some clinicians argue that stimuli are relatively unimportant as long as they are not highly structured (Arnold, 1962; Karon, 1981). Criteria for picture selection within the TAT set are available (Bellak, 1975; Birney, 1958; Henry, 1956; Teglasi, 1993). Haynes and Peltier's (1985) survey of clinicians in juvenile forensic settings indicated that the mean number of cards used was 10.25. The most frequently administered TAT cards were similar to those deemed as most productive in previous studies (Cooper, 1981; Hartman, 1970). The TAT pictures have been criticized for being out of date in relation to clothes and hairstyle (Henry, 1956; Murstein, 1968) and for their predominantly negative tone (Ritzler, Sharkey, & Chudy, 1980). These concerns are mitigated by findings that the stimuli effectively permit expression of convictions and conceptualizations of affect. The negative tone presents unfinished business or a dilemma to be resolved and provides the opportunity to observe how the respondent interprets the tensions depicted in the picture. Furthermore, the negative scenes make it possible to observe how the narrator moves beyond the sadness or conflict to a more positive resolution by noting appropriateness of transitions between the negative state of affairs and positive outcomes. It has been suggested that pictures representing relatively universal social situations that most people encounter in their lives are suitable across various age and subcultural groups (Veroff, 1992).

4.16.9.3 Interpretation

Karon (1981) suggests that only a complex scoring system can be clinically useful. At the same time, "an adequate scoring system is so complex that no one has time to score the protocol" (pp. 94–95). Simple scoring systems are inadequate because they "throw away most of the information in the process of scoring, and hence turn out not to be clinically useful" (p. 95). This point is well taken. A multiplicity of thematic methods are available that provide carefully described scoring systems to assess relatively narrow aspects of personality. Although these systems have demonstrated a high degree of accuracy and reliability, their narrowness limits their clinical utility. Karon's remedy is to rely on clinical judgment in a sentence by sentence interpretation of the protocol, much like a clinical interpretation of an open-ended interview. This approach is compatible with the view that projective techniques are not tests, but are clinical tools that rely on the skill of the practitioner (Anastasi, 1976). An alternative is to agree on key psychological processes and guidelines for their measurement that may be time consuming to learn but less cumbersome to use once mastered.

Historically, there have been two ways to designate units of analysis for TAT stories: product centered versus narrator (person) centered. The former method focuses on qualities of stories that differentiate between groups; the latter approach emphasizes the psychological constructs to which attributes of stories relate. The narrator-centered approach permits the hierarchical organization of elements identified in stories in reference to broader features of the personality. For example, Rappaport, Gill, and Schafer (1968) organized story qualities according to affective lability and looked for content shaped primarily by affective responses to the picture. Holt (1958) looked at clarity of thought (vagueness, overgeneralization, disjointedness of organization) and emotional inappropriateness (arbitrary turn of events, forced endings). With such a focus on psychological processes, patterns of story elements can be organized in relation to relevant constructs.

Holt and Luborsky also emphasized adequacy of hero. This variable, however, is not a psychological process but a quality of the story considered important because it correlated with supervisors' ratings of overall competence during psychiatric residency training. Adequacy of the hero was subsequently viewed by Bellak (1975) as an index of the narrator's competency. However, this interpretation may not hold if other characters depicted in the stimulus or introduced in the story are helpless. This conclusion is also problematic if the narrator has not accurately incorporated the cues provided in the stimulus. A narrator-centered
approach to this variable would focus on various psychological processes indicative of the story teller's competency. Three points are relevant here. First, no part of the story can be interpreted separately from the others. Second, the emphasis must always be on the narrator rather than the narrative. Third, the demands set by stimulus properties must be considered. Emphasis on qualities of the narrator rather than of the product permits consideration that these qualities can be expressed in different ways. Therefore, the interpreter does not seek one-to-one correspondences but focuses on the fit between the pattern of responses and the psychological processes to which they pertain. For example, the assessment of a construct such as cognitive integration with thematic apperceptive methods may be accomplished by exploring the manner of interpreting stimuli or the manner of coordinating different dimensions of experience such as thoughts, feelings, and actions. More complex units, such as self-regulation, subsume cognitive integration as well as affective-motivational processes (Teglasi, 1993). Such a focus on the psychological processes of the narrator avoids the sign approach and promotes an understanding of the constructs being measured. This occurs by organizing story qualities in terms of a clear conceptualization of their relationship to relevant psychological variables. Conceptualizations of the psychological variables can undergo continuous refinement in keeping with research across various subdisciplines.

Form and content in the TAT have been distinguished (Henry, 1956; Holt, 1958; Teglasi, 1993). Formal features focus on generalized properties of content such as the organization and coherence of the response. These formal units of analysis refer to how the details of the content relate to each other, to the evoking stimulus, and to the directions given. For example, inferences can be made from TAT stories about striving for long range goals (independently of the specific goals or concerns) based on connections among characters' purposes, actions, and anticipated outcomes. Internal attribution of feelings is a formal quality because it need not refer to any specific feeling. These formal qualities of the story provide information about the cohesiveness and reality base of the narrator's schema rather than about specific concerns. They are akin to procedural knowledge within schema theory and should be evident across stimulus cards within a task and even across various projective tasks. Formal features of stories lead to inferences about the structure of personality such as cognitive integration, affect maturity, or self-regulation. Given that picture stimuli evoke specific content, it is these generalized or formal properties of content that can reliably reveal clinically important psychological processes.

A focus on the formal elements involves the following general principles of TAT interpretation.

(i) An organized narrative, not a patchwork of associations, is expected. The story is, therefore, evaluated in terms of how the narrator meets problem-solving task demands. The individual's problem-solving approach can be examined from two vantage points. The first focuses on how the story is told in relation to the stimulus and instructions. Accordingly, the narrative is evaluated in terms of accuracy in capturing the tensions depicted in the stimuli, compliance with instructions, and the logical and realistic unfolding of events. The second emphasizes how the characters are described in reference to the stimulus cues, the cohesiveness of inner states, actions and outcomes, as well as the manner in which characters define and resolve the dilemmas set before them in the stimulus.

(ii) Units of inference based on interconnections of elements such as the links between causes and effects or actions and outcomes pertain to formal characteristics of content. The content is not taken at face value. Instead, units of inference are based on understanding of psychological processes and task demands. The instructions call for inclusion of various levels or dimensions of experience such as thoughts, feelings, and actions. These human tendencies to think, feel, perceive, and act are interrelated, and interpretive meaning is derived from an understanding of their patterns. However, the narrator's understanding and the clinician's framework for conceptualizing such relationships must be clearly distinct (Cramer, 1996). The professional determines the implications of the narrator's understanding of the linkages among sequences of events, thoughts, feelings, behaviors, and outcomes.

(iii) The interpretation seeks patterns that elucidate the schematic structure connecting the various story elements (e.g., the manner in which self and other are differentiated and expectations about causal sequences). These schemas are products of the synthesis of past experience and provide the templates for organizing current experiences.

4.16.10 RORSCHACH TECHNIQUE

The Rorschach has achieved wide acceptance due in large part to the development of the comprehensive system (Exner, 1993), an integration of previously established methods. The
availability of normative data for discrete coding variables establishes psychometric credibility. In addition, the comprehensive system provides interpretive strategies for synthesizing the complex data and encourages the interpreter to move back and forth between the formal data of the structural summary and the content and language of the response.

The Rorschach administration is conducted in two phases. First is the association phase during which the respondent is shown each of 10 inkblots and asked "What might this be?" The inquiry phase begins after the respondent reports what is seen on each of the cards. The examiner guides the respondent to clarify the determinants of each response to permit accurate coding. Although the Rorschach is described as a cognitive-perceptual task (Beck, 1981), what is interpreted is what the respondent verbalized. Hence, the examiner must be well trained in coding and administrative procedures to avoid the many potential pitfalls of conducting an inquiry and to maintain the delicate balance between seeking sufficient clarification and promoting response sets by pressing too far.

4.16.10.1 Stimulus Features and Response Parameters

The structural approach to interpreting responses to Rorschach cards creates a close link between the stimulus qualities and parameters of the responses that are coded. Stimuli vary in the degree to which they are solid or broken, colorful or achromatic. They also display variation in shading, empty (white) spaces, and contours that are sufficiently familiar to evoke popular responses. Reported perceptions are coded according to patterns of attending to stimulus qualities including choices of blot areas and use of various features of the stimuli such as form, color, shading, hue, white space, card symmetry, or any combination. Also important are the accuracy of the match between the reported percept and selected blot contours and the relationships among the percepts (organization). Each code assigned to a response represents a particular psychological process. However, specific response elements such as choice of blot areas are interpreted in relation to other variables such as degree of organization in relating various blot areas. A response involving the synthesis of parts into larger units reflects more complex thinking than a vague, undifferentiated holistic perception of the card.

Perception of movement is not attributable to a quality of the stimulus which is static, but an imposition of the perceiver. It has been argued that phenomena such as movement are not perceptions (because blots are static, not moving) but mental constructions based on perceptual experiences (Blatt, 1992). In general, movement responses attempt to put greater specificity on form qualities. When such responses involve human activity, they often represent the individual's use of inner resources to modify perceptions of the external world. However, this interpretation changes if these efforts are combined with inaccurate form identification. If the forms identified differ greatly from those reported by others in a culture or society, then the individual is likely to make other people uncomfortable (without necessarily knowing why) by engaging in behaviors that depart from their expectations. Use of white space as figure-ground reversal is an analog to other figure-ground reversals such as an oppositional stance toward the world. Conventionality is interpreted according to the extent to which thought process is like others or different. The shading is a relatively subtle aspect of the Rorschach stimulus so that not all respondents report it as a determinant (although they may perceive it). Such responses require a certain level of sensitivity or perceptiveness to verbalize.

The style of communication and nuances of language are important considerations (Smith, 1994). Therefore, characteristics of verbalizations such as redundancies or logical inconsistencies are noted. Even commonly reported responses can be expressed in unique ways. Percepts on the Rorschach are at times given without commitment or reflection, and the examinee might in such instances be reluctant to engage in the inquiry just as he or she was removed from the response process. These stylistic qualities of the response process provide a context for the interpretation of the variables in the structural summary.

Content given is also considered. For example, those who give a broad range of content have a greater variety of interests, and those who provide many human associations are exhibiting a broader interest in people. However, what is emphasized in coding responses is the manner in which the respondent uses the stimulus.

The Holtzman Inkblot Technique (HIT) (Holtzman, Thorpe, Swartz, & Heron, 1961) was designed to overcome what had been perceived as psychometric limitations of the Rorschach. Two parallel sets of inkblots were constructed, each set containing 45 inkblots plus two practice blots that are identical for both sets. The HIT was designed to differ from the Rorschach in several ways besides the number of inkblots (Holtzman, 1981). Two important
differences are that the respondent is instructed to give only one response per card rather than leaving it open and that a short and restricted inquiry is given after each response. This overcomes criticisms of the Rorschach arising from the variations in the style of inquiry and from the widely varying number of responses obtained which make it difficult to use norms. However, abandoning the principles basic to the Rorschach also has disadvantages. One cannot observe whether the respondent can see something different in an inkblot once it has been identified in a particular way (flexibility), and the number of responses per se is an important consideration. Sequential analysis of multiple responses to the same stimulus is not possible in the Holtzman format.

Clearly, in devising variations on techniques, there are trade-offs that need systematic attention. Given such acknowledgment, clinicians can make appropriate choices.

4.16.10.2 Interpretation

The process of interpreting the Rorschach has distinct phases. First, the examiner codes the responses and organizes them into patterns by calculating the variables in the structural summary. These variables are then compared with norms to designate deviations from expected patterns. A strategy for interpreting the protocol is selected based on constellations of responses that depart from the norms. The final step involves the synthesis of the norm referenced response patterns according to the psychological process associated with them. This step also involves analysis of the details of content and of the quality of the verbalized responses. The comprehensive system provides clear guidelines for each step of the interpretation process (Exner, 1993), but the examiner integrates information from various sources to refine conclusions and formulate recommendations.

The comprehensive system began with an almost exclusive emphasis on formal, structural characteristics emphasizing the perceptual-cognitive nature of the task to match blot areas with objects (Weiner, 1994). However, language and other associations are considered important (Rappaport, Gill, & Schafer, 1968) and have become increasingly incorporated into the interpretive procedure (Exner, 1993). Smith (1994) cautions against acceptance of the Rorschach as a test with one correct method of interpretation. An exclusive focus on a single approach fails to acknowledge the potential contribution of those with a different perspective. Content interpretation including recent attempts at codings of object relations (Lerner, 1992; Stricker & Healy, 1990) is difficult to reconcile with the emphasis on structural variables. Historically, these two types of data have been viewed through different theoretical perspectives. An alternative suggested by Weiner (1994) is to focus on distinctions within personality rather than on classes of data such as form versus content.

4.16.11 DRAWING TECHNIQUES

Drawing tasks do not provide a stimulus but do exhibit the dual nature of projective devices as problem-solving tasks that reflect internal representations. Drawing a person or a family requires the translation of a three-dimensional memory image into a two-dimensional graphic representation within the constraints of the respondent's artistic ability. This conversion of three dimensions into two and compliance with instructions are the chief problem-solving task demands. Performance of the drawing is affected by the conceptualization of the object drawn, motor execution, attention to detail and spatial relationships, as well as planning and organizing the production. Like other projective tasks, the drawing can be evaluated according to its structure (proportion, elaboration, consistency of detail), content (what is drawn), and style (line pressure, size, placement). Machover (1949) suggested that structural or stylistic elements of size, placement, quality of line, positioning, symmetry, elaboration, or shading are more reliable than contents such as body parts and clothing. Performance standards for drawings can be set such as respect for inside and outside boundaries of the persons drawn and coherence and balance among details. A stylistic quality such as perfectionism and frequent erasures can be compared to the eventual quality of the product. The sheet of paper sets the boundaries or limits for the drawing. Very small or very large drawings suggest that the individual has difficulty setting boundaries in relation to the environment. However, other influences such as impaired psychomotor functioning (e.g., arthritis) in older clients (Kahana, 1978) must be acknowledged. Furthermore, processes such as lack of planning contribute to the use of available space. The individual may draw a head that is too large (lack of planning) so that the drawing either is out of proportion or cannot be completed on the page.

Drawings have been used to estimate intellectual functioning and neurological status as well as to assess personality. The draw-a-person Test (DAP) widely used as a projective
technique to assess personality (Machover, 1949) is also used to measure mental development (Goodenough, 1926). Its usefulness as an indicator of concept formation (intelligence) in the developing child was based on how closely the drawing approximated life-like proportions and details. Basically, the degree of realism was the criterion reflecting the child's conceptualization of the outward world through correct depiction of space and perspective. The Goodenough (1926) Draw-A-Man Test and the Harris (1963) revision of some criteria included items that appeared more frequently with increasing age. However, even while emphasizing age related trends, Goodenough (1926) noted the following individual variations in drawing that were not linked to development: (i) detail dominated but with few ideas; (ii) unique depictions that seem comprehensible only to the drawer; (iii) suggestive of a flight of ideas; and (iv) contradictory combination of primitive and mature characteristics.

Koppitz (1968) also differentiated qualities of children's drawings that were not a function of age. These aspects of drawings were stylistic: (i) qualitative aspects of integration, symmetry, and shading; (ii) presence of unexpected characteristics; and (iii) absence of expected characteristics beyond various ages. These qualities were viewed as manifestations of concept formation that were impacted by motivational and emotional aspects of the personality rather than chronological development. Drawing tasks are frequently given to adults to assess neurological status based on the observation that brain damage interferes with the integration of spatial, perceptual, and motor responses needed to execute drawings (Swindell, Holland, Fromm, & Greenhouse, 1988; Mendez, Ala, & Underwood, 1992).

Currently, the most frequently used projective drawing technique is the Draw-A-Person (DAP) as developed by Machover (1949) and expanded by others (Hammer, 1958; Handler, 1985; Koppitz, 1968, 1984; Urban, 1963). Also popular are the kinetic family drawing (KFD) technique (Burns, 1987; Burns & Kaufman, 1970, 1972); and House-Tree-Person (H-T-P) drawing task (Buck, 1948, 1987).

4.16.11.1 Interpretation

The dichotomy between person centered and product centered interpretation mentioned earlier is relevant here. Person centered approaches organize aspects of the production according to relevant psychological processes of the client.

hypothesized psychological variables. The professional also notes consistency with drawing instructions, nature of verbalizations about the figures drawn, and the amount of effort expended. Global interpretations of interrelated parts are more valid than isolated interpretation of specific details. The sign approach to validating drawings has been criticized (Kahill, 1984; Roback, 1968; Swensen, 1968) because there are multiple possible interpretations for any quality, depending on the overall pattern. For example, size and detail of drawings can signal preoccupation, differential valuing, conflict (Fisher, 1986) and/or quality of planning and organization. Similarly, qualities such as shading or erasures are said to signal anxiety. However, the nature of the shading is important since this feature may enhance artistic quality and not reflect anxiety. It should also be noted that people differ widely in how they cope with anxiety. Some anxious individuals will race through the task without shading or erasing. If the focus is on the psychological processes of the individual, then the configuration of response elements or signs would be expected to cohere around that process.

It has been suggested that the clinician ask what a particular sign could mean rather than what it does mean (Handler, 1985). A critical review of the literature on the KFD technique (Handler & Habenicht, 1994) emphasized the need to study more holistic, integrative approaches to the KFD rather than the interpretation of a series of single signs. The authors argue for the importance of focusing research on the interpretive approach of the clinician using the KFD rather than on the technique itself. The use of drawings for multiple purposes (e.g., concept formation, neurological status, or personality) demonstrates the multidimensional aspects of projective tasks which permit simultaneous evaluation of responses from multiple perspectives to reveal various interrelated facets of functioning. Broad performance expectations pertaining to structural features of the product can be delineated. As with other performance measures of personality, individuals are free to vary their approach to meeting these general problem-solving expectations. One caveat is that artistic quality of the drawings appears to influence clinicians' interpretations (Feher, VandeCreek, & Teglasi, 1983).

4.16.12 CASE ILLUSTRATION

On the surface, the three projective techniques reviewed seem very different. Yet, they
The interpreter looks for coherence in form, reveal similar information when compared in
content, and style of the production with the terms of structural qualities. The overlap
Case Illustration 485

between the TAT and Rorschach is illustrated low average. The intent here is not to develop a
by comparing formal aspects of Carl's TAT comprehensive clinical picture but to point out
stories with conclusions drawn from the the overlaps between the two projective tech-
Rorschach. Carl, a 19-year-old man, had niques. Table 2 shows the consistency of the
stopped attending school on a regular basis TAT variables across cards and also displays the
after the seventh grade and left school alto- relevant Rorschach data.
gether during the 10th grade. At the time of this
evaluation, Carl was incarcerated and attempt- Card 1. Little boy thinking, he's thinking
ing to continue his education in prison. He was how. I guess he's thinking a way how to work
doing poorly in his classes and was referred to the violin. (Before?) He don't know how to
determine if he would qualify for special work the violin. (TO?) Turns out he's still
education services. His WAIS-R IQ score was sitting there thinking.

Table 2 Corresponding aspects of Carl's TAT stories and Rorschach variables.

Card: 1 2 3 4 5 6BM 8BM 12M 13MF 13B

TAT variables
Imprecise accounting of the stimulus: X X X X X X
Vague, concrete, or stereotypic story: X X X X X X X X X X
Requires more than one query: X X X X X X X X X
Story ending is concretely tied to the stimulus: X X X X X X X X X X
Relationships among characters are unclear, stereotypic, or unstated: NA X NA X NA X X X X NA
Sense of helplessness (e.g., inaction, when it is warranted; lack of initiative or inertia): X X X X X X X X
Inner life such as intentions for actions or feelings is not sufficiently elaborated: X X X X X X X X X X
Implausible sequence of events (e.g., cause-effect, timing, coherence of action with purposes): X X X X X X X X X X
Focus is dominated by the immediate circumstance or consideration: X X X X X X X X X X
Insufficient integration of detail to comply with instructions: X X X X X X X X X X

Rorschach variables
Resources and control: EB = 0:0; EA = 0; eb = 7:0; es = 7; *D = -2; *Adj. D = -2
Affect: FC:CF + C = 0:0; Pure C = 0; *Afr = .2143; S = 0; *Blends:R = 0:17; CP = 0
Interpersonal: *COP = 0; AG = 0; Food = 0; Isolate/R = .1764; *H:(H) + (Hd) + Hd = 0:0; *H + A:Hd + Ad = 13:0
Ideation: a:p = 4:3; Sum 6 = 0; Ma:Mp = 0:0; Lvl 2 = 0; 2 Ab + (Art + Ay) = 0; W Sum 6 = 0; M- = 0; M none = 0
Mediation: *P = 4; *X+% = .29; *F+% = .30; *X-% = .4117; S-% = 0; *Xu% = .29
Processing: Zf = 12; *Zd = -8.5; *W:D:Dd = 13:1:3; *W:M = 13:0; *DQ+ = 3; *DQv = 4

* Deviates from normative expectation; R = 17; Lambda = 1.428; PSV = 3; Positive indices: SCZI; CDI.
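
The ratio entries in Table 2 can be checked against the footnoted total of R = 17. The following minimal Python sketch assumes the usual Comprehensive System definitions: Lambda as pure form responses over all remaining responses, Afr as responses to the last three cards over responses to the first seven, and the form quality and isolation entries as proportions of R. The raw counts are not reported in the chapter; the values below are hypothetical, back-calculated so that they reproduce the tabled ratios. The 2.5-point banding used for the D score is likewise an assumption about the standard conversion, not something stated in the text.

```python
import math

R = 17  # total number of responses (Table 2 footnote)

# Hypothetical raw counts, back-calculated to match the tabled ratios.
counts = {
    "pure_f": 10,              # pure form responses
    "last_three_cards": 3,     # responses to Cards VIII-X
    "fq_ordinary_or_plus": 5,  # conventional form quality
    "fq_unusual": 5,           # unusual form quality
    "fq_minus": 7,             # minus form quality
    "isolate_weighted_sum": 3, # weighted isolation contents
}


def lambda_index(pure_f, r):
    """Pure form responses divided by all remaining responses."""
    return pure_f / (r - pure_f)


def affective_ratio(last_three, r):
    """Responses to the last three cards over responses to the first seven."""
    return last_three / (r - last_three)


def d_score(ea, es):
    """Scale EA - es into a D score, assuming 2.5-point bands around zero."""
    diff = ea - es
    if abs(diff) <= 2.5:
        return 0
    bands = math.ceil((abs(diff) - 2.5) / 2.5)
    return bands if diff > 0 else -bands


# EB = 0:0 and eb = 7:0 are taken directly from Table 2.
ea = 0 + 0  # EA is the sum of the two sides of EB
es = 7 + 0  # es is the sum of the two sides of eb

summary = {
    "Lambda": lambda_index(counts["pure_f"], R),
    "Afr": affective_ratio(counts["last_three_cards"], R),
    "X+%": counts["fq_ordinary_or_plus"] / R,
    "Xu%": counts["fq_unusual"] / R,
    "X-%": counts["fq_minus"] / R,
    "Isolate/R": counts["isolate_weighted_sum"] / R,
    "EA": ea,
    "es": es,
    "D": d_score(ea, es),
}

for name, value in summary.items():
    if isinstance(value, float):
        print(f"{name}: {value:.4f}")
    else:
        print(f"{name}: {value}")
```

Under these assumed counts the computed ratios agree with the tabled values at the printed precision, the three form quality counts sum to R, and EA - es = -7 falls in the band that yields D = -2.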
486 Assessment of Schema and Problem-solving Strategies with Projective Techniques

The boy in the story can only keep sitting there thinking how to work the violin. He does not have sufficient resources to deal with the task or to find alternatives. Likewise, the narrator cannot go beyond the picture cues provided (concrete) to produce an ending or to describe purposes or deliberate actions. When prompted to tell what happened before, the response (didn't know how to work the violin) stayed within the moment without considering larger purposes. The fact that the narrator does not introduce other characters to garner support is consistent with the absence of human content in the Rorschach. The helplessness displayed by the character and the narrator is consistent with the positive Coping Deficit Index (CDI) on the Rorschach, suggesting insufficient resources to formulate responses to demands. The short and simplistic story is devoid of inferential or interpretive processes as consistent with vague and concrete information processing on the Rorschach (Lambda, Zd, DQ+, DQv, W:M).

Card 2. The man in the field working. Lady holding onto the stump because she looks like she's pregnant. Girl just come home from school. (T&F). Probably thinking about all the work they gotta do. And the man and the girl probably feeling the same pain, or sorta, for the lady that's pregnant. (Before?) The man started working. (TO?) Everything got done.

The story is closely anchored to the stimulus, yet the narrator does not give priority to the young woman in the foreground, nor does he indicate how the three people are related. Connections are vague and concrete such as the description of the lady as holding on to the stump because she's pregnant. Characters are differentiated only by external appearances, and they all feel the same. As in the previous story, inner life is not elaborated. Carl's Rorschach record contains no human movement, a variable that is associated with greater acceptance of inner thoughts and greater interpersonal awareness. None of the TAT stories suggests the availability of these resources.

Card 3. Little kid was probably tired so he or she fell out the chair. (T/F?) Maybe, I guess they think that's where they gonna sleep at. And probably feeling stupid because he's on the floor. (TO?) That's where they gonna be.

The character falls because he's tired and, despite feeling stupid, just stays there. Again, there's no purposeful action or inner resources but an inertia that's consistent with vague and insufficient processing of information. Feeling stupid, yet doing nothing, is consistent with low self-worth shown on the Rorschach. Other Rorschach variables suggest that self-perceptions are naive and not guided by insight (no FD, no H) nor are these perceptions accompanied by dysphoric mood, anger, or irritation.

Card 4. Lady trying to convince her husband that she love him, but he ain't trying to hear it. (Before?) They was fussing, probably. (TO?) He's about ready to walk away.

The story shows no attempt to understand the inner life or concerns of others but indicates detachment from relationships. The explanation of the stimulus is incongruous: the man looks very angry, but his wife is declaring her love. When asked what happened before, the couple was described as fussing, but the nature of their disagreement is unstated. The picture shows the man turning away, and the narrator does not have the inner resource to provide a solution to the conflict that departs from the stimulus. The Rorschach also suggests that social relationships are superficial, distant, and guarded. The oversimplifying style is likely to lower sensitivity to the needs and interests of others, and this tendency is coupled with a lack of interest or detachment from relationships (no H; no COP or AG). The affect cluster suggests an approach to the environment that minimizes affective engagement. Lack of resources rather than affective provocation is at the heart of Carl's difficulties. His reluctance to process emotional stimuli (Afr) is consistent with his general style of oversimplified processing of information (low blends, high lambda, underincorporating approach).

Card 5. The lady must have heard someone in the house, so she ran and hide in the closet. (T/F?) She probably thinking that she gonna get hurt. (TO?) That no one was there.

The woman runs and hides because she feels vulnerable. Later, when she realizes there's no danger, she makes no attempt to discover the source of her concern (noise). This absence of initiative and curiosity is consistent with nonreflection, simplified processing, and detachment evident in the Rorschach as described previously.

Card 6BM. Looks like his mother told him some bad news. But the news didn't only hurt him; it hurt both of them. (TO?) That something did happen.

Again, the story offers vague, nonspecific descriptions such as bad news and little differentiation between characters. Since the nature of the bad event is not understood, it is not possible to deal with it adaptively. The absence of specific story details is consistent with the other cards and with vague, simplified processing suggested by the Rorschach. The story ends with the character convincing himself that something did happen.

Card 8BM. Looks like they was trying to rob the lady, and she shot one of them. And the man's two buddies trying to get bullets out of him. (T/F?) Probably thinking that he ain't gonna make it. (TO?) That he made it.

The story presents an unrealistic sequence of events and poor integration of foreground and background components of the picture. There's no reasoning about intentions or consequences. Rather, the narrator gives simplistic associations to the stimulus without rule governed connections among likely sequences of events, between causes and effects, or between short and long term outcomes. This detachment from conventional thinking is consistent with poor reality testing on the Rorschach (poor form quality, low populars, incomplete processing of available information).

Card 12M. This is a lady? (It can be whatever it looks like to you) The old man is praying for the lady because he thinks that if he prays, she'll get better. (Before?) She was sick. (TO?) That she didn't make it.

As with the other stories in this protocol, the story does not depart from the immediate cues of the stimulus. Connections between sequences of events and among characters depicted are vague or nonexistent. For example, we do not know what the relationship is between the old man and the lady. The narrator is powerless to introduce alternatives that are not cued by the stimulus such as consulting a physician. The lack of integration and insufficient organization of details is consistent with Rorschach patterns suggesting that Carl formulates decisions without sufficient processing of information (e.g., high lambda, low developmental quality, low blends, and underincorporating style).

Card 13MF. The old man just got up for work, but his wife didn't have breakfast made for him. (T/F?) One of them feeling really tired. The other feeling left out. (TO?) Turns out that one of them sleep and one of them woke.

This story is also rather concrete and stereotypic. Each character has his and her roles to play, but the narrator cannot develop the story beyond what is seen in the stimulus (reliance on immediate external circumstances).

Card 13B. Little boy ain't got no friends and wants something to drink bad. (Before?) Just sitting there. (T/F?) I guess he thinking why he's the only one there. (TO?) That he wasn't.

Just as in Card 5, the ending contradicts the initial premise of the character. Things are not the way they seem, but there's not enough initiative to investigate or deal with the discrepancy. Such lack of investment in processing information is compatible with the failure to process critical cues (underincorporation) on the Rorschach. Again, no purpose, interpersonal connection, or guiding principle is expressed.

The similarity of the conclusions derived from both techniques suggests that they are assessing common processes. Both methods suggest that resources and reality testing are not adequate to meet daily life demands. Furthermore, the focus on the immediate and the concrete, along with haphazard processing of information, hinders the development of schema that are sufficiently elaborated to provide rules that govern the synthesis of experience, the regulation of behavior, and the expression of affect. Without such inner guides to self-regulation, Carl exhibits tendencies toward impulsive, antisocial behavior and has limited capacity to delay gratification. Carl would benefit from a highly structured learning environment with highly structured tasks and frequent feedback and redirection.

4.16.13 VALIDATION ISSUES

Major criticism has been directed not only towards projective methods but also towards psychometric approaches to assessing personality which classify instances of experience, thought, or action into trait categories. The inference process, in general, has been maligned by the behaviorist movement. Compelling arguments have been made to exclude from consideration anything but overt behavior and objectively coded environmental variables (Skinner, 1953). Psychology has moved far from that position to a recognition of the importance of understanding the meanings that individuals assign to environmental events and to their own behaviors. Indeed, cognitive structures and processes are now considered to be at the heart of individual differences in experience, thought, and action, both adaptive and maladaptive (Cantor & Kihlstrom, 1987).

The process of inference is not restricted to the use of projective tests; it is what makes the psychologist a professional rather than a psychometrist. By giving priority to some scores or using qualitative information to increase understanding of the scores, the clinician engages in the process of making professional judgments (Groth-Marnat, 1990). These judgments cannot and should not be eliminated. It is important, however, for practitioners to reflect on the quality of their inferences and the usefulness of their decisions.

Evidence for validity of conclusions made on the basis of inferences drawn from projective methods needs to be convincing. Absolute standards that cut across all methods are essential and they must be precise enough to inform judgments about when the standard is or is not being met. However, in their application to projective techniques, general criteria should accommodate to the nature of the method. Validity and reliability are not established for a generic technique such as figure drawings, the TAT, or the Rorschach. Rather, the utility of each set of instructions and stimuli together with the scoring method is separately evaluated according to the accomplishment of its specific purpose. Ways of demonstrating validity and reliability for all techniques must be in tune with the logic and coherence of the measures and constructs under consideration.

4.16.13.1 Reliability

Establishing reliability for projective techniques and for questionnaire methods is fundamentally different. As Kelly (1958) pointed out: "When the subject is asked to guess what the examiner is thinking, we call it an objective test; when the examiner tries to guess what the subject is thinking, we call it a projective device" (p. 332). Reliability of items on a rating scale is essentially a matter of consistency in the respondent's interpretation of the items. Reliability of inferences based on projective methods rests on a combination of the stimulus, the response, the method of interpretation, and the skill of the interpreter. Psychometric procedures to establish reliability based on forced-choice test items can be carried out without any clinical expertise because reliability pertains primarily to the test-taker's responses. With projective measures, reliability, in part, is an attribute of the interpreter because the scoring units are inferences of the professional. Attempts to designate cookbook procedures so that the validation process can be carried out quickly and easily are out of tune with the nature of projective methods.

The following is a brief overview of how traditional psychometric indicators must be modified to establish reliability of projective techniques:

4.16.13.1.1 Scorer reliability

The accuracy of two people looking for the same information requires adequate training and clear guidelines. When raters are well-trained and scoring systems are well documented, such reliabilities tend to be high. A related aspect of scorer reliability, particularly relevant to clinical inference, is the consistency with which one rater codes the same protocol over time (Karon, 1981).

4.16.13.1.2 Decisional reliability

The consistency of decisions drawn from a protocol can be assessed when specific units are not the focus. For example, Shneidman (1951) showed that 16 clinicians using their own methods came to similar conclusions. Other influences on the reliability of decisions may relate to the number of performance samples needed. For example, in the measurement of particular motives with TAT cards, it is important to know if the specific motive emerges reliably in every card or only some. To assure adequate reliability, it has been recommended that at least six stories be obtained from each respondent (Lundy, 1985; Smith, 1992b).

Decisions must be based on clear conceptualizations. Therefore, decisional reliability must account for the theoretical appropriateness of the match between predictor and criteria. Test responses can be expected to correlate with criteria only if their meaning is functionally similar. Thus, adequate functioning in a structured situation can occur despite disorganized responses on a projective test.

4.16.13.1.3 Test-retest reliability

One important factor in demonstrating the reliability of the measure upon retesting is whether one is looking for similarity of content or of the inference (Karon, 1981). The reliability of the specific content is less relevant than consistency in the meaning of the response.

4.16.13.1.4 Internal consistency

Internal consistency of thematic content across cards would be an inappropriate measure of reliability because cards are designed to elicit different themes (Lundy, 1985). However, despite variability in specific content, the responses may yield similar inferences. More stylistic units representing psychological processes such as the accuracy with which the content captures the "gist" of the scene presented or linkages between causes and effects may be generalizable across different stimuli. Internal consistency based on the number of words per story (alpha = 0.96) was much higher than internal consistency of need for achievement (Atkinson, Bongort, & Price, 1977). Likewise, Rorschach cards present stimuli with important differences and, rather than estimating internal consistency, inferences are made on the basis of the entire coded protocol and not card by card. Responses to the white spaces in the blot have different meaning for each card, and not all white space responses represent figure ground reversal. Therefore, adding all space responses is a rough estimate of the trait in question.

Internal consistency expected within a measure should relate to the nature of the task and to the consistency inherent in the construct under consideration.

4.16.13.2 Construct Validation

Messick (1989) defines construct validity as "an integration of any evidence that bears on the interpretation or meaning of test scores" (p. 17). Because traditional indices of content or criterion validity contribute to the meaning of test scores, they too pertain to construct validity. Thus, construct validity subsumes all other forms of validity evidence.

This emphasis on construct validity as the overriding focus in test validation represents a shift from prediction to explanation as the fundamental focus of validation efforts. Construct validation emphasizes the development of models explaining processes underlying performance on various tests and their relationships to other phenomena. Accordingly, correlations between test scores and criterion measures contribute to the construct validity of both predictor and criterion. In Messick's words, "Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (p. 13).

4.16.13.3 Multitrait, Multimethod Validation

Although the multitrait, multimethod approach is a standard technique for construct validation (Campbell, 1960; Campbell & Fiske, 1959), this technique is vulnerable to problems with the definition and measurement of constructs. This procedure seeks to establish higher correlations across diverse measures of the same trait (convergent evidence) and lower correlations among similar measures of different traits (discriminant evidence) to show that a construct is distinct from other constructs and that it is not uniquely tied to a particular measurement method. Yet, the possibility that the constructs may indeed be tied to the measurement tool must be considered. McClelland and colleagues showed that attempts to correlate self-report and projective measures of achievement motivation to validate either one are misguided. Likewise, Horowitz (1991) notes that schema, although important to measure, are not directly known by the subject and cannot be confirmed through self-report but by a consensus of two or more independent observers. Theoretical understanding of the construct must always be central to the choice of validation efforts.

It is important to acknowledge the potential for corroboration of constructs with various assessment measures. Yet, different measures of the same construct (e.g., ratings by actor versus observer) may be assessing qualitatively different aspects of that construct. Meaningful differences in the constructs are demonstrated by distinct patterns of relationships with various criterion measures (see criterion validation below).

Even highly similar measures (e.g., thematic approaches) used to assess identical constructs but with different procedures are not necessarily addressing the same qualities. For example, Arnold (1962) developed criteria for scoring TAT stories for achievement motivation by comparing groups of individuals known to differ in their job success. In contrast, the McClelland-Atkinson tradition (McClelland & Koestner, 1992) for developing criteria for scoring achievement motivation from stories told to picture stimuli was based on the comparison of groups given different instructional sets to arouse the achievement motive. Each of these two approaches to contrasting groups may be appropriate for given purposes with specific populations but clearly involves a distinct conceptualization of the achievement motive. One takes the position that the motive is present and, when aroused, energizes and directs behavior towards a particular class of goals or incentives. The other focuses on the achievement motive as being a relatively stable disposition reflected by complex patterns of cognitive-emotional processes that guide perceptions and behavior. If scoring units are empirically derived through contrasting groups, then the construct is defined by the nature of the groups and procedures used.

4.16.13.4 Criterion Validation

Criterion related evidence also belongs under the rubric of construct validity (Messick, 1989). Patterns of correlations with other variables contribute to the defining attributes of each construct. The same set of external correlates does not apply to achievement motivation measured through projective and self-report methods (Spangler, 1992). Projective measures of achievement motivation correlate with spontaneous effort sustained over time, whereas self-reports correlate with activities that are cued by the situation. Such variation in the match between the predictor and the target clarifies the dimensions of the achievement motive assessed by each type of measure. This understanding of constructs in terms of the life conditions to which they generalize constitutes the simultaneous validation of the predictor and the criteria. Relationships between predictors and criteria derived from a construct must be tested for expected patterns of discriminant and convergent evidence. In doing so, both situational specificity and generalizability of constructs and their measures are addressed at the same time. Generalizability of the predictor-criterion relationships across different population groups, settings, task domains, ages, or gender must be empirically established.

4.16.13.5 Part-Whole Relationships

Personality is a functional whole comprised of patterns of interrelated components. Therefore, predictions from single elements of personality to more encompassing units, such as general adjustment, proceed from part to whole and can only account for a portion of the variance. This part-whole prediction is illustrated in Figure 1 showing the influence of temperament on adjustment as mediated by goodness-of-fit. Any one temperamental attribute would account for a small proportion of the variance in the fit between the person and various situational contexts or task demands. A configuration of such traits would improve the prediction. However, the actual goodness-of-fit would be a better predictor of adjustment because this broader construct subsumes the configuration of traits (even those not specifically addressed in the prediction), the environmental demands and supports, as well as coping mechanisms.

This understanding of the part-whole relationship is essential for the validation of projective techniques. Projective techniques look at individual differences at a more global level than other measures. The products reflect schema or inner structures that develop through the individual's synthesis of life experiences. These involve the interplay of all of the person's characteristics and reciprocal transactions with the environment. Therefore, part-whole patterns exist between various trait constructs and response parameters within the test. Part-whole predictions can relate specific neuropsychological or temperamental processes to individual differences in responses to projective tasks (Bassan-Diamond, Teglasi, & Schmitt, 1995). Part-whole relationships depend on conceptualizations about how the various dimensions of individual differences come together in the development of broader units of personality such as the influence of temperament on conscience development (Kochanska, 1993). Given the expected role of multiple variables, it is apparent that expectations about the size of correlations need to be in line with understanding of the phenomena.

The part-whole issue also applies to the relationships among responses within a specific projective technique. The functional significance of each part hinges on its interplay with other parts in relation to the constructs being considered. Various dimensions of the product such as form and content variables are cohesively related and need to be systematically coordinated into higher-order constructs. Psychometric and theory-based approaches need to be integrated to build useful frameworks for clinical use. Specific response parameters can be abstracted from the whole and pieced together to find conceptually meaningful patterns.

4.16.13.6 Convergence of Psychometric and Conceptual Treatment of Data

Normative, nomothetic, and case study approaches to data collection are interrelated. Conceptual linkages among these approaches are essential to the interpretation and validation of projective techniques, as described next.

[Figure: boxes labeled "Temperament (specific traits)", "Goodness-of-fit (in various situations)", and "General adjustment (a global measure)"]
Figure 1 Illustration of part-whole prediction.
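
The variance argument behind Figure 1 can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model taken from the chapter: four uncorrelated temperament traits, an environmental term, and a noise term are invented, with variances chosen so that the expected share of adjustment variance is about 0.10 for a single trait, 0.40 for the trait configuration, and 0.80 for goodness-of-fit. All names and parameter values are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, n_traits=4, seed=0):
    """Return the share of adjustment variance explained by each predictor."""
    rng = random.Random(seed)
    single, config, fit, adjustment = [], [], [], []
    for _ in range(n):
        # Hypothetical temperament traits, each with unit variance.
        traits = [rng.gauss(0, 1) for _ in range(n_traits)]
        # Hypothetical environmental demands and supports (variance 4).
        environment = rng.gauss(0, 2)
        # Goodness-of-fit subsumes the trait configuration and the environment.
        goodness_of_fit = sum(traits) + environment
        # Adjustment depends on fit plus unexplained influences (variance 2).
        outcome = goodness_of_fit + rng.gauss(0, 2 ** 0.5)
        single.append(traits[0])
        config.append(sum(traits))
        fit.append(goodness_of_fit)
        adjustment.append(outcome)
    return {
        "single_trait": pearson_r(single, adjustment) ** 2,
        "trait_configuration": pearson_r(config, adjustment) ** 2,
        "goodness_of_fit": pearson_r(fit, adjustment) ** 2,
    }


if __name__ == "__main__":
    for predictor, r_squared in simulate().items():
        print(f"{predictor}: r^2 = {r_squared:.2f}")
```

The ordering of the three values mirrors the argument in the text: a part predicts the whole only weakly, a configuration of parts does better, and the broader construct that subsumes them does better still. The modest single-trait value is also a reminder that expected correlation sizes should be set with the part-whole structure in mind.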

4.16.13.6.1 Normative

Before initiating the arduous process of collecting norms for projective methods, the units of analysis must be conceptually meaningful, theoretically relevant, and clinically useful. Furthermore, criteria for coding must be provided in sufficient detail to assure rater reliability. The Comprehensive System for the Rorschach has shown that the establishment of norms for meaningful units of analysis is possible, and that larger conceptual units can be designated through the use of multiple cut-offs to establish interpretively useful patterns of interrelated response parameters.

4.16.13.6.2 Nomothetic

It is important to base norms on units of interpretation that represent psychological processes that can serve as explanatory constructs. The nomothetic approach attempts to promote conceptual understanding through the study of general principles and functional relationships among variables. Such a conceptual approach to the Rorschach (Blatt & Berman, 1984; Weiner, 1977) has advocated organizing discrete response parameters into theoretically relevant clusters that contribute to a well defined construct. A strict empiricist would be satisfied with the prediction of patterns of behavior from various configurations of test responses (e.g., of Rorschach variables). However, establishing such empirical relationships is only a starting point for building conceptual frameworks for understanding and explaining behavior. The meaning of test response patterns emerges from a network of relationships with other theoretically relevant response patterns and theoretically appropriate external criteria.

The collection of normative data for projective techniques is embedded in the construct validation process. The constructs and units of inference for projective measures need to be defined a priori in the same way that test items in questionnaires are selected to represent constructs. An example of this approach is the designation of units of inference for TAT stories on the basis of constructs derived from research and theory on empathy (Locraft & Teglasi, 1997). Scores based on these units subsequently differentiated groups of children designated as high, medium, and low on empathy on the basis of teacher ratings. Such units require cross validation on numerous populations to assure wide applicability prior to attempting large scale normative studies.

4.16.13.6.3 Case study

The case study is most relevant to the practicing clinician because decisions for clinical purposes relate to understandings about one person. The adequacy of these decisions can be validated on a case by case basis. Although case study methods focus on the unique patterns of variation in responses of a single individual, the pattern of variables under consideration can be referenced to norms and to theoretical constructs. With projective methods, the respondent is allowed free expression within a specified context (stimuli and directions), and effective use of these techniques requires a conceptual framework at three levels. First is an understanding of the psychological meaning of the response patterns within the projective measure. Second is an understanding of how the projective measure fits with other information in a comprehensive evaluation. Third is a conceptualization of how the patterns relate to competencies needed in the relevant life situations. Therefore, to conduct adequate case studies, the examiner needs a framework to understand how various types of data fit together within and across measures and how the emerging patterns apply to the various demands of the individual's environment. The attempt to integrate psychometric (empirical) and nomothetic (theoretical) data into the case study is typified in the attempt to synthesize structural and content features of Rorschach responses (Erdberg, 1993). The problem is that these efforts have applied different conceptual frameworks to different aspects of the data. A true integration of the empirical and theoretical perspectives requires a focus on personality constructs and the establishment of coherent patterns of data from various sources around the constructs.

The term "conceptual validity" has been applied to the process of psychological assessment (Maloney & Ward, 1976). Whereas construct validity focuses on confirming expected patterns of relationships across individuals, conceptual validity focuses on observing cohesive patterns within an individual. These patterns of expected relationships among observations constitute a working model of the individual being evaluated. According to Maloney and Ward, establishing such a model is a prerequisite for answering the referral question. When information from various sources is understood in
Unless the unit of inference is conceptually terms of the constructs that explain an
clear and represents meaningful psychological individual's difficulties and point to appro-
processes, normative data or group compar- priate decisions, the assessment has concep-
isons are not particularly useful. tual validity.

4.16.14 FUTURE DIRECTIONS

The anticipation of future directions for projective techniques emerges from the clues gleaned from the scientific literature pertaining to the development and assessment of personality. However, the prognosticator's wishful thinking also influences this process. The intent of the prognosticator is to show how unfolding trends or even new spins on old ideas point toward desired alternatives for moving forward. Future possibilities for projective techniques are drawn from three conceptually distinct perspectives: (i) converging constructs from the various subfields of psychology that scaffold the use of projective techniques; (ii) refinements in the conceptualization of personality that point to a multidimensional view and to a unique role for projective techniques; and (iii) improved understanding and use of the specific projective tools.
Methods and constructs in other subfields of psychology provide a hospitable zeitgeist for projective techniques and bode well for their future development and utility. Converging evidence appears to validate the basic tenets of the projective hypothesis. Concepts from psychodynamic formulations such as transference have been redefined in terms of contemporary psychology, demonstrating that such phenomena are not unique to one theory but can be understood in several ways (Singer & Singer, 1994). The examination of clinically relevant phenomena in light of conceptualizations of memories, schema, and scripts from various subdisciplines supports the work in each subfield (e.g., Horowitz, 1991; Stein & Young, 1992). Cognitive theories of perception, memory, and learning are increasingly emphasizing unconscious or implicit social attitudes (Uleman & Bargh, 1989). Research on construct availability and accessibility (Higgins, 1990) as well as script or schema theory also recognizes the influence of cognitive processes that occur outside of awareness. Knowledge from other areas of psychology will increasingly inform the designation of interpretive units for measuring personality with projective techniques. Recent interest in narrative methods to assess schema and social cognitions is applicable to thematic apperceptive techniques (Cramer, 1996). Stories elicited through picture stimuli, like other narrative approaches, reflect the human orientation to perceive the world as stories or myths grounded in culture and personal experience.
Conceptions of psychopathology within psychoanalytic theory are placing increasing emphasis on functions of inner psychic structures and subjective meaning systems (Atwood & Stolorow, 1984). In this context, adjustment problems arise because the structures are insufficiently developed to permit adaptive coping rather than from conflict among the structures. In an analogous manner, clinical syndromes have been related to problems in the development of person schema (Horowitz, 1991). If psychopathology is a reflection of impairment in the formation of psychic structures, then it is reasonable for assessment to be geared to the evaluation of these structures (e.g., object relations, self-system, along with associated processes such as reality testing).
Levels of impairment based on the complexity and organization of inner structures will be identified along with a description of symptomatic behaviors. Projective techniques will figure prominently in the assessment of psychological variables for the purpose of intervention planning. The continued refinement of integrative therapies (Leve, 1995; Norcross, 1986) will increase the usefulness of identifying relevant processes to be targeted and matched with optimal intervention strategies.
Understanding personality according to different styles and levels of organization of experience as they relate to functioning will rest on increasing emphasis on part–whole conceptualizations. For example, with a focus on inner structures, aspects of personality such as emotions will be increasingly recognized as organizing processes that shape adaptation and problem solving (Greenberg, Rice, & Elliott, 1993). Other personality processes, such as distractibility or inattention, will also be understood in relation to how they shape the development of inner psychic structures. Future research will focus on establishing linkages between various levels of personality constructs such as the role of various traits in the development of inner strivings and the organization of meaning structures. Research will also focus on clarifying relationships between discrete measures of neuropsychological processes, such as planning, organizing, retrieval from memory, or continuous attention, and the manifestation of these processes in the performance of more complex tasks analogous to real-life situations. Inherent in the part–whole conceptualization of personality is the view of development as being propelled simultaneously by biological, psychological, and social forces. Within the psychological realm, perceptions and behaviors will be increasingly understood in terms of the interplay of experience with relevant affective and cognitive processes.
A multilevel understanding of personality will permit clearer conceptualizations of the utility of different types of techniques to assess different facets of personality. One example is the distinction between measures of self-attributed and implicit motives to achieve (McClelland
et al., 1989). Story-telling measures of achievement motivation (implicit) were more effective in the prediction of long-term outcomes such as career success, whereas self-reports (self-attributed) were better in predicting immediate choices (Spangler, 1992). The two types of measures of achievement motivation relate to different criteria, develop through different pathways, and can be understood in terms of different levels of personality.
In training programs, the professional courses are the places where the core psychological knowledge bases are integrated. Expertise in projective techniques includes the linking of the psychological processes assessed with appropriate strategies for therapeutic interventions. The complexity of interpreting projective instruments and of understanding their implications for guiding interventions requires no less than the systematic and flexible application of prior knowledge. Therefore, projective assessment can serve as a centerpiece for integrating concepts from core areas such as physiological, affective, and cognitive bases of behavior. Training should emphasize the gradual development of the professional's schema to guide practice rather than the acquisition of knowledge applied in rote fashion. Trainees' schema should be sufficiently broad and flexible not only to incorporate information currently available but to accommodate continuous learning throughout the professional career. Measures are simply tools. Rather than emphasizing the assessment technique, training programs would do well to focus on the development of integrative frameworks that include how various sources of information fit together into meaningful patterns.
Increased acceptance of qualitative measures as having scientific merit has led to more frequent use of open-ended methods in the study of personality such as the study of early memories (Bruhn, 1992), thought sampling (Rubin, 1986), examination of the life story (McAdams, 1990), self-defining memories (Moffit & Singer, 1994), and analysis of therapy transcripts (Luborsky & Crits-Christoph, 1990). The psychometric challenges of projective methods are shared among all open-ended techniques. The effort to master these difficulties will require, above all, conceptual clarity and an emphasis on construct validation. These endeavors will promote shared conceptualizations and methods across disciplines, appreciation of the progress already made with projective techniques, and the spurring of further developments.
Frameworks for interpretation of projective techniques not only elucidate the inner structure of personality but reveal problem-solving strategies in relation to the task demands. Therefore, it may be fruitful to view these techniques, in part, as performance measures of personality that can be compared to other tests in an assessment battery in terms of what the task requires. The most significant difference is that performance tasks of personality maximize the use of spontaneous strategies to organize perceptions and responses. Variation in response patterns across different types of tasks permits distinguishing knowledge structures that are inert and require external prompts from those that are meaningfully organized and spontaneously accessible. Furthermore, the configuration of responses within and across diverse tasks fosters explanations of situational variability and consistency in performance and behavior. For instance, social situations require the individual to organize perceptions, size up intentions, and anticipate reactions of others. These requirements are similar to the demands of projective tests to organize perceptions of the stimuli and coordinate the responses with the instructions.
The relationship between content and formal elements in the units of inference drawn from projective techniques still needs to be clarified. The most fruitful approach to integrating the analysis of form and content is through the identification of their linkages with important psychological processes. Formal structural analysis should receive greater emphasis in the apperceptive methods. Generalized properties of content and features of narrative structure are more durable indices of psychological processes than isolated content. Furthermore, the analysis of the subtext or underlying structure of narratives can yield significant information about the individual's schematic organization of experience. The struggle to apply valid and reliable methods to assessment will continue. The inevitable link between theory and method necessitates the simultaneous effort to refine both. Conceptualizations developed in other subfields of psychology and lessons learned from qualitative research methods will be applied to improve the reliability, validity, and clinical utility of projective measures.

4.16.15 SUMMARY

The chapter is summarized by presenting a model for understanding performance measures of personality shown in Figure 2. The interaction of the person and environment (boxes 1 and 2) as a fundamental unit of study has wide acceptance and applicability. However, the objective features of the environment and the subjective
world are not the same. Lewin (1935) refers to the psychological environment as the inner experience of external reality. Persons participate in and influence their external environments as well as being shaped by them. The manner in which the individual encodes and stores encounters with the external world drives the construction of the inner psychological world (box 3). Therefore, one can only conceptually separate the person from the environment; they are embedded in one another.
Schema are memory structures that develop through the individual's constant and reciprocal transactions with the environment. The development of schema is influenced by the various interactive determinants said to influence personality, including constitutional factors of temperament (see reviews by Emde, 1989; Plomin, 1986) as well as meaning structures transmitted by family and culture in conjunction with the individual's experiences throughout life. These schema are important to assess because they: (i) are mental sets that guide attentional processes and serve as filters for interpreting information; (ii) guide actions in situations that require complex resources; and (iii) shape the storage of new information in memory.
Certain conditions such as attention-deficit hyperactivity disorder are diagnosed through a careful examination of previous history as well as scrutiny of current behavioral patterns. However, these attentional processes shape the individual's prior and ongoing synthesis of experience through their impact on the information entering awareness, on the feedback received from others, and on the development of internalized schema. Therefore, the part–whole relationships between specific symptoms or psychological processes and larger units of the personality need to be understood. Projective testing relies on an appreciation that overt behavior is linked to inner meaning. Consequently, inner life is a mechanism for perceiving the outer world and for regulating behavior to adapt to these perceptions. The projective techniques (e.g., TAT and Rorschach) provide information about how prior experience has been organized and how inner resources (schema, inner structures) are applied to relatively unstructured situations. Responses to projective tasks are understood as cohesive products reflecting all of the processes involved in the synthesis of experience. Specific psychological processes such as attention or emotion can be studied as precursors and sequelae of the development of inner structures.

Figure 2 Understanding performance measures of personality. Box labels: (1) Individual variation in the configuration of traits or dispositions; (2) Experiences; (3) Ongoing synthesis of transactions between 1 and 2; (4) Schema for interpreting experience and resources to meet implicit or explicit demands of daily life; (5) Application of schema to tasks in the assessment battery including performance measures of personality; (6) Implications for adjustment in situations or performance of life tasks that make similar demands.

Projective techniques require a person to demonstrate the qualities of organization and strategic planning by performing a problem-solving task rather than by telling about the self in an interview or by responding to a questionnaire. Therefore, the term performance measure of personality aptly describes projective methods. These performance measures are less structured than the other tasks in the typical assessment battery and provide unique information. The manner in which the problem is solved
or the task is accomplished reveals the individual's organizational strategies and resources to deal with similarly unstructured situations. In addition, the nature and organization of knowledge structures reveal the frameworks or convictions that guide responses in ambiguous situations. Furthermore, inferences drawn from performance measures and self-reports pertain to different facets of personality and predict different criteria (McClelland et al., 1989).
As Figure 2 shows (box 4), the individual's ongoing synthesis of experience leads to the development of knowledge structures. Various performance measures in an assessment battery can be examined in terms of what the task requires and what inner structures or resources are brought to bear on the performance (box 5). Actual adjustment in a given situation (box 6) depends not only on the person's resources but also on environmental expectations and supports.
The person is assessed within a particular social context of the testing situation including expectations about testing, anticipation of the potential consequences of testing, and the relationship with the examiner. Therefore, these variables are considered along with the task demands. The job of the examiner is to organize multiple sources of data including reports of self and others, various performance measures, relevant past history, and current circumstances into a framework for understanding the referral issues. All of the information is sifted through an understanding of family and cultural background as it pertains to the individual being assessed. The understanding is informed not only by theories of personality, human development, and psychopathology, but also by a conceptualization of the various tasks in the battery and how they relate to performance in life situations. Findings are not a listing of isolated facts but conclusions based on cohesive patterns that provide an understanding of the problem and point to appropriate action.

4.16.16 REFERENCES

Abelson, R. P. (1976). Script processing in attitude formation and decision making. In J. S. Carroll & J. W. Payne (Eds.), Cognition and social behavior. Hillsdale, NJ: Erlbaum.
Abelson, R. P. (1981). Psychological status of the script concept. American Psychologist, 36, 715–729.
Acklin, M. W. (1994). Some contributions of cognitive science to the Rorschach Test. In I. B. Weiner (Ed.), Rorschachiana (pp. 129–145). Seattle, WA: Hogrefe & Huber.
Alper, T. G., & Greenberger, E. (1967). Relationship of picture structure to achievement motivation of college women. Journal of Personality and Social Psychology, 7, 362–371.
Anastasi, A. (1976). Psychological testing. New York: Macmillan.
Anderson, J. R. (1990). Cognitive psychology and its implications. New York: Freeman.
Archer, R. P., Maruish, M., Imhof, E. A., & Piotrowski, C. (1991). Psychological test usage with adolescent clients: 1990 survey findings. Professional Psychology: Research and Practice, 22, 247–252.
Arnold, M. B. (1962). Story sequence analysis: A new method of measuring motivation and predicting achievement. New York: Columbia University Press.
Atkinson, J. W. (1992). Motivational determinants of thematic apperception. In C. P. Smith (Ed.), Motivation and personality: Handbook of thematic content analysis (pp. 21–48). New York: Cambridge University Press.
Atkinson, J. W., Bongort, K., & Price, L. H. (1977). Explorations using computer simulation to comprehend TAT measurement of motivation. Motivation and Emotion, 1, 1–27.
Atwood, G., & Stolorow, R. (1984). Structures of subjectivity. Hillsdale, NJ: Analytic Press.
Auld, F., Jr. (1954). Contributions of behavior theory to projective testing. Journal of Projective Techniques, 18, 421–426.
Bargh, J. A. (1989). Conditioned automaticity: Varieties of automatic influence on social perception and cognition. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 3–51). New York: Guilford Press.
Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer, Jr. & T. K. Srull (Eds.), Handbook of social cognition (Vol. 1, pp. 1–40). Hillsdale, NJ: Erlbaum.
Barkley, R. A. (1990). Attention-deficit hyperactivity disorder: A handbook for diagnosis and treatment. New York: Guilford Press.
Bassan-Diamond, L. E., Teglasi, H., & Schmitt, P. (1995). Temperament and a story-telling measure of self-regulation. Journal of Research in Personality, 29, 109–120.
Baumeister, R. F., Wotman, S. R., & Stillwell, A. M. (1993). Unrequited love: On heartbreak, anger, guilt, scriptlessness, and humiliation. Journal of Personality and Social Psychology, 64, 377–394.
Beck, S. J. (1981). Reality Rorschach and perceptual theory. In A. I. Rabin (Ed.), Assessment with projective techniques (pp. 23–46). New York: Springer.
Bellak, L. (1975). The T.A.T., C.A.T., and S.A.T. in clinical use. New York: Grune & Stratton.
Bellak, L. (1993). The T.A.T., C.A.T., and S.A.T. in clinical use (5th ed.). Needham Heights, MA: Allyn & Bacon.
Birney, R. C. (1958). Thematic content and the cue characteristics of pictures. In J. W. Atkinson (Ed.), Motives in fantasy, action, and society (pp. 630–643). New York: Van Nostrand.
Blatt, S. J. (1991). A cognitive morphology of psychopathology. Journal of Nervous and Mental Disease, 179, 449–458.
Blatt, S. J. (1992). The Rorschach: A test of perception or an evaluation of representation. In E. I. Megargee & C. D. Spielberger (Eds.), Personality assessment in America: A retrospective on the occasion of the fiftieth anniversary of the Society for Personality Assessment (pp. 160–169). Hillsdale, NJ: Erlbaum. (Reprinted from 1990 Journal of Personality Assessment, 55, 394–416).
Blatt, S. J., & Berman, W. H., Jr. (1984). A methodology for the use of the Rorschach in clinical research. Journal of Personality Assessment, 48, 226–239.
Blatt, S. J., Ford, R. Q., Berman, W., et al. (1988). The assessment of therapeutic change in schizophrenic and borderline young adults. Psychoanalytic Psychology, 5, 127–158.
Blatt, S. J., & Lerner, H. (1983). The psychological
assessment of object representations. Journal of Personality Assessment, 47, 7–28.
Blatt, S. J., & Wild, C. M. (1976). Schizophrenia: A developmental analysis. New York: Academic Press.
Blatt, S. J., Allison, J., & Feirstein, A. (1969). The capacity to cope with cognitive complexity. Journal of Personality, 37, 269–288.
Bransford, J. D., Franks, J. J., Vye, N. J., & Sherwood, R. D. (1989). New approaches to instruction: Because wisdom can't be taught. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 470–497). Cambridge, UK: Cambridge University Press.
Bruhn, A. R. (1992). The early memories procedure: A projective test of autobiographical memory, Part 2. Journal of Personality Assessment, 58, 326–346.
Bruner, J. S. (1986). Actual minds, possible worlds. Cambridge, MA: Harvard University Press.
Buck, J. N. (1948). The H–T–P technique: A quantitative and qualitative scoring manual. Clinical Psychological Monographs, 5, 1–120.
Buck, J. N. (1987). The House-Tree-Person technique: Revised manual. Los Angeles: Western Psychological Services.
Burns, R. C. (1987). Kinetic-House-Tree-Person drawings (KHTP). New York: Brunner/Mazel.
Burns, R., & Kaufman, S. (1970). Kinetic Family Drawings (K-F-D): An introduction to understanding children through kinetic drawings. New York: Brunner/Mazel.
Burns, R. C., & Kaufman, H. S. (1972). Actions, styles, and symbols in Kinetic Family Drawings: An interpretive manual. New York: Brunner/Mazel.
Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. In J. T. Spence, J. M. Darley, & D. J. Foss (Eds.), Annual Review of Psychology, 47, 87–111.
Campbell, D. T. (1960). Recommendations for APA test standards regarding construct, trait or discriminant validity. American Psychologist, 15, 546–553.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Campus, N. (1976). A measure of needs to assess the stimulus characteristics of TAT cards. Journal of Personality Assessment, 40, 248–258.
Cantor, N., & Kihlstrom, J. F. (1987). Personality and social intelligence. Englewood Cliffs, NJ: Prentice-Hall.
Carlson, L., & Carlson, R. (1984). Affect and psychological magnification: Derivations from Tomkins' script theory. Journal of Personality, 52, 36–45.
Chi, M., Glaser, R., & Farr, M. (1987). Nature of expertise. Hillsdale, NJ: Erlbaum.
Clore, G. L., Schwarz, N., & Conway, M. (1994). Affective causes and consequences of social information processing. In R. S. Wyer & T. K. Srull (Eds.), Handbook of social cognition (pp. 323–417). Hillsdale, NJ: Erlbaum.
Cooper, A. (1981). A basic TAT set for adolescent males. Journal of Clinical Psychology, 37, 411–414.
Cramer, P. (1996). Storytelling, narrative, and the Thematic Apperception Test. New York: Guilford Press.
Crick, N. R., & Dodge, K. A. (1994). A review and reformulation of social-information-processing mechanisms in children's social adjustment. Psychological Bulletin, 115, 74–101.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Demorest, A. P., & Alexander, I. E. (1992). Affective scripts as organizers of personal experience. Journal of Personality, 60, 645–663.
Dodge, K. A., & Feldman, E. (1990). Issues in social cognition and sociometric status. In S. R. Asher & J. D. Coie (Eds.), Peer rejection in childhood (pp. 119–155). New York: Cambridge University Press.
Durand, V. M., Blanchard, E. B., & Mindell, J. A. (1988). Training in projective testing: Survey of clinical training directors and internship directors. Professional Psychology: Research and Practice, 19, 236–238.
Emde, R. N. (1989). The infant's relationship experience: Developmental and affective aspects. In A. J. Sameroff & R. N. Emde (Eds.), Relationship disturbances in early childhood: A developmental approach. New York: Basic Books.
Epstein, S. (1994). Integration of the cognitive and psychodynamic unconscious. American Psychologist, 49, 709–724.
Erdberg, P. (1993). The U.S. Rorschach scene: Integration and elaboration. In I. B. Weiner (Ed.), Rorschachiana XIX: Yearbook of the International Rorschach Society (pp. 139–151). Seattle, WA: Hogrefe & Huber.
Exner, J. E. (1989). Searching for projection in the Rorschach. Journal of Personality Assessment, 53, 520–536.
Exner, J. E. (1993). The Rorschach: A comprehensive system: Vol. 1. Basic processes. New York: Wiley.
Exner, J. E., & Weiner, I. B. (1995). The Rorschach: A comprehensive system: Vol. 3. Assessment of children and adolescents. New York: Wiley.
Fazio, R. H., Sanbonmatsu, D. M., Powell, M. C., & Kardes, F. R. (1986). On the automatic activation of attitudes. Journal of Personality and Social Psychology, 50, 229–238.
Feher, E., VandeCreek, L., & Teglasi, H. (1983). The problem of art quality in the use of the Human Figure Drawing Test. Journal of Clinical Psychology, 39, 268–275.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row, Peterson.
Fisher, S. (1986). Development and structure of the body image. Hillsdale, NJ: Erlbaum.
Fiske, A. P., Haslam, N., & Fiske, S. T. (1991). Confusing one person with another: What errors reveal about the elementary forms of social relations. Journal of Personality and Social Psychology, 60, 656–674.
Fiske, S. T., & Taylor, S. (1991). Social cognition (2nd ed.). New York: McGraw-Hill.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389–413.
Frank, L. K. (1948). Projective methods. Springfield, IL: Charles C. Thomas.
Fromkin, H. L., & Streufert, S. (1976). Laboratory experiments. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 415–465). Chicago: Rand McNally.
Goodenough, F. L. (1926). Measurement of intelligence by drawings. New York: World Books.
Greenberg, L. S., Rice, L. N., & Elliott, R. (1993). Facilitating emotional change: The moment by moment process. New York: Guilford Press.
Groth-Marnat, G. (1990). Handbook of psychological assessment (2nd ed.). New York: Wiley.
Hammer, E. F. (1958). The clinical application of projective drawings. Springfield, IL: Charles C. Thomas.
Handler, L. (1985). The clinical use of the Draw-A-Person Test (DAP). In C. S. Newmark (Ed.), Major psychological assessment instruments. Newton, MA: Allyn & Bacon.
Handler, L., & Habenicht, D. (1994). The Kinetic Family Drawing Technique: A review of the literature. Journal of Personality Assessment, 62, 440–464.
Harris, D. B. (1963). Children's drawings as measures of intellectual maturity. New York: Harcourt Brace Jovanovich.
Hartman, A. A. (1970). A basic TAT set. Journal of Projective Techniques, 34, 391–396.
Haynes, J. P., & Peltier, J. (1985). Patterns of practice with TAT in juvenile forensic settings. Journal of Personality Assessment, 49, 26–29.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Henry, W. E. (1956). The analysis of fantasy: The Thematic Apperception Test in the study of personality. New York: Wiley.
Higgins, E. T. (1990). Personality, social psychology, and person-situation relations: Standards and knowledge activation as a common language. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 301–338). New York: Guilford Press.
Hogan, R. (1987). Personality psychology: Back to basics. In J. Aronoff, A. I. Rabin, & R. A. Zucker (Eds.), The emergence of personality (pp. 79–105). New York: Springer.
Holt, R. R. (1958). Formal aspects of the TAT: A neglected resource. Journal of Projective Techniques, 22, 163–172.
Holt, R. R. (1961). The nature of TAT stories as cognitive products: A psychoanalytic approach. In J. Kagan & G. Lesser (Eds.), Contemporary issues in thematic apperceptive methods (pp. 3–40). Springfield, IL: Charles C. Thomas.
Holt, R. R. (1978). Methods in clinical psychology: Vol. 1. Projective assessment. New York: Plenum.
Holt, R., & Luborsky, L. (1958). Personality patterns of psychiatrists: A study of methods for selecting residents. New York: Basic Books.
Holtzman, W. H. (1981). Holtzman Inkblot Technique. In A. I. Rabin (Ed.), Assessment with projective techniques (pp. 47–83). New York: Springer.
Holtzman, W. H., Thorpe, J., Swartz, J., & Heron, E. (1961). Inkblot perception and personality: Holtzman Inkblot Technique. Austin, TX: University of Texas Press.
Horowitz, M. J. (1991). Person schema and maladaptive interpersonal patterns. Chicago: University of Chicago Press.
Ingram, R. E. (Ed.). (1986). Information processing approaches to clinical psychology. New York: Academic Press.
Isen, A. M. (1984). Toward understanding the role of affect in cognition. In R. S. Wyer, Jr. & T. K. Srull (Eds.), Handbook of social cognition (Vol. 3, pp. 179–236). Hillsdale, NJ: Erlbaum.
Kenny, D. T., & Bijou, S. W. (1953). Ambiguity of pictures and extent of personality factors in fantasy responses. Journal of Consulting Psychology, 17, 283–288.
Kihlstrom, J. F. (1984). Conscious, subconscious and preconscious: A cognitive perspective. In K. S. Bowers & D. Meichenbaum (Eds.), The unconscious reconsidered. New York: Wiley.
Kihlstrom, J. F. (1987). The cognitive unconscious. Science, 237, 1445–1452.
Kihlstrom, J. F. (1990). The psychological unconscious. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 445–464). New York: Guilford Press.
Kihlstrom, J. F., & Cantor, N. (1984). Mental representations of the self. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 17). New York: Academic Press.
Klopfer, W. G. (1981). Integration of projective techniques in the clinical case study. In A. I. Rabin (Ed.), Assessment with projective techniques (pp. 233–264). New York: Springer.
Kochanska, G. (1993). Toward a synthesis of parental socialization and child temperament in early development of conscience. Child Development, 64, 325–347.
Koestner, R., Weinberger, J., & McClelland, D. C. (1991). Task-intrinsic and social-extrinsic sources of arousal for motives assessed in fantasy and self-report. Journal of Personality, 59, 57–82.
Koppitz, E. (1968). Psychological evaluation of children's human figure drawings. New York: Grune & Stratton.
Koppitz, E. M. (1984). Psychological evaluation of human figure drawings by middle school pupils. New York: Grune & Stratton.
Kuhl, J., & Beckmann, J. (Eds.). (1994). Volition and personality. Seattle, WA: Hogrefe & Huber.
Lerner, P. (1992). Toward an experiential psychoanalytic approach to the Rorschach. Bulletin of the Menninger Clinic, 56, 451–464.
Leve, R. M. (1995). Child and adolescent psychotherapy: Process and integration. Boston: Allyn & Bacon.
Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
Locraft, C., & Teglasi, H. (1997). Teacher rated empathic behavior and children's T.A.T. stories. Journal of School Psychology, 35, 217–237.
Isen, A. M. (1987). Positive affect, cognitive processes, and Luborsky, L., & Crits-Christoph, P. (1990). Understanding
social behavior. In L. Berkowitz (Ed.), Advances in transference: The CCRT method. New York: Basic
experimental social psychology (Vol. 20, pp. 203±253). Books.
New York: Academic Press. Lundy, A. (1985). The reliability of the Thematic
Johnston, M. H., & Holzman, P. S. (1979). Assessing Apperception Test. Journal of Personality Assessment,
schizophrenic thinking. San Francisco: Jossey-Bass. 49, 141±145.
Kahana, B. (1978). The use of projective techniques in Machover, K. (1949). Personality projection in the drawing
personality assessment of the aged. In M. Storandt, I. of the human figure. Springfield, IL: Charles C. Thomas.
Siegler, & M. Elias (Eds.), The clinical psychology of MacLeod, C., & Cohen, I. L. (1993). Anxiety and the
aging. New York: Plenum. interpretation of ambiguity: A text comprehension
Kahill, S. (1984). Human figure drawings in adults: An study. Journal of Abnormal Psychology, 102, 238±247.
update of the empirical evidence, 1962±1982. Canadian Maloney, M. P., & Ward, M. P. (1976). Psychological
Psychology, 25, 269±292. assessment: A conceptual approach. New York: Oxford
Karon, B. P. (1981). The Thematic Apperception Test University Press.
(TAT). In A. I. Rabin (Ed.), Assessment with projective McAdams, D. P. (1990). Unity and purpose in human
techniques: A concise introduction (pp. 85±120). New lives: The emergence of identity as a life story. In
York: Springer. A. I. Rabin, R. Zucker, R. Emmons, & S. Frank (Eds.),
Kelley, H. H. (1967). Attribution theory in social Studying persons and lives (pp. 148±200). New York:
psychology. In D. Levine (Ed.), Nebraska Symposium Springer.
on Motivations (Vol. 15). Lincoln, NE: University of McAdams, D. P. (1995). What do we know when we know
Nebraska Press. a person? Journal of Personality, 63, 365±396.
Kelly, G. A. (1958). The theory and technique of McClelland, D. C., & Koestner, R. (1992). The achieve-
assessment. In P. R. Farnsworth & Q. McNemar ment motive. In C. P. Smith (Ed.), Motivation and
(Eds.), Annual Review of Psychology, 9, 323±352. Palo personality: Handbook of thematic content analysis
Alto: Annual Reviews. (pp. 143±152). New York: Cambridge University Press.
Kenny, D. T. (1964). Stimulus functions in projective McClelland, D. C., Koestner, R., & Weinberger, J. (1989).
techniques. In B. A. Maher (Ed.), Progress in experi- How do self-attributed and implicit motives differ?
mental personality research (pp. 285±354). New York: Psychological Review, 96, 690±702.
Academic Press. McGreevy, J. C. (1962). Interlevel disparity and predictive
498 Assessment of Schema and Problem-solving Strategies with Projective Techniques

efficiency. Journal of Projective Techniques, 26, 80±87. Rogers, R. (Ed.) (1997). Clinical assessment of malingering
Meissner, W. W. (1974). Differentiation and integration of and deception. New York: Guilford Press.
learning and identification in the developmental process. Rubin, D. C. (Ed.) (1986). Autobiographical memory. New
Annual of Psychoanalysis, 2, 181±196. York: Cambridge University Press.
Meissner, W. W. (1981). Internalization in psychoanalysis. Rummelhart, D. E., Smolensky, P., McClelland, J. L., &
Psychological Issues Monograph, 50. New York: Inter- Hinton, G. E. (1986). Schematic and sequential thought
national Universities Press. processes in PDP models. In J. L. McClelland & D. E.
Mendez, M. F., Ala, T., & Underwood, K. L. (1992). Rummelhart (Eds.), Parallel distributed processing: Ex-
Development of scoring criteria for the clock drawing plorations in the microstructure of cognition (Vol. 2).
task in Alzheimer's disease. Journal of the American Cambridge, MA: MIT Press.
Geriatric Society, 40, 1095±1099. Sackett, P. R., Zedeck, S., & Fogli, L. (1988). Relations
Messick, S. (1989). Validity. In R. L. Linn (Ed.), between measures of typical and maximum job perfor-
Educational measurement (3rd ed., pp. 13±103). New mance. Journal of Applied Psychology, 73, 482±486.
York: Macmillan. Saltz, G., & Epstein, S. (1963). Thematic hostility and guilt
Moffit, K. H., & Singer, J. A. (1994). Continuity in the life responses as related to self-reported hostility, guilt, and
story: Self-defining memories, affect, and approach/ conflict. Journal of Abnormal and Social Psychology, 67,
avoidance personal strivings. Journal of Personality, 62, 469±479.
21±43. Sandler, J, & Rosenblatt, B. (1962). The concept of the
Morgan, C. O., & Murray, H. A. (1935). A method for representational world. The Psychoanalytic Study of the
investigating fantasies: The Thematic Apperception Test. Child, 17, 128±145.
Archives of Neurology and Psychiatry, 34, 289±306. Schank, R. C. (1990). Tell me a story: A new look at real
Murray, H. A. (1938). Explorations in personality. New and artificial memory. New York: Charles Scribner's
York: Oxford University Press. Sons.
Murray, H. A. (1943). Thematic Apperception Test manual. Shapiro, D. (1965). Neurotic styles. New York: Basic
Cambridge, MA: Harvard University Press. Books.
Murstein, B. I. (1965). The stimulus. In B. Murstein (Ed.), Shneidman, E. S. (1951). Thematic test analysis. New York:
Handbook of projective techniques. New York: Basic Grune and Stratton.
Books. Singer, J. L. (1981). Research applications of projective
Murstein, B. I. (1968). Efforts of stimulus, background, methods. In A. I. Rabin (Ed.), Assessment with projective
personality, and scoring system on the manifestation of techniques (pp. 297±331). New York: Springer.
hostility on the TAT. Journal of Consulting and Clinical Singer, J. L., & Salovey, P. (1991). Organized knowledge
Psychology, 32, 355±365. structures and personality. In M. J. Horowitz (Ed.),
Norcross, J. C. (Ed.) (1986). Handbook of eclectic Personal schemas and maladaptive interpersonal patterns
psychotherapy. New York: Brunner/Mazel. (pp. 33±80). Chicago: University of Chicago Press.
Piaget, J. (1954). The construction of reality in the child. Singer, J. A., & Singer, J. L. (1994). Social-cognitive and
New York: Basic Books. narrative perspectives on transference. In J. M. Masling
Piotrowski, C. (1984). The status of projective techniques: & R. F. Bornstein (Eds.), Empirical perspectives on object
Or, ªwishing won't make it go away.º Journal of Clinical relations theory. Washington, DC: American Psycholo-
Psychology, 40(6), 1495±1502. gical Association.
Piotrowski, C., & Keller, J. W. (1989). Psychological Skinner, B. F. (1953). Science and human behavior. New
testing in outpatient mental health facilities: A national York: Macmillan.
study. Professional Psychology: Research and Practice, Smith, B. L. (1994). Object relations theory and the
20, 423±425. integration of empirical and psychoanalytic approaches
Piotrowski, C., Sherry, D., & Keller, J. W. (1985). to Rorschach interpretation. In I. B. Weiner (Ed),
Psychodiagnostic test usage: A survey of the Society Rorschachiana XIX: Yearbook of the International
for Personality Assessment. Journal of Personality Rorschach Society (pp. 61±77). Seattle, WA: Hogrefe &
Assessment, 49, 115±119. Huber.
Piotrowski, C., & Zalewski, C. (1993). Training in Smith, C. P. (Ed.) (1992a). Motivation and personality:
psychodiagnostic testing in APA-approved PsyD and Handbook of thematic content analysis. New York:
PhD clinical psychology programs. Journal of Person- Cambridge University Press.
ality Assessment, 61, 394±405. Smith, C. P. (1992b). Reliability issues. In C. P. Smith
Plomin, R. (1986). Development, genetics, and psychology. (Ed.), Motivation and personality: Handbook of thematic
New York: Erlbaum. content analysis (pp. 126±139). New York: Cambridge
Quinn, N., & Holland, D. (1987). Cultural models in University Press.
language and thought. New York: Cambridge University Snow, R. E. (1974). Representative and quasi-representa-
Press. tive designs for research on teaching. Review of Educa-
Rabin, A. I. (1981). Projective methods: A historical tional Research, 44, 265±291.
introduction. In A. I. Rabin (Ed.), Assessment with Spangler, W. D. (1992). Validity of questionnaire and TAT
projective techniques (pp. 1±22). New York: Springer. measures of need for achievement. Psychological Bulle-
Rappaport, D., Gill, M., & Schafer, R. (1968). Diagnostic tin, 112, 140±154.
psychological testing (Rev. ed.). New York: International Stein, D. J., & Young, J. E. (Eds.) (1992). Cognitive science
University Press. and clinical disorders. San Diego, CA: Academic Press.
Raynor, J. D., & McFarlin, D. B. (1986). Motivation and Stein, M. J. (1955). The Thematic Apperception Test (Rev.
the self system. In R. M. Sorrentino & E. T. Higgins ed.). Cambridge, MA: Addison-Wesley.
(Eds.), Handbook of motivation and cognition: Founda- Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of
tions of social behavior (pp. 315±349). New York: human intelligence. New York: Cambridge University
Guilford Press. Press.
Ritzler, B. A., Sharkey, K. J., & Chudy, J. F. (1980). A Stinson, C. H., & Palmer, S. E. (1991). Parallel and
comprehensive projective alternative to the TAT. Journal distributed processing models of person schemas and
of Personality Assessment, 44, 358±362. psychopathologies. In M. J. Horowitz (Ed.), Personal
Roback, H. B. (1968). Human figure drawings: Their utility schemas and maladaptive interpersonal patterns
in the clinical psychologist's armamentarium for person- (pp. 339±377). Chicago: University of Chicago Press.
ality assessment. Psychological Bulletin, 70, 1±19. Stricker, G., & Healey, B. J. (1990). Projective assessment
References 499

of object relations: A review of the empirical literature. Vane, J. R., & Guarnaccia, V. J. (1989). Personality theory
Psychological Assessment: A Journal of Consulting and and personality assessment measures: How helpful to the
Clinical Psychology, 2, 219±230. clinician? Journal of Clinical Psychology, 45, 5±19.
Swensen, C. H. (1968). Empirical evaluations of human Veroff, J. (1992). In C. P. Smith (Ed.), Motivation and
figure drawings: 1957±1966. Psychological Bulletin, 70, personality: Handbook of thematic content analysis
20±44. (pp. 100±109). New York: Cambridge University Press.
Swindell, C. S., Holland, A. L., Fromm, D., & Greenhouse, Watkins, C. E., Campbell, V. L., & McGregor, P. (1988).
J. B. (1988). Characteristics of recovery of drawing Counseling psychologists' uses of and opinions about
ability in left and right brain-damaged patients. Brain psychological tests: A contemporary perspective. The
and Cognition, 7, 16±30. Counseling Psychologist, 16, 476±486.
Taylor, S. E., & Crocker, J. (1981). Schematic bases of Watkins, C. E., Campbell, V. L., Nieberding, R., &
social information processing. In E. T. Higgins, C. P. Hallmark, R. (1995). Contemporary practice of psycho-
Herman, & M. P. Zanna (Eds.), Social Cognition. The logical assessment by clinical psychologists. Professional
Ontario symposium on personality and social psychology. Psychology Research and Practice, 26, 54±60.
Hillsdale, NJ: Erlbaum. Weiner, I. B. (1977). Approaches for Rorschach validation.
Teglasi, H. (1993). Clinical use of story telling: Emphasizing In M. Rickers-Ovsiankina (Ed.), Rorschach psychology.
the TAT with children and adolescents. Needham Huntington, NY: Krieger.
Heights, MA: Allyn & Bacon. Weiner, I. B. (Ed.) (1994). Rorschchianna XIX: Yearbook of
Thomas, A. D., & Dudek, S. Z. (1985). Interpersonal affect the International Rorschach Society. Seattle, WA: Ho-
in Thematic Apperception Test responses: A scoring grafe & Huber.
system. Journal of Personality Assessment, 49, 30±36. Wertheim, E. H., & Schwartz, J. C. (1983). Depression,
Tomkins, S. S. (1947). Thematic Apperception Test. New guilt, and self-management of pleasant and unpleasant
York: Grune & Stratton. events. Journal of Personality and Social Psychology, 45,
Tomkins, S. S. (1979). Script theory: Differentiated 884±889.
magnification of affects. In H. E. Howe, Jr. & R. A. Westen, D. (1991). The clinical assessment of object
Dienstbier (Eds.), Nebraska Symposium on Motivation relations using the TAT. Journal of Personality Assess-
(Vol. 26). Lincoln, NE: University of Nebraska Press. ment, 56, 56±74.
Tomkins, S. S. (1987). Script theory. In J. Aronoff, A. J. Westen, D. (1993). Social cognition and social affect in
Rabin, & R. A. Zucker (Eds.), The emergence of psychoanalysis and cognitive psychology: From regres-
personality (pp. 147±216). New York: Springer. sion analysis to analysis of regression. In J. W. Barron,
Tomkins, S. S. (1991). Imagery, affect, consciousness (Vol. M. N. Eagle, & D. L. Wolitzky (Eds.), Interface of
3). New York: Springer. psychoanalysis and psychology (pp. 375±388). Washing-
Torgesen, J. K. (1977). The role of non-specific factors in ton, DC: American Psychological Association.
task performance of learning disabled children: A Willock, B. (1992). Projection, transitional phenomena,
theoretical assessment. Journal of Learning Disabilities, and the Rorschach. Journal of Personality Assessment,
10, 27±35. 59, 99±116.
Tunis, S. L. (1991). Causal explanations in psychotherapy: Wilson, A. (1988). Levels of depression and clinical
Evidence for Target-and-Domain-Specific schematic assessment. In H. D. Lerner & P. M. Lerner (Eds.),
patterns. In M. J. Horowitz (Ed.), Personal schemas Primitive mental states and the Rorschach (pp. 441±462).
and maladaptive interpersonal patterns (pp. 261±276). Madison, CT: International Universities Press.
Chicago: University of Chicago Press. Wyatt, F. (1947). The scoring and analysis of the Thematic
Uleman, J. S., & Bargh, J. A. (Eds.) (1989). Unintended Apperception Test. Journal of Psychology, 24, 319±330.
thought. New York: Guildford Press. Wyer, R. S., Jr., & Srull, T. K. (Eds.) (1994). Handbook of
Urban, H. M. (1963). The Draw-A-Person. Los Angeles: social cognition (2nd ed., Vols. 1 & 2). Hillsdale, NJ:
Western Psychological Services. Erlbaum.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.17 Computer Assisted Psychological Assessment
GALE H. ROID and W. BRAD JOHNSON
George Fox University, Newberg, OR, USA

4.17.1 INTRODUCTION
4.17.1.1 Definitions and Distinctions
4.17.1.2 Brief History of Computer Assisted Psychological Assessment
4.17.2 A TYPOLOGY OF COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENTS
4.17.2.1 Test Administration
4.17.2.2 Computer Scoring
4.17.2.3 Descriptive Interpretation
4.17.2.4 Narrative Interpretation
4.17.2.5 Statistical–Actuarial Programs
4.17.3 COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENT: ADVANTAGES
4.17.3.1 Improved Administration and Scoring
4.17.3.2 Objectivity
4.17.3.3 Speed
4.17.3.4 Reliability
4.17.3.5 Cost Effectiveness
4.17.3.6 Expert Consultation
4.17.3.7 Flexibility
4.17.4 COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENT: DISADVANTAGES
4.17.4.1 Excessive Generality: The Barnum Effect
4.17.4.2 Lack of Validity
4.17.4.3 Depersonalizing the Assessment Process
4.17.4.4 Potential for Misuse and Client Harm
4.17.4.5 Computer as Clinician
4.17.5 ETHICAL ISSUES
4.17.5.1 Test Development
4.17.5.2 Basis for Scientific and Professional Judgments
4.17.5.3 Describing the Nature of Psychological Services
4.17.5.4 Competence
4.17.5.5 Professional Context
4.17.5.6 CAPA with Special Populations
4.17.6 GUIDELINES FOR USERS OF COMPUTER-BASED TESTS AND INTERPRETATIONS
4.17.6.1 Administration
4.17.6.2 Evaluation and Selection of CBTIs
4.17.6.3 Interpretation
4.17.7 GUIDELINES FOR DEVELOPERS OF COMPUTER-BASED TEST SERVICES
4.17.7.1 Human Factors
4.17.7.2 Psychometric Properties
4.17.7.3 Classification Strategy
4.17.7.4 Validity of Computer Interpretations
4.17.7.5 Facilitation of Review
4.17.8 DIRECTIONS FOR THE FUTURE
4.17.9 CONCLUSIONS AND RECOMMENDATIONS
4.17.10 REFERENCES

4.17.1 INTRODUCTION

A common dilemma for clinical psychologists is the tension between the need for diagnosis and the complexities of the individual case. Although resources to help the client may be contingent on certain types of psychological disorders, concern for the client and the avoidance of stigmatizing labels clearly make assessment difficult when cases are unusual or complex. Also, time or the client's resources may be limited, making an extended assessment difficult to complete. In the midst of this dilemma, various methods of computer assisted psychological assessment (CAPA) have emerged and some have become prominent, especially with the advent of the personal computer in the 1980s, with both promise (e.g., Jackson, 1985) and potential problems (e.g., Matarazzo, 1986). This chapter surveys the definition, history, types of implementation, advantages and disadvantages, reliability and validity concerns, and issues of ethical and professional responsibility in the use of computers in assessment. No attempt will be made in this chapter to review or critique actual software or existing CAPA programs, except by example or reference, due to the rapidly changing nature of the technology and the continual updating of various programs. This chapter will, however, take a hard look at the proposed advantages of CAPA, and, instead of placing inordinate weight on the technical promise of computers, propose some firm limits on the use of CAPA. First, some definitions and important distinctions are made, followed by a brief history of the development of CAPA.

4.17.1.1 Definitions and Distinctions

CAPA is broadly defined as any application of computers to the development, administration, scoring, or interpretation of tests, scales, inventories, or questionnaires used in education or psychology.

Psychological texts such as Gregory (1996) have proposed similar definitions. Those appear consistent with early use of the term CAPA such as that employed by Fowler (1985). The present chapter will focus more narrowly on computer applications to cognitive or personality instruments in which administration, scoring, and interpretation are attempted. Methods of CAPA development and techniques of using computers in test development are beyond the scope of the present chapter. For a review of some of the technical methods proposed for the development of computer-based psychological tests, see Green (1991a) on adaptive testing; Guastello and Rieke (1994) on the expert-system approach; Snyder, Widiger, and Hoover (1990) for a review of test-interpretive program development; and the review by Roid (1986).

By "assessment" we refer to the process of examining the entire range of information, including standardized tests, scales, inventories, questionnaires, projective tests, observations, interview data, histories, and other observations by experienced psychologists who study an individual for purposes of diagnosis, description, classification, treatment planning, pretherapy observation, or any other professional evaluation. Most psychometric textbooks make an important distinction between assessment and "testing" or "measurement." Testing usually involves the administration of standardized stimuli (e.g., test or inventory items) to clients whose responses are scored by objective or judgment-based scoring methods. Measurement is traditionally defined as the assignment of categories or scale values to such aspects of individuals as traits, attributes, attitudes, behavior, and preferences by a psychological testing instrument. The word "test" is typically reserved for tasks in which there are correct answers or problems to be solved, in contrast to "scale," "inventory," or "questionnaire," which measure traits or preferences for which no single correct answer exists for all people.

Matarazzo (1986) made the most vivid distinction between testing and assessment by arguing that assessment should be reserved as a term to describe the process conducted by an experienced psychologist who gathers information for purposes of interpreting it and giving it meaning within the context of the examinee's total life history. Matarazzo's distinction would result in questions such as, "Can computers really provide assessments or only measurements which must be interpreted by professionals?" This penetrating question strikes at the heart of the ethical issues to be discussed in this chapter. Matarazzo was contrasting his definition of assessment with the technical, clerical, or computerized processing of test information. His position received immediate reactions such as that of Fowler and Butcher (1986), who argued for the view of the assessment process as
one that could include both clinical and computerized statistical assistance in the interpretive process.

The standards for psychological testing (American Psychological Association (APA), 1985) make a distinction between test administration and test interpretation, and warn that test developers should inform users of any special training or expertise needed for either administration or interpretation. Thus, CAPA may be used for test administration, for administration and scoring, or for a full range of administration and interpretation. Typically, a higher level of training is assumed for the interpretation of test results, such as graduate-level measurement or psychometrics courses, supervised assessment and reporting, and knowledge of concepts such as error of measurement. Also, a higher level of validation is required for any interpretations of tests that impact clients. The distinction between administration and interpretation of tests is critical to the evaluation of CAPA and the role of computers because it highlights the continuing responsibility of the psychologist to proactively interpret the results of the assessment process. Thus, we define another category of programs, computer-based test interpretive (CBTI) programs, to delineate those that emphasize interpretation, often with narrative descriptions of results.

A final distinction of great importance was first discussed in depth by Meehl (1954): clinical versus statistical (actuarial) prediction. By clinical prediction, common usage would normally suggest the processing of information by the trained clinician who makes a prediction, diagnosis, classification, or evaluation of a client based on clinical experience, "intuition," and professional judgment. By statistical prediction, common usage would suggest methods such as multiple regression or validated "cutting scores" being used to predict, diagnose, and classify an individual based on a relevant research database. Meehl's (1954) lengthy discussion of the issues pointed out important distinctions such as the fact that both clinical judgment and statistical methods may be predicting an individual case from trends in previous group data (the group of all previous clients in the instance of clinical judgment). Later researchers such as Goldberg (1968) showed that clinical judgments are not necessarily more complex or "configural" as compared to statistical predictions, since simple linear equations effectively modeled the behavior of skilled judges who were assessing, for example, psychosis vs. neurosis from the Minnesota Multiphasic Personality Inventory (MMPI). It would be impossible to summarize all of the careful distinctions between clinical and statistical methods, and the reader is referred to Meehl (1954), Goldberg (1968), or to the review by Garb (1994).

4.17.1.2 Brief History of Computer Assisted Psychological Assessment

Fowler (1985) and Moreland (1992) provide brief histories of CAPA. Highlights include the early attempts in the 1940s to computer score the Strong Vocational Interest Blank (SVIB), Meehl's (1954) landmark book, and early 1960s versions of mainframe computer programs to score and interpret MMPI responses. To say that the history of CAPA has followed the development of computers and scanning equipment is obvious, but several landmarks are instructive. Following the refinement of mainframe computers in World War II, and until approximately 1970, most efforts to study or implement CAPA were conducted on mainframe computers with several critical attributes: (i) access to the computer was typically limited to professional computer operators, (ii) programs were developed and operated in "batch" mode where little interaction occurred between the user and the computer (interactions were specified in advance with various control commands), and (iii) the responses of examinees had to be scanned by a separate document scanner or key entered. Thus, "dynamic" entry of data was not possible at most installations. The advanced development of high-capacity scanning equipment is often attributed to Lindquist and his colleagues at the University of Iowa (e.g., Flanagan & Lindquist, 1951), for use on various educational achievement tests, although similar developments were employed for the SVIB and the MMPI (Moreland, 1992).

In the 1970s (and, perhaps, earlier in experimental laboratories), "time sharing" computer systems proliferated that connected the user via Teletype machine or early versions of computer terminals. Some CAPA applications were developed on systems initially designed for computer-assisted instruction. These "real time" systems allowed for a degree of interaction between the programmer and the system, between the user and the output, and, in some experimental applications, between the examinee and the time-share computer (e.g., Klingler, Miller, Johnson, & Williams, 1977). Klingler et al. developed an automated assessment system for psychiatric inpatients at a Veterans Administration hospital in Utah. The potential cost-benefit of such applications was immediately apparent, and this stimulated discussions at psychological conventions about
the ethical issues of on-line test administration and scoring.

Another important innovation in CAPA was the development of multistage "branching" or "adaptive" tests, which emerged from early psychometric studies in educational measurement (e.g., Angoff & Huddleston, 1958; Linn, Rock, & Cleary, 1969; Lord, 1968, 1971), from sequential methods in statistics (e.g., Cowden, 1946; Wald, 1947), and from the development of item-response theory (e.g., Birnbaum, 1968; Rasch, 1980). As early as the late 1960s, there was a considerable unpublished literature on adaptive testing methods and their application to psychological scales as well as educational tests (e.g., Bayroff & Sealy, 1967; Patterson, 1962; Roid, 1969). However, the first widely distributed applications of computerized adaptive testing emerged in the 1980s. Weiss (1983) was a key developer of methodology and applications in college and military settings. Operational programs on personal computers emerged in aptitude testing (e.g., McBride, 1988) and in conjunction with a large-scale project to computerize the Armed Services Vocational Aptitude Battery (ASVAB; Green, 1991a; Sands & Gade, 1983).

With the development of the first Apple computers and the release of the first IBM personal computer in approximately 1980, widespread and practical implementation of CAPA finally became available to the local psychologist. The early 1980s witnessed a flurry of rapid development and distribution of test scoring and interpretive programs for microcomputers (e.g., Roid & Gorsuch, 1984). In reaction to the swift proliferation of CAPA programs, Matarazzo (1983) published a stern warning about the lack of validation of narrative interpretations and the potential dangers of distribution to inexperienced users. Special series of articles appeared in the Journal of Consulting and Clinical Psychology (Butcher, 1985), and Computers in Human Behavior (Mitchell & Kramer, 1985). These early articles were a mixture of praise about the potential of the methodology and warnings about the limitations and ethical consequences of irresponsible usage. In response to this outpouring of attention to CAPA, the APA (1986) published a booklet of guidelines for development and usage of "computer-based tests and interpretations" (see Table 2). Several resource books, cataloging various programs and user options, were published (e.g., Butcher, 1987; Krug, 1984).

Because of the rapid innovation in computer technology, and, in the mid-1990s, the development of the world-wide web, Internet, video conferencing, and multimedia desk-top computers, it is difficult to anticipate the developments in CAPA. Certainly, the addition of multimedia to assessment instruments will be more easily achieved and affordable by more test developers. Dynamic video segments within standardized tests could make real-life situational assessment more feasible. In any case, the potential and the new ethical concerns stimulated by the prospect of test processing by Internet or e-mail services loom large on the horizon.

In the next section, we review a typology of CAPA programs. Included is a review of some of the key literature that documents the current status of CAPA and further definitions and distinctions.

4.17.2 A TYPOLOGY OF COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENTS

Not all applications of CAPA have the same level of complexity or developmental sophistication to support them. Thus, it is important to have a typology of various programs so that the proper role of CAPA can be evaluated. It will be the overall recommendation of this chapter that the psychologist carefully evaluate the level and type of CAPA product considered for clinical use, and perhaps, "draw a line" of restriction in terms of the type, technical quality, and validity of CAPA results actually employed in the evaluation of clients. For these reasons, a typology of CAPA programs, derived from the literature and the previous work of the senior author (Roid & Gorsuch, 1984; Roid, 1986), is presented below. First, a few key distinctions and definitions are discussed.

As stated earlier in this chapter, a wide range of computer "assistance" is used in the development of psychological tests and assessments, and we have chosen to omit these programs from our typology in favor of emphasis on CAPA products that may be directly used in client assessments. Also, there are a potentially large number of CAPA products available in the "employment testing" industry that are not considered due to the clinical focus of the current discussion. Finally, computer-based products that are designed for on-line delivery of psychotherapeutic exercises or cognitive training are excluded, even though some of them include assessment components.

Significant differences exist between CAPA programs for cognitive, neuropsychological, and personality assessments. Some of the same statistical techniques have been applied to the profile scores of both ability and personality batteries, but, for the most part, each has a particular type and style of presentation. For

example, it is now common for the software used to score intelligence scales to include computations, frequencies, and probabilities of differences between two or more subtests or indexes (e.g., verbal IQ vs. performance IQ). Statistical contrasts of differences are rarer in personality assessment, where, instead, visual inspection of the profile of scores is more common. In computerized interpretive programs in the personality realm (e.g., Lachar, 1984), it is common to include “critical items” and correlates between clinician ratings and “profile elevations” (ranges of scores, such as 70–79T on the MMPI). However, in terms of the typology that follows, the collective statistical techniques can be classified together as “profile and statistical analysis.”

Table 1 A typology of computer assisted assessment.

Test administration
  Conventional test administration
  Multimedia or specialized test administration
  Computerized adaptive testing (CAT)
Computer scoring
  Conventional scoring
  Statistical scoring and profile analysis
  Graphic displays
  Database analysis
Descriptive interpretation
Narrative interpretation (CBTI programs)
  Informal narrative
  Clinician-modeled or expert system
Statistical–Actuarial interpretation

4.17.2.1 Test Administration
Table 1 presents the typology. Some software is designed to administer tests and to record the results, which may be printed or briefly summarized on the computer screen. Administration can include highly sophisticated, multimedia programs with attached database capability, such as the MicroCog by Powell et al. (1993), an on-line testing system to detect cognitive impairment in older adults. Programs can vary from basic “page turning” software that simulates paper-and-pencil tests, to those such as MicroCog that include graphic displays, reaction time, optional sound, and timed delayed-memory trials. Of particular importance in this category of the typology are programs such as the Conners' (1995) Continuous Performance Test (CPT), used to assess attentiveness, particularly for children and adults referred for attention-deficit evaluation. The CPT assesses visual vigilance in scanning and concentrating on a stimulus array, a type of performance that requires accurate, split-second timing of both stimuli and responses and is therefore a perfect application of computers in test administration.
A separate category of test administration is computer-adaptive testing (CAT). CAT requires specialized software to implement the sophisticated (usually item-response theory or Bayesian) statistical model on which it is based. Several prominent examples exist, including the newer versions of the computerized Graduate Record Exams (Educational Testing Service, 1995), the ASVAB (Sands & Gade, 1983), and, one of the first widely available, commercially published tests, the Differential Aptitude Tests (DAT Adaptive) Computerized Adaptive Edition (Psychological Corporation, 1987). The essential attribute of CAT administration is that the difficulty of the test items is tailored dynamically (Lord, 1968) to the ability level of the examinee during the testing session. Research (e.g., Green, 1991a) has shown that modern implementations of CAT can reduce testing time by half while maintaining an acceptable level of measurement error. To implement such a system requires complex psychometric development in which each item in the pool of available items is given field trials and calibrated statistically. Programs such as MicroCAT (Assessment Systems, 1990) or other specialized computer programs are then used to implement the presentation of items. A broad range of “branching rules” and scoring methods is now available for such tests, depending on the type of item-response theory model used in the calibration of the items (e.g., Wainer, 1990).
While the statistical complexity of CAT applications may be beyond the level of psychometric training for many clinical psychologists, one can consult the professional reviews of such tests in the Buros Mental Measurements Yearbooks, or the current on-line electronic versions of these reviews. Alternatively, legitimate CAT programs should have an accompanying manual which can be evaluated for conventional reliability and validity data, just as any published test is evaluated. Psychological conventions and meetings often include exhibits or demonstrations of computer software, which can be a more cost-effective way to review these products, since publishers are often reluctant to distribute expensive software for review. As a final recourse, one should consult a psychometric psychologist at a local university who can help to evaluate the technical qualities of programs considered for purchase.
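The adaptive tailoring of item difficulty described in this section can be sketched in a few lines of code. The following is only a minimal illustration, not the algorithm of any published CAT program: it assumes a one-parameter (Rasch) item-response model, selection of the unused item with maximum information, and an expected a posteriori (Bayesian) ability estimate with a standard normal prior. All function names, the item difficulties, and the fixed-length stopping rule are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, used):
    """Choose the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

def eap_estimate(responses, difficulties):
    """Expected a posteriori ability on a grid, with a standard normal prior."""
    grid = [x / 10.0 for x in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for i, correct in responses:
            p = p_correct(theta, difficulties[i])
            w *= p if correct else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def run_cat(difficulties, answer_fn, max_items=3):
    """Administer up to max_items items, re-estimating ability after each one.

    answer_fn(i) is a hypothetical callback returning True if item i
    was answered correctly.
    """
    theta, used, responses = 0.0, set(), []
    for _ in range(min(max_items, len(difficulties))):
        i = next_item(theta, difficulties, used)
        used.add(i)
        responses.append((i, answer_fn(i)))
        theta = eap_estimate(responses, difficulties)
    return theta, [i for i, _ in responses]

# Hypothetical calibrated pool: difficulties on the logit scale.
pool = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_hat, administered = run_cat(pool, answer_fn=lambda i: True)
print(administered, round(theta_hat, 2))
```

An examinee who keeps answering correctly is routed to progressively harder items, which is the sense in which difficulty is "tailored." Operational systems add calibrated discrimination parameters, exposure control, and a precision-based stopping rule, none of which are shown here.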

4.17.2.2 Computer Scoring

Conventional scoring software has typically processed answer sheets or key-entered item responses and printed basic raw scores and various derived scores such as percentiles or standard scores. For many years, such programs had a minimum of graphic display, due to the restricted nature of graphics possible until the advent of the high-speed laser printer. As mentioned in the brief history of CAPA, the earliest versions of conventional scoring were on mainframe computers for the SVIB or the MMPI. Many of these programs were quite creative and complex, relying on “typeface” characters to plot profiles or display histogram-type plots of scores.
At the next level in the typology are scoring programs that include sophisticated statistical computations and profile analysis. A proliferation of “scoring assistant” programs has emerged, such as the “compuscore” series for the Stanford-Binet and Woodcock-Johnson at Riverside Publishers, the PsyTest and Wechsler Intelligence Scale for Children-Third Edition (WISC-III) and Wechsler Individual Achievement Test (WIAT) programs of The Psychological Corporation, the Western Psychological Services Test Report series, the software systems developed by Psychological Assessment Resources, and the “Assist” software for the tests published by American Guidance Service, to name a few of the larger groupings of scoring programs. Although many of these programs also include interpretive functions, to be discussed in the next section, the scoring sections of these programs tend to be more graphic and more sophisticated than earlier, conventional scoring programs (Roid, 1985). For example, a much wider array of profile indexes, score-difference analyses, critical item comparisons, and profile-matching routines is included than in previously available programs (with exceptions being some of the complex, early MMPI interpretive programs). Two examples, one in personality and one in cognitive assessment, may illustrate the characteristics of such programs. The Tennessee Self Concept Scale (Roid & Fitts, 1988) computerized scoring program includes a wide array of research-based profile indexes, checks on the validity of response patterns, faking-good scales, critical-item lists, and a multivariate profile matching method that is implemented on a complex, color-printed display. The profile-matching method, interval-banded profile analysis, was initially developed by Huba (1986) to provide a statistical test of the degree of match between prototypical profiles (stored within the computer program) and those of new profiles being scored. The essential part of the matching routine is a multivariate chi-square test of “fit” between the profile pattern (which is allowed to vary within “bands” or ranges of scores) and “target” prototypical patterns that have been established through statistical studies of clinical and normative cases. A second example of sophisticated profile analysis is in the Wechsler Scoring Assistant (Psychological Corporation, 1992), where profile-score differences among all the subtests of the WISC-III are analyzed statistically, and the regression–prediction method of calculating ability vs. achievement discrepancies (WISC-III vs. Wechsler Individual Achievement Test scores) is included for screening of learning disabilities.
As more technical advances are made in computer printers, compact-disk and computerized video displays, the graphic display of test profiles and score patterns will become more sophisticated. In the final level of the typology are programs that allow for the archiving of multiple cases, set up and usage of extensive databases of test results, and statistical manipulation of these data. Examples of such programs are found among the offerings of all the publishers of the large achievement-test batteries, who typically offer software for analyzing trends in achievement data across an entire school district. The most sophisticated versions of these programs allow for the assessment of growth or change at the individual, classroom, and school levels of analysis.
Based on the experience of the senior author of this chapter, who participated in the development of several computer-scoring systems (e.g., Roid, 1985; Roid & Fitts, 1988; and the Wechsler Scoring Assistant, Psychological Corporation, 1992), these profile-analysis programs are typically based on extensive data analysis, not on the informal or subjective processes that often occur in the “informal narrative” category of the typology. Most of the profile analyses are based on actual data from the standardization and validity studies of the major tests, and extensive staff, resources and time are invested in the development and elaborate cross-checking of program accuracy. Because it is well known that clerical errors in scoring standardized tests are all too common, these scoring programs provide a valuable service to clinical psychology by delivering accurate scores. Further, the complexity of profile analysis, difference-score computation and analysis, and profile-pattern matching would not be feasible with hand-scoring methods. Except for a few unusual cases, the statistical scoring programs tend to be well documented in the technical manuals published with them.
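As an illustration of the regression–prediction approach to ability vs. achievement discrepancies mentioned in this section, the sketch below shows the general textbook form of the calculation. It is not the procedure of the Wechsler Scoring Assistant or any other published program. It assumes that both scores are on a standard-score scale (mean 100, SD 15), that the ability–achievement correlation r is known from standardization data, and that a one-tailed critical value is applied to the standard error of estimate. The correlation of .60 and the function names are hypothetical.

```python
import math

def predicted_achievement(iq, r, mean=100.0):
    """Achievement predicted from ability, allowing for regression to the mean."""
    return mean + r * (iq - mean)

def discrepancy_report(iq, achievement, r, sd=15.0, z_crit=1.645):
    """Compare obtained achievement with the regression-predicted value.

    The discrepancy is flagged when obtained achievement falls below the
    prediction by more than z_crit standard errors of estimate.
    """
    predicted = predicted_achievement(iq, r)
    se_est = sd * math.sqrt(1.0 - r * r)   # standard error of estimate
    discrepancy = predicted - achievement  # positive = achievement below prediction
    return {
        "predicted": predicted,
        "discrepancy": discrepancy,
        "se_est": se_est,
        "flag": discrepancy > z_crit * se_est,
    }

# Hypothetical case: IQ of 120, obtained achievement of 85, assumed r = .60.
print(discrepancy_report(iq=120, achievement=85, r=0.60))
```

Because the prediction regresses toward the mean, a child with an IQ of 120 is expected to obtain an achievement score of about 112, not 120, under these assumptions. A simple-difference method would therefore overstate the discrepancy for high-ability examinees.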

Thus, in summary, the sophisticated “scoring assistant” model of computer scoring software has distinct advantages for the clinical psychologist. Time saved by the clinician in using scoring software could be reinvested in more time with the client or additional personal interview or case follow-up.

4.17.2.3 Descriptive Interpretation

A descriptive type of program would generate sentences of explanation, such as “The client has a significantly higher score on subtest three as compared to the other subtests in the profile,” along with the printed scores and profile. The distinction between this level of description and the narrative interpretations described in the next section of the typology is that description remains tied to the facts and does not indicate cause-effect relationships or connections with research or clinical findings. The descriptions would be analogous to the phrases used in the results section of a research article, reporting the findings before they are discussed or interpreted. As noted in the example above, the descriptions can be rooted in sophisticated statistical comparisons between scores (e.g., Silverstein, 1981).
Some of the first computer interpretive programs widely used in the 1980s had redundant, printed phrases that described score elevations for multiple scores, using exactly the same wording. The best examples of descriptive interpretation include sophisticated “sentence generators” that compose explanations using a variety of modifiers, sentence construction and style, similar to the variety present in good report writing. One example is that of the Barclay Classroom Assessment System (Barclay, 1983) which varied pronouns such as “he” and “she,” and used research on the scaling of verbal phrases (e.g., Lichtenstein & Newman, 1967; Pohl, 1981) to compose explanations of score elevations. For example, scale values for descriptors such as “fairly often” vs. “very infrequently” can be used to anchor score-level descriptions, in a way more precise than an informal or subjective use of such language. Use of more precise methods of description could increase the potential of such programs to maintain a more objective level of description, that is, sticking to the facts.

4.17.2.4 Narrative Interpretation

The worst examples of informal, unvalidated, narrative interpretations of psychological tests were the target of Matarazzo (1983, 1986) in his critiques of CAPA. By informal narrative we mean the wordy descriptions printed on computer reports of assessment results that attach clinical significance, often based on clinical lore or theoretical explanations, to the patterns or levels of scores. Such narratives have been faulted for lack of validation of specific sentences or phrases and for lack of attention to individual client variations. Informal narratives also have the impact of vague generalities that result in a “Barnum effect” in which enough truth lies within a complex of statements if one emphasizes the accurate parts and minimizes the inaccurate parts. In research on the accuracy of computerized reports, Adams and Shore (1976) found an inverse relationship between length of reports and their accuracy rated by clinicians. For these reasons, clinicians should be very cautious about the validity of informal narrative programs.
Basically, the key feature of such programs, which clinicians should be able to discern from the manual and advertising material published with the computer program, is whether the narrative sentences have been accumulated from empirical research or were written by the author(s) of the program without validation studies. Another key is whether or not the statements were validated by empirical linking of clinician ratings and scores, where both were collected in the same research studies. Positive examples of proper validation of narrative programs are reviewed by Snyder et al. (1990). Published programs that include extensive empirical validation of narrative reports include Lachar (1984) and Snyder (1981), to name only two prominent examples.
By clinician-modeled programs we mean those that either (i) employ the process used by a renowned clinician within the logic of the computer program, or (ii) employ statistical models of the process used by expert clinicians, determined through research on clinical judgment (e.g., the methods described by Goldberg, 1968). Examples of the former include the WISC-III program of Kaufman (1996) that implements his documented approach to interpretation of the Wechsler scales (Kaufman, 1994). Examples of the latter have never been implemented, as far as we know, although the methodology is particularly promising. For a review of expert systems and their application to CAPA, see Guastello and Rieke (1994).

4.17.2.5 Statistical–Actuarial Programs

The term “actuarial” as applied to psychological assessment was coined by Meehl (1954) in analogy to the actuary process in insurance-risk determination. Gregory (1996) presented a classic definition of an actuarial interpretation, attributed to J. O. Sines, as founded on “the empirical determination of the regularities that may exist between specified psychological test data and equally clearly specified socially, clinically, or theoretically significant nontest characteristics of the person tested” (p. 579). Thus, an actuarial approach is databased and must show a statistical link between information collected outside the test (e.g., clinician's observations) and the test scores or patterns of test results. At their best, narrative statements appearing in a truly actuarial program would not be based on clinical opinion, but rather on rigorous research linking test and nontest information.
A classic example of an actuarial approach to the MMPI is provided by Marks and Seeman (1963) who defined a 4–8–2 profile “code type” as follows:
(i) Scales 4, 8, and 2 over 70T
(ii) Scale 4 minus 2 less than 15T
(iii) Scale 7 not to exceed 4 by more than 4T
(iv) Scale 8 minus 2 less than 15T
(v) Scale 8 minus 7 more than 5T
(vi) Scale 8 minus 9 more than 10T
(vii) Scale 9 less than 70T
(viii) Scales L and K less than F, F less than 80T
Thus, detailed specifications are given for the entire profile pattern, not just the high scores on 4 (psychopathic deviate), 8 (schizophrenia), and 2 (depression). Even the validity scales, L (lie scale), F (frequency), and the suppressor scale K, are used to verify the accuracy of the profile. Marks and Seeman (1963) reported that patients obtaining this profile were mainly diagnosed psychotic (71% schizophrenic, paranoid type) though some were seen as personality disorders (e.g., 21% sociopathic). Note that these are percentages for a given sample of clinical patients, and that, as with all probability relationships, they do not predict with 100% accuracy. Therefore, the most accurate statement that should appear in narrative form would be something like, “some research studies of clinical patients have shown a frequency of about 70% schizophrenic–paranoid for this pattern of scores,” and the clinician should be careful to screen all such comments to be sure they apply to the current case.
Another type of actuarial approach called “empirical correlates” was developed by Lachar (1984) for the Personality Inventory for Children (PIC) computer interpretations. Lachar administered clinical checklists to psychologists who had interviewed children who were subsequently tested with the PIC and who were previously known to have certain diagnoses. In addition, a large normative sample of children was given the PIC, and the checklist was completed by examiners for these children also. For certain scales (and certain scale elevations, e.g., scores greater than 79T), empirical correlates were those descriptive statements statistically associated with the “elevation” (score in the clinical range). For example, for the achievement scale, a percentage (e.g., 73%) of clinical cases would show symptoms such as behavioral adjustment difficulties or negative self concept (as indicated on the clinical checklist), when the achievement scale score exceeded 79T. To confirm such findings, the same correlational study would be repeated as a cross-validation. Since empirical findings always “shrink” upon cross validation, the percentage may reduce to 70% of cases in the above example.
The best statistical–actuarial interpretation programs are those that meet the standards suggested by Snyder et al. (1990), where cut-score rules, program logic, and narrative sentences have been subjected to empirical research and are well documented in a technical manual. Also, the best of such programs have been subjected to validation research that surveys clinical users of the reports, collects accuracy ratings, and assesses the impact on clinical decisions of various client reports in rigorous follow-up studies. The number of programs with such rigorous development is, unfortunately, small. Some systems that come close are the larger, well-researched MMPI programs, the PIC (Lachar, 1984), the Marital Satisfaction Inventory (Snyder, 1981), and the Wechsler programs developed by the Psychological Corporation (1994), to name only four examples. Even these programs can be faulted in that not every phrase or descriptor may have been subjected to empirical trials based on moderator variables, and not all may have been examined for report accuracy. Such research, as with all good construct validation, takes decades of accumulated research. One trend and positive aspect of some of the newer programs (e.g., WISC-III Writer, Psychological Corporation, 1994) is the provision for placing the clinician in control of the final collection of narrative statements that appear in the report. As will be discussed in later sections of this chapter, the individual clinician must maintain control and oversight over the selection and accuracy of all statements generated by computer interpretations that are employed in case reports.
With the typology of CAPA programs in mind, the following section reviews some of the advantages and disadvantages of CAPA. Key literature is cited in the next section.
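The eight Marks and Seeman (1963) rules for the 4–8–2 code type quoted in this section are explicit enough to be written as a rule check, which is essentially what an actuarial program does before it retrieves any narrative statement. The sketch below is a hypothetical illustration only: the function name, the dictionary layout, and the example T scores are assumptions, and rule (iii) is read as “scale 7 minus scale 4 is no more than 4T.”

```python
def failed_482_rules(t):
    """Return the labels of the 4-8-2 code-type rules that a profile fails.

    t maps MMPI scale labels ("L", "F", "K", "2", "4", "7", "8", "9")
    to T scores. An empty list means the profile meets all eight rules.
    """
    rules = {
        "i":    t["4"] > 70 and t["8"] > 70 and t["2"] > 70,
        "ii":   t["4"] - t["2"] < 15,
        "iii":  t["7"] - t["4"] <= 4,
        "iv":   t["8"] - t["2"] < 15,
        "v":    t["8"] - t["7"] > 5,
        "vi":   t["8"] - t["9"] > 10,
        "vii":  t["9"] < 70,
        "viii": t["L"] < t["F"] and t["K"] < t["F"] and t["F"] < 80,
    }
    return [label for label, passed in rules.items() if not passed]

# Hypothetical profile, not taken from any real case.
profile = {"L": 50, "F": 70, "K": 55, "2": 80, "4": 82, "7": 75, "8": 85, "9": 60}
failed = failed_482_rules(profile)
print("meets 4-8-2 code type" if not failed else f"fails rules: {failed}")
```

Returning the failed rules, rather than a bare yes or no, keeps the logic open to inspection by the clinician, in line with the chapter's point that cut-score rules and program logic should be documented and reviewable.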

4.17.3 COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENT: ADVANTAGES

Computerized approaches to psychological assessment have rapidly become standard practice in most mental health treatment settings. Not only have computer administration and scoring of various tests become commonplace, but CBTIs have become a booming industry in their own right. In 1989, a survey of 413 mental health facilities in the USA revealed that 53% of these major facilities employed some form of CBTI (Piotrowski & Keller, 1989). CAPA offers numerous advantages to users, organizations, and clients alike. The following are some of the more salient advantages.

4.17.3.1 Improved Administration and Scoring

Computer administration of psychological tests serves to enhance standardization and clinician control over the testing process. In addition to reduced time for administration and rapid availability of feedback, computer administration allows presentation of even complex testing stimuli (Krug, 1987) and early identification of examinees who misunderstand directions or test stimuli (Wise & Plake, 1990). Though some have expressed concerns about the equivalence of computer-administered test scores, reviews suggest that when instructions are similar, computer-administered tests generally yield scores which are equivalent to paper-and-pencil versions (Finn & Butcher, 1991). Additionally, preliminary research indicates that computer-administered tests are not only acceptable to examinees but often preferred over conventional testing procedures (Burke & Normand, 1987; Finn & Butcher, 1991). Specifically, examinees report greater interest, less anxiety, and greater comfort in responding to computer-generated test stimuli. Computer-administered tests are also quite useful for more disturbed clinical populations who, by virtue of their level of disorganization or inattention, may respond more effectively to a computer than to another person (Bloom, 1992).

4.17.3.2 Objectivity

In his early call for a good actuarial “cookbook” for use in psychological assessment, Meehl (1956) noted the potential benefit of decreased distortion and bias in the recording, storage, and retrieval of test data and interpretive material. Most appear to agree that CAPA can substantially reduce errors associated with ignorance, bias, or stereotyping on the part of clinicians (Butcher, 1987; Dahlstrom, 1993). Once interpretive rules are developed and programmed, they are automatically applied to protocols regardless of extraneous circumstances.

4.17.3.3 Speed

Perhaps the most pragmatic and obvious advantage of computer assisted psychological assessment is the potential for marked reduction in time required for administration, scoring and interpretation of psychological instruments (Butcher, 1987; Kleinmuntz, 1969, 1975; Wise & Plake, 1990). Administration time for both achievement and personality tests is significantly reduced by computer administration (Wise & Plake, 1990). Computerized scoring and interpretive report generation serve to enhance timely processing of test data by clinicians and subsequent delivery of feedback to consumers. In addition, these substantial reductions in time are associated with improvements both in consistency and accuracy.

4.17.3.4 Reliability

Computer assisted scoring and interpretation of psychological tests serves to radically reduce error variance. Butcher (1987) noted “The computer seldom has an off day as human test interpreters do” (p. 5). As a result, well developed test interpretation programs offer nearly perfect reliability for the functions of scoring and compiling preprogrammed interpretive statements (Burke & Normand, 1987; Graham, 1993; Krug, 1987). Once scale correlates have been identified and replicated, they can easily be stored and reliably recalled when particular scale elevations or score configurations appear in a protocol.

4.17.3.5 Cost Effectiveness

Computer scoring and interpretation of psychological tests generally result in marked reduction in clinician time and, therefore, expense to the consumer (Burke & Normand, 1987; Butcher, 1987; Graham, 1993). Though computerized systems will certainly require an initial outlay for equipment, software, and training, long-term savings to both users and consumers may be expected. With the advent of managed behavioral healthcare and increasing attention to issues such as cost-effectiveness, efficiency, and treatment utility, CAPA offers several ways for clinicians to reduce the time and expense required for a thorough assessment.

4.17.3.6 Expert Consultation

Many modern day CBTI programs appear to have effectively realized Meehl's (1956) hope for systems which employ large databases and widely representative samples on which to base predictions and diagnoses. Many well developed CBTIs systematically organize and access massive normative databases and extensive bodies of empirical research findings to bolster and undergird test interpretations (Krug, 1987). In spite of some methodological weaknesses, research on the equivalency of computer-generated vs. clinician-generated test interpretation suggests many CBTIs perform at least as well as clinicians (Bloom, 1992; Burke & Normand, 1987; Finn & Butcher, 1991). For example, Kleinmuntz (1969), in a large scale multi-site study, found that computer-generated MMPI interpretations performed as well as expert MMPI interpreters and surpassed the performance of average clinicians in predicting a client's primary clinical problem.
Computer-generated interpretations may ideally offer the skilled clinician a source of expert consultation. Due to the voluminous information available via the computer system, interpretations generated by a CBTI may markedly enhance both the accuracy and comprehensiveness of clinical decision making, treatment planning, and feedback to the client. Garb (1994) pointed out that even when a CBTI report appears to conflict with interview, historical, and observational data, the report can still be quite valuable in leading the clinician to consider alternative hypotheses and collect additional data. CBTIs may be most helpful when a case is ambiguous and the diagnosis unclear. Also, as a result of the “expert” and “objective” look and sound of computerized psychological reports, CBTIs may be particularly advantageous in forensic settings (Butcher, 1987) where such reports may be viewed as a form of outside and corroborating opinion. Finally, computer-generated reports are particularly advantageous as a source of objective and expert opinion in the psychotherapy enterprise (Finn & Butcher, 1991). Here the test interpretive report is presented as a consultation by an outside expert and introduced for the purpose of clarification and discussion.

4.17.3.7 Flexibility

A final advantage of computer assisted psychological assessment is the flexibility possible in such systems (Graham, 1993; Kleinmuntz, 1975). Perhaps the most tangible example of this flexibility is the development and ongoing refinement of adaptive or tailored testing programs. Here the system is programmed to adjust the difficulty level or volume of material in discrete areas to an individual examinee. Examples include research with the ASVAB and the MMPI. Sympson, Weiss, and Ree (1982) found that an adaptive version of the ASVAB produced validity coefficients which were equivalent to conventional administration of the test in spite of the fact that the adaptive versions were typically one-half the length of their conventional counterparts. Similarly, Roper, Ben-Porath, and Butcher (1991) found that among 155 college age subjects, an adaptive version of the MMPI-2 (averaging 28% shorter in length) produced profiles which were equivalent to conventional MMPI-2 administration.

4.17.4 COMPUTER ASSISTED PSYCHOLOGICAL ASSESSMENT: DISADVANTAGES

In addition to the many advantages inherent in CAPA, there are also concerns and unresolved dilemmas in the development, implementation, and utilization of computerized systems. Several of these potential disadvantages are highlighted below. These include the problem of excessive generality, scanty validity, potential for depersonalizing the assessment process, potential for misuse or client harm, and the danger inherent in viewing the computer as a competent clinician.

4.17.4.1 Excessive Generality: The Barnum Effect

Computer-generated test interpretive reports have been criticized soundly for their frequently broad and highly generalized narrative statements (Butcher, 1987; Groth-Marnat & Schumaker, 1989; Matarazzo, 1986). Butcher described this as the problem of excessive generality, otherwise known as the Barnum effect or the Aunt Fanny report. CBTIs are notorious for offering patient descriptions based on insufficient empirical research. These narrative statements then apply to most human beings and are merely modal statements rather than person-specific descriptors.
This problem appears to have consistent empirical demonstration. O'Dell (1972) found that subjects perceived Barnum reports to be more accurate descriptions of their own personality than actual computer-generated interpretations of their MMPI profiles. O'Dell concluded that statements with very high base rates tend to be believed or concurred with. Guastello, Guastello, and Craft (1989) found

that 58% of respondents rated Barnum reports for the Personality Profile Compatibility
Questionnaire as quite accurate descriptions. Nonetheless, actual CBTI reports for this
measure were rated as significantly more accurate than the Barnum reports. Less encouraging
was a similar study employing a CBTI for the Exner Rorschach system. Prince and Guastello
(1990) reported that the Exner report offered only 5% discriminating power for any one
patient when compared with bogus Exner interpretive reports. Perhaps more concerning was the
finding that approximately 60% of the CBTI statements merely described characteristics of
the outpatient population in general. Related to this concern regarding Barnum statements
and excessive generality is the frequently expressed concern regarding an "aura" of
credibility or accuracy potentially attributed to CBTI reports by consumers. The scanty
research in this area suggests that this concern may be largely unfounded and that consumers
rate the quality, credibility, and accuracy of CBTI or clinician generated reports as
essentially similar (Andrews & Gutkin, 1991).

4.17.4.2 Lack of Validity

One of the most glaring problems inherent in the development and use of CBTI report systems
is the pervasive dearth of empirical evidence of their validity. Very few of the existing
CBTI systems have been validated in even a rudimentary manner (Butcher, 1987; Finn &
Butcher, 1991; Matarazzo, 1986). Rather than actuarial programs based exclusively on
empirically derived base rate data, most CBTI reports are based on clinical lore or the
conclusions and hypotheses of expert clinicians (Gregory, 1996; Groth-Marnat & Schumaker,
1989). Concerns regarding validity are often expressed as part of a larger concern about the
wide range in product quality within the CBTI market. Butcher (1987) wondered at the
"mind-boggling" (p. 6) array of computers and software packages from which clinicians might
choose. While some systems are based on reasonably rigorous development procedures, most are
not. As a result, many psychologists remain quite skeptical of CBTI programs (Burke &
Normand, 1987) and avoid using them.

4.17.4.3 Depersonalizing the Assessment Process

Critics of computerized assessment systems often express concern that introduction of
computers in the assessment process will serve to heighten the distance between the
clinician and client and lead to a more sterile and perhaps dehumanizing assessment process
(Burke & Normand, 1987; Krug, 1987). In fact, research does not substantiate this concern
and, instead, suggests quite the opposite. Most people appear to prefer computer
administered tests and some are more truthful in response to computer administered questions
(Fowler, 1985). Computer administered assessments appear particularly useful for more
disturbed or anxious clients.

4.17.4.4 Potential for Misuse and Client Harm

Later in this chapter we will address a range of potential ethical problems inherent in the
use of computer assisted assessment. The potential for misuse of CBTIs and resulting harm to
clients is substantial and results from difficulties with their current use (Butcher, 1987;
Groth-Marnat & Schumaker, 1989). First, because CBTI programs and services are widely
available and sold to a wide range of professionals, it is likely that professionals without
adequate awareness of the limitations of CBTIs will apply them to clients regardless of
context or important mitigating factors. A related concern involves the potential for
factors indigenous to computerized assessment, but irrelevant to the purposes of the test,
to significantly alter test performance (Moreland, 1985). Second, there may be a tendency
for consumers to uncritically accept statements generated by a computer as more factual than
those generated by a clinician. Finally, because CBTI reports are rarely signed, there are
serious concerns regarding professional responsibility and legal culpability (Gregory,
1996). Without a qualified psychologist assuming responsibility for the service offered,
potential for misuse is enhanced.

4.17.4.5 Computer as Clinician

The final potential disadvantage in employment of computers in the assessment process has to
do with the danger that well trained psychologists might lose control (Krug, 1987) of the
assessment process. Specifically, this would be a loss to mechanization, technology, and
large marketing interests. Though clinicians clearly stand to benefit from the speed and
reliability of computer driven assessments, there is concern that the expert clinician of
the past will become the testing technician of the future (Butcher, 1987; Butcher, Keller, &
Bacon, 1985; Groth-Marnat & Schumaker, 1989; Matarazzo, 1990). As a result of the highly
professional
appearance of many CBTI reports, both users and recipients of computerized narratives may
confuse them with comprehensive assessments. To the extent that the skills of the human
clinician are relegated to a position of diminished importance in generating assessment
outcomes, and to the extent that clinicians perceive themselves as less responsible for
these outcomes, there certainly exists greater risk to the profession and the consumer.

4.17.5 ETHICAL ISSUES

In light of the rapid proliferation of CAPA techniques, and CBTIs in particular, it is not
surprising that this burgeoning area of research and practice has become the focus of a wide
range of ethical concerns. The APA's Ethical Principles and Code of Conduct (APA, 1992) has
as its primary goal "the welfare and protection of the individuals and groups with whom
psychologists work" (p. 1599). Principle C from the APA code is perhaps most relevant to the
ethical issues inherent in computerized assessment: "Psychologists uphold professional
standards of conduct, clarify their professional roles and obligations, accept appropriate
responsibility for their behavior and adapt their methods to the needs of different
populations" (p. 1599). With these goals in view, we will consider several of the most
salient ethical obligations for psychologists involved in CAPA. If neglected, each could
serve as a source of potential harm to consumers of computerized services.

4.17.5.1 Test Development

The Ethics Code (APA, 1992) states clearly that psychologists involved in the development
and provision of computerized assessment services accurately describe the purpose,
development procedures, norms, validity, reliability, and applications of the service as
well as particular qualifications or skills required for their use. Psychologists
participating in such product development should attempt to clearly link interpretive
statements to specific client scores or profiles, qualify narrative statements to minimize
the potential for misinterpretation and perhaps provide some form of warning statement to
alert users to the potential for misinterpretation (Hofer & Green, 1985). At the very least,
the developer might note that the clinical interpretations offered in narrative printouts
are not to serve as the sole basis on which important clinical decisions are made
(Matarazzo, 1986). Adherence to the highest standard of the profession would also require
developers to provide rather detailed information regarding the system's development and
structure in a separate manual. Because individual users are responsible for determining the
validity of any CBTI for individual test-takers, availability of such system information is
critical. Bersoff and Hofer (1991) noted that in spite of the apparent conflict between the
developer's proprietary interest in the product and the clinician's need to responsibly
evaluate the service, open and critical review of tests and CBTIs is critical for ensuring
the quality of such materials and upholding the profession's ethical code.

4.17.5.2 Basis for Scientific and Professional Judgments

Psychologists rely on scientifically and professionally derived knowledge when making both
scientific and professional judgments. The Ethics Code (APA, 1992) also states explicitly
that "Psychologists select scoring and interpretation services (including automated
services) on the basis of evidence of the validity of the program and procedures as well as
on other appropriate considerations" (p. 1604). While psychologists are clearly compelled to
justify their professional statements and behavior with empirically derived evidence, the
current state of CBTI development makes this a difficult task indeed. The overwhelming
majority of automated interpretive programs lack even preliminary forms of established
validity (Lanyon, 1984). With this concern in mind, Matarazzo (1986) insisted that until
research establishes (even the most primitive) validity for CBTIs, "It is essential that
they be used only as tools by the clinician trained in their use and not as equivalents of,
and thus substitutes for, professional education and training" (p. 14).

Empirical validation of CBTI systems is exceptionally difficult given the exhaustive range
of potential test scores and profiles. As a result, there are no purely actuarial
interpretive programs in existence (Butcher et al., 1985; Fowler, 1985). Instead, most CBTIs
offer a form of automated clinical prediction (Graham, 1993) in which published research,
clinical hypotheses, and clinical experience on the part of an "expert" clinician are
integrated into interpretive narrative statements. Nonetheless, the validity of such
statements cannot be assumed by users of such reports and must be demonstrated every bit as
much as the validity of the test on which they are based. Psychologists who employ CBTI
reports must clearly understand the basis for the statements offered by such services, their
validity or lack thereof, and
take reasonable steps to ensure that those with whom they work are not harmed by the
irresponsible or uncritical use of such reports.

4.17.5.3 Describing the Nature of Psychological Services

In his review of the state of personality assessment for the Annual Review of Psychology,
Lanyon (1984) offered a stern indictment of the CBTI industry. He noted that available
literature regarding these programs appeared to come in three essential varieties. These
included:

(a) glossy promotional literature sometimes masquerading as scientific data and usually
accompanied by sample records, (b) studies of customer and user satisfaction, which has
never been much of a problem, and (c) an occasional paper giving actual information about
the development or validation of an automated system. (p. 690)

Lanyon was particularly distressed by the fact that gross deficits in program validity
appeared to have become the norm for CBTI systems. The primary ethical issue of concern here
has to do with the manner in which psychologists describe and/or promote computerized
assessment services. Two sections of the Ethics Code (APA, 1992) have particular relevance
here. First, when describing the nature and results of psychological services,
"Psychologists provide, using language that is reasonably understandable to the recipient of
those services, appropriate information later about results and conclusions" (p. 1600).
Second, the section on avoidance of false or deceptive statements emphasizes that
psychologists refrain from making false, deceptive, or misleading statements either by
virtue of what they convey or omit concerning their services and work activities. This
emphasis on avoiding deception also extends to descriptions of the scientific or clinical
basis for, or results or degree of success of, psychologists' services.

The implication of these standards would suggest a rather clear mandate for psychologists to
explicitly describe those CBTI services they participate in developing, promoting, or
utilizing in their clinical work. This would include descriptions of the process by which
the program was constructed, the manner in which it generates interpretive material and any
existing evidence of validity. By the same token, psychologists must be proactive in
highlighting deficits in system validity or performance such that consumers and users might
avoid harmful outcomes. Finally, psychologists should avoid misuse of their influence in the
promotion of CBTIs and other computerized assessment products. "They [Psychologists] are
alert to and guard against personal, financial, social, organizational, or political factors
that might lead to misuse of their influence" (APA, 1992, p. 1601). Endorsements by
respected psychologists in the field of assessment are often coveted and highly promoted by
product marketers. The Ethics Code clearly warns against irresponsible promotion of CBTI
products in a manner which might compromise the profession or increase misuse of such
materials by other professionals and consumers.

4.17.5.4 Competence

Establishing and maintaining an appropriate level of competence in the area of computerized
psychological assessment may be one of the most significant areas of ethical risk for
psychologists at this time. The Ethics Code (APA, 1992) makes numerous references to the
importance of competence for psychologists. The first ethical principle in the Ethics Code
relates to competence and stresses that psychologists maintain a high degree of competence
in their work and clearly articulate the boundaries of their expertise. "Psychologists
function only within boundaries of competence based on education, training and supervised
experience or appropriate professional experience" (p. 1599). Further, they provide services
in new areas of practice "only after first undertaking appropriate study, training,
supervision and/or consultation from persons competent in those areas" (p. 1600). Initially
establishing and demonstrating competence in the area of CAPA would not appear to be
adequate, however, as psychologists are additionally enjoined by the Ethics Code to maintain
their expertise as long as they practice as psychologists. "Psychologists maintain a
reasonable level of awareness of current scientific and professional information in their
fields of activity and undertake ongoing efforts to maintain competence in the skills they
use" (p. 1600).

Development of clear standards and guidelines for use of CBTIs in particular will serve to
protect consumers and the profession. They might also serve as critical guides to
psychologists, judges, and the courts in determining the standard of practice in this area
(Hofer & Green, 1985). Bersoff and Hofer (1991) noted that establishing a prevailing
"standard of care" in the area of CBTIs is critical to determining whether a test program's
user, developer, or publisher violated a prevailing standard and is therefore ethically out
of compliance or legally culpable. Examples of such noncompliance might include negligent
entry of data, selection
of a scoring system the psychologist should know is inappropriate for a client or
unreasonable reliance on interpretive material from a CBTI narrative report. Problematically,
the wide availability of CBTI and other computerized assessment services to persons of
varied professional and educational backgrounds (Fowler, 1985) has rendered development of
guidelines for competence in this area quite difficult.

The current Ethics Code more explicitly addresses the practice of psychological assessment
and offers a clearer picture of how "competence" in this area might be defined:

Those who develop, administer, score, interpret or use psychological assessment instruments
do so in a manner and for purposes that are appropriate in light of research or on evidence
of the usefulness and proper application of the techniques . . . Psychologists who perform
interventions or administer, score, interpret, or use assessment techniques are familiar
with the reliability, validation, and related standardization or outcome studies of and
proper applications and uses of the techniques they use. (APA, 1992, p. 1603)

Although broad, the Ethics Code does suggest several primary areas in which psychologists
should have reasonable expertise if they are to competently utilize computer generated
assessment material in their work with clients.

As a result of the ethical requirement for practitioners to evaluate the soundness of CBTI
reports, they must obviously be qualified to interpret the test themselves. This requires
not only basic familiarity with psychometric principles but also a rather detailed
understanding of the manner in which the particular system in question was developed and
generates interpretive material. On the basis of this understanding, the psychologist might
then be able to reject, modify, or expand reports for particular clients (Hofer & Green,
1985). Psychologists will need to be familiar with three components of the CBTI services
they utilize in order to do this effectively. These include (i) the examinee's score on the
relevant test or scale, (ii) the test scale or combination of scales on which
interpretations are based, and (iii) research or clinical evidence supporting the
interpretation.

In addition to basic competence in psychometrics and interpretation of CBTI assessment
findings for individual clients, psychologists must also demonstrate competence in the
appropriate monitoring of CBTI data. The Ethics Code requires psychologists to make
reasonable efforts to maintain the security of tests and other assessment techniques (APA,
1992). To do this effectively in the area of CAPA, psychologists must establish formal
procedures for retaining, reviewing, and releasing computerized assessment data. Tranel
(1994) noted that psychologists bear responsibility for determining whether those requesting
test data are qualified to interpret it appropriately. The concern here relates to
nonexperts drawing erroneous conclusions based on naive use of CBTI reports. It appears that
psychologists must not only avoid irresponsible use of computerized test data themselves,
but must also prevent the same on the part of others: "Psychologists do not misuse
assessment techniques, interventions, results and interpretations and take reasonable steps
to prevent others from misusing the information these techniques provide" (APA, 1992,
p. 1603).

Related to the release of test data is concern that test questions or CBTI system
information may become part of the public domain, resulting in risk of invalidation of tests
as well as potential violation of copyright laws and contractual obligations (APA, 1996;
Tranel, 1994). Because many CBTI reports include printouts of the client's raw scores as
well as those critical items endorsed, users must exercise the same approach to maintaining
test security they might employ with any other form of test data.

4.17.5.5 Professional Context

Perhaps the most alluring and potentially dangerous property of CBTI narrative reports is
their polished, professional, and thorough appearance. Psychologists, like other users of
these services, may be tempted to rely excessively on information from such computerized
narratives without adequate interaction with the individual client or reasonable
consideration of the unique circumstances in which the client presents for evaluation. The
Ethics Code (APA, 1992) rather clearly addresses this concern:

Psychologists perform evaluations, diagnostic services or interventions only within the
context of a defined professional relationship . . . Psychologists' assessments,
recommendations, reports and psychological diagnostic or evaluative statements are based on
information and techniques (including personal interviews of the individual when
appropriate) sufficient to provide appropriate substantiation for their findings. (p. 1603)

Most concur that computerized assessment narratives must be carefully reviewed for
appropriateness and "fit" with the examinee in light of research, complete information about
the examinee, and solid professional judgment (Carson, 1990; Fowler, 1985; Graham, 1993;
Hofer & Green, 1985).

Matarazzo (1990) made the case that one of the primary sources of ethical and professional
danger in this regard has been a rather subtle but progressive loss of distinction between
psychological assessment as a professional activity and mere testing.

Objective psychological testing and clinically sanctioned and licensed psychological
assessment are vastly different, even though assessment usually includes testing . . .
Psychological assessment is engaged in by a clinician and a patient in a one-to-one
relationship and has statutorily defined or implied professional responsibilities . . .
Specifically, it [assessment] is the activity of a licensed professional, an artisan
familiar with the accumulated findings of his or her young science. (p. 1000)

Similarly, Carson (1990) called for defense of "clinicianship" (p. 437) within psychological
assessment and highlighted many of the dangers inherent in considering CBTI data apart from
other primary client information and without the benefit of a clear client-professional
relationship. Those portions of the Ethics Code addressing utilization of assessment results
and computerized scoring and interpretive services are clear that psychologists retain full
responsibility for conducting competent and context appropriate assessment, versus context
blind and potentially harmful psychological testing: "Psychologists retain appropriate
responsibility for the appropriate application, interpretation and use of assessment
instruments, whether they score and interpret such tests themselves or use automated or
other services" (APA, 1992, p. 1604). A related but unresolved concern, however, is how
psychologists are to retain such responsibility when most CBTI reports are not signed by a
responsible psychologist. Matarazzo (1986) highlighted this problem:

Although it is not yet part of psychology's code of ethics, my experience leaves no question
that computerized clinical interpretations offered in a professional setting about a
person's intellectual, personality, brain-behavior and other highly personal characteristics
constitutes a legally and professionally significant invasion of privacy and requires at the
least, that the individuals offering these clinical interpretations sign their names to such
consultations just as is done in every other profession. (p. 21)

4.17.5.6 CAPA with Special Populations

Related to the foregoing concern regarding the professional context of assessment and the
psychologist's responsibility for ensuring the accuracy and appropriateness of computer
generated assessment material is concern regarding the implications of CBTIs for special
client populations. The APA Ethics Code requires sensitivity to and respect for human
differences among examinees and clients. This includes sensitivity to differences across
such domains as age, gender, race, ethnicity, national origin, religion, sexual orientation,
disability, language, and socioeconomic status. The Ethics Code specifically states that
psychologists "remain vigilant for situations in which adjustments must be made in
administration or interpretation secondary to individual or contextual factors" (APA, 1992,
p. 1603). The rather clear implication here is that computer-assisted interpretive system
results must be passed through the clinician's own interpretive grid with an eye toward
identification of demographic or contextual factors on the part of the client which might
raise concern about the validity of the findings (Bersoff & Hofer, 1991). This of course
demands that the psychologist understands, and has access to, differences in base rates for
specific demographic groups with respect to both the test and the interpretive program in
question.

As a result of the burgeoning of CBTI systems, the proliferation of unsatisfactory and
typically unvalidated systems and the widespread marketing of such systems to unqualified
users, there have been frequent calls in the scholarly and professional literature for
development of standards and regulations relative to computerized approaches to assessment
(Burke & Normand, 1987). In 1986 the APA published a set of guidelines for practice by
psychologists in this arena. Guidelines for computer-based test interpretations (APA, 1986)
offered a set of brief and general aspirational guidelines for developers and users of
computerized assessment techniques and products. Conscious of the foregoing summary of
salient ethical concerns in the field of CAPA, we will now offer a synopsis of those
guidelines with a focus on why such guidelines are significant and how they might be applied
by psychologists.

4.17.6 GUIDELINES FOR USERS OF COMPUTER-BASED TESTS AND INTERPRETATIONS

The following guidelines are intended for those professionals who use computer-based testing
and interpretive services with those to whom they provide services. Table 2 contains the APA
Guidelines for users of computer-based tests and interpretations (APA, 1986) and will serve
as an outline for the current discussion.

Table 2 Guidelines for users and developers of computer-based tests and interpretations.

Guidelines for users


Administration
1. Influences on test scores due to computer administration that are irrelevant to the purposes of assessment
should be eliminated or taken into account in the interpretation of scores.
2. Any departure from the standard equipment, conditions, or procedures, as described in the test manual or
administrative instructions, should be demonstrated not to affect test scores appreciably. Otherwise,
appropriate calibration should be undertaken and documented (see Guideline 16).
3. The environment in which the testing terminal is located should be quiet, comfortable, and free from
distractions.
4. Test items presented on the display screen should be legible and free from noticeable glare.
5. Equipment should be checked routinely and should be maintained in proper working condition. No test
should be administered on faulty equipment. All or part of the test may have to be readministered if the
equipment fails while the test is being administered.
6. Test performance should be monitored, and assistance to the test-taker should be provided, as is needed and
appropriate. If technically feasible, the proctor should be signaled automatically when irregularities occur.
7. Test-takers should be trained on proper use of the computer equipment, and procedures should be
established to eliminate any possible effect on test scores due to the test-taker's lack of familiarity with the
equipment.
8. Reasonable accommodations must be made for individuals who may be at an unfair disadvantage in a
computer testing situation. In cases where a disadvantage cannot be fully accommodated, scores obtained
must be interpreted with appropriate caution.
Interpretation
9. Computer-generated interpretive reports should be used only in conjunction with professional judgment.
The user should judge for each test-taker the validity of the computerized test report based on the user's
professional knowledge of the total context of testing and the test-taker's performance and characteristics.
Guidelines for developers
Human factors
10. Computerized administration normally should provide test-takers with at least the same degree of feedback
and editorial control regarding their responses that they would experience in traditional testing formats.
11. Test-takers should be clearly informed of all performance factors that are relevant to the test result.
12. The computer testing system should present the test and record responses without causing unnecessary
frustration or handicapping the performance of test-takers.
13. The computer testing system should be designed for easy maintenance and system verification.
14. The equipment, procedures, and conditions under which the normative, reliability, and validity data were
obtained for the computer test should be described clearly enough to permit replication of these conditions.
15. Appropriate procedures must be established by computerized testing services to ensure the confidentiality
of the information and the privacy of the test-taker.
Psychometric properties
16. When interpreting scores from the computerized versions of conventional tests, the equivalence of scores
from computerized versions should be established and documented before using norms or cutting scores
obtained from conventional tests. Scores from conventional and computer administrations may be
considered equivalent when the rank orders of scores of individuals tested in alternative modes closely
approximate each other, and the means, dispersions, and shapes of the score distributions are
approximately the same, or have been made approximately the same by rescaling the scores from the
computer mode.
17. The validity of the computer version of a test should be established by those developing the test.
18. Test services should alert test users to the potential problems of nonequivalence when scores on one version
of a test are not equivalent to the scores on the version for which norms are provided.
19. The test developer should report comparison studies of computerized and conventional testing to establish
the relative reliability of computerized administration.
20. The accuracy of computerized scoring and interpretation cannot be assumed. Providers of computerized
test services should actively check and control the quality of the hardware and software, including the
scoring, algorithms, and other procedures described in the manual.
21. Computer testing services should provide a manual reporting the rationale and evidence in support of
computer-based interpretation of test scores.

Classification
22. The classification system used to develop interpretive reports should be sufficiently consistent for its
intended purpose (see Chapter 2 of the 1985 Testing Standards). For example, in some cases it is important
that most test-takers would be placed in the same groups if retested (assuming the behavior in question did
not change).
23. Information should be provided to the users of computerized interpretation services concerning the
consistency of classifications, including, for example, the number of classifications and the interpretive
significance of changes from one classification to adjacent ones.
Validity of computer interpretations
24. The original scores used in developing interpretive statements should be given to test users. The matrix of
original responses should be provided or should be available to test users on request, with appropriate
considerations for test security and the privacy of test-takers.
25. The manual or, in some cases, interpretive report, should describe how the interpretive statements are
derived from the original scores.
26. Interpretive reports should include information about the consistency of interpretations and warnings
related to common errors of interpretation.
27. The extent to which statements in an interpretive report are based on quantitative research vs. expert clinical
opinion should be delineated.
28. When statements in an interpretive report are based on expert clinical opinion, users should be provided
with information that will allow them to weigh the credibility of such opinion.
29. When predictions of particular outcomes or specific recommendations are based on quantitative research,
information should be provided showing the empirical relationship between the classification and the
probability of criterion behavior in the validation group.
30. Computer testing services should ensure that reports for either users or test-takers are comprehensible and
properly delimit the bounds within which accurate conclusions can be drawn by considering variables such
as age or sex that moderate interpretations.
Review
31. Adequate information about the system and reasonable access to the system for evaluating responses
should be provided to qualified professionals engaged in a scholarly review of the interpretive service. When
it is deemed necessary to provide trade secrets, a written agreement of nondisclosure should be made.
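
Guideline 16 in Table 2 describes score equivalence in operational terms: similar rank
orders, means, and dispersions across administration modes. As a minimal illustrative sketch
(it is not part of the APA guidelines), the Python functions below compute these three
summaries for a group of examinees tested in both modes. The function names and the cut-off
values (min_rho, max_mean_diff, sd_ratio_range) are hypothetical assumptions chosen for
illustration only; the guidelines specify no numeric thresholds, and the shape of the score
distributions, which Guideline 16 also mentions, is not examined here.

```python
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = (sum((a - mx) ** 2 for a in x)
                   * sum((b - my) ** 2 for b in y)) ** 0.5
    return numerator / denominator


def equivalence_summary(conventional, computer):
    """Summarize paired scores from the same examinees in both modes."""
    return {
        # Spearman rank correlation = Pearson correlation of the ranks.
        "rank_correlation": pearson(ranks(conventional), ranks(computer)),
        "mean_difference": mean(computer) - mean(conventional),
        "sd_ratio": stdev(computer) / stdev(conventional),
    }


def looks_equivalent(summary, min_rho=0.90, max_mean_diff=2.0,
                     sd_ratio_range=(0.8, 1.25)):
    """Apply hypothetical, illustration-only cut-offs to the summary."""
    low, high = sd_ratio_range
    return (summary["rank_correlation"] >= min_rho
            and abs(summary["mean_difference"]) <= max_mean_diff
            and low <= summary["sd_ratio"] <= high)
```

A user might call equivalence_summary(conventional_scores, computer_scores) on paired scores
from a local sample and treat the result as one piece of evidence to weigh alongside the
developer's published comparison studies, not as a substitute for them.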

4.17.6.1 Administration

Guidelines 1-8 clearly address the responsibility of the computerized system's user in
ensuring that examinees are not adversely affected by the computerized administration
itself. The environment must be conducive to comfort and maximal performance on the part of
the examinee and extraneous influences on his or her performance should be minimized. While
computerized administration generally appears to enhance the ease of test taking while
decreasing overall time required (Green, 1991b), such programs should routinely assess for
evidence that the examinee understands tests and procedures. If an examinee demonstrates any
reservation or apprehension regarding interacting with a computer, a conventional format
version of the test should be substituted when possible. In the future, a good deal more
research is needed to better understand the impact of computers on examinee experience of
the assessment process, satisfaction with the experience, and response to the results
generated (Finn & Butcher, 1991).

4.17.6.2 Evaluation and Selection of CBTIs

The potential for profound misuse of testing material exists when individuals without
adequate training are granted access to automated interpretive systems (Butcher, 1987). When
the CBTI user is not a psychologist or lacks the background to effectively and reliably
interpret tests, danger exists that CBTIs will not be appropriately scrutinized and
carefully selected for the client in question. Moreland, Eyde, Robertson, Primoff, and Most
(1995) reported on an attempt to establish basic test user qualifications. The authors found
that the 86 identified testing competencies could be distilled to 12 "minimum competencies"
and that these could be further reduced to two major categories of competence. Test-users
must (i) possess adequate knowledge of the test (or test scoring and interpretation program)
and its limitations, and (ii) accept responsibility for the competent use of the test (or
computerized test program). Competent use of CBTIs requires that the psychologist carefully
evaluate and select appropriate and reasonably validated interpretive programs.

Although well-designed interpretive programs which utilize a careful compilation of empirical data and
expert opinion often yield more valid interpretive reports than those generated by clinicians (Green,
1991b), users must become familiar with and conversant regarding the program's established validity.
Nonetheless, determining acceptable levels of validity for automated reports poses two problems. First,
many tests themselves lack adequate development research; second, most interpretive programs have
little or no established validity of their own (Butcher, 1987). Therefore, it is critical for the
prospective user to review the test manual, related documentation, and examples of the computerized
reports prior to utilizing them with clients. It is particularly important to evaluate the system
rationale for interpreting scores and profiles. How well are the classification and decisional criteria
operationalized? Potential users are also encouraged to evaluate the credentials of the system's authors
(Lanyon, 1987) and consider the authors' standing as both expert clinicians and scholars in the field of
CBTI.

Another basis on which to evaluate the potential value of a CBTI is that of general utility. As a rule,
the rationale or justification for conducting an assessment hinges on provision of information of value
with respect to planning and executing treatment (Hayes, Nelson, & Jarrett, 1987). A CBTI will have
utility to the extent there is evidence that it contributes to positive treatment outcome, or in some
way enhances the status of individuals for whom it is employed. Those systems high in utility will
generally be highly efficient and usable, time- and cost-effective, and able to discriminate effectively
such that real between-examinee differences are detected (Krug, 1987).

Finally, Ben-Porath and Butcher (1986) offered several questions which might be utilized by prospective
users of CBTI systems in determining which system to employ. We believe these questions offer a good
summary of the major concerns expressed in these guidelines. They include (i) to what extent has the
validity of the reports been established? (ii) to what extent do the reports rely on empirical findings
in generating interpretations? (iii) to what extent do the reports incorporate all of the currently
available validated descriptive information? (iv) do the reports take demographic variables into
account? (v) are different versions of the report available for different referral questions? (vi) do
the reports include practical suggestions? and (vii) are the reports periodically revised to reflect
newly acquired information?

4.17.6.3 Interpretation

The APA guidelines (APA, 1986) are clear in warning that CBTIs should never be used as a singular
indicator of a person's characteristics, psychological functioning, or diagnosis. Rather, professional
judgment is required to determine the extent to which the computerized report is valid and appropriate
in light of the total context of testing and the test-taker's specific performance and characteristics.
Many CBTI program developers and users express similar concern about the danger inherent in divorcing
the skilled psychologist from the interpretation and use of computerized reports (Bersoff & Hofer, 1991;
Butcher, 1987; Finn & Butcher, 1991; Groth-Marnat & Schumaker, 1989). The polished and objective
appearance of narrative reports may add to the temptation to accept them as valid without adequate
scrutiny of their match with the examinee. Bersoff and Hofer (1991) noted:

There must be an interposition of human judgment between the CPTI report and decision making to ensure
that decisions are made with full sensitivity to the nuances of test administration and interpretation,
and that the unique constellation of attributes in each person is evaluated. (p. 241)

Responsible assessment will necessarily involve a multistep process (Finn & Butcher, 1991) in which
computerized assessment results are skillfully integrated with other sources of information about the
examinee. Currently, computers are not capable of offering a sophisticated synthesis of psychological
test results. This integrative function appears to be squarely within the purview and professional
responsibility of the psychological diagnostician. Without such integration, psychological reports are
necessarily overly general, nonspecific, and questionably accurate. They would certainly fail to capture
the test-taker's cognitive, affective, and behavioral functioning across a variety of situations
(Bersoff & Hofer, 1991).

The prevailing standard ethically and legally appears to be that the computer-based report is a
professional-to-professional consultation (Butcher, 1987). In this way, the computer is merely the
equivalent of a library or consultant which might offer the most frequently indicated inferences or
correlates for specific test scores or profiles. With the computer as librarian or "look up table"
(Butcher, 1987, p. 9), the final determiner of the accuracy and adequacy of the computer-based report is
the psychologist who receives the report.

An excellent example of adherence to this guideline is offered by Finn & Butcher (1991)

who quote the disclaimer from the Clinical Interpretive Report for the MMPI. This disclaimer is printed
on each interpretive report and highlights for users how the report should be utilized: "This MMPI
interpretation can serve as a useful source of hypotheses about clients . . . the personality
descriptions, inferences, and recommendations contained herein need to be verified by other sources of
clinical information since individual clients may not fully match the prototype" (p. 367).

4.17.7 GUIDELINES FOR DEVELOPERS OF COMPUTER-BASED TEST SERVICES

The following guidelines apply most directly to those involved in the construction, validation, and
marketing of computerized assessment systems for use by others. Table 2 contains the guidelines for
developers of computer-based test services (APA, 1986).

4.17.7.1 Human Factors

Guidelines 10–15 suggest developers of CAPA systems are broadly responsible for ensuring that
computerized administration of tests does not hamper the examinees' performance in any way. Examinees
must also maintain reasonable control of the testing process, and the testing service must assume full
responsibility for establishing appropriate procedures for ensuring the confidentiality of data
collected and the privacy of the examinee. This may be particularly challenging in light of the
development of on-line test scoring and interpretation services. On-line access to client information
must be carefully controlled by the testing service.

4.17.7.2 Psychometric Properties

Guidelines 16–21 in Table 2 require test or CBTI program developers to carefully evaluate and
communicate the psychometric quality of the test or computerized interpretive system. When developing
computerized versions of conventional tests, norms and cutting scores for the conventional test can only
be used if the computer and conventional forms are found to be equivalent (Green, 1991b). Not only
should correlations between the two versions be high, but score distributions must be generally
equivalent as well. Developers are expected to communicate results of equivalency studies to prospective
users and to take initiative in "alerting" users to potential problems resulting from nonequivalence
between forms.

4.17.7.3 Classification Strategy

When a computerized assessment system utilizes a classification system based on cutting scores, system
developers must offer a convincing rationale for the particular system adopted and the cutting scores
selected. In order for a CBTI system to be fully actuarial, system output must be determined solely by
statistical regularities that have been empirically demonstrated to exist between input and output data
(Moreland, 1985). Instead, most systems combine actuarial and clinical expertise approaches. Users must
be aware of the system strategy for integrating statistical and clinical prediction in the service of
classifying examinees. Additionally, developers bear responsibility for communicating to users the
consistency of the classification system and the meaning associated with changes between categories.

4.17.7.4 Validity of Computer Interpretations

As indicated by Guidelines 24–30, developers of computer-based test services are responsible for
communicating information to users concerning the system's validity. Not only should validity data be
made available to users, but developers must also clarify the extent to which interpretive statements
are based on expert clinical opinion or quantitative research. Users must be shown the connection
between such research or clinical opinion and specific interpretive statements and classification
decisions. Further, developers are to warn users of common errors and potential pitfalls associated with
the interpretive system. One of the reasons for the pervasive problem with establishing the validity of
CBTI systems has to do with the practice of utilizing clinician-generated reports as the criterion in
validity research (Moreland, 1985). At times, computer reports may be more accurate than their
clinician-generated counterparts, thus falsely lowering validity coefficients. Developers should
consider alternative criterion variables when possible.

Lanyon (1987) described six factors which should be considered by CBTI system developers in the service
of increasing the validity of CBTI programs. First, reliability coefficients for both predictor and
criterion variables have been unacceptably low for most CBTIs and should be increased. Second, in
balancing the "bandwidth-fidelity" tension, CBTI developers should develop test indices that lead to
single, focused predictions (narrow bandwidth/high fidelity) versus trying to say too much from too
little data. Third, departures from empirical data should be minimized. When test interpretations are
based on data without attempts

to polish, cluster, or otherwise alter them for the sake of appearance, errors in interpretation are
minimized. Fourth, unwarranted generalizations from the standardization sample of CBTIs are a major
source of invalidity. When the gap between the population from which an interpretive system was derived
and the population on which it is used is substantial, validity declines and erroneous interpretations
abound. Fifth, Lanyon recommends that an "unclassifiable" option be vigorously employed. Adding such a
category (versus forcing predictive or interpretive statements for every profile) substantially enhances
system validity. Finally, whenever possible, different base rates for characteristics being predicted or
described should be employed in the interpretive system.

4.17.7.5 Facilitation of Review

The APA guidelines (APA, 1986) highlight the requirement for CBTI system developers to provide adequate
information about the system as well as reasonable access to this information on the part of qualified
professionals engaged in reviewing the interpretive system. Previous reviewers of this topic appear to
concur that availability of detailed development information and data is critical to responsible
evaluation of CBTI systems and their usefulness to clinicians (Green, 1991b; Lanyon, 1987; Roid &
Gorsuch, 1984; Snyder et al., 1990). High quality interpretive programs clearly label the program using
standardized descriptions which clarify the function of the system and the specific manner in which it
generates interpretive statements. Such programs provide detailed data relative to development of the
system and extensive references to the empirical basis for the decision rules used.

4.17.8 DIRECTIONS FOR THE FUTURE

Various authors have speculated about the future role and developmental course of CAPA. These have
included highly optimistic outlooks such as that of Moreland (1985), "I am confident that the computer
will eventually replace the professional for most, but not all, assessment functions . . . and this may
happen sooner rather than later" (p. 229), as well as more pessimistic perspectives based on fears about
the dehumanization of the assessment process (Matarazzo, 1986).

A primary problem in the development of CAPA is the substantial lag in "psychotechnology" currently
evident in the assessment field. Specifically, our understanding of assessment appears to lag behind
available computer technology. Along these lines, there is a rather profound need for better research
designs and demonstrations of program validity (Lanyon, 1984) among CBTI systems. Currently, a number of
such programs offer little in the way of adherence to APA (1986) guidelines for development of CBTI
software.

Hopefully, the future holds more promise for empirical research on the reliability and validity of CBTI
systems. As more powerful desktop (and laptop) computers proliferate, increasingly complex statistical
analyses are possible for test developers who have the expertise to use them. No longer is computer
power located in a small number of central or "mainframe" facilities. Greater resources and more
creativity in research designs for the study of CBTI are certainly needed. Creative methods of linking
test and nontest data, in cost-effective formats, will also be crucial to the development and long-term
refinement of CBTI. Test developers should study the classic examples given by Lachar (1984) and the
methods reviewed by Snyder et al. (1990) and find ways to connect clinicians' ratings, medical and
treatment histories, and demographic data to the test-score data file. An important area of expansion in
neuropsychological assessment will be the connection of imaging technology (e.g., magnetic resonance
imaging [MRI] results) to psychological and cognitive test scores.

Alternatively, the field of CBTI should shift more toward a model of computer as "research assistant"
(Roid, 1985), and provide methods for the clinician to access research findings, empirical correlates,
and, perhaps, brief and verifiable narrative descriptions such as symptom lists from clinician
checklists. However, in the "assistant" model, the final selection and control of narrative statements
would be retained as a function of the clinician, not the computer program. As mentioned previously, the
word-processing capabilities of systems such as WISC-III Writer (Psychological Corporation, 1994) should
make correlated empirical findings available in a "scrapbook" without printing them automatically in a
report. Such systems could become even more elaborate in the future, employing extensive database
functions and multimedia graphics to supplement the conventional test-score results. Much could be done
in the future to have even more demographic and historical data available for each client, as long as
privacy is protected. The control of the assembly, selection, and composition of the final collection
of, for example, data, statements, and graphs, should remain in the hands of the experienced clinician.

Another area of future development will surely be in the expansion of the multimedia

capability of CAPA programs. All types of CAPA, from administration to scoring to interpretation, could
employ more sophisticated graphic and, perhaps, video capability. Gregory (1996) reports on the
development of multimedia "situational" tests being developed at IBM that depict actual on-the-job
scenes. The computer may briefly "pause" a video display and ask the examinee to answer questions, or
predict the "next step" in the scenario, for purposes of assessing the examinee's sensitivity to
interpersonal or technical concerns on the job. For example, the senior author of this chapter assisted
in a research project to develop qualification tests for parole officers who were being screened for
ability to perform "combative arrests." Film and audio depictions of scenarios with real parole officers
were studied to identify critical behaviors such as advanced planning of back-up assistance, prediction
of the parolee's potential route in the event of an escape attempt, and physical strategy for placing
handcuffs on the subject. Extensive studies of physical movements involved in arrest scenarios were
conducted to determine hand and arm strengths required. However, it became clear that many skills were
"cognitive" rather than physical and included skills such as planning, prediction, and knowledge of
typical parolee strategies of escaping arrest. Such an evaluation might be depicted on computerized
video collections, shown to potential applicants, and used to present questions of strategy and
planning. Clearly, extensive empirical validation with actual parole officers, including those who were
previously judged to be "expert" in combative arrest, would be essential to the development of such a
system. The key element of this "futuristic" system would be the video reenactments, presented on
multimedia computer equipment, a method of simulation that has been available on motion-picture film for
decades. In the future, more precise computer-adaptive testing may be possible for this medium.

One of the more promising areas for future development is that of adaptive testing. By administering
only those test items necessary to draw supportable clinical conclusions, psychologists should be able
to substantially reduce testing time while enhancing both the reliability and validity of findings
(Krug, 1987). Computer-generated and tailor-made personality tests are a particularly interesting
possibility.

Finally, the debate regarding the role of computers in the assessment process will certainly continue in
the future around the issue of the role of the clinical psychologist. Butcher (1987) noted:

As interest in developing integrated clinical diagnostic reports broadens, more research or system
adequacy will be stimulated, and, no doubt, more intense dialogue will be generated on the
appropriateness of machines to perform what is believed by some to be an essentially human activity.
(p. 11)

4.17.9 CONCLUSIONS AND RECOMMENDATIONS

In summary, the best recommendation for clinical practice, given the current state of validation of
narrative computer-interpretive programs, is for clinicians to "draw a line" between scoring programs
and narrative-interpretive programs. Computer administration of tests and scoring (including elaborate
statistical scoring and graphical display) can clearly assist the clinician in terms of efficient use of
time and accuracy of calculation. Unless the following conditions are satisfied, it is recommended that
CBTI programs with extensive narrative reports be used only for hypothesis generation and never as an
unedited section of a psychological report:

(i) Clinician retains control of word/sentence selection. If the narrative program gives control to the
clinician for assembly of narrative descriptions, then the unique situation of the client can be used to
temper the assessment narrative. Automatically printed statements should never be used unedited. Thus,
CBTI programs should be "research assistants," not "automated interpreters."

(ii) Test developers apply actuarial research and document it. The technical manual accompanying CBTI
programs should clearly describe the details of the empirical validation of all descriptive statements,
list all cutting scores and their validation studies, present classification accuracy statistics for all
profile rules employed, and provide cautionary statements of study limitations.

(iii) Researchers of CBTI should develop more cost-effective research designs. Given that the cost of
validation is a frequent excuse given for lack of empirical study, statisticians and research experts
are encouraged to creatively design new types of studies in cooperation with CBTI developers so the
field of CAPA can advance on a more scientific basis where possible.

(iv) Psychologists should be vigilant in distinguishing assessment from testing. In the interests of
preventing the erosion of the meaning of clinical assessment, as emphasized by Matarazzo (1986, 1990),
all psychologists should be alerted to the concern that the final evaluation of all relevant client
information, including the context, history, and uniqueness of the client, be reserved for the
experienced, trained clinician, not the computer.
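
Recommendation (ii) calls for classification accuracy statistics for cutting-score rules, and Lanyon's (1987) factors include an "unclassifiable" option and attention to base rates. The sketch below illustrates how these ideas might fit together in Python. It is an illustration under stated assumptions, not a procedure taken from the chapter or from any published CBTI system: the function names, the two cutting scores, and the example scores are all hypothetical.

```python
def classify(score, low_cut, high_cut):
    """Apply two cutting scores; scores between them are left unclassified."""
    if score >= high_cut:
        return "positive"
    if score <= low_cut:
        return "negative"
    return "unclassifiable"


def accuracy_stats(scores, criterion, low_cut, high_cut):
    """Classification accuracy among classified cases, plus the share unclassified."""
    tp = fp = tn = fn = unclassified = 0
    for score, has_criterion in zip(scores, criterion):
        label = classify(score, low_cut, high_cut)
        if label == "unclassifiable":
            unclassified += 1
        elif label == "positive":
            if has_criterion:
                tp += 1
            else:
                fp += 1
        else:
            if has_criterion:
                fn += 1
            else:
                tn += 1
    classified = tp + fp + tn + fn
    return {
        # Sensitivity and specificity are computed over classified cases only.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "hit_rate": (tp + tn) / classified if classified else None,
        "proportion_unclassified": unclassified / len(scores),
    }


def positive_predictive_value(sensitivity, specificity, base_rate):
    """Bayes' rule: probability the criterion is present given a positive label."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical validation data: scale scores and observed criterion status.
scores = [72, 68, 55, 61, 45, 70, 58, 40, 66, 50]
criterion = [True, True, False, True, False, False, True, False, True, False]

stats = accuracy_stats(scores, criterion, low_cut=55, high_cut=65)
print(stats)
print(positive_predictive_value(0.8, 0.8, base_rate=0.5))
print(positive_predictive_value(0.8, 0.8, base_rate=0.1))
```

In this made-up example, two of the ten profiles fall between the cutting scores and receive no interpretive statement. The last two lines show why base rates matter: with sensitivity and specificity both at .80, a positive classification is correct about 80% of the time when the base rate is .50, but only about 31% of the time when the base rate is .10.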

4.17.10 REFERENCES

Adams, K. M., & Shore, D. L. (1976). The accuracy of an automated MMPI interpretation system in a psychiatric setting. Journal of Clinical Psychology, 32, 80–82.
American Psychological Association. (1985). Standards for educational and psychological testing. Washington, DC: Author.
American Psychological Association. (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597–1611.
American Psychological Association. (1996). Statement on the disclosure of test data. American Psychologist, 51, 644–648.
Andrews, L. W., & Gutkin, T. B. (1991). The effects of human versus computer authorship on consumers' perceptions of psychological reports. Computers in Human Behavior, 7, 311–317.
Angoff, W. H., & Huddleston, E. M. (1958). The multi-level experiment: A study of two-stage system for the College Board SAT (Statistical Report No. 58–21). Princeton, NJ: Educational Testing Service.
Assessment Systems (1990). MicroCAT testing system manual. Minneapolis, MN: Author.
Barclay, J. R. (1983). Barclay classroom assessment system manual. Los Angeles: Western Psychological Services.
Bayroff, A. G., & Seeley, L. C. (1967, June). An exploratory study of branching tests. Technical Research Note 188, US Army Behavioral Science Research Laboratory.
Ben-Porath, Y. S., & Butcher, J. N. (1986). Computers in personality assessment: A brief past, an ebullient present, and an expanding future. Computers in Human Behavior, 2, 167–182.
Bersoff, D. N., & Hofer, P. J. (1991). Legal issues in computerized psychological testing. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 225–244). Hillsdale, NJ: Erlbaum.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison Wesley.
Bloom, B. L. (1992). Computer-assisted psychological intervention: A review and commentary. Clinical Psychology Review, 12, 169–197.
Burke, M. J., & Normand, J. (1987). Computerized psychological testing: Overview and critique. Professional Psychology: Research and Practice, 18, 42–51.
Butcher, J. N. (1985). Introduction to the special series. Journal of Consulting and Clinical Psychology, 53, 746–747.
Butcher, J. N. (1987). The use of computers in psychological assessment: An overview of practices and issues. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner's guide (pp. 3–14). New York: Basic Books.
Butcher, J. N., Keller, L. S., & Bacon, S. F. (1985). Current developments and future directions in computerized personality assessment. Journal of Consulting and Clinical Psychology, 53, 803–815.
Carson, R. C. (1990). Assessment: What role the assessor? Journal of Personality Assessment, 54, 435–445.
Conners, K. (1995). Conners' Continuous Performance Test computer program. North Tonawanda, NY: Multi Health Systems.
Cowden, D. J. (1946). An application of sequential sampling to testing students. Journal of the American Statistical Association, 41, 547–556.
Dahlstrom, W. G. (1993). Tests: Small samples, large consequences. American Psychologist, 48, 393–399.
Educational Testing Service (1995). Learning for tomorrow: 1995 Annual Report. Princeton, NJ: Author.
Finn, S. E., & Butcher, J. N. (1991). Clinical objective personality assessment. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The Clinical Psychology Handbook (pp. 362–373). New York: Pergamon.
Flanagan, J. C., & Lindquist, E. F. (Eds.) (1951). Educational measurement. Washington, DC: American Council on Education.
Fowler, R. D. (1985). Landmarks in computer-assisted psychological assessment. Journal of Consulting and Clinical Psychology, 53, 748–759.
Fowler, R. D., & Butcher, J. N. (1986). Critique of Matarazzo's view on computerized psychological testing. American Psychologist, 41, 94–96.
Garb, H. N. (1994). Judgment research: Implications for clinical practice and testimony in court. Applied and Preventive Psychology, 3, 173–183.
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgments. American Psychologist, 23, 483–496.
Graham, J. R. (1993). MMPI-2: Assessing personality and psychopathology. New York: Oxford University Press.
Green, B. F. (1991a). Computer based adaptive testing in 1991. Psychology & Marketing, 8(4), 243–257.
Green, B. F. (1991b). Guidelines for computer testing. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 245–274). Hillsdale, NJ: Erlbaum.
Gregory, R. J. (1996). Psychological testing: History, principles, and applications. Boston: Allyn and Bacon.
Groth-Marnat, G., & Schumaker, J. (1989). Computer-based psychological testing: Issues and guidelines. American Journal of Orthopsychiatry, 59, 257–263.
Guastello, S. J., & Rieke, M. L. (1994). Computer-based test interpretations as expert systems. Computers in Human Behavior, 10, 435–455.
Guastello, S. J., Guastello, D. D., & Craft, L. L. (1989). Assessment of the Barnum effect in computer-based test interpretations. The Journal of Psychology, 123, 477–484.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment. American Psychologist, 42, 963–974.
Hofer, P. J., & Green, B. F. (1985). The challenge of competence and creativity in computerized psychological testing. Journal of Consulting and Clinical Psychology, 53, 826–838.
Huba, G. J. (1986). Interval banded profile analysis: A method for matching score profiles to "soft" prototypic patterns. Educational and Psychological Measurement, 46, 565–570.
Jackson, D. N. (1985). Computer-based personality testing. Computers in Human Behavior, 1, 255–264.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Kaufman, A. S. (1996). Wechsler integrated interpretive system. Odessa, FL: Psychological Assessment Resources.
Klinger, D. E., Miller, D., Johnson, J., & Williams, T. (1977). Process evaluation of an on-line computer-assisted unit for intake assessment of mental health patients. Behavior Research Methods and Instrumentation, 9, 110–116.
Kleinmuntz, B. (1969). Personality test interpretation by computer and clinician. In J. N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 97–104). New York: McGraw-Hill.
Kleinmuntz, B. (1975). The computer as clinician. American Psychologist, 30, 379–387.
Krug, S. E. (1984). Psychware: A reference guide to computer-based products. Kansas City, MO: Test Corporation of America.
Krug, S. E. (1987). Microtrends: An orientation to

computerized assessment. In J. N. Butcher (Ed.), quantifiers. Journal of Experimental Education, 49,


Computerized psychological assessment: A practitioner's 235–240.
guide (pp. 15–25). New York: Basic Books. Powell, D., Kaplan, E., Whitla, D., Weintraub, S., Catlin,
Lachar, D. (1984). WPS Test Report for the Personality R., & Funkenstein, H. (1993). MicroCog: Assessment of
Inventory for Children. Los Angeles: Western Psycholo- cognitive functioning. San Antonio, TX: Psychological
gical Services. Corporation.
Lanyon, R. I. (1984). Personality assessment. Annual Prince, R. J., & Guastello, S. J. (1990). The Barnum effect
Review of Psychology, 35, 667–701. in a computerized Rorschach interpretation system. The
Lanyon, R. I. (1987). The validity of computer-based Journal of Psychology, 124, 217–222.
personality assessment products: Recommendations Psychological Corporation (1987). Differential aptitude
for the future. Computers in Human Behavior, 3, tests computerized adaptive edition manual. San Antonio,
225–238. TX: Author.
Lichtenstein, S., & Newman, J. R. (1967). Empirical scaling Psychological Corporation (1992). Wechsler scoring assis-
of common verbal phrases associated with numerical tant manual. San Antonio, TX: Author.
probabilities. Psychonomic Science, 9, 563–564. Psychological Corporation (1994). WISC-III Writer man-
Linn, R. L., Rock, D., & Cleary, A. (1969). The ual. San Antonio, TX: Author.
development and evaluation of several programmed Rasch, G. (1980). Some probability models for aptitude and
testing methods. Educational and Psychological Measure- attainment tests. Chicago: University of Chicago Press.
ment, 29, 129–146. Roid, G. H. (1969). Branching methods for constructing
Lord, F. M. (1968). Some test theory for tailored testing. psychological test scales. Unpublished doctoral disserta-
Research Bulletin RB-68–38. Princeton, NJ: Educational tion, University of Oregon.
Testing Service. Roid, G. H. (1985). Computer-based test interpretation:
Lord, F. M. (1971). The self-scoring flexilevel test. Journal The potential of quantitative methods of test interpreta-
of Educational Measurement, 8, 147–151. tion. Computers in Human Behavior, 1, 207–219.
Marks, P. A., & Seeman, W. (1963). The actuarial Roid, G. H. (1986). Computer technology in testing. In B.
description of abnormal personality. Baltimore: Wil- S. Plake & J. C. Witt (Eds.), The future of testing
liams and Wilkins. (pp. 29–69). Hillsdale, NJ: Erlbaum.
Matarazzo, J. D. (1983). Computerized psychological Roid, G. H., & Fitts, W. H. (1988). Tennessee Self Concept
testing (Editorial). Science, 221, 323. Scale revised manual. Los Angeles: Western Psychologi-
Matarazzo, J. D. (1986). Computerized clinical psycholo- cal Services.
gical test interpretations: Unvalidated plus all mean and Roid, G. H., & Gorsuch, R. L. (1984). Development and
no sigma. American Psychologist, 41, 14–24. clinical use of test-interpretive programs on microcom-
Matarazzo, J. D. (1990). Psychological assessment versus puters. In M. D. Schwartz (Ed.), Using computers in
psychological testing. American Psychologist, 45, clinical practice (pp. 141–149). New York: Haworth
999–1017. Press.
McBride, J. R. (1988, August). A computerized adaptive Roper, B. L., Ben-Porath, Y. S., & Butcher, J. N. (1991).
version of the Differential Aptitude Test. Paper presented Comparability of computerized adaptive and conven-
at the annual meeting of the American Psychological tional testing with the MMPI-2. Journal of Personality
Association, Atlanta. Assessment, 57, 278–290.
Meehl, P. E. (1954). Clinical vs. statistical prediction. Sands, W. A., & Gade, P. A. (1983). An application of
Minneapolis, MN: University of Minnesota Press. computerized adaptive testing in Army recruiting.
Meehl, P. E. (1956). Wanted: A good cookbook. American Journal of Computer-Based Instruction, 10, 37–89.
Psychologist, 11, 263–272. Silverstein, A. B. (1981). Reliability and abnormality of test
Mitchell, J. V., & Kramer, J. J. (1985). Computer-based score differences. Journal of Clinical Psychology, 37,
assessment and the public interest: An examination of 392–394.
the issues and introduction to the special issue. Snyder, D. (1981). Manual for the marital satisfaction
Computers in Human Behavior, 1, 203–305. inventory. Los Angeles: Western Psychological Services.
Moreland, K. L. (1985). Computer-assisted psychological Snyder, D., Widiger, T., & Hoover, D. (1990). Methodo-
assessment in 1986: A practical guide. Computers in logical considerations in validating computer-based test
Human Behavior, 1, 221–233. interpretations: Controlling for response bias. Psycholo-
Moreland, K. L. (1992). Computer-assisted psychological gical Assessment, 2, 470–477.
assessment. In M. Zeidner and R. Most (Eds.) Psycho- Sympson, J. B., Weiss, D. J., & Ree, M. J. (1982).
logical testing: An inside view. Palo Alto, CA: Consulting Predictive validity of conventional and adaptive tests in
Psychologists Press. an Air Force training environment (AFHRL TR 81±40).
Moreland, K. L., Eyde, L. D., Robertson, G. J., Primoff, Brook Air Force Base, TX: Manpower and Personnel
E. S., & Most, R. B. (1995). Assessment of test user Division, Air Force Human Relations Laboratory.
qualifications. American Psychologist, 50, 14±23. Tranel, D. (1994). The release of psychological data to non
O'Dell, J. W. (1972). P. T. Barnum explores the computer. experts: Ethical and legal considerations. Professional
Journal of Consulting and Clinical Psychology, 38, Psychology: Research and Practice, 25, 33±38.
270±273. Wainer, H. (Ed.) (1990). Computerized adaptive testing: A
Patterson, J. J. (1962). An evaluation of the sequential primer. Hillsdale, NJ: Erlbaum.
method of psychological testing. Unpublished doctoral Wald, A. (1947). Sequential analysis. New York: Wiley.
dissertation, Michigan State University. Weiss, D. J. (Ed.) (1983). New horizons in testing. New
Piotrowski, C., & Keller, J. W. (1989). Use of assessment in York: Academic Press.
mental health clinics and services. Psychological Reports, Wise, S. L., & Plake, B. S. (1990). Computer-based testing
64, 1298. in higher education. Measurement and evaluation in
Pohl, N. F. (1981). Scale considerations in using vague counseling and development, 23, 3±9.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.18
Therapeutic Assessment: Linking
Assessment and Treatment
MARK E. MARUISH
Strategic Advantage, Minneapolis, MN, USA

4.18.1 INTRODUCTION
4.18.2 THE CURRENT PRACTICE OF PSYCHOLOGICAL ASSESSMENT IN THE THERAPEUTIC ENVIRONMENT
4.18.3 PSYCHOLOGICAL ASSESSMENT AS A THERAPEUTIC ADJUNCT
4.18.3.1 Psychological Assessment for Clinical Decision-making
4.18.3.2 Psychological Assessment as a Treatment Technique
4.18.3.3 Psychological Assessment for Outcomes Assessment
4.18.4 GENERAL CONSIDERATIONS FOR THE SELECTION AND USE OF PSYCHOLOGICAL TEST INSTRUMENTATION
4.18.4.1 Types of Instrumentation for Therapeutic Assessment
4.18.4.1.1 Psychological/psychiatric symptom measures
4.18.4.1.2 Measures of general health status and role functioning
4.18.4.1.3 Quality of life measures
4.18.4.1.4 Service satisfaction measures
4.18.4.2 Guidelines for Instrument Selection
4.18.4.2.1 National Institute of Mental Health criteria
4.18.4.2.2 Other criteria and considerations
4.18.5 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR SCREENING
4.18.5.1 Research-based Use of Psychological Screeners
4.18.5.2 Implementation of Screeners into the Daily Work Flow of Service Delivery
4.18.6 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR TREATMENT PLANNING
4.18.6.1 Assumptions About Treatment Planning
4.18.6.2 The Benefits of Psychological Assessment for Treatment Planning
4.18.6.2.1 Problem identification
4.18.6.2.2 Problem clarification
4.18.6.2.3 Identification of important patient characteristics
4.18.6.2.4 Monitoring of progress along the path of expected improvement
4.18.7 PSYCHOLOGICAL ASSESSMENT AS A THERAPEUTIC INTERVENTION
4.18.7.1 What Is Therapeutic Assessment?
4.18.7.2 The Impetus for Therapeutic Assessment
4.18.7.3 The Therapeutic Assessment Process
4.18.7.3.1 Step 1: The initial interview
4.18.7.3.2 Step 2: Preparing for the feedback session
4.18.7.3.3 Step 3: The feedback session
4.18.7.3.4 Additional steps
4.18.7.4 Empirical Support for Therapeutic Assessment
4.18.8 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR OUTCOMES MANAGEMENT
4.18.8.1 What Are Outcomes?
4.18.8.2 Outcomes Assessment: Measurement, Monitoring, and Management
4.18.8.3 The Benefits of Outcomes Assessment
4.18.8.4 The Therapeutic Use of Outcomes Assessment
4.18.8.4.1 Purpose of the outcomes assessment
4.18.8.4.2 What to measure
4.18.8.4.3 How to measure
4.18.8.4.4 When to measure
4.18.8.4.5 How to analyze outcomes data
4.18.9 FUTURE DIRECTIONS
4.18.9.1 What the Industry Is Moving Away From?
4.18.9.2 Trends in Instrumentation
4.18.9.3 Trends in Data Use and Storage
4.18.9.4 Trends in the Application of Technology
4.18.10 SUMMARY
4.18.11 REFERENCES

4.18.1 INTRODUCTION

The cost of health care in the USA has reached astronomical heights. In 1995, approximately $1 trillion, or 14.9% of the gross domestic product, was spent on health care, and a 20% increase is expected by the year 2000 (Mental Health Weekly, 1996a). The cost and prevalence of mental health problems and the accompanying need for behavioral health care services in the USA continue to rise at rates which give cause for concern. America's mental health bill in 1990 was $147 billion (Mental Health Weekly, 1996c). The Center for Disease Control and Prevention (1994) recently reported on the results of a survey of 45 000 randomly interviewed Americans regarding their quality of life. The survey found that one-third of the respondents reported they suffered from depression, stress, or emotional problems at least one day a month, and 11% of the sample reported having these problems more than eight days a month.

The American Psychological Association (APA; 1996) also reports statistics, summarized below, that bear attention.

(i) It is estimated that 15–18% of Americans suffer from a mental disorder; 14 million of these individuals are children.
(ii) Approximately eight million Americans suffer from depression in any given one-month period.
(iii) As many as 20% of Americans will suffer one or more major episodes of depression during their lifetime.
(iv) An estimated 80% of elderly residents in Medicaid facilities were found to have moderate to intensive needs for mental health services.

Moreover, information from various studies indicates that at least 25% of primary health care patients have a diagnosable behavioral disorder (Mental Health Weekly, 1996b).

The need for behavioral health care services is significant. In analyzing data from a 1987 national survey of 40 000 people in 16 000 households, Olfson and Pincus (1994a, 1994b) found that 3% of the population was seen for at least one psychotherapeutic session that year. Of these visits, 81% were to mental health professionals. Estimates provided by VandenBos, DeLeon, and Belar (1993) in the early 1990s indicated that in any year, 37.5 million Americans (or 15% of the population at that time) could benefit from mental health services.

What is the value of the services provided to those suffering from mental illness or substance abuse/addiction/dependency? Some might argue that the benefit is either minimal, or too costly to achieve if significant effects are to be gained. This is in the face of data which suggest otherwise. Numerous studies have demonstrated that treatment of mental health and substance abuse/dependency problems can result in substantial savings when viewed from a number of perspectives. This "cost offset" effect probably has been demonstrated most clearly in savings in medical care dollars over given periods of time.

Medical cost offset considerations are significant, given reports that 50–70% of usual primary care visits are for medical problems that involve psychological factors (APA, 1996). APA also reports that 25% of patients seen by primary care physicians have a disabling psychological disorder, and that depression and anxiety rank among the top six conditions seen by family physicians. Following are just a few of the findings supporting the medical cost benefits that can accrue from providing behavioral health care treatment.

(i) At least 25% of patients seen in a primary care setting have diagnosable behavioral disorders and use two to four times as many medical resources as those patients without these disorders (Mental Health Weekly, 1996b).
(ii) Sipkoff (1995) reported several conclusions, drawn from a review of numerous studies
conducted between 1988 and 1994 and listed in the Cost of addictive and mental disorders and effectiveness of treatment report published by the Substance Abuse and Mental Health Services Administration (SAMHSA). One conclusion derived from a meta-analysis of offset effect was that treatment for mental health problems results in an approximately 20% reduction in the overall cost of health care. The report also concluded that while alcoholics were found to spend twice as much on health care as those without abuse problems, one-half of the cost of substance abuse treatment is offset within one year by subsequent reductions in the combined medical costs for the patient and his or her family.
(iii) Strain et al. (1991) found that screening a group of 452 elderly hip fracture patients for psychiatric disorders prior to surgery and providing mental health treatment to the 60% of the sample needing treatment reduced total medical expenses by $270 000. The cost of the psychological/psychiatric services provided to this group was only $40 000.
(iv) Simmons, Avant, Demski, and Parisher (1988) compared the average medical costs for chronic back pain patients at a multidimensional pain center (providing psychological and other types of intervention) during the year prior to treatment to those costs of the year following treatment. The pretreatment costs per patient were $13 284 while post-treatment costs were $5596.

The reader is referred to Friedman, Sobel, Myers, Caudill, and Benson (1995) for a detailed discussion of various ways in which behavioral interventions can both maximize care to medical patients and achieve significant economic gains. APA (1996) has very succinctly summarized what appear to be the prevalent findings of the medical cost offset literature.

(i) Patients with mental disorders are heavy users of medical services, averaging twice as many visits to their primary care physicians as patients without mental disorders.
(ii) When appropriate mental health services are made available, this heavy use of the system often decreases, resulting in overall health savings.
(iii) Cost offset studies show a decrease in total health care costs following mental health interventions even when the cost of the intervention is included.
(iv) In addition, cost offset increases over time, largely because . . . patients continue to decrease their overall use of the health care system, and don't require additional mental health services. (p. 2)

Medical cost offset effects are relatively obvious and easy to measure. Benefits, financial and otherwise, that accrue from the treatment of mental health and substance abuse/dependency problems also can come in forms that may not be so obvious. One area in which treatment can have a tremendous impact is that of the workplace. For example, note a few of the facts assembled by APA (1996).

(i) In 1985 behavioral health problems resulted in over $77 billion in lost income to Americans.
(ii) California's stress-related disability claims totaled $350 million in 1989.
(iii) In 1980, alcoholism resulted in over 500 million lost work days in this country.
(iv) Major depression cost an estimated $23 billion in lost work days in 1990. In addition, individuals with this disorder are three times more likely than nondepressed individuals to miss time from work and four times more likely to take disability days.
(v) Of all subjects from 58 psychotherapy effectiveness studies focusing on the treatment of depression, 77% received significantly better work evaluations than depressed subjects who did not receive treatment.
(vi) Treatment resulted in a 150% increase in earned income for alcoholics and a 390% increase in income for drug abusers in one study of 742 substance abusers.

In related findings, anxiety disorders accounted for one-third of America's $147 billion mental health bill in 1990 (Mental Health Weekly, 1996c). And on another front, the former director of the Office of the National Drug Control Policy reported that for every dollar spent on drug treatment, America saves seven dollars in health care and criminal justice costs (Substance Abuse Funding News, 1995).

Society's need for behavioral health care services provides an opportunity for trained providers of mental health services to become part of the solution to a major health care problem that shows no indication of decline. Each of the helping professions has the potential to make a particular contribution to this solution. Not the least of these contributions are those that can be made by clinical psychologists. As pointed out in an earlier volume (Maruish, 1994), the use of psychological tests in the assessment of the human condition is one of the hallmarks of clinical psychology. In fact, the training and acquired level of expertise in psychological testing distinguishes the clinical psychologist from other behavioral health care professionals probably more than anything else. Indeed, expertise in test-based psychological assessment can be said to be the particular and unique contribution that clinical psychologists make to the behavioral health care field.
For decades, clinical psychologists and other behavioral health care providers have come to rely on psychological assessment as a standard tool to be used with other sources of information for diagnostic and treatment planning purposes. However, changes that have taken place in the delivery of health care in general, and behavioral health care services in particular, during the past several years have led to changes in the way in which third-party payers and clinical psychologists themselves think about and/or use psychological assessment in day-to-day clinical practice. Some question the value of psychological assessment in the current time-limited, capitated service delivery arena where the focus has changed from clinical priorities to fiscal priorities (Sederer, Dickey, & Hermann, 1996). Others argue that it is in just such an arena that the benefits of psychological assessment can be most fully realized and contribute significantly to the delivery of cost-effective treatment for behavioral health disorders. Consequently, it could assist the health care industry in appropriately controlling or possibly reducing the utilization and cost of health care over the long term. It is this latter side of the argument that is supported by this author, and it provides the basis for this chapter.

In developing this chapter, the intent has been to provide students and practitioners of clinical psychology with an overview of how psychological assessment could and should be used in this era of managed behavioral health care. In doing so, this author discusses how psychological assessment is currently being used in the therapeutic environment and the many ways in which it might be used to the ultimate benefit of patients, providers, and payers.

As a final introductory note, it is important for the reader to understand that the term "psychological assessment," as it is used in this chapter, refers to the evaluation of a patient's mental health status using psychological tests or related instrumentation. This evaluation may be conducted with or without the benefit of patient or collateral interviews, review of medical or other records, and/or other sources of relevant information about the patient.

4.18.2 THE CURRENT PRACTICE OF PSYCHOLOGICAL ASSESSMENT IN THE THERAPEUTIC ENVIRONMENT

For a number of decades, psychological assessment has been viewed as a valued and integral part of the services offered by clinical psychologists. However, its popularity has not been without its ups and downs. Megargee and Spielberger (1992) have described a decrease in interest in assessment that began in the 1960s. This was due to a number of factors, including shifts in focus to those aspects of treatment for which assessment was thought to contribute little. Examples of these aspects included a growing emphasis on behavior modification techniques, the increasing use of psychotropic medications, and an emphasis on studying symptoms rather than personality syndromes and structures. Fortunately, Megargee and Spielberger also noted a number of factors that indicate a relatively recent resurgence in the interest in assessment, including a new realization of how psychological assessment can assist in interventions provided to mental health care patients.

But where does psychological assessment actually fit into the daily scope of activities for practicing psychologists? The results of two recent surveys provide inconsistent findings. The newsletter Psychotherapy Finances (1995) reported the results of a nationwide readership survey of 1700 mental health providers of various professions. In this survey, 67% of the participating psychologists reported that they provide psychological testing services. This represents about a 10% drop from a similar survey published in 1992 by the same publication. Also of interest in this survey is the percentage of professional counselors (39%), marriage and family counselors (16%), psychiatrists (21%), and social workers (13%) offering these same services.

In a 1995 survey conducted by the APA's Committee for the Advancement of Professional Practice (Phelps, 1996), 14 000 practitioners responded to questions related to workplace settings, areas of practice concerns, and range of activities. The largest share of respondents (40.7%) were practitioners whose primary work setting was an individual independent practice. Other general work settings, that is, government, medical, academic, group practice settings, were represented by fairly equal numbers of respondents from the remainder of the sample. The principal professional activity reported by the respondents was psychotherapy, with 43.9% of the sample acknowledging involvement in this service. Assessment was the second most prevalent activity, being reported by 14% of the sample.

Differences in the two samples utilized in the above surveys may account for the inconsistencies in their findings. Psychologists who are subscribers to Psychotherapy Finances may represent that subsample of the APA survey respondents who are more involved in the delivery of clinical services. Certainly the fact that only about 44% of the APA respondents
offer psychotherapy services supports this hypothesis.

Regardless of the two sets of findings, psychological assessment does not appear to be utilized as much as in the past, and one does not have to look hard to determine at least one reason why. One of the major changes that has come about in the American health care system during the past several years has been the creation and proliferation of managed care organizations (MCOs). The most significant direct effects of managed care include reductions in the length and amount of service, reductions in accessibility to particular modalities (e.g., reduced number of outpatient visits per case), and profession-related changes in the types of services managed by behavioral health care providers (Oss, 1996). Overall, the impact of managed behavioral health care on the services offered by psychologists and other health care providers has been tremendous. In the APA survey reported above (Phelps, 1996), approximately 79% of the respondents reported that managed care had either a low, medium, or high negative impact on their work. How has managed care negatively impacted the use of psychological assessment? It is not clear from the results of this survey, but perhaps others can offer at least a partial explanation.

Ficken (1995) has provided some insight into how the advent of managed care has limited the reimbursement for (and therefore the use of) psychological assessment. In general, he sees the primary reason for this as being a financial one. In an era of capitated behavioral health care coverage, the amount of money available for behavioral health care treatment is limited. MCOs therefore require a demonstration that the amount of money spent for testing will result in a greater amount of treatment cost savings. In addition, Ficken notes that much of the information obtained from psychological assessment is not relevant to the treatment of patients within an MCO environment. Understandably, MCOs are reluctant to pay for the gathering of such information.

Werthman (1995) provides similar insights into this issue, noting that

Managed care . . . has caused [psychologists] to revisit the medical necessity and efficacy of their testing practices. Currently, the emphasis is on the use of highly targeted and focused psychological and neuropsychological testing to sharply define the "problems" to be treated, the degree of impairment, the level of care to be provided and the treatment plan to be implemented.
The high specificity and "problem-solving" approach of such testing reflects MCOs' commitment to effecting therapeutic change, as opposed to obtaining a descriptive narrative with scores. In this context, testing is perceived as a strong tool for assisting the primary provider in more accurately determining patient "impairments" and how to "repair" them. (p. 15)

In general, Werthman views psychological assessment as being no different from other forms of patient care, thus making it subject to the same scrutiny, demands for demonstrating medical necessity and/or utility, and consequent limitations imposed by MCOs on other covered services.

The foregoing representations of the current state of psychological assessment in behavioral health care delivery could be viewed as an omen of worse things to come. In this author's opinion, they are not. Rather, the limitations that are being imposed on psychological assessment and the demand for justification of its use in clinical practice represent part of the customers' dissatisfaction with the way things always have been done in the past. In general, this author views the tightening of the purse strings as a positive move for both behavioral health care and the profession of psychology. It is a wake-up call to those who have contributed to the health care crisis by uncritically performing costly psychological assessments, being unaccountable to the payers and recipients of our services, and generally not performing our services in the most responsible, cost-effective and efficient way possible. It is telling us that we need to evaluate what we've done and the way we've done it, and to determine what is the best way to do it in the future. As such, it provides an opportunity for clinical psychologists to re-establish the valuable contributions they can make to improving the quality of behavioral health care delivery through their knowledge and skills in the area of psychological assessment.

In the sections that follow, this author will convey what he sees as the opportunities for psychological assessment in the behavioral health care arena, both in the present and the future, and the means of best achieving them. The views that are advanced are based on his knowledge of and experience in current psychological assessment practices as well as directions provided by the current literature. Some will probably disagree with the proposed approach, given their own experience and thinking on the matters discussed. However, it is hoped that even though in disagreement, the reader will be challenged to defend his or her position to themselves and as a result, feel more comfortable in their thinking about their approach to their psychological assessment practices.
4.18.3 PSYCHOLOGICAL ASSESSMENT AS A THERAPEUTIC ADJUNCT

The role of psychological assessment in the therapeutic environment traditionally has been quite limited. Those of us who did not receive our graduate clinical training within the past few years probably have been taught the value of psychological assessment only at the "front end" of treatment. We were instructed in the power and utility of psychological assessment as a means of assisting in the identification of symptoms and their severity, personality characteristics relevant to understanding the patient and his or her typical way of perceiving and interacting with the world, and other aspects of the individual (e.g., intelligence, vocational interests) that are important in arriving at a description of the patient at one particular point in time. Based on these data and information obtained from patient and collateral interviews, medical records and the individual's stated goals for treatment, a diagnostic impression was given and a treatment plan was probably formulated and placed in the patient's chart, hopefully to be reviewed at various points during the course of treatment. In some cases, the patient was assigned to another practitioner within the same organization or referred out, never to be contacted or seen again, much less be assessed again by the one who performed the original assessment.

Fortunately, during the past few years the usefulness of psychological assessment as more than just a tool to be used at the beginning of treatment has come to be recognized. Consequently, its utility has been extended beyond being a mere tool for describing an individual presenting themselves for treatment to being a means of facilitating the treatment and understanding of behavioral health care problems throughout the episode of care and beyond. Psychologists and others who employ it in their practices are now finding that psychological assessment can be used for a variety of purposes. Generally speaking, several psychological tests currently being marketed can be employed as tools for assisting in clinical decision-making, outcomes assessment and, more directly, as treatment techniques in and of themselves. Each of these uses can uniquely contribute incremental value to the therapeutic process.

4.18.3.1 Psychological Assessment for Clinical Decision-making

Traditionally, psychological assessment has been used to assist clinical psychologists and other behavioral health care clinicians in making important clinical decisions. The types of decision-making for which it has been used include those related to screening, treatment planning, and monitoring of treatment progress. Generally, screening may be undertaken to assist in either: (i) identifying the patient's need for a particular service, or (ii) determining the likelihood of the presence of a particular disorder or other behavioral/emotional/psychological problem. More often than not, a positive finding on screening leads to a more extensive evaluation of the patient in order to confirm with greater certainty the existence of the problem, or to further delineate the problem. The value of screening lies in the fact that it permits clinicians to identify, quickly and economically, with a fairly high degree of confidence (depending on the particular instrumentation used), those who are and are not likely to need care or at least further evaluation.

In many instances, psychological assessment is performed in order to obtain information that is deemed useful in the development of a specific plan for treatment. Typically, it is the type of information that is not easily (if at all) accessible through other means or sources. It is information which, when combined with other information about the patient, aids in understanding the patient, identifying the most important problems and issues that need to be addressed, and formulating recommendations about the best means of addressing them.

Another way in which psychological assessment can play a role in clinical decision-making is in the area of treatment monitoring. Repeated assessment of the patient at regular intervals during the treatment process can provide the therapist with feedback regarding the progress which is being made in the therapeutic endeavor. Based on the findings, the therapist will be encouraged either to continue with the original therapeutic approach or, in the case of no change or exacerbation of the problem, to modify or abandon the approach in favor of an alternate one.

4.18.3.2 Psychological Assessment as a Treatment Technique

It is only recently that empirical studies and other articles addressing the therapeutic benefits that can be realized directly from discussing psychological assessment results with the patient have been published. Rather than just providing test feedback as directed by APA's Ethical principles of psychologists (APA, 1992), therapeutic use of assessment involves a presentation of assessment results (including assessment materials such as test protocols, profile forms, other assessment summary materials) directly to the patient; an elicitation of
the patient's reactions to them; and an in-depth discussion of the meaning of the results in terms of patient-defined assessment goals. In essence, the assessment data can serve as a catalyst for the therapeutic encounter via the objective feedback that is provided to the patient, the patient self-assessment that is stimulated, and the opportunity for patient and therapist to arrive at mutually agreed upon therapeutic goals, based on impressionistic and objective data available to both parties.

4.18.3.3 Psychological Assessment for Outcomes Assessment

Currently, one of the most common reasons for conducting psychological assessment in the USA is to assess the outcomes of behavioral health care treatment. It is difficult to open a trade paper or health care newsletter or to attend a professional conference without being presented with a discussion on either how to “do outcomes” or what the results of a certain facility's outcomes study have revealed. The focus on outcomes assessment most probably can be traced to the “continuous quality improvement” (CQI) movement that was initially implemented in business and industrial settings. The impetus for the movement originally was a desire to produce quality products in the most efficient manner, resulting in increased revenues and decreased costs.

In the health care arena, outcomes assessment has multiple purposes, not the least of which is as a tool for marketing the organization's services. Related to this, those organizations vying for lucrative contracts from third-party payers to provide health care services to their covered lives frequently require outcomes data demonstrating the effectiveness of the services offered by the bidders. Equally important to those awarding contracts is how satisfied patients are with the provider's services. But probably the most important potential use of this data for provider organizations (although not always recognized as such) can be found in the knowledge it yields about what works and what doesn't. In this regard it can serve a program evaluation function. It is this knowledge that, if attended to and acted upon, can lead to improvement in the services the organization offers. When used in this manner, outcomes assessment can become an integral component of the organization's CQI initiative.

But more importantly for the individual patient, outcomes assessment provides a means of objectively measuring how much improvement he or she has made from the time of treatment initiation to the time of treatment termination. Feedback to this effect may serve to instill in the patient greater self-confidence and self-esteem, and/or a more realistic view of where he or she is (from a psychological standpoint) at that particular time in their life. Conversely, it may serve as an objective indicator to the patient of the need for continued treatment.

The purpose of the foregoing is to present a broad overview of psychological assessment as a multipurpose behavioral health care tool. Depending on the individual clinician or provider organization, it may be employed for one or more, or all, of the purposes just described. Knowing the various ways in which psychological assessment can be used in the service of therapeutic change should help the reader understand the more in-depth and detailed discussion about how these applications can facilitate or otherwise add value to the psychotherapeutic services offered by providers. This detailed discussion follows below.

Before beginning this discussion, however, it is important to briefly review the types of instrumentation most likely to be used in therapeutic psychological assessment, as well as the significant considerations and issues related to the selection and use of this instrumentation for the stated purposes. This should further facilitate the reader's understanding of the remainder of the chapter.

4.18.4 GENERAL CONSIDERATIONS FOR THE SELECTION AND USE OF PSYCHOLOGICAL TEST INSTRUMENTATION

New instrumentation for facilitating and evaluating behavioral health care treatment is released by the major test publishers annually. Thus, the availability of instrumentation for these purposes is not an issue. However, selection of the appropriate instrument(s) for one or more of the therapeutic purposes described above is a matter requiring careful consideration. Inattention to the instrument's intended use, its demonstrated psychometric characteristics, its limitations, and other aspects related to its practical application can result in misguided treatment and potentially harmful consequences for a patient.

Several types of instruments could be used for the general therapeutic assessment purposes described above. For example, neuropsychological instruments might be used to assess memory deficits that could impact the clinician's decision to perform further testing, the goals established for treatment, and the approach to treatment that is selected. Tests designed to provide estimates of level of
532 Therapeutic Assessment: Linking Assessment and Treatment

intelligence might be used for the same purposes. It is beyond the scope of this chapter to address, even in the most general way, all of the types of tests, rating scales, and the instrumentation that might be employed in the therapeutic environment. Instead, the focus here will be on general classes of instrumentation that have the greatest applicability in the service of the therapeutic endeavor. To a limited extent, specific examples of such instruments will be presented. This will be followed by a discussion of criteria and considerations that will assist the clinician in selecting the best instrumentation for his or her intended purposes.

4.18.4.1 Types of Instrumentation for Therapeutic Assessment

The instrumentation required for any therapeutic application will depend on: (i) the general purpose(s) for which the assessment is being conducted, and (ii) the level of informational detail that is required for those purpose(s). Generally, one may classify the types of instrumentation that would serve the purpose(s) of the therapeutic assessment into one of four general categories. As mentioned above, other types of instrumentation are frequently used in clinical settings for therapeutic purposes. However, the present discussion will be limited to those more commonly used by a wide variety of clinical psychologists in their day-to-day practices.

4.18.4.1.1 Psychological/psychiatric symptom measures

Probably the most frequently used instrumentation for several therapeutic purposes are measures of psychopathological symptomatology. Besides the fact that these are the types of instruments on which the majority of the clinician's psychological assessment training has probably been focused, they were developed to assess the problems that typically prompt people to seek treatment.

There are several subtypes of these measures of psychological/psychiatric symptomatology. The first is the comprehensive multidimensional measure. This is typically a lengthy, multiscale instrument that measures and provides a graphical profile of the patient on several types of psychopathological symptom domains (e.g., anxiety, depression) or disorders (schizophrenia, antisocial personality). Also, summary indices sometimes are available to provide a more global picture of the individual with regard to his or her psychological status or level of distress. Probably the most widely used and/or recognized of these measures are the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1951) and its restandardized revision, the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), the Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, 1994), and the Personality Assessment Inventory (PAI; Morey, 1991).

Multiscale instruments of this type can serve a variety of purposes that facilitate therapeutic efforts. They may be used upon initial contact with the patient to screen for the need for service and, at the same time, yield information that is useful for treatment planning. Indeed, some such instruments (e.g., the MMPI-2) may make available supplementary, content-related, and/or special scales that are designed to assist the user in addressing specific treatment considerations (e.g., low motivation for treatment). Other multiscale instruments might be useful in identifying specific problems that may be unrelated to the patient's chief complaints (e.g., low self-esteem). They can also be administered at numerous times during the course of treatment to monitor the patient's progress toward achieving established goals and to assist in determining what adjustments (if any) must be made to the clinician's approach. In addition, use of the instrument in a pre- and post-treatment fashion provides information related to the outcomes of the treatment. Data obtained in this fashion can be analyzed with results from other patients to evaluate the effectiveness of an individual therapist as well as an organization.

Abbreviated multidimensional measures are quite similar to the comprehensive multidimensional measure in many respects. First, by definition, they contain multiple scales for measuring a variety of symptom domains and/or disorders. They also may allow for the derivation of an index of the patient's general level of psychopathology or distress. In addition, they may be used for screening, treatment planning and monitoring, and outcomes assessment purposes just like the comprehensive instruments. The distinguishing feature of the abbreviated instrument is its length. Again, by definition, these instruments are relatively short, and easy to administer and (usually) score. Their brevity does not allow for an in-depth assessment of the patient and his or her problems, but this is not what these instruments were designed to do.

Probably the most widely used of these brief instruments are Derogatis' family of symptom checklist instruments. These include the original Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973) and its revision, the
SCL-90-R (Derogatis, 1983). Both of these instruments contain a checklist of 90 psychological symptoms, most of which score on the instruments' nine symptom scales. For each of these instruments an even briefer version has been developed. The first is the Brief Symptom Inventory (BSI; Derogatis, 1992), which was derived from the SCL-90-R. In a health care environment that is cost-conscious and unwilling to make too many demands on patient time, this 53-item instrument is gaining popularity over its longer and more expensive 90-item parent instrument. Similarly, a brief form of the original SCL-90 has been developed. Titled the Symptom Assessment-45 Questionnaire (SA-45; Strategic Advantage, Inc., 1996), its development did not follow Derogatis' approach to the development of the BSI; instead, cluster analytic techniques were used to select five items each for assessing each of the nine symptom domains found on the three Derogatis checklists.

The major strength of the abbreviated multiscale instruments is their ability to broadly and very quickly survey several psychological symptom domains and/or disorders relative to the patient. Their value is most clearly evident in settings where both the time and dollars available for assessment services are quite limited. These instruments provide a lot of information quickly. Because of their brevity, they are much more likely to be completed by patients than their lengthier comprehensive counterparts. This last point is particularly important if one is interested in monitoring treatment or assessing outcomes, both of which require at least two or more assessments to obtain the desired information.

4.18.4.1.2 Measures of general health status and role functioning

During the past decade, there has been an increasing interest in the assessment of health status in health care delivery systems. Initially, this interest was shown mostly by those organizations and settings focusing primarily on the treatment of physical diseases and disorders. Within recent years, behavioral health care providers have recognized the value in assessing the patient's general level of health.

It is important to recognize that the term “health” means more than just the absence of disease or debility; it also implies a state of well-being throughout the individual's physical, psychological, and social spheres of existence (World Health Organization [WHO], 1948). Dickey and Wagenaar (1996) point out how this view of health recognizes the importance of eliciting the patient's point of view in assessing health status. They also point to similar conclusions reached by Jahoda (1958) specific to the area of mental health. Here, an individual's self-assessment relative to how he or she feels they should be is an important component of “mental health.”

Measures of health status and physical functioning can be classified into one of two groups: generic and condition-specific. Probably the most widely used and respected generic health status measures are the 36-item Medical Outcomes Study Short Form Health Scale (SF-36; Ware & Sherbourne, 1992; Ware, Snow, Kosinski, & Gandek, 1994) and the 39-item Health Status Questionnaire 2.0 (HSQ; Health Outcomes Institute, 1993; Radosevich, Wetzler, & Wilson, 1994). Aside from the minor variations in the scoring of one of the instruments' scales (i.e., Bodily Pain) and the HSQ's inclusion of three depression screening items, the two measures essentially are identical. Each assesses eight dimensions of health, four addressing mental health-related constructs and four addressing physical health-related constructs, that reflect the WHO concept of “health.”

Role functioning has recently gained attention as an important variable to address in the course of assessing the impact of a physical or mental disorder on an individual. In devising a treatment plan and monitoring progress over time, it is important to know how the person's ability to work, perform daily tasks, or interact with others is affected by the disorder. The SF-36 and HSQ both address these issues with scales designed for this purpose.

Responding to concerns that even these relatively brief objective measures are too lengthy for regular administration in clinical and research settings, 12-item, abbreviated versions of each have been developed. The SF-12 (Ware, Kosinski, & Keller, 1995) was developed for use in large-scale, population-based research where the monitoring of health status at a broad level is all that is required. Also, a 12-item version of the HSQ, the HSQ-12 (Radosevich & Pruitt, 1996), was developed for similar uses. Interestingly, given that the two abbreviated versions were derived from essentially the same instrument, there is only a 50% item overlap between the two shortened instruments. Both instruments are relatively new but the data supporting their use that has been gathered up to 1997 is promising.

Condition-specific health status and functioning measures have been utilized for a number of years. Most have been developed for use with physical rather than mental disorders, diseases, and conditions. However, condition-specific measures of mental health
status and functioning are beginning to appear. A major source of this type of instrument is the Minnesota-based Health Outcomes Institute (HOI), a successor to the health care think tank InterStudy. In addition to the HSQ and the HSQ-12, HOI serves as the distributor/clearinghouse for the condition-specific “technology of patient experience (TyPE) specifications.” The available TyPEs that would be most useful to clinical psychologists and other behavioral health care practitioners include those developed by a team of researchers at the University of Arkansas Medical Center for use with depressive, phobic, and alcohol and substance disorders. TyPEs for other specific psychological disorders are currently under development at the University of Arkansas for distribution through HOI.

4.18.4.1.3 Quality of life measures

In their brief summary of this area, Andrews, Peters, and Teesson (1994) indicate that most of the definitions of “quality of life” (QOL) describe a multidimensional construct encompassing physical, affective, cognitive, social, and economic domains. Objective measures of QOL focus on environmental resources required to meet one's needs and can be completed by someone other than the patient. The subjective measures of QOL assess the patient's satisfaction with the various aspects of his or her life and thus must be completed by the patient.

Andrews et al. (1994) draw other distinctions in the QOL arena. One has to do with the differences between QOL and health-related quality of life, or HRQL, and (similar to the case with health status measures) the other has to do with the distinction between generic and condition-specific measures of QOL. QOL measures differ from HRQL measures in that the former assess the whole “fabric of life,” while the latter assess quality of life as it is affected by a disease or disorder, or by its treatment. Generic measures are designed to assess aspects of life that are generally relevant to most people; condition-specific measures are focused on aspects of the lives of particular disease/disorder populations. However, as Andrews et al. point out, generic and condition-specific QOL measures tend to overlap quite a bit.

4.18.4.1.4 Service satisfaction measures

With the exploding interest in assessing the outcomes of treatment for the patient, it is not surprising to see an accompanying interest in assessing the patient's and, in some instances, the patient's family's satisfaction with the services received. In fact, many professionals and organizations equate satisfaction with outcomes and frequently consider it the most important outcome. In a recent survey of 73 behavioral health care organizations, 71% of the respondents indicated that their outcomes studies included measures of patient satisfaction (Pallak, 1994).

Although some view service satisfaction as an outcome, it is this author's contention that it should not be classified as such. Rather, it should be considered a measure of the overall therapeutic process, encompassing the patient's (and at times, others') view of how the service was delivered, the capabilities and attentiveness of the service provider, the benefits of the service (if any), and any of a number of other selected aspects of the service he or she received. Patient satisfaction surveys don't answer the question “What was the result of the treatment rendered to the patient?”; they do answer the question “How did the patient feel about the treatment he or she received?” Thus, they serve an important program evaluation/improvement function.

The number of questionnaires that are currently being used to measure patient satisfaction is countless. This reflects the attempts of individual health care organizations to develop customized measures that assess variables important to their particular needs, which in turn reflects a response to outside demands to “do something” to demonstrate the effectiveness of their services. Often, this “something” has not been evaluated to determine its basic psychometric properties. As a result, there exist numerous options that one may choose from, but very few that actually have demonstrated their validity and reliability as measures of service satisfaction.

Fortunately, there are a few instruments that have been investigated for their psychometric integrity. Probably the most widely used and researched patient satisfaction instrument designed for use in behavioral health care settings is the eight-item version of the Client Satisfaction Questionnaire (CSQ-8; Attkisson & Zwick, 1982; Nguyen, Attkisson, & Stenger, 1983). The CSQ-8 was derived from the original 31-item CSQ (Larsen, Attkisson, Hargreaves, & Nguyen, 1979), which also yielded two longer 18-item alternate forms, the CSQ-18A and CSQ-18B (LeVois, Nguyen, & Attkisson, 1981). The more recent work of Attkisson and his colleagues at the University of California at San Francisco is the Service Satisfaction Scale-30 (SSS-30; Greenfield & Attkisson, 1989), a 30-item multifactorial scale that yields information regarding different aspects of satisfaction with mental health service, such as perceived outcome and manner and skill of the clinician.
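
The concern raised in this section, that many locally developed satisfaction questionnaires have never had their basic psychometric properties examined, can be illustrated with a small sketch. The Python fragment below computes Cronbach's alpha, one common index of internal consistency reliability, for a set of item responses. It is offered only as an illustration under stated assumptions: the function name, the four-item form, and the response data are all hypothetical and are not taken from the CSQ-8, the SSS-30, or any other published instrument.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for rows of item scores (one row per respondent)."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("each respondent needs the same two or more items")
    # Variance of each item across respondents.
    item_variances = [pvariance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a made-up four-item satisfaction form (items rated 1-4).
hypothetical_responses = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
alpha = cronbach_alpha(hypothetical_responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

A check of this kind addresses only one facet of reliability; it says nothing about validity, which would need to be examined separately before a homegrown measure could be considered comparable to an investigated instrument.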

4.18.4.2 Guidelines for Instrument Selection

Regardless of the type of instrument one might consider using in the therapeutic environment, many clinical psychologists frequently must choose between many product offerings. But what are the general criteria for the selection of any instrument for psychological assessment? What should guide the clinician's selection of an instrument for a specific therapeutic purpose? As part of their training, clinical psychologists and professionals from related psychological specialties have been educated about the important psychometric properties that should be considered when determining the appropriateness of an instrument for its intended use. However, this is just one of several issues that should be taken into account in an evaluation of a specific instrument for a specific therapeutic use. The guidance that has been offered by experts with regard to instrument selection is worth noting here.

4.18.4.2.1 National Institute of Mental Health criteria

Probably the most thorough and clinically relevant guidelines for the selection of psychological assessment instruments come from the National Institute of Mental Health (NIMH)-supported work of Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986). A synopsis of Newman and Ciarlo's (1994) updated summary of this NIMH work is presented here. Note that the criteria discussed below were originally developed for use in evaluating instruments for outcomes assessment purposes. However, most have relevance to the selection of instrumentation used for the other therapeutic assessment purposes described above. Exceptions and qualifications with regard to this issue will be noted when appropriate.

Newman and Ciarlo (1994) describe 11 criteria for the selection of outcomes assessment instruments, each of which can be grouped into one of five types of consideration. The first consideration is that of applicability. The issue here is the relevance of the instrument to the target population. The instrument should assess those problems, symptoms, characteristics, and so on, that are common to the group to whom the instrument will be administered. The more heterogeneous the population, the more chance that modifications will be required and that these will alter the standardization and psychometric integrity of the instrument. Another applicability issue to consider when the instrument is to be used for outcomes assessment purposes is its independence from the type of treatment to be offered to the population.

The second set of general considerations is that of methods and procedures (Newman & Ciarlo, 1994). Several selection criteria are related to this group. The first is that administration of the instrument is simple and easily taught. Generally, this is more of an issue with clinician-rating scales than self-report scales. In the case of rating scales, concrete examples, or objective referents, at each rating level should be provided to the user. Next, the instrument should allow input not only from the patient but also from other sources (e.g., the clinician, collaterals). The benefits of this include the opportunities to obtain a feel for the patient from many perspectives, to validate reported findings and observations, and to promote honesty in responding from all sources (given that all parties will know that others will also be providing input). The final methods and procedures criterion, though not necessarily as important for the instrument being used for screening or treatment planning purposes, is that the instrument provide information relevant to understanding how the treatment may have effected change in the individual.

Newman and Ciarlo's (1994) third set of considerations has to do with the psychometric strengths of the instruments. According to the NIMH panel of experts, outcomes measures should: (i) meet the minimum psychometric standards for reliability (including internal consistency, test-retest reliability, and as appropriate, interrater reliability) and validity (content, construct, and concurrent validity); (ii) be difficult to “fake bad” or “fake good”; and (iii) be free from response bias and not reactive or sensitive to factors unrelated to the constructs that are being measured (e.g., physical settings, behavior of the treatment staff). These criteria obviously also apply to other psychological instruments used for purposes other than outcomes assessment. However, for outcomes assessment purposes, the instrument also must be sensitive to change related to treatment.

The fourth group of considerations concerns the cost of the instruments. Newman and Ciarlo (1994) point out that the answer to the question of how much one should spend on assessment instrumentation and associated costs (e.g., staff time for administering, scoring, processing, and analyzing the data) will depend on how important the data gathered is to assuring a positive return on the functions they support. In the context of the NIMH undertaking, Newman and Ciarlo felt that the data obtained through treatment outcomes assessment would support screening/treatment planning, efforts in quality assurance and program
evaluation, cost containment/utilization review activities, and revenue generation efforts. However, that may be considered the ideal. At this point, the number and nature of the purposes that would be supported by the obtained data will depend on the individual organization. The more purposes the data can serve, the less costly the instrumentation is likely to be, at least from a value standpoint. In terms of actual costs, Ciarlo et al. (1986) estimated that 0.5% of an organization's total budget would be an affordable amount for materials, staff training, data collection, and processing costs related to outcomes assessment. However, one should be mindful that the recommendation was made in 1986 and may not reflect changes in policies, requirements, and attitudes related to the use of psychological assessment instruments since that time.

The final set of considerations in instrument selection has to do with the utility of the instrument. Four criteria related to utility are posited by Newman and Ciarlo (1994). First, the scoring procedures and the manner in which the results are presented should be comprehensible to all with a stake in the treatment of the organization's patients. This would not only include the patient, his or her family, the organization's administrative staff and other treatment staff, but also third-party payers and (in the case of outcomes assessment or program evaluation) legislative and administrative policy makers. Related to this is the criterion that the results of the instrument be easily interpreted by those with a stake in them. Another utility-related criterion is that the instrument should be compatible with a number of clinical practices and theories that are employed in the behavioral health care arena. This should allow for a greater range of test applicability and greater acceptance by the various stakeholders in the patient's treatment.

Another important aspect of utility is that “the instrument support[s] the clinical processes of a service with minimal interference” (Newman & Ciarlo, 1994, p. 107). There are two issues here. The first has to do with whether the instrument can support the screening, planning, and/or monitoring activities in addition to the outcomes assessment activities. In other words, are multiple purposes served by the instrument's results? The second issue is one that has to do with the extent to which the organization's staff is burdened with the collection and processing of assessment data. How much will the assessment process interfere with the daily work flow of the organization's staff? Equally important is whether the benefits that accrue justify the cost of implementing an assessment program for whatever purpose(s).

4.18.4.2.2 Other criteria and considerations

Although the work of Ciarlo and his colleagues provides more extensive instrument selection guidelines than most, others who have addressed the issue have arrived at recommendations that serve to reinforce and/or complement those found in the NIMH document. For example, Gavin Andrews' work in Australia has led to significant contributions to the body of outcomes assessment knowledge. As part of this, Andrews et al. (1994) have identified six general “qualities of consumer outcome measures” that are generally in concordance with those from the NIMH study. First, the measure should meet the criterion of applicability. In other words,

it should address dimensions which are important to the consumer (symptoms, disability, and consumer satisfaction) and useful for the clinician in formulating and conducting treatment, yet the measure should be one which can have its data aggregated in a meaningful way so that the requirements of management can be addressed. (p. 30)

Multidimensional instruments yielding a profile of scores on all dimensions of interest are viewed as a means of best serving the interests of all concerned.

Acceptability, that is, being both brief and user-friendly, is another desirable quality identified by Andrews et al. (1994). Closely associated with this is the criterion of practicality. It might be viewed as a composite of those NIMH criteria related to matters of cost, ease of scoring and interpretation, and training in the use and interpretation of the measure. Again in agreement with the NIMH work, the final three criteria identified by Andrews et al. relate to reliability, validity, and sensitivity to change. With regard to reliability, Andrews et al. specify what they consider to be the minimum levels of acceptable internal consistency reliability (0.90 for long tests), interrater reliability (0.40), and construct and criterion validity (0.50). They also stress the importance of an instrument's face validity in helping to ensure cooperation from the patient, and of self-report instruments having multiple response options (rather than just “yes/no” options) for increasing sensitivity of an instrument to small but relevant changes in the patient's status over time.

In Ficken's (1995) discussion of the role of assessment in an MCO environment, he concludes that the difficulties clinicians are experiencing in demonstrating the utility of psychological assessment to payers lie in the fact that the instruments and objectives of traditional psychological assessment are not in synch
with the needs of MCOs. The solution to the problem appears simple:

the underlying objectives of testing must be aligned with the values and processes of MCOs. In short, this means identifying decision points in managed care processes that could be improved with objective, standardized data. There are two avenues in which these can be pursued: through facilitation/objectification of clinical-decision processes and through outcome assessment. (p. 12)

In general, Ficken (1995) sees opportunities in areas that this author has previously identified as screening, treatment planning, and outcomes assessment, specifically in the areas of primary medical care and behavioral health care (see below). Requirements of instruments used for screening were noted to include:
(i) high levels of sensitivity and specificity to diagnostic criteria from the Diagnostic and statistical manual of mental disorders (4th ed., DSM-IV; American Psychiatric Association, 1994) or the most up-to-date version of the International classification of diseases (ICD);
(ii) a focus on hard-to-detect (in a single office visit) but treatable disorders that are associated with imminent harm to self or others, significant suffering, and a decrease in productivity;
(iii) an administration time of no more than 10 minutes; and
(iv) an administration protocol that easily integrates into the organization's work flow.

Cases testing "positive" on the screener would be administered one or more "second-tier" instrument(s) to establish severity and a specific diagnosis. Ficken feels that if they are to be accepted by MCOs, these second-tier instruments should meet the requirements of screeners and either specify or rule out a diagnosis.

According to Ficken (1995), successful outcomes assessment instruments also must possess certain qualities. Because the areas most important to assess for outcomes measurement purposes are symptom reduction, level of functioning, quality of life and patient satisfaction, the instrument should (i) focus on one of these areas, (ii) be brief, (iii) meet "traditional standards" for validity and reliability, and (iv) be sensitive to clinical change.

Based on the work of Vermillion and Pfeiffer (1993), Burlingame, Lambert, Reisinger, Neff, and Mosier (1995) recommended four criteria for the selection of outcomes measures. The first is acceptable "technical features," that is, validity and reliability. Specifically, these authors recommended that instruments have an internal consistency of at least 0.80, test–retest reliability of at least 0.70, and concurrent validity of at least 0.50. The second criterion is "practicality features." These include brevity, ease of administration and scoring, and simplicity of the reporting of results. Third, the instrumentation should be "suitable" for the patients that are seen within the setting. Thus, because of the nature of most presenting problems in mental health settings, it should assess symptomatology and psychosocial functioning. The fourth criterion is sensitivity to "meaningful" change over time, allowing for a differentiation of symptomatic change from interpersonal/social role functional change.

Schlosser (1995) proposed a rather nontraditional view of "outcomes assessment." In what he refers to as a "patient-centric" view, assessment information is gathered and used during the course of therapy to bring about change during therapy, not after therapy has ended. Essentially, this equates to what this author has referred to above (and discusses in more detail below) as treatment monitoring. In this model, Schlosser feels that this type of assessment requires "elements regarding very specific, theoretically derived, empirically validated areas of functioning" (Schlosser, 1995, p. 66). These would involve the use of both illness and well-being measures that assess the patient on emotional, mental/cognitive, physical, social, life direction, and life satisfaction dimensions.

Many of Schlosser's (1995) considerations for selection of such measures are not unique (i.e., having "acceptable" levels of reliability and validity, brief, low-cost, and sensitive). However, for the purposes described Schlosser also indicates that they should also: (i) have "paradigmatic sensibility" (i.e., key words have the same meaning across instruments); (ii) be designed for repeated administration for feedback or self-monitoring purposes; and (iii) provide actionable information.

In addition to some already mentioned criteria (acceptable validity, reliability, affordability, ease of administration, and ease of data entry and analysis), Sederer et al. (1996) discuss other considerations that warrant attention in selecting outcomes measures for specific situations. These include automation capabilities related to availability of software for data analysis and reporting, compatibility with the organization's existing information system, and the ability to enter data via an optical scanner. They also provide advice that should help guide the user in selecting the appropriate instrumentation:

A plan should be developed that addresses the following questions: which patients will be included in the study? What outcomes will be most
effected by the treatment? When will the outcomes be measured? Who is going to read (and use) the information provided by the outcomes study? The more specific the answer to these questions, the better the choice of outcome instrument. (p. 4)

This and other recommendations would appear equally applicable when selecting instruments for other therapeutic assessment purposes.

One final set of criteria should be considered in the light of the following section on screening. Screening for the likelihood of the presence of disorders or for the need for additional assessment requires considerations that do not necessarily apply to instruments when they are used for the other therapeutic assessment purposes addressed in this chapter. A major one here is a specific consideration relative to a screener's criterion validity. Although broadly encompassed by the construct of "validity" that was previously discussed, it demands particular attention when evaluating instruments for screening purposes. What is being referred to here is the instrument's classification accuracy or efficiency.

Classification efficiency is usually expressed in terms of the following statistics: sensitivity, that is, the proportion of those individuals with the characteristic of interest who are accurately identified as such; specificity, that is, the proportion of individuals not having the characteristic of interest who are accurately identified as such; positive predictive power, which is the proportion of a population identified by the instrument as having the characteristic who actually do have the characteristic; and negative predictive power, which is the proportion of a population identified by the instrument as not having the characteristic who actually do not have the characteristic. This information can provide the clinical psychologist and other evaluators with empirically based information that is useful in the type of decision-making requiring the selection of one of two choices. The questions answered are typically those of the "yes/no" type, such as "Is the patient depressed or not?" or "Does the patient have a psychological problem significant enough to require treatment?" The reader is referred to Baldessarini, Finkelstein, and Arana (1983) for a discussion of issues related to the use of these statistics.

In evaluating these statistics, one must consider a few very important issues. One is the degree to which the clinician is willing to accept false-positives or false-negatives. This will be a function of the importance of maximizing the correct identification of those with the particular characteristic of interest vs. the importance of maximizing the correct identification of those not having the characteristic vs. the importance of optimizing the identification of both groups. This in turn will be dependent on the cutoff score recommended by the developer of the instrument and/or the efficiency values that are available when other cutoff scores are applied. These and related issues are discussed more extensively in the next section.

4.18.5 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR SCREENING

One of the most significant ways in which psychological assessment can contribute to the development of an economic and efficient behavioral health care delivery system is by using it to screen potential patients for need for behavioral health care services, and/or to determine the likelihood that the problem being screened is a particular disorder of interest. Probably the most concise, informative treatment of the topic of the use of psychological tests in screening for behavioral health care disorders is provided by Derogatis and DellaPietra (1994). In this work, these authors turn to the Commission on Chronic Illness (1987) to provide a good working definition of health care screening in general, that being:

the presumptive identification of unrecognized disease or defect by the application of tests, examinations or other procedures which can be applied rapidly to sort out apparently well persons who probably have a disease from those who probably do not. (Commission on Chronic Illness, 1987, p. 45)

Derogatis and DellaPietra (1994) further clarify the nature and the use of screening procedures, stating that:

the screening process represents a relatively unrefined sieve that is designed to segregate the cohort under assessment into "positives" who presumptively have the condition, and "negatives" who are ostensibly free of the disorder. Screening is not a diagnostic procedure per se. Rather, it represents a preliminary filtering operation that identifies those individuals with the highest probability of having the disorder in question for subsequent specific diagnostic evaluation. Individuals found negative by the screening process are not evaluated further. (p. 23)

The most important aspect of any screening procedure is the efficiency with which it can provide information useful to clinical decision-making. In the area of clinical psychology, the most efficient and thoroughly investigated screening procedures involve the use of psychological assessment instruments. As implied by the foregoing, the power or utility of a psychological screener lies in its ability to determine, with a high level of probability, whether the respondent does or does not have a particular disorder or condition, or whether he or she is or is not a member of a group with clearly defined characteristics. In daily clinical practice, the most commonly used screeners are those designed specifically to identify some aspect of psychological functioning or disturbance or provide a broad overview of the respondent's point-in-time mental status. Examples of problem-specific screeners include the Beck Depression Inventory (BDI; Beck, Rush, Shaw, & Emery, 1979) and the State–Trait Anxiety Inventory (STAI; Spielberger, 1983). Examples of screeners for more generalized psychopathology or distress include the SA-45 and BSI.

4.18.5.1 Research-based Use of Psychological Screeners

The establishment of a system for screening for a particular disorder or condition involves determining what it is one wants to screen in or screen out, at what level of probability one feels comfortable in making that decision, and how many incorrect classifications or what percentage of errors one is willing to tolerate. Once it is decided what one wishes to screen for, one then must turn to the instrument's classification efficiency statistics, that is, sensitivity, specificity, positive predictive power (PPP), and negative predictive power (NPP), for the information necessary to determine whether a given instrument is suitable for the intended purpose(s).

Recall that sensitivity refers to the proportion of those with the characteristic of interest who are accurately identified as such by an instrument or procedure, while specificity refers to the proportion of those not having the characteristic of interest who are accurately identified. The cutoff score, index value, or other criterion used for classification can be adjusted to maximize either sensitivity or specificity. However, maximization of one will necessarily result in a decrease in the other, thus increasing the percentage of false-positives (with maximized sensitivity) or false-negatives (with maximized specificity). Stated differently, false-positives will increase as specificity decreases, while false-negatives will increase as sensitivity decreases (Elwood, 1993).

Another approach is to optimize both sensitivity and specificity, thus yielding a fairly even balance of true positives and true negatives. Although optimization might seem to be the preferable approach in all instances, there are situations in which a maximization approach is more desirable. For example, a psychiatric hospital with an inordinately high rate of inpatient suicide attempts begins to employ a screener designed to help identify patients with suicide potential as part of its admission procedures. The hospital adjusts the classification cutoff score to a level that identifies all suicidal patients in the screener's normative group. This cutoff score is then applied to all patients being admitted to the hospital for the purpose of identifying those requiring an extensive evaluation for suicide potential. This not only increases the number of true positives, but it also decreases the specificity and increases the number of false positives. However, the trade-off of identifying more suicidal patients early on with having more nonsuicidal patients receiving suicide evaluations would appear worthwhile for the hospital's purposes. Similarly, in other instances, maximization of specificity may be the preferred approach. For example, an MCO might wish to use a measure of overall level of psychological distress to identify those covered lives that are not in need of behavioral health care services. Sensitivity will decrease but, for the MCO's purposes, this might be quite acceptable.

Hsiao, Bartko, and Potter (1989) note that "a diagnostic test will not have a unique sensitivity and specificity. Instead, for each diagnostic test, the relationship between sensitivity and specificity depends on the cutoff point chosen for the test" (p. 665). The effect of employing individual classification cutoff points can be presented via the use of receiver operating characteristic (ROC) curves. These curves are nothing more than a plotting of the resulting true positive rate (sensitivity) against the false positive rate for each cutoff score that might be employed with a test used for classification purposes. The plotting allows for a graphical representation of what may be gained and/or lost by shifting cutoff scores. The resulting area underneath the curve provides an indication of how well the test performs. Development of ROC curves from available data for a test being considered for screening purposes is recommended. The reader is referred to Hsiao et al. and Metz (1978) for a more detailed discussion of ROC curves and their use.

In day-to-day clinical work, an instrument's PPP and NPP can provide information that is more useful than sensitivity and specificity. As Elwood (1993) has pointed out,

Although sensitivity and specificity do provide important information about the overall performance of a test, their limitation in classifying
individual subjects becomes evident when they are considered in terms of conditional probabilities. Sensitivity is P (+/d), the probability (P) of a positive test result (+) given that the subject has the target disorder (d). However, the task of the clinicians in assessing individual patients is just the opposite: determining P (d/+), the probability that a patient has the disorder given that he or she obtained an abnormal test score. In the same way, specificity expresses P (-/-d), the probability that a patient will have a negative test result given that he or she does not have the disorder. Here again, the task confronting the clinician is usually just the opposite: determining P (-d/-), the probability that the patient does not have the disorder given a negative test result. (p. 410)

the effect of eliminating from consideration individuals with low likelihood of having the disorder, and simultaneously raising the base rate of the condition in the remaining sample. (p. 45)

In summary, PPP and NPP can provide information that is quite valuable to those making important clinical decisions, such as determining need for behavioral health care services, assigning diagnoses, or determining appropriate level of care. However, these users must be cognizant of the manner in which the predictive powers may change with the population to which the test or procedure is applied.
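
The relationship between the conditional probabilities described above, and the dependence of the predictive powers on the population to which a test is applied, can be illustrated with a short calculation. What follows is only a minimal sketch in Python and is not drawn from the sources cited: the function name `predictive_powers` and the sensitivity, specificity, and base-rate values are hypothetical and chosen purely for illustration. It applies Bayes' theorem to convert P (+/d) and P (-/-d) into P (d/+) (PPP) and P (-d/-) (NPP).

```python
def predictive_powers(sensitivity, specificity, base_rate):
    """Return (PPP, NPP) for a test applied at a given base rate.

    sensitivity: P(+ | d), proportion of true cases testing positive
    specificity: P(- | not d), proportion of non-cases testing negative
    base_rate:   P(d), prevalence of the condition in the setting
    All three are proportions; the base rate must lie strictly between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")

    # Expected proportions of the screened population in each outcome cell.
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive power undefined: no positive or no negative results")

    ppp = true_pos / (true_pos + false_pos)   # P(d | +)
    npp = true_neg / (true_neg + false_neg)   # P(not d | -)
    return ppp, npp


# Hypothetical screener (sensitivity = specificity = 0.80) applied in settings
# with different base rates; these figures are assumptions for illustration.
for rate in (0.50, 0.10, 0.02):
    ppp, npp = predictive_powers(0.80, 0.80, rate)
    print(f"base rate {rate:.2f}: PPP = {ppp:.3f}, NPP = {npp:.3f}")
```

With these assumed values, PPP is 0.80 when the base rate is 50% but falls to roughly 0.31 when the base rate is 10%, while NPP rises from 0.80 to roughly 0.97. Sensitivity and specificity are held constant throughout; only the population changes.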
A note of caution is warranted when evaluating the two predictive powers of a test. Unlike sensitivity and specificity, both PPP and NPP are affected and change according to the prevalence or base rate at which the condition or characteristic of interest (i.e., that which is being screened by the test) occurs within a given setting. As Elwood (1993) reports, the lowering of base rates results in lower PPPs while increasing base rates result in higher PPPs. The opposite trend is true for NPPs. He notes that this is an important consideration because clinical tests are frequently validated using samples in which the prevalence rate is 0.50, or 50%. Thus, it is not surprising to see a test's PPP drop in "real-life" applications where the prevalence is lower.

Derogatis and DellaPietra (1994) indicate that a procedure referred to as "sequential screening" may provide at least a partial solution to the limitations or other problems that low base rates may pose for the predictive powers of an instrument. Sequential screening essentially involves the administration of two screeners, each of which measures the condition of interest, and two-phase screening. In the first phase, one screener is administered to the low base rate population. The purpose of this is to identify those individuals without the condition, thus requiring relatively good specificity. These individuals are eliminated from involvement in the second phase, resulting in an increase in the prevalence of the condition among those who remain. This group is then administered another screener of equal or better sensitivity. With the increased prevalence of the condition in the remaining group, the false positive rate will be much lower. As Derogatis and DellaPietra point out,

Sequential screening essentially zeros in on a high-risk subgroup of the population of interest by virtue of a series of consecutive sieves. These have

4.18.5.2 Implementation of Screeners into the Daily Work Flow of Service Delivery

The utility of a screening instrument is only as good as the degree to which it can be integrated into an organization's daily regimen of service delivery. This, in turn, depends on a number of factors. The first is the degree to which the administration and scoring of the screener is quick and easy, and the amount of time required to train the provider's staff to successfully incorporate the screener into their day-to-day activities.

The second factor relates to its use. Here, the screener is not used for anything other than determining the likelihood that the patient does or does not have the specific condition or characteristic the instrument is designed to assess. Use for any other purpose (e.g., assigning a diagnosis based solely on screener results, determining the likelihood of the presence of other characteristics) only serves to undermine the integrity of the instrument in the eyes of staff, payers, and other parties with a vested interest in the screening process.

The third factor has to do with the ability of the provider to act on the information. It must be clear how the clinician should proceed based on the information available.

The final factor is staff acceptance and commitment to the screening process. This comes only with a clear understanding of the importance of the screening, the usefulness of the obtained information, and how the screening process is to be incorporated into the organization's business flow.

Ficken (1995) provides an example of how screeners can be integrated into an assessment system designed to assist primary care physicians to identify patients with psychiatric disorders. This system (which also allows for the incorporation of practice guidelines) seems to take into account the first three utility-related factors listed above. It begins with the administration of a screener that is highly
sensitive and specific to DSM- or ICD-related disorders. Ficken indicates that screeners should require no more than 10 minutes to complete, and that "their administration must be integrated seamlessly into the standard clinical routine" (p. 13). Somewhat similarly to the sequence described by Derogatis and DellaPietra (1994), positive findings would lead to a second level of testing. Here, another screener that meets the same requirements as those for the first screener and also affirms or rules out a diagnosis would be administered. Positive findings would lead to additional assessment for treatment planning purposes. Consistent with standard practice, Ficken recommends confirmation of screener findings by a qualified psychologist or physician.

4.18.6 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR TREATMENT PLANNING

The administration of screeners is only one way in which psychological assessment can serve as a valuable tool for treatment planning. However, many would argue that it is the most limited way in which this tool can be used for planning a course of treatment. When employed by a trained clinician, psychological assessment can provide information that can greatly facilitate and enhance the planning of the therapeutic intervention for the individual patient.

The importance of treatment planning has received significant attention during recent years. The reasons for this were summarized previously by this author (Maruish, 1990) as follows:

Among important and interrelated reasons . . . [are] concerted efforts to make psychotherapy more efficient and cost effective, the growing influence of "third parties" (insurance companies and the federal government) that are called upon to foot the bill for psychological as well as medical treatments, and society's disenchantment with open-ended forms of psychotherapy without clearly defined goals. (p. iii)

The role that psychological assessment can play in planning a course of treatment for behavioral health care problems is significant. Butcher (1990) indicated that information available from instruments such as the MMPI-2 can not only assist in identifying problems (see above) and establishing communication with the patient (see below), it can also help ensure that the plan for treatment is consistent with the patient's personality and external resources. In addition, psychological assessment may reveal potential obstacles to therapy, areas of potential growth, and problems of which the patient may not be consciously aware. Moreover, both Butcher and Appelbaum (1990) viewed testing as a means of quickly obtaining a second opinion. Other benefits of the results of psychological assessment, identified by Appelbaum, include assistance in identifying patient strengths and weaknesses, identification of the complexity of the patient's personality, and establishment of a reference point or guide to refer to during the therapeutic episode.

The types of information that can be derived from patient assessment and the manner in which it is applied for this purpose are quite varied, a fact that will become evident below. Nevertheless, Strupp (see Butcher, 1990) probably provided the best summary of the potential contribution of psychological assessment to treatment planning, stating that "careful assessment of patient's personality resources and liabilities is of inestimable importance. It will predictably save money and avoid misplaced therapeutic effort; it can also enhance the likelihood of favorable treatment outcomes for suitable patients" (pp. v–vi).

4.18.6.1 Assumptions About Treatment Planning

The introduction to this section presented a broad overview of ways in which psychological assessment can assist in devising and successfully implementing plans of treatment for behavioral health care patients. These and other benefits will be discussed in greater detail below. However, it is important to first clarify what treatment planning is and some of the general, implicit assumptions that one typically can make about this important therapeutic activity.

For the purpose of this discussion, the term "treatment planning" indicates that part of a therapeutic episode in which the treatment provider develops a set of goals for an individual presenting with behavioral health care problems, and outlines the specific means by which he/she or other resources will assist the patient in achieving those goals in the most efficient manner. General assumptions underlying the treatment planning process are as follows.

(i) The patient is experiencing behavioral health problems that have been identified either by the patient or by another party. Common external sources of problem identification include the patient's spouse, parent, teacher, employer, and the legal system.
(ii) The patient experiences some degree of internal and/or external motivation to eliminate
or reduce the identified problems. An example of external motivation to change is the potential loss of job or marriage if problems are not resolved.
(iii) The goals of treatment are tied either directly or indirectly to the identified problems.
(iv) The goals of treatment have definable criteria for achievement, are indeed achievable by the patient, and are developed in collaboration with the patient.
(v) The prioritization of goals is reflected in the treatment plan.
(vi) The patient's progress toward achievement of the treatment goals can be tracked and compared against an expected path of improvement in either a formal or informal manner. This expected path of improvement may be based on the clinician's experience or (ideally) on objective data gathered on patients similar to the patient.
(vii) Deviations from the expected path of improvement will lead to a modification in the treatment plan, followed by subsequent monitoring to determine the effectiveness of the alteration.

These assumptions should not be considered exhaustive, nor are they reflective of what actually occurs in all situations. For example, some patients seen for therapeutic services may have no motivation to change. As may be seen in juvenile detention settings or in cases where children are brought to treatment by the parents, their participation in treatment is forced, and they may engage in intentional efforts to sabotage any therapeutic intervention. Also, it is likely that there are still clinicians who identify and prioritize treatment goals without the direct input of the patient. Nevertheless, the assumptions above represent this author's view of the aspects of treatment planning that have a direct bearing on the manner in which psychological assessment can best serve treatment planning efforts.

4.18.6.2 The Benefits of Psychological Assessment for Treatment Planning

As has already been touched upon, there are several ways in which psychological assessment can assist in the planning of treatment for behavioral health care patients. Following is a discussion of the more common and evident contributions that assessment can make to treatment planning efforts. These can be organized into four general categories: problem identification, problem clarification, identification of important patient characteristics, and monitoring of treatment progress.

4.18.6.2.1 Problem identification

Probably the most common use of psychological assessment in the service of treatment planning is for the purpose of problem identification. Often, the use of psychological testing per se is not needed to identify what problems the patient is experiencing. He or she will either tell the clinician directly without questioning, or they will readily admit to their problem(s) during the course of a clinical interview. However, this is not always the case.

The value of psychological testing becomes apparent in those cases where the patient is hesitant or unable to identify the nature of his or her problems. However, with a motivated and engaged patient who responds to items on a well validated and reliable test in an open and honest manner, the process of identifying what brought the patient to treatment also may be greatly facilitated. Cooperation shown during testing may be attributable to the nonthreatening nature of responding to questions presented on paper or a computer monitor (as opposed to those posed by another human being); the subtle, indirect, or otherwise nonthreatening nature of the questions (compared to those asked by the clinician); instrumentation that "casts a wider net" than the clinician in his or her interview with the patient; or any combination of these reasons.

In addition, the nature of some of the more commonly used psychological test instruments allows for the identification of secondary problems of significant severity that might otherwise be overlooked. Multidimensional inventories such as the MMPI-2 and the PAI are good examples of these types of instruments. Moreover, these instruments may be sensitive to other problems or patient traits or characteristics that may not necessarily be problems but which may exacerbate or otherwise contribute to the maintenance of the patient's problems.

Note that the type of problem identification described here is different from that conducted during screening (see above). Whereas screening is focused on determining the presence or absence of a single problem, problem identification generally takes a broader view and investigates the possibility of the presence of multiple problem areas. At the same time, there is also an attempt to determine the extent to which the problem area(s) affect the patient's ability to function.

4.18.6.2.2 Problem clarification

Psychological testing can often assist in the clarification of a known problem. Through tests designed for use with individuals presenting
problems similar to the patient's, aspects of serve as potential allies in the therapeutic
identified problems can be elucidated. This will process. In general, the most important role-
improve the patient's and clinician's under- functioning domains for assessment would be
standing of the problem and likely lead to a those related to work or school performance,
better treatment plan. The three most important interpersonal relationships, and activities of
types of information that can be gleaned for this daily living (ADLs).
purpose are the severity of the problems, the
complexity of the problems, and the degree to
4.18.6.2.3 Identification of important patient
which the problems impair the patient's ability
characteristics
to function in one or more life roles.
The manner in which a patient is treated The identification and clarification of the
depends a great deal on the severity of his or her patient's problems is of key importance in
problem. In particular, severity has a great planning a course of treatment for the patient.
bearing on the setting in which the behavioral However, there are numerous types of
health care intervention is provided. Those nonproblem-oriented patient information that
whose problems are so severe that they are can be useful in planning treatment and can be
considered a danger to themselves or others rather easily identified through the use of
more often than not are best suited for inpatient psychological assessment instruments. The vast
treatment, at least until dangerousness is no majority of treatment plans are developed or
longer an issue. Similarly, problem severity may modified with consideration of at least some of
be a primary criterion for an evaluation for a these other patient characteristics. The excep-
medication adjunct to treatment. Severity also tions mostly are found with clinicians or
may have a bearing on the type of psychother- programs that take a "one size fits all" approach
apeutic approach that is taken by the clinician. to the treatment of general or specific types of
For example, it may be more productive for the disorders. It is beyond the scope of this chapter
clinician to take a supportive role with severe to provide an exhaustive list of what other types
cases; all things being equal, a more confronta- of information may be available to the
tional approach may be more appropriate with clinician. However, a few are particularly worth
patients with problems in the mild to moderate mentioning.
range of severity. Probably the most useful type of nonproblem-
As alluded to above, the problems of patients oriented information that can be gleaned from
seeking behavioral health care services are psychological assessment results is the identifi-
frequently multidimensional. Patient and en- cation of the patient characteristics or condi-
vironmental factors that play into the formation tions that can serve as assets or areas of strength
and maintenance of a problem, along with the for the patient in working toward achieving the
latter's relationship with other problems, all therapeutic goals. For example, Morey and
contribute to its complexity. Knowing the Henry (1994) point to the utility of the PAI's
complexity of the target problems is invaluable Nonsupport scale in identifying whether the
in devising an effective treatment plan. Again, patient perceives an adequate social support
multidimensional instruments or batteries of network, this being a predictor of positive
tests measuring specific aspects of psychological therapeutic progress. Other examples include
dysfunction serve this purpose well. ªnormalº personality characteristic informa-
As with problem severity, knowledge of the tion, such as that which can be obtained from
complexity of a patient's psychological pro- Gough, McClosky, and Meehl's Dominance
blems can help the clinician and patient in many and Social Responsibility scales (1951, 1952)
aspects of treatment planning, including deter- developed for use with the MMPI/MMPI-2.
mination of appropriate setting, therapeutic Greene (1991) indicates that those with high
approach, need for medication, and other scores on the Dominance scale are described as
matters on which important decisions must be ªbeing able to take charge of responsibility for
made. However, possibly of equal importance their lives. They are poised, self-assured, and
and concern to the patient and outside parties confident of their own abilitiesº (p. 209). Gough
(spouse, employer, school, etc.) is the extent to and his colleagues interpreted high scores on the
which these problems affect the patient's ability Social Responsibility scale as being indicative of
to function in his or her role as parent, child, individuals who, among other things, trust the
employee, student, friend, and so on. Data world, are self-assured and poised, and stress the
gathered from the administration of measures need for one to carry his or her share of duties.
of role functioning can provide information that Thus, scores on these scales may reveal some
not only clarifies the impact of the patient's important aspects of patient functioning that
problems and serves to establish role-specific can be used in the service of affecting therapeutic
goals, but also identifies other parties that may change.

Similarly, knowledge of the patient's weaknesses or deficits may impact the type of treatment plan that is devised. Greene and Clopton (1994) provided numerous types of deficit-relevant information from the MMPI-2 Content Scales that have implications for treatment planning. For example, a clinically significant score (T > 64) on the Anger scale should lead one to consider the inclusion of training in assertiveness and/or anger control as part of the patient's treatment. On the other hand, uneasiness in social situations, as suggested by a significantly elevated score on either the Low Self-Esteem or Social Discomfort scale, suggests that a supportive approach to the intervention would be beneficial, at least initially.

Moreover, use of specially designed scales and procedures can provide information related to the patient's ability to become engaged in the therapeutic process. For example, the MMPI-2 Negative Treatment Indicators content scale developed by Butcher and his colleagues (Butcher, Graham, Williams, & Ben-Porath, 1989) may be useful in determining whether the patient is likely to be resistant to any form of “talk” therapy. Morey and Henry (1994) have supplied algorithms utilizing T scores for various PAI scales to make statements about the presence of positive characteristics, such as the presence of sufficient distress to motivate engagement in treatment, the ability to form a therapeutic alliance, and the capacity to utilize psychotherapy. The Therapeutic Reactance Scale (Dowd, Milne, & Wise, 1991) is yet another example of an instrument from which the clinician can be forewarned of potential resistance to therapeutic intervention.

Other types of patient characteristics that can be identified through psychological assessment have implications for the choice of the therapeutic approach and thus can contribute significantly to the treatment planning process. Beutler and his colleagues (Beutler & Clarkin, 1990; Beutler, Wakefield, & Williams, 1994; Beutler & Williams, 1995) have identified four patient characteristics that are thought to be important to matching patients and treatment approach for maximized therapeutic effectiveness. These include symptom severity, symptom complexity, coping style, and potential resistance to treatment. At different points in time, other patient variables also have been identified by these investigators as important considerations in the selection of the best treatment for a given patient. These include the problem-solving phase the patient has reached (Beutler & Clarkin, 1990), and subjective distress and social support (L.E. Beutler, personal communication, January 15, 1996).

Moreland (1996) points out how psychological assessment can assist in determining whether the patient deals with problems through internalizing or externalizing behaviors. All things being equal, internalizers would probably profit most from an insight-oriented approach rather than a behaviorally oriented approach. The reverse would be true for externalizers. In addition, cognitive factors also are important. Knowing that intelligence test results indicate an average or above IQ can assist the clinician in determining whether a patient will be able to benefit from a cognitive approach.

4.18.6.2.4 Monitoring of progress along the path of expected improvement

Information from repeated testing during the treatment process can help the clinician to determine if the treatment plan is appropriate for the patient at that particular point in time. Thus, many clinicians use psychological assessment to determine whether their patients are showing the expected improvement as treatment progresses. If not, adjustments can be made. These adjustments may reflect the need for a more intensive or aggressive treatment approach (e.g., increased number of psychotherapeutic sessions each week, addition of a medication adjunct) or for a less intensive approach (e.g., reduce or terminate medication, transfer from inpatient to outpatient care). Either way, this may require further retesting in order to determine whether the treatment revisions have impacted the course of change in the expected direction. This process may be repeated any number of times. In-treatment retestings also can provide information relevant to the decision of when to terminate treatment.

The goal of monitoring is to determine whether treatment is “on track” with the progress that is expected at a given point in time. When and how often one might assess the patient is dependent on a few factors. The first is the instrumentation. Many instruments are designed to assess the patient's status at the time of testing. Items on these measures are generally worded in the present tense (e.g., “I feel tense and nervous,” “I feel that my family loves and cares about me”). Changes from one day to the next on the constructs measured by the instrument should be reflected in the test results.

Other instruments, however, ask the patient to indicate if a variable of interest has been present, or how much or to what extent it has occurred during a specific time period in the past. The items usually are asked in the context of something like “During the past month, how often have you . . .” or “During the past week, to
what extent has . . .” Readministration of a measure containing interval-of-time-specific items or subsets of items should be undertaken only after a period of time equivalent to or longer than the time interval to be considered in responding to the items has passed. For example, an instrument which asks the patient to consider how much certain symptoms have been problematic during the past seven days should not be readministered for at least seven days. The responses elicited during a readministration that occurs less than seven days after the first administration would include the patient's consideration of his or her status during the previously considered time period. This may make interpretation of the change of symptom status (if any) from the first to the second assessment difficult if not impossible.

Methods to determine whether clinically significant change has occurred from one point in time to another have been developed and can be used for treatment monitoring purposes. These are discussed in the outcomes assessment section of this chapter below. However, for monitoring purposes, another approach to evaluating therapeutic change may be superior. This approach may be referred to as the “glide path” approach, with the term referring to the narrow descent course or path that airplanes must follow when landing. Deviation from the flight glide path requires corrections in the plane's speed, altitude, and/or attitude in order to return to the glide path and a safe landing.

R.L. Kane (personal communication, July 22, 1996) has indicated that just as a pilot has the instrumentation to alert him or her about the plane's position on the glide path, the clinician may use psychological assessment instruments to track how well the patient is following the glide path of treatment. The glide path in this case represents expected improvement over time in one or more measurable areas of functioning (e.g., symptom severity, social role functioning, occupational performance). The expectations would be based on objective data obtained from similar patients at various points during their treatment and would allow for minor deviations from the path. The end of the glide path is one or more specific goals that are part of the treatment plan. Thus, “arrival” at the end of the glide path signifies the attainment of specific treatment goals.

4.18.7 PSYCHOLOGICAL ASSESSMENT AS A THERAPEUTIC INTERVENTION

The use of psychological assessment as a means of therapeutic intervention in and of itself has received more than passing attention during the past few years. “Therapeutic assessment” with the MMPI-2 has received particular attention primarily through the work of Finn and his associates (Finn, 1996a, 1996b; Finn & Martin, in press; Finn & Tonsager, 1992). Finn's approach appears to be applicable with instruments or batteries of instruments that provide multidimensional information relevant to the concerns of patients seeking answers to questions related to their mental health status. The approach espoused by Finn thus will be presented here as a model for deriving direct therapeutic benefits from the psychological assessment experience.

4.18.7.1 What Is Therapeutic Assessment?

In discussing the use of the MMPI-2 as a therapeutic intervention, Finn (1996a) describes an assessment procedure whose goal is to “gather accurate information about clients . . . and then use this information to help clients understand themselves and make positive changes in their lives” (p. 3). Elaborating on this procedure and extending it to the use of any test, Finn and Martin (in press) describe therapeutic assessment as

collaborative, interpersonal, focused, time limited, and flexible. It is . . . very interactive and requires the greatest of clinical skills in a challenging role for the clinician. It is unsurpassed in a respectfulness for clients: collaborating with them to address their concerns (around which the work revolves), acknowledging them as experts on themselves and recognizing their contributions as essential, and providing to them usable answers to their questions in a therapeutic manner.

The ultimate goal of therapeutic assessment is to provide an experience for the client that will allow him/her to take steps toward greater psychological health and a more fulfilling life. This is done by recognizing the client's characteristic ways of being, understanding in a meaningful, idiographic way the problems the client faces, providing a safe environment for the client to explore change, and providing the opportunity for the client to experience new ways of being in a supportive environment.

Simply stated, therapeutic assessment may be considered an approach to the assessment of mental health patients in which the patient is not only the primary provider of information needed to answer questions, but also is actively involved in formulating the questions that are to be answered by the assessment. Feedback regarding the results of the assessment is provided to the patient and is considered a
primary, and possibly the principal element of the assessment process. Thus, the patient becomes a partner in the assessment process; as a result, therapeutic and other benefits accrue.

The reader should note that in this section, the term “therapeutic assessment” is used to denote the specific approach advocated by Finn and his colleagues for using the psychological assessment process as an opportunity for therapeutic intervention. It should not be confused with the more general term “therapeutic psychological assessment” as it has been employed throughout this chapter; “therapeutic assessment” is but one aspect of therapeutic psychological assessment.

4.18.7.2 The Impetus for Therapeutic Assessment

To say that clinical psychologists performing psychological assessments in mental health settings traditionally have never shared much of their findings with their patients is probably not an overstatement. A common scenario throughout many mental health settings was that a patient being treated by a psychologist was evaluated by the latter, or a patient was referred to the psychologist by another mental health professional for assessment only. In the first instance, the degree to which the psychologist might directly share the results of the often lengthy and expensive evaluation would vary. Generally, a detailed review of the findings would be a rarity. In the latter instance, the patient would be evaluated, a report of the results dictated, and a copy of the report sent back to the referring clinician. In either instance, the purpose of the assessment probably would be to answer questions posed by the treating clinician. Unfortunately, the patient and his or her concerns as they related to the psychological assessment were typically of only secondary consideration, if any.

Fortunately, recent occurrences have begun to change the way in which assessment information is used. Consequently, the degree to which the patient is involved in the assessment process is changing. One reason for this is the relatively recent revision of the ethical standards of the APA (1992). This revision included a mandate for psychologists to provide feedback to clients whom they test. According to ethical standard 2.09:

Unless the nature of the relationship is clearly explained to the person being assessed in advance and precludes provision of an explanation of results (such as in some organizational consulting, preemployment or security screenings, and forensic evaluations), psychologists ensure that an explanation of the results is provided using language that is reasonably understandable to the person assessed or to another legally authorized person on behalf of the client. Regardless of whether the scoring and interpretation are done by the psychologist, by assistants, or by automated or other outside services, psychologists take reasonable steps to ensure that appropriate explanations of results are given. (p. 8)

Many clinicians and other psychologists involved in assessment activities (e.g., counseling psychologists, neuropsychologists) have had to modify their practice routine to accommodate this requirement. Some view this requirement as resulting in an improvement in the quality of their services; others likely see it as nothing more than an inconvenience which, in the era of managed care and limited access to treatment, further limits the amount of time they have to work with a patient. However, most would agree that the patient has benefited from the required feedback.

Finn and Tonsager (1992) identified other factors that may have contributed to the recent interest in providing patients with assessment feedback. One is another external influence, that is, the recognition of the patient's right to see their medical and psychiatric health care records. However, they also point to several clinically and research-based findings and impressions that suggest that therapeutic assessment enhances patient care through the facilitation of patient–therapist rapport, cooperation during the assessment process, positive feelings about the process and the clinician, improvement in mental health status, and feelings of being understood by another. In addition, Finn and Tonsager refer to Finn and Butcher's (1991) summary of potential benefits that may accrue from providing test results feedback. The listed benefits, based on clinical experience, include increased feelings of self-esteem and hope, reduced symptomatology and feelings of isolation, increased understanding and self-awareness, and increased motivation to seek or be more actively involved in mental health treatment. Finally, Finn and Martin (in press) note that the therapeutic assessment process can lead to increased feelings of mastery and control and decreased feelings of alienation. At the same time, it can serve as a model for relationships that can result in mutual respect and the patient being seen for who he or she is.

4.18.7.3 The Therapeutic Assessment Process

Finn (1996a) has outlined a three-step procedure for therapeutic assessment using the MMPI-2. As indicated above, it should
work equally well with other multidimensional instruments that one might select. Finn describes this procedure as one to be used in those situations in which the patient is seen only for assessment (i.e., the patient is not to be treated later by the assessing clinician). From the present author's standpoint, the procedures are equally applicable for use by clinicians who test patients whom they later treat. With these points in mind, the three-step procedure is summarized below.

4.18.7.3.1 Step 1: The initial interview

According to Finn (1996a), the initial interview with the patient serves multiple purposes. It provides an opportunity to build rapport, or to increase rapport if a patient–therapist relationship already exists. The assessment task is presented as a collaborative one, and the patient is given the opportunity to identify questions that he or she would like answered using the assessment data. Background information related to the patient-identified questions is subsequently gathered. Any reservations about participating in the therapeutic assessment process (e.g., confidentiality, previous negative experiences with assessment) are dealt with in order to facilitate maximal involvement in the process.

After responding to the patient's concerns, Finn (1996a) recommends that the clinician restate the questions posed earlier by the patient. This ensures the accuracy of what the patient would like to have addressed by the assessment. The patient also is encouraged to ask questions of the clinician, thus reinforcing the collaborative context or atmosphere that the clinician is trying to establish. Step 1 is completed as the instrumentation and its administration, as well as the responsibilities and expectations of each party, are clearly defined and the particulars of the process (e.g., date and time of assessment, date and time of the feedback session, clinician fees) are discussed and agreed upon.

4.18.7.3.2 Step 2: Preparing for the feedback session

Upon completion of the administration and scoring of the instrumentation used during the assessment, the clinician first outlines all results obtained from the assessment, including those not directly related to the patient's previously stated questions. Finn (1996a) presents a well-organized outline for the types of information that the trained user can extract from MMPI-2 data. These include response consistency, test-taking attitude, distress and disturbance, major symptoms, underlying personality, behavior in relationships, implications for treatment, diagnostic impression, and recommendations. Unfortunately, clinicians who do not or cannot use the MMPI-2 or other well-researched, multidimensional instruments will not have the same amount or type of data available to them. (This should not preclude them from identifying the types of valid and useful information that can be derived from the instruments and organizing it into a usable form for presentation to the patient.) This is followed by a determination of how to present the results to the patient. This can be guided by the clinician asking himself or herself the following questions:

(i) How do the (test) findings relate to the client's goals?
(ii) What are the most important findings of the (tests administered)?
(iii) To what extent is the client likely to already know about and agree with the (test) findings?
(iv) How much new information is the client likely to be able to integrate in the feedback session?
(v) What is likely to happen if the client becomes overwhelmed or is presented with findings that are greatly discrepant from his/her current self-concept? (p. 34)

As a final point in this step, Finn (1996a) indicates that the clinician must determine what is the best way to present the information to the patient so that he or she can accept and integrate the information while maintaining his or her sense of identity and self-esteem. This also is a time when the clinician can identify information that he or she may not wish to reveal to the patient because it is not important to answering the patient's questions; doing so may negatively affect the collaborative relationship. In addition, the clinician may want to prepare for presenting those aspects of feedback that he or she feels will be most problematic for him or her (i.e., the clinician) by role-playing with a colleague.

4.18.7.3.3 Step 3: The feedback session

As Finn (1996a) states: “The overriding goal of feedback sessions is to have a therapeutic interaction with clients” (p. 44). Thus, the initial tasks of the feedback session are focused on setting the stage for this type of encounter. This is accomplished by allaying any anxiety the patient may have about the session, reaffirming the collaborative relationship, and familiarizing him or her with the presentation of the test results (e.g., explaining the profile sheet upon which the results are graphed, discussing the normative group to which he or she will be compared, providing an explanation of standard scores).

When the session preparation is completed, the clinician begins providing feedback to the patient (Finn, 1996a). This, of course, is centered on answering the questions posed by the patient during Step 1. Beginning with a positive finding from the assessment, the clinician proceeds to first address those questions that the patient is most likely to accept. He or she then carefully moves to the findings that are more likely to be anxiety-arousing for the patient and/or challenge his or her self-concept. A key element to this step is to have the patient verify the accuracy of each finding and provide a real-life example of the interpretation that is offered. Alternatively, one should ask the patient to modify the interpretation to make it more in line with how he or she sees themselves and their situation. Finn (1996a) provides specific suggestions about how to deal with a rejection of a finding, the final suggestion being to allow the client to disagree with but not totally dismiss the finding. This leaves the door open for re-presenting the finding at another time when the patient is more open to accepting it.

Finn (1996a) recommends that the clinician should end the session by responding to any additional questions the patient may have; confirming that the patient has accurately understood the information that was presented; giving permission for the patient to contact the clinician should further questions arise; and (in the assessment-only arrangement) terminating the relationship. Throughout the session, the clinician maintains a supportive stance with regard to any affective reactions to the findings.

4.18.7.3.4 Additional steps

Finn and Martin (in press) indicate two additional steps that may be added to the therapeutic assessment process. The purpose of the first additional step, referred to as an “assessment intervention session,” essentially is to clarify initial test findings through the administration of additional instruments. For example, Finn and Martin explain how MMPI-2 findings can be further fleshed out through a nonstandard administration of an instrument such as the Thematic Apperception Test (TAT). Here, the clinician controls the patient's interpretation in order to draw out information relevant to the patient's questions. Also, solutions to problems elicited by the TAT cards are suggested to the patient.

The other additional step discussed by Finn and Martin (in press) is the provision of a written report of the findings to the patient. In addition to summarizing both the test results and the answers to the patient's questions, it also attempts to elicit feedback and reactions from the patient about the assessment.

The reader should note that the preceding summary presents only the key technical aspects of the therapeutic assessment procedures espoused by Finn and his associates. Much of the clinical/dynamic aspect of this approach has not been addressed because of the focus of this chapter. Those interested in incorporating the process into their clinical practice are encouraged to read Finn (1996a).

4.18.7.4 Empirical Support for Therapeutic Assessment

Noting the lack of direct empirical support for the therapeutic effects of sharing test results with patients, Finn and Tonsager (1992) investigated the benefits of providing feedback to university counseling center clients regarding their MMPI-2 results. A total of 32 subjects underwent therapeutic assessment and feedback procedures similar to those described above while on the counseling center's waiting list. Another 28 subjects were recruited from the same waiting list to serve as a control group. There were no significant differences between the two groups on any important demographic or examiner contact-interval variables.

Instead of receiving feedback, Finn and Tonsager's (1992) control group received nontherapeutic attention from the examiner. However, they were administered the same dependent measures as the feedback group at the same time as the experimental group received feedback. They also were administered the same dependent measures as the experimental group two weeks later (i.e., two weeks after the experimental group received the feedback) in order to determine if there were differences between the two groups on those dependent measures. These measures included a self-esteem questionnaire, a symptom checklist (i.e., the SCL-90-R), a measure of private and public self-consciousness, and a questionnaire assessing the subjects' subjective impressions of the feedback session. (Note that the control group was administered only that portion of the feedback assessment questionnaire that was relevant to them.)

The results of Finn and Tonsager's (1992) study indicated that compared to the control group, the feedback group demonstrated significantly less distress at the two-week post-feedback follow up, and significantly higher levels of self-esteem and hope at both the time of feedback and the two-week post-feedback follow up. In other findings, feelings about
the feedback sessions were positively and significantly correlated with changes in self-esteem from testing to feedback, both from feedback to follow up and from testing to follow up among those who were administered the MMPI-2. In addition, the change in level of distress from feedback to follow up correlated significantly with private self-consciousness (i.e., the tendency to focus on the internal aspects of oneself) but not with public self-consciousness.

4.18.8 PSYCHOLOGICAL ASSESSMENT AS A TOOL FOR OUTCOMES MANAGEMENT

The 1990s have witnessed a positively accelerating growth curve reflecting the level of interest in and development of behavioral health care outcomes programs. Cagney and Woods (1994) attribute this to four major factors. First, behavioral health care purchasers are asking for information regarding the value of the services they buy. Second, there is an increasing number of purchasers who are requiring a demonstration of patient improvement and satisfaction. Third, MCOs need data that demonstrate that their providers render efficient and effective services. And fourth, outcomes information will be needed for the “quality report cards” that MCOs anticipate they will be required to provide in the future. In short, fueled by soaring health care costs, there has been an increasing need for providers to demonstrate that what they do is effective. And all of this has occurred within the context of the continuous quality improvement (CQI) movement, in which there have been similar trends in the level of interest and growth.

As this author has noted previously, the interest in and necessity for outcomes measurement in the era of managed care and accountability provides a unique opportunity for clinical psychologists to use their training and skills in assessment (Maruish, 1994). However, the extent to which the clinical psychologist becomes a key and successful contributor to an organization's outcomes initiative (whatever that might be) will depend on his or her understanding of what “outcomes” and their measurement and applications are all about.

4.18.8.1 What Are Outcomes?

Before discussing outcomes, it is important to have a clear understanding of what is meant by the term. Experience has shown that the meaning varies according to whom one may speak.

Donabedian (1985) has identified three dimensions of quality of care. “Structure” refers to the organization providing the care. It includes aspects such as how the organization is “organized,” the physical facilities and equipment, and the number and professional qualifications of its staff. “Process” refers to the specific types of services that are provided to a given patient (or group of patients) during a specific episode of care. These might include various types of tests and assessments (e.g., psychological tests, lab tests, magnetic resonance imaging), therapeutic interventions (e.g., group psychotherapy, medication), and discharge planning activities. Treatment complications (e.g., drug reactions) are also included here. “Outcomes,” on the other hand, refers to the results of the specific treatment that was rendered.

The outcomes, or results, of treatment should not refer to change in only a single aspect of functioning. Treatment may impact various facets of a patient's life. Stewart and Ware (1992) have identified five broad aspects of general health status: physical health, mental health, social functioning, role functioning, and general health perception. Treatment may affect these aspects of health in different ways, depending on the disease or disorder being treated and the effectiveness of the treatment. Some specific aspects of functioning related to these five areas of general health status that are commonly measured include: feeling of well-being, psychological symptom status, use of alcohol and other drugs, functioning on the job or at school, marital/family relationships, utilization of health care services, and ability to cope.

In considering the various types of outcomes that might be assessed in behavioral health care settings, a substantial number of clinicians probably would identify symptomatic change in psychological status as being the most important. Nevertheless, however important change in symptom status may have been in the past, clinical psychologists and other behavioral health care providers have come to realize that changes in many of the other aspects of functioning identified by Stewart and Ware (1992) are equally important indicators of treatment effectiveness. As Sederer et al. (1996) have noted:

Outcome for patients, families, employers, and payers is not simply confined to symptomatic change. Equally important to those affected by the care rendered is the patient's capacity to function within a family, community, or work environment or to exist independently, without undue burden to the family and social welfare
system. Also important is the patient's ability to show improvement in any concurrent medical and psychiatric disorder . . . Finally, not only do patients seek symptomatic improvement, but they want to experience a subjective sense of health and well being. (p. 2)

A much broader perspective is offered in Faulkner and Gray's The 1995 behavioral outcomes and guidelines sourcebook (Migdail, Youngs, & Bengen-Seltzer, 1995):

Outcomes measures are being redefined from a vague "is the patient doing better?" to more specific questions, such as, "Does treatment work in ways that are measurably valuable to the patient in terms of daily functioning level and satisfaction, to the payor in terms of value for each dollar spent, to the managed care organization charged with administering the purchaser's dollars, and to the clinician charged with demonstrating value for hours spent?" (p. 1)

Thus, "outcomes" holds a different meaning for each of the different parties who have a stake in behavioral health care delivery. What is measured generally depends on the purposes for which outcomes assessment is undertaken. As will be shown, these vary greatly.

4.18.8.2 Outcomes Assessment: Measurement, Monitoring, and Management

Just as it is important to be clear about what is meant by outcomes, it is equally important to clarify the three general purposes for which outcomes assessment may be employed. The first is outcomes measurement. This involves nothing more than pre- and post-treatment assessment of one or more variables to determine the amount of change that has occurred (if any) in these variables as a result of therapeutic intervention.

A more useful approach is that of outcomes monitoring. This refers to "the use of periodic assessment of treatment outcomes to permit inferences about what has produced change" (Dorwart, 1996, p. 46). Like treatment progress monitoring used for treatment planning purposes, outcomes monitoring involves the tracking of changes in the status of one or more outcomes variables at multiple points in time. Assuming a baseline assessment at the beginning of treatment, reassessment may occur one or more times during the course of treatment (e.g., weekly, monthly), at the time of termination, and/or during one or more periods of post-termination follow up. Whereas treatment progress monitoring is used to determine how much the patient is on or off the expected course of improvement, outcomes monitoring focuses on revealing aspects about the therapeutic process that seem to affect change.

The third and most useful purpose of outcomes assessment is that of outcomes management. Dorwart (1996) defines outcomes management as "the use of monitoring information in the management of patients to improve both the clinical and administrative processes for delivering care" (pp. 46–47). In outcomes management, information is used to improve the quality of services offered to the patient population(s) served by the provider, not to any one patient. Information gained through the assessment of patients can provide the organization with indications of what works best with whom and under what set of circumstances, thus helping to improve the quality of services for all patients. In essence, outcomes management can serve as a tool for those organizations with an interest in implementing a CQI initiative.

4.18.8.3 The Benefits of Outcomes Assessment

The implementation of any type of outcomes assessment initiative within an organization does not come without effort from and cost to the organization. However, if it is implemented properly, all interested parties, that is, patients, clinicians, provider organizations, payers, and the health care industry as a whole, should find a substantial yield from the outlay of time and money. Cagney and Woods (1994) identify several benefits to patients, including enhanced health and quality of life, improved health care quality, and effective use of the dollars paid into benefits plans. For providers, the outcomes data can result in improved clinical skills, information related to the quality of care provided and local practice standards, increased profitability, and decreased concerns over possible litigation.

Outside of the clinical context, benefits also can accrue to payers and MCOs. Cagney and Woods (1994) see the potential payer benefits as including healthier workers, improved health care quality and worker productivity, and reduced or contained health care costs. As for MCOs, the benefits include increased profits, information that can help shape the practice patterns of their providers, and decisions that are based on quality of care.

4.18.8.4 The Therapeutic Use of Outcomes Assessment

The foregoing overview of outcomes assessment provides the background necessary for discussing the use of psychological outcomes assessment data in day-to-day clinical practice.
Whereas the focus of the above review was centered on both the individual patient and patient populations, it now will narrow to the use of outcomes assessment primarily in service to the individual patient. The reader interested in issues related to large, organization-wide outcomes studies conducted for outcomes management purposes (as defined above) is encouraged to seek other sources of information that specifically address that topic (see, for example, Migdail, Youngs, & Bengen-Seltzer, 1995; Newman, 1994).

There is no one system or approach to the assessment of treatment outcomes for an individual patient that is appropriate for all providers of behavioral health care services. Because of the specific type of outcomes one is interested in, the reasons for assessing them, and the manner in which they may impact the decisions made by the patient, payer and clinician, any successful and useful outcomes assessment approach must be customized. Customization should reflect the needs of the primary beneficiary of the information gained from the assessment (i.e., patient, payer, or provider), with consideration of the secondary stakeholders in the therapeutic endeavor. Ideally, the identified primary beneficiary would be the patient. Although this is not always the case, it would appear that only rarely would the patient not benefit, at least indirectly, from the gathering of outcomes data.

Following are considerations and recommendations for the development and implementation of an outcomes assessment initiative by behavioral health care providers. Although space limitations do not allow a comprehensive review of all issues and solutions, the information that follows can be useful to clinical psychologists and others with similar training wishing to begin to incorporate outcomes assessment into their standard therapeutic routine.

4.18.8.4.1 Purpose of the outcomes assessment

There are numerous reasons for assessing outcomes. For example, in a recent survey of 73 behavioral health care organizations, various reasons were identified by the participants as to why they had conducted outcomes studies (Pallak, 1994). Among the several indicated, the top five reasons (in descending order) were to: evaluate outcomes for patients, evaluate provider effectiveness, evaluate integrated treatment programs, manage individual patients, and support sales and marketing efforts. However, from the clinician's standpoint, a couple of purposes are worth noting.

In addition to monitoring the course of progress during treatment (see above), clinicians may employ outcomes assessment to obtain a direct measure of how much patient improvement has occurred as the result of the course of treatment intervention. Here, the findings are of more benefit to the clinician than to the patient himself because a pre- and post-treatment approach to the assessment is utilized. The information will not lead to any change in the patient providing the information, but the feedback it provides to the clinician could assist him in the treatment of other patients later on.

Another common reason for outcomes assessment is to demonstrate the patient's need for therapeutic services beyond that which is typically covered by the patient's medical and behavioral health care benefits. When assessment is conducted for this reason, the patient and clinician are only secondary beneficiaries of the outcomes data. As will be shown below, the type of information that a third party payer requires for authorization of extended benefits may not be the most relevant or useful to the patient or the clinician.

4.18.8.4.2 What to measure

The aspects or dimensions of patient functioning that are measured as part of outcomes assessment will depend on the purpose for which the assessment is being conducted. As discussed earlier, probably the most commonly measured variable is that of symptomatology or psychological/mental health status. After all, disturbance or disruption in this dimension is probably the most common reason why people seek behavioral health care services in the first place. However, there are other reasons for seeking help, including difficulties in coping with various types of life transitions (e.g., a new job, recent marriage or divorce, other changes in the work or home environment), inability to deal with the behavior of others (e.g., spouse, children), general dissatisfaction with life, or perhaps other less common reasons. Additional assessment of related variables therefore may be necessary, or may even take precedence over the assessment of symptoms or other mental health indicators.

In the vast majority of the cases seen for behavioral health care services, the assessment of the patient's overall level of psychological distress or disturbance will yield the most singularly useful information, regardless of whether it is used for outcomes measurement, outcomes monitoring, outcomes management, or to meet the requirements of third-party
payers for authorization of additional benefits. Indices such as the Positive Symptom Total (PST) or Global Severity Index (GSI) that are part of the SA-45 or BSI can provide this type of information efficiently and economically.

For some patients, measures of one or more specific psychological disorders or symptom clusters are at least as important if not more important than overall symptom or mental health status. Here, if interest is in only one disorder or symptom cluster (e.g., depression), one may choose to measure only that particular set of symptoms using an instrument designed specifically for that purpose (e.g., use of the BDI with depressed patients). For those interested in assessing the outcomes of treatment relative to multiple psychological dimensions, the administration of more than one disorder-specific instrument or a single, multidimensional instrument which assesses all or most of the dimensions of interest would be required. Again, instruments such as the SA-45 or the BSI can provide a quick, broad assessment of multiple symptom domains. Although much lengthier, other multiscale instruments, such as the MMPI-2 or the PAI, permit a more detailed assessment of several disorders or symptom domains using one inventory.

In many cases, the assessment of mental health status is adequate for outcomes assessment purposes. There are other instances in which changes in psychological distress or disturbance either provide only a partial indication of the degree to which therapeutic intervention has been successful, are not of interest to the patient or a third-party payer, are unrelated to the reason why the patient sought services in the first place, or are otherwise inadequate or unacceptable as measures of improvement in the patient's condition. One may find that for some patients, improved functioning on the job, at school, or with family or friends is much more relevant and important than symptom reduction. For other patients, improvement in their quality of life or feeling of well-being is more meaningful.

It is not always a simple matter to determine exactly what should be measured. However, careful consideration of the following questions should greatly facilitate the decision.

(i) Why did the patient seek services? People pursue treatment for many reasons. The patient's stated reason for seeking therapeutic assistance may be the first clue in determining what is important to measure.

(ii) What did the patient hope to gain from treatment? The patient's stated goals for the treatment he or she is about to receive may be a primary consideration in the selection of outcomes to be assessed.

(iii) What are the patient's criteria for the successful completion of the current therapeutic episode? The patient's goals for treatment may provide only a broad target for the therapeutic intervention. Having the patient identify exactly what will have to happen to consider treatment successful and no longer needed will help in specifying the most important constructs and/or behaviors to assess.

(iv) What are the clinician's criteria for the successful completion of the current therapeutic episode? What the patient identifies as being important to accomplish during treatment may reflect a lack of insight into his or her problems, or it might not otherwise concur with what the clinician's experience would indicate.

(v) What are the criteria of significant third parties for the successful completion of the current therapeutic episode? From a strict treatment perspective, this should be given the least amount of consideration. From a more realistic perspective, one cannot overlook the expectations and limitations that one or more third parties have for the treatment that is rendered. The expectations and limitations set by the patient's behavioral health care plan, the guidelines of the organization in which the clinician practices, and possibly other external forces may significantly play into the decision about when to terminate treatment.

(vi) What, if any, are the outcomes initiatives within the provider organization? One cannot ignore any outcomes programs that have been initiated by the organization in which the therapeutic services are delivered. Regardless of the problems and goals of the individual patient, organization-wide studies of effectiveness may dictate the gathering of specific types of outcomes data from patients who have received services.

Note that the selection of the variables to be assessed may address more than one of the above issues. Ideally, this is what should happen. However, one needs to take care that the gathering of outcomes data does not become too burdensome. As a general rule, the more outcomes data one attempts to gather from a given patient or collateral, the less likely one is to obtain any data at all. The key is to identify the point at which the amount of data that can be obtained from a patient and/or collaterals, and the ease with which it can be gathered, is optimized.

4.18.8.4.3 How to measure

Once the decision concerning what to measure has been made, one must then decide how this should be measured. In many cases, the most important data will be that obtained
directly from the patient through the use of self-report instruments. Underlying this assertion are the assumptions that valid and reliable instrumentation, appropriate to the needs of the patient, is available to the clinician; the patient can read at the level required by the instruments; and the patient is motivated to respond honestly to the questions asked. If this is not the case, other options are available.

Other types of data-gathering tools may be substituted for self-report measures. Rating scales completed by the clinician or other members of the treatment staff may provide information that is as useful as that elicited directly from the patient. In those cases in which the patient is severely disturbed, unable to give valid and reliable answers (e.g., younger children), unable to read, or is an otherwise inappropriate candidate for a self-report measure, clinical rating scales can substitute as useful means of gathering data. Related to these instruments are parent-completed inventories for child and adolescent patients. These are particularly useful in obtaining information about the child or teen's behavior that might not otherwise be known.

Collateral rating instruments can also be used to gather information in addition to that obtained from self-report measures. When used in this manner, these instruments provide a mechanism by which the clinician and other treatment staff can contribute data to the outcomes assessment endeavor. This not only results in the clinician or provider organization having more information upon which to evaluate the outcomes of therapeutic intervention, it also gives the clinician an opportunity to ensure that the perspective of the treatment provider is considered in the evaluation of the effects of the treatment given.

Another potential source of outcomes information is administrative data. In many of the larger provider organizations, this information is easily retrieved through their management information systems (MISs). Data related to the patient's diagnosis, dose and regimen of medication, physical findings, course of treatment, and other types of data typically stored in these systems can be useful to those evaluating the outcomes of therapeutic intervention.

4.18.8.4.4 When to measure

There are no hard and fast rules, guidelines, or accepted conventions related to when outcomes should be assessed. The common practice is to assess the patient at least at treatment initiation and treatment termination/discharge. Obviously, at the time of treatment initiation, the clinician should obtain a baseline measure of whatever variables will be measured at the termination. At the minimum, this allows for "outcomes measurement" as described above. As has been discussed, additional assessment of the patient on the variables of interest can take place at other points in time, that is, at other times during the course of treatment and upon post-discharge follow up.

Many would argue that postdischarge/post-termination follow-up assessment provides the best or most important indication of the outcomes of therapeutic intervention. Two types of comparisons may be made on follow-up. The first is a comparison of the patient's status on the variables of interest at the time of treatment initiation, or at the time of discharge or termination, to that of the patient at some point after treatment has ended. Either way, this follow-up data will provide an indication of the more lasting effects of the intervention. Generally, the variables of interest for this type of comparison include such things as symptom presence and intensity, feeling of well-being, frequency of substance use, and social and role functioning.

The second type of post-treatment investigation involves looking at the frequency at which some aspect(s) of the patient's life circumstances, behavior or functioning occurred during a given period prior to treatment, compared to that which occurred during an equivalent period of time immediately preceding the post-discharge assessment. This approach is commonly used in determining the cost-offset benefits of treatment. For example, the number of times a patient has been seen in an emergency room for psychiatric problems during the three-month period preceding the initiation of outpatient treatment can be compared to the number of emergency room visits during the three-month period preceding the postdischarge follow-up assessment. Not only can this provide an indication of the degree to which treatment has helped the patient deal with his problems, it can also demonstrate how much medical expenses have been reduced through the patient's decreased use of costly emergency room services.

In general, post-discharge outcomes assessment probably should take place no sooner than a month after treatment has ended. When feasible, one probably should wait three to six months to assess the variables. This should provide a more valid indication of the lasting effects of treatment. Assessments being conducted to determine the frequency with which some behavior or event occurs (as may be needed to determine cost-offset benefits) can be accomplished no sooner than the reference time interval used in the baseline assessment. Thus,
suppose that the patient reports 10 emergency (i) fall outside the range of the dysfunctional
room visits during the three-month period prior population by at least two standard deviations
to treatment. If one wants to know if the away from the mean of that population, in the
patient's emergency room visits have decreased direction of functionality;
after treatment, the assessment cannot take (ii) fall within two standard deviations of the
place any earlier than three months after mean for the normal or functional population;
treatment termination. or
(iii) be closer to the mean of the functional
population than to that of the dysfunctional
4.18.8.4.5 How to analyze outcomes data population.
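The three cutoff definitions (i) to (iii) can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that lower scores indicate better functioning (as on a symptom scale) and that means and standard deviations are available for both populations. The function name and parameters are hypothetical, and criterion (iii) is implemented literally as "closer to the functional mean", which is a simplification.

```python
def meets_cutoffs(score, dys_mean, dys_sd, func_mean, func_sd):
    """Check a post-treatment score against the three cutoff definitions.

    Assumes LOWER scores mean better functioning (e.g., a symptom scale).
    All names are illustrative; none come from the source text.
    """
    return {
        # (i) at least two SDs from the dysfunctional mean, toward functionality
        "i": score <= dys_mean - 2 * dys_sd,
        # (ii) within two SDs of the functional mean
        "ii": abs(score - func_mean) <= 2 * func_sd,
        # (iii) closer to the functional mean than to the dysfunctional mean
        "iii": abs(score - func_mean) < abs(score - dys_mean),
    }


# Hypothetical norms: dysfunctional mean 70 (SD 8), functional mean 50 (SD 10)
result = meets_cutoffs(55, dys_mean=70, dys_sd=8, func_mean=50, func_sd=10)
```

With these made-up norms, a score of 55 satisfies (ii) and (iii) but not (i), which shows that the three definitions need not agree for the same patient.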
Jacobson and Truax viewed the third option
There are two general approaches to the as being the least arbitrary and provided
analysis of treatment outcomes data. The first is different recommendations for determining cut-
to determine whether changes in patient scores offs for clinically significant change, depending
on outcomes measures are statistically signifi- upon the availability of normative data. Lam-
cant. The other is to establish whether these bert (1994) demonstrated how the third option
changes are clinically significant. Use of could be modified to allow for the inclusion of
standard tests of statistical significance is more than one categorization of dysfunction
important in the analysis of group or population (e.g., mild, moderate, severe). This assumes, of
change data. Clinical significance is more course, that the necessary normative data
relevant to change in the individual patient. needed to separate the gradations of dysfunc-
As this chapter is focused on the individual tion are available.
patient, this section will center on matters At the same time, these same investigators
related to determining clinically significant noted the importance of considering the change
change as the result of treatment. in the measured variables of interest from pre- to
The issue of clinical significance has received post-treatment, in addition to the patient's
a great deal of attention in psychotherapy functional status at the end of therapy. To this
research during the past several years. This is at end, Jacobson et al. (1984) proposed the
least partially owing to the work of Jacobson concomitant use of a reliable change (RC)
and his colleagues (Jacobson, Follette, & index to determine whether change is clinically
Revenstorf, 1984, 1986; Jacobson & Truax, significant. This index, modified on the recom-
1991) and others (e.g., Christensen & Mendoza, mendation of Christensen and Mendoza (1986),
1986; Speer, 1992; Wampold & Jenson, 1986). is nothing more than the post-test score minus the
Their work came at a time when researchers pretest score divided by the standard error of
began to recognize that traditional statistical the difference of the two scores, expressed as:
comparisons do not reveal a great deal about the
efficacy of therapy. In discussing the topic, $RC = (x_2 - x_1)/S_{diff}$
Jacobson and Truax broadly define the clinical
significance of a treatment as where x1 is the pretest score, x2 is the post-test
score, and Sdiff is the standard error of the
its ability to meet standards of efficacy set by difference. The standard error of the difference
consumers, clinicians, and researchers. While there is computed as:
is little consensus in the field regarding what these
standards should be, various criteria have been
suggested: a high percentage of clients
$S_{diff} = \sqrt{2(SEM)^2}$
improving . . .; a level of change that is recognizable
by peers and significant others . . .; an elimination where SEM is the standard error of measure-
of the presenting problem . . .; normative levels of ment for a functional group (e.g., normals,
functioning at the end of therapy . . . ; high end- nonpatients) on the instrument. If the RC index
state functioning at the end of therapy . . .; or is greater than 1.96, the change in scores is not
changes that significantly reduce one's risk for likely to be due to chance (p < 0.05), but rather
various health problems. (p. 12) to reflect real change.
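The reliable change computation can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name and example scores are hypothetical, and the absolute value of RC is compared with 1.96 so that the check also works on scales where improvement lowers the score (the text states the threshold only as "greater than 1.96").

```python
import math


def reliable_change(pre, post, sem):
    """Return (RC, is_reliable) for a single patient.

    pre, post: pretest and post-test scores (x1 and x2 in the text)
    sem: standard error of measurement of the instrument
    """
    s_diff = math.sqrt(2 * sem ** 2)  # standard error of the difference
    rc = (post - pre) / s_diff
    # |RC| > 1.96: the change is unlikely to be due to chance (p < 0.05).
    # Using the absolute value is an assumption made for this sketch.
    return rc, abs(rc) > 1.96


# Hypothetical symptom scores: 70 before treatment, 58 after, SEM of 3
rc, is_reliable = reliable_change(pre=70, post=58, sem=3)
```

Here the standard error of the difference is about 4.24, so a 12-point drop gives an RC of roughly -2.83 and counts as reliable, whereas a 5-point drop on the same instrument would not.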
Speer (1992) recommended a different ap-
From their perspective, Jacobson and his proach when regression to the mean has been
colleagues (Jacobson, Follette, & Revenstorf, demonstrated to contribute to the improvement
1984; Jacobson & Truax, 1991) felt that clini- in scores from pre- to post-test. The alternate
cally significant change could be conceptualized approach, based on the combined work of
in one of three ways. Thus, for clinically Nunnally (1967) and Edwards, Yarvis, Mueller,
significant change to have occurred, the mea- Zingale, and Wagman (1978), involves devel-
sured level of functioning following the ther- oping a confidence interval of ±2 SEMs around
apeutic episode would either: the estimated true pretest score. A post-test
score falling outside of this confidence interval industry certainly had contributed its share of
is considered significantly different from the waste, inefficiency, and lack of accountability
initial pretest score at p < 0.05. Using this to the problems that led to the revolution. Now,
approach, more change is needed to show like other areas of health care, it is forced to
clinically significant improvement than to show "clean up its act." Some consumers of mental
clinically significant deterioration. Note that the health or chemical dependency services have
criterion for determining whether regression to benefited from the revolution, others have not.
the mean is occurring is met when a negative In any case, the way in which health care is
correlation is found to exist between the delivered and financed has changed, and
pretreatment score and amount of change that clinical psychologists and other behavioral
has taken place. This implies the evaluation of health care professionals must adapt to survive
group data, and for this reason this empirical in the market.
criterion may not be of use for the individual Some of those involved in the delivery of
patient unless the latter is a member of a sample psychological assessment services may wonder
for which test results are available. (with some fear and trepidation) where the
Lambert (1994) proposes a modified recom- revolution is leading the behavioral health care
mendation for the dual criteria for clinically industry and, in particular, how their ability to
significant change (that is, RC greater than 1.96 practice will be affected. At the same time,
and movement of the patient's score from the others are eagerly awaiting the inevitable
dysfunctional group's distribution to the func- advances in technology and other resources
tional group's distribution) such that movement that come with the passage of time. What will
from one degree of dysfunction to a lesser occur is open to speculation. However, close
degree would also meet one of the two criteria observation of the practice of psychological
for clinically significant change. In an example, assessment and the various industries that
Lambert illustrated that normative data for the support it (particularly the forms of therapeutic
Global Severity Index (GSI) found in the SCL- assessment described in this chapter) has led this
90-R literature can be used to empirically define author to arrive at some predictions as to where
four levels of symptom intensity: asymptomatic, the field of therapeutic psychological assess-
mildly symptomatic, moderately symptomatic, ment is headed and the implications these have
and severely symptomatic. Assuming an RC of for clinicians, provider organizations, and
1.96 or greater, clinically significant change can patients. What follows in this section are the
be said to have occurred if a patient's GSI score most important of these predictions. Also
moves from severely to moderately or mildly included are what this author feels are the
symptomatic, or to asymptomatic; from mod- needs that must be met if psychological
erately to mildly symptomatic, or to asympto- assessment is to continue to be a valued
matic; or from mildly symptomatic to contributor to the delivery of efficient, cost-
asymptomatic. Although this criterion is less effective behavioral health care.
stringent than having to move from being
symptomatic (regardless of the severity) to
asymptomatic, it still provides information that 4.18.9.1 What the Industry Is Moving Away
is quite useful for clinical decision making. From?
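The two refinements of clinically significant change discussed above, Speer's confidence interval and Lambert's graded criterion, can be sketched together. Several things here are assumptions rather than content from the source: the true-score estimate (group mean plus reliability times the deviation from the mean) is the classical regression-based formula, which the text does not spell out; the band cutoffs are invented for illustration and are not the SCL-90-R norms; and all function names are hypothetical. Higher scores are assumed to mean more severe symptoms.

```python
import math


def speer_interval(pre, group_mean, reliability, sem):
    """Interval of +/- 2 SEM around the estimated true pretest score.

    The true-score estimate used here is an assumption (classical
    regression-based estimate); the text only names the approach.
    """
    true_pre = group_mean + reliability * (pre - group_mean)
    return true_pre - 2 * sem, true_pre + 2 * sem


# Hypothetical upper bounds for four severity bands, least severe first.
BANDS = [(0.5, "asymptomatic"), (1.0, "mild"),
         (1.5, "moderate"), (math.inf, "severe")]


def band(score):
    """Index of the first band whose upper bound exceeds the score."""
    for index, (upper, _label) in enumerate(BANDS):
        if score < upper:
            return index


def lambert_significant(pre, post, sem):
    """Dual criteria: reliable change AND movement to a less severe band."""
    rc = (post - pre) / math.sqrt(2 * sem ** 2)
    return abs(rc) >= 1.96 and band(post) < band(pre)


low, high = speer_interval(pre=80, group_mean=60, reliability=0.8, sem=4)
```

With these made-up values the interval runs from 68 to 84, so a pretest score of 80 must fall by more than 12 points to count as improved but rise by only more than 4 to count as deteriorated, which matches the asymmetry noted in the text.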
One way of discussing what the field is
4.18.9 FUTURE DIRECTIONS moving toward is to first talk about what it is
moving away from. In the case of therapeutic
The ways in which clinical psychologists have psychological assessment, two trends are be-
conducted the types of psychological assess- coming quite clear. First, starting at the
ment described in this chapter have undergone beginning of this last decade of the twentieth
dramatic changes during the 1990s. This should century, the use of (and reimbursement for)
come as no surprise to anyone who spends a few psychological assessment has gradually been
minutes a day skimming the newspaper or curtailed. This particularly has been the case
watching television. The health care revolution with regard to indiscriminate assessment invol-
started gaining momentum at the beginning of ving the administration of lengthy and expen-
the 1990s and has not since slowed down. And sive batteries of psychological tests. Payers
there are no indications that it will subside in began to demand evidence that the knowledge
the foreseeable future. There was no real reason gained from the administration of these instru-
to think that behavioral health care would be ments contributed to the delivery of cost-
spared from being a target of the revolution, effective, efficient care to mental health and
and there is no good reason why it should have substance abuse patients. There are no indica-
been spared. The behavioral health care tions that this trend will stop.
556 Therapeutic Assessment: Linking Assessment and Treatment

Second, assessment has begun to move away from the use of lengthy, multidimensional objective instruments (e.g., the MMPI) or time-consuming projective techniques (e.g., Rorschach) that previously represented the standard of practice. When assessment is authorized now, it usually involves the use of inexpensive yet well-validated, problem-oriented instruments. This reflects modern behavioral health care's time-limited, problem-oriented approach to treatment. The clinician can no longer afford to spend a great deal of time in assessment activities when the patient has only a limited number of payer-authorized sessions with him or her. Thus, both now and in the foreseeable future, brief instruments will be used for problem identification or clarification, progress monitoring, and/or outcomes assessment.

4.18.9.2 Trends in Instrumentation

The move toward the use of brief, problem-oriented instruments for therapeutic psychological assessment purposes has just been identified. Another trend in the selection of instrumentation is the increasing use of public domain tests, questionnaires, rating scales, and other types of measurement tools. Previously, these free-use instruments were not developed with the rigor that is usually applied in the development of psychometrically sound instruments by commercial test publishers. Consequently, they typically lacked the validity and reliability data that are necessary to judge their psychometric integrity.

Recently, however, there has been a significant improvement in the quality and documentation of the public domain and other "for-free" tests that are available for therapeutic psychological assessment. Instruments such as the SF-36/SF-12 and HSQ/HSQ-12 health measures are good examples of such tools. These and instruments such as the Behavior and Symptom Identification Scale (BASIS-32; Eisen, Grob, & Klein, 1986) and the Outcome Questionnaire (OQ-45.1; Lambert, Lunnen, Umphress, Hansen, & Burlingame, 1994) have undergone psychometric scrutiny and have gained widespread acceptance. Although copyrighted, they may be used for a nominal one-time or annual licensing fee; thus, they generally are treated much like public domain assessment tools. One can expect that other quality, useful instruments will be made available for use at little or no cost in the future.

As for the types of instrumentation that will be needed and developed, one can probably expect some changes. Accompanying the increasing focus on outcomes assessment is a recognition by payers and patients that changes in several areas of functioning are at least as important as changes in level of symptom severity when evaluating the effectiveness of the treatment. For example, employers are interested in the patient's ability to resume the functions of his or her job, while family members may be concerned with the patient's ability to resume their role as spouse or parent. Increasingly, measurement of the patient's functioning in areas other than psychological/mental status has come to be included as part of behavioral health care outcomes systems. Probably the most visible indication of this is the incorporation of the SF-36 or HSQ into various behavioral health care studies, and the fact that two major psychological test publishers offer HSQ products in their catalogs of clinical products. One will likely see other public domain and commercially available nonsymptom-oriented instruments, especially those emphasizing social and occupational role functioning, in increasing numbers over the next several years.

Other types of instrumentation will also become prominent. These will include measures of variables that support the outcomes and other therapeutic assessment initiatives undertaken by provider organizations. What one organization or provider feels is important, or what it is told is important for reimbursement or other purposes, will dictate what is measured. Instrumentation will also include measures that will be useful for the prediction of outcomes for individuals seeking psychotherapeutic services from those organizations.

4.18.9.3 Trends in Data Use and Storage

There are two areas of application in which the valuable data obtained from therapeutic psychological assessment have heretofore been overlooked or underutilized. Indications are that this will change for both in the future. One area for which assessment data has potential application is that of clinical decision-making. This of course pertains only to the use of outcomes assessment data. Generally, data gathered solely for the purpose of outcomes assessment is used for just that: the assessment of the results of treatment. This is particularly the case in large, formal outcomes management programs. As has been discussed earlier in this chapter, data gathered at the beginning of treatment can be used immediately for treatment planning purposes while also serving as baseline data that can be compared to discharge data later on.

The other potential area of data application is in the development of local, regional, and
national databases of therapeutic assessment data. Patient data gathered by various providers, organizations or programs within organizations, at one or more points during the therapeutic episode, can be pooled and used for various purposes. These databases can then serve as the bases for two highly beneficial (and probably profitable) endeavors. The first is the generation of sets of normative data for various populations delineated along any number of parameters. Norms for any number of instruments or health care variables could be generated "on demand" and continuously updated to reflect trends in behavioral health care. This author is aware of one large, national behavioral health care system where such a database already exists. He also is aware of efforts at establishing cross-organizational databases of this kind.

The second benefit afforded by the information contained in these databases is that of predictive modeling. For example, the behavioral health care organization mentioned above has taken advantage of the organizational data available to it to investigate the relationships between a number of treatment, demographic and other variables and the outcomes of treatment. Subjecting the large data sets available to it to sophisticated statistical analyses has allowed this organization to determine those types of patients requiring special care or attention in order to achieve desired outcomes at the time of treatment termination. Predictive modeling can also be used for identifying variables related to other aspects of patient care, such as patient satisfaction with the care received. The possibilities for the use of data in this manner are enormous.

4.18.9.4 Trends in the Application of Technology

Clinical psychologists have not been shy when it has come to taking advantage of the technological advances that have been achieved since the late 1970s. This is no more evident than in the extent to which the personal computer and the vast array of psychological assessment software have been incorporated into their delivery of clinical services. Automated administration, scoring, and interpretation and reporting of the results of nearly all major objective tests are currently available to the clinician through PC-based software. In addition, the availability of affordable desktop optical scanners allows the clinician to maintain the portability of the assessment instruments while retaining the scoring and interpreting power of the computer for processing the test data.

To speculate on how technology will be advanced in the service of therapeutic psychological assessment in the future is a risky business. As has been witnessed since the late 1970s, much can happen quickly. There are, however, three areas of technologic or technology-dependent advances on the horizon to which clinical psychologists should have access in the not too distant future. The first is the availability of online administration, scoring, and interpretation and reporting of tests via the Internet. In fact, an Internet version of the SA-45 is being beta tested at the time of writing (1997). To this author's knowledge, this represents the first use of the Internet for psychological assessment purposes. It is anticipated that the Internet version of the SA-45 will be commercially available in the very near future, and it will be quickly followed by the availability of Internet versions of other assessment instruments.

The second advance is actually a technology that has been around for a while but has undergone improvements, that is, the fax-back technology that is being used for scoring and reporting of objective, paper-and-pencil tests. Essentially, the fax machine replaces the optical scanner as a means of data entry. The electronic data is entered directly into the test publisher's computer for processing and report generation. However, instead of generating a hard-copy report of results at the processing site, the report is transmitted in electronic form and sent back to the clinician's fax machine within minutes of processing. At that point, the report is printed out just like any other fax transmission. Currently, this technology is being used on a somewhat limited basis. This is partially owing to the degree to which test publishers are making this form of automated scoring and reporting available to their customers. However, in the relatively near future, one should see more tests being offered to clinicians in this manner, particularly as the technology continues to improve.

The third area of technologic advancement has more to do with the application of technology than the development of new technology. L. E. Beutler and O. B. Williams (personal communication, January 15, 1996) have taken Beutler and Clarkin's (1990) Systematic Treatment Selection (STS) model of prescriptive treatment assignment and have developed specifications for software for automating the matching of treatments, therapists, and patients. The capability of subsequently tracking patients during the course of treatment is also included in these specifications. Originally entitled STS for Windows, this software is now under development through a behavioral
health care publishing and consulting company. Driving the STS system is patient assessment data related to six variables: subjective distress, functional severity, problem complexity, potential for therapeutic resistance, coping style, and social support. Each of these variables may be assessed through either commercially available self-report instruments or clinician rating scales developed specifically for use with STS.

When fully developed, the STS should serve as the standard for in-office treatment–patient–therapist matching and patient-tracking software. According to the developers (L. E. Beutler & O. B. Williams, personal communication, January 15, 1996), the software will include numerous features, the most important of which will be: a comprehensive treatment planning report with up-to-date references to relevant research articles and treatment manuals; automatic entry of each patient's data into a growing database that is used for treatment planning and prediction; the ability to predict the amount of symptom reduction from a specific course of therapy; a report profiling the patient's symptom status over time; a report indicating individual clinician's ability to treat specific types of symptomatology; and the ability to incorporate case notes into the patient's electronic file. The major benefits of the system include the ability to: use different assessment means (self-report or clinician rating) to obtain the information needed to drive the system; develop treatment recommendations based on information that is optimal for the patient; easily monitor patient progress on a glide path developed from the treatment of similar patients and adjust the therapy plan (if necessary) on a timely basis; and determine a clinician's therapeutic strengths and weaknesses, thus permitting the most effective patient–therapist match.

All in all, when fully developed, the STS software will combine the knowledge and expertise of a leader in the field of psychotherapeutic research with state-of-the-art technology, thus yielding a powerful decision-making behavioral health care tool. One can be assured that similar products are likely to follow once the benefits of the STS software become widely known.

4.18.10 SUMMARY

The health care revolution has brought mixed blessings to those in the behavioral health care professions. It has resulted in limitations for reimbursement for services rendered and has forced many to change the way they practice their profession. At the same time, it has led to revelations about the cost savings benefits that can accrue from the treatment of mental health and substance use disorders. This has been the bright spot in an otherwise bleak picture for some behavioral health care professionals. For clinical psychologists, the picture appears to be somewhat different. They now have additional opportunities to contribute to the positive aspects of the revolution and to gain from the "new order" it has imposed. By virtue of their training in psychological assessment and through the application of appropriate instrumentation, they are uniquely qualified to support or otherwise facilitate multiple aspects of the therapeutic process. It is the clinical psychologist's contributions to aspects of "therapeutic psychological assessment" that this chapter has sought to identify and address in some detail.

Earlier, this author identified some of the types of psychological assessment instruments that are commonly used in the service of therapeutic endeavors. These included both brief and lengthy (multidimensional) symptom measures, as well as measures of general health status, quality of life, and patient satisfaction with the services received. Also identified were different sets of general criteria that can be applied when selecting instruments for use in therapeutic settings. The main intent of this chapter, however, was to present a detailed discussion of the various therapeutic uses of psychological assessment.

Generally, psychological assessment can assist the clinician in three important clinical activities: clinical decision-making, treatment itself (when used as a specific therapeutic technique), and treatment outcomes assessment. Regarding the first of these activities, three important clinical decision-making functions can be facilitated by psychological assessment: screening, treatment planning, and treatment monitoring. The first of these can be served by the use of "down and dirty" instruments to identify, within a high degree of certainty, the likelihood of the presence (or absence) of a particular condition or characteristic. Here, the diagnostic efficiency of the instrument used (as indicated by the PPP and NPP) is of great importance. Through their ability to identify and clarify problems as well as other important treatment-relevant patient characteristics, psychological assessment instruments can also be of great assistance in planning treatment. In addition, treatment monitoring, or the regular determination of the patient's progress throughout the course of treatment, can be served well by the application of psychological assessment instruments.

Secondly, assessment may be used as part of a therapeutic technique. In what Finn terms
"therapeutic assessment," situations in which patients are evaluated via psychological testing are used as opportunities for the process itself to serve as a form of therapeutic intervention. This is accomplished through involving the patient as an active participant in the assessment process, not just as the object of the assessment.

Thirdly, psychological assessment can be employed as the primary mechanism by which the outcomes or results of treatment can be measured. However, the use of assessment for this purpose is not a cut-and-dried matter. As discussed, there are issues, pertaining to what to measure, how to measure, and when to measure, that require considerable thought prior to undertaking any standard (or even nonstandard) plan to assess outcomes. Guidelines for resolving these issues are presented, as is information pertaining to how to determine whether the measured outcomes of treatment are indeed "significant."

In the final section of the chapter, this author shares some thoughts about where psychological assessment is probably heading in the future. No radical revelations are presented since no signs really point in that direction. What is foreseen is the appearance of more quality assessment instruments that will remain in the public domain, and greater application of communications technology, fax and the Internet, in particular, as assessment delivery, scoring and reporting mechanisms. Also predicted is the application of tomorrow's computer technology to available assessment data for optimized treatment–patient–therapist matching. The innovative proposals of Beutler and Williams in this regard seem to represent the state-of-the-art thinking at this time.

There is no doubt that the practice of psychological assessment has been dealt a blow within recent years. However, as this chapter hopefully has shown, clinical psychologists have the skills to take this powerful tool, apply it in ways that will benefit those suffering from mental health and substance abuse problems, and demonstrate its benefits and their skills to patients and payers. Whether they will be successful in this demonstration will be determined in the near future. In the meantime, advances will continue to be made that will facilitate their work and improve its quality.

4.18.11 REFERENCES

American Psychological Association (1992). Ethical principles of psychologists. Washington, DC: Author.
American Psychological Association (1996). The costs of failing to provide appropriate mental health care. Washington, DC: Author.
Andrews, G., Peters, L., & Teesson, M. (1994). The measurement of consumer outcomes in mental health. Canberra, Australia: Australian Government Publishing Service.
Appelbaum, S. A. (1990). The relationship between assessment and psychotherapy. Journal of Personality Assessment, 54, 791–801.
Attkisson, C. C., & Zwick, R. (1982). The Client Satisfaction Questionnaire: Psychometric properties and correlations with service utilization and psychotherapy outcome. Evaluation and Program Planning, 6, 233–237.
Baldessarini, R. J., Finkelstein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569–573.
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford Press.
Beutler, L. E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Beutler, L. E., Wakefield, P., & Williams, R. E. (1994). Use of psychological tests/instruments for treatment planning. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 55–74). Hillsdale, NJ: Erlbaum.
Beutler, L. E., & Williams, O. B. (1995). Computer applications for the selection of optimal psychosocial therapeutic interventions. Behavioral Healthcare Tomorrow, 4, 66–68.
Burlingame, G. M., Lambert, M. J., Reisinger, C. W., Neff, W. M., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226–236.
Butcher, J. N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Graham, J. R., Williams, C. L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis, MN: University of Minnesota Press.
Cagney, T., & Woods, D. R. (1994). Why focus on outcomes data? Behavioral Healthcare Tomorrow, 3, 65–67.
Center for Disease Control and Prevention (1994, May 27). Quality of life as a new public health measure: Behavioral risk factor surveillance system. Morbidity and Mortality Weekly Report, 43, 375–380.
Christensen, L., & Mendoza, J. L. (1986). A method of assessing change in a single subject: An alteration of the RC index [Letter to the editor]. Behavior Therapy, 17, 305–308.
Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986). Assessing mental health treatment outcomes measurement techniques. DHHS Pub. No. (ADM)86-1301. Washington, DC: US Government Printing Office.
Commission on Chronic Illness (1987). Chronic illness in the United States: 1. Cambridge, MA: Commonwealth Fund, Harvard University Press.
Derogatis, L. R. (1983). SCL-90-R: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research.
Derogatis, L. R. (1992). BSI: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research.
Derogatis, L. R., & DellaPietra, L. (1994). Psychological tests in screening for psychiatric disorder. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 22–54). Hillsdale, NJ: Erlbaum.
Derogatis, L. R., Lipman, R. S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale - preliminary report. Psychopharmacology Bulletin, 9, 13–27.
Dickey, B., & Wagenaar, H. (1996). Evaluating health status. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 55–60). Baltimore: Williams & Wilkins.
Donabedian, A. (1985). Explorations in quality assessment and monitoring: Vol. III. The methods and findings of quality assessment monitoring: An illustrative analysis. Ann Arbor, MI: Health Administration Press.
Dorwart, R. A. (1996). Outcomes management strategies in mental health: Applications and implications for clinical practice. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 45–54). Baltimore: Williams & Wilkins.
Dowd, E. T., Milne, C. R., & Wise, S. L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541–545.
Edwards, D. W., Yarvis, R. M., Mueller, D. P., Zingale, H. C., & Wagman, W. J. (1978). Test-taking and the stability of adjustment scales: Can we assess patient deterioration? Evaluation Quarterly, 2, 275–292.
Eisen, S. V., Grob, M. C., & Klein, A. A. (1986). BASIS: The development of a self-report measure for psychiatric inpatient evaluation. The Psychiatric Hospital, 17, 165–171.
Elwood, R. W. (1993). Psychological tests and clinical discrimination: Beginning to address the base rate problem. Clinical Psychology Review, 13, 409–419.
Ficken, J. (1995). New directions for psychological testing. Behavioral Health Management, 20, 12–14.
Finn, S. E. (1996a). Manual for using the MMPI-2 as a therapeutic intervention. Minneapolis, MN: University of Minnesota Press.
Finn, S. E. (1996b). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543–557.
Finn, S. E., & Butcher, J. N. (1991). Clinical objective personality assessment. In M. Hersen, A. E. Kazdin, & A. S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 362–373). New York: Pergamon.
Finn, S. E., & Martin, H. (in press). Therapeutic assessment with the MMPI-2 in managed health care. In J. N. Butcher (Ed.), Objective personality assessment in managed health care: A practitioner's guide. Minneapolis, MN: University of Minnesota Press.
Finn, S. E., & Tonsager, M. E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278–287.
Friedman, R., Sobel, D., Myers, P., Caudill, M., & Benson, H. (1995). Behavioral medicine, clinical health psychology, and cost offset. Health Psychology, 14, 509–518.
Gough, H. G., McClosky, H., & Meehl, P. E. (1951). A personality scale for dominance. Journal of Abnormal and Social Psychology, 46, 360–366.
Gough, H. G., McClosky, H., & Meehl, P. E. (1952). A personality scale for social responsibility. Journal of Abnormal and Social Psychology, 47, 73–80.
Greene, R. L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R. L., & Clopton, J. R. (1994). Minnesota Multiphasic Personality Inventory-2. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 137–159). Hillsdale, NJ: Erlbaum.
Greenfield, T. K., & Attkisson, C. C. (1989). Progress toward a multifactorial service satisfaction scale for evaluating primary care and mental health services. Evaluation and Program Planning, 12, 271–278.
Hathaway, S. R., & McKinley, J. C. (1951). MMPI manual. New York: The Psychological Corporation.
Health Outcomes Institute (1993). Health Status Questionnaire 2.0 manual. Bloomington, MN: Author.
Hsiao, J. K., Bartko, J. J., & Potter, W. Z. (1989). Diagnosing diagnoses: Receiver operating characteristic methods and psychiatry. Archives of General Psychiatry, 46, 664–667.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336–352.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1986). Toward a standard definition of clinically significant change [Letter to the editor]. Behavior Therapy, 17, 309–311.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.
Jahoda, M. (1958). Current concepts of mental health. New York: Basic Books.
Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75–97). Hillsdale, NJ: Erlbaum.
Lambert, M. J., Lunnen, K., Umphress, V., Hansen, N. B., & Burlingame, G. M. (1994). Administration and scoring manual for the Outcome Questionnaire (OQ-45.1). Salt Lake City, UT: IHC Center for Behavioral Healthcare Efficacy.
Larsen, D. L., Attkisson, C. C., Hargreaves, W. A., & Nguyen, T. D. (1979). Assessment of client/patient satisfaction: Development of a general scale. Evaluation and Program Planning, 2, 197–207.
LeVois, M., Nguyen, T. D., & Attkisson, C. C. (1981). Artifact in client satisfaction assessment: Experience in community mental health settings. Evaluation and Program Planning, 4, 139–150.
Maruish, M. (1990, Fall). Psychological assessment: What will its role be in the future? Assessment Applications, 7–8.
Maruish, M. E. (1994). Introduction. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 3–21). Hillsdale, NJ: Erlbaum.
Megargee, E. I., & Spielberger, C. D. (1992). Reflections on fifty years of personality assessment and future directions for the field. In E. I. Megargee & C. D. Spielberger (Eds.), Personality assessment in America (pp. 170–190). Hillsdale, NJ: Erlbaum.
Mental Health Weekly. (1996a, April 8). Future targets behavioral health field's quest for survival, 1–2.
Mental Health Weekly. (1996b, April 8). Leaders predict integration of MH primary care by 2000, 1–6.
Mental Health Weekly. (1996c, April 29). Anxiety disorders screening day occurs this week, 7.
Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8, 283–298.
Migdail, K. J., Youngs, M. T., & Bengen-Seltzer, B. (Eds.) (1995). The 1995 behavioral outcomes & guidelines sourcebook. New York: Faulkner & Gray.
Millon, T. (1994). MCMI-III manual. Minneapolis, MN: National Computer Systems.
Moreland, K. L. (1996). How psychological testing can reinstate its value in an era of cost containment. Behavioral Healthcare Tomorrow, 5, 59–61.
Morey, L. C. (1991). The Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L. C., & Henry, W. (1994). Personality Assessment Inventory. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 185–216). Hillsdale, NJ: Erlbaum.
Newman, F. L. (1994). Selection of design and statistical
procedures for progress and outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 111–134). Hillsdale, NJ: Erlbaum.
Newman, F. L., & Ciarlo, J. A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98–110). Hillsdale, NJ: Erlbaum.
Nguyen, T. D., Attkisson, C. C., & Stegner, B. L. (1983). Assessment of patient satisfaction: Development and refinement of a service evaluation questionnaire. Evaluation and Program Planning, 6, 299–313.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.
Olfson, M., & Pincus, H. A. (1994a). Outpatient psychotherapy in the United States, I: Volume, costs, and user characteristics. American Journal of Psychiatry, 151, 1281–1288.
Olfson, M., & Pincus, H. A. (1994b). Outpatient psychotherapy in the United States, II: Patterns of utilization. American Journal of Psychiatry, 151, 1289.
Oss, M. E. (1996). Managed behavioral health care: A look at the numbers. Behavioral Health Management, 16, 16–17.
Pallak, M. S. (1994). National outcomes management survey: Summary report. Behavioral Healthcare Tomorrow, 3, 63–69.
Phelps, R. (1996, February). Preliminary practitioner survey results enhance APA's understanding of health care environment. Practitioner Focus, 9, 5.
Psychotherapy Finances (1995, January). Fee, practice and managed care survey. 21(1), Issue 249.
Radosevich, D., & Pruitt, M. (1996). Twelve-item Health Status Questionnaire (HSQ-12) version 2.0 user's guide. Bloomington, MN: Health Outcomes Institute.
Radosevich, D. M., Wetzler, H., & Wilson, S. M. (1994). Health Status Questionnaire (HSQ) 2.0: Scoring comparisons and reference data. Bloomington, MN: Health Outcomes Institute.
Schlosser, B. (1995). The ecology of assessment: A "patient-centric" perspective. Behavioral Healthcare Tomorrow, 4, 66–68.
Sederer, L. I., Dickey, B., & Hermann, R. C. (1996). The imperative of outcomes assessment in psychiatry. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 1–7). Baltimore: Williams & Wilkins.
Sipkoff, M. Z. (1995, August). Behavioral health treatment reduces medical costs: Treatment of mental disorders and substance abuse problems increases productivity in the workplace. Open Minds, 12.
Speer, D. C. (1992). Clinically significant change: Jacobson and Truax (1991) revisited. Journal of Consulting and Clinical Psychology, 60, 402–408.
Spielberger, C. D. (1983). Manual of the State–Trait Anxiety Inventory: STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press.
Stewart, A. L., & Ware, J. E., Jr. (1992). Measuring functioning and well-being. Durham, NC: Duke University Press.
Strain, J. J., Lyons, J. S., Hammer, J. S., Fahs, M., Lebovits, A., Paddison, P. L., Snyder, S., Strauss, E., Burton, R., & Nuber, G. (1991). Cost offset from a psychiatric consultation-liaison intervention with elderly hip fracture patients. American Journal of Psychiatry, 148, 1044–1049.
Strategic Advantage, Inc. (1996). Symptom Assessment-45 Questionnaire manual. Minneapolis, MN: Author.
Substance Abuse Funding News (1995, December 22). Brown resigns drug post. 7.
VandenBos, G. R., DeLeon, P. H., & Belar, C. D. (1993). How many practitioners are needed? It's too early to know! Professional Psychology: Research and Practice, 22, 441–448.
Vermillion, J., & Pfeiffer, S. (1993). Treatment outcome and continuous quality improvement: Two aspects of program evaluation. Psychiatric Hospital, 24, 9–14.
Wampold, B. E., & Jenson, W. R. (1986). Clinical significance revisited [Letter to the editor]. Behavior Therapy, 17, 302–305.
Ware, J. E., Kosinski, M., & Keller, S. D. (1995). SF-12: How to Score the SF-12 Physical and Mental summary scales (2nd ed.). Boston: New England Medical Center, The Health Institute.
Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-Item Short Form Health Survey (SF-36). I. Conceptual framework and item selection. Medical Care, 30, 473–483.
Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 Health Survey manual and interpretation guide. Boston: New England Medical Center, The Health Institute.
Werthman, M. J. (1995). A managed care approach to psychological testing. Behavioral Health Management, 15, 15–17.
Simmons, J. W., Avant, W. S., Demski, J., & Parisher, D.
(1988). Determining successful pain clinic treatment World Health Organization (1948). Constitution. In Basic
through validation of cost effectiveness. Spine, 13, 34. Documents. Geneva, Switzerland: Author.
Copyright © 1998 Elsevier Science Ltd. All rights reserved.

4.19
Forensic Assessment
DAVID FAUST
University of Rhode Island, Kingston, RI, USA

4.19.1 INTRODUCTION 563


4.19.2 SOME KEY ISSUES AND CONSIDERATIONS IN PERSONAL INJURY CASES 564
4.19.2.1 Four Central Issues 564
4.19.2.2 The Four Elements and the Forensic Examiner's Task 565
4.19.2.3 Treating Professionals Called into Court 568
4.19.2.3.1 Maintaining the treatment alliance as primary 568
4.19.2.3.2 Release of assessment or treatment records 569
4.19.3 ADMISSIBILITY 570
4.19.4 ASSESSMENT METHODS 573
4.19.4.1 Informed Consent 573
4.19.4.2 Clarifying Referral Questions and Deciding Whether they are Appropriate and Within the
Clinician's Expertise 573
4.19.4.3 Access to Information 574
4.19.4.4 Design of Assessment Procedures: Some Do's and Don't Do's 575
4.19.4.4.1 Use the best available methods 575
4.19.4.4.2 Obtaining adequate information 578
4.19.4.4.3 Conduct technically proficient evaluations 580
4.19.4.4.4 Give adequate consideration to alternatives 582
4.19.4.5 Interpretive Strategies 584
4.19.4.6 Preparing Reports 587
4.19.5 LAWYERS' STRATEGIES AND TACTICS 588
4.19.5.1 Credentials 589
4.19.5.2 Bias 589
4.19.5.3 Manner of Conducting the Examination 590
4.19.5.4 Erroneous or Questionable Conclusions 590
4.19.5.5 Scientific Status 590
4.19.6 DEPOSITIONS AND TRIAL TESTIMONY 591
4.19.6.1 A Sampling of Deposition Topics 591
4.19.6.2 Some Suggestions for Depositions 594
4.19.6.3 Trial Testimony 596
4.19.7 REFERENCES 598

4.19.1 INTRODUCTION

Psychologists interface with the courts in various contexts and situations. A developmental psychologist may describe a 3-year-old's recognition of risk in a product liability case in which a child was injured playing with a toy. A clinical psychologist may testify about the more suitable parent in a custody dispute in family court, or may opine about a defendant's mental state when he committed murder in a criminal case. A neuropsychologist may give evidence about cognitive or brain functions in a personal injury case. Other areas of testimony might include such diverse topics as the trustworthiness of eyewitness reports, the appropriateness of a
statistical analysis purportedly linking a toxin with a disorder, test bias in screening job applicants, and the characteristics of individuals who delay a considerable period of time before reporting traumatic events.

Such qualitative variation in courtroom activities precludes coverage of all these topics in one chapter and necessitates an alternative strategy. Much of the material to follow will have a generic quality, that is, many of the points that will be raised relate to a broad array of courtroom work. However, such generic discussion will be anchored by, or mainly directed towards, the civil arena, with an emphasis on distress claims (e.g., post-traumatic stress disorder) and brain injury claims. These topics provide suitable illustration of many of the more general points that will be raised. Additionally, distress and brain injury claims are very active areas in psychology and law, and hence of likely interest to a broad audience. Although assessment is the organizing theme of the chapter, this activity is not entirely separable from the context of courtroom opinions and other collateral issues (e.g., depositions), and thus the material that follows will not be rigidly limited to evaluative methodology. Finally, readers who are mainly interested in a highly specialized area (e.g., child custody, or multiple personality in criminal cases) may supplement the material presented here with additional sources.

The chapter will begin with coverage of basic issues that frame psychologists' courtroom involvement and that are often the focus of legal proceedings, such as the core elements of civil cases and whether psychologists participate as treating professionals or as retained experts. I will then discuss, in turn, the admissibility of psychological evidence; methods for conducting and improving courtroom evaluations; and case strategies and tactics, including lawyers' main areas of attack, depositions, direct testimony, and cross-examination.

The emphasis of this chapter is far more practical than philosophical or theoretical: the main intent is to provide materials that will enhance the quality of mental health professionals' legal work. Other psychologists may take a different view than my own on the level or type of scientific backing that is appropriate for legal work, although my views on this matter are considerably more moderate than some readers might suppose (e.g., I simply maintain that psychological testimony should normally rest on fundamentally sound scientific foundations [see Faust, 1993]). Further, I believe that psychology is making substantial strides, and that there are an increasing number of areas in which testimony is likely to be of considerable scientific and practical merit in the courtroom (see the upcoming revision of Faust, Ziskin, & Hiers, 1991; Faust, Ziskin, Hiers, & Miller, in press). Whatever differences there might be in viewpoint about the needed level of scientific backing for legal testimony, which, of course, is ultimately a matter for the courts to decide, presumably all psychologists would agree that there is merit in increasing the quality of courtroom work: it is with that spirit and intent that this chapter is written.

Some psychologists are hardened veterans of courtroom work and others are just beginning to consider courtroom involvement. This chapter is directed mainly towards those at an intermediate or beginning level. Further, some of the points I raise are based on my experience with legal work, particularly as a consultant. I am not assuming that my experience with legal cases is necessarily representative of such cases overall, and I am not using my observations to try to reach scientific conclusions or generalizations, but rather to provide what I hope will be practical guidance.

The guidelines and suggestions for the practice of forensic psychological assessment detailed in this chapter emerge from, and are directed towards, the USA court system, and their generalization and application to other court systems vary. Nevertheless, many of the main points likely will apply to forensic work in other countries, in particular the emphasis on methods and strategies that increase the likelihood of accurate conclusions.

4.19.2 SOME KEY ISSUES AND CONSIDERATIONS IN PERSONAL INJURY CASES

4.19.2.1 Four Central Issues

Personal injury cases, which involve civil (and not criminal) law, address situations in which it is claimed that one or more individuals have not carried out some duty owed to one or more other individuals, or have not exercised reasonable care. For example, Smith may have been negligent in failing to shovel the driveway, and Jones may have slipped on the ice and suffered a broken arm. Many personal injury cases involve lay individuals, not professionals. Thus, the legal issues frequently relate to responsibilities of the general citizenry, such as exercising reasonable care when driving a car or shooting a gun.

Individuals are held to obligations and standards commensurate with the role in which they are functioning and the assumed levels of knowledge and responsibility that are expected or required. For example, in a medical malpractice case, a neurosurgeon will almost surely be held to a much higher standard in recognizing
a brain tumor than a clinical psychologist with a general practice. At the same time, both the neurosurgeon's and the psychologist's responsibilities are dictated by their roles as professionals. Once they take on a patient's case, they are not only obligated to the duties that bind the ordinary citizen, but in addition the responsibilities involved in the professional care of a patient. In contrast, and obviously, ordinary citizens are not obligated to properly diagnose and treat individuals for medical or psychological disorders. The citizen may be responsible for the broken arm resulting from the fall on his unshoveled driveway, but not for interpreting the x-ray or setting the cast properly.

In most legal cases, in order for a plaintiff to prevail, or at least to make a monetary recovery, four elements, and all four elements, must be proven (this also applies to malpractice cases). First, there must be a duty owed. For example, we might owe a duty to maintain our premises free of foreseeable hazards, but not to maintain our neighbor's premises. Second, it must be proven that there was a breach of duty, or that reasonable care was not exercised. For example, if we do not shovel our sidewalk, we may well have breached our duty. But if an earthquake caused our sidewalk to crack and someone fell and hurt himself before the rumbling stopped and action could be taken, the plaintiff likely will not prevail.

Third, the breach of duty must be a cause of the harm. If the person fell, but it was because her feet got tangled in the dog's leash and had nothing to do with the tools left scattered on the stoop of the shop, the business owner is not liable. Depending on the circumstances, issues, and other possible factors, the potential agents in question may need to be the main cause of the harm, a substantial contributor, or may only have to have made any type of meaningful (versus trivial) contribution. Damages are sometimes awarded in relation to the relative contribution of the cause at issue. Thus, if the event is assumed to account for 10% of the outcome, one multiplies the award by .10. In other cases, if a factor made any significant contribution, the defendant is responsible for the total damage that ensued. For example, if the defendant's careless driving caused the accident, then it makes no difference that most of the plaintiff's injuries would have been avoided had she worn a seat belt.

The fourth and final issue is damages. Assuming the first three criteria are met and the defendant is hence found to be blameworthy (i.e., liable), plaintiffs are to be compensated for any damages stemming from the event. A duty that was owed may have been breached and caused an undesired event, but if that event resulted in no damages or minimal damages, there will be little or no award (and the case, in fact, will probably go nowhere in the courts). For example, a therapist may have engaged in an egregious action, such as throwing a suicidal patient out of treatment the moment he discovered the patient's insurance had run out. However, if no harm resulted, for example, the patient did not make a suicide attempt and through some odd quirk ended up benefiting from the termination, there are no damages for which the patient is to be compensated, although the courts or a professional board could still sanction the professional.

4.19.2.2 The Four Elements and the Forensic Examiner's Task

Depending on the type of civil case in which a psychologist is involved as an expert and the particulars, any or all of the four elements discussed may be pertinent to the professional's task. In malpractice cases, for example, the professional may address each of the elements. There may be debate about whether: (a) a brief or even casual contact with a troubled person created a professional duty or relationship, or about the nature of that duty; (b) whether the psychologist has followed reasonable care in assessing suicidal ideation or has met a certain standard of care; (c) whether failure to meet some practice standard caused the harm the person suffered; and (d) about the type and extent of harm that occurred. However, in many other cases, mental health professionals do not touch on at least the first two issues, that is, whether or what duty was owed and whether it was breached. For example, a clinical psychologist usually does not testify about proper driving practices.

The issue of causation may or may not be central to the psychologist's courtroom testimony. In some cases, another expert will testify about cause. For example, a neurologist may indicate that the plaintiff's cognitive disturbance is due to brain damage suffered in the accident at issue. In such a case, a neuropsychologist might primarily address the issue of damages, or the cognitive and behavioral impairments that are present. There may be little need for the neuropsychologist to link the damages to brain injury because another professional, the neurologist in this case, has already made the causal connection. The plaintiff's attorney may prefer to leave the issue of cause in the neurologist's hands, or the neuropsychologist may be restricted from providing causal testimony. In some cases there may be multiple causal elements to be considered, one or more of which the
psychologist will address. For example, the issue of whether reckless driving caused the accident can be separate from an issue such as whether the accident caused brain damage. The first question about cause involves the assignment of blame for the event, and the second the link between the event and its possible consequences. The jury could decide that an improper left-hand turn caused the accident, and that the accident caused a chronic neck condition but not a brain injury.

In other cases, however, if the mental health professional does not testify to cause, there may be no other expert who can. For example, in a malpractice case, it will likely be up to the psychiatrist or psychologist to connect a therapist's breach of duty with the plaintiff's problems, or at least those problems linked to the event (such as depression, increased anxiety, or decreased responsiveness to therapeutic intervention). If there are plausible alternative causes for the problems (e.g., substance abuse or other emotional traumas), and if the professional herself cannot determine whether there is a causal link, how can one expect the jury to make a judgment about a matter the expert cannot decide? A decent cross-examiner can thus quickly show that with all the professional's presumed expertise, she cannot say there is a link between the event and adverse functioning. This may be devastating to the plaintiff's case. In some instances, if the expert's inability to address cause comes to light prior to trial (e.g., during a deposition that is taken as part of the discovery process), this may lead the judge to place tight restrictions on the psychologist's trial testimony, or even to exclude the expert or dismiss the entire case. For example, in a toxin case in which brain damage is being claimed, if the attorney can produce no expert to help with this core element (causality), it may not be possible to prove it legally and the case may be over.

The need to establish cause also illustrates some of the differences that can arise across clinical and courtroom situations. In clinical practice, a determination of cause or etiology may make little difference because it may not alter treatment. In the courtroom, the specific cause, whatever the treatment implications, may make all of the difference. A psychologist may fail to recognize how an issue that can be of such minimal relevance in a clinical situation, and, therefore, sometimes bypassed or set aside during the course of an evaluation, might have huge repercussions at deposition or trial. This is also why, in the legal arena, it is often not enough to identify a condition as present. It may be true that Smith has brain damage, but if it was not the car accident but a toxic exposure in a different place and circumstance that caused it, the driver of the other vehicle is not responsible. It follows that careful consideration of alternate causes, including procurement of relevant documents and information, is often crucial to the psychologist's legal work.

Psychologists usually should have a basic familiarity with key legal issues in a case before they undertake, or, preferably, before they decide whether to undertake, a courtroom evaluation, and should not assume that the lawyer will convey this information. The psychologist is typically better off knowing in advance whether cause is at issue and whether the lawyer does (or does not) want the psychologist to address this matter; and whether the standard applicable to the case is a substantial versus primary contributor. Such foreknowledge may allow psychologists to identify circumstances in which they are being asked to do something that is unreasonable or beyond their expertise, or may allow them to direct their efforts towards the matters at issue. I have seen many cases in which psychologists had limited or minimal acquaintance with key issues in a case and overlooked them in their assessments or reports. It is difficult to conduct an optimal or proper examination when one does not know the specific purposes to which it should be directed. For example, exhaustive cognitive testing may be of little use if there is really no issue in the case related to intellectual functioning, or occupational functioning may not be covered thoroughly although it is far and away the most important area of dispute.

Monetary awards are intended to cover past, present, and future expenses, and to compensate for losses that follow and are due to the injury. Expenses might include damaged property, needed equipment (e.g., a cervical collar), alterations of one's home setting (e.g., a wheelchair-accessible ramp), and treatment. Treatment might be one of the largest, if not the largest, component of damages. For example, in a serious brain injury case, extended inpatient rehabilitation and outpatient services may cost hundreds of thousands of dollars.

Losses might include earnings, pain and suffering, and, at least theoretically separate from the latter and where allowed, "hedonic damages." Hedonic damages refer to lost opportunities to experience pleasurable or positive events (as opposed to experiencing negative events), such as the enjoyment of attending one's senior prom. Consortium claims are also common, in which the spouse claims that the injured partner is less able to fulfill marital roles (e.g., the injured person has decreased sexual interest or capacity). With lost earnings, one considers time missed from work to date and whether the person has returned, or likely will return, to gainful employment. If the
individual has resumed working, decreased likelihood of promotions or ability to compete for higher-paying jobs can be considered. In the case of younger persons, projections might involve the type of jobs the individual may have held in comparison to post-injury work capacities. In principle, damages should flow from the injury. An individual who previously had a poor marital relationship should not recover for the marital problems that pre-dated the event or were caused by other factors.

The above discussion should make it clear that individuals are generally not supposed to be compensated for having a condition per se, but for the consequences, problems, suffering, or dysfunction associated with the condition. For example, if two individuals both have Post-Traumatic Stress Disorder (PTSD), but one is usually able to control her symptoms, needs limited treatment, and is functioning well at work, she is likely to be compensated much less than the individual who has broad impairment and cannot hold competitive employment or maintain intimate relationships. This emphasis on the consequences of injury is one reason it is so important for an evaluator to go beyond a label and the immediate office setting and try to appraise, if scientifically viable methods are available, an individual's life functioning (see further below). Failure to do so can result in a serious over- or underestimation of damages. For example, an individual with orbital frontal brain damage may seem relatively unremarkable on cognitive testing and structured interview but may exhibit incapacitating problems in everyday behavior, whereas another person with markedly elevated scores on personality testing and many complaints may be functioning proficiently.

Although it may fall within the psychologist's purview to describe areas of positive and negative functioning and to formulate projections over time, certain technical aspects of damage appraisal, especially their conversion to monetary values, are usually better left to others (which does not necessarily or always mean that one endorses their methods). For example, life care planners may list the equipment a brain-injured individual needs and the cost of various items, or an economist may translate conclusions about occupational capacity to numbers. Psychologists should be very careful not to undertake tasks for which they are not well qualified or knowledgeable or for which underlying methodology is dubious. As will be discussed, effective cross-examiners have refined abilities to detect weak points and to start tearing at these very locations. The credibility of an otherwise exemplary evaluation may be quickly demolished when the lawyer starts asking the psychologist about the grades he received in business classes or technical questions about calculating interest rates and cost-of-living increases. It often only takes one or a few instances in which an expert has been shown to express opinions in areas she knows little about before the jury assumes the expert does not know much about anything.

Also, although I will not repeat the same point over and over throughout the chapter and instead will mainly emphasize practical consequences, there are compelling ethical reasons to restrain practice in relation to level of expertise and the availability of adequate scientific methodology/backing. Parties have much to gain and lose in legal proceedings, and the hope and expectation is that the expert in a branch of science will assist juries in reaching sounder decisions than they would otherwise, something that demands some reasonable level of knowledge and scientific underpinning.

A psychologist need not go to law school to become an effective forensic examiner, but should try to understand potential differences in the clinical and legal arenas, including the basic rules or principles of procedure and just what the issues are that define the scope of the expert's involvement and task. For example, in the clinical arena: (a) we are concerned first and foremost with the client's best interests (within the bounds of ethics) and we hope the client perceives us (or comes to perceive us) as such; (b) we usually assume that clients are not motivated to deceive us for purposes of self-gain, potentially at cost to innocent parties, and we usually do not obtain collateral records specifically for the purpose of checking on their veracity; (c) we may not be particularly concerned about cause, especially if we have narrowed possibilities down to the point that the alternatives do not change treatment; and, as follows, (d) we tailor our assessment to clinical needs and treatment questions.

In contrast, in the legal arena: (a) we presumably try to render an objective opinion, even if it might conflict with the examinee's self-interests, and we (should) assume that the examinee realizes we are a potential adversary (and ought to inform him of such); (b) we recognize that examinees might be motivated to deceive us and that we need to take active steps to check on their credence; (c) we often must focus on cause and may attempt relatively fine or subtle distinctions that may not impact on treatment at all; and, as follows, (d) we need to address, where appropriate and possible, key legal questions, such as everyday functional capacities, as opposed to what might otherwise be our main concerns (e.g., subjective beliefs and perceptions).
As is also evident, in a clinical context we do not typically expect our ideas or conclusions to be subjected to cross-examination designed to counter our view or to damage our believability or character; we might feel free to engage in hypothesizing and conjecture when trying to achieve greater understanding of our patient; and we do not expect the examinee to suffer potentially devastating consequences even if his claim is worthy but we cannot defend our opinions in court. Legal evaluators try to keep in mind the eventual uses of, and challenges to, their work. They apply this awareness to guide their efforts from the start, one hopes in the direction of performing high-quality work that addresses the legally relevant issues and respects the bounds of scientifically sound method and opinion.

Finally, with all this said, juries' decisions may overlook technicalities that, in theory, should be decisive or at least highly influential. For example, although in principle the plaintiff usually must prove his case (the standard often being "more likely than not" or "by a preponderance of the evidence"), other considerations may prevail. For example, in one case, jurors interviewed after the trial agreed that they thought the plaintiff probably was not brain damaged, but that, in case he might be, they wanted to give him enough money to pursue the treatment he might require. In other situations, one or the other side may be perceived in such a negative light that judgments that should not be swayed by such reactions may nevertheless be altered. For example, the jury may be overly generous in deciding fault in favor of the defendant because they found her honest and likeable and thought that the plaintiff was an undeserving, devious individual. An expert who fares poorly or who is perceived as dishonest can in fact do considerable damage to her own side on issues that might seem only remotely related because she may cast doubt on others, such as the lawyer who retained her.

4.19.2.3 Treating Professionals Called into Court

4.19.2.3.1 Maintaining the treatment alliance as primary

My comments distinguishing the clinical and legal contexts assume, in the latter instance, that the psychologist has been asked to perform a legal assessment and knows from the start that this is the task. Treating psychologists may be called into court and, having served until then in a clinical role, often cannot be expected to have approached the case in a manner, or to have taken the steps, expected of a legal examiner. Further, the psychologist might find certain expectations for legal examinations (e.g., taking little or nothing the patient says for granted and performing careful checks on credibility) contrary to the role of treater. Psychologists who testify in such circumstances usually are best served by openly acknowledging any limits in their methods. Any decrement in credibility associated with such limits may be more than offset by the believability often accorded to treating professionals as opposed to experts hired by one or the other side, and by the jury's relative intolerance for a lawyer who is overly aggressive with a treater, especially one who is likeable and candidly discusses shortcomings. For example, if the lawyer criticizes the psychologist for failing to obtain extensive background information, the psychologist might comment that were he conducting a legal assessment he would have sought this information, and that it might have been useful, but when the patient presented to him with suicidal ideation he was not worried about how he would look in court but about preventing a tragedy. In contrast, if the psychologist becomes defensive and insists the lack of background information is irrelevant or that such records could not possibly change his opinion, his credibility is likely to be damaged and he might no longer be perceived as the type of doctor the jurors themselves would seek out for treatment.

Treating psychologists may want to try to avoid testifying altogether, at least in cases in which they are concerned about damaging the therapeutic relationship. The specter of court can also compromise treatment relationships from the beginning. For example, parents who are considering divorce may be more concerned with impressing the psychologist favorably, just in case a custody suit eventuates and they need a helpful opinion, than about solving their problems.

When initiating assessment or beginning therapy with possible courtroom involvement looming in the future, and especially when circumstances are such that it threatens to impede progress, it can be helpful to take an immediate position that elevates the therapeutic role as primary and minimizes, to the extent possible, the likelihood that the present clinician will impact on any subsequent legal proceedings. A psychologist cannot assure a patient she will not be called on to testify in court and should think twice before assuring the patient that, even if called, she will not comply (because this could lead to serious sanctions, such as jail time). However, the therapist can say, with all sincerity, that she considers the treatment role to be primary. She can indicate that if asked to serve as a potential witness, she will tell the
attorney (or either attorney) that she has not attempted to form an objective or detached opinion and is likely to be biased or strongly influenced by the role as treater, that is, that she does not have, nor has she tried to develop, the type of impartiality desired of a testifying expert. She would therefore advise the attorney to retain an independent expert to perform an evaluation and address the issues relevant to the courtroom. Some potential clients who, in fact, are primarily interested in using the clinician for courtroom leverage may abandon treatment immediately.

If the treating psychologist conveys such a therapeutically-oriented position to the attorney, he may well retain a separate expert. Attorneys have strong reservations, if not terror, about calling witnesses that are supposedly there to support their position but that they cannot control or predict. The damage caused by an adverse opinion is usually much greater if it comes from the attorney's own witness as opposed to that of the other side. The attorney may still call the therapist, not to state expert opinions, but simply to describe facts about such things as entries in the chart or treatment costs to date. Of course, it is sometimes difficult for the treating psychologist to arrive at a satisfactory solution, because a patient, although initially endorsing the psychologist's position of neutrality, ends up feeling disappointed or betrayed when the moment of truth arrives and abstraction becomes reality. Clients also need to realize that the therapist's commitment to treatment over legal involvement is not carte blanche, because certain behaviors or actions (e.g., child abuse) would not only likely alter obligations but are also associated with reporting requirements.

4.19.2.3.2 Release of assessment or treatment records

Although a series of generalizations can be provided about the release of records, only limited specifics are possible given both restrictions in the author's knowledge and the complexities and idiosyncrasies that often arise. For example, situations may occur regarding minors in state custody and who has the authority to release records; special questions may be raised when the mental competence of the patient is at issue; therapists may feel a strong obligation to review records in detail with former or present patients before releasing them; psychologists may be concerned about guarding the security of test materials; the clinician may be unsure what constitutes his file (e.g., does this also include records from other providers); or records may make references to others that are not party to a suit. Malpractice cases raise an additional set of concerns; for additional details and guidelines, the reader is referred to Bennett, Bryant, VandenBos, and Greenwood's helpful book (1990).

Some psychologists believe they can assert a right of confidentiality when their records are subpoenaed in legal cases, especially if they or their patients never anticipated courtroom involvement, the legal issues seem indirectly or minimally related to the therapeutic work, and the file contains personally sensitive materials. On the contrary, confidentiality is a right of the patient, not the therapist. If the individual wants his records released (and especially if he is aware of the contents), and if there is no reason to believe the person's decision-making powers are seriously impaired, the therapist is obligated to comply. (Issues related to collateral records that the therapist has not generated, or to copyrighted materials such as test items, can become complex and will not be taken up here.) Further, in civil cases in which an individual has placed his mental status at issue, the confidentiality of past or current therapeutic materials is likely to be waived, no matter what the client prefers (unless she drops her case or at least certain damage claims). There are exceptions, however. For example, some states have exceptions for alcohol or drug treatment, and juvenile records are often protected. Further, records may contain very sensitive materials that seem irrelevant to the case at hand but that could be terribly embarrassing to the client or might damage her public image and occupational endeavors. In such situations, the responsibility usually resides with the plaintiff's attorney to protest the release, or use, of the materials, although the keeper of the records (i.e., the therapist) may need to alert the appropriate party to the presence of sensitive materials in the file. A judge might choose to review the records privately (in camera), and then decide whether to release all, none, or parts of the material. As a general guide, the psychologist should not release records unless he has the consent of the patient or a court order (not just a subpoena) demanding their release, although one should not simply ignore a subpoena (see further below).

It is mistaken to assure a patient when beginning an assessment or therapy that she has an absolute right to confidentiality or that there are only a few circumstances in which records can be obtained by third parties. There are, in fact, many exceptions to confidentiality in most states, often over a dozen, and patients may end up disclosing materials that damage their courtroom cases or reputations that they otherwise would have withheld had they been properly informed. For example, an individual in a custody case may not want to tell the
therapist about occasional wild sexual fantasies that he can easily resist and would never pursue.

A few additional points can be noted. First, the psychologist should not just ignore a subpoena, and certainly not a court order. This does not mean that records must be released immediately, but if the psychologist is going to resist doing so or needs more information before acting, he needs to communicate the basis for the initial noncompliance to the relevant party in a timely and proper manner. In such cases it might be wise to consult an attorney. Second, it is usually wise to assume that treatment notes can be obtained in a legal case, or at least to recognize the real possibility that some other party might gain possession of the psychologist's records. Third, do not alter records. In a malpractice case, for example, it is not only obviously unethical and illegal to alter records, but it is often fatal to the case. Some psychologists seem naive about the mechanisms, and sometimes relative ease, with which altered records can be detected. Altered records demolish the clinician's credibility, and once this occurs the case is often, in effect, over. Fourth, if in doubt, contact an attorney, or at least another professional who is knowledgeable about legal matters. Issues relating to the release of records can quickly become complex, the adversarial nature of courtroom proceedings may be foreign to the professional, and the potential consequences of improper decisions can be serious, thereby calling for caution.

4.19.3 ADMISSIBILITY

The issue of admissibility, or what is allowed into evidence, has occupied many legal scholars and has many facets that often become extremely intricate. The issue that will concern us here is when, or whether, psychologists are allowed to testify as experts, and the extent to which their testimony might be constrained. To clarify the latter point, although a psychologist may be allowed to testify, the scope of testimony might be restricted. For example, in a criminal case, she might be allowed to address issues related to symptoms of PTSD that she believes the claimed victim of an assault manifests. However, she might not be allowed to address whether the examinee's presentation fits expectations for victims of violent rape, as such testimony might be seen as unreliable and as invading the province of the jury to decide matters of fact or ultimate issues. Various types of experts with various types of credentials testify in court. Psychological testimony is usually considered to fall within a branch of science, and thus the applicable standards are those relating to the admission of scientific evidence.

Standards for admissibility often vary across federal and state courts, and from state to state, and judges may vary considerably in the way standards are applied. What one court and one judge lets in, another might bar. There is little doubt, however, that the Supreme Court's recent ruling in Daubert v. Merrell Dow (1993), which involved the admissibility of scientific evidence, is having considerable impact. Many states are guided, in large or substantial part, by these new Federal guidelines. Briefly, prior to Daubert, admissibility of scientific evidence in Federal court was decided by two sets of standards: Frye (1923), which emphasized such considerations as acceptance within the scientific community, and a separate set of Federal standards. In Daubert, the court rejected Frye, elevated the Federal standards, and further elaborated upon them. Although the exact impact and interpretation of the Court's ruling will be clarified gradually as more and more cases are decided, at least two relatively clear trends have emerged.

First, Daubert explicitly acknowledges the judge's role as gatekeeper in deciding whether or not to admit scientific evidence. This means, in essence, that judges will likely feel freer to exclude expert evidence on scientific (or purportedly scientific) matters with less concern about reversal.

Second, in a number of cases, Daubert has led to the exclusion of evidence that might otherwise have been admitted under Frye, particularly in instances in which scientific foundations are weak (see Bersoff, 1997). In deciding whether to admit evidence and in evaluating scientific status, Daubert directs the judge to consider such matters as demonstrations (or lack thereof) of scientific validity, whether findings have been published in peer-reviewed journals, and whether the method has a known error rate. None of these considerations is necessarily dispositive, and the court did not attempt to create an exhaustive list of criteria nor to place them in hierarchical order. Thus, Daubert is very clear in situations in which all scientific indicators are either positive or negative, and increasingly incomplete in its guidance when indicators conflict (although the same basic point could be made for methods and approaches that practicing scientists and philosophers of science use to try to settle scientific disputes; see Faust & Meehl, 1992). As would be expected in such a circumstance, there has been, and undoubtedly will continue to be, considerable variation in the application and interpretation of criteria for deciding whether evidence meets the Daubert test. However, the weaker the scientific support, especially in extreme cases, the greater the
likelihood post-Daubert that evidence or testimony will be excluded. Consequently, in cases in which an expert, when challenged, cannot cite any decent scientific support for his opinion or assessment methods (e.g., there are no published studies), there is a very real chance that the psychologist will be prohibited from testifying. Overall, Daubert has reinvigorated challenges to the admission of scientific testimony, the potential testimony of mental health professionals and others is being placed under much greater scrutiny, there are an increasing number of decisions in which testimony has been excluded, and some reduction of junk science in the courtroom seems to be occurring.

What does all of this mean for the mental health professional? In an increasing number of situations, especially when the scientific foundation for testimony is weak, experts can expect challenges to the admission of evidence. In Federal court and in an increasing number of state courts, there is now more than a remote chance that testimony may be limited or excluded if there is minimal scientific support for the expert's opinions or methods, or if the literature is predominantly negative. Testimony might be allowed in areas in which scientific backing is stronger, but if foundations are weak across the board the psychologist may be barred from testifying entirely. Most commonly, an objection to the admission of testimony takes place before the trial. Written documents are submitted and, depending on the judge's discretion, there may be a pre-trial hearing at which arguments are heard and experts might testify.

In some areas there seems to be little question that psychological method or knowledge meets the Daubert standard. For example, many statistical methods have sound scientific foundations. There is also a good deal known about the potential impact of mild to severe brain injuries, at least in general (as opposed to the difficulties that might be involved in assessing specific impact in individual cases). In other areas, however, psychologists may be very susceptible to Daubert challenges. For example, many neuropsychologists construct their own idiosyncratic batteries, and there may be no peer-reviewed studies on the effectiveness of these batteries as a whole. Predictions of long-term outcomes may also lack scientific foundations, or studies of their accuracy may have produced consistently negative results.

Although a psychologist might view Daubert as an unenlightened or unjustified restriction, the hurdle that it creates, that evidence purported to be scientific should have a reasonable scientific basis, hardly seems outlandish. In fact, standards of this type would seem to be something that organized psychology, with its stated commitment to science, could support. Such standards might ultimately improve psychology's position with the courts, given the methodological sophistication of many members of the field and the potential to build or extend strong scientific foundations in key areas of legal interest. For now, a psychologist might want to think carefully before agreeing to perform an assessment or provide testimony in areas with weak scientific underpinnings. Of course, this same argument certainly could be made whether or not there is likely to be a Daubert challenge. I would note in this context that although psychologists might believe that experience can partly or fully compensate for scientific shortcomings, even serious ones, the extensive literature on experience and accuracy raises major questions about this conclusion. In fact, the negative results of many studies on experience and accuracy in the mental health field have been noted by diverse individuals (e.g., Faust, 1984; Garb, 1989; Brodsky, in press, the latter being a revision of his earlier views [1991]).

Even if the courts allow testimony on some topic, a psychologist may feel that the state of knowledge is insufficient to develop trustworthy opinions and thus may opt not to be involved. Just as the courts may err in excluding testimony, they may err in admitting it; and it is a dubious position for a professional in a branch of science to assert that so long as it's good enough for the courts it is good enough for me. Given psychologists' methodological knowledge, situations certainly can arise in which the professional believes that a judge has overestimated the scientific standing of a test, method, or assertion and that, instead, knowledge is very shaky or dubious. Most psychologists' courtroom involvement is voluntary, and there is usually no external authority that compels the psychologist to testify if she does not wish to do so, or that will invoke official sanctions for nonparticipation. In these situations, psychologists can apply internal or professional standards in deciding on their courtroom involvement. (For a more extensive discussion of dimensions a professional might consider when determining a method's readiness for the courtroom, see Faust, 1993).

The court does not decide admissibility solely on matters relating to science. In addition, testimony must be relevant to the issue at hand. For example, a psychologist might be a first-rate authority on test bias, but the topic might not arise in the case, or it might be so secondary or remote to any of the matters in dispute that the judge excludes the expert. Additionally, the expert must contribute something over and above what the jury knows or can determine on its own. For example, if the defendant's left foot
has six toes and webbed feet and a plaster cast matches the pattern exactly, the jury probably does not need a footprint expert to provide commentary on the resemblance.

Experts also need to have adequate credentials. Usually, before opinions are expressed about issues in the case, the lawyer has the psychologist recite her credentials and then offers (proffers) the witness as an expert in some area (e.g., clinical psychology or PTSD). The expert should have credentials to support the offer. Psychologists may not realize that the standards judges apply when deciding whether to qualify the expert are often relatively minimal (e.g., a Ph.D. degree and some education and training in the area) rather than lofty, or that jurors may have difficulty separating, or care little about, differences in credentials that might strike the professional as weighty. A few publications might seem about as good as 20 to a jury, and a Ph.D. is likely to be an impressive degree whether or not it came from a big-name university. Ironically, some experts create major problems for themselves by failing to represent their credentials with a high degree of accuracy, when the difference often would matter little to a juror or to a judge evaluating qualifications. For example, an expert might exaggerate the number of patients seen with a particular problem, or might represent himself as a graduate of an APA-approved training program when, in fact, the program was accredited after the time of graduation and the listing is consequently inappropriate.

Take the last example. If the credential is listed accurately, the psychologist might be able to indicate, perhaps during his direct, that his program gained APA approval shortly after he graduated. The psychologist could further indicate that although the APA, in essence, approved the same program he attended, it would be technically incorrect for him to list himself as a graduate of an APA-approved program, and he would not want to create a false impression. A juror is unlikely to give this a second thought, and it might even be a plus as it conveys honesty. In contrast, the following cross-examination illustration shows what can happen with even a seemingly small misrepresentation:

Question (Q): Doctor, is there anything you'd like to correct in your direct testimony before I start asking you other questions?
Answer (A): No.
Q: Doctor, is there anything about your credentials you may have stated in error?
A: No, not to my awareness.
Q: Doctor, do you remember raising your right hand before you started to testify?
A: Of course.
Q: And during that oath, you gave your word that you would tell the whole truth, isn't that correct?
A: Yes, I did.
Q: And doctor, you told us, did you not, that you were a graduate of an APA-approved program. Wasn't that your testimony?
A: Yes.
Q: That was a misrepresentation, wasn't it?
A: Absolutely not.
Q: And your status as a graduate of an APA-approved program, that's the same representation contained in your resume, isn't it?
A: Yes.
Q: And this is the resume you send to lawyers, and have presented in court many times, isn't that true?
A: Well, I don't know about many times, but it is the information I have provided.
Q: It is the same information you would present to patients, would you not, if they asked you whether you came from an APA-approved training program?
A: I'm not sure a patient would ask such a question.
Q: Doctor, I'm sorry if my question was unclear to you. If asked by a patient, you would tell them that you were a graduate of an APA-approved program, correct?
A: Yes.
Q: Then let me ask you one more time. You are telling the ladies and gentlemen of the jury that it is true, without qualification, that you are a graduate of an APA-approved program. You are giving us your word, is that correct, just like you promised to be honest about the other areas in which you testified?
A: Yes.
Q: And, doctor, you graduated in 1985, isn't that the case?
A: Yes.

The lawyer can then produce an issue of the American Psychologist that includes the section indicating that a psychologist cannot describe his- or herself as a graduate of an APA-approved program unless the program was accredited at the time of graduation. Things can almost only go on a disastrous downhill course from there. For example:

Q: Having read that, and seeing the listing for your program and the date of approval as 1990, the truth is you are not a graduate of an APA-approved program, isn't that the case?
A: The listing is probably in error.
Q: Is that right, doctor? I'll tell you what, we can check on the listings from other years of the American Psychologist, which I have on
the table here, and the sworn statement I've also obtained from the director of your graduate program. Do you think that will help clarify the truth of the matter?

A misrepresentation of this type can be fatal not only in the instant case but also in future cases. For example, in a subsequent case, a lawyer might ask the expert whether she has been busy calling all of the judges, lawyers, and patients with whom she's been involved to inform them that she misrepresented her credentials.

4.19.4 ASSESSMENT METHODS

In this section I will present various "tips" for performing legal assessments. Many of these guides apply similarly to conducting quality clinical assessments, although there are also a number of features distinct to forensic work.

4.19.4.1 Informed Consent

The fact that an evaluation is being performed for legal purposes typically does not alter the need to obtain informed consent before proceeding. The nature and purpose of the assessment should be presented, and it should be made clear that the clinician's primary obligation is not to advance the examinee's interests but to address certain questions, whether or not the conclusions help or hurt the individual's case. The expert should indicate who retained him, such as plaintiff or defense counsel, or judge. The expert should explain that his role is to perform an evaluation and not to provide treatment, and thus that there is no doctor-patient relationship. (This certainly does not mean the psychologist should not make treatment recommendations she deems appropriate or helpful.) At this point, the examinee may not consent to the procedure, which, in civil litigation, and often in criminal cases, is his perfect right.

There are cases in which experts are retained to perform evaluations, but with the understanding that they themselves may undertake treatment if they deem it appropriate. For example, the plaintiff's attorney may retain an expert she esteems to perform a neuropsychological evaluation of head trauma and, as needed, to provide rehabilitative services. The involvement in the dual role of forensic examiner and treater can create complications. For example, once engaged in a therapeutic relationship, it may be difficult, when it comes time to testify, to be as objective as one otherwise might be or to express views that will likely harm the plaintiff's case. Also, challenges can be raised about a financial incentive to diagnose disorder in order to engage the individual in treatment for which one can charge. For these and other reasons, it is probably best, when possible, to avoid serving as both a forensic examiner and treating psychologist.

4.19.4.2 Clarifying Referral Questions and Deciding Whether They Are Appropriate and Within the Clinician's Expertise

Lawyers usually make initial contact with an expert on a specific case by written correspondence or phone. Such communication will commonly eventuate in a request to perform an evaluation and perhaps some commentary about the questions the lawyer wants the expert to consider. At times, such requests are vague. For example, the lawyer might ask the expert to evaluate psychological status but not even specify whether she is primarily interested in cognitive or emotional status, or both, or neither.

Vague questions may reflect the lawyer's lack of familiarity with issues in the mental health field, and the expert should attempt to clarify just what is at issue in the case, such as the possibility of brain damage, emotional disturbance in response to a traumatic event, present and future work capacity, malingering of mental disorder, or whatever. The lawyer may be unsure about what a psychologist, or the expert in question, does and does not do and can and cannot do; and the psychologist may need to gain some familiarity with the file before a more productive conversation is possible or greater clarification can be achieved. For example, the lawyer may not realize that psychologists cannot prescribe medicine, that not all clinical psychologists are trained to evaluate neuropsychological status, or that many psychologists are not fully prepared to evaluate members of minority groups. Or, they might not realize that psychologists often provide treatment services, that many clinical psychologists are very familiar with various matters involving psychotherapy, or that the condition in question has a grave prognosis and that this issue would seem highly relevant to the plaintiff's case. It is not that the psychologist should do the attorney's job, but rather that the psychologist's input may be needed to home in on relevant issues in the case or what is and is not possible to accomplish through a psychological evaluation.

Referral questions also need to be clear so that the clinician can determine whether the required scientific foundations exist to provide informed answers, and whether she has the requisite expertise. It is not only ethically
questionable to take on issues or questions for which one lacks the needed knowledge, but it can easily lead to courtroom fiascoes. It is rather unnerving to have to admit on cross-examination in open court that this is only the second case of Fisbee's disease one has seen and that almost all of one's scientific knowledge on the topic was acquired through concentrated reading during the last week. This from the same expert who has contended, in effect, that the jury should defer to him over Dr. Jones, who has written the seminal work on the disorder and has acquired exhaustive knowledge about the condition through years of study. The psychologist is also likely to embarrass herself if a well-prepared and knowledgeable lawyer begins to ask about specifics. For example, someone who attempts to acquire quick knowledge about neuropsychology is unlikely to know what the abbreviations HEENT and GCS in the emergency room record mean, or may startle the jury when the response to the inquiry, "How many journal articles have you read devoted exclusively to mild head injury in the last 5 years," is, "None." There is a saying in aviation that there are old pilots, and there are bold pilots, but there are few old, bold pilots. This might be remembered as one engages in internal debate about whether or not to take a case. The psychologist might get by with questionable work for some time, but a single encounter with a skilled and well-prepared lawyer can inflict serious, long-term damage.

4.19.4.3 Access to Information

As will be seen, the quality of legal evaluations, and how well they stand up under scrutiny, often depend on the thoroughness of information gathering. Further, the time at which information is received can matter a great deal. For example, a psychologist who forms an opinion rapidly and in the face of inadequate information and who never revises his views in the slightest way, even after obtaining a great deal of additional information, may not seem credible. Also, a jury might look askance at an expert who reaches opinions and perhaps issues treatment recommendations, or even begins treatment, well before much of the information has been obtained, especially if that information was readily available. For example, many experts, who may have completed their reports a year or more earlier, indicate that they received or reviewed additional key information the morning of the deposition or the night before their courtroom appearance. It might be difficult for the jury to believe that the psychologist could be completely dispassionate in her use and weighting of the additional information under such circumstances.

Further, given the uncertainties and ambiguities that often exist in our field, it would be surprising if considerable amounts of new information did not contain at least some evidence contrary to our conclusions, or even evidence that required new areas of inquiry or raised serious questions about the initial opinion. Receiving such information at a late date can therefore create major difficulties on the stand. For example, the expert might be asked a line of questions like the following:

Q: Don't you agree that mild head injury and the effects of serious alcohol abuse can look similar on cognitive testing? (If the expert disagrees, a statement he may have made in his deposition or in other cases confirming this assertion can be raised.)
Q: And didn't you describe the level of drinking that can result in abnormal performance on cognitive testing?
Q: And didn't you indicate that one of the ways you ruled out alcohol as a factor in Mr. Smith's testing results was his telling you that he had almost never touched a drop in his life?
Q: Until yesterday, when attorney Jones gave you the records, you were unaware of Mr. Smith's previous treatment for alcohol abuse, weren't you?
Q: So, when you formed your opinion and issued your report, you had not seen the treatment records describing Mr. Smith's history of alcohol abuse, had you?
Q: Surely this is the type of information you would have preferred to have known about when you conducted your evaluation, isn't that correct? (The lawyer is unlikely to care much, and may even welcome it, if the psychologist wants to fight her on this point, because it will probably seem unreasonable to the jury and further compromise the expert's standing.)
Q: And had you had this information, you would have asked additional questions about substance abuse, wouldn't you?
Q: You would have asked questions to find out whether Mr. Smith drank to the point that you would expect abnormalities on the testing, isn't that right?
Q: But it's too late to ask those questions now, isn't it?

An expert who denies the import of such questions may lose all credibility with the jury.

Many lawyers will not know what records a psychologist needs to perform a proper evaluation (see more on this below). Thus, it is up to the psychologist to inform the lawyer. The
mechanics for obtaining information (as opposed to a decision about what information to obtain) can be worked out in different ways, although it is often appropriate for the lawyer to do the needed leg work. It can be very difficult or impossible to obtain certain types of information (e.g., earlier school records in the case of an elderly individual), but it may become obvious that a lawyer is not really motivated to carry out the psychologist's requests and rather wishes to control the flow of information. This reluctance and attitude may severely compromise the quality of the evaluation and place the psychologist in a vulnerable position. For example, on deposition, an expert might be asked about the type of information she routinely procures, or prefers to have, when performing a comparable evaluation. Procedures followed in similar cases in the past might be raised. Discrepancies between the usual and present case can be pointed out, and the expert can then be asked to explain the reasons why the current evaluation is so much less complete, the extent to which gaps in information compromise the trustworthiness of the conclusions, and who has controlled the flow of information. A careless response to the effect that such extensive information is unnecessary can lead to such questions as, "Are you telling me that when you gathered far more extensive information in your many other cases and charged thousands of dollars for the totality of your time reviewing those materials, all that was unnecessary?" An expert who is true to the oath and admits that the lawyer exercised control might be asked whether lawyers taught their assessment courses in graduate school or whether it is a routine procedure when performing a psychological evaluation to consult a lawyer in order to determine what information the professional needs.

An expert might also contemplate whether she wants to be involved with a lawyer who tries to exercise this type of control. There is a big difference between a lawyer pointing out legal technicalities that fall within his purview (e.g., what standards of evidence will be applied or what exactly is being disputed in the case), and one who tries to control how the expert performs activities that fall within the psychologist's professional domain. One way to handle possible differences in viewpoint on such matters is to tell the lawyer that one will feel compelled to express reservations about one's conclusions given the present restrictions the lawyer suggests. Both parties might then make more informed decisions about course of action, including whether to part ways.

This does not mean that every difference, even a very small one, necessitates an all-or-nothing stance, that the lawyer will never have a compelling counterargument, or that all such differences reflect a lawyer trying to control the expert. For example, a record a psychologist might ideally like to have may really be very remote to the issues in dispute and extremely difficult and expensive to obtain. Or, due to time limits imposed by the court, a psychologist may have to decide whether to perform an evaluation before all of the desired information is obtained. The psychologist might decide to go ahead, but to be sure the report contains an explicit statement describing the circumstances and indicating that opinions are subject to change or elaboration as more is learned. However, psychologists who take the long view probably should err on the side of caution. There are certainly some attorneys who wish to control access to information in order to increase the chances of a desired result, or who plainly want to manipulate the outcome. A psychologist will not only be on safer ethical ground but likely to have a much longer career as a forensic evaluator by avoiding such lawyers.

4.19.4.4 Design of Assessment Procedures: Some Do's and Don't Do's

4.19.4.4.1 Use the best available methods

It is important for psychologists to use the best available methods, not only to maximize the probability of reaching accurate conclusions, but because their work may be placed under intense scrutiny in legal proceedings. For example, the psychologist may be asked a series of pointed and specific questions about technical issues. Good attorneys can have a remarkable ability to pick up quasi-expertise rapidly, retain it, and generalize it to new contexts. An occasional attorney will ask questions about such matters as false-positive rates, standard error of measurement, or criterion-based validity with a proficiency that can shock an unprepared expert. Attorneys may also retain consultants to scrutinize records, educate them about technical matters, and provide them with background literature on the methods used in the case in order to better prepare for depositions and cross-examination.

Psychologists may use tests with poor normative information when better choices are available, or may use old or obsolete norms when more contemporary or complete normative information exists on the same test. For example, many neuropsychologists still use earlier norms for the Halstead-Reitan Battery (Reitan & Wolfson, 1993), which are decades old and which show a marked propensity to
overdiagnose certain groups of individuals, such as older and less educated persons. More recent norms of the type that Heaton, Matthews, and Grant (1991) developed, which adjust for age, education, and gender, can decrease overdiagnosis considerably.

Additionally, norms developed for one group of individuals may not be wholly applicable or may create major problems when applied to other groups of individuals. For example, norms for measures of motor speed developed with young individuals may be applied to the elderly, or norms developed among members of the dominant culture in the USA may be applied to recent immigrants with very limited English proficiency. Psychologists may be unaware of specialized normative data that have been developed for differing sociodemographic groups or, when such information is unavailable, may act as if the application of norms developed with one group to another group with distinct differences could not possibly raise any particular concerns or issues. When these types of problematic practices occur, lawyers can introduce materials from such sources as the Standards for Educational and Psychological Testing (APA, 1985), which describe the need for normative data appropriate to the individual under consideration.

For a substantial number of psychological tests, there are inconsistent sets of norms that may lead to very different interpretations. For example, depending on the norms used for the Halstead-Reitan Battery, the number of errors on the Category Test that falls two standard deviations beyond the mean may be less than 50 or more than 100 (see Faust et al., 1991). In the case of the Auditory Verbal Learning Test (AVLT) (Rey, 1964), a list learning task, many psychologists rely on the original norms developed many years ago in France with relatively small samples. There have been many subsequent studies showing that those norms are too demanding for many groups (e.g., Bolla-Wilson & Bleecker, 1986; Wiens, Crossen, & McMinn, 1988). A psychologist may be asked questions like the following on this topic:

Q: You told us that the plaintiff had problems on the Smith Memory Test, isn't that correct?
Q: And the Smith test is the one with the list of words, like a grocery list, correct?
Q: The standards you used to decide what was normal for the Smith Memory Test were developed more than 50 years ago, isn't that correct?
Q: Those standards were developed in another country where they speak a language other than English, isn't that correct?
Q: And there have been many studies published in the United States criticizing the use of those standards with today's Americans, isn't that true?
Q: There are more than 10 recent studies that would classify Mr. Smith's very same performance as perfectly normal, isn't that true?

At times, it is clear that one set of norms has been better developed than another, or it is obvious that the findings from one study deviate markedly from those of most or all other studies. Unfortunately, at other times, the underlying basis for conflicting norms, or the most appropriate choice among competing norms, is unclear, creating something of an impasse. A psychologist might consider the merits (or lack thereof) of using such tests, especially if tests with a better or clearer normative base are available that are designed to assess comparable areas.

Lawyers may also raise questions about other technical qualities of tests, such as reliability and validity. Some psychologists may see such questioning as meddlesome or something akin to badgering. However, standing on these technical qualities can be all important in assessing the likelihood of an accurate result, they are certainly fair game, and they often do need to be critically appraised. A psychologist may be surprised about the occasional lawyer's sophistication on these topics. For example, on deposition, one might be asked:

Q: In the language of your field, test-retest reliability refers to the stability of test scores over time, isn't that true? (If the expert is evasive on this and other questions, the attorney may call her own testing expert to provide appropriate and clear answers and to show, in effect, that the opposing expert was being less than genuine.)
Q: Isn't it a basic tenet in your field that tests that lack satisfactory reliability will also lack satisfactory validity or accuracy?
Q: Test-retest reliability is often measured by the correlation coefficient, isn't that true?
Q: Correlation coefficients can range from .00 to plus or minus 1.00, isn't that correct?
Q: And in general, higher correlation coefficients are better, or indicate a higher level of reliability, isn't that correct?
Q: Isn't it true that (a prominent psychologist in the area of measurement will be named here) considers a test-retest reliability of .80 to represent a minimal standard? (If the expert does not acknowledge the author or source, there are other ways to get at the same thing, such as by asking what authorities on testing the expert respects, or about texts and
journal articles that were used in the expert's education and training.)
Q: You've not published your own text on psychological testing, have you?

At trial, after reintroducing these topics in clear and understandable language for the jury, the lawyer may proceed to show that various tests the psychologists used do not meet the standards for reliability the witness affirmed and thus, by the witness's own reckoning, cannot be trusted.

Of course, issues related to reliability are important not only because they can lead to courtroom embarrassment, but because adequate reliability is often necessary for reaching accurate conclusions. For example, deficient reliability can impede or cripple interpretive strategies that psychologists often consider essential, such as comparisons between test scores and pattern analysis. If reliability is low, the standard error of measurement of the difference (SEMdiff) may be so great that there are likely to be frequent false-negative and false-positive judgments about whether true contrasts exist across tests. When comparing test scores, and especially when analyzing test score patterns, error is usually additive. Stated differently, the level of measurement error will exceed, sometimes by a great margin, the average level of error per test (with the margin expanding rapidly as the number of scores considered together as part of pattern analysis grows).

Test validity is not a global quality, and hence there is little meaning to a statement like, "The Fisbee Test is highly valid." Rather, validity is a specific quality, and a test that is highly valid for one purpose and with a certain population may lack validity for other purposes and with other populations. That is, validity refers to the interpretations given to test scores and not to the test itself. Therefore, when selecting tests, it is important to go beyond general information and to become familiar with research on specifics, in particular the specifics involved in the current application.

Particularized questions about validity are often essential to legal work because many of the issues psychological tests were originally designed to answer are different, in subtle or not so subtle ways, from those commonly asked in the courtroom. For example, in a criminal case, it may be of much greater interest to determine an individual's mental state at a previous point in time rather than in the present. The possibility of malingering and the need to evaluate for it are often considerably greater in the legal context. The differences that can exist in the purposes of traditional psychological tests and the questions that arise in the courtroom have led a number of psychologists to develop instruments specifically designed to address legal issues, such as competency to stand trial (see Grisso, 1986) and malingering of psychological disorder (see Rogers, 1997a). Psychologists who conduct legal evaluations, especially in areas that traditional psychological tests are not designed to appraise and for which there is little or no research on these measures, might be well advised to become familiar with specialized methods, if they are not so already.

Along related lines, although one test may have usually outperformed another in comparative studies, important exceptions may exist in specific areas. For example, although Test A may be generally better than Test B, there may be much more extensive or positive research on the capacity of Test B to distinguish between the effects of alcohol abuse versus mild head injury. Such differences in level of background research might dictate a reversal of usual test selection. Such possibilities again suggest that specific knowledge about specific test qualities can greatly facilitate test selection.

Legal evaluations may also be weakened by the use of obsolete tests, particularly when more contemporary and improved versions are available. Some tests do not age well, and should the lawyer present some of the items in the cross-examination, jurors may be left shaking their heads and wondering, in fact, who these supposedly famous people that the test required the examinee to identify might have been.

Psychologists may substitute short, and clearly inferior, versions of full or standard forms of tests. Shortened versions commonly are far from perfect predictors of results on full versions of tests. As full versions of tests are usually rather imperfect predictors of the matters at issue, using shortened versions adds error to error (Ziskin, 1995). The lawyer might also be able to introduce statements by the test creator arguing vehemently against the use of short forms.

In some cases, alternate or parallel versions of a test are available, but one of the versions has been much more thoroughly researched than the other and there is inadequate research about the equivalence of the alternative forms. Especially if the test is administered only once, it would seem prudent to use the better known alternative. Similarly, for some tests, there are alternate administration formats, or alternate test materials or equipment, with one version being far better investigated than another. For example, a psychologist may use certain equipment when administering a test of finger tapping speed, but may depend on norms developed on other equipment when interpreting the results.
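To make the earlier point in this subsection about reliability and score comparisons more concrete, the following is a minimal sketch rather than a prescribed procedure. It assumes the classical test theory formulas SEM = SD × sqrt(1 − r) and SEMdiff = sqrt(SEM_a² + SEM_b²), two scores reported on the same scale, and purely illustrative values (a standard deviation of 15 and reliabilities of .90 and .70) that are not taken from any test discussed here. The function names are hypothetical.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement for a single score (classical test theory)."""
    return sd * math.sqrt(1.0 - reliability)


def sem_diff(sd: float, rel_a: float, rel_b: float) -> float:
    """Standard error of the difference between two scores on the same scale."""
    return math.sqrt(sem(sd, rel_a) ** 2 + sem(sd, rel_b) ** 2)


def critical_difference(sd: float, rel_a: float, rel_b: float, z: float = 1.96) -> float:
    """Score difference that measurement error alone would rarely exceed
    (two-tailed, roughly 5% of the time when z = 1.96)."""
    return z * sem_diff(sd, rel_a, rel_b)


if __name__ == "__main__":
    # Illustrative assumption: both tests share SD = 15 and the same reliability.
    for rel in (0.90, 0.70):
        print(
            f"reliability={rel:.2f}  "
            f"SEMdiff={sem_diff(15.0, rel, rel):.2f}  "
            f"critical difference={critical_difference(15.0, rel, rel):.2f}"
        )
```

Under these assumed values, the gap between two scores that would be needed before a contrast could reasonably be called real grows from roughly 13 points to roughly 23 points as reliability falls from .90 to .70, which is one way to see why low reliability undermines score comparisons and pattern analysis.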
Although the use of one piece of equipment over another might seem like a trivial matter, research may show, as it does in this case, that different equipment can yield different results (e.g., Brandon, Chavez, & Bennett, 1986). The psychologist may also have to admit that he ignored the stern admonition of Dr. Reitan, a co-creator of the very battery he used, to avoid alterations in original equipment lest one compromise the interpretive value of results (Reitan & Wolfson, 1993).

I have consulted on more than a few cases in which psychologists used badly deteriorated test materials or made up their own "duplicates" of tests that contained errors in instructions or in the reproduction of stimulus materials. For example, in one case, the stimulus sheets for a task requiring color discrimination were more than a little faded; and in another, a homemade form for an intelligence test contained alterations in standard test instructions. Still another case involved Part B of the Trail Making Test (Reitan & Wolfson, 1993), a paper-and-pencil task, which contains numbers and letters randomly arranged on the page that are to be connected in order (e.g., 1-A-2-B-3-C, etc.). The stimulus materials are arranged so that the continuing line to be drawn from one number or letter to another never crosses over itself, thereby keeping the spatial layout relatively clean or simple. However, the psychologist had mispositioned the stimuli such that the respondent now had to crisscross over the line repeatedly, a new variation that seemingly made the task much harder. In other cases, reproductions of materials from personality questionnaires contained errors: it must have been difficult for the examinee to answer unintended alterations of original items from the Minnesota Multiphasic Personality Inventory (Hathaway & McKinley, 1951, 1983) such as the following: "I sometimes feel that there is a tight bank around my head."

4.19.4.4.2 Obtaining adequate information

One of lawyers' most common cross-examination tactics is to bring up concrete facts that contradict, or seem to contradict, the psychologist's opinion. For example, in a case of purported PTSD, in which the psychologist has described the plaintiff's decided tendency to avoid reminders of the accident, the lawyer may ask if the psychologist knows that the plaintiff replaced the totalled vehicle with a new one of the same make and model and regularly drives on the road where the accident occurred when an alternative route is easily accessible. In general, courtroom opinions will likely be much more vulnerable to cross-examination if the psychologist lacks an adequate factual base. Reaching accurate conclusions about key matters also often demands such information. Consider again that legal evaluations often raise questions that differ from those that are most common in clinical settings and may well call for information gathering that differs in type and amount.

As discussed in Section 4.19.4.3, when information is received can be crucially important. For example, jurors are likely to react negatively if they find that the psychologist formed opinions very early and far before most information was reviewed, or waited until the eve of trial to perform a thorough analysis of records, years after issuing treatment recommendations intended to guide other professionals. The latter can look especially bad if the psychologist has insisted he places his role as a treater above that of courtroom expert and cares deeply about the examinee's welfare. The jury almost cannot help but wonder if this is impression management and disingenuous. After all, if the psychologist really is concerned about the individual, he would not have made important treatment recommendations on the basis of such incomplete information. Also, as noted, new records may call for entirely new lines of inquiry, and if materials are not examined until the 11th hour the opportunity may be lost. All of this again emphasizes the need for the psychologist to be thorough in information gathering in legal cases and, to the extent feasible, to obtain records earlier rather than later.

Information gathering usually must go beyond self-report and testing and include various types of collateral information, such as past and present medical records and reports, employment records, school records, and materials memorializing others' observations or reports about the plaintiff. It may be especially useful to access the observations of a neutral party who knew the plaintiff before and after the accident or event. These types of observations may be contained in background records (e.g., work evaluations) or in depositions, which may be as good or better than interviews. Information gathering will usually address at least four issues: a) prior functioning, b) current everyday functioning, c) possible causes or alternative explanations for the plaintiff's presentation or reported problems, and d) the accuracy and completeness of the plaintiff's report to the examiner.

To elaborate on these four areas, an analysis of prior functioning is important in order to determine the presence and possible extent of changes in functioning. In principle, plaintiffs are not to be compensated for problems that
pre-date the injury. Typical psychological assessment methods, by themselves, are often limited tools for determining prior abilities or adjustment. For example, psychological testing results usually provide, at best, inferential or indirect evidence. Much more direct information about prior functioning often is available or can be made available. Thus, extensive occupational records showing excellent and steady job performance are likely to be considerably more helpful and trustworthy in developing an understanding of past work function than inferences based on an IQ test. There are also, often, similarly direct and independent sources of information about current functioning that can supplement and extend, and serve as means to check on, impressions formulated from the plaintiff's report and testing.

Using collateral records to help appraise the accuracy and completeness of the plaintiff's report is essential, not only because there is a heightened risk of misrepresentation in legal cases, but because inadvertent misrepresentations can occur and lead to over- or underestimates of adverse changes. For example, depressed individuals may understate, or individuals with serious brain injuries may overstate, their current level of functioning. Also, the plaintiff may simply have forgotten, or may be unaware of, important matters. For example, she probably will be unable to state precisely when her developmental milestones were achieved, may be uncertain about or unaware of performances on past standardized testing, or may be confused about prior medical diagnoses. A psychologist is in an especially compromised position when he does not try to obtain collateral information in areas in which the deficits he assumes the examinee manifests would seem to impede that person's capacity to provide the very information the professional seeks. For example, the expert might ask a person, who he believes has serious problems in long-term memory, about remote historical events.

Many sources of information are usually potentially available to psychologists conducting legal evaluations. Past testing frequently can be obtained through such sources as school records, and sometimes through military records and pre-employment evaluations. Past work records, medical records, pharmacy records, psychological and counseling records, and criminal records may also provide important information or leads about previous strengths and weaknesses and about possible causes for presenting problems. One can examine these materials for the facts they provide, and also for direct and indirect indicia of prior functioning and capacities. Forms completed in the past, for example, may allow the psychologist to determine whether his impression about decreased mechanical writing quality is accurate. Information about post-incident and present functioning may be available from similar and other sources, such as work records and work samples. Further, some individuals who stop working return to educational activities or develop new hobbies. Contemporaneous school records may suggest that a brain-damaged patient overreported current academic performance. Alternatively, they may show abilities to learn new skills and to engage in problem solving that raise very serious questions about the accuracy or meaning of low scores on measures designed to tap such functions.

The plaintiff's deposition may provide an extended sample of various cognitive functions, such as language use and the ability to attend to questions. I have been involved in cases in which plaintiffs with purportedly severe attentional and language comprehension problems answered hundreds of questions without a single one having to be repeated and seemed to have no difficulty understanding the lawyer's inquiries, despite the recurrent use of complex grammatical structure and relatively sophisticated vocabulary. Review of the plaintiff's deposition may also provide experts with important information that is not contained in other records, such as materials on past accidents or problems with the law.

Legal cases often involve questions relating to quality of life, day-to-day functioning, and work capacity. Psychologists may not habitually collect detailed information in areas that are directly relevant to these concerns. For example, an accident may have forced the injured individual to discontinue various enjoyable activities or hobbies. It is difficult to address impact on day-to-day functioning unless one develops a reasonably detailed description and chronology in this area. A courtroom evaluation in which the psychologist assigns a diagnosis, describes areas of maladjustment, often in abstract terms (e.g., Smith has elevated anxiety levels), and presents a general treatment plan may be of minimal help in understanding possible damages or the specific consequences of an injury. Too many legal evaluations are devoid of an adequate connection with the individual's day-to-day life and functions. It is usually much more helpful to know what the individual did before and can and cannot do now due to his reduced memory than to know the percentiles for his scores on memory testing.

In the area of work capacity, except in gross or obvious cases, it would often seem difficult to reach a well-grounded conclusion about possible return to former employment without a
reasonable understanding of prior job requirements. A general label or description may not be sufficient, and one needs to understand just what the job required. In one case, the plaintiff indicated he operated a machine that made paper bags, which the psychologist assumed was a rather simple matter. However, it turned out that the equipment was incredibly complex and often required intricate adjustments involving dozens of steps. Psychologists need not feel obligated to address work capacity, even if it is an issue in the case, should it involve matters with which they do not feel sufficiently informed or if they do not believe the required scientific backing is available to perform this analysis. (It would be much better to inform the attorney of this limit in advance, however, so that counsel can consider retaining an additional expert and is not unexpectedly stuck at trial with no one to address occupational issues.) The point is that if a psychologist is going to address work capacity, certain information would seem essential.

A well-rounded picture of an individual and an understanding of her functional capacities is also likely to touch on such topics as self-help skills, household activities and chores, interpersonal relationships, activities outside of the home and job, the state of the marriage, child care responsibilities and capacities, and travel activities. The examiner may also want to know who handles other responsibilities of major importance, even if they come up only occasionally, such as large purchases. There is something potentially inconsistent about a claim of gross reductions in mental ability and the fact that this same individual, with the spouse's full blessing, handled delicate and complex negotiations for the acquisition of the new home.

In this vein, the psychologist might be especially alert to, and actively look for, circumstances in which the benefits and disadvantages of being capable and incapable are reversed. For example, it might be to the plaintiff's advantage to appear quite disabled during the psychological evaluation, but quite capable when applying for a business loan. Consistency in presentation across such circumstances, even when the individual has a great deal to lose, suggests something about the genuineness of incapacity. In one case, a plaintiff in a psychological injury case was also embroiled in an unrelated custody battle. The plaintiff had undergone independent psychological evaluations in each case. On the evaluation conducted for the personal injury case, he endorsed many items in the pathological direction on the MMPI, but reversed these answers in virtually every instance in the custody case: when it was to his advantage to claim disorder he claimed disorder, and when it was to his advantage to claim psychological health he claimed health.

4.19.4.4.3 Conduct technically proficient evaluations

It is not unusual for opposing counsel to obtain the psychologist's complete file, scrutinize it closely, and have her own expert review it as well. Technical problems with the examination may well be uncovered and cause difficulties.

Some of the more common problems I have observed in psychologists' courtroom work include errors in scoring psychological test items and in summing and transforming scores. I have reviewed cases in which psychologists made dozens of scoring errors or miscalculated standard scores by 30 or 40 points. In fact, the psychologist's errors may be close cousins to those he used to diagnose disorder in the plaintiff. The psychologist might then face a series of questions like the following:

Q: You told us that Ms. Smith made errors on simple math problems, and that this entered into your conclusion that she is brain damaged, correct?
Q: You made errors on simple math problems also, didn't you.
Q: You told us that Ms. Smith sometimes did not attend to important details when performing paperwork, and that this was another piece of evidence suggesting brain damage, isn't that true?
Q: You forgot to date a number of your test records, isn't that correct?
Q: Dating a test record might be considered an important detail when performing paperwork, isn't that correct?
Q: And you told us that Ms. Smith sometimes failed to follow fairly simple directions and that this was another indication of brain damage, correct?
Q: Doctor, the instruction you had to follow, "Stop testing after five consecutive errors," is not complex, is it?
Q: And you failed to follow this direction on multiple occasions, isn't that correct?
Q: Let me see if I have this right, doctor. When Ms. Smith makes errors, they indicate brain damage. When you make errors, they're just errors? Strike that, I withdraw the question.

In other cases, psychologists may violate standard testing procedures without any clear or compelling rationale. A psychologist in one case
repeatedly terminated tests prematurely, calcu- Q: And one of the reasons you are not sure it
lated scores as if this had never occurred, failed applies to memory functioning is because we
to mention these alterations in procedure in her have no direct pre-accident memory tests, do
report, and had even published an article on the we?
need to adhere exactly to standardized admin- A: Well, I think some of the past tests reflect
istration formats. Fortunately for her, the case on memory abilities, but it is true I have seen
settled before trial. There are certainly situa- no records of specific memory tests that pre-
tions in which a test must be terminated date the accident.
prematurely or it is sensible to alter procedures Q: Doctor, let's return to the Fisbee Memory
(e.g., dividing a test that is preferably adminis- Test that you gave to help determine the
tered in one sitting into two sessions due to the possible impact of the accident. Doesn't the
onset of extreme fatigue part way through), but test manual state that the delayed version is to
these departures should be recorded and de- be given about 30 minutes after completing
scribed. What is hard to justify is altering the immediate version?
standard procedures without any good reason. A: Yes.
Psychology can be difficult enough without Q: Doctor, do we usually remember things
introducing unnecessary sources of error. Stan- better over a longer or shorter period of time?
dard procedures may also be violated due to A: Shorter may be better, but not always.
insufficient familiarity with prescribed meth- Q: Doctor, you're not telling us, are you, that
ods. For example, in the cases I review, many if I want to win a bet about when these jurors
experts seem unfamiliar with the precise rules will remember the most about the trial, I
for scoring the Visual Reproduction subtests should bet on next month rather than to-
from the Wechsler Memory Scale-Revised morrow?
(Wechsler, 1987). A: I don't accept your analogy, and psycho-
Failure to follow standardized testing proce- logical phenomena do not always follow what
dures can make the psychologist look bad. might be considered common sense.
Take, for example: Q: So we're learning. In any case, when you
administered the delayed version, you did not
Q: Doctor, you told us that Ms. Smith wait 30 minutes as the manual indicates, you
demonstrated problems in delayed memory, waited 60 minutes instead, isn't that true?
isn't that true? A: Yes, but I don't think it makes any
A: Yes, there were problems that seemed difference.
significant to me. Q: Doctor, the instruction in the manual does
Q: And you administered the delayed ver- not say it's fine to wait 60 minutes, does it?
sion of the Fisbee Memory Test, correct? A: No it doesn't, but many studies suggest
A: Correct. that the amount of memory loss that occurs
The lawyer might next ask some questions to after 30 minutes and 60 minutes is similar.
help the jury get a clear picture of the test and Q: Doesn't the manual also state, and I
the way delayed memory is examined. The quote, ªThe examiner should follow the
lawyer then asks: instructions specified in this manual for test
Q: And you told us that Ms. Smith scored in administration exactly in order to ensure
the borderline range on the test, didn't you? proper comparison with the standardization
A: Yes, that's a label I used, although her groupº?
percentile was rather low, at about the 9th A: I can't recall exactly.
percentile, if I remember correctly. Q: Well, I can show you the manual if you
Q: Borderline is the category between nor- want.
mal and abnormal, right? A: No, that's OK, I'll take your word for it.
A: I'm not sure I would say it that way, but I Q: Certainly it is possible, is it not, that a
would agree with the basic thrust of your different result might have been obtained if
question. delayed memory had been administered after
Q: Had she gotten a few more points of 30 minutes versus waiting twice as long? (It
credit on the test, the score would have been probably will not matter what the psychol-
classified as low average, isn't that true? ogist says.)
A: That's true, but she didn't. A: It is possible, but I doubt it.
Q: And you have already agreed that accord- Q: And doctor, the reason to give the test as
ing to school records, Ms. Smith was func- the manual specifies is so that we know how
tioning at a low average level in a number of the individual performs when the test is given
areas before the accident, isn't that true? as designed, rather than having to guess how
A: Yes, but I'm not sure that applies to her it would have come out if it had been given as
memory functioning. the manual instructs, correct?
A: Those are your words.
Q: Doctor, I'm asking you. However, let's just move on. In a sense, doctor, you were measuring her on the 30 yard dash but made her run 60 yards, isn't that true?
A: No, I don't think you can make that comparison at all.
(The expert might think that but the jury is likely to believe that the analogy makes pretty good sense.)
Q: You waited twice as long as the prescribed time, and Ms. Smith obtained a borderline score, which you agreed was just a few points below the low average range, correct?
A: Yes.

I have also reviewed reports laden with factual inaccuracies relating to such matters as level of education, the date of the accident, the number of prior accidents, the number of children the plaintiff has, job history, etc. It may be difficult for a juror to believe that an expert can reach accurate conclusions about things that cannot be directly observed and that are complex, such as the area in which the brain is damaged or about the inner workings of the mind, if the expert cannot get simple facts straight, such as the day the accident occurred.

4.19.4.4.4 Give adequate consideration to alternatives

Research on clinical judgment suggests that diagnosticians can increase accuracy by waiting longer before reaching conclusions and considering alternative possibilities more actively (Faust, 1984; Faust & Willis, in press). In courtroom cases, the opposing lawyer will often present causal theories or explanations that contrast with those the expert proposed. An expert who has already made a systematic effort to evaluate and consider alternatives is more likely to be correct in the first place and better prepared to defend his position. For example, when the lawyer asks, "Isn't it true that excess caffeine intake can also cause symptoms of anxiety?" the response might be, "Although that is certainly true, the chart I constructed of caffeine intake and anxiety levels shows that anxiety levels often were high even when caffeine intake was low." The expert who has not carefully evaluated plausible alternatives might have to admit, instead, that she is not in a position to say whether, or the extent to which, caffeine might have contributed to the clinical presentation. The increased probability of an accurate conclusion also enhances the likelihood of providing true assistance to the trier of fact and may be critical in the examinee's care, whatever the courtroom implications. For example, if the plaintiff's cognitive dysfunction is due to the onset of a schizophrenic disorder as opposed to mild brain damage, very different treatment is likely to be indicated.

(i) Alternative diagnoses, including malingering

Given the overlap in symptomatology across conditions, or the lack of specificity of many symptoms (e.g., anxiety, sleep problems, difficulties concentrating), clinical presentations often raise multiple alternative possibilities that require careful analysis and reanalysis of the positive and negative evidence. The defense expert who is leaning towards adjustment disorder may need to reconsider the chronicity of the condition and the substantial level of maladjustment. The plaintiff's expert who diagnoses PTSD may need to re-examine the absence of physiological arousal and the infrequency of intrusive thoughts. A neuropsychologist who is quick to diagnose a mild brain injury may have mistaken it for a depression that started earlier following disturbing life events. A psychologist who identifies characterological disorder as a basis for what he believes are false or exaggerated perceptions of sexual harassment may need to abandon this conclusion when collateral records for the 10 years prior to the reported events nearly all suggest good adjustment and interpersonal relations. Along these lines, various diagnoses are packed with assumptions about previous functioning, and experts may miss an excellent opportunity to check on their impressions by examining whether records about these periods conform to expectation. For example, someone diagnosed with hypochondriasis, which is usually assumed to be chronic, would be expected to have voiced multiple medical complaints in the past, and not just since the injury 6 months ago. If various past medical records show select, delineated, and realistic medical complaints (e.g., the finger did turn out to be broken when x-rayed), it may be time to rethink the diagnosis.

In many legal cases, psychologists should be more complete in the assessment of malingering. For example, they may not collect any collateral information and may limit appraisal of malingering to interview impressions and a single test that the literature shows to have poor sensitivity. I have reviewed many evaluations in which the Rey 15-Item Test (Rey, 1964) was the only measure specifically used to appraise malingering. The sensitivity of this test is so poor that in cases of malingering, it apparently stands a worse chance of detection than a coin toss (see Rogers, 1997a). Other research suggests that it is difficult to detect malingering through typical interview methods and clinical
impression (see Faust & Ackley, 1998); and as Ziskin (1995) suggested, when a mental health professional asserts otherwise, the lawyer can ask, "Each time you've been fooled, you don't know it, do you?"

The psychologist who aspires to expertise in legal assessment should have a solid familiarity with the literature on malingering. Rogers' (1997a) edited book is a good starting point in this venture. The literature demonstrates the limits of traditional methods (e.g., interviews, many psychological tests), suggesting that: (a) individuals can manipulate results on a wide variety of tests, and in so doing may fool clinicians into overdiagnosing emotional or cognitive disorder; (b) it may be very difficult to detect face-to-face lies, especially by interview or impressionistic methods; (c) experience by itself does not ensure adequate detection capacities; and (d) the base rates for malingering may be higher than many psychologists believe (but lower than other psychologists believe) (see Faust & Ackley, 1998; Rogers, 1997a). For example, Reynolds (1998) asserts that "reasonable and thorough research indicates that at least 25% of cases of head injury in litigation involve malingering" (p. viii). Familiarity with the "negative" literature on clinicians' detection accuracy using more traditional methods is helpful because it directs us to take additional steps and because there has been such a recent growth in malingering detection methods: alternative or supplemental methods are available. This is a prime example of an area in which recognition of limitations in the field has had a very constructive impact by sparking intensive, productive research efforts.

The MMPI remains the most thoroughly researched test for malingering detection. MMPI indices and methods for malingering detection go beyond such traditional scales as F, L, and K, and are well described in various overviews of the topic (e.g., Greene, 1991, 1997). Differences in application with emotional distress versus brain damage claims are important to recognize, although the MMPI may be of considerable value in malingering detection with both types of cases.

There are also many specialized tests for malingering detection that are at varying stages of scientific development and of varying utility. One such example is forced-choice methods (Pankratz, 1988). Tasks are set up in which the examinee is instructed to select the correct answer from among two or more choices. For example, the examiner may read a string of digits to an examinee, then show her two written strings, and ask her to select the one that she just heard. A delay may be introduced before selections are made to increase the difficulty, or perceived difficulty, of the task. With a dichotomous choice format, even an individual with no memory capacity should achieve about a 50% level of accuracy through random guessing. Some malingerers overplay the role, failing to realize that performance that falls substantially below chance requires knowledge of correct answers. Although such types of forced-choice methods may have high valid-positive rates (correctly identifying malingering when test results are positive), valid-negative rates (correctly excluding malingering when results are negative) are often poor (Rogers, 1997a). There are, however, rapid ongoing developments with this and other specialized malingering detection methods, creating good reason for optimism and making it important to regularly update one's knowledge of the literature.

Interview methods are also available, in particular the Structured Interview of Reported Symptoms (SIRS), developed by Rogers, Bagby, and Dickens (1992). The SIRS uses various detection strategies, the majority of which seem to capitalize on false stereotypes or general beliefs about mental disorder. A variety of studies from independent researchers on a range of topics suggests that the SIRS may have strong properties for malingering detection (see Rogers, 1997b).

Other recent studies have investigated whether knowledge of disorder and knowledge of the strategies that underlie the design of malingering detection methods help the examinee escape detection (see Faust & Ackley, 1998). This research is still at a relatively early stage of development, although it suggests the following tentative generalizations. With testing methods that use something more than the very simple types of detection strategies (like the MMPI and unlike, say, forced-choice methods), knowledge of disorder seems to be of limited helpfulness. In contrast, knowledge of detection strategies increases, sometimes markedly, the likelihood of escaping detection. With typical or unstructured interviews, however, knowledge of disorder may well be useful, and perhaps the most helpful element in successful malingering. If these conjectures turn out to be correct, it seems likely that the chances of detecting better prepared or more sophisticated malingerers may be greatly enhanced by using a combination of specialized testing and specialized interview methods (such as the SIRS), although one must worry about inflating the false-positive error rate. Optimal use of these methods, especially if multiple data sources are obtained, will likely be achieved through the development and application of formally validated decision rules or actuarial strategies for
data combination (Dawes, Faust, & Meehl, 1989; Grove & Meehl, 1997; Meehl, 1954). The idea that more information is better may already seem obvious, but considerable research in clinical decision making across multiple areas shows that intuitions about these matters are frequently misplaced and that there are many circumstances in which greater selectivity in data use and combination would result in greater accuracy (see Dawes et al., 1989; Faust, 1984; Faust & Willis, in press).

(ii) Considering alternative causative factors

Assuming a condition (e.g., brain damage or PTSD) is present, the question of what caused it can completely determine the outcome of a legal case, and hence is often one of the forensic evaluator's central concerns. Although some experts depend mainly or solely on self-report to determine cause, even the most honest plaintiff may not know what created her difficulties, or can make inadvertent errors in associating symptoms with events and conditions. If patients always knew what caused their problems they would not need doctors or mental health experts to perform differential diagnosis. Overattention to temporal sequence or focus on a salient event may lead to misattributions, as might faulty diagnoses by other providers. Factors may be operating that the individual could not possibly have been aware of, such as an unrecognized exposure to a toxin at the work site that causes a gradual, delayed reaction. Suppose such exposure leads to insidious decreases in reaction time and coordination, which in turn lead to a car accident that causes a very mild head injury. It is easy to see how causal confusion can arise. In other instances, of course, plaintiffs purposely mislead. This may be very difficult to detect, because their conditions may be real but may have arisen from another cause that they will hide or disguise. The chances of uncovering the deception may depend largely on obtaining sufficient background information.

Assessment results may also stem, in part or in whole, from transient or extraneous factors (for discussion of multiple possible confounding variables in brain damage cases, see Faust, 1995). Individuals are sometimes tested under poor conditions that obfuscate conclusions about more stable impairments. In one case, a nationally known expert, who had been retained by the defense, examined the plaintiff in a room in an airport. He concluded that problems seen on testing probably were not caused so much by the head injury in question but rather by exposure to jet fuel fumes during the examination. In another case, an individual set out very early in the morning to drive the 200 miles to the psychologist's office, and ended up so tired he could not stay awake continuously during the testing. I have even reviewed cases in which examinees told the psychologist that they literally had not slept at all the night before, and yet the testing proceeded. Children are sometimes tested when they are ill and cranky or suffering from ear infections. In other situations, psychologists continue testing for 8 or 9 or 10 hours, and then write a report indicating that one of the plaintiff's main difficulties is rapid fatigability (while maintaining they have obtained "valid" testing results or results that do not underestimate capacities).

Many psychologists have very busy schedules, and cancelling a case slotted for a half or full day can create real headaches, but sometimes there would seem to be no reasonable alternative. Otherwise, the psychologist may have to answer questions like, "Doctor, do you ever tell your patients, before an important examination, to try to stay up all night so that they can do their best?" In contrast, there may be circumstances in which these types of accompanying problems and maladies are stable features of an individual's condition. For example, it might be that the plaintiff's physical discomfort, versus the brain injury, was the greatest contributor to performance difficulties during the examination. However, if the accident caused painful orthopaedic problems that are unlikely to remit, the adverse effects on cognitive performance may well be characteristic and typical of the individual's day-to-day functioning.

The many factors that may cause extraneous or transient alterations in cognitive, emotional, or behavioral status include, to name a few, sleep deprivation, pain, caffeine abuse, alcohol and other forms of substance abuse, medication side effects, and a myriad of independent medical conditions. Stressors and problems separate from the accident may also impact greatly on emotional or cognitive functioning. In situations in which these types of factors may have altered results, it can be very helpful to re-examine the individual at a later time. For example, if there is a question about the relative contributions of mood disorder versus brain injury to performance on cognitive tests, retesting after the depression has been treated successfully may help to parse these factors.

4.19.4.5 Interpretive Strategies

A detailed discussion of interpretive strategies is well beyond the scope of this chapter, and only a few general guidelines will be suggested. Some
of these guides may seem obvious and might be best viewed as reminders; I hope they will not strike the reader as too rudimentary. Other guides stem from the extensive literature on decision making, which provides many useful ideas and procedures. The decision making literature often has not infused writings on clinical practice. This is unfortunate because many of these principles, while very helpful, are counterintuitive, even for professionals, and hence are unlikely to be recognized or realized without direct exposure. Introductions to the decision making literature can be found in a variety of sources (e.g., Arkes, 1981; Faust, 1986; Faust & Wedding, 1989; Faust & Willis, in press; Wedding & Faust, 1989).

The recognition that interpretation is unlikely to be better than the data upon which it rests highlights the need for careful collection of information and adherence to proper examination procedures. Prudent data gathering can go a long way towards increasing the chances of reaching sound and defensible opinions.

It is hardly newsworthy to suggest that not all interpretive procedures and available methods are equally sound. Not uncommonly, various procedures are applicable to the same data and lead to conflicting conclusions. For example, there may be five or more procedures for judging cooperation with the examination or malingering, with some pointing in different directions. Further, there may be no simple way to combine or integrate these results, because they may directly contradict one another. If one decision rule indicates that there is brain damage and the other that there is not, both cannot be correct.

Obviously, when all decision rules coincide, there is nothing to resolve, but when they conflict directly, one must select one over the other (or defer judgment). At times this may not be too difficult, because the great bulk of the evidence, especially that of the highest quality, points in a certain direction; but in other cases the results are more ambiguous. For example, what does one do if the single best decision procedure argues for Conclusion A, but two other methods of more modest accuracy argue for Conclusion B, and one is not sure how these latter two methods operate in combination?

In these conflictual situations, a psychologist is much better off, to illustrate a relative extreme (and assuming all other things are equal), if the decision rule given preference has stronger research support, has provided a clearer result, is better suited to the examinee, and, when pitted against the other decision rule(s), has been shown to be correct more often. As the table tilts in the other direction, the chances of being correct decrease and vulnerability to informed and skilled cross-examination increases. These and other issues involved with the combination of data are addressed in such sources as Dawes et al. (1989); Meehl (1973); and Faust (1984) (see also Faust & Willis, in press). As these authors note, considerable research suggests that it is not necessarily best to try to combine all of the data or to analyze complex configural relations. Formal decision procedures (e.g., actuarial methods) that have been developed through proper scientific methods almost always equal or exceed the overall accuracy attained through these types of attempts at subjective or clinical data integration.

Interpretive methods may fail to consider base rates (see Meehl & Rosen, 1955; Wedding & Faust, 1989), or the frequency of events. For example, the accuracy of diagnostic signs and methods varies in relation to the frequency of a condition, and adjustments in the application of such indicators and in estimates of likelihood are likely to be needed as base rates change. To illustrate, as the frequency of malingering decreases across settings of application, the same score on a simulation index signals a lower probability of malingering, and hence cutting scores may need to be raised to avoid an unacceptable false-positive error rate. Along related lines, characteristics that are common among the general population may be described as indicators of pathology. Some such "signs" are part of clinical lore, and their use may not have been modified in accord with research. For example, a psychologist may assert that five points of subtest scatter on the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981) is indicative of disorder, although research shows that most normal individuals equal or exceed this level (see Matarazzo, Daniel, Prifitera, & Herman, 1988). Given mental health professionals' skewed exposure to abnormal populations, it may be difficult to determine whether features seen commonly among patients are indicative of pathology or rather just common among individuals in general and thus poor discriminators of disorder.

I have reviewed many reports in which pedestrian human failings were stated as self-evident signs of disorder. When supposed pathological features include sometimes misplacing one's keys, occasionally forgetting where the car is parked, and irritability with misbehaving children, a good cross-examiner can have a field day. It is often necessary to consult epidemiological research and associated literature to determine just how often features occur among normal and abnormal populations, and also whether they occur with differential frequency among different forms
of disorder. The latter type of knowledge can be essential to differential diagnosis.

Overdiagnosis, or the tendency to see pathology that is not present or to overestimate its severity, may result from various sources. Overestimation of prior functioning may lead to overestimation of changes in functioning. False conclusions about prior functioning may stem from such sources as faulty history that the plaintiff reported or faulty methodology. For example, some psychologists believe that the single highest, or few highest, test scores provide a good estimate of prior overall or general intellectual functioning (the so-called "best performance approach"). Given normal variation in test scores, this method is almost sure to lead to overestimates, and not uncommonly gross overestimates, of prior functioning, especially among unaffected individuals. Given the average seven-point spread between the highest and lowest subtest scores on the Wechsler Intelligence Scales (Matarazzo et al., 1988), use of the single highest score with normal individuals will result in an average overestimate of 15 to 20 points in prior Full Scale IQ (FSIQ). The result will be an estimated loss of 15 to 20 FSIQ points in an individual with absolutely no adverse change in intellectual functioning. This might be the difference, for example, between a score near the 90th percentile and one at the 50th percentile. The absurdity of this situation is that, on average, an individual will have to have gained over 15 FSIQ points to be judged to have retained his prior capacities!

Another basis for overdiagnosis is the use of inappropriate norms that set overly demanding performance standards. This problem is especially common when norms are not adjusted for sociocultural and demographic features or test bias. For example, a test may show pronounced age and education effects and yet a clinician, perhaps unaware of these findings, might apply norms developed on young, highly educated individuals to much older and less educated individuals. Also, in at least some normative studies, individuals with virtually any significant risk factor are eliminated from the sample, resulting in supernormal groups, or groups whose performances well exceed the general population (this being one reason why average IQs are often so high in normative studies on various neuropsychological measures). The result can be a mistaken belief that unaffected individuals have suffered a loss attributable to the event in question (e.g., a head injury) because their test performances do not meet the inflated standard set by the "normative" group. This can also lead to the initiation of treatments which may carry risks for target problems that do not exist. Overdiagnosis can also stem from overattention to select weaknesses, insufficient awareness of the overlap between normal and abnormal populations, and failure to recognize normal variations within or across individuals, the latter of which was previously illustrated by reference to subtest scatter on intellectual tests. Normal individuals are rarely unremarkable or well adjusted or well functioning in all respects, and the tendency to conclude that an individual is aberrant due to even minor shortcomings creates a hurdle that few of us would pass.

Other psychologists underestimate or fail to recognize pathology, with explanatory factors sometimes representing the flip side of the same coin that leads to overdiagnosis. For example, psychologists may be quick to assume that problems pre-dated the accident. Weaknesses on cognitive or neuropsychological tests may be attributed to pre-existing learning disabilities, even without checking school records. Emotional problems may be assumed to represent Axis II disorders, and hence characterized as life-long difficulties as opposed to consequences of a more recent accident. In a case in which I consulted involving an individual with a moderate to severe brain injury, a psychologist passed off various potential indications of frontal lobe disorder, such as impulsiveness and marked difficulties with interpersonal relations, as indicators of a borderline personality disorder (BPD). However, this individual also presented with various features unrelated to BPD but positively associated with brain injury, only demonstrated characteristics of BPD that overlapped with those of brain damage, and demonstrated virtually nothing pre-injury to suggest any type of personality disorder.

Some experts are very quick to assume that individuals are malingering and that this explains their symptom presentation. In a toxic exposure case, a psychologist casually dismissed seemingly strong neurological findings as feigned symptoms. Ironically, as it turned out, although the psychologist was likely right that the toxin did not cause a problem, she was otherwise quite wrong: further medical workup yielded a nearly definitive diagnosis of multiple sclerosis and left virtually no doubt that the plaintiff had serious neurologic disorder all along. Underestimations of prior functioning may also lead to missed pathology. Finally, in some cases, experts mistake chronic or permanent symptoms for transient ones. For example, in brain damage cases, experts may be too ready to assume that all of the plaintiff's problems are due to depression when, for example, the original injury was serious, the individual did not appear to have a low mood when evaluated, depression has fluctuated but impairment has not, and at least some of the

observed problems (e.g., perseveration and Q: You didn't mention that in your report or
aphasic errors) are much more strongly asso- your testimony, did you?
ciated with brain damage than with depression. The lawyer then reads five more positive
answers, and the expert may make some
4.19.4.6 Preparing Reports comment such as the following:
A: You're only mentioning the positive re-
Reports may or may not be introduced as sponses.
evidence at a trial. When they are, every word becomes a possible target for cross-examination, and thus they should be prepared very carefully. Although I will leave detailed recommendations about report preparation to others, I do have a few suggestions. First, sloppy reports with factual inaccuracies can create a very bad impression, even if the errors are not substantive. In one case, due to a typographical error and poor notes, a psychologist could not say whether a plaintiff, earlier in life, had fallen out of a rocker or had been struck in the head with a rock. Reports should also strive to provide a balanced representation of the case and of positives and negatives. For example, the report of an expert that repeatedly glosses over negatives that seemingly are evident and should not be ignored may be difficult to defend. Alternatively, the report may contain nothing but negatives. For example, when describing responses on a questionnaire or a depression inventory, the examiner may list only the unhealthy responses. This can become especially problematic when, in fact, the majority of the responses are positive. This creates an easy target for the cross-examiner, to wit:

Q: Doctor, in presenting before us today, you strive, do you not, to provide fair and balanced testimony?
Q: In gaining a complete understanding of an individual, strengths can be just as important as weaknesses, isn't that true?
Q: Interventions or approaches to helping a person often build on someone's strengths, isn't that correct?
Q: When describing the results of the Method X Depression Inventory, you shared Mr. Smith's responses to three items, isn't that correct?
Q: Each of these items suggested possible problems, you would agree with that, wouldn't you?
Q: And each of us, doctor, is something of a mixture of positives and negatives, wouldn't you agree?
Q: There are 20 items on the Method X Depression Inventory, isn't that right?
Q: We haven't heard anything about the other 17 responses, have we?
Q: On item number 3, doesn't Mr. Smith indicate that he gets as much enjoyment out of things as he used to?
Q: And it would be very wrong to only present one side of the picture, wouldn't it?

Some experts write exceedingly long reports that address many minor or irrelevant matters. Again, every word and every comment is potential fodder for cross-examination, and thus including matters that are really not important to the task at hand is unlikely to be of much benefit and could create a problem. For example, such text might include various factual inaccuracies that can make the psychologist look foolish.

Reports may also include many extreme statements. A good cross-examiner usually welcomes extreme statements or exaggerations, because they create an easier target. For example, in one case, an expert did not just say that one method was better than another, but that there was a general consensus among psychologists that it was not just superior, but far superior to another method. This extreme claim was easy to deflate when literature was introduced by the very author of the test describing its limitations. In another case, a psychologist testified that even a very mild brain injury affects every single aspect of an individual's functioning and existence. Although the case was resolved before trial, in part because this psychologist's assertions were so vulnerable, it would have been a simple matter to confront the witness with the many normal test performances (which were entirely consistent with pre-accident measures) and to ask her whether there was any chance they could have been exceptions to her claim. She almost certainly would have said no, leading to a rapid self-destruction.

The exact wording of reports may be important and lead to unanticipated legal consequences. For example, in one case, an extremely bright neurologist wrote a conclusory sentence that included the phrase, "but for the accident." His intended meaning was that the condition would have occurred whether or not the accident occurred. However, in the legalistic world of the arbitrator, the conventional interpretation of this phrase was to the contrary, that is, that in the absence of the accident, the condition likely would not have occurred. It is generally much safer to stay away from legal terminology unless it is necessary and one understands exactly what one is doing, and
rather to just say things plainly and clearly. The wording of a report should also be checked carefully to avoid unintended meanings or interpretations, although it may be possible to clear these up at deposition or trial without creating a problem.

4.19.5 LAWYERS' STRATEGIES AND TACTICS

When dealing with adverse expert testimony, the lawyer's most basic task is to undermine credibility. Some lawyers will attempt to do this with a scalpel and others with a blunt stone, but the aim is the same in either case, and it is to the witness's advantage not to forget it.

Although an attack may be directed at personal matters (e.g., bias, financial incentives), this is often merely part of doing business and reflects nothing personal. The same lawyer who attempted to paint the witness as biased or incompetent may shake hands on the way down the courtroom steps and tell the expert he thinks she did an excellent job. On the one hand, most lawyers do not act in a blatantly hostile or obnoxious manner because they believe their case is best served if the jurors like them. A scowling, hateful, and abusive manner towards a witness, especially one who has done little or nothing to provoke it and who acts in a perfectly civilized and seemingly impartial manner, can hurt the attorney much more than the expert. For an expert who might be a bit thin-skinned, it can be helpful to perform a little personal Albert Ellis and tell oneself that such personal attacks often stem from a position of weakness and do not provide true commentary on one's human worth. A lawyer who is really loaded with ammunition may well prefer to take on an almost pained and solemn expression when bringing to light terrible problems with the expert's work (so that the jury does not end up feeling sorry for the doctor). On the other hand, arrogance and unwillingness to admit obvious points, and other similar demeanor may leave the jury hoping that the expert gets what he deserves and may give the lawyer license to deal out punishment. Obvious lack of preparation can also quickly alienate a jury.

If the lawyer is particularly successful in weakening an expert, she may then introduce materials and questions that are not so much intended to damage the expert's credibility further, for that job has already been accomplished, but rather to put on her own side of the case. For example, rather than the attorney attacking the credibility of a plaintiff directly, which, if overdone or too aggressive, could inflame a jury, it is often safer for an attorney to do so through an opposing expert. The lawyer might raise inaccuracies in the plaintiff's report to the expert that seem purposeful and self-serving. A weakened expert is often fairly helpless against adverse, or seemingly adverse, evidence; and it can have a much greater impact for the attorney to introduce affirmative elements of her case through an opposing, rather than her own, witness. One might consider the effect when the opposing expert cannot fend off attacks on the plaintiff's credibility, versus the dubious impressions that can arise when the plaintiff seems credible and the plaintiff's experts, who vouched for the plaintiff's honesty, were not really questioned on this score. Rather, the first person to raise serious concerns about truthfulness just happens to be the expert the attorney hired.

Cross-examination often does not follow the contour of the expert's direct examination. Although, in some jurisdictions, cross-examination is supposedly restricted to the content covered in direct, in practice it is usually a relative free-for-all with few topics off limits. Further, a good cross-examiner rarely wants or needs to return to the points the expert covered tit-for-tat. Rather, the attorney is selective, looking for weaknesses or vulnerabilities. Many lawyers would much prefer to win a cross-examination 3 to 0, rather than 10 to 2. This is one reason experts should aspire to avoid weak components in their assessment batteries or procedures.

The areas of cross may or may not touch on any of the points covered in direct. In one case, a plaintiff's neuropsychologist had already admitted on deposition that during his entire professional career he had never concluded with relative certainty that someone was not brain damaged, nor had he ever directly identified someone as malingering. He further admitted that in the instant case, he had presumed that the plaintiff was brain damaged nearly from the start, based solely on the very limited (and seemingly far from definitive) information he received at the time the referral was arranged. It would not matter very much what that expert said during his direct, because the lawyer was going to come back to these points, drive them home, and then stop, leaving the expert in shambles. In fact, a lawyer often is able to prepare a cross-examination without thinking all that much about what the expert will say on direct. Rather, the lawyer may consider the gist of the expert's testimony, and then focus most of her attention on the underlying bases or evidence for conclusions and on matters minimally related to the specific content of the direct.

Points of attack, to be covered in order, may include credentials, bias, flaws in the conduct of
the examination, questionable conclusions, and weaknesses in scientific underpinnings. Again, almost all of these subjects come back to the expert's credibility. Many lawyers would rather stay away from scientific topics, but others are well prepared to enter this arena. Experts may be accustomed to getting by with cursory or questionable answers to inquiries about underlying scientific methods or potential scientific weaknesses, and may be shocked the first time they confront an attorney who starts asking very specific questions about one or another line of research and will not settle for incomplete or vague answers. Some experts have permanently damaged their credibility by placing on record a host of patently wrong answers to questions about science. The result is the appearance of incompetence or, even worse, dishonesty. I have read more than a few transcripts with claims such as the following: Basic psychometric principles that apply to psychological tests are not relevant to neuropsychological tests; there is no type of board certification in psychology; the (you name it) battery is nearly 100% effective in diagnosing brain injury; there is an absence of research showing a limited relation between experience and accuracy, and so on.

4.19.5.1 Credentials

As discussed, most experts have credentials that sound impressive to a jury. For example, completing a doctoral dissertation, receiving an advanced degree, and publishing (at all) are likely to be viewed favorably. More advanced accomplishments are not likely to make much of a difference. It is ironic, then, that some experts will puff and inflate, or even distort, small points that a juror is likely to find unimportant or trivial. As illustrated earlier in the cross-examination exchange on graduation from an accredited program (see the heading, "Admissibility"), these little points can expand into disasters if they are discovered, especially if the expert will not yield in the face of obvious contrary evidence. It is not unusual for attorneys to conduct background checks and to obtain information about an expert's credentials and past experiences.

4.19.5.2 Bias

A cross-examiner may start by asking the expert whether he endorses the scientific method, and whether it calls for impartiality in examining data, including fair consideration of evidence for and against a proposition. It is difficult to imagine an expert answering negatively to inquiries of this type. The lawyer may then try to show that the expert failed to live by such principles in the present case, or was not even-handed in considering and presenting evidence.

Some reports overemphasize or overattend to either the good or the bad, and may thereby make it easy for the lawyer to demonstrate bias. For example, an expert may list only the positive responses on an anxiety inventory. The lawyer might start reading the negative responses and after each one merely ask the expert, "Did I read the item correctly?" Alternatively, the expert may use dramatic terms and more lengthy narrative to describe weak cognitive performances, while downplaying, or even ignoring, strong performances. Problems may be referred to as "deficits," "impairments," or "serious shortcomings," but a remarkable performance is described as "essentially within normal limits."

Bias may also be suggested when experts fail to conduct sufficient investigation of alternative explanations for results. For example, an expert who quickly decided that the accident caused a brain injury may have been superficial or incomplete when looking into substance abuse as a possible alternative cause for reduced cognitive efficiency, despite records containing various suggestive references. Or, in a PTSD case, the expert may have made a perfunctory attempt to uncover and analyze other potential stressors. If the lawyer can show that alternative explanations were plausible but not pursued with anything approaching the zealousness displayed for the favored explanation, it can create a very negative impression.

Other experts come across as biased because they will not concede the obvious or will not give ground, even when they should. Similarly, some experts strike jurors as evasive. In one case in which I consulted, it was blatantly obvious that the plaintiff's signature was much poorer when produced at the psychologist's request than it was when he signed checks and documents in the course of his everyday life. When the lawyer merely asked, "Do these signatures appear to be of different quality?" the expert would not concede the point. This allowed the attorney to keep reframing the same basic question in a way that made the expert look increasingly outlandish, e.g.,

Q: Isn't there a clear difference in the quality?
Q: Isn't there some difference in the quality?
Q: Do these different signatures look exactly the same to you?

The lawyer concluded the cross by stating, "Doctor, what would you say if I asked you . . . ah, never mind, I know what you'd
say." Although there may have been a perfectly reasonable explanation for the difference in writing quality (e.g., the plaintiff was heavily medicated when seen by the psychologist), the expert's stubborn adherence to an unsupportable position convinced the jury that nothing he had said before, and nothing he might say after, was worth a second thought.

Other ways the lawyer might try to show bias include financial incentive (e.g., the expert charges much higher fees for legal than clinical work, or has performed numerous legal evaluations with the same attorney), or systematic error that consistently favors the expert's position. For example, a defense expert may have repeatedly overcredited a plaintiff on cognitive tests, thereby underestimating loss in functioning.

4.19.5.3 Manner of Conducting the Examination

Examination procedures have been covered at length previously and will not be repeated here. Errors of omission and commission can create easy fodder for attorneys and can lead to a complete disregard of the expert's testimony.

4.19.5.4 Erroneous or Questionable Conclusions

Erroneous conclusions can stem from various factors, such as mistakes in the scoring of tests or misapplication of scientific methods. In many instances, the correct answer is not cut and dry, and it may be difficult to show, conclusively, that an expert has made an error. Considering, however, the selectivity of cross-examiners, if even a few instances of clear-cut errors on non-trivial matters can be identified and brought out, it may greatly reduce the expert's impact. Concrete facts often provide the lawyer with the best opportunity to find such occurrences, and the search may be greatly facilitated if the expert's review of records has been incomplete or careless.

I have reviewed many cases in which, for example, an expert did not confirm educational level. For a neuropsychologist who adjusts most or all test scores in relation to education, this may lead to erroneous results on almost all measures. It can be difficult for a witness to regain his equilibrium with cross-examination surprises of this magnitude, especially if the normative tables are back at the office. I have also reviewed many cases in which the expert had taken a very limited occupational history. The psychologist might have testified that the plaintiff had never been fired from a job and had achieved a certain level of earnings, when work records show that these assumptions are plainly wrong. Many lawyers are careful in collecting and reviewing facts, are attuned to incomplete or faulty factual renditions, and are capable of using such material to raise challenges, that is, they are much more at home with this type of subject matter than with psychological concepts and research. An attorney is likely to be considerably more comfortable and effective arguing about former earnings than about whether the subject-to-variable ratio was sufficient to conduct a multivariate analysis.

Experts sometimes present conclusions that sound just plain silly or out of touch. For example, they may make a great deal out of normal human failings or overlook obvious, everyday explanations for events. I have read many reports in which supposed deficits are illustrated through examples that apply to most anyone, e.g., difficulty getting organized for a vacation, occasionally forgetting the exact day of the month, or a tendency to fatigue by the late afternoon (this in a person with a hectic job and four children). Financial incentives in cases may be too readily dismissed, or not acknowledged as potentially relevant. For example, a plaintiff who complains bitterly about problems but who has complied with almost no treatment recommendations, even those involving minimal effort and possible discomfort, may be described as giving no thought to the legal case and only wanting to get better. Some psychologists are so used to thinking in complex and abstract ways that they tend to overlook more common or seemingly mundane considerations. At other times, experts may fail to think through the implications of presumed problems or deficits, and whether the kinds of things one consequently expects to be present in the plaintiff's everyday life and lifestyle are present and those that seemingly should not be present are not present. For example, if an individual really has a severe problem with hostility and impulse control, it is doubtful his hunting buddies would continue their monthly get-togethers at the cabin. Or, if the plaintiff really develops excruciating headaches when exposed to noise, one would not expect her to join a rock band. Everyday activities that fly in the face of the expert's conclusions can be decisive with juries.

4.19.5.5 Scientific Status

An attack on scientific status might be broad and aimed at the field in general, or narrow and more specifically targeted at the particular methods used in the case. Many lawyers shy away from challenging scientific foundations, but others have little or no hesitancy to do so, are conversant with the issues, and have retained
a consultant to help them prepare. Further, unlike cross-examination, deposition inquiries about scientific status carry little risk, and helpful admissions or responses to even a small minority of questions may satisfy the attorney's eventual and basic purpose: to reduce or vitiate the expert's credibility at trial. An attorney may spend 2 or 3 hours asking many questions about the expert's methods and scientific backing, and then, at trial, focus on the one or few areas of questioning in this domain in which she feels she can make the most headway. Again, effective cross in just a few areas can inflict more general damage. Thus, other than wasted time and expense, it matters little if only one in three, or one in five, or one in ten areas of deposition questioning about research will be used at trial.

An expert makes the lawyer's search for damaging material on science much easier if she uses poor methods, lacks basic familiarity with pertinent research, makes grossly overblown claims, will not concede limitations that are well established in the literature, or repeatedly guesses when she is not sure about answers to questions. Much like certain forms of the martial arts, many points about science would probably carry little impact with the jury if experts did not take actions that gave them strength. Suppose, for example, there are a few negative studies on some otherwise well-supported method for appraising malingering. If asked, the expert could say, "Although most of the literature on the method is positive, one has to use it with some caution because isolated studies have not been supportive." However, on deposition, the expert might have argued that the literature is uniformly supportive. Once receiving that answer at the deposition, the lawyer may have nudged the expert out further and further on a limb. For example, the lawyer might ask about the importance of maintaining familiarity with literature on the methods one uses, how negative literature can bring a proposition into question, etc. Then, at trial, the lawyer can wave around the negative studies that the expert denied existed and recite the names of authors that the expert separately acknowledged as authorities.

As discussed, selection of the strongest possible methods is important not only in dealing with cross-examination, but in maximizing the chances of reaching correct conclusions and thereby fulfilling the presumed prescriptive (i.e., normative) role of a courtroom expert, to assist the jury in its deliberations. If there are no adequate methods available, or if the best available methods are questionable, the expert might decide not to undertake the assignment at all. Also, as noted, post-Daubert many courts are placing methods under increased scientific scrutiny and eliminating those found wanting.

4.19.6 DEPOSITIONS AND TRIAL TESTIMONY

As part of the discovery process, most states allow attorneys to depose opposing experts in order to learn what opinions they may express at trial and the underlying bases for their views. Attorneys vary greatly in their approaches to depositions, and the type and position of the case may dictate strategy. Some attorneys play dumb, ask many open-ended questions, and try to get as much helpful material as they can while revealing as little as possible about their trial strategy and anticipated lines of cross-examination. Their aim is to surprise the expert at trial. Others are far more aggressive and ask pointed, challenging questions that are intended to inflict damage, even should the element of surprise be reduced. If the case is almost certain to go to trial, many attorneys will take a more guarded posture, trying to save their best material for the cross. If it is a case the attorney wishes to settle, the deposition style may be more aggressive and aimed at showing the other side that its expert has problems and that monetary demands or offers need to be adjusted. Many cases call for some type of strategy in between, for example, one that exposes weaknesses in a few areas to convey a message that might aid in settlement negotiations, but that reserves some or most of the material for cross should the case go to trial.

4.19.6.1 A Sampling of Deposition Topics

Although the supposed aims of a deposition may be to uncover trial opinions and their bases, the attorney often already has a very good idea what conclusions the expert will express, especially if a detailed report has been prepared. For instance, basic elements of PTSD cases are frequently similar, e.g., that it was the event in question that caused the disorder, that a decrease in level of functioning resulted, and that the plaintiff was essentially forthright and cooperative during the examination. The attorney's main deposition aims will probably lie in other areas, and the scope of questioning is often wide ranging.

Although many experts prefer to talk in more general and abstract terms (e.g., this is a serious case of PTSD that has caused substantial distress and diminished functioning), many attorneys prefer to talk in more specific and concrete terms. In the area of damages, they want to know specifics. What is it exactly that the plaintiff can and cannot do? How long will
therapy need to continue and at what frequency? Does the expert have an opinion about work capacity and about diminution in earnings? Exactly which problems can be attributed to the event and which pre-dated it? For problems that have supposedly been exacerbated, what is the precise extent of the change? Some psychologists become unnerved by these types of questions and react in ways that become problematical at trial. For example, they may fail to admit uncertainty, speculating about specifics that can be shown through concrete example and evidence to be wrong.

In addition to specific elements of damage, some of the topics that the expert can expect a well-prepared attorney to cover on deposition in a serious case can be discussed in turn. The attorney is likely to ask the expert about all her sources of information and when they were obtained. The attorney will probably request that the expert bring her complete file to the deposition, and may go through it document by document, asking when each was received. The attorney may also raise questions to determine who controlled the flow of information, e.g., did the expert request the document, did the attorney send it on his own, or did the plaintiff provide it? This type of questioning may translate into lines of cross-examination aimed at showing either that the expert formed opinions early and absent critical information; that the expert never has reviewed key documents; or that the lawyer, rather than the expert, determined what materials the expert reviewed. Further, knowing what documents the expert has not seen can give the lawyer a decided advantage. For example, if the attorney, but not the expert, knows that the plaintiff misrepresented her educational and occupational history, this can lead to embarrassing problems at trial.

As discussed already, the psychologist should not assume that the attorney who retained her will provide all relevant documents, or that it matters little when records were reviewed. Attorneys may not know which of the available documents an expert might want to see, or what new materials an expert might want to obtain. Some attorneys try to contain costs by limiting an expert's access to records or, in some cases, prefer to withhold certain documents for fear of their impact, hoping that the case will settle or that it will not create too big a problem at trial. Also, if a conclusion is reached weeks, months, or years earlier, and the expert does not review potentially critical documents until the eve of a deposition or trial, it can look very bad, especially if treatment recommendations had been issued. How can the expert explain why it took until 1997 to review documents that were available in 1994 and that may have altered the diagnosis and treatment recommendations?

Additionally, on deposition, the opposing attorney commonly asks the expert whether he plans to do anything further on the case before the trial, and requests that any changes in opinion be disclosed. (Some jurisdictions require amended reports if new or altered opinions are to be introduced as evidence at trial.) This puts the expert in a very difficult position if he plans a last-minute review of documents he should have examined much earlier.

The attorney may also ask whether any literature was used or consulted in the case. If the response is positive, detailed questioning may ensue. Some attorneys will ask knowledgeable questions about such matters as norms, reliability, and validity. Related questions may be asked about the specific assessment methods used in the case. For example, for the various tests the psychologist administered, the lawyer may want to know what normative standards were used, whether other norms were available, the basis for selecting one set over another, what results would have been produced with other norms, and whether the norms contained demographic corrections. The expert might also be asked about the existence of literature that raises questions about her assessment methods or demonstrates limitations in their use, and whether each method is supported by a body of research on accuracy or validity. Again, a positive response may be met with numerous specific questions, such as what studies exist and who published them; whether they involved the exact same methods, populations, and questions; and whether there is contrary literature. With questioning such as this, inaccurate answers or overblown claims, rather than concessions about relative weaknesses, often cause experts far greater problems at the time of trial.

An attorney usually cannot introduce literature during cross-examination unless the opposing expert has acknowledged it as authoritative or as something he relied on in forming his opinion. (The attorney may still have the option of introducing that literature through his own expert.) Generally, the lawyer wants to find out at deposition whether the needed acknowledgement can be obtained, because it is highly preferable to know what is feasible in advance of trial and plan accordingly. Discovering at trial that an expert will not acknowledge an article that was to serve as the centerpiece of a cross-examination may leave the attorney on a bridge that just lost its undergirding. Once an expert acknowledges an article as authoritative at deposition, there is usually no keeping it out at
trial, even if the expert tries to backtrack. For example, should the expert say at trial, "Based on subsequent reading I no longer view that publication as authoritative," the lawyer can still refer back to the acknowledgement at the deposition and will probably be allowed to introduce the article and ask questions about it.

Some experts, due perhaps partly to uncertainty about the legal meaning of "authoritative" and because they wish to appear widely read, acknowledge a great range of literature as authoritative or provide very general endorsements. However, in the legal arena, authoritative means, in effect, that the expert defers to the source or considers it worthy of attention. This type of definition should be kept in mind when answering deposition questions about what is authoritative. Thus, for example, unless an expert believes that every article ever published in a particular journal is the definitive word on a topic or somewhere in this arena, it is probably a mistake to endorse that journal as a whole as authoritative. Rather, one might say something like, "The X journal contains a number of strong articles and others I do not think are as good, and I would really need to know what article or articles you are referring to in order to tell you whether I think they are authoritative or if I relied on them in this case." It is also reasonable to say that one respects a certain author, although one does not necessarily agree with everything the writer has said, and would need to know specifically what the attorney is referring to in order to answer a question about authoritativeness or possible use in the case.

Other experts will go to the other extreme, denying that anything is authoritative. Such experts can make a poor courtroom impression because they come across as pushing the view that there is only one person worth listening to on a topic, themselves naturally; jurors tend not to like individuals who act like self-appointed, self-anointed know-it-alls. Also, the lawyer often can still ask general questions about the literature that incorporate the gist of the findings, such as, "Isn't it true there are many studies showing (something contradictory to what the expert has described)?" The lawyer might hold up a stack of articles to convey the impression that the assertion about the literature did not emerge from thin air. The expert may not acknowledge the authoritative status of the literature or the findings, and hence the lawyer might not be able to get further, or much further, into the specifics of the particular work. However, the lawyer has been able to bring the findings to the attention of the jury, and the expert's repeated rejections of contrary literature can create a strong suggestion of bias. The cross-examiner also has the option of putting on her own expert, who can acknowledge the existence or status of literature the opposing expert has denied. Finally, it is almost always possible to get certain literature introduced through one or another means. For example, the lawyer will almost always be permitted to ask questions about the manuals for the tests that the expert used.

The attorney will probably ask whether the expert has talked with anyone about the case. If the answer is affirmative, questions will almost surely follow about the identity of the other party or parties and what specifically was discussed; and the attorney might decide to depose one or more of these individuals to get their descriptions. For example, if the defendant psychologist in a malpractice case talked to his supervisor, that supervisor may well be deposed.

The attorney may conduct thorough questioning about credentials. The expert may be asked about courses taken in graduate school, whether an APA-approved internship was completed, and if the expert pursued a post-doctoral fellowship. Questions may also be raised about supervisors and their qualifications, any malpractice claims or ethics complaints, performance on the licensing examination, continuing education activities, and board certification. There are also likely to be questions about research activities and publications. Some experts exclude publications from their resume that might be embarrassing, which tends to make things worse if the attorney uncovers them.

Attorneys frequently ask about fees and fee arrangements, and how charges for legal work compare to those for other activities. In some cases, experts insist that an evaluation was conducted primarily or solely for clinical purposes, but have billed at the much higher rate used for their legal work. The expert may also be asked how often she has been retained by the attorney in the instant case, and by the attorney's firm, and perhaps what percentage of her income comes from these cases or her legal work in general.

The expert should be cautious about circumstances in which she is unlikely to be paid unless "her side" wins. For example, an expert may have a multi-thousand dollar fee outstanding with an impoverished plaintiff. A cross-examining attorney can bring out the situation and ask the expert whether these financial arrangements might make it difficult to be fully objective. The lawyer probably will care little about what the expert answers because the seed has been planted in the jury's mind, and an expert who denies such a possibility might well appear to be showing the very bias he denies he
manifests. It is thus often advantageous to have at least an understanding, if not a written agreement, that the attorney who retained the expert will be responsible for fees (although deposition time will usually be covered by the opposing attorney).

Experts may also be asked about possible alternative explanations for reported problems or difficulties, evidence that might exist for and against each possibility, and the process they followed in making their selections. They may be asked what type of evidence could be uncovered that might alter their selections or opinions. If the expert has not been thorough in reviewing background materials, the answer may contain elements that, in actuality, are present in the file. For example, suppose the expert answers that narcotic abuse could produce the symptoms he observed on his examination, and that the main way of distinguishing this possible cause from the one he identified is temporal sequence or association. It might just be that the plaintiff had obtained prescriptions for pain-killing narcotics from three separate doctors, was filling them simultaneously, and that the alterations the expert described started 3 months after the accident but just days after the visit to the last pharmacy.

The expert may be asked to critique the work and opinions of other experts, including those on her own side of the case. Sometimes the idea is to link a strong expert with a weak one, creating considerable difficulties for the former when the latter fares badly. If the expert says he endorses another expert's work completely, he had better be prepared to live with the approval not only in the present case, but perhaps in future cases as well. For example, failure to criticize weak methods when asked about views on the other expert's work may translate, in effect, into tacit endorsement, which can be brought up again in the next case and may also lead the expert into contradictory positions.

4.19.6.2 Some Suggestions for Depositions

Perhaps the most obvious suggestion, and one that applies similarly to depositions and cross-examination, is to prepare thoroughly. The required preparation varies and may include a review, or re-review, of the case records, relevant scientific publications, and, perhaps, test manuals and related materials. With large files, it would not be unusual to need a half day to refamiliarize oneself with the case, especially if it has been laid aside for some time, and for very large files, a full day might be required. With more complex files, constructing a time line or chronology that summarizes major events or findings can greatly reduce the time that is needed to become reacquainted with the details at a later date.

It is usually sensible to meet with the attorney with whom one is working in advance of the deposition. If not already accomplished, the attorney needs to gain a clear understanding of what the expert can and cannot say, the boundaries of the expert's opinions, and what materials have been reviewed (e.g., the expert may have examined literature on his own that was not part of the attorney's case file). The expert also may need to be informed about particular technical or legal issues. For example, certain materials may have been excluded or deemed inadmissible and should not be referred to when discussing opinions. The expert may also benefit from learning something about the opposing attorney. For example, some attorneys may try to be especially provocative during depositions, hoping the expert will respond angrily and say something foolish.

If a lawyer is not interested in a pre-deposition meeting, it might be time to start wondering about the attorney and the situation that the expert might have gotten himself into. Similarly, the attorney needs to allow (and be willing to pay for) adequate preparation time. For counsel, this is likely to be one case with one expert, and the attorney's more basic obligation is to the client; for the expert, most every case becomes part of his “permanent record.” The attorney is unlikely to lose the next case because her (now abandoned) expert in the previous case was trounced, but the consequences for the expert may carry across cases and years. Some experts, however, go overboard, spending too much time on minor or secondary issues or requiring inordinate amounts of time to prepare for depositions. It is hard to say exactly where to draw the line, and matters become much more difficult if disagreements about how to handle and prepare for a case arise late. If the expert feels that the attorney is setting restrictions that compromise professional standards, and if this situation is understood from the beginning, she can turn down the case as outlined. Alternatively, the attorney can be told what limits this could place on the soundness of opinions, and that any such shortcomings would be openly conveyed on a deposition. The attorney can then decide whether to loosen restrictions or retain someone else, the latter option sometimes being best for both parties.

Deposition questions, especially those that are well articulated, frequently call for specific answers and not dissertations. For example, if the attorney asks, “What is contained in your file?” or, “Did you consult any scientific
literature when working on this case?” an exegesis on the state of psychology is not required. It is often unwise to answer vague, unclear, or overly general questions. For example, the attorney may ask, “Do you believe Smith suffered a head injury?” It may be impossible to tell whether the question refers to any type of injury to the head (e.g., a facial laceration), or specifically to the brain. If one goes ahead and answers “yes,” an attempt at trial to explain that one thought the question referred to any type of head injury, and not solely to a brain injury, may fall on deaf ears (assuming the expert ever gets a chance to attempt an explanation). Responding to overly general questions can cause similar problems. For example, the attorney may ask, “Isn't it true that individuals with PTSD show difficulties with interpersonal relationships?”

When problematical questions are raised, one can simply say that the question is difficult to answer as stated and explain why, e.g., “. . . because the answer differs depending on the specifics.” If the question was not worded artfully but the expert thinks she knows what was intended, it is reasonable to respond with something like, “I understand your question to mean . . . . Assuming this, then . . .” The lawyer, of course, can stop the expert if the rephrasing distorts the intended meaning (although he might prefer the question the expert has created over his original one).

Sometimes single or seemingly small changes in wording can ruin a question. For example, the lawyer might start by asking, “Is it fair to say that post-traumatic stress syndrome . . .,” and one might not know if the reference is to the formal diagnostic category or to symptoms that can follow trauma. Depending on which meaning is intended, the answer can change entirely. There is often nothing wrong with explaining what the ambiguity is and why it may be important to clarify the exact reference; and usually a good attorney can quickly discern whether experts are trying to be cooperative and are requesting needed clarification, or rather are being difficult and evasive.

Although it is important to listen to questions very carefully to be sure one knows just what is being asked and to determine whether the question is answerable, one can go too far and make the process miserable for everyone. It can be very difficult for lawyers who are not expert in an area to ask questions with a high level of technical proficiency and exactitude. As long as the question is clear or the expert, through rephrasing, can check to make sure the question is understood (with this assumed meaning memorialized on the record), it should be possible to answer. The relative ease with which a deposition proceeds can depend in large part on the rapport between the opposing lawyer and the expert. If the lawyer is constantly rephrasing answers, attempting to get the expert to endorse a distortion that supports some contrary argument, or if the expert is really being nonresponsive in order to evade the import of questions, the whole process can bog down and become cumbersome, to say the least. If, however, the lawyer is doing his best to be clear and the expert her best to be responsive, and if the two can cooperate in clarifying ambiguities in questions, the process will usually go along reasonably.

The lawyer who has retained the expert will sometimes ask her to approach depositions parsimoniously, restricting herself to the question asked and volunteering no additional or extra information. As a somewhat hyperbolized example, if the lawyer asks, “Did you do anything to confirm allegation A?” the answer is, “Yes,” not, “Yes, what I did was . . .” Other lawyers will ask experts to answer more fully. They may fear that areas of testimony will be blocked if, upon questioning, opinions and their bases are not adequately explicated. For example, if cursory answers are provided to questions about the literature the expert relied on, she may be prohibited from discussing that literature at trial. After all, a fundamental purpose of a deposition is to learn the underlying bases for an expert's opinion, or to avoid courtroom surprises, so that the opposing lawyer can prepare for trial. Other lawyers want more complete answers because they hope or expect that a show of strength will give them settlement leverage.

Whatever the response style adopted, it is questionable to avoid an answer by exploiting trivial technical problems with the wording when the expert really knows what the opposing attorney is asking. I attended one trial in which an expert was asked, “Is there board status in neuropsychology?” The context of this question almost surely made it apparent to the expert that the lawyer meant board certification. However, because the lawyer did not ask the question exactly right and the expert wanted to avoid losing ground, the response was, “No.” Aside from such a response style being disingenuous and arguably obstructing the legal process, getting caught can be costly. In the case in question, during a break, the lawyer recognized his mistake. He went back and made it very obvious to the jury that the expert had known exactly what the attorney was asking and had avoided answering on a technicality. This made the answer hurt 10 times more than it ever would have had it been surrendered earlier. Those less familiar with the trial process may be surprised
by the magnitude of the consequences that can follow when an expert is caught in an intentional misrepresentation, whether it is a direct lie or the offspring of evasion.

Uncertainty about the answer to a deposition question should be readily conceded. One cannot know everything, and such a concession is usually far less injurious than a wrong guess, or repeated wrong guesses. Lawyers may try to bait experts by insinuating in some way, perhaps simply by tone of voice or suggestive words, that they are asking about something that is very basic and that any professional would be expected to know. “Doctor, are you familiar with the large number of studies on . . .?” “Isn't one of the most frequently demonstrated findings in your field . . .” The lawyer may be hoping that the expert will make guesses, because with each guess there is an increased risk of error. If an expert guesses 10 times and is wrong three times, it is not hard to anticipate which three topics will be raised on cross-examination. The expert should not feel that she has to be absolutely certain about everything she utters in a deposition, but if the level of uncertainty passes some relatively low threshold, it might at least be noted (e.g., “If I remember right . . .”), or one should just say that one is not sure and does not want to hazard a guess. (Of course, if these types of uncertainties apply to opinions, and especially if the expert is merely guessing or almost tossing coins, he probably does not belong in the courtroom in the case; and one hopes the attorney who has retained him will not learn about this for the first time during the deposition.) Similarly, it is extremely risky (and potentially unethical) to make some claim that cannot be supported, e.g., “Despite the many negative studies on experience and accuracy, there are far more studies showing a positive relation between the two.”

Depositions require a high level of concentration and can strain endurance. If one is too tired to pay close attention to questions, it is at least time to take a break, or to call a halt to the process. Stopping can be cumbersome if expensive travel arrangements are involved, but poor answers on depositions can destroy cases and haunt experts for years, or forever. Experts also should not let their guard down, something that is more likely to occur with fatigue. An off-handed comment on a break may be the first topic raised when the deposition is resumed. It may be particularly hard to maintain vigilance with a highly personable, friendly attorney. Whether or not the attorney is an honest, good-willed, considerate, and kind individual, her job is to win the case and she is likely to use anything she can (within professional ethics and the law) against the expert. Thus, at least within the context of the litigation, that attorney may best be thought of as someone for whom you are the enemy.

Lawyers will sometimes spring new sources of information on experts during depositions, such as publications, medical reports, or documents relating to the plaintiff's everyday functioning, such as work records. If one is unfamiliar with the material, it calls for an open admission. If the lawyer wishes to ask questions, the expert should request the time needed to review the material. If it is not possible to perform an adequate review on the spot, the deponent should so indicate. Alternatively, if the expert believes she has acquired a reasonable grasp of the new material, she might state on the record that she will try to be helpful by answering questions about documents that she has just seen for the first time, although upon more careful study and reflection impressions or conclusions might change. Keep in mind that new materials might be presented out of context, which can lead to misimpressions. For example, one may come to learn that a letter describing insubordination as a basis for termination was written by a boss subsequently arrested for criminal behavior that endangered the public, and that it was the plaintiff who had heroically reported the misdeeds.

4.19.6.3 Trial Testimony

Prior to trial testimony, as with depositions, the expert typically should meet with the attorney. The meeting can address the topics to be covered on the direct examination and what might occur on cross-examination. Once again, the lawyer needs to understand what the expert can and cannot say and any reservations and uncertainties pertinent to the expert's views. Almost any attorney would want to know about these limitations and problems in advance, preferably as soon as possible, rather than discovering them during the expert's cross-examination. In one case, an expert testified that the plaintiff would have developed a particular disorder whether or not an accident had occurred. What he did not tell the attorney that retained him was that he believed it may have taken as long as 10 more years for the condition to develop if the traumatic accident in question had not occurred, something that came out at the end of the cross-examination. Had the lawyer known about this qualification in advance, he would have offered considerably more money to settle the case.

For each topic to be covered on direct, the lawyer needs to be able to ask the expert a question, or questions, that permit entry into the
subject matter. If a sufficiently precise question is not articulated, the expert may not even understand what the lawyer is asking, or may have no way to connect the question to the intended material. The lawyer can try out questions in advance to see if they prove sufficient to elicit the intended topic. An expert also needs to prepare for the possibility that upon objection, the judge may preclude queries or entire lines of questioning, which again emphasizes the need for expert and attorney to have a good idea about the topics to be covered. If the attorney does not really have this appreciation and is operating more by rote, unexpected alterations in the direct due to a judge's rulings or small slip-ups can be extremely disruptive.

Like a decent lecture, testimony is usually more effective if it is straightforward, accessible, unfettered with needless and endless complications, and not too lengthy. The jury may already have had a long day by the time the expert goes on, and, often, the expert's testimony is just one piece, although perhaps a crucial one, of a much larger composite that may include long strings of witnesses and days of evidence. A complex, obtuse, and exceedingly detailed presentation may quickly bore the jury and lead to inattention. This is not to argue for glossing over crucial points or simplifying at the cost of distortion. It is often very important to present the underlying bases for opinions, rather than just the conclusions, and in some instances detailed analysis and explanation are needed. For example, when discussing the use of a psychological instrument in malingering detection, a relatively detailed description of scales and their rationale may be necessary. Nevertheless, careful analysis will show that in most cases one's testimony revolves around only a few major points, many details are not of particular importance, and the direct can usually be limited to 1 to 2 hours or less. Also, every point raised in direct is potential material for cross-examination. Thus, unneeded complications or content are not neutral and primarily have a down side.

Visual displays can strengthen an expert's direct considerably by increasing interest and clarity. One might consider using at least one visual aid for each major point covered. It is usually preferable to keep demonstratives basic. A rule of thumb is to ask whether a visual aid is nearly, or completely, self-interpreting and can be comprehended quickly or with minimal explanation. Visual materials do not need to be fancy or elaborate, only clear. For example, a memory disorder in which rehearsal yields minimal gains in new learning can be easily illustrated through a learning curve. For visual displays, I often use overheads because the equipment is readily available, materials are inexpensive, materials can be blown up to sufficient size with little distortion, and one can keep the lights up in the room, a real advantage late in the day.

Some experts try to impress juries with their technical knowledge. In particular, their presentations are laden with jargon. Although some technical terms may be necessary and have exact meanings that are otherwise difficult to capture succinctly (e.g., “post-traumatic amnesia”), other terms are pedantic and contribute little. Overuse of jargon is likely to alienate jurors.

On direct and cross-examination, the expert should remember which audience is the important one. It is not either counsel, but the trier of fact (the judge or jury) that will make the ultimate decisions and that the expert is there to address. If the opposing attorney hates you, or acts like you are an object of disdain, it does not influence the outcome of the trial an iota if the jury feels otherwise. Answers often should be directed at the jury, that is, one should look at the jurors and speak to them.

Many points about cross-examination have already been covered, and only a few additions will be provided here. Cross-examination, as noted, often does not follow the outline of the direct examination, so that the expert should not be surprised if completely different topics are raised. Depositions may give helpful clues about at least some of the upcoming points of attack, but there will almost always be some unanticipated questions at trial. As with depositions, or more so, preparation is extremely important. In addition to a pre-trial meeting with the attorney, one should be familiar, or very familiar, with the file, and with key background literature.

It is exceptional to get through an entire cross-examination untouched, and few cases that get to trial are so one-sided that at least some reasonable counterarguments to the expert's opinions cannot be raised. In the end, if the expert has done a decent job on direct, and if the attorney scores some points on the cross-examination but misses as or more often, it is the cross-examiner who has probably lost considerable ground. One of the worst characteristics of some witnesses is to not concede anything, even the obvious, as if the loss of even one exchange is intolerable and completely nullifying. An expert who is defensive and unreasonable, and who does not make what would seem to be required concessions loses credibility. If the attorney asks, in the context of DSM-IV criteria for malingering, “Would you agree that lying on the interview could be viewed as lack of cooperation
with the examination?” the simple answer would seem to be, “Yes.” The attorney has not asked whether the expert thinks the plaintiff lied or whether the plaintiff was malingering. An expert might try to rush in to make these points, but a good attorney will simply say something like, “Doctor, I don't think you've answered my question. Please listen carefully,” and then will slowly repeat the exact same question. If the expert again tries to dance around the question, she will begin to look evasive. When it seems to the jury that the expert would vehemently disagree should the cross-examiner assert that 1 + 1 = 2, he becomes an object of ridicule.

The expert's responsibility is to conduct herself competently, professionally, and ethically, not to win or lose the case, and not to influence the outcome through sleight of hand. If honest concessions lead the jury to reject the expert's opinion, so long as these concessions do not stem from avoidable error or negligence (e.g., scoring errors), it does not mean that the expert failed to perform in a respectable and worthy manner. It may well be that the jury reached the right decision and that justice has been done. The issues at stake in the courtroom are often of great personal import, and one hopefully would much prefer to have one's opinion rejected than to prevail if, in truth, one was wrong. Almost every attorney wins and loses cases, and a “bad” outcome will not necessarily lead them to develop negative views towards an expert. This is unlike a situation in which an expert has been dishonest with the attorney, by withholding weaknesses in his opinions, for example, and gets caught on the stand. Unfortunately, in some such instances, the expert's actions may have destroyed the chances that the meritorious party will prevail.

4.19.7 REFERENCES

American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
Arkes, H. R. (1981). Impediments to accurate clinical judgments and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, 323–330.
Bennett, B. E., Bryant, B. K., VandenBos, G. R., & Greenwood, A. (1990). Professional liability and risk management. Washington, DC: American Psychological Association Press.
Bersoff, D. N. (1997). The application of Daubert to forensic and social science evidence. Presentation to the Federal Judicial Center's National Workshop for Magistrate Judges, Denver, CO.
Bolla-Wilson, K., & Bleecker, M. L. (1986). Influence of verbal intelligence, sex, age, and education on the Rey Auditory Verbal Learning Test. Developmental Neuropsychology, 2, 203–211.
Brandon, A. D., Chavez, E. L., & Bennett, T. L. (1986). A comparative evaluation of two neuropsychological finger tapping instruments: Halstead–Reitan and Western Psychological Services. International Journal of Clinical Neuropsychology, 8, 64–65.
Brodsky, S. L. (in press). A hierarchical-conflict model of ethics in expert testimony. Unpublished manuscript.
Brodsky, S. L. (1991). Testifying in court. Guidelines and maxims for the expert witness. Washington, DC: American Psychological Association Press.
Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993). 509 US 579, 113 S. Ct. 2786, 125 L ed 2d 469.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.
Faust, D. (1984). The limits of scientific reasoning. Minneapolis, MN: University of Minnesota Press.
Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology, 17, 420–430.
Faust, D. (1993). Use and then prove, or prove and then use? Some thoughts on the ethics of mental health professionals' courtroom involvement. Ethics and Behavior, 3, 359–380.
Faust, D. (1995). Neuropsychological (brain damage) assessment. In J. Ziskin (Ed.), Coping with psychiatric and psychological testimony (5th ed., Vol. 2, pp. 916–1044). Los Angeles: Law and Psychology Press.
Faust, D., & Ackley, M. A. (1998). Did you think it was going to be easy? Some methodological suggestions for the investigation and development of malingering detection techniques. In C. R. Reynolds (Ed.), Detection of malingering during head injury litigation (pp. 1–54). New York: Plenum.
Faust, D., & Meehl, P. E. (1992). Using scientific methods to resolve questions in the history and philosophy of science: Some illustrations. Behavior Therapy, 23, 195–211.
Faust, D., & Willis, W. G. (in press). Counterintuitive imperatives: A guide to improving clinical assessment and care by predicting more accurately. Boston: Allyn & Bacon.
Faust, D., Ziskin, J., & Hiers, J. B., Jr. (1991). Brain damage cases: Coping with neuropsychological evidence (Vols. 1 & 2). Los Angeles: Law and Psychology Press.
Faust, D., Ziskin, J., Hiers, J. B., Jr., & Miller, W. J. (in press). Revision and update of Faust, D., Ziskin, J., & Hiers, J. B., Jr. (1991). Brain damage cases: Coping with neuropsychological evidence (Vols. 1 & 2). Los Angeles: Law and Psychology Press.
Frye v. US (DC Cir. 1923). 293 Fed. 1013, 1014.
Garb, H. N. (1989). Clinical judgment, clinical training, and professional experience. Psychological Bulletin, 105, 387–396.
Greene, R. L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R. L. (1997). Assessment of malingering and defensiveness by multiscale inventories. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 169–207). New York: Guilford.
Grisso, T. (1986). Evaluating competencies. New York: Plenum.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
Hathaway, S. R., & McKinley, J. C. (1951). MMPI manual. New York: Psychological Corporation.
Hathaway, S. R., & McKinley, J. C. (1983). Manual for administration and scoring of the MMPI. Minneapolis, MN: National Computer Systems.
Heaton, R. K., Matthews, C. G., & Grant, I. (1991). Comprehensive norms for an expanded Halstead–Reitan Battery. Odessa, FL: Psychological Assessment Resources.
Matarazzo, J. D., Daniel, M. H., Prifitera, A., & Herman, D. O. (1988). Inter-subtest scatter in the WAIS-R standardization sample. Journal of Clinical Psychology, 44, 940–950.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.
Meehl, P. E. (1973). Psychodiagnosis. Selected papers. Minneapolis, MN: University of Minnesota Press.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Pankratz, L. (1988). Malingering on intellectual and neuropsychological measures. In R. Rogers (Ed.), Clinical assessment of malingering and deception (pp. 169–192). New York: Guilford.
Reitan, R. M., & Wolfson, D. (1993). The Halstead–Reitan Neuropsychological Test Battery. Theory and clinical interpretation (2nd ed.). S. Tucson, AZ: Neuropsychology Press.
Rey, A. (1964). L'examen clinique en psychologie. Paris: Presses Universitaires de France.
Reynolds, C. R. (1998). Preface to C. R. Reynolds (Ed.), Detection of malingering during head injury litigation (pp. vii–ix). New York: Plenum.
Rogers, R. (Ed.) (1997a). Clinical assessment of malingering and deception (2nd ed.). New York: Guilford.
Rogers, R. (1997b). Structured interviews and deception. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 301–327). New York: Guilford.
Rogers, R., Bagby, R. M., & Dickens, S. E. (1992). SIRS. Structured Interview of Reported Symptoms. Professional manual. Odessa, FL: Psychological Assessment Resources.
Wechsler, D. (1981). Manual for the Wechsler Adult Intelligence Scale-Revised. New York: Psychological Corporation.
Wechsler, D. (1987). Manual for the Wechsler Memory Scale-Revised. New York: Psychological Corporation.
Wedding, D., & Faust, D. (1989). Clinical judgment and decision making in clinical neuropsychology. Archives of Clinical Neuropsychology, 4, 233–265.
Wiens, A. N., Crossen, J. R., & McMinn, M. R. (1988). Rey Auditory–Verbal Learning Test: Development of norms for healthy young adults. Clinical Neuropsychologist, 2, 67–87.
Ziskin, J. (1995). Coping with psychiatric and psychological testimony (5th ed., Vols. 1–3). Los Angeles: Law and Psychology Press.
