Interpreting The Meaning of Multiple Symptom Validity Test Failure

The Clinical Neuropsychologist, 23: 297–313, 2009
http://www.psypress.com/tcn
ISSN: 1385-4046 print/1744-4144 online
DOI: 10.1080/13854040802232682

INTERPRETING THE MEANING OF MULTIPLE SYMPTOM VALIDITY TEST FAILURE

Tara L. Victor1,2,4, Kyle B. Boone2,3, J. Greg Serpa4, Jody Buehler3, and Elizabeth A. Ziegler3

1California State University – Dominguez Hills, Carson, CA, 2UCLA Department of Psychiatry and Biobehavioral Sciences, Los Angeles, CA, 3Harbor-UCLA Medical Center, Los Angeles, CA, and 4West Los Angeles Veterans Healthcare System, Los Angeles, CA, USA

While it is recommended that judgments regarding the credibility of test performance be based on the results of more than one effort indicator, and recent efforts have been made to improve interpretation of multiple effort test failure, the field currently lacks adequate guidelines for using multiple measures of effort in concert with one another. A total of 103 patients were referred for outpatient neuropsychological evaluation, which included multiple measures of negative response bias embedded in standard test batteries. Using any pairwise failure combination to predict diagnostic classification was superior (sensitivity = 83.8%, specificity = 93.9%, overall hit rate = 90.3%) to using any one test by itself and to using any three-test failure combination. Further, the results were comparable to the results of logistic regression analyses using the embedded indicators as continuous predictors. Given its parsimony and clinical utility, the pairwise failure model is therefore a recommended criterion for identifying non-credible performance; however, there are of course other important contextual factors and influences to consider, which are also discussed.

Keywords: Symptom validity; Response bias; Incremental validity; Effort.

Address correspondence to: Tara L. Victor, Department of Psychology, CSUDH, 1000 E Victoria Street, Carson, CA 90747, USA. E-mail: tvictor@csudh.edu
Accepted for publication: May 26, 2008. First published online: September 25, 2008.
© 2008 Psychology Press, an imprint of the Taylor & Francis group, an Informa business

INTRODUCTION
Assessing symptom validity or negative response bias in the context of forensic
and neuropsychological assessment is becoming increasingly important as civil,
criminal, and disability-related proceedings continue to rely heavily on the results of
such evaluations, and the stakes are often quite high (Taylor, 1999). However, reliance on single measures of effort is problematic because individual effort tests have imperfect sensitivity and specificity. As a result, many investigators
have recommended use of multiple symptom validity indicators interspersed
throughout a test battery (Boone, 2007; Iverson & Franzen, 1996; Larrabee,
2003a; Orey, Cragar, & Berry, 2000; Vickery et al., 2004) and discriminant functions
have been developed based on multiple symptom validity tests (SVTs; e.g., Martin,
Hayes, & Gouvier, 1996). Larrabee, Greiffenstein, Greve, and Bianchini (2007)
recently addressed this area of concern using analysis with likelihood ratios (i.e., the likelihood of a SVT failure in a malingerer as compared to the likelihood of a SVT failure in a non-malingerer)—which was further elaborated upon by Larrabee
(2008) using more detailed empirical analysis—concluding that multiple SVT
failures are necessary for the accurate detection of negative response bias, and that
there exists a substantial increase in the probability of malingering when multiple
SVTs are failed. However, little work has been done to address how best to use the various SVTs in concert with one another, and the answer undoubtedly depends on the error rates and intercorrelations of the individual tests used, a topic addressed in more detail below. Specifically, questions remain regarding how many SVTs need to be failed before a patient is categorized as non-credible; further validation of the extant literature in this area is needed. It is the purpose of this paper to address this area of deficit, building on existing work with an eye towards stimulating additional research in this area.
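To make the likelihood-ratio logic concrete, here is a brief worked sketch (ours, not the original authors'). For a single SVT with known sensitivity (Se) and specificity (Sp), the likelihood ratio attached to a failure is

$$LR = \frac{P(\text{fail} \mid \text{malingering})}{P(\text{fail} \mid \text{no malingering})} = \frac{Se}{1 - Sp},$$

and, under the strong assumption that SVTs are conditionally independent given credibility status, failures on multiple tests chain multiplicatively on the odds scale:

$$\frac{P(M \mid \text{failures})}{P(\neg M \mid \text{failures})} = \frac{P(M)}{P(\neg M)} \times \prod_{i=1}^{k} LR_i.$$

Illustratively (the numbers are ours, not the article's), two failed tests each carrying LR ≈ 5 move prior odds of .15/.85 ≈ .18 to posterior odds of roughly 4.4, i.e., a posterior probability of about .82. The intercorrelations among SVTs discussed next are precisely what threaten the independence assumption, which is why the error rates and overlap of the specific tests used matter.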

INTER-TEST CORRELATION AND INCREMENTAL VALIDITY


One of the most important considerations in this endeavor is the extent to
which SVTs are correlated with one another (e.g., Dot Counting Test e-score and
Digit Span age-corrected scale score; Nelson et al., 2003), which has important
implications for interpreting multiple test failure. As Hunsley and Meyer (2003,
p. 453) explain:
On the one hand, to the extent that . . . data are derived from independent sources of
information and share minimal method variance, there is certainly value in obtaining
convergent data that supports the same clinical conclusions . . . On the other hand . . . if
the data sources are based on what is essentially the same source or form of
information, then the apparently convergent data provide little more than an
unwarranted sense of security regarding the validity or accuracy of the conclusions.
Similarly, as Rosenfeld, Sands, and van Gorp (2000) point out, we are in danger of "overestimating the actual benefits provided by the use of multiple tests" (p. 355). To address this issue, these authors call for research "specifically designed to assess the degree of independence and/or overlap in classification accuracy across different tests" (p. 356).
At the heart of this issue is the concept of incremental validity, or "the degree to which [an] instrument provide[s] measures that [are] more valid than alternative measures of the same variables" (Haynes & Lench, 2003, p. 456). Indeed, the field of applied psychology is growing increasingly aware of the importance of showing that newly developed tests add to the prediction of outcomes above and beyond what is already possible using available methods (Hunsley & Meyer, 2003). It has recently been demonstrated that using individual
embedded SVTs from different test domains (attention, speed, etc.) allows for
relative independence in measurement (Larrabee, 2008). Embedded indicators are
those derived from standard neuropsychological assessment procedures, while
freestanding indicators are those specifically designed to assess non-credible
performance. The few studies that have examined SVTs in combination have typically employed sets of solely "freestanding" or solely "embedded" indicators.

FREESTANDING SVTS
With regard to the use of multiple freestanding SVTs, there are three studies from David Berry's laboratory attempting to quantify the extent to which using such tests in combination affects our predictive accuracy. Inman and Berry (2002) and Orey et al. (2000) used college samples reporting a history of head injury in the context of analog designs to investigate the combined utility of various freestanding effort indicators, and concluded that failure on any one test (in a group of highly sensitive freestanding indicators) should raise questions of inadequate effort. Requiring failure on two or more tests was inadequately sensitive.
In the third study from this laboratory, Vickery et al. (2004) used a more
ecologically valid sample of four comparison groups (23 moderately to severely
head-injured controls, 23 moderately to severely head-injured simulators, 23
community volunteer controls, and 23 community volunteer simulators). Again,
the authors found that the best model was one that used a criterion of failure on one
or more of these freestanding SVTs (sensitivity = 89.1, specificity = 93.5, overall hit rate = 91.5), as compared to models requiring two or more failures (sensitivity = 65.2, specificity = 97.8, overall hit rate = 79.9) or three or more failures (sensitivity = 32.6, specificity = 100.0, overall hit rate = 63.6). Taken together, their findings suggest that a criterion of failure on one or more freestanding SVTs produced the most accurate classification rates. However, all of these
investigations were carried out with use of simulation (i.e., analog) designs and
relatively small samples. Further, two of these studies employed college samples self-reporting a history of mild head injury whose impairment was likely minimal, possibly leading to overestimation of specificity and further limiting the generalizability to real-world clinical samples. In addition, the use of simulators may not
accurately capture test sensitivity in real-world applications (see Boone et al., 2002a;
Boone, Lu, & Wen, 2005). In this vein, it is relevant to note that Orey and colleagues
(2000) documented only a 4% sensitivity for the Rey 15-item, whereas studies
examining "real-world" non-credible subjects have found much higher sensitivity
rates (e.g., 47%; Boone et al., 2002b). In fact, studies using clinical samples have
shown that it is fairly common for a credible patient to fail one SVT upon
neuropsychological exam (e.g., Dean, Victor, Boone, & Arnold, 2008; Meyers &
Volbrecht, 2003). As such, it is important to obtain converging evidence from both
experimental and clinical research designs (Rogers, 1997b).

EMBEDDED SVTS
The field of clinical neuropsychology also focuses on the use of SVTs
embedded into existing batteries as opposed to dedicated SVTs. Tests with
embedded symptom validity or response indicators include the Rey-Osterreith
Complex Figure Test (ROCFT; Lu, Boone, Cozolino, & Mitchell, 2003), the Rey
Auditory Verbal Learning Test (RAVLT; Boone et al., 2005; Sherman,
Brauer-Boone, Lu, & Razani, 2002), Finger Tapping (Arnold et al., 2005), and the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III; e.g., Reliable Digit Span; Babikian, Boone, Lu, & Arnold, 2006; Greiffenstein, Baker, & Gola, 1994).

To our knowledge, there are also only three prior published attempts to
quantify the extent to which the use of multiple embedded SVTs affects our
predictive accuracy. First, Iverson and Franzen (1996), using both freestanding
and embedded indicators, investigated the predictive accuracy of 10 different
indices derived from five tests in their three comparison groups (20 undergraduates, 20 psychiatric inpatients, and 20 memory-impaired patients; note that
the first two groups were tested under two conditions, including once when they
were asked to do their best and once when they were asked to simulate a
malingerer). When a criterion of any one failure out of a possible 10 was used, sensitivity was 92.5% and specificity was 100% (overall hit rate = 97%). However, extrapolation of this finding is limited given that the study employed SVTs that are less commonly used in actual clinical practice, had small sample sizes, and employed simulators rather than "real-world" non-credible participants.

A second study by Meyers and Volbrecht (2003) investigated the validity of nine embedded SVTs in a large clinical sample of varying diagnoses compared to
normal controls and simulators (overall n = 796). The results suggested that while it was not atypical to find one failure, failure on two or more embedded indices among non-institutionalized or non-litigating groups was rare. Using a criterion of two or more failures resulted in 100% specificity in the credible groups (controls, n = 32; depressed patients, n = 25; non-litigating mild traumatic brain injury, n = 56; and non-litigating chronic pain patients, n = 38) and 83% sensitivity in the non-credible group (graduate students asked to simulate, or "fake," a brain injury; n = 21). Further, it was found that traumatic brain-injury patients with long periods of loss of consciousness (n = 40) did not fail two or more of these nine validity indicators, providing further support for use of this criterion rule.
Finally, in the only attempt to investigate this research question using "real-world" malingerers, Larrabee (2003a) used a pairwise failure model (i.e., any combination of two test failures) to classify the performance of a malingering group defined by Slick, Sherman, and Iverson (1999) criteria for definite malingered neurocognitive dysfunction (i.e., worse-than-chance performance on the Portland Digit Recognition Test; n = 24) and a credible (moderate to severe) closed head-injury group (n = 27). Consistent with the two-failure rule, results indicated impressive combined sensitivity (87.5%), specificity (88.9%), and overall hit (88.2%) rates. Requiring an additional failure (i.e., failure on three SVTs) increased specificity to 100%; however, it lowered sensitivity to only 54.2% and led to a combined hit rate of only 78.4%.
Combining this sample with a cross-validation sample that included a different group meeting Slick et al. (1999) criteria for probable malingered neurocognitive dysfunction (combined n = 41), as well as a non-litigating mixed clinical (neurologic and psychiatric) group (combined n = 54), yielded even higher sensitivity (87.8%), specificity (94.4%), and overall hit (91.6%) rates when the pairwise failure criterion was employed. All five embedded indicators were then entered into a logistic regression, which was significant and produced sensitivity and specificity rates comparable to requiring failure on two tests (derivation study: sensitivity = 79.2%, specificity = 85.2%, overall hit rate = 85.2%; cross-validation study: overall hit rate = 82.4%). As explained by the author, this demonstration
is important because, given that clinical use of a logistic regression equation
requires the administration of all tests used in the equation (and test batteries vary
from clinician to clinician), it suggests that the more parsimonious and user-friendly
pairwise failure model might be a viable rule of thumb that does not compro-
mise predictive accuracy. Based on this work, the recommendation of using two or
more failures as a guideline for detecting symptom invalidity was made by
Larrabee et al. (2007) in a recently published book chapter, but this requires
cross-validation.
The focus on embedded SVTs is important because they are derived from
standard cognitive tests and therefore have the advantage of serving "double duty"
(i.e., measurement of both response bias and specific cognitive abilities) without
adding to test battery length. In addition, they are less likely to be targeted for
coaching because they have well-documented functions/purposes unrelated to
measurement of response bias and because they do not typically rely on a forced
choice format that is often recognized by sophisticated malingerers as a malingering
assessment technique (Suhr & Gunstad, 2000). Further, while freestanding SVTs are
thought to have high face validity and therefore relatively lower associated rates of
false positive error, they may be more susceptible to attorney coaching and can
considerably extend the length of a typical testing battery (Inman & Berry, 2002).
For these reasons, SVTs that are derived from standard neuropsychological
assessment procedures are increasing in popularity.
It is the purpose of this study to investigate the classification rates associated
with using multiple embedded SVTs in an actual clinical population by comparing
individual test sensitivities/specificities to those associated with use of pairwise
failure and to further evaluate the unique contributions of various tests used in
combination. The specific aims of this study are: (1) to cross-validate the pairwise
failure model of embedded SVTs, using a modified set of indices, and (2) to compare
these results to those that emerge from logistic regression analyses using continuous
predictors to assess the independent contributions of included SVTs (i.e., the extent
to which they demonstrate incremental predictive value), and finally (3) to address
limitations of generalizability and provide suggestions to stimulate and guide future
work in this area of research.

METHOD
Subjects/group assignment
An archival database of patients referred to a tertiary care outpatient neuropsychological assessment service at a large county hospital was accessed
with institutional review board approval. All subjects underwent standard
administrations of a comprehensive neuropsychological test battery and consented
to have their results included in research analyses.
Subjects were identified as non-credible if they had motive to feign (i.e., they
were either in litigation at the time of testing or attempting to secure disability)
and failed at least two of the five freestanding SVTs. Subjects were included in the
credible group if they had no identified motive to feign or exaggerate their
cognitive deficits (i.e., they were not in litigation or attempting to obtain
disability), they failed fewer than two (i.e., none or only one) of the freestanding SVTs, and they had no history of, and did not meet current psychometric criteria for, dementia or mental retardation. Of note, the correlation of group assignment based on this method with actual clinical judgment of malingering in these real-world cases was .95 (p < .01); mismatched cases included two subjects judged to be
credible upon clinical evaluation, but who were identified as non-credible based on
the above-described method of group assignment (i.e., both had external incentive to feign and both failed two freestanding SVTs). Examiners concluded that low cognitive scores were consistent with current and past "real-world" function, with
SVT failures judged to be an artifact of actual cognitive limitations. One subject
was a 55-year-old Hispanic male with only a ninth-grade education and history of
severe substance abuse, learning disability, and depression. The other was a
58-year-old African American male with a history of 12 years of education but
with some special education classes, and diagnosed with depression and panic
disorder with a rule out for schizotypal personality disorder. These subjects were
not included in the analyses presented below, rendering a final sample size of 103.
Table 1 provides the frequency of primary diagnoses for both the credible and
non-credible groups in this study.

Measures
The neuropsychological test battery included nine indicators of symptom validity or response bias (five freestanding and four embedded), each with its standard published cutoff (see Table 2), although not every subject was administered all nine
SVTs for various reasons (e.g., time constraints). Only subjects who had data on at
least three out of the five freestanding SVTs and at least three out of four embedded
SVTs were included in the study.
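To illustrate how the failure-count criteria examined below operate, here is a minimal sketch; the function names, data layout, and missing-data handling are our assumptions, and the cutoffs are those listed for the embedded indicators in Table 2, so this is an illustration rather than the study's actual scoring code.

```python
# Illustrative sketch only (not the authors' code): score the four embedded
# SVTs against the Table 2 cutoffs and apply failure-count decision rules.
# A score of None denotes a test that was not administered.

EMBEDDED_CUTOFFS = {"rds": 6, "ro_effort": 47, "ravlt_effort": 12}

def is_failure(test, score, sex):
    """A score at or below the published cutoff counts as a failure."""
    if test == "ftt":  # Finger Tapping cutoff differs by sex (Table 2)
        return score <= (35 if sex == "male" else 28)
    return score <= EMBEDDED_CUTOFFS[test]

def count_failures(scores, sex):
    """Count failed embedded SVTs, skipping tests that were not given."""
    return sum(is_failure(t, s, sex) for t, s in scores.items() if s is not None)

def classify(scores, sex, required_failures=2):
    """Pairwise failure rule: two or more failures flag non-credible performance."""
    return "non-credible" if count_failures(scores, sex) >= required_failures else "credible"

# Example: fails RDS (5 <= 6) and FTT (30 <= 35 for men), passes both effort equations.
patient = {"rds": 5, "ro_effort": 55, "ravlt_effort": 14, "ftt": 30}
print(classify(patient, sex="male"))  # -> non-credible
```

Raising `required_failures` to 3 reproduces the stricter criterion whose trade-offs are quantified in Table 5.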

RESULTS
Sample demographics for both credible (n = 66) and non-credible (n = 37) groups are presented in Table 3. Using independent samples t-tests (for continuous variables) and chi-square analyses (for categorical variables), groups were compared on demographic, IQ, and embedded SVT performance. Groups did not differ with respect to age, number of years of education, or gender, and overall the samples were fairly comparable with respect to ethnic diversity, although the chi-square statistic approached significance (p = .078). The non-credible group had a significantly lower WAIS-III FSIQ, likely secondary to their correspondingly higher levels of response bias, given that the groups did not differ in level of education. As expected, the groups differed in their performance on the four embedded symptom validity indicator predictor variables, with the non-credible group performing significantly worse on each. Correlational analysis (Table 4) revealed significant modest to moderate correlations among all the predictor variables (ranging from .23 to .63).

Table 1 Frequency count of diagnosis by group

Diagnosis                                 Credible (n = 66)    Non-credible (n = 37)

Alcohol abuse 6 2
Attention deficit disorder 7 2
Anxiety disorders 9 3
Bacterial meningitis 1 0
Bipolar disorder 6 0
Brain aneurysm 2 0
Brain tumor 3 0
Cognitive disorder NOS 4 0
Dissociative disorder 1 0
Epilepsy/seizures 7 2
Head injury – severity unknown 1 1
HIV/AIDS 4 0
Klinefelter's syndrome 2 0
Learning disability 20 6
Mild cognitive impairment 1 0
Mild head injury 3 10
Moderate head injury 2 4
Mood disorder (not bipolar) 28 13
Multiple sclerosis 1 0
Personality disorder NOS 2 3
Rule out anxiety 2 1
Rule out attention deficit disorder 0 1
Rule out bipolar disorder 1 0
Rule out cognitive disorder NOS 1 0
Rule out dementia 1 1
Rule out mood disorder (not bipolar) 4 6
Rule out dissociative disorder 1 0
Rule out learning disorder 1 1
Rule out psychotic disorder NOS 1 1
Rule out somatoform disorder 5 3
Schizophrenia/psychosis 9 3
Severe head injury 6 4
Somatoform disorder 11 6
Stroke 4 2
Substance abuse/dependence 17 7
Toxic exposure 0 4
Some patients had more than one diagnosis and were therefore included in the count more than once. NOS = not otherwise specified.

Table 5 displays the sensitivities and specificities of the embedded SVTs at published cut-scores (see Table 2). Individual test sensitivities ranged from 43.2 (Reliable Digit Span) to 86.1 (RAVLT Effort Equation), while individual specificities ranged from 81.3 (Finger Tapping) to 91.7 (R-O Effort Equation). Classification rates for requiring failure on any one, two, or three test combinations were also examined. The pairwise failure model demonstrated superiority over the use of any one individual test by itself in terms of overall hit rate (90.3%), maximizing sensitivity (83.8%) while still maintaining an impressive level of specificity (93.9%). Requiring only one failure was extremely sensitive (94.6%) but unacceptable in terms of false positive error (specificity = 53.0%). Requiring failure on any three tests led to almost perfect specificity (98.5%); however, it led to a substantial drop in sensitivity (51.4%). Table 5 also provides clinically relevant data on diagnostic classification rates, or positive predictive power (PPP) and negative predictive power (NPP), at three base rates of malingering (i.e., 15%, 40%, and 50%).
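For reference, the PPP and NPP values reported in Table 5 follow from Bayes' theorem applied to the sensitivity (Se) and specificity (Sp) estimates at an assumed base rate (BR); the formulas below are the standard derivation, spelled out here for clarity rather than quoted from the article:

$$\mathrm{PPP} = \frac{\mathrm{BR} \cdot Se}{\mathrm{BR} \cdot Se + (1-\mathrm{BR})(1-Sp)}, \qquad \mathrm{NPP} = \frac{(1-\mathrm{BR}) \cdot Sp}{(1-\mathrm{BR}) \cdot Sp + \mathrm{BR}\,(1-Se)}.$$

For the pairwise model (Se = .838, Sp = .939) at BR = .15, these formulas give PPP ≈ .71 and NPP ≈ .97, in line with the tabled values of 72.2% and 96.5%; small discrepancies presumably reflect rounding and the varying n available per indicator.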

Table 2 Symptom validity indicators used

Measure                                                   Reference                                            Cutoff

Freestanding (criterion variables)
Rey 15-Item Test plus recognition combination score (a)   Boone et al., 2002b                                  < 20
Dot Counting Test (DCT) E-score (b)                       Boone et al., 2002a                                  ≥ 17
Warrington Recognition Memory Test – Words (RMT) (c)      Iverson & Franzen, 1994; Millis, 1992, 2002          < 33
b-test (d)                                                Boone et al., 2000                                   ≥ 160
Rey Word Recognition Test (RWRT) (e)                      Nitch et al., 2006                                   Men ≤ 5; Women ≤ 7

Embedded (predictor variables)
WAIS-III Reliable Digit Span (RDS) (f)                    Babikian et al., 2006; Greiffenstein et al., 1994    ≤ 6
Rey-Osterreith Complex Figure Effort Equation (g)         Lu et al., 2003                                      ≤ 47
RAVLT effort equation (h)                                 Boone et al., 2005                                   ≤ 12
Finger Tapping Test (FTT) (i)                             Arnold et al., 2005                                  Men ≤ 35; Women ≤ 28

(a) Rey 15-Item Test plus recognition = recall correct + (recognition correct − false positive errors).
(b) Dot Counting Test E-score = mean ungrouped dot counting time + mean grouped dot counting time + number of errors.
(c) Warrington Recognition Memory Test – Words = total number correct out of 50.
(d) b-test E-score = mean time to complete per page + number of omissions + 10 × (number of commissions + number of d commissions).
(e) Rey Word Recognition Test = total recognized (without subtracting number of false positives).
(f) WAIS-III Reliable Digit Span = total number of digits recalled forwards and backwards on both trials.
(g) Rey-Osterreith Complex Figure Effort Equation = copy + [(recognition true positives − atypical false positives) × 3].
(h) RAVLT effort equation = recognition − false positives + number of words recognized from the first five words on the list.
(i) Finger Tapping Test = average of dominant hand across three trials.

Table 6 displays the results of a logistic regression analysis including all four
embedded SVTs as continuous predictors. Using this model, sensitivity was 85.7%,
specificity was 95.6%, with an overall hit rate of 91.8%, reflecting comparable levels
of predictive accuracy to the pairwise failure model. Using all four predictors as
opposed to using fewer (three or two) led to greater overall levels of predictive
accuracy.
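The analysis summarized in Table 6 can be reproduced in outline as follows. This is a minimal sketch with synthetic data and hypothetical column names (rds, ro, ravlt, ftt, non_credible), not the authors' code, and it shows only the full four-predictor fit (Model 1) rather than the backward stepwise elimination.

```python
# Illustrative sketch (synthetic data, assumed layout; not the study's code):
# logistic regression with the four embedded SVTs as continuous predictors.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# One row per patient: four embedded SVT scores plus a binary outcome.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "rds": rng.normal(8, 2, n),
    "ro": rng.normal(50, 8, n),
    "ravlt": rng.normal(12, 5, n),
    "ftt": rng.normal(40, 10, n),
})
# Synthetic labels for demonstration only: lower scores -> more likely non-credible.
lin = 10 - 0.3 * df["rds"] - 0.08 * df["ro"] - 0.2 * df["ravlt"] - 0.05 * df["ftt"]
df["non_credible"] = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)

predictors = ["rds", "ro", "ravlt", "ftt"]
X = sm.add_constant(df[predictors])            # add intercept term
model = sm.Logit(df["non_credible"], X).fit()  # maximum likelihood fit
print(model.summary())                         # per-predictor Wald tests, as in Table 6
print(np.exp(model.params))                    # odds ratios (< 1 when lower scores
                                               # indicate poorer effort)

# Classification accuracy at the conventional .50 probability threshold.
pred = (model.predict(X) >= 0.5).astype(int)
truth = df["non_credible"]
print("sensitivity:", ((pred == 1) & (truth == 1)).sum() / (truth == 1).sum())
print("specificity:", ((pred == 0) & (truth == 0)).sum() / (truth == 0).sum())
```

Backward stepwise elimination (Models 2 and 3 in Table 6) amounts to refitting after dropping the least significant predictor at each step.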
The proportion of both groups failing zero, one, two, three, or four of the embedded indicators is presented in Table 7. As can be seen in the table, it was not unusual for individuals in the credible group to fail one of the embedded SVTs (in fact, 41% of the credible group did), and only one of the credible subjects failed more than two (see case #4 below).

Table 3 Sample characteristics and performance on predictors by group (i.e., credible vs non-credible)

Characteristic            Credible M (SD)        Non-credible M (SD)    t/χ²      df    p

Age                       45.0 (13.4), n = 66    44.6 (11.8), n = 37    0.154     101   .878
Education                 12.4 (3.4), n = 66     12.8 (2.6), n = 37     0.696     101   .448
WAIS-III Full Scale IQ    91.8 (15.7), n = 65    72.1 (12.5), n = 11    3.942     74    .000
Male                      44%, n = 66            59%, n = 37            2.285     1     .096
Ethnicity (n = 66; 37)                                                  12.773    7     .078
  Caucasian               42%                    56%
  African American        20%                    32%
  Hispanic                20%                    3%
  Asian                   7%                     3%
  Middle Easterner        5%                     3%
  Native American         1%                     0%
  Other                   6%                     3%
Predictor variables
  RDS                     8.6 (2.4), n = 66      6.4 (1.8), n = 37      4.718     101   .000
  R-O                     55.0 (6.2), n = 48     40.0 (9.1), n = 32     8.783     78    .000
  RAVLT                   15.2 (3.6), n = 65     7.5 (5.6), n = 36      8.334     99    .000
  FTT                     42.5 (11.1), n = 64    33.9 (10.8), n = 34    3.668     96    .000

RDS = WAIS-III Reliable Digit Span; R-O = Rey-Osterreith Complex Figure Effort Equation; RAVLT = Rey Auditory Verbal Learning Test effort equation; FTT = Finger Tapping Test.

Table 4 Correlations among predictor variables

         RDS               R-O              RAVLT            FTT

RDS      1.00
R-O      .58** (n = 80)    1.00
RAVLT    .45** (n = 101)   .63** (n = 78)   1.00
FTT      .38** (n = 98)    .29* (n = 75)    .23* (n = 96)    1.00

**p < .01; *p < .05. RDS = WAIS-III Reliable Digit Span; R-O = Rey-Osterreith Complex Figure Effort Equation; RAVLT = Rey Auditory Verbal Learning Test effort equation; FTT = Finger Tapping Test.

Table 5 Classification rates for individual indicators, and those requiring failure of one, two, or three symptom validity tests for the detection of non-credible performance

Symptom validity indicator    Hit rate   Sensitivity   Specificity   PPP/NPP (BR = 15%)   PPP/NPP (BR = 40%)   PPP/NPP (BR = 50%)

RDS                           70.9       43.2          86.4
R-O                           87.5       81.3          91.7
RAVLT                         84.2       86.1          83.1
FTT                           69.4       47.1          81.3
Any one test                  68.0       94.6          53.0          26.8 / 97.9          57.4 / 94.3          67.1 / 90.0
Any two tests combined        90.3       83.8          93.9          72.2 / 96.5          89.5 / 89.2          93.6 / 85.7
Any three tests combined      81.6       51.4          98.5          88.9 / 91.5          95.5 / 75.3          96.4 / 66.7

BR = base rate (percentage of individuals in the sample who were malingering); Hit rate = percentage of both groups correctly classified; Sensitivity = percentage of the malingering group falling below the cutoff; Specificity = percentage of the credible group falling above the cutoff; PPP = positive predictive power (percentage of those with a positive test sign who were malingering); NPP = negative predictive power (percentage of those with a negative test sign who were not malingering); RDS = WAIS-III Reliable Digit Span; R-O = Rey-Osterreith Complex Figure Effort Equation; RAVLT = Rey Auditory Verbal Learning Test effort equation; FTT = Finger Tapping Test.

Table 6 Results of backward step-wise logistic regression analysis

Model 1: overall accuracy = 91.8, sensitivity = 85.7, specificity = 95.6; χ²(4) = 70.321, p = .000
  RDS: Wald = 1.889, df = 1, p = .169, odds ratio = .542
  R-O: Wald = 6.922, df = 1, p = .009, odds ratio = .772
  RAVLT: Wald = 9.958, df = 1, p = .002, odds ratio = .613
  FTT: Wald = 2.784, df = 1, p = .095, odds ratio = .909
  Constant: Wald = 9.956, df = 1, p = .002

Model 2: overall accuracy = 90.4, sensitivity = 85.7, specificity = 93.3; χ²(3) = 68.118, p = .000
  R-O: Wald = 9.192, df = 1, p = .002, odds ratio = .757
  RAVLT: Wald = 10.415, df = 1, p = .001, odds ratio = .645
  FTT: Wald = 3.078, df = 1, p = .079, odds ratio = .905
  Constant: Wald = 10.999, df = 1, p = .001

Model 3: overall accuracy = 88.5, sensitivity = 80.6, specificity = 93.6; χ²(2) = 64.392, p = .000
  R-O: Wald = 11.051, df = 1, p = .001, odds ratio = .803
  RAVLT: Wald = 11.005, df = 1, p = .001, odds ratio = .729
  Constant: Wald = 15.415, df = 1, p = .000

Odds ratio = the increase in odds of being a malingerer with any unit decrease on the symptom validity indicator; RDS = WAIS-III Reliable Digit Span; R-O = Rey-Osterreith Complex Figure Effort Equation; RAVLT = Rey Auditory Verbal Learning Test effort equation; FTT = Finger Tapping Test.

Table 7 Proportion of embedded symptom validity test failures by group

Number of embedded SVT failures    Credible (n = 66)    Non-credible (n = 37)

0                                  53%                  5%
1                                  41%                  11%
2                                  5%                   32%
3                                  1%                   41%
4                                  0%                   11%

SVT = symptom validity test.

Analysis of false positive errors obtained when using the pairwise model revealed that the four misclassifications included: (1) a 30-year-old Hispanic female with 8 years of education and borderline intellectual functioning (WAIS-III FSIQ = 77) who reported speaking only 10% English when she was growing up, and who was diagnosed with a learning disability, bipolar disorder, psychosis, and substance abuse; (2) a 17-year-old African American male with 10 years of education and borderline intellectual functioning (FSIQ = 72) who had suffered a severe head injury with post-traumatic seizures, and had a history of alcohol dependence and learning disability; (3) a 53-year-old Hispanic male with one year of education in Mexico and borderline intelligence (FSIQ = 79), for whom English was his second language, with multiple Axis I diagnoses including schizophrenia NOS, panic disorder, and major depressive disorder; and finally (4) a 50-year-old Caucasian female with 14 years of education (FSIQ not obtained) diagnosed with bipolar disorder and a rule out for somatoform disorder. Notably, the first three subjects were already on disability, had no other identifiable external incentive to feign, and all failed the same two embedded indicators, Reliable Digit Span and Finger Tapping. The fourth false positive was not on disability, failed three embedded indicators, but passed three freestanding indicators.

DISCUSSION
The current study replicates the findings of Larrabee (2003a) in terms of
demonstrating good sensitivity and specificity for discriminating between credible
and non-credible patients on the basis of any pairwise failure combination of
embedded SVTs. While Larrabee demonstrated this with use of Benton Visual Form
Discrimination, WCST Failure to Maintain Set, the Lees-Haley Fake Bad Scale,
Reliable Digit Span, and Finger Tapping (total for both hands) for a combined hit
rate of 91.6% (n = 95), the present investigation found similar results with use of Reliable Digit Span, the Rey-Osterreith Effort Equation, the Rey Auditory Verbal Learning Test Effort Equation, and Finger Tapping (dominant hand average) for a combined hit rate of 90.3% (n = 103). Further, as in Larrabee's sample, the pairwise
failure model produced hit rates comparable to those obtained when the indicators
were entered as continuous predictors in logistic regression.
Of interest, current findings are highly similar to those of Larrabee (2003a)
despite some differences in methodology between the two studies. In addition to the
fact that at least half of the Larrabee (2003a) embedded indicators differed from those
in the current study, differing cut-scores were also employed for the tests in common
between the two investigations (i.e., Finger Tapping and Reliable Digit Span).
Further, the non-credible group in Larrabee's (2003a) initial study was assigned using different criteria from the current study (i.e., his sample included "definite" malingerers who displayed worse-than-chance performance on a forced-choice measure), although his cross-validation sample essentially matched that of the current sample (i.e., "probable" malingerers who failed two or more dedicated SVTs and had incentive to feign). In addition, Larrabee's (2003a) first sample
included mostly individuals with mild, moderate, or severe head injury, although
his cross-validation sample was a mixed neurologic and psychiatric group more
comparable to the mixed clinical sample of multiple varying diagnoses used in the
current study.
Examining the results of logistic regression suggests that using all four
indicators yields the highest levels of predictive accuracy, although only three
indicators appeared to make significant (or almost significant) independent
contributions to the overall predictive accuracy (i.e., the R-O Effort Equation,
RAVLT Effort Equation, and Finger Tapping), demonstrating their incremental
value and lack of redundancy. This is likely because they each reflect a different
neuropsychological symptom domain (visual constructional skill/memory, verbal
memory, and motor speed/dexterity). In contrast, Reliable Digit Span did not
provide an independent contribution to group assignment and appeared redundant
with the "memory" SVTs (Rey-Osterreith and Rey Auditory Verbal Learning Effort Equation), perhaps because individuals who feign perceive Digit Span as a memory
task (Babikian et al., 2006).
It is important to note that failure on any one embedded index was fairly
common in this real-world clinical sample (i.e., 41% of our credible group failed at
least one measure). These findings are in contrast to those of Berry and colleagues
(Inman & Berry, 2002; Orey et al., 2000), who documented 100% specificity across
dedicated SVTs in their sample of college students with a history of mild head
injury. Of note, several meta-analyses show that mild traumatic brain-injury
subjects perform comparably to controls (Belanger, Curtiss, & Demery, 2005;
Belanger & Vanderploeg, 2005; Carroll, Cassidy, Holm, Kraus, & Coronado, 2004;
Frencham, Fox, & Maybery, 2005; Schretlen & Shapiro, 2003; Vanderploeg,
Curtiss, & Belanger, 2005). Thus, taken together, the data suggest that while
individuals with normal cognition may demonstrate 100% specificity on SVTs, it is
unlikely that credible patients with actual neurologic and/or psychiatric conditions
will perform at this level, and thus interpretation of SVTs needs to be adjusted
accordingly.
Table 5 also provides clinically relevant data on diagnostic classification
rates—i.e., positive (PPP) and negative (NPP) predictive power—at three base rates
of malingering (15%, 40%, and 50%). Since NPP is usually high and PPP is lower
in low base-rate environments (15%), a more conservative threshold for predicting
malingering is usually appropriate (i.e., requiring failure on three or more
malingering indicators; PPP = 88.9%; NPP = 91.5%). However, in high base-rate
environments (40% or 50%), requiring failure on three or more malingering
indicators results in high PPP (96.4%) but unacceptably low NPP (66.7%). Thus,
more liberal criteria for predicting malingering are usually necessary in such
environments (i.e., failure on two or more malingering indicators). Failure on any
one indicator, as mentioned above, results in an unacceptable rate of false positive
error, as well as an unacceptable rate of PPP (67.1%). It should be noted here, however, that while it is generally acceptable to attempt to keep sensitivity as high as possible while maintaining specificity in the 90s, requiring three (rather than two) SVT failures leads to higher PPP, as shown in Table 5. With respect to research design
methodology, Greve, Ord, Curtis, Bianchini, and Brennan (2008) recently selected cut scores that led to high PPP (≥ .80) to define their probable malingering groups and cut scores that led to high NPP (≥ .80) to define their clinical comparison groups, a
design issue that future research in this area may want to consider. With respect to
clinical practice, Larrabee and colleagues (2007) have suggested revision to the Slick
et al. (1999) criteria that include two or more SVT failures as representing criteria
for the detection of probable malingering, whereas three or more failures represent
criteria for the identification of definite malingering, which is also consistent with
the recent recommendation of Boone (2007). In fact, three or more SVT failures are typically associated with very high PPP (100% in Larrabee, 2003a; nearly 100% in the present sample).


Analysis of the false positive errors in the present sample using the pairwise
failure model suggests that individuals with evidence of premorbid neurological/
developmental compromise (e.g., learning disability) in combination with subsequent CNS insult (e.g., severe head injury) and/or multiple comorbid psychiatric diagnoses are at particular risk for failure on two or more SVTs despite likely genuine performance. In addition, as previously reported (Salazar, Lu,
Wen, & Boone, 2007), individuals for whom English is their second language and/or
who are members of ethnic minorities may under-perform on some SVTs. Of note,
three of the four false positive identifications were due to failure on the Reliable
Digit Span and Finger Tapping indicators. Research demonstrates that Hispanic
people do not recall as many digits forward as their non-Hispanic counterparts and
that Digit Span may not be measuring the same attentional construct in Spanish
speakers as is found in English speakers (see Kaufman, 1990, for a review; also see
Ponton, Gonzalez, Hernandez, Herrera, & Higareda, 2000). In fact, preliminary
data from our clinic indicates that SVTs derived from Digit Span variables require
adjustments in Hispanic and ESL samples (Salazar et al., 2007).
Only one credible patient failed more than two SVTs, which demonstrates the importance of administering three or more SVTs in the context of
neuropsychological assessment. However, it should be emphasized that subjects
who met psychometric criteria for dementia or mental retardation were excluded
from the credible group, and thus the current findings do not apply to these
populations. Data from our lab (Dean, Victor, Boone, Curiel, & Zeller, 2007a; Dean
et al., 2007b; Victor & Boone, 2007) in fact show high false positive rates on SVTs in
low-IQ and dementia samples (e.g., individuals with IQ 60–69 fail on average 40%
of administered SVTs; Dean et al., 2008; specificities across a range of SVTs fall in
the 30–70% range for individuals diagnosed with dementia; Dean, Victor, Boone,
Philpott, & Hess, in press) and indicate that SVT interpretation algorithms require
further modifications in these populations. This is consistent with the findings of
Meyers and Volbrecht (2003) who noted that in their study institutionalized
individuals requiring 24-hour care failed two or more SVTs, suggesting the need for
alternative strategies of assessment or an adjustment in cutoffs with this population.

Taken together, the results from Larrabee (2003a) and the current study show that successful identification of symptom invalidity or response bias in "real-world" clinical samples, through pairwise SVT failure, is a robust finding replicable across a wide range of embedded symptom validity indices, and is therefore appropriate for use in clinical practice. Future research examining replication of this model with freestanding SVTs in "real-world" samples is needed, particularly given that many dedicated SVTs incorporate the same test format (i.e., forced choice), raising questions regarding their redundancy and incremental validity.

ACKNOWLEDGEMENTS
Portions of this paper were presented at the 34th Annual Meeting of the
International Neuropsychological Society, Boston, Massachusetts, February 2006.

REFERENCES

Arnold, G., Boone, K., Dean, A., Wen, J., Nitch, S., Lu, P., et al. (2005). Sensitivity and
specificity of finger tapping test scores for the detection of suspect effort. The Clinical
Neuropsychologist, 19, 105–120.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various
digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20,
145–159.
Belanger, H. G., Curtiss, G., & Demery, J. A. (2005). Factors moderating neuropsychological
outcomes following mild traumatic brain injury: A meta-analysis. Journal of the
International Neuropsychological Society, 11, 215–227.
Belanger, H. G., & Vanderploeg, R. D. (2005). The neuropsychological impact of sports-
related concussion: A meta-analysis. Journal of the International Neuropsychological Society,
11, 345–357.
Binder, L. M., & Willis, S. C. (1991). Assessment of motivation after financially compensable
minor head trauma. Psychological Assessment, 3, 175–181.
Boone, K. B. (2007). A reconsideration of the Slick et al. (1999) criteria for
malingered neurocognitive dysfunction. In K. B. Boone (Ed.), Assessment of feigned
cognitive impairment: A neuropsychological perspective. New York: Guilford
Publications, Inc.
Boone, K. B., Lu, P., Back, C., King, C., Lee, A., Philpott, L., et al. (2002a). Sensitivity and
specificity of the Rey Dot Counting Test in patients with suspect effort and various
clinical samples. Archives of Clinical Neuropsychology, 17, 1–19.
Boone, K. B., Lu, P., Sherman, D., Palmer, B., Back, C., Shamieh, E., et al. (2000).
Validation of a new technique to detect malingering of cognitive symptoms: The b Test.
Archives of Clinical Neuropsychology, 15, 227–241.
Boone, K. B., Lu, P., & Wen, J. (2005). Comparison of various RAVLT scores in the
detection of non-credible memory performance. Archives of Clinical Neuropsychology,
20, 301–319.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002b). The Rey 15-item
Recognition Trial: A technique to enhance sensitivity of the Rey 15-Item Memorization
Test. Journal of Clinical and Experimental Neuropsychology, 24, 561–573.
Carroll, L. J., Cassidy, J. D., Holm, L., Kraus, J., & Coronado, V. G. (2004). Methodological
issues and research recommendation for mild traumatic brain injury: The WHO
Collaborating Centre Task Force on Mild Traumatic Brain Injury. Journal of
Rehabilitation Medicine, 43(Suppl.), 113–125.
Dean, A., Victor, T., Boone, K., & Arnold, G. (2008). The relationship of IQ to effort test
performance. The Clinical Neuropsychologist, 22, 705–722.
Dean, A. C., Victor, T. L., Boone, K. B., Curiel, A., & Zeller, M. A. (2007a). Preliminary
data on the use of Finger Tapping as an effort test in demented populations. Poster
presentation at the 26th Annual National Academy of Neuropsychology Conference,
Scottsdale, Arizona.
Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., & Hess, R. A. (in press). Dementia
and effort test performance. The Clinical Neuropsychologist.
Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., Hess, R., & Razani, J. (2007b).
Effort test performance in patients with dementia. Poster presentation at the 26th Annual
National Academy of Neuropsychology Conference, Scottsdale, Arizona.
Frencham, K. A., Fox, A. M., & Maybery, M. T. (2005). Neuropsychological studies of mild
traumatic brain injury: A meta-analytic review of research since 1995. Journal of Clinical
and Experimental Neuropsychology, 27, 334–351.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia
measures with a large clinical sample. Psychological Assessment, 6, 218–224.
Greiffenstein, M. F., Gola, T., & Baker, W. J. (1995). MMPI-2 validity scales versus domain
specific measures in detection of factitious traumatic brain injury. The Clinical
Neuropsychologist, 9, 230–240.
Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting
malingering in traumatic brain injury and chronic pain: A comparison of three forced-
choice symptom validity tests. The Clinical Neuropsychologist, 22, 769–949.
Haynes, S. N., & Lench, H. C. (2003). Incremental validity of new clinical assessment
measures. Psychological Assessment, 15, 456–466.
Hunsley, J., & Meyer, G. J. (2003). The incremental validity of psychological testing and
assessment: Conceptual, methodological and statistical issues. Psychological Assessment,
15, 446–455.
Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering:
A comparison of nine neuropsychological tests, four tests of malingering, and behavioral
observations. Archives of Clinical Neuropsychology, 17, 1–23.
Iverson, G. L., & Franzen, M. D. (1994). The Recognition Memory Test, Digit Span,
and Knox Cube Test as markers of malingered memory impairment. Assessment, 1,
323–334.
Iverson, G. L., & Franzen, M. D. (1996). Using multiple objective memory procedures to
detect simulated malingering. Journal of Clinical and Experimental Neuropsychology, 18,
38–51.
Iverson, G. L., & Tulsky, D. S. (2003). Detecting malingering on the WAIS-III: Unusual Digit Span performance patterns in the normal population and in clinical groups. Archives of Clinical Neuropsychology, 18, 1–9.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn &
Bacon.
Larrabee, G. J. (2003a). Detection of malingering using atypical performance patterns of
standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G. J. (2003b). Detection of symptom exaggeration with the MMPI-2 in
litigants with malingered neurocognitive dysfunction. The Clinical Neuropsychologist,
17, 54–68.
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of
malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22,
666–679.
Larrabee, G. J., Greiffenstein, M. F., Greve, K. W., & Bianchini, K. J. (2007). Refining
diagnostic criteria for malingering. In G. J. Larrabee (Ed.), Assessment of malingered
neuropsychological deficits (pp. 334–372). New York: Oxford University Press.
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey
Osterreith Complex Figure Test and the Meyers and Meyers Recognition Trial in the
detection of suspect effort. The Clinical Neuropsychologist, 17, 426–440.
Martin, R. C., Hayes, J. S., & Gouvier, W. D. (1996). Differential vulnerability
between postconcussion self-report and objective malingering tests in identifying
simulated mild head injury. Journal of Clinical and Experimental Neuropsychology,
18, 265–275.
Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection
methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Millis, S. R. (2006). Introduction to logistic regression. Presentation at the XXth
annual meeting of the American Academy of Clinical Neuropsychology (AACN),
Philadelphia, PA.
Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). Relationships
between eight measures of suspect effort. The Clinical Neuropsychologist, 17, 263–272.
Nitch, S., Boone, K. B., Wen, J., Arnold, G., & Alfano, K. (2006). The Utility of the Rey
Word Recognition Test in the detection of suspect effort. The Clinical Neuropsychologist,
20, 873–887.
Orey, S., Cragar, D. E., & Berry, D. T. R. (2000). The effects of two motivational
manipulations on the neuropsychological performance of mildly head-injured college
students. Archives of Clinical Neuropsychology, 15, 335–348.
Ponton, M. O., Gonzalez, J. J., Hernandez, I., Herrera, L., & Higareda, I. (2000). Factor
analysis of the Neuropsychological Screening Battery for Hispanics (NeSBHIS). Applied
Neuropsychology, 7, 32–39.
Rogers, R. (1997a). Introduction. In R. Rogers (Ed.), Clinical assessment of malingering and
deception (2nd ed., pp. 1–19). New York: Guilford Press.
Rogers, R. (1997b). Researching dissimulation. In R. Rogers (Ed.), Clinical assessment of
malingering and deception, (2nd ed., pp. 398–426). New York: Guilford Press.
Rosenfeld, B., Sands, S. A., & Van Gorp, W. G. (2000). Have we forgotten the base rate
problem?: Methodological issues in the detection of distortion. Archives of Clinical
Neuropsychology, 15, 349–359.
Salazar, X. F., Lu, P. H., Wen, J., & Boone, K. B. (2007). The use of effort tests in ethnic
minorities and in non-English speaking and English as a second language populations.
In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological
perspective (pp. 405–427). New York: Guilford Press.
Schretlen, D. J., & Shapiro, A. M. (2003). A quantitative review of the effects of
traumatic brain injury on cognitive functioning. International Review of Psychiatry,
15, 341–349.
Sherman, D. S., Brauer-Boone, K., Lu, P., & Razani, J. (2002). Re-examination of the Rey
Auditory Verbal Learning Test/Rey Complex Figure discriminant function to detect
suspect effort. The Clinical Neuropsychologist, 16, 242–250.
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered
neurocognitive dysfunction: Proposed standards for clinical practice and research. The
Clinical Neuropsychologist, 13, 545–561.
Suhr, J. A., & Gunstad, J. (2000). The effects of coaching on the sensitivity and specificity of
malingering measures. Archives of Clinical Neuropsychology, 15, 415–424.
Taylor, J. S. (1999). The legal environment pertaining to clinical neuropsychology.
In J. Sweet (Ed.), Forensic neuropsychology: Fundamentals and practice (pp. 421–442).
Exton, PA: Swets & Zeitlinger.
Tombaugh, T. N. (1997). The test of memory malingering (TOMM): Normative data from
cognitively intact and cognitively impaired individuals. Psychological Assessment, 9,
260–268.
Vanderploeg, R. D., Curtiss, G., & Belanger, H. G. (2005). Long-term neuropsychological
outcomes following mild traumatic brain injury. Journal of the International
Neuropsychological Society, 11, 228–336.
Vickery, C. D., Berry, D. T. R., Dearth, C. S., Vagnini, V. L., Baser, R. E., Cragar, D. E.,
et al. (2004). Head injury and the ability to feign neuropsychological deficits. Archives of
Clinical Neuropsychology, 19, 37–48.
Victor, T., & Boone, K. B. (2007). Assessing effort in a mentally retarded population.
In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological
perspective (pp. 310–345). New York: Guilford Publications, Inc.