Reliable Digit Span: A Systematic Review and Cross-Validation Study

Ryan W. Schroeder1, Philip Twumasi-Ankrah1, Lyle E. Baade1, and Paul S. Marshall2

Assessment 19(1) 21-30
© The Author(s) 2012
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/1073191111428764
http://asm.sagepub.com

Abstract
Reliable Digit Span (RDS) is a heavily researched symptom validity test with a recent literature review yielding more
than 20 studies ranging in dates from 1994 to 2011. Unfortunately, limitations within some of the research minimize
clinical generalizability. This systematic review and cross-validation study was conducted to address these limitations, thus
increasing the measure’s clinical utility. Sensitivity and specificity rates were calculated for the ≤6 and ≤7 cutoffs when data
were globally combined and divided by clinical groups. The cross-validation of specific diagnostic groups was consistent
with the data reported in the literature. Overall, caution should be used when utilizing the ≤7 cutoff in all clinical groups and
when utilizing the ≤6 cutoff in the following groups: cerebrovascular accident, severe memory disorders, mental retardation,
borderline intellectual functioning, and English as a second language. Additional limitations and cautions are provided.

Keywords
Reliable Digit Span (RDS), review, analysis, symptom validity, malingering

1University of Kansas School of Medicine-Wichita, Wichita, KS, USA
2Hennepin County Medical Center, Minneapolis, MN, USA

Corresponding Author:
Ryan W. Schroeder, University of Kansas School of Medicine-Wichita, 7829 E. Rockhill, Suite 105, Wichita, KS 67206, USA
Email: ryan.w.schroeder.psyd@hotmail.com

During the past 10 to 15 years, research on the topic of neurocognitive symptom validity testing (SVT) has vastly expanded (Larrabee, 2003; Whitney, Davis, Shepard, Bertram, & Adams, 2009). This expansion is likely due, in part, to neuropsychologists' growing awareness of three findings. First, neuropsychologists are unable to reliably discriminate between individuals who are providing suspect effort and individuals who are providing adequate effort based solely on general neuropsychological test data (Heaton, Smith, Lehman, & Vogt, 1978), self-reported symptoms (Lees-Haley & Brown, 1993), or clinical judgment (Millis & Putnam, 1996). Second, patient effort on testing accounts for a substantial amount of variance in neuropsychological test performance, with some studies indicating that it might account for as much as 50% of the variance in forensic settings (Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005; Green, Rohling, Lees-Haley, & Allen, 2001). Third, base rates of malingering during neuropsychological evaluations are substantially higher than previously believed, even in nonforensic settings (Boone, 2007).

Initially, SVTs were used primarily in medicolegal settings, as base rates of malingering are particularly high in these settings (Larrabee, 2003). Additionally, in these settings, it is crucial that the validity of the neuropsychological test results be empirically determined, as the test results often help determine severity of cognitive impairments, which could have a large impact on whether patients receive substantial external incentives. More recently, clinicians and professional organizations have urged the use of SVTs in nonforensically based evaluations as well (American Academy of Clinical Neuropsychology, 2007; Bush et al., 2005). In fact, the American Academy of Clinical Neuropsychology recently sponsored a consensus conference on neuropsychological SVT and the expert neuropsychologists opined that, "The assessment of effort and genuine reporting of symptoms is important in all evaluations" to help ensure that the obtained neuropsychological test results are a valid representation of the patient's current cognitive abilities (Heilbronner et al., 2009, p. 1121).

With the push to include SVTs in all neuropsychological evaluations, clinicians must be aware of the major findings and trends in SVT research. Research published by Larrabee (2008) and by Victor, Boone, Serpa, Buehler, and Ziegler (2009) has provided a model for SVT usage that has resulted in impressive hit rates. Both studies employed multiple SVTs throughout the evaluations and both studies found that failure of any two SVTs resulted in a sensitivity rate above 80% and a specificity rate above 90%. It is important to note that clinicians should choose SVTs that have sensitivity rates as high as possible while ensuring that specificity rates be maintained at or above 90% to retain the high hit rates published in the literature (Boone, 2007).


Of the multiple SVTs that clinicians might choose from, one of the oldest and most heavily researched is Reliable Digit Span (RDS; Boone, 2007). Greiffenstein, Baker, and Gola (1994) originally derived RDS from the Digit Span subtest of the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981). They calculated the measure by "summing the longest string of digits repeated without error over two trials under both forward and backward conditions" (Greiffenstein et al., 1994, pp. 219-220).
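To make this scoring rule concrete, here is a minimal sketch (not code from the original study or from any published scoring program); the data layout, in which each condition maps string length to a pair of booleans for the two trials at that length, and the function names are illustrative assumptions.

```python
def longest_span_passed(trials):
    """Longest string length at which both trials were repeated without error.

    `trials` maps string length (e.g., 3-9 digits) to a (trial_1_correct,
    trial_2_correct) pair of booleans; returns 0 if no length has both correct.
    """
    return max((length for length, (t1, t2) in trials.items() if t1 and t2), default=0)


def reliable_digit_span(forward_trials, backward_trials):
    """RDS as quoted above: the longest forward span with both trials correct
    plus the longest backward span with both trials correct."""
    return longest_span_passed(forward_trials) + longest_span_passed(backward_trials)


# Hypothetical protocol: both 5-digit forward trials and both 4-digit backward
# trials were repeated correctly, while longer strings were failed.
forward = {3: (True, True), 4: (True, True), 5: (True, True), 6: (True, False)}
backward = {2: (True, True), 3: (True, True), 4: (True, True), 5: (False, False)}
print(reliable_digit_span(forward, backward))  # 5 + 4 = 9
```

With these hypothetical trial results, the longest forward span passed is 5 and the longest backward span passed is 4, so RDS equals 9, which is above both cutoffs examined in this review.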
A recent review of the literature yielded more than 20 studies on RDS, with one study being a meta-analytic review of the measure (Jasinski, Berry, Shandera, & Clark, 2011). The meta-analysis indicates that there are strong effect sizes across Wechsler Adult Intelligence Scale (WAIS) test versions (e.g., WAIS-R vs. WAIS-III). Additionally, the meta-analysis indicates that RDS effectively discriminates between individuals providing credible effort and individuals providing suspect effort (average weighted effect size of 1.34). This information is crucial, as it indicates that the measure is valid and effective despite updated WAIS test versions. The meta-analysis provides useful information and it has many strengths; however, it is not without limitations.

The first limitation of the meta-analysis is that it does not report sensitivity and specificity rates for multiple cutoff scores; instead, it shows sensitivity and specificity rates for a cutoff score of 7.1. Second, the meta-analysis does not report sensitivity and specificity rates for different clinical groups; rather, it reports global sensitivity and specificity rates based on nine published studies. Third, the meta-analysis does not report information on many clinical strengths and weaknesses of the measure (e.g., its use with patients who have low IQs or patients who speak English as a second language), as it would be challenging to quantify these factors in a meta-analysis given the available published data. Consequently, we must look to other sources to provide this clinically applicable information. It was with these limitations in mind that this systematic review and cross-validation study was structured. This article will report (a) data on individual studies, (b) sensitivity and specificity rates for various cutoff scores with multiple clinical groups, and (c) limitations that clinicians are likely to face when using RDS. This systematic review will add to the body of published knowledge by synthesizing and reporting practical data on the clinical utility of RDS.

Method

Study Selection and Inclusion Criteria

Different strategies may be used to select studies for inclusion in a review article (Demakis, 2006). Some researchers adopt a stance whereby all relevant studies are included whether they are published or not. This approach is generally used to avoid the documented phenomenon termed publication bias, in which research with significant findings is more likely to be published than research with nonsignificant findings (Begg, 1994). However, it is noted that unpublished data may be of lower quality than published data, and some efforts are needed to ensure that only high-quality data are used (Martin, Pérez, Sacristán, & Álvarez, 2005). To remedy this conundrum, this systematic review uses refereed published professional articles and only relevant data from additional sources that are believed to be of high quality (i.e., scholarly dissertations and data published in books written or edited by leaders in the field of neuropsychological symptom validity testing).

A variety of techniques were used to gather data for this study. First, online databases were searched using EBSCOhost search engines. These engines searched PsycARTICLES, PsychEXTRA, PsychINFO, Psychology and Behavioral Sciences Collection, Medline, and other electronic databases. The keywords entered in the search engines included "Reliable Digit Span," "Reliable Digits," "RDS," "Digit Span and malingering," "Digit Span and effort," "Digit Span and response bias," "WAIS and malingering," "WAIS and effort," and "WAIS and response bias." After all the articles were collected, the reference sections of the articles were searched for additional studies that may have been missed in the original search.

Each study identified as potentially containing data relevant to this systematic review was examined. When a study failed to provide enough data for inclusion in this article, the authors attempted to contact the primary author of that study. Primary authors were also contacted for studies that included data for one RDS cutoff but not other RDS cutoffs. The authors were asked for additional data related to their studies, and some of the authors were also asked if they had access to other RDS data, such as scholarly dissertations, that they would be willing to share for this systematic review.

After all potential RDS studies were obtained (n = 28), the inclusion criteria were narrowed to determine which studies would be included in this systematic review. The first criterion was that the studies had to use RDS cutoffs of ≤7 and/or ≤6. This criterion was established because nearly all studies used one or both of these cutoff scores, whereas other cutoff scores were rarely published. Additionally, the available literature suggests that these two cutoff scores resulted in sensitivity and/or specificity rates comparable with other embedded SVTs. The second criterion was that the studies had to provide data allowing for the calculation of sensitivity and/or specificity rates for the RDS cutoffs. The necessary data includes (a) the total number of subjects and the sensitivity and/or specificity rates or (b) the total number of subjects, the number of subjects correctly identified as providing adequate effort via RDS, and/or the number of subjects correctly identified as providing suspect effort via RDS. The third criterion for inclusion was that the studies had to classify participants using (a) participants who had no incentive to feign deficits (for specificity rates only), (b) simulated malingerers (for sensitivity rates only), or (c) neurocognitive validity criteria based on the method proposed by Slick, Sherman, and Iverson (1999; for sensitivity and/or specificity rates). In general, the Slick et al. criteria require the presence of an external incentive and failure of an SVT at worse than chance rates or failure of two or more SVTs not at worse than chance rates (see Slick et al., 1999 for more detailed information). This third criterion was to ensure that the data included in this systematic review met current standards in identifying suspect effort. The fourth criterion was that the studies could not reuse previously reported data. This final criterion was included to eliminate the possibility of redundant data in the systematic review. When redundant data were identified in studies, the redundant data were excluded, whereas the nonredundant data were included. When this strategy was not possible (e.g., studies that mixed both previously published and newly published data into one group), the most complete data set was included in this systematic review.
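Stated as an explicit rule, the Slick et al.-based inclusion logic summarized in the third criterion can be sketched as follows. This is a deliberately simplified reading of the summary given in the paragraph above, not the full Slick et al. (1999) criteria (which also weigh self-reported symptoms, exclusionary factors, and clinical judgment); the function and parameter names are illustrative assumptions.

```python
def meets_suspect_effort_criterion(has_external_incentive,
                                   svt_below_chance_failures,
                                   svt_other_failures):
    """Simplified reading of the third inclusion criterion: an external incentive
    plus either (a) at least one SVT failed at worse-than-chance levels or
    (b) failure of two or more SVTs at levels not worse than chance."""
    if not has_external_incentive:
        return False
    return svt_below_chance_failures >= 1 or svt_other_failures >= 2


# Example: a compensation-seeking patient fails two SVTs, neither below chance.
print(meets_suspect_effort_criterion(True,
                                     svt_below_chance_failures=0,
                                     svt_other_failures=2))  # True
```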


The literature that met the aforementioned inclusion criteria included 21 refereed professional articles, 1 book, and 1 scholarly dissertation. Five refereed professional articles were excluded because they did not meet the inclusion criteria. A listing of these articles can be found in the appendix of this analysis.

Data Analysis From the Literature Review

After the appropriate literature was identified, data for calculation of sensitivity and specificity rates were pooled for all RDS studies. The pooled data was also divided into patient diagnostic groups. Most of the diagnostic groups yielded specificity rates of 90% or higher for at least one of the two frequently cited cutoff scores. The few diagnostic groups that consistently produced specificity rates below 90% for both cutoff scores were identified as "special groups" (i.e., cerebrovascular accident, severe memory disorders, mental retardation, borderline IQ and below, English as a second language, and patients of Hispanic background). These "special groups" were observed to be outliers in the current analysis and previous research has noted that most of these groups perform poorly on Digit Span or similar span-based tests despite the belief that adequate effort was provided (Boone, 2007; Lezak, Howieson, & Loring, 2004; Schroeder & Marshall, 2010). Consequently, these groups were analyzed separately from the remaining data. The remaining data were divided into the following diagnostic classes: nonclinical participants such as controls, volunteers, and simulators; mixed clinical non-traumatic brain injury (non-TBI) patients; and traumatic brain injury (TBI) patients. Finally, the TBI group was further divided into the following groups: (a) postconcussive and mild TBI and (b) moderate to severe TBI. This final division was performed because TBI patients are the most likely patients to be seen in forensic neuropsychological evaluations, and it is important to determine if severity of TBI will impact sensitivity and/or specificity rates on RDS.

The sensitivity and specificity rates were calculated for the pooled data using the empirical Bayesian analysis methods. The nonlinear mixed (NLMIXED) procedure of SAS System for Windows (Version 9.2) was used in the analysis. The Bayesian analysis method was used because it accounts for the inverse association between sensitivity and specificity rates, which can potentially underestimate the true values of the rates when data are pooled in studies such as this (Irwig, Macaskill, Glasziou, & Fahey, 1995). Additionally, the Bayesian method controls for extraneous variables such as between-study correlations and between-study heterogeneity (Macaskill, 2004), which is likely to occur in large pooled data sets. Unfortunately, the Bayesian method has a notable limitation as well. The drawback is that the number of studies included in the calculation is limited because of inclusion criteria that are inherent to the method.

In addition to the Bayesian calculations, weighted mean sensitivity and specificity rates were conducted for the pooled data. The weighted mean sensitivity and specificity rates have the advantage of including all studies into the calculations; however, they have disadvantages in that they do not control for extraneous variables and they are likely to result in an underestimate of true sensitivity and specificity rates. Because of these limitations, 95% confidence intervals were added to the calculations.
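As an illustration of how the weighted mean rates can be reproduced from study-level counts such as those in Tables 1 and 3, a minimal sketch follows. Pooling the counts across studies is algebraically the same as weighting each study's rate by its sample size; a normal-approximation 95% confidence interval is attached, but the article does not state which interval formula was used, so that choice, like the function name, is an assumption.

```python
import math


def weighted_mean_rate(counts):
    """Pooled rate from (correct_classifications, n) pairs, one pair per study.

    Weighting each study's rate by its n is equivalent to pooling the counts:
    sum(correct_i) / sum(n_i). Returns the rate and a normal-approximation
    95% confidence interval, clipped to [0, 1].
    """
    correct = sum(c for c, _ in counts)
    total = sum(n for _, n in counts)
    rate = correct / total
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / total)
    return rate, (max(0.0, rate - half_width), min(1.0, rate + half_width))


# Two specificity rows from Table 1 at the <=6 cutoff:
# Meyers and Volbrecht (1998), 48/49, and Mathias et al. (2002), 29/30.
rate, ci = weighted_mean_rate([(48, 49), (29, 30)])
print(f"pooled specificity = {rate:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```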
Data Analysis From a Personal Database

In addition to the data reported in the literature, a personal database was also examined for this study. The personal database provided additional and comparative information about the "special groups" because minimal information regarding these groups was available in the literature. The data from the personal database is not reported with the data from the literature, as the current results from the personal database have not been previously published. Instead, the personal database was used as a cross-validation data set for the previously identified "special groups."

The personal database included 364 inpatients and 443 outpatients referred for neuropsychological evaluations between 2007 and 2009. All patients were referred to a board-certified neuropsychologist practicing in the Department of Psychiatry at Hennepin County Medical Center. An approval from the institutional review board was obtained from Hennepin County Medical Center to use the data in this study.

All patients in the database completed comprehensive neuropsychological evaluations. During the evaluations, multiple SVTs were interspersed throughout the test batteries. The SVTs included in most test batteries were: RDS (Greiffenstein et al., 1994), Sentence Repetition raw score (Schroeder & Marshall, 2010), CVLT-II Forced Choice Recognition Test errors (Root, Robbins, Chang, & van Gorp, 2006), Logical Memory Rarely Missed Item Index (Killgore & DellaPietra, 2000), Rey Complex Figure recognition scores (Lu, Boone, Cozolino, & Mitchell, 2003), the average dominant hand Finger Tapping score (Arnold et al., 2005), and the Dot Counting Test (Boone, Lu, & Herzberg, 2002). Only patients who were identified as providing a credible effort—not failing Slick et al. (1999) criteria—were included in this study, as the purpose of the database was to cross-validate specificity rates in the "special groups."


Table 1. Review of Specificity Rates for Reliable Digit Span Cutoff Scores of ≤7 and ≤6

Study | Subjects | Ratio of true negatives using ≤7 | Specificity using ≤7 (%) | Ratio of true negatives using ≤6 | Specificity using ≤6 (%)
Greiffenstein et al. (1995) Moderate to severe TBI 39/68 57 — —
Greiffenstein et al. (1995) Persistent postconcussive 46/68 68 — —
Meyers and Volbrecht (1998) Mild TBI nonlitigating 47/49 96 48/49 98
Strauss et al. (2000) College control 15/21 72 — —
Inman and Berry (2002) College controls and TBI controls 48/48 100 48/48 100
Duncan and Ausborn (2002) Nonneurological prison inmate 96/134 72 121/134 90
Mathias et al. (2002) TBI mild, moderate, and severe 28/30 93 29/30 97
Strauss et al. (2002) Controls and TBI mixed severity — — 38/40 95
Larrabee (2003) Moderate to severe TBI 27/29 94 27/27 100
Etherton et al. (2005) College control 20/20 100 20/20 100
Etherton et al. (2005) College cold pain 20/20 100 20/20 100
Etherton et al. (2005) Moderate to severe TBI 63/69 91 68/69 99
Heinly et al. (2005) Mild TBI 64/77 83 72/77 93
Heinly et al. (2005) Moderate to severe TBI 63/69 91 68/69 99
Heinly et al. (2005) Psychiatric 92/128 72 113/128 88
Heinly et al. (2005)a Cerebrovascular accidenta 290/517 56 362/517 70
Heinly et al. (2005)a Severe memory disordersa 109/228 48 155/228 68
Babikian et al. (2006) Mixed clinical patients & controls 68/88 77 82/88 93
Schwarz et al. (2006) Undergrad controls — — 22/22 100
Axelrod et al. (2006) TBI mixed severities 22/29 76 27/29 93
Greve et al. (2007) Toxic exposure 34/38 89 37/38 97
Marshall & Happe (2007)a Mental retardationa 8/71 11 22/71 31
Graue et al. (2007)a Mental retardationa — — 4/26 15
Dean et al. (2008) Mixed clinical IQs 80 to ≥120 45/56 80 56/56 100
Dean et al. (2008)a Mixed clinical IQs 50-79a 22/47 47 31/47 66
Whitney et al. (2009) Mixed clinical 22/26 85 24/26 92
Greve et al. (2010) Chronic pain 150/176 85 174/176 99
Marshall et al. (2010) Adult attention deficit disorder 62/66 94 64/66 97
Schroeder (2010) Adult attention deficit disorder 75/88 85 83/88 94
Schroeder (2010) Outpatient psychological disorders 49/49 100 49/49 100
Schroeder & Marshall (2011) Psychotic disorders 85/103 83 99/103 96
Schroeder & Marshall (2011) Non-psychotic disorders 138/177 78 170/177 96
Salazar et al. (2007)a ESL outpatientsa 13/25 52 21/25 84
Salazar et al. (2007) Caucasian outpatients 63/76 83 76/76 100
Salazar et al. (2007) African American outpatients 24/30 80 28/30 93
Salazar et al. (2007)a Hispanic outpatientsa 12/27 44 22/27 81
Salazar et al. (2007) Asian outpatients 14/16 88 15/16 94
Note. ESL = English as a second language, TBI = traumatic brain injury. Superscript a indicates a special group.

Results

Table 1 shows the specificity rates for each study included in this systematic review. Table 2 provides the pooled specificity rates when using the Bayesian method and the weighted average method.
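For readers who want to reproduce a per-study rate of the kind shown in Tables 1 and 3 from raw scores, a brief sketch follows. The RDS scores are invented for illustration (they are not data from any reviewed study), and failure is defined, as in this review, as scoring at or below the cutoff.

```python
def specificity(credible_scores, cutoff):
    """Proportion of credible-effort patients who pass, i.e., score above the cutoff."""
    return sum(score > cutoff for score in credible_scores) / len(credible_scores)


def sensitivity(suspect_scores, cutoff):
    """Proportion of suspect-effort patients who fail, i.e., score at or below the cutoff."""
    return sum(score <= cutoff for score in suspect_scores) / len(suspect_scores)


# Hypothetical RDS scores for a credible-effort group and a suspect-effort group.
credible = [10, 9, 8, 8, 7, 11, 9, 6, 10, 8]
suspect = [7, 6, 5, 8, 7, 4, 9, 6]
for cutoff in (7, 6):
    print(f"cutoff <= {cutoff}: specificity = {specificity(credible, cutoff):.2f}, "
          f"sensitivity = {sensitivity(suspect, cutoff):.2f}")
```

With these hypothetical scores, moving the cutoff from ≤7 to ≤6 raises specificity while lowering sensitivity, the same trade-off reported for the pooled data below.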


Table 2. Weighted Mean and Bayesian Calculated Specificity Rates for Reliable Digit Span Cutoff Scores of ≤7 and ≤6

Studies | Subjects | Total N | ≤7 Specificity in % [95% CI] | Total N | ≤6 Specificity in % [95% CI]
Multiple studies (wm) All except special groups 1,848 82 [78%, 86%] 1,751 96 [95%, 97%]
Multiple studies (bm) All except special groups 1,012 85 [79%, 91%] 953 97 [95%, 99%]
Multiple studies (wm) Controls only 61 90 [50%, 100%] 62 100 [100%]
Multiple studies (wm) Mixed clinical only 1,163 82 [78%, 86%] 1,163 95 [93%, 97%]
Multiple studies (wm) All TBI patients 488 82 [72%, 92%] 350 97 [94%, 100%]
Multiple studies (wm) PC and Mild TBI 194 81 [47%, 100%] 126 95 [64%, 100%]
Multiple studies (wm) Mod/Sev TBI 235 82 [55%, 100%] 165 99 [98%, 100%]
Heinly et al. (2005) Cerebrovascular accident 517 56 517 70
Heinly et al. (2005) Memory disorders 228 48 228 68
Multiple studies Mental retardation 71 11 97 27
Dean et al. (2008) IQ 50-59 — — 3 33
Dean et al. (2008) IQ 60-69 — — 12 33
Dean et al. (2008) IQ 70-79 — — 32 81
Salazar et al. (2007) ESL outpatients 25 52 25 84
Salazar et al. (2007) Hispanic outpatients 27 44 27 81
Note. CI = confidence interval; TBI = traumatic brain injury. Confidence limits were not reported when the pooled data were composed of two or
fewer studies: wm = weighted means; bm = Bayesian method; PC = postconcussive; Mod/Sev = Moderate to Severe; ESL = English as a second language.

Specificity rates calculated by the Bayesian method were included only for the "all subjects combined—excluding special groups" group. When this group was divided into subgroups, the subgroups contained smaller sample sizes and nonconvergence of the Newton-Raphson maximization algorithm rendered the Bayesian calculations unreliable. Consequently, the specificity rates for these subgroups were calculated using only weighted averages. Because no limitations are placed on which studies can be included when using weighted averages, as opposed to when using the Bayesian method, this calculation allowed the sample sizes to be maximized in each subgroup.

As can be seen in Table 2, the RDS cutoff score of ≤7 resulted in global specificity rates of 82% and 85% when calculated using weighted averages and the Bayesian method, respectively. The RDS cutoff score of ≤6 resulted in global specificity rates of 96% and 97% when calculated using weighted averages and the Bayesian method, respectively. When the subject pool was divided further, specificity rates (calculated via weighted averages) for the ≤7 cutoff score were below 90% for all subgroups except the control subjects group, whereas the ≤6 cutoff score yielded specificity rates above 90% for all the groups except the "special groups."

Table 3 shows the sensitivity rates for each study included in this systematic review. Table 4 shows the sensitivity rates when the data are pooled and divided by diagnostic groups. Global sensitivity rates for the ≤7 cutoff equaled 48% and 58% when using weighted averages and the Bayesian method, respectively. Global sensitivity rates for the ≤6 cutoff equaled 30% and 35% when using weighted averages and the Bayesian method, respectively. In the clinical subgroups, the ≤7 cutoff resulted in sensitivity rates ranging from 42% to 72%, whereas the ≤6 cutoff resulted in rates ranging from 26% to 38%.

Table 5 presents the specificity rates for varying clinical groups from the personal database. As can be seen, the patients with severe memory disorders, patients who speak English as a second language, patients of Native American and Hispanic backgrounds, and patients with IQ scores in the 70s and below do not produce specificity rates above 90% when using either RDS cutoff score. The <12 years, <11 years, and <10 years of education groups all produced specificity rates near 90% when using the ≤6 cutoff score.

Discussion

RDS is one of the most heavily researched SVTs available to neuropsychologists. In fact, a recent literature review yielded more than 20 studies, ranging in dates from 1994 to 2011, which evaluated the clinical utility of RDS. Unfortunately, issues within some of the studies limit their clinical generalizability. As a result, this systematic review and cross-validation study was conducted to address the limitations and improve clinical utility.

The results of this systematic review and cross-validation study indicate that the RDS cutoff score of ≤7 achieved a global sensitivity rate of 48% when using weighted averages and 58% when using the Bayesian method; however, this cutoff score also produced inadequate specificity rates (i.e., <90%) for the pooled data (using both statistical methods) and for all clinical subgroups (using weighted averages).


Table 3. Sensitivity Rates for Reliable Digit Span Cutoff Scores of ≤7 and ≤6

Study | Subjects | Ratio of true positives using ≤7 | Sensitivity using ≤7 (%) | Ratio of true positives using ≤6 | Sensitivity using ≤6 (%)
Greiffenstein et al. (1995) Postconcussive PM 47/53 89 — —
Strauss et al. (2000) College simulator malingerers 14/20 71 — —
Inman and Berry (2002) College simulator malingerers 12/44 27 7/44 16
Duncan and Ausborn (2002) Prison malingerers 37/54 68 31/54 57
Mathias et al. (2002) Mixed TBI PM 16/24 67 9/24 38
Strauss et al. (2002) Simulated malingerers — — 16/34 47
Larrabee (2003) Mixed TBI DMND 13/24 50 6/24 23
Etherton et al. (2005) College simulator malingerers 13/20 65 8/20 40
Etherton et al. (2005) Definite MND pain patient 21/35 60 13/35 37
Heinly et al. (2005) Mild TBI malingerers 34/48 71 22/48 46
Heinly et al. (2005) Mod/Sev TBI PM 12/23 52 6/23 26
Babikian et al. (2006) Mixed clinical malingerers 41/66 62 30/66 45
Schwarz et al. (2006) Symptom coached simulators — — 9/21 43
Schwarz et al. (2006) Test-coached simulators — — 9/21 43
Axelrod et al. (2006) Mild TBI malingerers 23/36 64 14/36 39
Greve et al. (2007) Toxic clinical malingerers 25/46 54 21/46 46
Graue et al. (2007) Simulated mental retardation — — 14/25 56
Whitney et al. (2009) Mixed PM 8/20 40 5/20 25
Ylioja et al. (2009) Postconcussive PM 16/33 48 9/33 27
Greve et al. (2010) Chronic pain malingerers 90/185 49 45/185 24
Marshall et al. (2010) Adult ADHD evaluation PM 59/267 22 37/267 14
Note. TBI = traumatic brain injury; MND = malingered neurocognitive dysfunction; ADHD = attention deficit/hyperactivity disorder; PM = Probable
Malingerers; DMND = Definite Malingering of Neuropsychological Dysfunction; Mod/Sev = Moderate to Severe.

Table 4. Weighted Mean Sensitivity Rates for All Studies With Reliable Digit Span Cutoff Scores of ≤7 and ≤6

Studies | Subjects | Total N for ≤7 | Mean sensitivity in % [95% CI] | Total N for ≤6 | Mean sensitivity in % [95% CI]
All studies (weighted) All studies 998 48 [39%, 57%] 1026 30 [24%, 36%]
All studies (Bayesian method) All studies 965 58 [48%, 68%] 1026 35 [28%, 42%]
Multiple studies (weighted) Simulators 84 46 [0%, 100%] 165 38 [24%, 52%]
Multiple studies (weighted) Mixed clinical (except TBI) 673 42 [27%, 57%] 673 27 [16%, 38%]
Multiple studies (weighted) All TBI patient malingerers 241 67 [54%, 80%] 188 35 [25%, 45%]
Multiple studies (weighted) PC and mild TBI PM 134 72 [18%, 100%] 81 38 [0%, 100%]
Multiple studies (weighted) Mod/Sev TBI PM 23 52 [—] 23 26 [—]
Note. CI = confidence interval; TBI = traumatic brain injury. No confidence intervals could be calculated for the moderate/severe TBI group; weighted =
weighted means; PC = postconcussive; PM = probable malingerers; Mod/Sev = Moderate to Severe.

It is of interest to note that the technical manual for the Advanced Clinical Solutions for the WAIS-IV and the Wechsler Memory Scale-4th edition (WMS-IV; Wechsler, 2009) indicates that an RDS cutoff score of ≤7 produced specificity rates below 90% for all clinical groups analyzed during their standardization process (their raw data was not added to the systematic review, as the data was not made available to the authors). Therefore, this systematic review and the Advanced Clinical Solutions technical manual indicate that an RDS cutoff score of ≤7 produces inadequate specificity rates in many clinical groups.

As expected, this systematic review indicates that a cutoff score of ≤6 resulted in lower sensitivity rates than the ≤7 cutoff; however, the ≤6 cutoff score achieved more appropriate specificity rates. The ≤6 cutoff score resulted in global sensitivity rates of 30% and 35% when calculated with weighted averages and the Bayesian method, respectively. Global specificity rates resulted in 96% and 97% when calculated with weighted averages and the Bayesian method, respectively. This cutoff also resulted in specificity rates above 90% for all clinical subgroups except the "special groups" (using weighted averages).


Table 5. Specificity Rates for Reliable Digit Span Cutoff Scores of ≤7 and ≤6 From Personal Database

Study | Subjects | Ratio of true negatives using ≤7 | Specificity using ≤7 (%) | Ratio of true negatives using ≤6 | Specificity using ≤6 (%)
HCMC databasea Severe memory disordera 7/19 37 11/19 58
HCMC databasea ESL patientsa 7/22 32 16/22 73
HCMC database Caucasian psychiatric patients 334/434 77 401/434 92
HCMC database AA psychiatric patients 102/149 68 139/149 92
HCMC database Asian psychiatric patients 11/13 85 13/13 100
HCMC databasea NA psychiatric patientsa 13/20 65 17/20 85
HCMC databasea Hispanic psychiatric patientsa 9/16 56 11/16 69
HCMC database <12 years education (M = 10.0) 102/156 65 15/156 90
HCMC database <11 years education (M = 8.9) 41/74 55 65/74 88
HCMC database <10 years education (M = 7.7) 21/35 60 31/35 89
HCMC database FSIQ 80+ (M = 98.9) 354/424 83 410/424 97
HCMC database FSIQ 80s (M = 84.9) 83/116 72 110/116 95
HCMC databasea FSIQ 70s (M = 74.9)a 57/95 60 81/95 85
HCMC databasea FSIQ 60s (M = 65.2)a 15/39 38 28/39 72

Note. Superscript a indicates a special group. HCMC = Hennepin County Medical Center; FSIQ = full scale intelligence quotient; ESL = English as a second language; AA = African American; NA = Native American.

Interestingly, the Advanced Clinical Solutions technical manual (Wechsler, 2009) also indicates that an RDS cutoff score of ≤6 produces specificity rates of 90% or greater in many clinical groups. These groups include the temporal lobectomy, Asperger's disorder, attention deficit/hyperactivity disorder, major depressive disorder, and anxiety disorder groups. The Advanced Clinical Solutions technical manual further indicates that an RDS cutoff score of ≤6 results in specificity rates near, but slightly lower than, 90% for individuals with autism, reading disorders, mathematics disorders, and TBIs. Although the Advanced Clinical Solutions technical manual indicates that the TBI group achieves a specificity rate slightly below 90% when using the ≤6 cutoff score, this systematic review, which includes a much larger sample size of patients with TBIs, indicates that both the postconcussive/mild TBI and moderate/severe TBI groups achieve a specificity rate greater than 90% when using this cutoff score.

In addition to the aforementioned results, this systematic review and cross-validation study also identified clinical samples in which RDS failed to produce adequate specificity rates. These samples include patients with CVAs, severe memory disorders (e.g., dementia), mental retardation, IQ scores in the borderline intellectual functioning range and below, and those who speak English as a second language. Individuals of Hispanic and Native American origin also produced specificity rates below 90% for both RDS cutoffs. It was noted, however, that individuals in these two groups had lower educational levels and IQ scores than individuals of other racial backgrounds who were included in this study. Thus, it is possible that the lower IQ scores resulted in the lower specificity rates for these two groups. Consequently, more research is needed on RDS performances by individuals of Hispanic and Native American origin. Of note, patients who were of Hispanic background in the Salazar, Lu, Wen, and Boone (2007) study were proficient in English and were tested in English. Patients who were of Hispanic background in the personal database were native Spanish speakers and were tested either directly in Spanish or through the use of an interpreter.

One might wonder if adjusted RDS cutoff scores could be used with the aforementioned "special groups" or if other SVTs should be used. Based on the current data and the available literature, adjusted RDS cutoff scores would likely be of little benefit. Many of the special groups would require substantially lower cutoff scores to maintain specificity rates of 90%. As can be seen in this study, lowering the cutoff score from ≤7 to ≤6 resulted in a large sensitivity rate decrease. Lowering the cutoff score even further would be expected to have an even more adverse effect on sensitivity rates. This is demonstrated in a study by Meyers and Volbrecht (1998), which indicated that only 10.6% of litigating patients received RDS scores of ≤5 and only 2.1% received scores of ≤4. Consequently, it is recommended that clinicians use other SVTs that maintain adequate sensitivity and specificity rates when evaluating these "special group" patients.


Although the results of this systematic review and cross-validation study indicate that an RDS cutoff score of ≤6 can be used effectively in many clinical samples, it is important to remember that no single SVT is perfect at detecting suspect effort. Obtaining a failing score on one SVT might be suggestive of the possibility of suspect effort, but it does not conclusively indicate that the patient is providing suspect effort. Conversely, a passing score on an SVT is not proof that a patient is providing credible effort. As Boone (2009) has noted, patients feign deficits in a variety of ways and some patients are more sophisticated than others. Thus, it is emphasized that multiple SVTs be interspersed throughout the evaluation to maximize the probability of making correct statements regarding the validity of the neuropsychological evaluation.

Finally, a formal diagnosis of malingering should not be based solely on SVT failures. As Slick et al. (1999) have indicated, there are many factors that go into making a formal diagnosis of malingering. First, by definition, a diagnosis of malingering requires the presence of a substantial external incentive. Second, SVT failures should not be "fully accounted for by psychiatric, neurological, or developmental factors" (Slick et al., 1999, p. 552). Third, factors other than malingering (fatigue, a desire to end testing, etc.) can cause a patient to fail two or more SVTs (Marshall et al., 2010; Schroeder & Marshall, 2010). Fourth, evidence from other sources, such as self-reports and behavioral observations, can all be suggestive of suspect effort regardless of SVT failure (Slick et al., 1999). Fifth, inconsistencies between neuropsychological test data and known patterns of brain functioning, observed behavior, reliable collateral reports, or documented background history may also suggest suspect effort. Consequently, RDS and other SVTs can aid in making a diagnosis of probable neurocognitive malingering; however, these measures are neither sufficient nor required for the diagnosis.

Appendix

The following articles included Reliable Digit Span but were excluded from the data analysis portion of this study because the inclusion criteria were not met or the data was used in another study.

Greiffenstein, M. F., & Baker, W. J. (2007). Validity testing in dually diagnosed post-traumatic stress disorder and mild closed head injury. The Clinical Neuropsychologist, 22, 565-582.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218-224.
Jasinski, L. J., Berry, D. T. R., Shandera, A. L., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33, 300-314.
Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261-276.
Ruocco, A. C., Swirsky-Sacchetti, T., Chute, D. L., Mandel, S., Platek, S. M., & Zillmer, E. A. (2008). Distinguishing between neuropsychological malingering and exaggerated psychiatric symptoms in a neuropsychological setting. The Clinical Neuropsychologist, 22, 547-564.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

American Academy of Clinical Neuropsychology. (2007). American Academy of Clinical Neuropsychology (AACN) practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist, 21, 209-231.
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPherson, S. (2005). Sensitivity and specificity of finger tapping scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105-120.
Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer, J. C. (2006). Detecting incomplete effort with digit span from the Wechsler Adult Intelligence Scale—Third Edition. The Clinical Neuropsychologist, 20, 513-523.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20, 145-159.
Begg, C. B. (1994). Publication bias. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 399-409). New York, NY: Russell Sage Foundation.
Boone, K. B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York, NY: Guilford Press.
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729-741.
Boone, K. B., Lu, P., & Herzberg, D. (2002). The Dot Counting Test. Los Angeles, CA: Western Psychological Services.
Bush, S., Ruff, R., Troster, A., Barth, J., Koffler, S., Pliskin, N., . . . Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity—NAN Policy and Planning Committee. Archives of Clinical Neuropsychology, 20, 419-426.


Constantinou, M., Bauer, L., Ashendorf, L., Fisher, J., & McCaffrey, R. J. (2005). Is poor performance on recognition memory effort measures indicative of generalized poor performance on neuropsychological tasks? Archives of Clinical Neuropsychology, 20, 191-198.
Dean, A. C., Victor, T. L., Boone, K. B., & Arnold, G. (2008). The relationship of IQ to effort test performance. The Clinical Neuropsychologist, 22, 705-722.
Demakis, G. J. (2006). Meta-analysis in neuropsychology: Basic approaches, findings, and applications. The Clinical Neuropsychologist, 20, 10-26.
Duncan, S. A., & Ausborn, D. L. (2002). The use of Reliable Digits to detect malingering in a criminal forensic pretrial population. Assessment, 9, 56-61.
Etherton, J. L., Bianchini, K. J., Ciota, M. A., & Greve, K. W. (2005). Reliable digit span is unaffected by laboratory-induced pain: Implications for clinical use. Assessment, 12, 101-106.
Etherton, J. L., Bianchini, K. J., Greve, K. W., & Heinly, M. T. (2005). Sensitivity and specificity of Reliable Digit Span in malingered pain-related disability. Assessment, 12, 130-136.
Graue, L. O., Berry, D. T. R., Clark, J. A., Sollman, M. J., Cardi, M., Hopkins, J., & Werline, D. (2007). Identification of feigned mental retardation using the new generation of malingering detection instruments: Preliminary findings. The Clinical Neuropsychologist, 21, 929-942.
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. A. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045-1060.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218-224.
Greiffenstein, M. F., Gola, T., & Baker, W. J. (1995). MMPI-2 validity scales versus domain specific measures in detection of factitious traumatic brain injury. The Clinical Neuropsychologist, 9, 230-240.
Greve, K. W., Bianchini, K. J., Etherton, J. L., Meyers, J. E., Curtis, K. L., & Ord, J. S. (2010). The Reliable Digit Span Test in chronic pain: Classification accuracy in detecting malingered pain-related disability. The Clinical Neuropsychologist, 24, 137-152.
Greve, K. W., Springer, S., Bianchini, K. J., Black, F. W., Heinly, M. T., Love, J. M., . . . Ciota, M. A. (2007). Malingering in toxic exposure: Classification accuracy of reliable digit span and WAIS-III Digit Span Scaled Scores. Assessment, 14, 12-21.
Heaton, R. K., Smith Jr., H. H., Lehman, R. A. W., & Vogt, A. T. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46, 892-900.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093-1129.
Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS Digit Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12, 429-444.
Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1-23.
Irwig, L., Macaskill, P., Glasziou, P., & Fahey, M. (1995). Meta-analytic methods for diagnostic test accuracy. Journal of Clinical Epidemiology, 48, 119-130.
Jasinski, L. J., Berry, D. T. R., Shandera, A. L., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33, 300-314.
Killgore, W. D., & DellaPietra, L. (2000). Using the WMS-III to detect malingering: Empirical validation of the Rarely Missed Index (RMI). Journal of Clinical and Experimental Neuropsychology, 22, 761-771.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410-425.
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666-679.
Lees-Haley, P. R., & Brown, R. S. (1993). Neuropsychological complaint base rates of 170 personal injury claimants. Archives of Clinical Neuropsychology, 8, 203-209.
Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment (4th ed.). New York, NY: Oxford University Press.
Lu, P., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers Recognition Trial in the detection of suspect effort. The Clinical Neuropsychologist, 17, 426-440.
Macaskill, P. (2004). Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. Journal of Clinical Epidemiology, 57, 925-932.
Marshall, P., & Happe, M. (2007). The performance of individuals with mental retardation on cognitive tests assessing effort and motivation. The Clinical Neuropsychologist, 21, 826-840.
Marshall, P., Schroeder, R., O'Brien, J., Fischer, R., Ries, A., Blesi, B., & Barker, J. (2010). Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. The Clinical Neuropsychologist, 24, 1204-1237.
Martin, J. L. R., Pérez, V., Sacristán, M., & Álvarez, E. (2005). Is grey literature essential for a better control of publication bias in psychiatry? An example from three meta-analyses of schizophrenia. European Psychiatry, 20, 550-553.


Mathias, C. W., Greve, K. W., Bianchini, K. J., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction using the Reliable Digit Span in traumatic brain injury. Assessment, 9, 301-308.
Meyers, J. E., & Volbrecht, M. (1998). Validation of reliable digits for detection of malingering. Assessment, 5, 303-307.
Millis, S. R., & Putnam, S. H. (1996). Detection of malingering in postconcussive syndrome. In M. Rizzo & D. Tranel (Eds.), Head injury and postconcussive syndrome (pp. 481-498). New York, NY: Churchill Livingstone.
Root, J. C., Robbins, R. N., Chang, L., & van Gorp, W. G. (2006). Detection of inadequate effort on the California Verbal Learning Test (2nd ed.): Forced choice recognition and critical item analysis. Journal of the International Neuropsychological Society, 12, 688-696.
Salazar, X. F., Lu, P. H., Wen, J., & Boone, K. B. (2007). The use of effort tests in ethnic minorities and in non-English-speaking and English as a second language populations. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 366-383). New York, NY: The Guilford Press.
Schroeder, R. W. (2010). Validation of Digit Span-based effort indices with an adult attention deficit hyperactivity disorder population and a population of individuals who report attention difficulties that are due to other psychological disorders (Unpublished doctoral dissertation). Argosy University, Eagan, MN.
Schroeder, R. W., & Marshall, P. S. (2010). Validation of the Sentence Repetition Test as a measure of suspect effort. The Clinical Neuropsychologist, 24, 326-343.
Schroeder, R. W., & Marshall, P. S. (2011). Evaluation of the appropriateness of multiple symptom validity indices in psychotic and non-psychotic psychiatric populations. The Clinical Neuropsychologist, 25, 437-453.
Schwarz, L. R., Gfeller, J. D., & Oliveri, M. V. (2006). Detecting feigned impairment with the digit span and vocabulary subtests of the Wechsler Adult Intelligence Scale—Third Edition. The Clinical Neuropsychologist, 20, 741-753.
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 4, 545-561.
Strauss, E., Hultsch, D. F., Hunter, M., Slick, D. J., Patry, B., & Levy-Bencheton, J. (2000). Using intraindividual variability to detect malingering in cognitive performance. The Clinical Neuropsychologist, 14, 420-432.
Strauss, E., Slick, D. J., Levy-Bencheton, J., Hunter, M., MacDonald, S. W. S., & Hultsch, D. F. (2002). Intraindividual variability as an indicator of malingering in head injury. Archives of Clinical Neuropsychology, 17, 423-444.
Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, M. A., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 23, 297-313.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale—Revised. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV. San Antonio, TX: Pearson.
Whitney, K. A., Davis, J. J., Shepard, P. H., Bertram, D. M., & Adams, K. M. (2009). Digit span age scaled score in middle-aged military veterans: Is it more closely associated with TOMM failures than reliable digit span? Archives of Clinical Neuropsychology, 24, 263-272.
Ylioja, S. G., Baird, A. D., & Podell, K. (2009). Developing a spatial analog of the Reliable Digit Span. Archives of Clinical Neuropsychology, 24, 729-739.
