Validity and Reliability of a Pre-Employment Screening Test: The Counterproductive Behavior Index (CBI)
Leonard D. Goodstein
Washington, D.C.
This study is based in part on research projects carried out during the Spring 2002
semester by Laura Fuentes, Kristine Goto, Amy Kossman, and Dustin Morrissey, Depart-
ment of Psychology, Arizona State University. The assistance of Jeanette Goodstein in the
drafting of this article is gratefully acknowledged.
Address correspondence to Richard I. Lanyon, Department of Psychology, Arizona State
University, Tempe, AZ 85297-1104. E-mail: rlanyon@asu.edu or Leonard D. Goodstein, 4815
Foxhall Crescent, NW, Washington, DC 20007–1052. E-mail: lendg@AOL.com.
have been offered for this unexpected finding that people tend to admit
their counterproductive behavior when asked directly (Jones & Terris,
1991). The most frequent explanation given by persons when asked
about their admission of misbehaviors is that "everyone does it" and that
they do it less than their fellow employees.
The Counterproductive Behavior Index (CBI; Goodstein & Lanyon,
2003), a 120-item, multi-scale true/false integrity test, asks direct ques-
tions about behaviors and attitudes in five areas of workplace concern:
dependability concerns, aggression, substance abuse, honesty concerns,
and computer abuse, and includes a Good Impression scale. It also yields
a single composite measure of overall or total concern, or “organizational
deviance.” (The scales were named so that a high score would represent
deviance; thus, the word concerns was explicitly included in two of the
names.)
Test construction procedures, reported elsewhere (Goodstein & Lan-
yon, 2002), followed the outline for scale development described earlier
(Lanyon & Goodstein, 1997). Briefly, a universe of content was developed
for each concept, and items were written to map the content representa-
tively. The number of items was then edited down to 40 for each concept
(25 for computer abuse). This preliminary pool of 225 items was adminis-
tered to 191 workplace participants (89 males, 102 females) of varied age
(mean = 32 years), education (mean = 13 years), and occupational level,
and employed in various industries across 12 states.
Preliminary scores for each concept were computed based on the
items written for each scale, and correlations were computed between
each item and the preliminary score for its scale. Next, partial correla-
tions were computed, partialing out the contribution of the (preliminary)
Good Impression scale to the correlation between each item and its pre-
liminary scale score. The 20 items finally chosen for each scale were
those that showed highly significant item/total correlations and partial
correlations (all p < .001). Thus, each item contributed significantly to its
scale beyond a shared relationship with Good Impression. The final
items were checked for representativeness of content. Across the five
Concerns scales, the median correlation of items with their (preliminary)
scale score was .50, and the median partial correlation was .46. Correla-
tions of the final (20-item) scales with the final Good Impression scale
were relatively low (range −.21 to −.35), showing that the test construc-
tion strategy was successful in reducing the contribution of Good Impres-
sion to the scale scores, at least for the initial group of participants.
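The item-selection rule described above can be sketched in a few lines; this is an illustrative reconstruction, not the authors' actual procedure, and the data are synthetic:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z,
    computed from the three pairwise Pearson correlations."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic illustration: an item whose relation to its scale total is
# carried entirely by a good-impression response set.
rng = np.random.default_rng(0)
n = 191                               # size of the original workplace sample
gi = rng.normal(size=n)               # stand-in Good Impression score
item = gi + 0.3 * rng.normal(size=n)
scale = gi + 0.3 * rng.normal(size=n)

# The raw item/total correlation is high, but the partial correlation is
# near zero, so this item would fail the selection rule described above.
r_raw = np.corrcoef(item, scale)[0, 1]
r_partial = partial_corr(item, scale, gi)
```

Items surviving this screen relate to their scale for reasons beyond shared social desirability, which is the property the selection procedure was designed to ensure.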
Establishing the validity of a test such as the CBI is an ongoing pro-
cess. The present paper describes three studies that address predictive
validity, construct validity, test-retest reliability, and the extent of the
scales’ remaining vulnerability to good-impression response distortion.
536 JOURNAL OF BUSINESS AND PSYCHOLOGY
METHOD
Participants
This was a simulation study, involving 160 undergraduates at a
large university, who participated as one way of fulfilling a course re-
quirement. All had had significant work experience. They were tested in
small groups of 5 to 15, under conditions of anonymity. There were 88
males and 62 females; mean age was 19.4 (range 18–25). Some partici-
pants completed all six simulations, and some completed three. The un-
equal Ns for the groups (Table 1) were due to the realities of scheduling
and to incomplete data.
Procedure
The nature of the CBI and the purpose of the study were fully ex-
plained in advance to the participants. They then completed the CBI
either three or six times during one or two one-hour sessions, each time
with specific written and oral instructions to simulate one of the six characteristics represented by the scales, including Good Impression. The order of presentation was randomized among the testing groups. To simulate each of the five concerns, they were told to pretend that they actually had engaged in such behavior or held such attitudes, and although they wanted to do their "best" on the test in order to get the job, they recognized that the test might be able to determine whether they were lying. Therefore, they were to admit to some of the negative characteristics. To simulate Good Impression, they were told to make the very best impression that they could, in order to increase their chances of getting the job.

Table 1
Means and Standard Deviations for Six CBI Scales when Simulating Each Characteristic
Simulated characteristic  N  D  A  S  H  C  G
Note. The means representing simulation of the same condition are shown in boldface.

RICHARD I. LANYON AND LEONARD D. GOODSTEIN 537
Results
Means and standard deviations for the current participants for all
scales under all six simulation conditions are shown in Table 1. Table 2
repeats the raw score means on each scale when simulating the charac-
teristics of that particular scale, and shows these means as standard
scores (T-scores) based on the 191 participants of the original workplace
normative group (mean age = 32) and separately for the 56 participants
of the workplace normative group who were in the age range 18 through
25 (Goodstein & Lanyon, 2002).
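The raw-to-standard-score conversion used here is the ordinary linear T-score transformation (normative mean 50, standard deviation 10); the numeric values below are hypothetical, not the CBI norms:

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear conversion of a raw scale score to a T-score
    (normative mean = 50, SD = 10)."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical normative values for illustration only (not the CBI norms):
# a raw score two SDs above the norm mean converts to T = 70.
t = t_score(14.0, norm_mean=8.0, norm_sd=3.0)
```

Using a separate norm mean and SD for the 18-to-25 subgroup simply substitutes that subgroup's parameters into the same formula.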
Each of the six means under simulation conditions was compared
with the two corresponding normative means using t-tests. All compari-
sons were significant well beyond the .001 level. The data thus show that
the scales successfully identified the five areas of (simulated) concern
shown, even when participants were instructed to be careful about how
much of the negative characteristic to portray on the test. Simulated
Good Impression was also successfully identified.
Table 2
Standard Scores (in T-score form) of Means in Six Simulations
and Statistical Comparisons with Normative Means
Inspection of all the means in Table 1 shows that the five Concerns
scales differed in their degree of focus on their particular characteristic.
For example, in the condition simulating aggression, none of the other
Concerns scales differed from the normative mean. Thus, the content of
the Aggression scale is quite specific to aggressiveness. On the other
hand, in the simulation of Dependability Concerns, all the other scales
differed from their normative means, although none to the extent of the
Dependability Concerns scale.
Sensitivity and Specificity. For each scale, the distribution of (simulated)
problem scores was compared to the distribution in the normative group.
For example, the first data column in Table 3 considers a specificity of
.95 (i.e., a cutting point on each scale that only five percent of the norma-
tive group scored above). The values in the column show the correspond-
ing sensitivities of the scales, i.e., the proportion of problem respondents
who scored above the cutting point and were therefore correctly identi-
fied as having problems. The remaining columns show sensitivities for
specificity levels of .90, .85, .80, and .75.
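The cutting-point logic can be sketched as follows, with synthetic stand-in scores rather than CBI data:

```python
import numpy as np

def sensitivity_at_specificity(normative, problem, specificity):
    """Cutting point = the normative-group percentile implied by the
    desired specificity; sensitivity = proportion of the problem group
    scoring above that cutting point."""
    cut = np.percentile(normative, 100 * specificity)
    return float(np.mean(np.asarray(problem) > cut))

# Synthetic stand-ins for the normative and simulated-concern groups:
rng = np.random.default_rng(0)
normative = rng.normal(10, 3, size=191)
problem = rng.normal(18, 3, size=160)

sensitivities = {s: sensitivity_at_specificity(normative, problem, s)
                 for s in (0.95, 0.90, 0.85, 0.80, 0.75)}
```

Lowering the required specificity lowers the cutting point, so sensitivity can only stay the same or rise, which is the trade-off the columns of Table 3 display.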
It can be seen from Table 3 that all five Concerns scales showed high
sensitivity. Thus, overall accuracy was 91 percent or greater in identify-
ing problem persons when a false-positive rate of 15 percent is accepted
(i.e., a specificity of .85), and 96 percent or greater for a false-positive
rate of 25 percent.
In order to study the appropriateness of the CBI norms in practice,
the CBI was administered to a small group of applicants for jobs at a
medium-sized Midwestern manufacturing company (N = 17; 10 males
and 7 females). The completed tests were sent to the authors without
being examined or scored by the personnel director, and these data played no role in the employment decision.

Table 3
Sensitivities at Five Levels of Specificity for the CBI Scales Based on a Comparison of the Normative Group and Simulated Responding Groups
Note. The cumulative percentage figure numerically closest to each specificity was utilized to determine the cutting points on which the sensitivities are based.

The mean scores for this small group of job seekers were all numerically lower than those of the workplace normative group but within half a standard deviation of the norm mean; the largest difference was for the Substance Abuse scale, which fell slightly less than half a standard deviation below the norm mean.
Nevertheless, four of the 17 applicants scored above the cutting
point indicating a “serious concern” provided in the CBI Technical Man-
ual (Goodstein & Lanyon, 2002) (a score above the 95th percentile) on at
least one of the scales. One 39-year-old male applicant scored at the "concern" (85th percentile) or "serious concern" level on all five of the Concerns scales despite a moderately high Good Impression score of 12. One 37-year-old female applicant scored at the "serious concern" level on the Aggression scale, which was somewhat surprising, at least to the authors. These pilot data suggest that the CBI can facilitate the identification of potentially counterproductive employees as part of a pre-employment screening process.
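The percentile cutoffs described above imply a simple flagging rule; the sketch below assumes each scale score has already been converted to a percentile rank in the normative group, and the function names and any-scale rule are illustrative, not taken from the CBI materials:

```python
def screening_flag(percentile):
    """Classify one scale score by its percentile rank in the normative
    group, using the cutoffs described in the text: above the 95th
    percentile = 'serious concern', above the 85th = 'concern'."""
    if percentile > 95:
        return "serious concern"
    if percentile > 85:
        return "concern"
    return "no concern"

def flag_applicant(scale_percentiles):
    """Flag an applicant if any scale reaches at least the 'concern' level."""
    return any(screening_flag(p) != "no concern" for p in scale_percentiles)
```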
Discussion
To summarize the findings, all five Concerns scales were successful
in identifying persons who simulated the characteristic of concern, even
while exercising care not to “give too much away.” Three of the scales
(Aggression, Substance Abuse, and Computer Abuse) did well enough to
be able to correctly identify 100 percent of the problem respondents when
the false-positive rate was set at 15 percent. This success is consistent
with the observation that the items tapping these three areas of concern
are very specific and obviously descriptive of their area. In other words,
these concepts are narrow and readily defined. The remaining two areas
of concern—Dependability and Honesty—are inherently broader. These
broader concerns tend to be reflected within the other three areas, al-
though not vice versa. For example, an item such as “I have used alcohol
at work in the past year” could also be seen as reflecting lack of depend-
ability as well as substance abuse, but the item “I do my work thoroughly
and carefully” would not be seen as reflecting substance abuse. With
respect to sensitivity, by accepting a specificity of .90, between 78 and
98 percent (with an average of about 90 percent) of those respondents
with a simulated problem were correctly identified.
These research findings are limited in that they represent workers
who were also college students, and who were tested in a college setting
under research conditions rather than a real-life workplace setting. It
would be appropriate for users of the CBI to keep track of the success
of the test in their own settings, and to develop their own expectations
over time as to its accuracy in their particular use.
The question of false positives merits further consideration. False
METHOD
Participants
This study utilized 83 undergraduates at a large university, who
participated as one way of fulfilling a course requirement. All had had
significant work experience. They were tested in small groups of 5 to
15, under conditions of anonymity, over two one-hour periods on dif-
ferent days. There were 43 males and 40 females; mean age was
20 (range 18–29). The number of participants varied among the differ-
ent test instruments due to artifacts of scheduling and to incomplete
data.
Measures
Measures for establishing the construct validity of the CBI are de-
scribed in detail below. They were selected according to the following
criteria.
1. One major measure was selected for each of the five Concerns
scales and for the Good Impression scale. The selected measures
were either well established as measures of the behavior of inter-
est, or if no such measure was available, a validity index was
constructed specifically for the study.
2. Additional measures were utilized whenever it was considered
appropriate.
Good Impression. The CBI Good Impression Scale was validated against
the overall score of the Balanced Inventory of Desirable Responding
(BIDR; Paulhus, 1986, 1991). The BIDR is an established measure of
socially desirable responding, and consists of two scales that encompass
conscious and unconscious aspects. The present study utilized the sum
of the two scales (Impression Management and Self-Deceptive Enhance-
ment).
Total Concerns. The CBI Total Concerns score is the simple sum of par-
ticipants’ scores on the five Concerns scales. A Total Validity Index was
developed by combining the corresponding five main validity measures:
the Conscientiousness scale of the NEO-FFI, the Buss-Perry Aggression
Questionnaire, the DAST, the Honesty Validity Index, and the Computer
Abuse Validity Index. Because these five measures differed widely in
their range of scores, their scores were converted to z-scores before com-
bining them through simple addition.
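The z-score combination described above can be sketched as follows; the two arrays are synthetic stand-ins, not the study's actual measures:

```python
import numpy as np

def combine_as_z_scores(*measures):
    """Convert each measure to z-scores before summing, so that measures
    with very different score ranges contribute equally to the total."""
    arrays = [np.asarray(m, dtype=float) for m in measures]
    return sum((a - a.mean()) / a.std() for a in arrays)

# Synthetic stand-ins for two validity measures with unlike ranges:
rng = np.random.default_rng(0)
conscientiousness = rng.normal(30, 10, size=83)  # e.g. a 0-60 questionnaire
dast = rng.normal(3, 1.5, size=83)               # e.g. a 0-10 screening index

total_index = combine_as_z_scores(conscientiousness, dast)
```

Without the standardization step, the measure with the widest raw-score range would dominate the composite.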
Correlations were also determined between the scales of the CBI
and those of another pre-employment screening test, the Applicant Risk
Profiler (ARP; Llobet, 2001), a commercially available instrument with
scales assessing five of the six areas covered by the CBI: Workplace Pol-
icy Compliance, Workplace Aggression, Illegal Drug Use, Integrity/Hon-
esty, and Deception.
Procedure
The nature of the CBI and the purpose of the study were fully ex-
plained to the participants, and anonymity was guaranteed. Of the 83
participants, 41 completed the CBI a second time 2, 5, or 7 days later,
for the purpose of providing data to establish the test-retest reliability
of the scales. These data also enabled the computation of a second set of
correlations between the CBI scales and the construct validity measures.
Results
Correlations between the CBI scales and the various validity mea-
sures were computed for all participants and for males and females sepa-
rately. Each scale is considered in turn.
Dependability Concerns. Correlations between the CBI Dependability Concerns scale and all five of the NEO-FFI factors are shown in Table 4. The overall correlation between Dependability Concerns and NEO-FFI Conscientiousness was −.50, which is highly significant (p < .001). The separate correlations for males (−.60) and females (−.48) were also significant (p < .001 and p < .01, respectively).
Table 4
Correlations of the CBI Dependability Concerns Scale with the Five Scales
of the NEO Five-Factor Inventory
Table 5
Correlations of the CBI Aggression Scale with the Four Facets and
Total Score of the Buss-Perry Aggression Questionnaire and with the
Agreeableness Scale of the NEO Five Factor Inventory
The CBI Aggression scale was significantly correlated with each of the facets,
showing that it is a broad-based measure covering the major aspects of
aggression. The results are consistent with those obtained from males
and females separately.
Also shown in Table 5 are the correlations between the CBI Aggres-
sion scale and the NEO-FFI Agreeableness scale. These inverse correla-
tions further support the validity of the CBI Aggression scale.
Substance Abuse. Correlations for the CBI Substance Abuse scale are
shown in Table 6. The overall correlation with the Drug Abuse Screening
Test (DAST) was .57, which is highly significant (p < .001). The correlations for males and females separately (.55 and .60) are also highly significant (each p < .001).
Correlations between the CBI Substance Abuse scale and the
AUDIT Index are also shown in Table 6. Over all participants, the correlation between these two indices was highly significant: .42 overall (p < .001), with .45 for males and .40 for females (p < .001 and p < .05, respectively).
Honesty Concerns. The overall correlation between the CBI Honesty Con-
cerns scale and the Honesty Validity Index was .37 (p < .001), with .36
for males (p < .05) and .38 for females (p < .05). These correlations, while
significant, are somewhat lower than those for the other CBI scales. The
computations were repeated using the retest responses of the subset of
participants who completed the CBI twice (see Table 7). This time the
correlations were substantially higher. It is noted that the amount of
dishonest job-related behavior reported by the participants in the pres-
ent study was quite minimal, and probably did not represent the full
range of problems with honesty that are typically found in the work-
place.
Table 6
Correlations of the CBI Substance Abuse Scale with the Drug Abuse Screening
Test (DAST) and a College-Related Index (AUDIT Index) from the Alcohol Use
Disorders Identification Test
Table 7
Correlations of the CBI Scales with Construct Validity Measures
for the Retest Group
Table 8
Correlations of the Applicant Risk Profiler (ARP) Scales with Comparable CBI Scales and with Construct Validity Measures

CBI scale               ARP scale                     r(ARP, CBI)  r(CBI, validity measure)  r(ARP, validity measure)
Dependability Concerns  Workplace Policy Compliance  .18          −.50***                   −.37**
Aggression              Workplace Aggression         .60***       .72***                    .66***
Substance Abuse         Illegal Drug Use             .65***       .57***                    .35*
Honesty Concerns        Integrity                    .40***       .37**                     .33*
Good Impression         Deception                    .26*         .49***                    .17

Note. The number of participants for the correlations involving the ARP ranged from 62 through 66.
*p < .05; **p < .01; ***p < .001.
Discussion
The purpose of this study was to demonstrate the relationships of
the scales of the CBI to other measures of the same or similar character-
istics. Our results showed that these relationships were highly signifi-
cant for all of the scales, with a correlation of .66 between Total Concerns
and the Total Validity Index. Comparison of these data with those in published reports of other instruments (e.g., Barrick & Mount, 1991;
Goodstein & Lanyon, 1999; Judge, Heller, & Mount, 2002; Lanyon &
Goodstein, 1997) indicates that these obtained values are equal or supe-
rior to validity levels that have been demonstrated to date.
The data on the reliability of the CBI, both the internal consistency
measures and the test-retest results, provide strong support for the con-
sistency of the CBI scores; that is, such scores are sufficiently stable to
support the use of the CBI in making selection decisions.
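The two kinds of reliability coefficient reported for the CBI can be sketched as follows; this is a generic illustration with toy data, not the CBI scoring code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Perfectly parallel items yield alpha = 1.0:
x = np.arange(1.0, 11.0)
alpha_parallel = cronbach_alpha(np.column_stack([x, x, x]))

# Test-retest (stability) reliability, by contrast, is simply the Pearson
# correlation between scores on the two administrations; a uniform
# practice effect leaves r unchanged.
time1 = x
time2 = x + 0.5
retest_r = np.corrcoef(time1, time2)[0, 1]
```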
Table 9
Test-retest Reliability (Stability) Coefficients and Internal Consistency Coefficients (Cronbach Alphas) for the Scales of the CBI

It is acknowledged that the participants in the present study were tested in a college setting under anonymous conditions rather than in the workplace. It would be expected that the present participants would have been more willing to admit faults than persons tested in the workplace, and this difference would result in higher scores on the Concerns scales for the present participants. Comparison of the mean scores on the Concerns scales in the present study with those of the normative group (who were tested in the workplace) shows such a result, although the differences were all less than one standard deviation. In any event,
there is no reason to suppose that the underlying relationships between
the CBI scales and the construct validity measures would be any differ-
ent for persons tested in the workplace and persons with significant
work experience tested under research conditions. Thus, it is concluded
that the results of the present study can be generalized to the workplace
setting.
The normative group did show a substantially higher mean score
than the present participants on the Good Impression scale, although it
is of interest that their “carefulness” did not unduly depress their scores
on the Concerns scales when compared to the present participants. The
influence of good-impression response set is further explored in Study 3.
METHOD
The use of partial correlations in the initial scale construction procedures, which selected items that related to the constructs beyond a shared relationship with a good-impression response set, served to reduce the influence of that response set as a factor diminishing the validity of the instrument. Study 3 investigated the extent to which the scales remained vulnerable to good-impression response distortion despite these earlier efforts.
Results
The mean scores for each of the five Concerns scales and for Total
Concerns were computed for seven different levels of the Good Impres-
sion scale score: 0–2, 3–5, 6–8, 9–11, 12–14, 15–17, and 18–20. These
mean scores are shown graphically in Figure 1, which also includes hori-
zontal lines that represent the normative mean score for each scale. Vi-
sual inspection shows that, overall, the scores on each scale begin to
show a meaningful decrease (more than one raw score point) when the
scores on the Good Impression scale exceed 14 or 15. Good Impression
scores of 14 and below appear not to systematically affect the scores on
any of the Concerns scales. It is noted that raw scores of 14 and 15 repre-
sent the 89th and 92nd percentiles respectively for Good Impression.
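The banding analysis underlying Figure 1 can be sketched as follows; the scores below are toy values shaped like the reported pattern, not CBI data:

```python
import numpy as np

def mean_by_band(concern_scores, gi_scores, bands):
    """Mean Concerns-scale score within each Good Impression raw-score band."""
    means = {}
    for lo, hi in bands:
        in_band = [c for c, g in zip(concern_scores, gi_scores) if lo <= g <= hi]
        means[(lo, hi)] = float(np.mean(in_band)) if in_band else None
    return means

bands = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]

# Toy scores shaped like the reported pattern: flat until Good Impression
# exceeds 14, then a drop of more than one raw-score point.
gi = [1, 4, 7, 10, 13, 16, 19]
concern = [10, 10, 10, 10, 10, 8, 7]
band_means = mean_by_band(concern, gi, bands)
```

Plotting these band means against the bands, with a horizontal line at the normative mean, reproduces the kind of visual inspection described above.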
If it is assumed that decrements in the scores on the Concern scales
for respondents with Good Impression scores of 15 and above represent
deliberate suppression of their scores, then the 90th percentile would be
a meaningful cutting point above which to consider the test invalid due
to an excessive good-impression response set. This finding is consistent
with authors’ general experience that there tends to be a legitimate con-
cern regarding approximately one in ten applicants. It also shows that
the test construction procedures were successful in controlling the ad-
verse influence of this response set for nearly all respondents.
Discussion
The findings show that scores on the five Concerns scales are not
affected by a good-impression response set until the score on the Good
Impression scale reaches 14 or 15, corresponding to approximately the
90th percentile. Thus, the test construction procedures were broadly suc-
cessful in controlling any undesired effects of this response set.
However, there is no reason to assume that all the high scorers on
Good Impression necessarily made a deliberate attempt to suppress their
responses on the Concerns scales. Therefore, it is fair to conclude that
probably fewer than 10 percent of the tests are invalidated by a good-
impression response set, and that 10 percent represents a maximum fig-
ure. These findings are consistent with the conclusions of both Ones,
Viswesvaran, and Reiss (1996) and Smith and Ellingson (2002) that a
social desirability response set does not meaningfully affect the construct
validity of measures used for personnel selection.
Because the circumstances of testing vary from situation to situa-
tion, it is recommended that users of the CBI select 15 or more as a
tentative cutting point for indicating the potential invalidity of the test.
Users should also tabulate scores from their own particular setting to see whether this cutting point is consistent with their personal judgment regarding the number of respondents who appear to be misrepresenting themselves.

Figure 1
Mean Scores on the CBI Concerns Scales for the Normative Group at Seven Different Levels of the Good Impression Scale
Note. The normative mean for each scale is represented by a horizontal broken line.
OVERALL DISCUSSION
Although such studies might not impress academic purists, they will
more than meet the needs of business managers who are responsible for
the success of their enterprise.
REFERENCES
Babor, T. F., de la Fuente, J. R., & Grant, M. (1992). AUDIT: The Alcohol Use Disorders
Identification Test. Guidelines for use in primary health care. Geneva, Switzerland:
World Health Organization.
Barrett, P. (2001). Pre-employment integrity testing: Current methods, problems, and solu-
tions. Paper presented at a meeting of the British Computer Society: Information Secu-
rity Group, Milton Hill, Oxford, UK.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job perfor-
mance: A meta-analysis. Personnel Psychology, 44, 1–26.
Bensimon, H. F. (1994). Crisis and disaster management: Violations in the workplace. Training and Development, 28, 27–32.
Buss, A. H., & Perry, M. (1992). The aggression questionnaire. Journal of Personality and Social Psychology, 63, 452–459.
Buss, D. (1993). Ways to curtail employee theft. Nation's Business, 36–38.
Camara, W. J., & Schneider, D. L. (1994). Integrity tests: Facts and unresolved issues. American Psychologist, 49, 112–119.
Costa, P., & McCrae, R. (1992). Revised NEO Personality Inventory and NEO Five-Factor
Inventory: Professional Manual. Odessa, FL: Psychological Assessment Resources.
Goldberg, L. R., Grenier, J. R., Guion, R. M., Sechrest, L. B., & Wing, H. (1991). Question-
naires used in the prediction of trustworthiness in pre-employment selection decisions:
An APA task force report. Washington, D.C.: American Psychological Association.
Goodstein, L. D., & Lanyon, R. I. (1999). Applications of personality assessment to the
workplace: A review. Journal of Business and Psychology, 13, 291–322.
Goodstein, L. D., & Lanyon, R. I. (2002). Counterproductive Behavior Index (CBI): Technical manual (Version 2.0). Amherst, MA: HRD Press.
Harper, D. (1990). Spotlight abuse—save profits. Industrial Distribution, 79, 47–51.
Jones, J. W., & Terris, W. (1991). Integrity testing for personnel selection. Forensic Reports,
4, 117–140.
Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job
satisfaction. Journal of Applied Psychology, 87, 530–541.
Lanyon, R. I., & Goodstein, L. D. (1997). Personality assessment (3rd ed.). New York: Wiley.
Lehmann, W. E. K., Holcom, N. L., & Simpson, D. D. (1990). Employee health and perfor-
mance in the workplace: A survey of municipal workers in a large southwestern city.
Fort Worth: Texas Christian University, Institute of Behavioral Research.
Llobet, J. M. (2001). Applicant Risk Profiler: Administrator’s manual. Los Angeles: West-
ern Psychological Services.
McGurn, W. (1988, June 1). Spotting the thieves who work among us. Wall Street Journal,
p. 16a.
Murphy, K. R. (1993). Honesty in the workplace. Pacific Grove, CA: Brooks/Cole.
Ones, D. S., & Viswesvaran, C. (1998). Integrity testing in organizations. In R. W. Griffin,
A. O’Leary-Kelly, & J. M. Collins (Eds.), Dysfunctional behavior in organizations: Vol.
2. Nonviolent behaviors in organizations. Greenwich, CT: JAI Press.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). The role of social desirability in person-
ality testing for personnel selection: The Red Herring. Journal of Applied Psychology,
81, 660–679.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analyses of
integrity test validities: Findings and implications for personnel selection and theories.
Journal of Applied Psychology, 78, 670–703.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1995). Integrity tests: Overlooked facts,
resolved questions, and unanswered questions. American Psychologist, 49, 456–457.