Validity and Reliability of a Pre-Employment Screening Test: The Counterproductive Behavior Index (CBI)
Leonard D. Goodstein
Washington, D.C.
This study is based in part on research projects carried out during the Spring 2002
semester by Laura Fuentes, Kristine Goto, Amy Kossman, and Dustin Morrissey, Depart-
ment of Psychology, Arizona State University. The assistance of Jeanette Goodstein in the
drafting of this article is gratefully acknowledged.
Address correspondence to Richard I. Lanyon, Department of Psychology, Arizona State
University, Tempe, AZ 85297-1104. E-mail: rlanyon@asu.edu or Leonard D. Goodstein, 4815
Foxhall Crescent, NW, Washington, DC 20007–1052. E-mail: lendg@AOL.com.
have been offered for this unexpected finding that people tend to admit
their counterproductive behavior when asked directly (Jones & Terris,
1991). The most frequent explanation given by persons when asked
about their admission of misbehaviors is that "everyone does it" and that
they do it less than their fellow employees.
The Counterproductive Behavior Index (CBI; Goodstein & Lanyon,
2003), a 120-item, multi-scale true/false integrity test, asks direct ques-
tions about behaviors and attitudes in five areas of workplace concern:
dependability concerns, aggression, substance abuse, honesty concerns,
and computer abuse, and includes a Good Impression scale. It also yields
a single composite measure of overall or total concern, or “organizational
deviance.” (The scales were named so that a high score would represent
deviance; thus, the word concerns was explicitly included in two of the
names.)
Test construction procedures, reported elsewhere (Goodstein & Lan-
yon, 2002), followed the outline for scale development described earlier
(Lanyon & Goodstein, 1997). Briefly, a universe of content was developed
for each concept, and items were written to map the content representa-
tively. The number of items was then edited down to 40 for each concept
(25 for computer abuse). This preliminary pool of 225 items was adminis-
tered to 191 workplace participants (89 males, 102 females) of varied age
(mean = 32 years), education (mean = 13 years), and occupational level,
and employed in various industries across 12 states.
Preliminary scores for each concept were computed based on the
items written for each scale, and correlations were computed between
each item and the preliminary score for its scale. Next, partial correla-
tions were computed, partialing out the contribution of the (preliminary)
Good Impression scale to the correlation between each item and its pre-
liminary scale score. The 20 items finally chosen for each scale were
those that showed highly significant item/total correlations and partial
correlations (all p < .001). Thus, each item contributed significantly to its
scale beyond a shared relationship with Good Impression. The final
items were checked for representativeness of content. Across the five
Concerns scales, the median correlation of items with their (preliminary)
scale score was .50, and the median partial correlation was .46. Correla-
tions of the final (20-item) scales with the final Good Impression scale
were relatively low (range −.21 to −.35), showing that the test construc-
tion strategy was successful in reducing the contribution of Good Impres-
sion to the scale scores, at least for the initial group of participants.
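The item-selection rule described above can be sketched in a few lines; this is an illustrative reconstruction, not the authors' actual procedure, and the data are synthetic:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z,
    computed from the three pairwise Pearson correlations."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic illustration: an item whose relation to its scale total is
# carried entirely by a good-impression response set.
rng = np.random.default_rng(0)
n = 191                               # size of the original workplace sample
gi = rng.normal(size=n)               # stand-in Good Impression score
item = gi + 0.3 * rng.normal(size=n)
scale = gi + 0.3 * rng.normal(size=n)

# The raw item/total correlation is high, but the partial correlation is
# near zero, so this item would fail the selection rule described above.
r_raw = np.corrcoef(item, scale)[0, 1]
r_partial = partial_corr(item, scale, gi)
```

Items surviving this screen relate to their scale for reasons beyond shared social desirability, which is the property the selection procedure was designed to ensure.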
Establishing the validity of a test such as the CBI is an ongoing pro-
cess. The present paper describes three studies that address predictive
validity, construct validity, test-retest reliability, and the extent of the
scales’ remaining vulnerability to good-impression response distortion.
536 JOURNAL OF BUSINESS AND PSYCHOLOGY
METHOD
Participants
This was a simulation study, involving 160 undergraduates at a
large university, who participated as one way of fulfilling a course re-
quirement. All had had significant work experience. They were tested in
small groups of 5 to 15, under conditions of anonymity. There were 88
males and 62 females; mean age was 19.4 (range 18–25). Some partici-
pants completed all six simulations, and some completed three. The un-
equal Ns for the groups (Table 1) were due to the realities of scheduling
and to incomplete data.
Procedure
The nature of the CBI and the purpose of the study were fully ex-
plained in advance to the participants. They then completed the CBI
either three or six times during one or two one-hour sessions, each time
with specific written and oral instructions to simulate one of the six characteristics represented by the scales, including Good Impression. The order of presentation was randomized among the testing groups. To simulate each of the five concerns, they were told to pretend that they actually had engaged in such behavior or held such attitudes, and although they wanted to do their "best" on the test in order to get the job, they recognized that the test might be able to determine whether they were lying. Therefore, they were to admit to some of the negative characteristics. To simulate Good Impression, they were told to make the very best impression that they could, in order to increase their chances of getting the job.

Table 1
Means and Standard Deviations for Six CBI Scales when Simulating Each Characteristic
Simulated characteristic  N  D  A  S  H  C  G
Note. The means representing simulation of the same condition are shown in boldface.

RICHARD I. LANYON AND LEONARD D. GOODSTEIN 537
Results
Means and standard deviations for the current participants for all
scales under all six simulation conditions are shown in Table 1. Table 2
repeats the raw score means on each scale when simulating the charac-
teristics of that particular scale, and shows these means as standard
scores (T-scores) based on the 191 participants of the original workplace
normative group (mean age = 32) and separately for the 56 participants
of the workplace normative group who were in the age range 18 through
25 (Goodstein & Lanyon, 2002).
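The raw-to-standard-score conversion used here is the ordinary linear T-score transformation (normative mean 50, standard deviation 10); the numeric values below are hypothetical, not the CBI norms:

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear conversion of a raw scale score to a T-score
    (normative mean = 50, SD = 10)."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical normative values for illustration only (not the CBI norms):
# a raw score two SDs above the norm mean converts to T = 70.
t = t_score(14.0, norm_mean=8.0, norm_sd=3.0)
```

Using a separate norm mean and SD for the 18-to-25 subgroup simply substitutes that subgroup's parameters into the same formula.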
Each of the six means under simulation conditions was compared
with the two corresponding normative means using t-tests. All compari-
sons were significant well beyond the .001 level. The data thus show that
the scales successfully identified the five areas of (simulated) concern
shown, even when participants were instructed to be careful about how
much of the negative characteristic to portray on the test. Simulated
Good Impression was also successfully identified.
Table 2
Standard Scores (in T-score form) of Means in Six Simulations
and Statistical Comparisons with Normative Means
Inspection of all the means in Table 1 shows that the five Concerns
scales differed in their degree of focus on their particular characteristic.
For example, in the condition simulating aggression, none of the other
Concerns scales differed from the normative mean. Thus, the content of
the Aggression scale is quite specific to aggressiveness. On the other
hand, in the simulation of Dependability Concerns, all the other scales
differed from their normative means, although none to the extent of the
Dependability Concerns scale.
Sensitivity and Specificity. For each scale, the distribution of (simulated)
problem scores was compared to the distribution in the normative group.
For example, the first data column in Table 3 considers a specificity of
.95 (i.e., a cutting point on each scale that only five percent of the norma-
tive group scored above). The values in the column show the correspond-
ing sensitivities of the scales, i.e., the proportion of problem respondents
who scored above the cutting point and were therefore correctly identi-
fied as having problems. The remaining columns show sensitivities for
specificity levels of .90, .85, .80, and .75.
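The cutting-point logic can be sketched as follows, with synthetic stand-in scores rather than CBI data:

```python
import numpy as np

def sensitivity_at_specificity(normative, problem, specificity):
    """Cutting point = the normative-group percentile implied by the
    desired specificity; sensitivity = proportion of the problem group
    scoring above that cutting point."""
    cut = np.percentile(normative, 100 * specificity)
    return float(np.mean(np.asarray(problem) > cut))

# Synthetic stand-ins for the normative and simulated-concern groups:
rng = np.random.default_rng(0)
normative = rng.normal(10, 3, size=191)
problem = rng.normal(18, 3, size=160)

sensitivities = {s: sensitivity_at_specificity(normative, problem, s)
                 for s in (0.95, 0.90, 0.85, 0.80, 0.75)}
```

Lowering the required specificity lowers the cutting point, so sensitivity can only stay the same or rise, which is the trade-off the columns of Table 3 display.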
It can be seen from Table 3 that all five Concerns scales showed high
sensitivity. Thus, overall accuracy was 91 percent or greater in identify-
ing problem persons when a false-positive rate of 15 percent is accepted
(i.e., a specificity of .85), and 96 percent or greater for a false-positive
rate of 25 percent.
In order to study the appropriateness of the CBI norms in practice,
the CBI was administered to a small group of applicants for jobs at a
medium-sized Midwestern manufacturing company (N = 17; 10 males
and 7 females). The completed tests were sent to the authors without
being examined or scored by the personnel director, and these data played no role in the employment decision.

Table 3
Sensitivities at Five Levels of Specificity for the CBI Scales Based on a Comparison of the Normative Group and Simulated Responding Groups
Note. The cumulative percentage figure numerically closest to each specificity was utilized to determine the cutting points on which the sensitivities are based.

The mean scores for this small group of job seekers were all numerically lower than those of the workplace normative group but within half a standard deviation of the norm mean; the largest difference was for the Substance Abuse scale, which fell slightly less than half a standard deviation below the norm mean.
Nevertheless, four of the 17 applicants scored above the cutting
point indicating a “serious concern” provided in the CBI Technical Man-
ual (Goodstein & Lanyon, 2002) (a score above the 95th percentile) on at
least one of the scales. One 39-year-old male applicant scored at the "concern" (85th percentile) or "serious concern" level on all five of the Concerns scales despite a moderately high Good Impression score of 12. One 37-year-old female applicant scored at the "serious concern" level on the Aggression scale, which was somewhat surprising, at least to the authors. These pilot data suggest that the CBI can facilitate the identification of potentially counterproductive employees as part of a pre-employment screening process.
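The percentile cutoffs described above imply a simple flagging rule; the sketch below assumes each scale score has already been converted to a percentile rank in the normative group, and the function names and any-scale rule are illustrative, not taken from the CBI materials:

```python
def screening_flag(percentile):
    """Classify one scale score by its percentile rank in the normative
    group, using the cutoffs described in the text: above the 95th
    percentile = 'serious concern', above the 85th = 'concern'."""
    if percentile > 95:
        return "serious concern"
    if percentile > 85:
        return "concern"
    return "no concern"

def flag_applicant(scale_percentiles):
    """Flag an applicant if any scale reaches at least the 'concern' level."""
    return any(screening_flag(p) != "no concern" for p in scale_percentiles)
```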
Discussion
To summarize the findings, all five Concerns scales were successful
in identifying persons who simulated the characteristic of concern, even
while exercising care not to “give too much away.” Three of the scales
(Aggression, Substance Abuse, and Computer Abuse) did well enough to
be able to correctly identify 100 percent of the problem respondents when
the false-positive rate was set at 15 percent. This success is consistent
with the observation that the items tapping these three areas of concern
are very specific and obviously descriptive of their area. In other words,
these concepts are narrow and readily defined. The remaining two areas
of concern—Dependability and Honesty—are inherently broader. These
broader concerns tend to be reflected within the other three areas, al-
though not vice versa. For example, an item such as “I have used alcohol
at work in the past year” could also be seen as reflecting lack of depend-
ability as well as substance abuse, but the item “I do my work thoroughly
and carefully” would not be seen as reflecting substance abuse. With
respect to sensitivity, by accepting a specificity of .90, between 78 and
98 percent (with an average of about 90 percent) of those respondents
with a simulated problem were correctly identified.
These research findings are limited in that they represent workers
who were also college students, and who were tested in a college setting
under research conditions rather than a real-life workplace setting. It
would be appropriate for users of the CBI to keep track of the success
of the test in their own settings, and to develop their own expectations
over time as to its accuracy in their particular use.
The question of false positives merits further consideration. False
METHOD
Participants
This study utilized 83 undergraduates at a large university, who
participated as one way of fulfilling a course requirement. All had had
significant work experience. They were tested in small groups of 5 to
15, under conditions of anonymity, over two one-hour periods on dif-
ferent days. There were 43 males and 40 females; mean age was
20 (range 18–29). The number of participants varied among the differ-
ent test instruments due to artifacts of scheduling and to incomplete
data.
Measures
Measures for establishing the construct validity of the CBI are de-
scribed in detail below. They were selected according to the following
criteria.
1. One major measure was selected for each of the five Concerns
scales and for the Good Impression scale. The selected measures
were either well established as measures of the behavior of inter-
est, or if no such measure was available, a validity index was
constructed specifically for the study.
2. Additional measures were utilized whenever it was considered
appropriate.
Good Impression. The CBI Good Impression Scale was validated against
the overall score of the Balanced Inventory of Desirable Responding
(BIDR; Paulhus, 1986, 1991). The BIDR is an established measure of
socially desirable responding, and consists of two scales that encompass
conscious and unconscious aspects. The present study utilized the sum
of the two scales (Impression Management and Self-Deceptive Enhance-
ment).
Total Concerns. The CBI Total Concerns score is the simple sum of par-
ticipants’ scores on the five Concerns scales. A Total Validity Index was
developed by combining the corresponding five main validity measures:
the Conscientiousness scale of the NEO-FFI, the Buss-Perry Aggression
Questionnaire, the DAST, the Honesty Validity Index, and the Computer
Abuse Validity Index. Because these five measures differed widely in
their range of scores, their scores were converted to z-scores before com-
bining them through simple addition.
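The z-score combination described above can be sketched as follows; the two arrays are synthetic stand-ins, not the study's actual measures:

```python
import numpy as np

def combine_as_z_scores(*measures):
    """Convert each measure to z-scores before summing, so that measures
    with very different score ranges contribute equally to the total."""
    arrays = [np.asarray(m, dtype=float) for m in measures]
    return sum((a - a.mean()) / a.std() for a in arrays)

# Synthetic stand-ins for two validity measures with unlike ranges:
rng = np.random.default_rng(0)
conscientiousness = rng.normal(30, 10, size=83)  # e.g. a 0-60 questionnaire
dast = rng.normal(3, 1.5, size=83)               # e.g. a 0-10 screening index

total_index = combine_as_z_scores(conscientiousness, dast)
```

Without the standardization step, the measure with the widest raw-score range would dominate the composite.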
Correlations were also determined between the scales of the CBI
and those of another pre-employment screening test, the Applicant Risk
Profiler (ARP; Llobet, 2001), a commercially available instrument with
scales assessing five of the six areas covered by the CBI: Workplace Pol-
icy Compliance, Workplace Aggression, Illegal Drug Use, Integrity/Hon-
esty, and Deception.
Procedure
The nature of the CBI and the purpose of the study were fully ex-
plained to the participants, and anonymity was guaranteed. Of the 83
participants, 41 completed the CBI a second time 2, 5, or 7 days later,
for the purpose of providing data to establish the test-retest reliability
of the scales. These data also enabled the computation of a second set of
correlations between the CBI scales and the construct validity measures.
Results
Correlations between the CBI scales and the various validity mea-
sures were computed for all participants and for males and females sepa-
rately. Each scale is considered in turn.
Dependability Concerns. Correlations between the CBI Dependability Concerns scale and all five of the NEO-FFI factors are shown in Table 4. The overall correlation between Dependability Concerns and NEO-FFI Conscientiousness was −.50, which is highly significant (p < .001). The separate correlations for males (−.60) and females (−.48) were also significant (p < .001 and p < .01, respectively).
Table 4
Correlations of the CBI Dependability Concerns Scale with the Five Scales
of the NEO Five-Factor Inventory
Table 5
Correlations of the CBI Aggression Scale with the Four Facets and
Total Score of the Buss-Perry Aggression Questionnaire and with the
Agreeableness Scale of the NEO Five Factor Inventory
The CBI Aggression scale was significantly correlated with each of the facets,
showing that it is a broad-based measure covering the major aspects of
aggression. The results are consistent with those obtained from males
and females separately.
Also shown in Table 5 are the correlations between the CBI Aggres-
sion scale and the NEO-FFI Agreeableness scale. These inverse correla-
tions further support the validity of the CBI Aggression scale.
Substance Abuse. Correlations for the CBI Substance Abuse scale are
shown in Table 6. The overall correlation with the Drug Abuse Screening
Test (DAST) was .57, which is highly significant (p < .001). The correlations for males and females separately (.55 and .60) are also highly significant (each p < .001).
Correlations between the CBI Substance Abuse scale and the
AUDIT Index are also shown in Table 6. Over all participants, the correlation between these two indices was highly significant: .42 overall (p < .001), with .45 for males and .40 for females (p < .001 and p < .05, respectively).
Honesty Concerns. The overall correlation between the CBI Honesty Con-
cerns scale and the Honesty Validity Index was .37 (p < .001), with .36
for males (p < .05) and .38 for females (p < .05). These correlations, while
significant, are somewhat lower than those for the other CBI scales. The
computations were repeated using the retest responses of the subset of
participants who completed the CBI twice (see Table 7). This time the
correlations were substantially higher. It is noted that the amount of
dishonest job-related behavior reported by the participants in the pres-
ent study was quite minimal, and probably did not represent the full
range of problems with honesty that are typically found in the work-
place.
Table 6
Correlations of the CBI Substance Abuse Scale with the Drug Abuse Screening
Test (DAST) and a College-Related Index (AUDIT Index) from the Alcohol Use
Disorders Identification Test
Table 7
Correlations of the CBI Scales with Construct Validity Measures
for the Retest Group
Table 8
Correlations of the Applicant Risk Profiler (ARP) Scales with Comparable CBI Scales and with Construct Validity Measures

CBI scale               ARP scale                     r(ARP, CBI)  r(CBI, validity measure)  r(ARP, validity measure)
Dependability Concerns  Workplace Policy Compliance  .18          −.50***                   −.37**
Aggression              Workplace Aggression         .60***       .72***                    .66***
Substance Abuse         Illegal Drug Use             .65***       .57***                    .35*
Honesty Concerns        Integrity                    .40***       .37**                     .33*
Good Impression         Deception                    .26*         .49***                    .17

Note. The number of participants for the correlations involving the ARP ranged from 62 through 66.
*p < .05; **p < .01; ***p < .001.
Discussion
The purpose of this study was to demonstrate the relationships of
the scales of the CBI to other measures of the same or similar character-
istics. Our results showed that these relationships were highly signifi-
cant for all of the scales, with a correlation of .66 between Total Concerns
and the Total Validity Index. Comparison of these data with those in published reports of other instruments (e.g., Barrick & Mount, 1991;
Goodstein & Lanyon, 1999; Judge, Heller, & Mount, 2002; Lanyon &
Goodstein, 1997) indicates that these obtained values are equal or supe-
rior to validity levels that have been demonstrated to date.
The data on the reliability of the CBI, both the internal consistency
measures and the test-retest results, provide strong support for the con-
sistency of the CBI scores; that is, such scores are sufficiently stable to
support the use of the CBI in making selection decisions.
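The two kinds of reliability coefficient reported for the CBI can be sketched as follows; this is a generic illustration with toy data, not the CBI scoring code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Perfectly parallel items yield alpha = 1.0:
x = np.arange(1.0, 11.0)
alpha_parallel = cronbach_alpha(np.column_stack([x, x, x]))

# Test-retest (stability) reliability, by contrast, is simply the Pearson
# correlation between scores on the two administrations; a uniform
# practice effect leaves r unchanged.
time1 = x
time2 = x + 0.5
retest_r = np.corrcoef(time1, time2)[0, 1]
```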
Table 9
Test-retest Reliability (Stability) Coefficients and Internal Consistency Coefficients (Cronbach Alphas) for the Scales of the CBI

It is acknowledged that the participants in the present study were tested in a college setting under anonymous conditions rather than in the workplace. It would be expected that the present participants would have been more willing to admit faults than persons tested in the workplace, and this difference would result in higher scores on the Concerns scales for the present participants. Comparison of the mean scores on the Concerns scales in the present study with those of the normative group (who were tested in the workplace) shows such a result, although the differences were all less than one standard deviation. In any event,
there is no reason to suppose that the underlying relationships between
the CBI scales and the construct validity measures would be any differ-
ent for persons tested in the workplace and persons with significant
work experience tested under research conditions. Thus, it is concluded
that the results of the present study can be generalized to the workplace
setting.
The normative group did show a substantially higher mean score
than the present participants on the Good Impression scale, although it
is of interest that their “carefulness” did not unduly depress their scores
on the Concerns scales when compared to the present participants. The
influence of good-impression response set is further explored in Study 3.
METHOD
The use of partial correlations in the initial scale construction procedures, which selected items that related to the constructs beyond a shared relationship with a good-impression response set, served to reduce the influence of that response set as a factor diminishing the validity of the instrument. Study 3 investigated the extent to which the scales remained vulnerable to good-impression response distortion despite these earlier efforts.
Results
The mean scores for each of the five Concerns scales and for Total
Concerns were computed for seven different levels of the Good Impres-
sion scale score: 0–2, 3–5, 6–8, 9–11, 12–14, 15–17, and 18–20. These
mean scores are shown graphically in Figure 1, which also includes hori-
zontal lines that represent the normative mean score for each scale. Vi-
sual inspection shows that, overall, the scores on each scale begin to
show a meaningful decrease (more than one raw score point) when the
scores on the Good Impression scale exceed 14 or 15. Good Impression
scores of 14 and below appear not to systematically affect the scores on
any of the Concerns scales. It is noted that raw scores of 14 and 15 repre-
sent the 89th and 92nd percentiles respectively for Good Impression.
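The banding analysis underlying Figure 1 can be sketched as follows; the scores below are toy values shaped like the reported pattern, not CBI data:

```python
import numpy as np

def mean_by_band(concern_scores, gi_scores, bands):
    """Mean Concerns-scale score within each Good Impression raw-score band."""
    means = {}
    for lo, hi in bands:
        in_band = [c for c, g in zip(concern_scores, gi_scores) if lo <= g <= hi]
        means[(lo, hi)] = float(np.mean(in_band)) if in_band else None
    return means

bands = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]

# Toy scores shaped like the reported pattern: flat until Good Impression
# exceeds 14, then a drop of more than one raw-score point.
gi = [1, 4, 7, 10, 13, 16, 19]
concern = [10, 10, 10, 10, 10, 8, 7]
band_means = mean_by_band(concern, gi, bands)
```

Plotting these band means against the bands, with a horizontal line at the normative mean, reproduces the kind of visual inspection described above.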
If it is assumed that decrements in the scores on the Concern scales
for respondents with Good Impression scores of 15 and above represent
deliberate suppression of their scores, then the 90th percentile would be
a meaningful cutting point above which to consider the test invalid due
to an excessive good-impression response set. This finding is consistent
with authors’ general experience that there tends to be a legitimate con-
cern regarding approximately one in ten applicants. It also shows that
the test construction procedures were successful in controlling the ad-
verse influence of this response set for nearly all respondents.
Discussion
The findings show that scores on the five Concerns scales are not
affected by a good-impression response set until the score on the Good
Impression scale reaches 14 or 15, corresponding to approximately the
90th percentile. Thus, the test construction procedures were broadly suc-
cessful in controlling any undesired effects of this response set.
However, there is no reason to assume that all the high scorers on
Good Impression necessarily made a deliberate attempt to suppress their
responses on the Concerns scales. Therefore, it is fair to conclude that
probably fewer than 10 percent of the tests are invalidated by a good-
impression response set, and that 10 percent represents a maximum fig-
ure. These findings are consistent with the conclusions of both Ones,
Viswesvaran, and Reiss (1996) and Smith and Ellingson (2002) that a
social desirability response set does not meaningfully affect the construct
validity of measures used for personnel selection.
Because the circumstances of testing vary from situation to situa-
tion, it is recommended that users of the CBI select 15 or more as a
tentative cutting point for indicating the potential invalidity of the test.
Users should also tabulate scores from their own particular setting to see whether this cutting point is consistent with their personal judgment regarding the number of respondents who appear to be misrepresenting themselves.

Figure 1
Mean Scores on the CBI Concerns Scales for the Normative Group at Seven Different Levels of the Good Impression Scale
Note. The normative mean for each scale is represented by a horizontal broken line.
OVERALL DISCUSSION
Although such studies might not impress academic purists, they will
more than meet the needs of business managers who are responsible for
the success of their enterprise.
REFERENCES
Babor, T. F., de la Fuente, J. R., & Grant, M. (1992). AUDIT: The Alcohol Use Disorders
Identification Test. Guidelines for use in primary health care. Geneva, Switzerland:
World Health Organization.
Barrett, P. (2001). Pre-employment integrity testing: Current methods, problems, and solu-
tions. Paper presented at a meeting of the British Computer Society: Information Secu-
rity Group, Milton Hill, Oxford, UK.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job perfor-
mance: A meta-analysis. Personnel Psychology, 44, 1–26.
Bensimon, H. F. (1994). Crisis and disaster management: Violations in the workplace. Training and Development, 28, 27–32.
Buss, A. H., & Perry, M. (1992). The aggression questionnaire. Journal of Personality and Social Psychology, 63, 452–459.
Buss, D. (1993). Ways to curtail employee theft. Nation's Business, 36–38.
Camara, W. J., & Schneider, D. L. (1994). Integrity tests: Facts and unresolved issues. American Psychologist, 49, 112–119.
Costa, P., & McCrae, R. (1992). Revised NEO Personality Inventory and NEO Five-Factor
Inventory: Professional Manual. Odessa, FL: Psychological Assessment Resources.
Goldberg, L. R., Grenier, J. R., Guion, R. M., Sechrest, L. B., & Wing, H. (1991). Question-
naires used in the prediction of trustworthiness in pre-employment selection decisions:
An APA task force report. Washington, D.C.: American Psychological Association.
Goodstein, L. D., & Lanyon, R. I. (1999). Applications of personality assessment to the
workplace: A review. Journal of Business and Psychology, 13, 291–322.
Goodstein, L. D., & Lanyon, R. I. (2002). Counterproductive Behavior Index (CBI): Technical manual (Version 2.0). Amherst, MA: HRD Press.
Harper, D. (1990). Spotlight abuse—save profits. Industrial Distribution, 79, 47–51.
Jones, J. W., & Terris, W. (1991). Integrity testing for personnel selection. Forensic Reports,
4, 117–140.
Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job
satisfaction. Journal of Applied Psychology, 87, 530–541.
Lanyon, R. I., & Goodstein, L. D. (1997). Personality assessment (3rd ed.). New York: Wiley.
Lehmann, W. E. K., Holcom, N. L., & Simpson, D. D. (1990). Employee health and perfor-
mance in the workplace: A survey of municipal workers in a large southwestern city.
Fort Worth: Texas Christian University, Institute of Behavioral Research.
Llobet, J. M. (2001). Applicant Risk Profiler: Administrator’s manual. Los Angeles: West-
ern Psychological Services.
McGurn, W. (1988, June 1). Spotting the thieves who work among us. Wall Street Journal,
p. 16a.
Murphy, K. R. (1993). Honesty in the workplace. Pacific Grove, CA: Brooks/Cole.
Ones, D. S., & Viswesvaran, C. (1998). Integrity testing in organizations. In R. W. Griffin,
A. O’Leary-Kelly, & J. M. Collins (Eds.), Dysfunctional behavior in organizations: Vol.
2. Nonviolent behaviors in organizations. Greenwich, CT: JAI Press.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). The role of social desirability in person-
ality testing for personnel selection: The Red Herring. Journal of Applied Psychology,
81, 660–679.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analyses of
integrity test validities: Findings and implications for personnel selection and theories.
Journal of Applied Psychology, 78, 670–703.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1995). Integrity tests: Overlooked facts,
resolved questions, and unanswered questions. American Psychologist, 49, 456–457.